WINDAGs Blog

Technical deep dives into multi-agent orchestration, skill engineering, and the craft of building systems that learn.

Skill Quality · Part 1

80% concepts. 20% procedure. We measured it.

Why Declarative Knowledge Isn't Enough: The Procedural Gap in AI Agent Skills

We audited 469 AI agent skills and found 80% of the content is declarative knowledge — concepts, definitions, terminology. Only 20% is procedural — the decision trees, failure modes, and quality gates that let agents actually execute. Here's the cognitive science, the data, and the tools to fix it.

Your AI agent skills are probably full of declarative knowledge — "what things are" — and starved of procedural knowledge — "how to decide what to do." We proved it by auditing 469 skills against a rubric grounded in 70 years of cognitive science. The results:

skills · cognitive-science · knowledge-engineering · quality · evaluation
Read full post
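The full post details the rubric; as a rough illustration of how a declarative/procedural split can be estimated, here is a toy sentence tagger. The cue-word lists and function name are our illustrative stand-ins, not the 10-axis rubric the audit actually used.

```python
# Toy sketch: tag sentences as procedural ("how to decide") vs declarative
# ("what things are") using crude cue words. Illustrative only; the real
# audit graded 469 skills against a much richer rubric.
import re

PROCEDURAL_CUES = ("if ", "when ", "then ", "check", "verify", "fail")
DECLARATIVE_CUES = ("is a", "are a", "refers to", "means", "defined as")

def knowledge_split(text: str) -> dict:
    """Count sentences that look procedural vs declarative vs neither."""
    sentences = [s.strip().lower() for s in re.split(r"[.!?]\s+", text) if s.strip()]
    counts = {"procedural": 0, "declarative": 0, "other": 0}
    for sentence in sentences:
        if any(cue in sentence for cue in PROCEDURAL_CUES):
            counts["procedural"] += 1
        elif any(cue in sentence for cue in DECLARATIVE_CUES):
            counts["declarative"] += 1
        else:
            counts["other"] += 1
    return counts
```

Running this over a skill file gives a first-pass ratio; a skill dominated by definitional sentences and light on conditionals is a candidate for the procedural gap the post describes.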

Type /next-move. Get a parallelized execution plan. Accept, modify, or reject.

/next-move: Your AI Already Knows What's Next

We built a slash command that reads your project's git state, conversation context, and skill catalog — then predicts the highest-impact sequence of agents to run. Zero API cost. One command.

You're deep in a session. You've been refactoring auth, writing tests, fixing edge cases. You pause. What should I do next?

skills · meta-dag · planning · next-move · decision-making
Read full post
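The post doesn't publish the command's internals; a minimal sketch of its context-gathering step might parse `git status --porcelain` into a structure a planner can reason over. The function and field names below are assumptions of ours.

```python
# Hypothetical sketch of the git-state step behind a command like /next-move:
# turn `git status --porcelain` output into a summary grouped by change type.
def summarize_status(porcelain: str) -> dict:
    """Group porcelain status lines (XY <path>) into staged/modified/untracked."""
    summary = {"staged": [], "modified": [], "untracked": []}
    for line in porcelain.splitlines():
        if not line.strip():
            continue
        index_flag, worktree_flag, path = line[0], line[1], line[3:]
        if index_flag == "?":
            summary["untracked"].append(path)
        else:
            if index_flag != " ":
                summary["staged"].append(path)
            if worktree_flag == "M":
                summary["modified"].append(path)
    return summary
```

Feeding a summary like this, plus conversation context and the skill catalog, to the model is what keeps the command at zero API cost: the prediction runs inside the session you already have open.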

Skill Compression · Part 2

Less is more — especially when Claude already knows.

WinDAGSZip — Compressing Skills with Embeddings

A 23MB embedding model finds that 25% of the tokens in code-heavy skills are self-duplicates — for free, no API calls. An LLM judge finds another 20-40% overlaps with training data. We built the tool, measured everything across 10 skills, and shipped it.

In The 191-Skill Quality Pass, we graded every skill in our library against a 10-axis rubric and fixed the universal gap (163 skills were missing output contracts). That pass made skills better. This one makes them smaller.

skills · compression · embeddings · evaluation · rate-distortion
Read full post
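The self-duplication check reduces to finding chunk pairs whose embeddings are nearly identical. In this sketch a bag-of-words cosine similarity stands in for the 23MB embedding model — that substitution is ours, for self-containment; the real tool embeds chunks with a local model.

```python
# Sketch of embedding-based self-duplication detection: flag chunk pairs
# whose vectors exceed a similarity threshold. Bag-of-words vectors stand
# in here for real embeddings; the threshold is an illustrative default.
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def near_duplicates(chunks: list[str], threshold: float = 0.9) -> list[tuple[int, int]]:
    """Return index pairs of chunks more similar than the threshold."""
    vecs = [Counter(c.lower().split()) for c in chunks]
    return [
        (i, j)
        for i in range(len(vecs))
        for j in range(i + 1, len(vecs))
        if cosine(vecs[i], vecs[j]) >= threshold
    ]
```

Every flagged pair is a compression opportunity: keep one copy, drop or reference the other, and the skill shrinks with no information lost.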

What happens when two skill-improvers improve each other?

When Meta-Skills Collide

We built a cross-evaluation agent, pointed Anthropic's skill-creator and our skill-architect at each other, and recorded what happened. Real transcripts. Real diffs. Real analysis of what two different philosophies of skill-building value.

TL;DR: We ran two competing AI skill-evaluation tools against each other — Anthropic's open-source skill-creator and our skill-architect. Both violated their own rules. Neither caught what the other caught. Self-evaluation has a structural blind spot that cross-evaluation fills. The experiment converged to A- from both directions.

skills · meta · anthropic · evaluation · recursion
Read full post

Skill Compression · Part 1

163 skills were missing the same section.

The 191-Skill Quality Pass

We graded every Claude Code skill in our library against a 10-axis rubric. 163 were missing the same section. We fixed all of them — with hand-crafted content, not templates. Here's the rubric, the data, and the one section every skill should have.

We have 191 Claude Code skills. They cover everything from Drizzle migrations to Jungian psychology, from drone inspection to wedding photography, from pixel art to HIPAA compliance.

skills · quality · evaluation · meta · automation
Read full post
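A check like "163 skills are missing the same section" is mechanical once you name the section. Here is an illustrative version of that one axis; the heading pattern is our assumption, and the post's rubric covers nine more axes than this.

```python
# Illustrative single-axis check from a quality pass: flag skill files
# whose markdown has no heading mentioning "output" (i.e. no output
# contract). The heading convention is an assumption for this sketch.
import re

def missing_output_contract(skill_markdown: str) -> bool:
    """True if no markdown heading containing 'output' appears in the skill."""
    return not re.search(
        r"^#{1,6}\s+.*output", skill_markdown, re.IGNORECASE | re.MULTILINE
    )
```

Scanning a library with a check like this finds the gap; the post's point is that closing it took hand-crafted content per skill, not a templated section pasted 163 times.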