WINDAGs Blog

Technical deep dives into multi-agent orchestration, skill engineering, and the craft of building systems that learn.

Books teach what. People know when. We built tools for both.

May 10, 2026

The Reasoning Books Don't Have

Every team has the senior engineer who 'just knows.' That instinct doesn't fit in a doc — books can't write it, Wikipedia can't index it, and AI doesn't ship with it. We built two tools this weekend: one to pull it out of the people who have it, one to generate it when no one does.

Every engineering team has at least one of these people. The senior who glances at a graph and says "it's not the database, it's head-of-line blocking on the upstream queue." The PM who reads a feature spec in 20 seconds and points at the one constraint everyone missed. The designer who looks at a layout you've stared at for a week and says "the eyeline is wrong."

skillsexpertiseinterviewingbrainstormingknowledge-capture

Read full post

Skill Quality · Part 3

Plain Sonnet sounds confident. Skills make sure it's right — so you don't burn Saturday debugging a hallucination. Two judges from two vendors agreed.

April 28, 2026

Skills Actually Help: The Numbers

Plain Sonnet sounds confident. Sometimes it's confidently wrong — and you're the one debugging it Saturday afternoon. WinDAGs' Skill Graft pulls 4 senior specialists into Sonnet's context and lets the agent load deeper references on demand. Two judges from two vendors (Anthropic Opus 4.7, OpenAI gpt-5.5) read 50 prompts blind on a 5-criterion rubric. Both picked the grafted answer. Opus 4.7 picked it 70% of the time.

It's 9pm on a Tuesday. You've got two hours before bed and a Stripe webhook that double-charges customers under retries. You ask Sonnet. It answers — confidently, fluently, with a code block and everything. Sometimes it's confidently wrong, in the specific way that ships Friday and pages you Tuesday. That's the tax on confident-but-generic AI: not the hallucinations you catch, the ones you don't.

benchmarksskillsevaluationretrievalanthropic

Read full post

brew install curiositech/windags/windags. That's the install.

April 27, 2026

Graft Skills Into Any Agent: The WinDAGs MCP

One brew command. 551 skills. Your agent goes from generic to specialist on demand. The WinDAGs MCP exposes skill_search, skill_graft, and skill_reference to Claude Code, Claude Desktop, Cursor, Codex, and any MCP client. Zero API keys for the read-only tools.

It's Saturday. You're four hours into a side project. You ask your AI to "set up a Stripe webhook handler with idempotency and a retry policy that doesn't blow up under bursts." It writes you something that looks fine. It misses the signature verification, picks a backoff strategy that'll silently double-charge, and uses an API shape that was renamed in 2024.

mcpskillsclaudecursoragents

Read full post

Skill Quality · Part 2

Five stages. One pipeline. Every caller goes through it.

April 27, 2026

The Skill Matching Cascade: How WinDAGs Picks the Right Expert

We replaced keyword matching with a six-stage retrieval cascade — BM25, Tool2Vec, RRF fusion, cross-encoder rerank, local attribution k-NN, and cross-user global priors. Here's why each stage exists, what it costs, and how to tell which one is doing the work.

You ask your AI to "design a paginated GraphQL endpoint that won't fall over at scale." It reaches for the textbook answer — LIMIT and OFFSET. Which works fine, right up until your dataset hits a million rows and every page-2 request starts scanning the whole table.

skillsretrievalbm25tool2vecembeddings

Read full post

Skill Quality · Part 2

$0.14 per book. 88% compression. Zero decision-critical information lost.

March 25, 2026

How to Distill a Book Into a Skill

We turned 77 books and research papers into AI agent skills — at $0.14 per book. Here's the 3-pass pipeline, the 88% compression ratio, the five ways to screw it up, and what you get when it works.

Every team has the senior engineer who read the book. The one who can tell you, in twenty seconds, why you should not use that pattern, where the gotchas are, and which page the load-bearing argument starts on. They're an institutional asset. They're also a single point of failure — when they leave, the team relearns everything the hard way.

skillsknowledge-engineeringdistillationbooksmethodology

Read full post

Skill Quality · Part 1

80% concepts. 20% procedure. We measured it.

March 23, 2026

Why Declarative Knowledge Isn't Enough: The Procedural Gap in AI Agent Skills

We audited 469 AI agent skills and found 80% of the content is declarative knowledge — concepts, definitions, terminology. Only 20% is procedural — the decision trees, failure modes, and quality gates that let agents actually execute. Here's the cognitive science, the data, and the tools to fix it.

You roll out an AI tooling guide to the team. It's well-organized, well-written, has examples. Three weeks later, you watch a junior dev follow it step-by-step and produce something that compiles, passes the tests you wrote, and is quietly wrong in a way only a senior would catch on review. The guide explained what things are. It did not say how to decide what to do when the obvious answer is the wrong one.

skillscognitive-scienceknowledge-engineeringqualityevaluation

Read full post

Type /next-move. Get a parallelized execution plan. Accept, modify, or reject.

March 15, 2026

/next-move: Your AI Already Knows What's Next

We built a slash command that reads your project's git state, conversation context, and skill catalog — then predicts the highest-impact sequence of agents to run. Zero API cost. One command.

It's Tuesday at 9:42 AM. You open the laptop. The repo is right there. You stare at it.

skillsmeta-dagplanningnext-movedecision-making

Read full post

Skill Compression · Part 2

Less is more — especially when Claude already knows.

March 15, 2026

WinDAGSZip — Compressing Skills with Embeddings

A 23MB embedding model finds 25% of tokens are self-duplicating in code-heavy skills — for free, no API calls. An LLM judge finds another 20-40% overlaps with training data. We built the tool, measured everything across 10 skills, and shipped it.

It's late. You've been on a long agent session. Your agent took twelve seconds to think before answering an easy question and you watched the spinner. The thing is loaded with skill prompts — 200-line expert documents, dozens of them, every one of them eating context before the model can even read the question. And you start to wonder how much of those tokens is the skill teaching the model something it doesn't know, vs. the skill telling the model things it had already seen a thousand times during pretraining.

skillscompressionembeddingsevaluationrate-distortion

Read full post

What happens when two skill-improvers improve each other?

March 6, 2026

When Meta-Skills Collide

We built a cross-evaluation agent, pointed Anthropic's skill-creator and our skill-architect at each other, and recorded what happened. Real transcripts. Real diffs. Real analysis of what two different philosophies of skill-building value.

TL;DR: We ran two competing AI skill-evaluation tools against each other — Anthropic's open-source skill-creator and our skill-architect. Both violated their own rules. Neither caught what the other caught. Self-evaluation has a structural blind spot that cross-evaluation fills. The experiment converged to A- from both directions.

skillsmetaanthropicevaluationrecursion

Read full post

Skill Compression · Part 1

163 skills were missing the same section.

March 5, 2026

The 191-Skill Quality Pass

We graded every Claude Code skill in our library against a 10-axis rubric. 163 were missing the same section. We fixed all of them — with hand-crafted content, not templates. Here's the rubric, the data, and the one section every skill should have.

We have 191 Claude Code skills. They cover everything from Drizzle migrations to Jungian psychology, from drone inspection to wedding photography, from pixel art to HIPAA compliance.

skillsqualityevaluationmetaautomation

Read full post