What WinDAGs is, in code.
One page. Subsystems with file links, real numbers, what's shipped vs in flight, open questions. If you'd rather read code than copy, start here.
Numbers
Each row is a number I can produce on demand from the repo or a one-liner. No vanity metrics — only ones a reviewer can verify.
Subsystems
Every row links to the file in the repo. Status reflects what's actually running in the bundled plugin and the desktop app today, not what's designed.
DAG executor
SHIPPEDTopologically sorts subtasks into waves, runs each wave's nodes as parallel claude -p processes, propagates aborts via SIGTERM → SIGKILL. Zero token overhead per node.
Skill retrieval cascade
SHIPPEDSix stages: BM25 → Tool2Vec → RRF fusion → cross-encoder rerank → local attribution k-NN → cross-user global priors. One scoring path; the DAG executor, MCP server, slash skill, and shortlister all call SkillSearchService.search(). The cascade is what gets the agent's first response to look like the one a senior engineer would have written.
Tool2Vec cache generator
SHIPPEDPre-computes per-skill task-space vectors. For each skill: ask Haiku (or gpt-4o-mini / gemini-2.5-flash, whichever key is present) for 15 diverse task descriptions; embed each via Xenova/all-MiniLM-L6-v2 (384d, local Transformers.js) — same model used at runtime so the cosine is honest; average the 15 vectors; L2-normalize. Provider chain (Anthropic → OpenAI → Gemini) auto-falls-through on quota/auth errors. .env.local autoload up the directory tree.
MCP server (bundled)
SHIPPEDSelf-contained MCP server distributed with the windags-skills plugin. As of v2.8.0 it exposes 9 tools: the four retrieval primitives (windags_skill_search/graft/reference/history with the full local cascade — BM25 + MiniLM + RRF + cross-encoder + per-user attribution k-NN + cross-user global priors) plus five planner-grade tools (skill_search_batch and skill_graft_batch for N-in-one round-trips, node_requirements with provider-native model IDs, validate_dag for schema-checking before save, and estimate_cost for planning-time cost surfacing). Reads are local + cheap aggregates from api.windags.ai (no API key); writes are anonymized telemetry, opt-out via WINDAGS_TELEMETRY=off.
Cost gate + human-in-the-loop
SHIPPEDCostCalculator returns a per-node + per-wave estimate before execution. The onCostEstimate callback in DAGExecutorOptions can return false to abort. PhaseOrchestrator forwards the gate to each phase's inner executor.
Attribution + skill sharpening
PARTIALFive orthogonal signals per graft (completion, accept, downstream rating, self-confidence, shipped) keyed by query embedding in ~/.windags/skill-state.db. Stage 1 (deterministic Floor + Wall + Envelope) runs after every node. Stage 2 (Haiku-based eval) is wired but the haikuCallFn isn't configured by default.
Streaming execution events
SHIPPEDExecutionEvent → DAGStateEvent via event-mapper. WebSocket server with rooms, buffering, heartbeat, sequence numbers + replay. useDAGStateEvents hook dispatches to a Zustand reducer. Handles node_status_change, wave_transition, cost_update, human_gate_waiting, evaluation, mutation.
Topology dispatch
PARTIALSeven topologies built: DAG, Team Loop, Swarm, Blackboard, Team Builder, Recurring, Workflow. Core executors exist for all seven. /api/execute today only dispatches DAG; rolling out the others is gated on UI controls.
Workers API (api.windags.ai)
IN FLIGHTCloudflare Worker over D1 FTS5. Implements /v1/graft today (server-side BM25, returns full skill bodies). Tool2Vec via KV + Workers AI cross-encoder is the next-release plan — eliminates the 80 MB local model dependency entirely.
What's next
Concrete commitments, not aspirations. Each row corresponds to a specific commit or release.
| Surface | What |
|---|---|
| Plugin v2.9.0 | /next-move in the MCP — prompt capability + a headless `windags_run_pipeline` tool so non-Claude clients can invoke the full 5-agent meta-DAG without owning a slash-command runtime. |
| Workers API | Optional Workers AI rerank as a fallback when the local ONNX bundle isn't installed; same scores, different supply chain. |
| Bench | Vanilla Sonnet 4.7 vs Sonnet + WinDAGs on a 50-prompt held-out set; report at /engineering/benchmarks. |
| Eval | Haiku-based Stage 2 evaluator; needs a default haikuCallFn wired into createQualityHooks(). |
Open questions
Things I haven't resolved. If you have an opinion on any of these, I'd actually like to hear it.
- Cross-encoder rerank served from Workers AI vs. shipping a signed local ONNX bundle via brew tap — same UX, very different supply-chain story.
- How much the global-priors blend (Stage 6) actually changes cascade picks once the network has volume. Today the blend weight is capped at 0.2 and decomposed across three nested tiers (manifest / exact-version / any-version). The empirical question: does the cross-user signal beat the local k-NN often enough to justify the 6h cache + adaptive refresh, or do most installs converge to picks that local k-NN would have made anyway?
- How much the Tool2Vec cache drifts as the catalog evolves. Content-hash invalidation works per-skill; the empirical question is how often skills get edited enough to invalidate.
- Whether the bandit framing (deleted) was actually wrong, or whether we just hadn't built enough attribution data to make it work. Current bet: it was a category error. Open to being shown otherwise.
Built by Curiositech. Source available under BSL 1.1. Public catalog at curiositech/windags-skills.