Why Factory

AI can write code faster than ever. The hard part is no longer generation — it's trusting what was generated, governing how it was decided, and proving it works. Factory is the layer that does that, around whatever coding agent you use.

Where we sit

The AI software-development market is splitting into layers. Most of the money and noise is in code generation, and that layer is commoditizing fast (Copilot, Cursor, Claude Code, Codex, Devin). Factory deliberately does not compete there. We sit in the layer around the coding agents:

Govern · PFactory: plan with live-infra context, review gates, and human approval

Build · AIFactory: spec-first execution, or delegate to any coding agent

Verify · TFactory: generate and grade tests on a 5-signal verdict

Observe · CFactory: thread, watch and steer the whole pipeline

The opportunity is real and growing: analyst estimates (see Sources) put the AI code-tools market at roughly $8–10B in 2025–26, heading to $30–91B by 2033–35 (~25–28% CAGR), with the agentic slice growing fastest (~52% CAGR). But the value is migrating from “write the code” to “can I trust, govern and verify the code”, which is exactly the layer Factory occupies.

Honest note: Factory is an early, open project. We don’t claim market share or revenue — we claim a position: the trust, governance and observability layer for the agentic SDLC, built as composable open products.

The problem we solve

The 2026 data is blunt about where the pain is:

84% / 29%: developers using AI coding tools vs. those who trust their output (trust down from ~70% in 2023)

66%: frustrated by AI output that's "almost right, but not quite"

45%: say debugging AI-generated code takes longer than writing it

Aug 2 2026: EU AI Act high-risk rules (logging, human oversight & audit) become mandatory

The bottleneck has moved from writing to verifying and governing. That maps directly onto the family:

The pain	What addresses it
“Almost-right” code, rework, low trust	TFactory — tests graded on coverage delta, stability, mutation, lint & semantic relevance: meaningful tests, not a green bar
Ungoverned, context-blind planning (“shadow AI”)	PFactory — planning grounded in live cloud/Backstage context, review gates with citations, human approval before work is emitted
No view of what agents are doing across the SDLC	CFactory — one cockpit threading plan → code → test, with an advise-and-confirm copilot
Compliance: logging, human oversight, audit trail	The spine — human-approval gates, HMAC-anchored audit logs, completion-event records — the evidence the EU AI Act asks for
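To make the five signals in the TFactory row concrete, here is a minimal sketch of how a verdict could be derived from them. The class name `SignalReport`, the field names, the thresholds, and the "every signal must pass" rule are all assumptions for illustration, not TFactory's actual scoring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignalReport:
    """Hypothetical container for the five signals named in the table."""
    coverage_delta: float      # change in coverage, in percentage points
    stability: float           # fraction of repeated runs that passed (0..1)
    mutation_score: float      # fraction of injected mutants killed (0..1)
    lint_errors: int           # lint findings in the generated test code
    semantic_relevance: float  # 0..1: does the test exercise the change?


# Illustrative thresholds only; a real deployment would tune or replace these.
THRESHOLDS = {
    "coverage_delta": 0.0,
    "stability": 1.0,
    "mutation_score": 0.6,
    "semantic_relevance": 0.7,
}


def verdict(s: SignalReport) -> tuple[str, list[str]]:
    """Return ("accept" or "reject", names of the signals that failed)."""
    failures = []
    if s.coverage_delta <= THRESHOLDS["coverage_delta"]:
        failures.append("coverage_delta")
    if s.stability < THRESHOLDS["stability"]:
        failures.append("stability")
    if s.mutation_score < THRESHOLDS["mutation_score"]:
        failures.append("mutation_score")
    if s.lint_errors > 0:
        failures.append("lint")
    if s.semantic_relevance < THRESHOLDS["semantic_relevance"]:
        failures.append("semantic_relevance")
    return ("accept" if not failures else "reject", failures)
```

Under these assumed thresholds, a suite that always passes but kills few mutants is rejected, which is the "meaningful tests, not a green bar" idea in miniature.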

Where we’re going

Near term: finish the PARR spine (one shared correlation key — the GitHub issue — threading plan→code→test, a normalized event schema, a clean port map), then ship the CFactory cockpit in phases (read-only board → agentic copilot → advise-and-confirm actions → multi-tenant hardening). See the roadmap.
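A rough sketch of what a normalized event keyed on the GitHub issue might look like, and how a board could thread plan, code and test from it. The `PipelineEvent` fields, the `"org/repo#N"` key format, and the status strings are hypothetical; the actual schema is not specified here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineEvent:
    issue: str      # correlation key, e.g. "org/repo#123" (format assumed)
    phase: str      # "plan", "code", or "test"
    status: str     # e.g. "started", "completed", "failed"
    timestamp: str  # ISO 8601 in UTC, so string order equals time order


PHASE_ORDER = ("plan", "code", "test")


def thread_by_issue(events):
    """Group events by issue and report the latest status of each phase."""
    threads = defaultdict(dict)
    for ev in sorted(events, key=lambda e: e.timestamp):
        # Later events overwrite earlier ones for the same phase.
        threads[ev.issue][ev.phase] = ev.status
    return {
        issue: [(p, phases.get(p, "pending")) for p in PHASE_ORDER]
        for issue, phases in threads.items()
    }
```

Because every product emits the same shape with the same key, a read-only board only needs this one grouping step to show where each work item stands.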

The bet: code generation will keep commoditizing; the durable advantage is being the governance + verification + observability layer that makes agentic development trustworthy and auditable. That’s where we’re investing.

Real-life scenarios

Illustrative — these show how a team would use Factory, not named customers.

Solo dev / tiny startup

A founder building an MVP wants AI speed without a trail of untested code. They run AIFactory to turn issues into branches, and TFactory to auto-generate and grade tests on every feature, so “done” means “verified,” not “it compiled.” Local/BYO models keep cost and data under control.

AIFactory + TFactory

Scale-up · 10–40 engineers

Keep velocity as the team grows

As the team grows, merge conflicts, inconsistent specs and “where is this feature?” chaos creep in. The team adds PFactory so every work item starts from a reviewed, context-grounded plan, and CFactory so everyone sees plan→code→test on one board. The lead steers stuck work from the cockpit instead of chasing three dashboards.

+ PFactory + CFactory

Mid-market · platform team

Make the golden path the easy path

A platform team encodes standards once: PFactory reads their Backstage catalog and golden-path templates, grounding plans in real infrastructure and flagging drift. Engineers keep their existing editors (Factory plugs in over MCP), and the team delegates the coding phase to the agent they already pay for, while keeping governance and verification in-house.

PFactory + MCP + delegation

Large regulated enterprise · bank / health

AI velocity that passes audit

Under the EU AI Act, AI-assisted delivery needs human oversight, logging and evidence on demand. Factory provides the controls: human-approval gates before code is emitted, HMAC-anchored audit logs and completion-event records, tenant isolation and SAML/SCIM, BYO / air-gapped LLMs with egress auditing, and a credential broker (Vault / cloud secret managers). CFactory gives risk and engineering one auditable view of every AI action.

full family + governance spine
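As one way to picture tamper-evident audit logging, the sketch below chains entries with HMAC-SHA256 so that each entry's MAC also covers the previous one. Reading "HMAC-anchored" as a MAC chain is an assumption, as are the function names `append_entry` and `verify_log` and the record fields; Factory's actual log format may differ.

```python
import hashlib
import hmac
import json


def append_entry(log: list, key: bytes, record: dict) -> dict:
    """Append a record whose MAC covers the record and the previous MAC."""
    prev_mac = log[-1]["mac"] if log else ""
    payload = json.dumps(record, sort_keys=True) + prev_mac
    mac = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    entry = {"record": record, "prev_mac": prev_mac, "mac": mac}
    log.append(entry)
    return entry


def verify_log(log: list, key: bytes) -> bool:
    """Recompute every MAC in order; an edited or reordered entry fails."""
    prev_mac = ""
    for entry in log:
        payload = json.dumps(entry["record"], sort_keys=True) + prev_mac
        expected = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
        if entry["prev_mac"] != prev_mac:
            return False
        if not hmac.compare_digest(entry["mac"], expected):
            return False
        prev_mac = entry["mac"]
    return True
```

Note that this sketch does not detect truncation of the newest entries on its own; a real system would typically also store the latest MAC somewhere outside the log.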

How we use LLMs & AI

Factory is model-agnostic by design — the intelligence is in the workflow, not a single model:

Claude Agent SDK at the core, with a multi-provider factory routing by model string across Claude, OpenAI, Gemini, Ollama, vLLM, Codex and the Copilot CLI.
Per-phase model selection — a heavy reasoning model to plan, a fast one to code or run QA — so you spend tokens where they matter.
Patterns over vibes — spec-first execution, review gates with citations, and TFactory’s 5-signal verdict turn raw model output into governed, verified work.
Enterprise controls — a LiteLLM gateway for per-org budgets, rate limits, allow-lists and PII-redacted audit logs; BYO / air-gapped models with an egress-audit badge for sensitive environments.
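Per-phase model selection can be pictured as a small lookup with overrides. This is a sketch under assumptions: the phase names, the placeholder model strings, and `model_for_phase` are hypothetical and do not reflect Factory's real configuration keys.

```python
from typing import Optional

# Placeholder model strings; substitute whatever your providers expose.
DEFAULT_MODEL = "fast-model"

PHASE_MODELS = {
    "plan": "heavy-reasoning-model",  # spend tokens where reasoning matters
    "code": "fast-model",
    "qa": "fast-model",
}


def model_for_phase(phase: str, overrides: Optional[dict] = None) -> str:
    """Resolve a model string: per-run override, then phase default, then global default."""
    if overrides and phase in overrides:
        return overrides[phase]
    return PHASE_MODELS.get(phase, DEFAULT_MODEL)
```

For example, `model_for_phase("plan", {"plan": "local-model"})` would pin planning to a local model for one run without touching the defaults.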

Integrating newcomers — and adding the next one

The AI tool landscape changes monthly. Factory is built so a new model or agent is a small adapter, not a rewrite. Four extension seams:

New model provider → a thin adapter in the provider factory; route to it by model string. No pipeline changes.
MCP interop → any MCP-aware editor or agent (Claude Code, Cursor, Continue, …) plugs into the control plane today — stdio locally, plus a remote HTTP+SSE server with scoped API keys.
Executor delegation → slot a new coding agent (e.g. GitHub Copilot Coding Agent, GitLab Duo) in as AIFactory’s builder while keeping governance and verification in the family.
BYO / local → point at your own endpoint through the credential broker (Vault, Azure Key Vault, AWS/GCP secret managers, sops/age).

When the next breakout model or agent ships, you adopt it by configuration — and keep the trust layer you already have.
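The "thin adapter, routed by model string" seam could look roughly like the registry below. Everything here is illustrative: `register_provider`, `route`, `complete`, the prefix-style model strings, and the stub adapter are assumptions, not Factory's provider-factory API.

```python
from typing import Callable, Dict

# An adapter is any callable taking (model, prompt) and returning text.
Adapter = Callable[[str, str], str]

_ADAPTERS: Dict[str, Adapter] = {}


def register_provider(prefix: str, adapter: Adapter) -> None:
    """Register an adapter for every model string starting with `prefix`."""
    _ADAPTERS[prefix] = adapter


def route(model: str) -> Adapter:
    """Pick the adapter whose prefix is the longest match for the model string."""
    matches = [p for p in _ADAPTERS if model.startswith(p)]
    if not matches:
        raise LookupError(f"no provider registered for model {model!r}")
    return _ADAPTERS[max(matches, key=len)]


def complete(model: str, prompt: str) -> str:
    """Pipeline-facing entry point; it does not change when providers are added."""
    return route(model)(model, prompt)


def echo_adapter(model: str, prompt: str) -> str:
    """Stub standing in for a real SDK call."""
    return f"[{model}] {prompt}"


register_provider("ollama/", echo_adapter)
```

Adding a newcomer is then one `register_provider(...)` call plus the adapter itself; callers of `complete` are untouched.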

Why choose us

We complement the coding agents you already use — we don’t ask you to replace them:

Layer	Tools in this layer	Factory's role
Code generation	Copilot, Cursor, Claude Code, Codex, Devin	Orchestrate & wrap them — AIFactory runs or delegates to them, spec-first and isolated
Spec / planning	Spec-Kit, Kiro, Tessl, BMAD	PFactory adds live-infra grounding, review gates with citations, and a human-approval gate
Test / QA	XBOW, Momentic, self-healing test tools	TFactory grades tests on a 5-signal verdict — meaningful coverage, not a green bar
Observability	LangSmith, Langfuse, AgentOps	CFactory threads the cross-product pipeline and adds advise-and-confirm control
Governance / audit	(mostly manual today)	The spine — human gates, audit logs, EU AI Act-ready evidence

The one-liner: Factory is the governance, verification and observability layer for the agentic SDLC — so you can move at AI speed and still trust, prove and audit what ships.


Sources

Market size: SkyQuest · Precedence Research · Gartner (enterprise AI coding agents)
Trust / verification gap: Uvik · Sonar State of Code 2026 · DigitalApplied
EU AI Act: European Commission · artificialintelligenceact.eu
Agentic SDLC: HCLTech · Microsoft
Autonomous QA: AgentMarketCap