@bpizzacalla agentic business playbook

Other · Draft · Created Apr 16, 2026 · 18 min read

Confidence: Moderate-high. This reconstruction is grounded mainly in Brandon Pizzacalla's X posts, replies, threads, and quote-tweets from Mar 30, 2026 through Apr 15, 2026 Pacific time. Confidence would move up if Pete could inspect one real internal agent spec, one evaluator config, and one monitoring dashboard. Confidence would move down if the public posts are mostly promotional and the internal systems are much thinner than described.

1. CORE PHILOSOPHY

Findings summary: Pizzacalla is not selling "better prompting." He is building repeatable business harnesses: a domain constitution, boring memory files, explicit tools and data access, separate evaluators, and scheduled or event-driven runtime. His public claim is that the model is a replaceable component; the durable advantage is the surrounding harness and the documentation that feeds it (Apr 13, 2026, Apr 15, 2026, Mar 31, 2026).

Direct evidence

  • He states the thesis plainly: "the model is a component. the harness is the product" (Apr 13, 2026).
  • He says the same architecture keeps reappearing across coding, content, sales, renewals, and community work, which tells you he thinks in reusable operating systems, not one-off demos (Apr 15, 2026).
  • He repeatedly says each successful agent starts with a document that defines what good looks like before execution: "a constitution," a coding taste doc, a brand config, or a sales persona based on a real BDR's speech patterns (Apr 15, 2026).
  • He treats pre-existing documentation as leverage. His remote-company documentation habit "accidentally turned into our biggest AI advantage" because agents could inherit context that was already written down (Mar 31, 2026).
  • He frames the managerial shift as replacing a hiring reflex with a harness reflex: "we need to spin up an agent for this" instead of "we need to hire for this" (Apr 9, 2026).
  • He thinks the useful category is teammate, not assistant. His agents monitor channels, email, and calendars without being asked, which is his threshold for real autonomy (Apr 6, 2026, Apr 14, 2026).

Inference

  • His real unit of leverage is institutionalized context plus enforcement, not raw model IQ. The public language around "agent" is almost always paired with docs, configs, checks, rubrics, or triggers, not with a magic prompt (Apr 15, 2026, Apr 14, 2026, Apr 8, 2026).
  • He appears to view agent adoption as org design, not just software adoption. The recurring metaphors are BDR, COO, chief of staff, renewals, support, and teammate, which suggests he maps AI to roles with accountabilities, inputs, and review loops (Apr 2, 2026, Apr 1, 2026, Apr 15, 2026).

Explicit contradictions and tensions

  1. Harness over model, but model still matters at the action threshold. He says the harness matters more than the model (Apr 13, 2026), and says he rebuilt context files to be model-agnostic so a cheap local model beat Opus on half their tasks (Apr 3, 2026). But he also says a model swap changed behavior from passive analysis to actual autonomous action in email and ticket routing (Apr 13, 2026). Pete should read this as: optimize harness first, then test models where initiative and judgment matter.
  2. "Jobs change shape" versus clear labor displacement. He says jobs "don't just vanish" and describes one displaced BDR moving to customer success (Apr 9, 2026). But he also says they stopped hiring for roles, replaced a BDR with an agent, replaced support work, and that headcount is not going back up (Mar 31, 2026, Apr 4, 2026, Mar 29, 2026). The humane version of his system still assumes labor substitution in narrow domains.
  3. "Same architecture everywhere" versus domain-specific guardrail density. He says the architecture keeps repeating across domains (Apr 15, 2026). He also admits his agents are meticulous on coding tasks but rush renewal-pricing judgments unless a separate evaluator with a strict rubric intervenes (Apr 15, 2026, Apr 14, 2026). Same skeleton, different risk profile.

2. THE SYSTEM

| Layer | What he appears to run | Evidence | What Pete should copy |
| --- | --- | --- | --- |
| Constitution | A task-specific source-of-truth doc that encodes taste, anti-patterns, quality bars, tone, and persona before any work starts. | "a constitution" plus taste doc, brand config, and sales persona (Apr 15, 2026); style guide, opinions, and tone baked into writing agent (Mar 31, 2026); coding harness config file with design taste and tech stack opinions (Apr 7, 2026) | Write one domain file before you build anything: sales_persona.md, briefing_quality_bar.md, content_brand.md, coding_taste.md. |
| Memory | Plain-text memory, not database-first memory: personality doc, long-term memory doc, daily notes, and a curated memory file the agent maintains. | markdown vault with daily notes, curated memory files, project context (Apr 2, 2026); personality doc, long-term memory doc, daily notes (Apr 3, 2026); flat markdown files plus vector search and one curated memory file (Apr 5, 2026) | Use files first. Keep daily/, memory.md, projects/, and a distilled operating_context.md. Add retrieval only after the files exist. |
| Tools, skills, data | Each agent gets an explicit contract for what data it can see, what tools it can call, and what skills it should apply. | He gives a concrete sales-agent recipe: CRM access, Read AI transcripts, read-only email; tools like analyze_email, analyze_pipeline, analyze_transcript; skills like sales_coach, sales_strategy, pipeline_review (Apr 2, 2026) | Build a matrix for each agent: inputs, tools, allowed writes, skills, and review owner. |
| Manager agent | A higher-level COO or chief-of-staff agent that decomposes work, spins up subagents, and sometimes customizes tooling. | COO agent breaks down a task and spins up a team of role-based agents (Apr 2, 2026); AI chief of staff that spins up sub-agents and customizes its own tooling (Apr 1, 2026) | Do not start here. Add a manager agent only after two or more specialists already work on their own. |
| Specialist agents | Narrow operators mapped to business roles: BDR, renewals, support L1, coding, content, email triage, calendar, meeting prep, grocery. | BDR replacement (Mar 31, 2026); renewals agent flags at-risk accounts (Apr 15, 2026); support agent resolves about 65% of level 1 tickets (Apr 8, 2026); email triage and meeting prep agents (Apr 14, 2026) | Pick one operator role with clean metrics, then copy the harness pattern into the next function. |
| Evaluator layer | A separate evaluator agent, strict rubrics, no self-grading, and real end-to-end tests. | Writer and evaluator must be split, not self-graded (Apr 14, 2026); evaluator fails ambiguity, blog evaluator scores four rubrics independently, coding evaluator uses Playwright browser tests (Apr 14, 2026); enforcement with 116 banned-phrase checks and multi-layer sprint contracts (Apr 15, 2026) | Treat evaluator engineering as first-class work. If the evaluator is weak, the agent is fake. |
| Runtime | Schedules, webhooks, overnight runs, and always-on swarms. | Cron schedules and webhook triggers through OpenClaw (Apr 14, 2026); 15 parallel agents overnight on the codebase (Apr 1, 2026); about 40 agents total, mostly on cheap models (Apr 7, 2026) | First get the task working interactively. Then add a schedule, then add event triggers. |
| Monitoring and economics | Review workflows, cost awareness, drift detection, and failure review. | 24/7 code swarms need monitoring (Apr 7, 2026); model update silently broke email routing for a week (Apr 13, 2026); top line-item token spend after 3 months (Apr 7, 2026); bad agents burn the most tokens (Apr 6, 2026) | Track per-agent cost, failure rate, escalation rate, and silent-break windows. |

Supplemental external corroboration, not primary evidence: his public coding-agent-harness repo describes a Planner -> Generator -> Evaluator pipeline with retry loops, a single taste.md, and budget caps per stage, which matches the public X pattern but should not be assumed to equal every internal harness exactly (GitHub repo).
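
Sketch for Pete: the "tools, skills, data" row in the table above can be written down as a per-agent contract that code can check. The class name `AgentContract`, its field names, and the `review_owner` value are assumptions made for illustration; only the data sources, tool names, and skill names come from the cited sales-agent recipe (Apr 2, 2026).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentContract:
    """Hypothetical per-agent contract: what it may read, call, and write."""
    role: str
    data_read: frozenset[str]
    tools: frozenset[str]
    skills: frozenset[str]
    allowed_writes: frozenset[str] = frozenset()
    review_owner: str = "unassigned"

    def check_tool(self, tool: str) -> None:
        # Refuse any tool call that is not listed in the contract.
        if tool not in self.tools:
            raise PermissionError(f"{self.role} may not call {tool}")

    def check_write(self, target: str) -> None:
        # Refuse any write to a target that is not explicitly allowed.
        if target not in self.allowed_writes:
            raise PermissionError(f"{self.role} may not write to {target}")


sales_agent = AgentContract(
    role="sales_agent",
    data_read=frozenset({"crm", "read_ai_transcripts", "email"}),
    tools=frozenset({"analyze_email", "analyze_pipeline", "analyze_transcript"}),
    skills=frozenset({"sales_coach", "sales_strategy", "pipeline_review"}),
    allowed_writes=frozenset(),  # email is read-only in the cited recipe
    review_owner="pete",         # assumed owner, not from the source
)

sales_agent.check_tool("analyze_email")  # passes silently
```

Calling `sales_agent.check_write("email")` raises `PermissionError`, which is the point: the read-only rule lives in code, not in a prompt.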

3. TECHNIQUES & PATTERNS

3.1 Constitution before execution

Direct evidence: He says every domain needs a document that defines what good looks like before work starts, and gives concrete examples: coding taste docs, brand configs, and sales personas (Apr 15, 2026). He also says writing quality jumped once style guide, opinions, and tone were baked into the system (Mar 31, 2026).

Pete implementation: Before building an agent, write a one-page constitution with five fields: objective, allowed tools, forbidden behaviors, quality bars, and examples of good output. If that page is weak, the agent will be weak.
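
Sketch for Pete: one way to keep the five-field constitution honest is to treat it as a structure and refuse to build until every field is filled. The class name, field names, and sample values below are assumptions for illustration, not Pizzacalla's format.

```python
from dataclasses import dataclass, fields


@dataclass
class Constitution:
    """Hypothetical one-page constitution with the five fields named above."""
    objective: str
    allowed_tools: list[str]
    forbidden_behaviors: list[str]
    quality_bars: list[str]
    good_examples: list[str]

    def missing_fields(self) -> list[str]:
        """Return names of fields that are empty or whitespace-only."""
        missing = []
        for f in fields(self):
            value = getattr(self, f.name)
            if isinstance(value, str):
                empty = not value.strip()
            else:
                empty = not [v for v in value if v.strip()]
            if empty:
                missing.append(f.name)
        return missing


draft = Constitution(
    objective="Produce a weekly briefing packet with cited sources.",
    allowed_tools=["web_search", "calendar_read"],
    forbidden_behaviors=["uncited claims"],
    quality_bars=[],
    good_examples=[],
)
print(draft.missing_fields())  # ['quality_bars', 'good_examples']
```

A draft with missing fields is the "weak page" case: the build should stop there rather than proceed to agent wiring.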

3.2 Enforcement beats polite prompting

Direct evidence: He is explicit that "please stay on task" is not enforcement. His examples of enforcement are sprint contracts at three code layers, 116 banned-phrase checks, and rate limits enforced in SQLite (Apr 15, 2026).

Pete implementation: Put rules into code and validators. Use schema validation, banned-pattern scanners, write-permission boundaries, retry ceilings, rubrics, and explicit escalation conditions. Assume any instruction that is not machine-checked will be broken eventually.
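
Sketch for Pete: a minimal version of machine-checked rules, combining a banned-pattern scanner with a retry ceiling and a forced escalation. The two patterns, the `MAX_RETRIES` value, and every function name are assumptions; the 116 checks he cites are not public.

```python
import re
from typing import Callable

# Illustrative patterns only; not taken from the source.
BANNED_PATTERNS = [
    re.compile(r"\bgame[- ]changer\b", re.IGNORECASE),
    re.compile(r"\bin today's fast-paced world\b", re.IGNORECASE),
]
MAX_RETRIES = 3  # assumed ceiling


class EscalateToHuman(Exception):
    """Raised when the retry ceiling is hit without a clean output."""


def violations(text: str) -> list[str]:
    """Return the source of every banned pattern found in the text."""
    return [p.pattern for p in BANNED_PATTERNS if p.search(text)]


def run_with_enforcement(generate: Callable[[list[str]], str]) -> str:
    """Call the generator until its output is clean or retries run out."""
    feedback: list[str] = []
    for _ in range(MAX_RETRIES):
        output = generate(feedback)
        feedback = violations(output)
        if not feedback:
            return output
    raise EscalateToHuman(f"still failing after {MAX_RETRIES} tries: {feedback}")


def fake_generate(feedback: list[str]) -> str:
    # Stand-in for a model call: it fixes the text once it gets feedback.
    return "A clear summary." if feedback else "This is a game-changer."


print(run_with_enforcement(fake_generate))  # A clear summary.
```

The generator never decides for itself whether it passed; the scanner does, and a generator that keeps failing ends in `EscalateToHuman` instead of shipping.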

3.3 Producer and critic must be separate

Direct evidence: His "worst bug" was an agent grading its own work for six weeks. He fixed it by splitting writer and evaluator, forcing browser-based tests for code, and using four independent rubrics for content, with outputs sent back unless they clear all thresholds (Apr 14, 2026).

Pete implementation: Never let the same agent generate and approve. For research, have one agent draft and another verify sources. For content, have a rubric agent. For code, use end-to-end tests plus a reviewer agent.
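
Sketch for Pete: the writer and evaluator split can be modeled as two separate callables, with each rubric scored independently and the draft sent back unless every threshold clears. The rubric names, thresholds, and toy scoring functions are assumptions; the source only says there are four independent rubrics for content.

```python
from typing import Callable, Optional

Rubric = Callable[[str], float]  # returns a score between 0 and 1

# Hypothetical rubric names and thresholds.
THRESHOLDS = {"accuracy": 0.8, "clarity": 0.7, "tone": 0.7, "structure": 0.7}


def evaluate(draft: str, rubrics: dict[str, Rubric]) -> dict[str, float]:
    """Score each rubric independently; return only the ones below threshold."""
    scores = {name: rubric(draft) for name, rubric in rubrics.items()}
    return {n: s for n, s in scores.items() if s < THRESHOLDS[n]}


def produce(writer: Callable[[str, dict[str, float]], str],
            rubrics: dict[str, Rubric],
            task: str,
            max_rounds: int = 3) -> Optional[str]:
    """Loop writer -> evaluator; return None if no draft clears all rubrics."""
    failures: dict[str, float] = {}
    for _ in range(max_rounds):
        draft = writer(task, failures)       # writer sees last round's failures
        failures = evaluate(draft, rubrics)  # grading happens outside the writer
        if not failures:
            return draft
    return None  # caller should escalate to a human


# Toy stand-ins so the loop can run without a model.
toy_rubrics: dict[str, Rubric] = {
    "accuracy": lambda d: 1.0 if "source:" in d else 0.0,
    "clarity": lambda d: 1.0,
    "tone": lambda d: 1.0,
    "structure": lambda d: 1.0,
}


def toy_writer(task: str, failures: dict[str, float]) -> str:
    return f"{task} (source: notes)" if "accuracy" in failures else task


result = produce(toy_writer, toy_rubrics, "Q2 renewal summary")
print(result)  # Q2 renewal summary (source: notes)
```

In a real build the writer and each rubric would be separate agents or tests; the structure that matters is that `produce` never asks the writer whether its own draft is good.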

3.4 Briefs beat vibes

Direct evidence: He says the same model performed very differently when given a two-line prompt versus a full brief about what he wanted and why (Apr 8, 2026).

Pete implementation: Use a standard brief template: user, objective, context, constraints, non-goals, acceptance test, and escalation triggers. Pete should not start a serious agent task from chat shorthand.

3.5 Memory should be boring

Direct evidence: He says they use markdown files, not complex memory systems: personality doc, long-term memory doc, daily notes, curated memory files, and plain-text logs with vector search if needed (Apr 3, 2026, Apr 5, 2026, Apr 2, 2026).

Pete implementation: Create a memory stack with three layers only: raw logs, a curated memory file, and stable operating docs. Do not start with a database schema or elaborate RAG pipeline.
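
Sketch for Pete: the three-layer memory stack fits in a few file helpers. The directory and file names follow the ones suggested earlier in this playbook (daily/, memory.md, operating_context.md); the function names are assumptions.

```python
from datetime import date
from pathlib import Path


def init_memory(root: Path) -> None:
    """Create the three layers: raw logs, curated memory, stable operating docs."""
    (root / "daily").mkdir(parents=True, exist_ok=True)
    for name in ("memory.md", "operating_context.md"):
        path = root / name
        if not path.exists():
            path.write_text(f"# {name}\n", encoding="utf-8")


def log_daily(root: Path, note: str, day: date) -> Path:
    """Append a raw note to that day's log file and return its path."""
    path = root / "daily" / f"{day.isoformat()}.md"
    with path.open("a", encoding="utf-8") as fh:
        fh.write(f"- {note}\n")
    return path


def load_context(root: Path) -> str:
    """Context handed to the agent: stable docs plus curated memory, not raw logs."""
    return "\n".join(
        (root / name).read_text(encoding="utf-8")
        for name in ("operating_context.md", "memory.md")
    )
```

Raw daily notes stay out of `load_context` on purpose: something has to be promoted into memory.md, by the agent or a reviewer, before it shapes future runs.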

3.6 Treat scripts as scripts

Direct evidence: He says 7 of 10 backlog items they called "agents" were really just API calls plus LLM formatting and a Slack message. The genuinely autonomous ones took 10x longer and cost more (Apr 14, 2026).

Pete implementation: Maintain two columns in the backlog: script and agent. Most things should stay in script until they need memory, tool choice, initiative, or multi-step planning.
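
Sketch for Pete: the two-column rule can be a one-function triage over the backlog. The four flags mirror the criteria in the paragraph above; the class and function names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class BacklogItem:
    """Hypothetical backlog entry with the four 'needs an agent' flags."""
    name: str
    needs_memory: bool = False
    needs_tool_choice: bool = False
    needs_initiative: bool = False
    needs_multistep_planning: bool = False


def column(item: BacklogItem) -> str:
    """Default to 'script'; promote only when at least one need is real."""
    needs = (item.needs_memory, item.needs_tool_choice,
             item.needs_initiative, item.needs_multistep_planning)
    return "agent" if any(needs) else "script"


print(column(BacklogItem("slack digest")))                          # script
print(column(BacklogItem("renewals watcher", needs_memory=True)))  # agent
```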

3.7 Cheap models do most of the work

Direct evidence: He says he runs about 40 agents and sends 90% of the work to small cheap models, with frontier models representing about 10% of token spend (Apr 7, 2026). He also says a model-agnostic harness let cheap local models beat Opus on part of the workload (Apr 3, 2026).

Pete implementation: Route routine classification, extraction, and formatting to cheap models. Save premium models for planning, ambiguous synthesis, and hard edge cases.
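
Sketch for Pete: a routing table is enough to start. The task-type labels and tier names are assumptions, as is the choice to send unknown task types to the premium tier.

```python
# Hypothetical task types treated as routine work.
CHEAP_TASKS = {"classification", "extraction", "formatting"}


def pick_tier(task_type: str) -> str:
    """Route routine work to the cheap tier; everything else goes premium."""
    if task_type in CHEAP_TASKS:
        return "cheap"
    return "premium"  # planning, synthesis, edge cases, and unknown types


def cheap_share(task_types: list[str]) -> float:
    """Fraction of tasks that landed on the cheap tier."""
    if not task_types:
        return 0.0
    return sum(pick_tier(t) == "cheap" for t in task_types) / len(task_types)


queue = ["classification"] * 6 + ["extraction"] * 3 + ["planning"]
print(cheap_share(queue))  # 0.9
```

Tracking `cheap_share` over time gives Pete a direct comparison against the roughly 90% figure Pizzacalla reports, with the caveat that his number may be measured by work volume rather than task count.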

3.8 Promote from copilot to autonomy only after the metric proves out

Direct evidence: He says they spent months in the copilot phase because they assumed the human had to stay in the loop, and that assumption cost them once the BDR agent outperformed the human (Mar 31, 2026). He also says one narrow support workflow hit about 65% autonomous resolution, which implies he picks bounded processes and then pushes depth, not breadth (Apr 8, 2026).

Pete implementation: Start with human-supervised runs. Once the agent clears a metric for two straight weeks, remove a step from the human loop. Do not grant autonomy all at once.
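
Sketch for Pete: the two-straight-weeks rule is easy to make mechanical. The function name, the example scores, and the target are assumptions; Pete picks the metric and threshold per workflow.

```python
def can_remove_human_step(weekly_scores: list[float],
                          target: float,
                          streak: int = 2) -> bool:
    """True when the most recent `streak` weeks all met or beat the target."""
    if len(weekly_scores) < streak:
        return False
    return all(score >= target for score in weekly_scores[-streak:])


print(can_remove_human_step([0.50, 0.70, 0.72], target=0.65))  # True
print(can_remove_human_step([0.70, 0.60], target=0.65))        # False
```

Each `True` removes one step from the human loop, not the whole loop, which matches the advice not to grant autonomy all at once.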

3.9 Schedules and triggers turn agents into teammates

Direct evidence: He says OpenClaw runs email triage at 6 a.m. and repo monitoring on PR webhooks (Apr 14, 2026). He also says his teammate agents monitor channels, email, and calendars without being asked (Apr 6, 2026).

Pete implementation: If Pete wants teammate behavior, he needs scheduled pulls and event-driven execution, not just an on-demand chat window.
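
Sketch for Pete: scheduled pulls and event triggers reduce to two registries and two dispatch calls. This is a toy in-process dispatcher, not OpenClaw's API; the `Runtime` class, its method names, and the event name `pull_request` are all assumptions. In practice a real cron job and a webhook endpoint would call `tick` and `emit`.

```python
from collections import defaultdict
from datetime import datetime
from typing import Callable


class Runtime:
    """Toy dispatcher: hourly schedules plus named event triggers."""

    def __init__(self) -> None:
        self.scheduled: dict[int, list[Callable[[], str]]] = defaultdict(list)
        self.handlers: dict[str, list[Callable[[dict], str]]] = defaultdict(list)

    def at_hour(self, hour: int, job: Callable[[], str]) -> None:
        self.scheduled[hour].append(job)

    def on(self, event: str, handler: Callable[[dict], str]) -> None:
        self.handlers[event].append(handler)

    def tick(self, now: datetime) -> list[str]:
        """Run jobs registered for the current hour (call once per hour)."""
        return [job() for job in self.scheduled.get(now.hour, [])]

    def emit(self, event: str, payload: dict) -> list[str]:
        """Run handlers for an incoming event such as a webhook delivery."""
        return [handler(payload) for handler in self.handlers.get(event, [])]


runtime = Runtime()
runtime.at_hour(6, lambda: "email triage done")
runtime.on("pull_request", lambda p: f"reviewed PR #{p['number']}")

print(runtime.tick(datetime(2026, 4, 16, 6, 0)))     # ['email triage done']
print(runtime.emit("pull_request", {"number": 12}))  # ['reviewed PR #12']
```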

4. BUILDING THE AGENTIC TEAM

The org chart he appears to be building

| Layer | Role in his system | Evidence | Pete analog |
| --- | --- | --- | --- |
| Manager | COO or chief-of-staff agent that decomposes work and spawns specialists | COO agent with role-based team (Apr 2, 2026); chief of staff that spins up sub-agents (Apr 1, 2026) | Briefing orchestrator, job-search orchestrator, AMRT ops orchestrator |
| Revenue | BDR, renewals, sales coaching, pipeline review | BDR replacement and booking lift (Mar 31, 2026); renewals agent finding $2M at-risk accounts (Apr 15, 2026); sales agent framework with tools, skills, data (Apr 2, 2026) | Job outreach agent, networking follow-up agent |
| Service | Support triage and customer workflows | 65% of L1 support resolved autonomously (Apr 8, 2026) | Inbox triage, volunteer comms triage |
| Production | Coding harness and code swarms | 50+ commits/day from code swarms (Apr 4, 2026); open-source coding harness (Apr 7, 2026) | Pete's local tools and automations |
| Personal ops | Email triage, meeting prep, calendar, grocery | email triage, meeting prep, shopping (Apr 14, 2026); 60% of prior work now runs on its own (Apr 14, 2026) | Daily briefing, calendar prep, errands and outreach prep |
| Human layer | Humans author context, review outputs, handle edge cases, and sometimes build the systems themselves | sales team pushes code and sales reps set systems up themselves (Apr 6, 2026, Mar 30, 2026) | Pete and one ops collaborator can author docs and review exceptions without hiring an ML team |

What this means in practice

Direct evidence

  • He started small, then expanded into a "whole squad" across sales, support, and project tracking (Mar 30, 2026).
  • He treats org structure as quality infrastructure. Output quality improved when they gave agents a hierarchy and role separation instead of a flat swarm (Apr 2, 2026).
  • He believes C-suite leaders need hands-on builder fluency, not passive delegation (Apr 15, 2026).

Inference

  • His team design pattern is: human defines the playbook, manager agent decomposes, specialists execute, evaluator blocks low-quality work, human handles exceptions and architecture.
  • The human role shifts from doer to context author, system owner, and escalation point. He says he now spends more time writing context docs and reviewing output than he used to spend managing the human role the agent replaced (Apr 6, 2026).

How Pete should replicate the team design

  1. First hire no one. Replace one narrow, repetitive function with a harness before adding headcount. That matches Pizzacalla's repeated BDR example and keeps the learning loop tight (Mar 31, 2026).
  2. Give every agent a role, not a vague mission. BDR, renewals analyst, briefing coordinator, inbox triager, or research lead beats "general assistant" because it forces a tools and data contract (Apr 2, 2026).
  3. Have humans own docs and review queues. That is where the leverage moved in his system (Apr 6, 2026).
  4. Only add a manager agent after two specialists work reliably. Otherwise Pete will build orchestration before he has a stable unit of work.
  5. Train non-engineers to build. Pizzacalla's recurring point is that salespeople and other non-engineers can set this up once the harness pattern exists (Apr 8, 2026, Mar 30, 2026).

5. HIS "TELLS"

These are the repeated habits and word choices that reveal how he actually operates.

  1. He speaks in operating metrics, not vague hype. The repeated numbers are specific: 116 banned-phrase checks, 40 agents, 90% cheap models, 50+ commits per day, 200+ emails per day, 65% L1 support resolution, 0.17% of revenue for renewals, top-3 line-item token spend (Apr 15, 2026, Apr 7, 2026, Apr 4, 2026, Apr 13, 2026, Apr 8, 2026, Apr 13, 2026, Apr 7, 2026).
  2. He maps agents onto an org chart. He does not talk about generic assistants. He talks about BDRs, renewals, support, a COO, a chief of staff, and teammates (Apr 2, 2026, Apr 1, 2026, Apr 6, 2026).
  3. He anthropomorphizes only after autonomy shows up. He says "she booked more meetings" about the BDR agent, calls agents teammates and coworkers, and says you stop thinking of them as tools once they act without prompting (Mar 29, 2026, Apr 6, 2026, Apr 1, 2026). That is a tell that initiative, not chat quality, is his real standard.
  4. He prefers boring infrastructure over elegant theory. Markdown files, vector search, SQLite rate limits, Mac Minis, cron schedules, webhook triggers. This is not a RAG-first or framework-first worldview (Apr 5, 2026, Apr 15, 2026, Apr 3, 2026, Apr 14, 2026).
  5. He is obsessed with the ugly tail. He returns often to maintenance, model drift, silent failures, weird edge cases, and the 90-to-100% gap (Apr 13, 2026, Apr 15, 2026). That is the tell of somebody who has already been burned in production.
  6. He keeps reframing AI from feature to org capability. The recurring story is not "look at this demo." It is "our sales reps push code," "our sales guys set most of it up themselves," and "all C-suite should be building" (Apr 6, 2026, Mar 30, 2026, Apr 15, 2026).

6. REPLICATION ROADMAP

A practical rollout Pete can actually run

| Phase | Goal | Concrete deliverable | Why this matches Pizzacalla's system |
| --- | --- | --- | --- |
| Phase 1, 2 days | Pick one workflow with a hard metric | Choose weekly briefing packet or inbox triage. Define one success metric, such as minutes saved, escalation rate, or booked conversations. | He repeatedly wins by going narrow first (BDR, support L1, briefing docs, email triage), not by boiling the ocean (Apr 8, 2026, Apr 15, 2026). |
| Phase 2, 1 day | Write the constitution | Create briefing_constitution.md with objective, tone, anti-patterns, required citations, escalation rules, and quality bars. | His strongest repeated point is that good agents start with a constitution or taste doc (Apr 15, 2026). |
| Phase 3, 1 day | Build boring memory | Create personality.md, long_term_memory.md, daily_notes/, and curated_memory.md. Require the agent to update only the curated file after review. | Matches his markdown-first memory pattern (Apr 3, 2026, Apr 5, 2026). |
| Phase 4, 2 days | Wire tools, skills, data | For the workflow, list exact sources, read or write permissions, and named tools. Example for briefing: browser read-only, web search, calendar read-only, docs read-only. | He gives tools, skills, and data as the recurring framework (Apr 2, 2026). |
| Phase 5, 2 days | Build executor and evaluator separately | One agent drafts, one agent checks citations, structure, and completeness. Reject ambiguous output automatically. | This is his clearest production lesson: never self-grade (Apr 14, 2026). |
| Phase 6, 3 days | Add enforcement | Add structural validators, banned-pattern checks, a retry cap, and a forced human escalation if citations or dates are missing. | He says enforcement, not polite prompting, is what makes systems reliable (Apr 15, 2026). |
| Phase 7, 1 day | Add schedule or trigger | Run the agent on a morning schedule or on new-input events. Start with one run per day. | This is how his agents become teammates instead of tools (Apr 14, 2026, Apr 6, 2026). |
| Phase 8, ongoing | Monitor for tail failures | Track cost, success rate, escalation rate, and silent failures after model or prompt changes. | He keeps coming back to silent breakage, expensive bad agents, and the 90-to-100% tail (Apr 13, 2026, Apr 6, 2026, Apr 15, 2026). |
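
Sketch for Pete: the Phase 8 metrics can be computed from a plain list of run records before any dashboard exists. The record fields, agent names, and dollar figures are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    """Hypothetical log entry for one agent run."""
    agent: str
    cost_usd: float
    succeeded: bool
    escalated: bool


def summarize(runs: list[RunRecord]) -> dict[str, dict[str, float]]:
    """Per-agent total cost, failure rate, and escalation rate."""
    summary: dict[str, dict[str, float]] = {}
    for agent in sorted({r.agent for r in runs}):
        mine = [r for r in runs if r.agent == agent]
        n = len(mine)
        summary[agent] = {
            "cost_usd": sum(r.cost_usd for r in mine),
            "failure_rate": sum(not r.succeeded for r in mine) / n,
            "escalation_rate": sum(r.escalated for r in mine) / n,
        }
    return summary


runs = [
    RunRecord("briefing", 0.25, succeeded=True, escalated=False),
    RunRecord("briefing", 0.75, succeeded=False, escalated=True),
    RunRecord("inbox", 0.5, succeeded=True, escalated=False),
]
print(summarize(runs)["briefing"])
# {'cost_usd': 1.0, 'failure_rate': 0.5, 'escalation_rate': 0.5}
```

Comparing these numbers before and after a model or prompt change is the cheapest way to catch the kind of silent break he describes in email routing.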

The best first three agents for Pete

  1. Weekly briefing agent. This is the cleanest replication of Pizzacalla's pattern because the work is doc-heavy, citation-heavy, and easy to evaluate. Pete already asks for briefing packets. This is the closest thing to Pizzacalla's own briefing-doc example (Apr 15, 2026).
  2. Inbox triage and meeting prep agent. Pizzacalla keeps naming email triage, calendars, and meeting briefs as stable wins, which makes this a low-risk second build (Apr 14, 2026).
  3. Job-search or outreach agent. Pete's immediate business problem is pipeline generation. Pizzacalla's BDR example shows how much leverage sits in a single, repetitive, measurable revenue workflow (Mar 31, 2026, Apr 2, 2026).

One networking move Pete should make

Reach out to Ryan Wiggins. Pizzacalla explicitly highlights Wiggins's large document corpus as the part people are "going to sleep on," which aligns almost perfectly with the documentation-first memory strategy behind this whole system (Apr 15, 2026). Pete does not need a long call. One specific question is enough: *How did you choose what went into the corpus versus what stayed out?*

7. OPEN QUESTIONS & GAPS

  1. What exactly are the 116 banned-phrase checks? He cites the number often enough that it matters, but there is no public taxonomy of those checks, their failure rates, or whether they are regex, classifier, or evaluator-driven (Apr 15, 2026).
  2. What are the "three code layers" for sprint contracts? He references multi-layer enforcement but does not define whether the layers are prompt, application code, CI, tool wrapper, or evaluator gates (Apr 15, 2026).
  3. How do his multi-agent coding swarms avoid merge collisions and architecture drift? He publicly asks how people handle 6 to 8 coding agents in parallel on one repo, which suggests this is still an unsolved production issue for him too (Apr 13, 2026).
  4. How are permissions and security handled for browser, email, calendar, and phone calls? He says OpenClaw runs all of those surfaces for him, but there is little public detail on privilege separation or blast-radius control (Apr 9, 2026).
  5. How much is true autonomy versus strong automation? He himself says most backlog items are really scripts, not agents, while also describing teammate-like behavior. The exact line between automation and agency is still blurry in the public evidence (Apr 14, 2026, Apr 6, 2026).
  6. How representative is the public coding harness repo of the internal portfolio systems? The repo is useful, but there is not enough evidence to assume the internal sales, support, and renewals systems are implemented the same way.

8. EXTERNAL RESOURCES

These are supplemental. They are useful because they either come from Pizzacalla directly or line up tightly with the system he describes on X. They are not substitutes for the X evidence above.

  1. Pizzacalla's public coding harness repo: <https://github.com/srbdp/coding-agent-harness>. Useful because it exposes a public Planner -> Generator -> Evaluator pattern, a single taste.md, and retry budgets that fit his X claims about constitutions, evaluators, and harnesses.
  2. OpenClaw docs, agent harness SDK page: <https://docs.openclaw.ai/plugins/sdk-agent-harness>. Relevant because he says OpenClaw is the runtime layer for email, calendar, browser automation, calls, cron schedules, webhooks, and swarms (Apr 9, 2026, Apr 14, 2026).
  3. Claude Code routines announcement: <https://x.com/claudeai/status/2044095086460309790>. Useful as a contrast point because he frames it as native support for the cron and event-trigger model he already runs through OpenClaw (Apr 14, 2026).
  4. Playwright: <https://playwright.dev/>. Relevant because his coding evaluator reportedly tests through the real browser UI, not just unit tests (Apr 14, 2026).
  5. Ryan Wiggins's "Second Brain" thread or article. Track it down if Pete wants to copy the doc-corpus side of this system. Pizzacalla's comment suggests the document corpus, not just the model or wrapper, is the hidden leverage (Apr 15, 2026).