
A field guide for engineering teams making AI agent work compound (memco.ai)

16 comments

The framing helps, but in my experience the compounding only kicks in once someone owns the prompt library and eval harness as a real artifact, not a Notion page that rots. Most of the engineering teams I've worked with treat agent setup like a one-off project and then wonder why month three looks like month one.

0CamilaTorres·4w

The compounding only kicks in once you treat agent outputs like any other artifact in the pipeline: versioned, tested, observable. We started logging every agent run with the same lineage tooling we use for dbt models and suddenly the failure modes became debuggable instead of vibes.

0SitiRahman·4w

Most of these guides skip what actually compounds for us, which is the corpus of redlines and exemplar memos the agent can pull from. Without that, every matter starts from zero and the "agent" is just a faster intern.

0MiaJ·4w

The piece I keep waiting for is one that quantifies the overhead. My team of 12 spends maybe 4 hours a week each curating prompts, rules files, and eval harnesses, and I genuinely can't tell if the compounding has crossed that break-even yet. We log agent PR acceptance rates now, which at least gives us something to argue about in retros.
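For anyone wanting to put a number on that break-even, here is a minimal sketch of the arithmetic. The team size and hours per week come from the comment above; the hours saved per accepted agent PR (`hours_saved_per_pr`) is a made-up placeholder that each team would have to estimate for itself.

```python
def weekly_overhead_hours(team_size: int, hours_each: float) -> float:
    """Total hours per week the team spends curating prompts, rules files, and evals."""
    return team_size * hours_each


def prs_needed_to_break_even(overhead_hours: float, hours_saved_per_pr: float) -> float:
    """Accepted agent PRs per week needed for the time saved to cover the overhead."""
    if hours_saved_per_pr <= 0:
        raise ValueError("hours_saved_per_pr must be positive")
    return overhead_hours / hours_saved_per_pr


# 12 people at 4 hours a week each, as described in the comment.
overhead = weekly_overhead_hours(12, 4)

# ASSUMPTION: 1.5 hours saved per accepted agent PR is purely illustrative.
needed = prs_needed_to_break_even(overhead, hours_saved_per_pr=1.5)

print(f"overhead: {overhead} h/week, accepted PRs needed: {needed:.0f}/week")
```

The logged acceptance rate only helps here if it is paired with some estimate of time saved per accepted PR, which is the hard part to measure.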

0JianHuang·4w

Not an engineer, but the "compound" framing maps to teaching too. The agent prompts I built solo in September are basically junk by now because none of my colleagues knew they existed or how to extend them. Curious if the guide says anything about onboarding new contributors into an existing agent stack, since that's where I keep losing the gains.

0MateoSilva·4w

Curious how much of this transfers outside engineering. I've been running agent tools with a class of 28, and the "compounding" framing breaks down fast when each session resets context and the students are the ones doing the prompting, not me. Did the author see any non-eng teams apply this, or is the loop assumption load-bearing?

0AishaKapoor·3w

The compounding part assumes you have stable inputs to compound on. As a freelancer juggling six clients with totally different style guides and CMS quirks, every "reusable" prompt setup I build decays within a month or two. Curious if the guide addresses single-operator contexts or just assumes a team big enough to amortize the upkeep.

0chinedu_eze·3w

Curious how much of this transfers outside engineering. I've been logging which agent prompts actually save prep time across two sections, and the compounding only showed up once I started versioning the prompts like lesson plans instead of treating each chat as disposable.

0ZolaNdlovu·3w

Adapting any of this to a classroom is harder than it looks. The "compound" framing assumes a stable codebase you revisit, but my lesson plans churn weekly and the agent forgets what worked last semester unless I manually curate the context.

0lucia_paz_dev·3w

The compounding part only kicks in once you treat agent outputs like junior PRs that need rubrics, not magic. We started tagging every agent-generated diff with which prompt produced it, and within two months we could actually see which workflows were worth investing in versus which ones were just noise our reviewers were absorbing.
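A rough sketch of what that tagging could feed into, assuming each agent-generated diff is recorded as a `(prompt_id, accepted)` pair. The record shape and the prompt names are hypothetical and not taken from the commenter's setup.

```python
from collections import defaultdict


def acceptance_by_prompt(records):
    """Return the share of accepted diffs for each prompt id."""
    totals = defaultdict(lambda: [0, 0])  # prompt_id -> [accepted, total]
    for prompt_id, accepted in records:
        totals[prompt_id][1] += 1
        if accepted:
            totals[prompt_id][0] += 1
    return {pid: acc / total for pid, (acc, total) in totals.items()}


# Hypothetical sample: one record per agent-generated diff.
records = [
    ("refactor-v2", True),
    ("refactor-v2", True),
    ("refactor-v2", False),
    ("test-gen-v1", False),
    ("test-gen-v1", True),
]

rates = acceptance_by_prompt(records)
for prompt_id, rate in sorted(rates.items()):
    print(f"{prompt_id}: {rate:.0%} accepted")
```

Acceptance rate alone will not show reviewer effort, so a per-prompt review-time column would be a natural next field to add.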

0CamilaTorres·3w

The compounding only really kicks in once you have a shared eval harness the whole team trusts; otherwise everyone tunes prompts locally and the gains evaporate at review time. We spent about six weeks getting ours wired into CI before agent work felt cumulative across the 12 of us. Curious whether the guide addresses who owns that harness, because for us it kept sliding between platform and the feature teams until I just assigned it.

0AishaKapoor·1w

solo here, my codex queue is the moat now

0AdaezeO·1w

solo dev here, my "compounding" was just a shared CLAUDE.md and a postmortem folder

0valeria.lopez·1w

Agree the compounding part is what most teams miss. We started having every agent session end by writing a short "what tripped me up" note into a shared CLAUDE.md, and onboarding time for new agents on our payments service dropped from roughly a week of babysitting to two days.
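One way the end-of-session note could be automated, as a minimal sketch. The helper name `append_session_note`, the entry format, and the sample note are all assumptions for illustration; the comment only says a short note gets written into a shared CLAUDE.md.

```python
import tempfile
from datetime import date
from pathlib import Path


def append_session_note(notes_file: Path, note: str, when: date) -> None:
    """Append one dated 'what tripped me up' bullet to the shared notes file."""
    entry = f"- {when.isoformat()}: {note.strip()}\n"
    # Append mode creates the file if it does not exist yet.
    with notes_file.open("a", encoding="utf-8") as fh:
        fh.write(entry)


# Demo against a throwaway directory rather than a real repository.
demo_dir = Path(tempfile.mkdtemp())
notes = demo_dir / "CLAUDE.md"

# ASSUMPTION: the note text and date are invented examples.
append_session_note(notes, "Refund tests need the sandbox key set first.", date(2025, 1, 15))

content = notes.read_text(encoding="utf-8")
print(content, end="")
```

Appending rather than rewriting keeps the history of past sessions intact, though the file will eventually need pruning.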

0aminataDiallo·6d

Ran Claude Code on a 40k-line Rails monolith with four backend devs for six weeks. Velocity stat went up 22% in Linear, but our p95 PR review time doubled because the diffs got sloppier and reviewers had to re-derive intent. Net throughput was flat once we counted the rework, and one junior stopped being able to explain his own auth changes in standup. The compounding I've seen so far is mostly in review debt, not shipped features.
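A back-of-envelope sketch of what "flat once we counted the rework" implies, under a deliberately simple model that is an assumption and not the commenter's method: net throughput equals gross throughput times one minus the share of work that is rework.

```python
def implied_rework_share(gross_gain: float, net_gain: float = 0.0) -> float:
    """Share of gross output that must be rework for the net gain to hold.

    ASSUMED model: net = gross * (1 - rework_share), with both gains
    expressed as fractions relative to the pre-agent baseline.
    """
    gross = 1.0 + gross_gain
    net = 1.0 + net_gain
    return 1.0 - net / gross


# 22% velocity gain from the comment, with net throughput flat (0% gain).
share = implied_rework_share(0.22)

print(f"implied rework share: {share:.1%}")
```

Under that model, roughly a fifth of the reported velocity would have to be rework for the gain to cancel out, which is one concrete figure a team could check against its own review data.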

0sam_okafor·6d

How do you measure "compound" versus just accumulated context bloat slowing the agent down over time?