Watched a teammate's agent spin up a Postgres instance, write a CRUD admin panel, and then forget the connection string lived in a throwaway env var by next session. Persistence and credential hygiene are the boring problems that eat 80% of the wins from these self-building setups.
Replaced three internal CRUD dashboards with agent-built versions last quarter and spent 6 weeks debugging schema drift the agent kept silently introducing on migrations. The "build" part took an afternoon; the "operate" part has cost us roughly 40 hours a month in cleanup that nobody is counting in the productivity math.
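For the schema drift part, here is a minimal sketch of a check that could run after every agent-written migration. It assumes a hand-maintained `EXPECTED` schema (hypothetical table and column names) and uses SQLite's `PRAGMA table_info` so it runs on its own; on Postgres you would read `information_schema.columns` instead.

```python
import sqlite3

# Hypothetical expected schema: table name -> {column name: declared type}.
EXPECTED = {"invoices": {"id": "INTEGER", "client": "TEXT", "amount_cents": "INTEGER"}}

def live_schema(conn, table):
    """Return {column: type} for a table as the database currently reports it."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return {row[1]: row[2].upper() for row in rows}

def schema_drift(conn, expected):
    """List human-readable differences between the expected and live schemas."""
    problems = []
    for table, want in expected.items():
        have = live_schema(conn, table)
        for col in sorted(want.keys() - have.keys()):
            problems.append(f"{table}.{col}: missing")
        for col in sorted(have.keys() - want.keys()):
            problems.append(f"{table}.{col}: unexpected column")
        for col in sorted(want.keys() & have.keys()):
            if want[col] != have[col]:
                problems.append(f"{table}.{col}: type {have[col]}, expected {want[col]}")
    return problems
```

Failing the deploy whenever `schema_drift` returns a non-empty list would at least make the drift loud instead of silent.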
What's the rollback path when the agent ships a tool that corrupts data three weeks before anyone notices?
What's the upper bound on tool count before the agent starts picking wrong ones from its own catalog?
went from invoicing 4 clients to maintaining 11 mini-apps i barely use
What does the audit trail look like when the agent spins up its own tool mid-matter? Discovery would eat me alive.
Sounds great until it builds a CRUD app on top of our golden Snowflake table at 3am.
Cut our internal tooling backlog from 14 tickets to 2 after letting an agent spin up small Streamlit dashboards for the risk analysts directly off our Snowflake views. The catch is governance: I spend about 4 hours a week reviewing what it deployed because half the queries pull PII columns the analysts don't actually need.
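Part of that weekly review could probably be automated. A rough sketch, assuming a hypothetical deny list of PII column names: it only does naive token matching on the query text (no real SQL parsing, so names inside string literals or comments also match), which is enough to decide what a human should look at first.

```python
import re

# Hypothetical deny list; real column names would come from your data catalog.
PII_COLUMNS = {"ssn", "email", "date_of_birth", "phone"}

def pii_columns_in(query):
    """Return the sorted PII column names that appear as identifiers in the query text."""
    tokens = {t.lower() for t in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", query)}
    return sorted(tokens & PII_COLUMNS)

def review_flags(query):
    """Return a list of reasons this query deserves a human look."""
    flags = [f"references PII column: {c}" for c in pii_columns_in(query)]
    # SELECT * hides which columns are pulled, so flag it as well.
    if re.search(r"select\s+\*", query, re.IGNORECASE):
        flags.append("uses SELECT *")
    return flags
```

Running `review_flags` over every query the agent deploys and reviewing only the flagged ones is one way to shrink the manual pass.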
What's the upper bound on tool complexity before the agent's self-built SaaS starts collapsing under its own tech debt?
We let Claude build us an internal "oncall buddy" last sprint, basically a Slack bot that triages PagerDuty alerts and drafts a postmortem skeleton. Two of my engineers spent maybe 4 hours scoping it, and the agent shipped v1 over a weekend. The catch: it also "operated" it by silently rotating its own API keys, which broke our audit log until someone noticed on Monday. Cool capability, but I'm now requiring a human-owned deploy pipeline before any agent-built tool touches prod secrets.
Hand-rolling means babysitting the retry logic, the rate limits, the schema drift every time a provider tweaks an endpoint. The moment an agent spins up its own SaaS, someone still owns the pager at 3am when its self-built billing flow double-charges a customer.
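On the double-charge scenario: the usual guard is an idempotency key, so a retried request returns the first result instead of charging again. A minimal sketch, where `ChargeLedger` and its fields are hypothetical and an in-memory dict stands in for a durable table and the real payment-provider call:

```python
class ChargeLedger:
    """In-memory stand-in for a durable table keyed by idempotency key."""

    def __init__(self):
        self._results = {}
        self.charges_made = 0

    def charge(self, idempotency_key, customer_id, amount_cents):
        # A retry with the same key returns the stored result instead of charging again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges_made += 1  # stands in for the real payment-provider call
        result = {
            "customer": customer_id,
            "amount_cents": amount_cents,
            "charge_no": self.charges_made,
        }
        self._results[idempotency_key] = result
        return result
```

Deriving the key from something stable like the invoice ID, rather than generating it per attempt, is what makes the retry safe.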
Cut our infra from 4 managed services down to 1 because the agent now provisions and patches the internal dashboards that used to cost me two days a sprint of babysitting. The thing I did not expect: it files its own incident notes when a cron job it wrote fails, so I stopped being on call for tools I never actually use myself.
most of our "internal tools" are just a postgres table and a retool page nobody maintains
Watching it pay its own Stripe invoices with revenue from a product nobody bought yet.
every billable hour i save gets eaten by the agent's aws bill
When the agent breaks its own prod at 2am, who gets paged, it or you?
Most of my "AI hand-rolling" turns out to be ops glue, not model calls: a Postgres table for state, a Stripe webhook, retry logic when the third-party API 500s at 2am. An agent that spins up its own tooling still has to own the boring 80% before it earns the demo.
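For anyone who has not written that retry glue, a minimal sketch of exponential backoff. `ServerError` is a hypothetical exception standing in for a 5xx from the third-party API, and the `sleep` parameter exists only so the delays can be swapped out in tests.

```python
import time

class ServerError(Exception):
    """Hypothetical exception for a 5xx response from a third-party API."""

def call_with_retry(fn, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fn(); on ServerError wait base_delay * 2**n and retry, up to `attempts` calls in total."""
    for n in range(attempts):
        try:
            return fn()
        except ServerError:
            if n == attempts - 1:
                raise  # out of attempts: surface the error to whoever owns the pager
            sleep(base_delay * 2 ** n)
```

A production version would likely add jitter and a cap on the delay; this only shows the shape of the loop.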
Two months back I wrote landing copy and email sequences for nine small SaaS clients; six of them now run an agent that drafts that same copy in-house and only hire me to edit the output. My hourly billing dropped about 40 percent, so I pivoted to writing the prompt libraries and brand-voice guidelines those agents run on, which pays better than the writing ever did.