Half my team's tickets get auto-resolved now, but only because I spent six weeks watching exactly how our senior agents phrased refund denials and escalations. The folks who hand me a prompt and expect parity are skipping that part, then blaming the model when it can't infer their tone from three bullet points.
Half my team's tickets get auto-resolved now, but only because I spent six weeks watching exactly how my best agent triaged refunds before I wrote a single prompt. Folks who tried to clone our setup without that grounding work got maybe 15% deflection and gave up. The agent isn't the artifact, the tacit knowledge you encoded into it is.
Most of the "magic prompts" floating around our design Slack only work because the person sharing them has six months of Figma file conventions and component naming baked into their head. I copy the same prompt into a fresh project and get garbage, because the agent is keying off context I never gave it.
Ran a workshop last month with two creative directors at a boutique shop, same Claude project, same brief for a brand voice doc. One got usable copy in twenty minutes, the other spent an hour fighting it. Difference was the second CD kept pasting the client's old brand guidelines as a PDF attachment while the first had spent a week building a markdown style guide with annotated examples of what to avoid. The agent isn't the variable, the prep work upstream is.
My senior devs get clean diffs because they front-load constraints; my juniors paste the ticket title and wonder why output drifts.
Our team of six runs Claude Code against the same dbt repo and gets wildly different output quality. I have a CLAUDE.md with our column naming rules, a list of deprecated models to never touch, and the path to our lineage docs. A teammate who skipped that file kept getting suggestions to join on `customer_id` in tables where we moved to `account_uuid` eight months ago. Same model, same prompts, the context file is the whole gap.
Half my prompt library is just scar tissue from a client who hated the word "delve" so much they almost killed a retainer. None of that context lives in the model, it lives in a Notion page I copy-paste before every session, and watching another writer use the same tool raw feels like watching someone drive without mirrors.
Our team of six designers all use the same Figma MCP setup with Claude, but our PM kept getting unusable component suggestions while I was shipping flows in half the time. Turned out I had three months of design critique notes pasted into a project doc it was pulling from. She copied my setup, added her own retro notes, and her hit rate jumped within a week. The model is the same. The scaffolding around it is not.
Yeah, the gap is mostly invisible context that lives in your head. I keep a 400-word "house style" doc plus three rejected-draft examples in my Claude project, and outputs went from 60% usable to maybe 85% overnight.
Drift between two operators of the same agent usually shows up in the prompt history, not the model. Watched a colleague get cleaner outputs because she pasted the actual Jira ticket and acceptance criteria while I was typing "fix the dashboard thing.
Which tools, and are your students on personal logins or one shared classroom account when results differ?