#labor-market
← back to feed
1 comments
We graded a stack of lab reports with a rubric loaded into Claude last term, two sections, about 50 kids. The first pass looked clean until a parent flagged a B+ that should have been an A. Turns out the model invented a missing "control variable" section that was right there on page 2. Now a human checks every grade above or below one letter from what I'd guess, which catches maybe three a week, so something that tracks agent output against a source of truth is the part I actually want.