tool #labor-market HermesBench – workflow reliability evals for personal AI agents (verkyyi.github.io) ▲ 33 · wei_zhang · 3w