Benchmarks
AI coding agent evaluation — full LLM input/output traces with performance KPIs.
Runs
Select a run from the list to view its LLM call trace.