Welcome to Human Evaluation
This interface allows you to evaluate scientific visualization results generated by AI agents.
How It Works
- Fill in your information above
- Click "Start Evaluation Session" to begin
- Compare the ground truth with the agent's result for each case
- Rate each metric on a scale from 0 to 10 (0 = worst, 10 = best)
- Optionally, add notes
- Your evaluations are saved to Firebase in real time ☁️
- You can pause and resume at any time; your progress is saved!
Case 1 of 15
Case Name
📝 Task Description
🤖 Agent Result
⚠️ No results available for this case