🔬 SciVisAgentBench Human Evaluation

Welcome to Human Evaluation

This interface allows you to evaluate scientific visualization results generated by AI agents.

How It Works

  1. Fill in your information above
  2. Click "Start Evaluation Session" to begin
  3. For each case, compare the agent's result with the ground truth
  4. Rate each metric on a scale of 0 to 10 (0 is the worst, 10 is the best)
  5. Add optional notes if needed
  6. Your evaluations are saved to Firebase in real time ☁️
  7. You can pause and resume at any time - your progress is saved!