SciVisAgentBench - Human Evaluation

Welcome to Human Evaluation

This interface allows you to evaluate scientific visualization results generated by AI agents.

Full Name *

Institution *

Email *

📝 Task Description

All scoring must be based solely on comparisons between the Result Image and the Ground Truth Image.

For each visualization goal, assign a score from 0 to 10 according to the following criteria:

10 points (Perfect match): The result is visually identical or nearly identical to the ground truth for this criterion.
8-9 points (Excellent match): Only minor differences are present, with no meaningful impact on visualization quality.
6-7 points (Good match): Noticeable differences exist, but the core requirement is clearly satisfied.
4-5 points (Partial match): Significant differences are evident; the requirement is only partially fulfilled.
2-3 points (Poor match): Major discrepancies are present; the requirement is largely unmet.
0-1 points (No match): The result differs completely from the ground truth or fails to address the requirement altogether.
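
For anyone tabulating the scores afterwards, the bands above can be expressed as a simple lookup. The following is a minimal sketch, not part of the evaluation interface itself: the names `RUBRIC_BANDS` and `rubric_label` are hypothetical, and it assumes scores are entered as whole numbers from 0 to 10.

```python
# Hypothetical helper (not part of SciVisAgentBench): maps a whole-number
# score to the rubric label from the criteria listed above.
RUBRIC_BANDS = [
    (10, 10, "Perfect match"),
    (8, 9, "Excellent match"),
    (6, 7, "Good match"),
    (4, 5, "Partial match"),
    (2, 3, "Poor match"),
    (0, 1, "No match"),
]


def rubric_label(score: int) -> str:
    """Return the rubric label for an integer score in the range 0-10."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(score, bool) or not isinstance(score, int):
        raise TypeError("score must be an integer")
    for low, high, label in RUBRIC_BANDS:
        if low <= score <= high:
            return label
    raise ValueError("score must be between 0 and 10")


if __name__ == "__main__":
    for s in (10, 8, 7, 4, 3, 0):
        print(s, rubric_label(s))
```

Rejecting out-of-range or non-integer values, rather than clamping them, keeps data-entry mistakes visible instead of silently moving them into a neighboring band.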

📝 Optional Notes: