Analysis
Understanding your evaluation results through various charts and metrics
Analysis Overview
After your evaluation completes, the analysis dashboard summarizes the results as charts and metrics. Want to see these analysis features in action? Check out our sample evaluation analysis to explore a real-world example of how these charts and metrics come together.
Mean Score Analysis
Single Evaluation Charts
Mean Scores by Model
View the average scores for each model in your evaluation through a bar chart visualization. This chart helps you quickly compare the performance across different models.
Mean Scores by Tag
Analyze average scores grouped by tags you’ve assigned. This visualization helps identify patterns across different categories or attributes.
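To make the arithmetic behind these two charts concrete, here is a minimal sketch of how mean scores could be computed per model and per tag. The record layout (`model`, `tags`, `score`) and the sample values are assumptions for illustration, not the dashboard's actual export format.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: one row per evaluator response.
# Field names and values are illustrative assumptions.
responses = [
    {"model": "model_a", "tags": ["noisy", "question"], "score": 4},
    {"model": "model_a", "tags": ["clean"], "score": 5},
    {"model": "model_b", "tags": ["noisy"], "score": 2},
    {"model": "model_b", "tags": ["clean", "question"], "score": 3},
]


def mean_scores_by_model(rows):
    """Average the scores of all responses belonging to each model."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["model"]].append(row["score"])
    return {model: mean(scores) for model, scores in grouped.items()}


def mean_scores_by_tag(rows):
    """Average the scores per tag; a response counts once for each of its tags."""
    grouped = defaultdict(list)
    for row in rows:
        for tag in row["tags"]:
            grouped[tag].append(row["score"])
    return {tag: mean(scores) for tag, scores in grouped.items()}


by_model = mean_scores_by_model(responses)
by_tag = mean_scores_by_tag(responses)
print(by_model)
print(by_tag)
```

Note that a response with several tags contributes to each tag's average, so the per-tag groups can overlap.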
Double Evaluation Charts
Comparative Mean Scores by Model
For comparative evaluations, see which model was preferred in direct comparisons. This chart shows the preference distribution between Model A and Model B.
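As a rough illustration, a preference distribution can be thought of as the share of paired judgments that went to each side. The sketch below assumes each judgment is recorded as `"A"`, `"B"`, or `"tie"`; these labels, and the existence of a tie option, are assumptions rather than confirmed platform behavior.

```python
from collections import Counter

# Hypothetical paired judgments: the side each evaluator preferred.
judgments = ["A", "B", "A", "A", "tie", "B", "A", "A"]


def preference_distribution(choices):
    """Return the fraction of judgments received by each option."""
    counts = Counter(choices)
    total = sum(counts.values())
    return {option: count / total for option, count in counts.items()}


distribution = preference_distribution(judgments)
print(distribution)
```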
Comparative Mean Scores by Tag
Compare performances across different tags in paired evaluations.
Detailed Analysis
Response Distribution
Answer Frequency by Model
See how responses are distributed across different models. This horizontal bar chart shows the frequency of each response option per model.
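The counts behind a frequency chart like this can be sketched as a nested tally: one counter of response options per model. The response labels (`"Good"`, `"Fair"`, `"Poor"`) and the `(group, answer)` pair format are hypothetical; the same function would work for tags by passing tag names as the group.

```python
from collections import Counter, defaultdict

# Hypothetical (model, selected response option) pairs.
answers = [
    ("model_a", "Good"),
    ("model_a", "Good"),
    ("model_a", "Poor"),
    ("model_b", "Fair"),
    ("model_b", "Poor"),
    ("model_b", "Poor"),
]


def answer_frequency(pairs):
    """Count how often each response option was chosen within each group."""
    frequency = defaultdict(Counter)
    for group, answer in pairs:
        frequency[group][answer] += 1
    return frequency


frequency = answer_frequency(answers)
for group, counts in frequency.items():
    print(group, dict(counts))
```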
Answer Frequency by Tag
Analyze response patterns based on tags.
Per-Query Response Distribution
View detailed response distributions for individual queries. This helps you pinpoint specific queries whose responses follow a particular pattern, such as answers concentrated on a single option.
Query-Level Analysis
Query Summary
View a comprehensive list of all queries with their:
- Associated models
- Tags
- Mean scores
Query Details
Access detailed information about each evaluated file:
- Individual responses
- Evaluator demographics (nationality, gender)
- Associated script (if available)
Interactive Scatter Plot
Explore all query scores in an interactive scatter plot:
- Filter by model
- Customize chart views
- Identify patterns and outliers
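If you want to double-check outliers outside the chart, one common rule of thumb is to flag queries whose mean score lies more than a chosen number of standard deviations from the overall mean. The sketch below uses that z-score heuristic; it is not necessarily how the scatter plot highlights points, and the query IDs, scores, and threshold are assumptions.

```python
from statistics import mean, pstdev

# Hypothetical mean score per query.
query_scores = {"q1": 4.2, "q2": 4.0, "q3": 4.4, "q4": 1.5, "q5": 4.1, "q6": 3.9}


def flag_outliers(scores, threshold=2.0):
    """Return query IDs whose score is more than `threshold` standard deviations from the mean."""
    values = list(scores.values())
    center = mean(values)
    spread = pstdev(values)  # population standard deviation
    if spread == 0:
        return []  # all scores identical, so nothing stands out
    return [
        query
        for query, score in scores.items()
        if abs(score - center) / spread > threshold
    ]


outliers = flag_outliers(query_scores)
print(outliers)
```

With small samples a single extreme score inflates the standard deviation, so a lower threshold may be more practical when only a handful of queries are available.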
Pro Tip: Make the most of your analysis by using tags effectively. Tags let you segment your results by any attribute you choose, so you can see how scores vary across specific aspects of your evaluation. Consider adding tags for characteristics like:
- Speaker demographics (gender, age group)
- Audio characteristics (noisy, clean)
- Content type (question, statement)
- Any other relevant categorization