Metrics aggregator
Metrics aggregation for polarity-aware binary scoring.
This module provides aggregation of GEval metrics using AND-gate success logic, polarity-aware mean scoring, and overridable compute methods for subclasses.
AggregationResult(aggregate_success, aggregate_score)
dataclass
Result of metric aggregation.
Attributes:
| Name | Type | Description |
|---|---|---|
| aggregate_success | bool | True if all metrics passed (AND-gate), False otherwise. |
| aggregate_score | float | Mean quality score with polarity inversion applied. |
MetricsAggregator
Aggregator for GEval metrics.
Computes aggregate_success (AND-gate) and aggregate_score (polarity-aware mean). Subclass and override compute_success or compute_score to customize behavior per evaluator.
aggregate(named_results)
Aggregate GEval metric results.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| named_results | dict[str, MetricResult] | Dictionary mapping metric names to MetricResult objects. | required |
Returns:
| Type | Description |
|---|---|
| AggregationResult | AggregationResult with aggregate_success and aggregate_score. An empty dict returns aggregate_success=False and aggregate_score=0.0. |
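The relationship between aggregate and the two overridable methods can be sketched as follows. This is a minimal illustration and not the module's actual implementation: the MetricResult stand-in, its `success` attribute name, the delegation structure of `aggregate`, and the `WorstCaseAggregator` subclass are all assumptions made for the example, and polarity inversion is left out to keep it short.

```python
from dataclasses import dataclass


@dataclass
class MetricResult:  # hypothetical stand-in; the real class may differ
    score: float
    success: bool


@dataclass
class AggregationResult:
    aggregate_success: bool
    aggregate_score: float


class MetricsAggregator:  # simplified stand-in, not the real implementation
    def aggregate(self, named_results):
        # Assumed structure: aggregate() delegates to the two overridable hooks.
        return AggregationResult(
            aggregate_success=self.compute_success(named_results),
            aggregate_score=self.compute_score(named_results),
        )

    def compute_success(self, named_results):
        # AND-gate; an empty dict counts as failure.
        return bool(named_results) and all(
            r.success for r in named_results.values()
        )

    def compute_score(self, named_results):
        # Plain mean; polarity inversion is omitted in this sketch.
        if not named_results:
            return 0.0
        return sum(r.score for r in named_results.values()) / len(named_results)


class WorstCaseAggregator(MetricsAggregator):  # hypothetical subclass
    def compute_score(self, named_results):
        # Report the lowest score instead of the mean.
        return min((r.score for r in named_results.values()), default=0.0)


results = {
    "relevance": MetricResult(score=0.75, success=True),
    "fluency": MetricResult(score=0.25, success=False),
}
default = MetricsAggregator().aggregate(results)
worst = WorstCaseAggregator().aggregate(results)
empty = MetricsAggregator().aggregate({})
print(default)  # AggregationResult(aggregate_success=False, aggregate_score=0.5)
print(worst)    # AggregationResult(aggregate_success=False, aggregate_score=0.25)
print(empty)    # AggregationResult(aggregate_success=False, aggregate_score=0.0)
```

Under this assumed structure, a subclass only overrides the hook it cares about and inherits the rest, including the empty-dict behavior.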
compute_score(named_results)
Polarity-aware mean of metric scores.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| named_results | dict[str, MetricResult] | Dictionary mapping metric names to MetricResult objects. | required |
Returns:
| Type | Description |
|---|---|
| float | Mean of quality-adjusted scores, or 0.0 if empty. |
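One way to read "polarity-aware" is sketched below. The source does not say how polarity is stored or how inversion is computed, so the `higher_is_better` flag and the `1.0 - score` inversion (which assumes scores normalized to [0, 1]) are hypothetical choices for illustration only.

```python
from dataclasses import dataclass


@dataclass
class MetricResult:  # hypothetical stand-in; the real class may differ
    score: float                   # assumed to be normalized to [0, 1]
    higher_is_better: bool = True  # hypothetical polarity flag


def compute_score(named_results: dict[str, MetricResult]) -> float:
    # Documented behavior: an empty input yields 0.0.
    if not named_results:
        return 0.0
    # Assumed inversion: a low score on a "lower is better" metric
    # becomes a high quality score before averaging.
    adjusted = [
        r.score if r.higher_is_better else 1.0 - r.score
        for r in named_results.values()
    ]
    return sum(adjusted) / len(adjusted)


results = {
    "relevance": MetricResult(score=0.75),
    "toxicity": MetricResult(score=0.25, higher_is_better=False),
}
print(compute_score(results))  # 0.75
print(compute_score({}))       # 0.0
```

Here the toxicity score of 0.25 counts as a quality of 0.75, so both metrics pull the mean in the same direction.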
compute_success(named_results)
AND-gate of all metric success flags.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| named_results | dict[str, MetricResult] | Dictionary mapping metric names to MetricResult objects. | required |
Returns:
| Type | Description |
|---|---|
| bool | True if every metric passed. False if any metric failed or if named_results is empty (no metrics evaluated implies no confidence). |
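The empty-input rule matters because Python's built-in `all()` returns True for an empty iterable, so a bare `all(...)` would not match the documented behavior. The sketch below works on a plain dict of success flags rather than MetricResult objects, which is a simplification for illustration.

```python
def compute_success(success_flags: dict[str, bool]) -> bool:
    # Guard first: all() over an empty iterable is True (vacuous truth),
    # which would contradict the documented "empty means False" rule.
    if not success_flags:
        return False
    return all(success_flags.values())


print(all([]))                                                  # True
print(compute_success({}))                                      # False
print(compute_success({"relevance": True}))                     # True
print(compute_success({"relevance": True, "toxicity": False}))  # False
```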
WeightedMetricsAggregator(weights, score_mapping)
Bases: MetricsAggregator
MetricsAggregator with weighted scoring.
Overrides compute_score to apply per-metric score mappings then a weighted sum.
IMPORTANT: Uses result.rubric_score (pre-threshold integer), not result.score (normalized float), because score_mapping keys are {1, 2, 3} rubric integers.
Attributes:
| Name | Type | Description |
|---|---|---|
| weights | dict[str, float] | Dictionary mapping metric names to their weights. |
| score_mapping | dict[str, dict[int, float] \| Callable[[float], float]] | Dictionary mapping metric names to either a dict[int, float] (maps rubric score integers to normalized floats) or a Callable[[float], float] (function that transforms rubric scores). |
Initialize WeightedMetricsAggregator.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| weights | dict[str, float] | Dictionary mapping metric names to their weights. | required |
| score_mapping | dict[str, dict[int, float] \| Callable[[float], float]] | Dictionary mapping metric names to score transformations. | required |
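The two constructor arguments might look like the following. The metric names, weight values, and the `apply_mapping` helper are hypothetical; the helper only illustrates how a single score_mapping entry of either documented kind (dict or callable) could be applied to a rubric score.

```python
from typing import Callable, Union

# Either a lookup table keyed by rubric integer or a transform function.
ScoreMapping = Union[dict[int, float], Callable[[float], float]]

# Hypothetical metric names and values, for illustration only.
weights: dict[str, float] = {"relevance": 2.0, "fluency": 1.0}
score_mapping: dict[str, ScoreMapping] = {
    "relevance": {1: 0.0, 2: 0.5, 3: 1.0},       # dict form
    "fluency": lambda rubric: (rubric - 1) / 2,  # callable form
}


def apply_mapping(mapping: ScoreMapping, rubric_score: int) -> float:
    # Hypothetical helper: dispatch on the kind of mapping supplied.
    if callable(mapping):
        return mapping(rubric_score)
    return mapping[rubric_score]


print(apply_mapping(score_mapping["relevance"], 2))  # 0.5
print(apply_mapping(score_mapping["fluency"], 3))    # 1.0
```

These two dictionaries would then be passed as `WeightedMetricsAggregator(weights, score_mapping)`, per the signature above.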
compute_score(named_results)
Compute weighted aggregate score using rubric_score lookups.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| named_results | dict[str, MetricResult] | Dictionary mapping metric names to MetricResult objects. | required |
Returns:
| Type | Description |
|---|---|
| float | Weighted aggregate score, or 0.0 if no results or zero total weight. |
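A rough sketch of the weighted computation follows. Several details are assumptions not stated in the source: the weighted sum is divided by the total weight (suggested by the "zero total weight" case), metrics missing from either dictionary are skipped, and the MetricResult stand-in carries only `rubric_score`.

```python
from dataclasses import dataclass


@dataclass
class MetricResult:  # hypothetical stand-in; the real class may differ
    rubric_score: int  # pre-threshold rubric integer, e.g. 1, 2, or 3


def weighted_score(named_results, weights, score_mapping) -> float:
    weighted_sum = 0.0
    total_weight = 0.0
    for name, result in named_results.items():
        # Assumption: metrics without a weight or a mapping are skipped.
        if name not in weights or name not in score_mapping:
            continue
        mapping = score_mapping[name]
        # Look up or transform the rubric integer, not the normalized score.
        if callable(mapping):
            mapped = mapping(result.rubric_score)
        else:
            mapped = mapping[result.rubric_score]
        weighted_sum += weights[name] * mapped
        total_weight += weights[name]
    # Covers both documented zero cases: no results and zero total weight.
    if total_weight == 0:
        return 0.0
    # Assumption: normalize by total weight.
    return weighted_sum / total_weight


rubric_to_float = {1: 0.0, 2: 0.5, 3: 1.0}
weights = {"relevance": 3.0, "fluency": 1.0}
score_mapping = {"relevance": rubric_to_float, "fluency": rubric_to_float}
results = {
    "relevance": MetricResult(rubric_score=3),
    "fluency": MetricResult(rubric_score=1),
}
print(weighted_score(results, weights, score_mapping))  # 0.75
print(weighted_score({}, weights, score_mapping))       # 0.0
```

Looking up `rubric_score` is what makes the `{1, 2, 3}` keys usable; a normalized float such as 0.67 would not match any key in a dict-style mapping.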