Metrics aggregator

Metrics aggregation for polarity-aware binary scoring.

This module provides aggregation of GEval metrics using AND-gate success logic, polarity-aware mean scoring, and overridable compute methods for subclasses.

AggregationResult(aggregate_success, aggregate_score) dataclass

Result of metric aggregation.

Attributes:

Name Type Description
aggregate_success bool

True if all metrics passed (AND-gate), False otherwise.

aggregate_score float

Mean quality score with polarity inversion applied.

MetricsAggregator

Aggregator for GEval metrics.

Computes aggregate_success (AND-gate) and aggregate_score (polarity-aware mean). Subclass and override compute_success or compute_score to customize behavior per evaluator.

aggregate(named_results)

Aggregate GEval metric results.

Parameters:

Name Type Description Default
named_results dict[str, MetricResult]

Dictionary mapping metric names to MetricResult objects.

required

Returns:

Type Description
AggregationResult

AggregationResult with aggregate_success and aggregate_score. An empty dict returns aggregate_success=False and aggregate_score=0.0.
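The following is a minimal sketch of how aggregate might delegate to the two overridable methods, and how a subclass could swap one of them out. The MetricResult stand-in (with only score and success fields), the plain mean in compute_score, and the AnyPassAggregator subclass are all assumptions made for illustration, not the module's actual code.

```python
from dataclasses import dataclass


@dataclass
class MetricResult:
    """Hypothetical stand-in; the real MetricResult likely has more fields."""
    score: float
    success: bool


@dataclass
class AggregationResult:
    aggregate_success: bool
    aggregate_score: float


class MetricsAggregator:
    """Sketch only: mirrors the documented behavior, not the real source."""

    def compute_success(self, named_results):
        # Empty input counts as failure; all() alone would return True.
        return bool(named_results) and all(
            r.success for r in named_results.values()
        )

    def compute_score(self, named_results):
        # Plain mean; polarity handling is left out of this sketch.
        if not named_results:
            return 0.0
        return sum(r.score for r in named_results.values()) / len(named_results)

    def aggregate(self, named_results):
        # Delegating to the compute_* methods is what lets subclasses customize.
        return AggregationResult(
            aggregate_success=self.compute_success(named_results),
            aggregate_score=self.compute_score(named_results),
        )


class AnyPassAggregator(MetricsAggregator):
    """Hypothetical subclass: succeed if at least one metric passed."""

    def compute_success(self, named_results):
        return any(r.success for r in named_results.values())


results = {"clarity": MetricResult(0.9, True), "tone": MetricResult(0.5, False)}
strict = MetricsAggregator().aggregate(results)
lenient = AnyPassAggregator().aggregate(results)
empty = MetricsAggregator().aggregate({})
```

Because aggregate only calls the two compute methods, the subclass changes the success rule without touching the scoring path.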

compute_score(named_results)

Polarity-aware mean of metric scores.

Parameters:

Name Type Description Default
named_results dict[str, MetricResult]

Dictionary mapping metric names to MetricResult objects.

required

Returns:

Type Description
float

Mean of quality-adjusted scores, or 0.0 if empty.
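One plausible reading of "polarity-aware" is that metrics where a lower raw score is better get inverted before averaging. The sketch below assumes normalized scores in [0, 1], an inversion of 1 - score, and a hypothetical higher_is_better flag on the result; none of these details are confirmed by this page.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass
class MetricResult:
    """Hypothetical stand-in for the real MetricResult."""
    score: float                   # assumed to be normalized to [0, 1]
    higher_is_better: bool = True  # hypothetical polarity flag


def compute_score(named_results: dict[str, MetricResult]) -> float:
    # Documented behavior: an empty dict yields 0.0.
    if not named_results:
        return 0.0
    # Assumed inversion: flip scores of metrics where lower is better.
    adjusted = [
        r.score if r.higher_is_better else 1.0 - r.score
        for r in named_results.values()
    ]
    return fmean(adjusted)


example = {
    "relevance": MetricResult(score=0.8),
    "hallucination": MetricResult(score=0.4, higher_is_better=False),
}
mean_quality = compute_score(example)  # averages 0.8 and (1 - 0.4)
```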

compute_success(named_results)

AND-gate of all metric success flags.

Parameters:

Name Type Description Default
named_results dict[str, MetricResult]

Dictionary mapping metric names to MetricResult objects.

required

Returns:

Type Description
bool

True if every metric passed. False if any metric failed or if named_results is empty (no metrics evaluated implies no confidence).
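The empty-input rule matters because Python's built-in all() returns True for an empty iterable, so an AND-gate needs an explicit guard to report False when nothing was evaluated. A minimal sketch, assuming each result exposes a success attribute (the attribute name is an assumption):

```python
from types import SimpleNamespace


def compute_success(named_results) -> bool:
    # all() over an empty iterable is True, so the empty case needs its own guard.
    if not named_results:
        return False
    return all(result.success for result in named_results.values())


# SimpleNamespace objects stand in for MetricResult here.
passed = SimpleNamespace(success=True)
failed = SimpleNamespace(success=False)

all_passed = compute_success({"a": passed, "b": passed})
one_failed = compute_success({"a": passed, "b": failed})
nothing_evaluated = compute_success({})
```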

WeightedMetricsAggregator(weights, score_mapping)

Bases: MetricsAggregator

MetricsAggregator with weighted scoring.

Overrides compute_score to apply per-metric score mappings and then compute a weighted sum.

IMPORTANT: Uses result.rubric_score (pre-threshold integer), not result.score (normalized float), because score_mapping keys are {1, 2, 3} rubric integers.

Attributes:

Name Type Description
weights dict[str, float]

Dictionary mapping metric names to their weights.

score_mapping dict[str, dict[int, float] | Callable[[float], float]]

Dictionary mapping metric names to either:

- dict[int, float]: Maps rubric score integers to normalized floats.
- Callable[[float], float]: Function to transform rubric scores.

Initialize WeightedMetricsAggregator.

Parameters:

Name Type Description Default
weights dict[str, float]

Dictionary mapping metric names to their weights.

required
score_mapping dict[str, dict[int, float] | Callable[[float], float]]

Dictionary mapping metric names to score transformations.

required
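To illustrate the two accepted forms of a score_mapping entry, the sketch below applies either a lookup table or a function to a rubric score. The apply_mapping helper, the metric names, and the specific mapping values are hypothetical; only the two value types come from the signature above.

```python
from typing import Callable, Union

# The two value types documented for score_mapping entries.
ScoreMapping = Union[dict[int, float], Callable[[float], float]]


def apply_mapping(mapping: ScoreMapping, rubric_score: int) -> float:
    """Hypothetical helper: resolve one rubric score through one mapping."""
    if callable(mapping):
        return mapping(rubric_score)
    return mapping[rubric_score]


# Example configuration; metric names and values are illustrative only.
weights = {"accuracy": 2.0, "tone": 1.0}
score_mapping = {
    "accuracy": {1: 0.0, 2: 0.5, 3: 1.0},  # table keyed by rubric integers
    "tone": lambda s: (s - 1) / 2,         # function form, same scale
}

accuracy_value = apply_mapping(score_mapping["accuracy"], 2)
tone_value = apply_mapping(score_mapping["tone"], 3)
```

The table form makes non-linear scales easy to express, while the function form avoids listing every rubric value.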

compute_score(named_results)

Compute weighted aggregate score using rubric_score lookups.

Parameters:

Name Type Description Default
named_results dict[str, MetricResult]

Dictionary mapping metric names to MetricResult objects.

required

Returns:

Type Description
float

Weighted aggregate score, or 0.0 if no results or zero total weight.
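The "zero total weight" case suggests the weighted sum is divided by the total weight, though this page does not say so explicitly. The sketch below makes that assumption, handles only dict-style mappings for brevity, and treats a metric with no configured weight as weight 0.0 (also an assumption). The MetricResult stand-in is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MetricResult:
    """Hypothetical stand-in with the two fields named in the class notes."""
    rubric_score: int  # pre-threshold rubric integer, e.g. 1, 2 or 3
    score: float       # normalized float; not usable as a mapping key


def weighted_score(named_results, weights, score_mapping) -> float:
    total_weight = 0.0
    weighted_sum = 0.0
    for name, result in named_results.items():
        weight = weights.get(name, 0.0)  # assumption: unweighted metrics count as 0
        # Look up by rubric_score: the mapping keys are rubric integers,
        # so indexing with the normalized result.score would raise KeyError.
        mapped = score_mapping[name][result.rubric_score]
        weighted_sum += weight * mapped
        total_weight += weight
    # Documented behavior: 0.0 for no results or zero total weight.
    if total_weight == 0:
        return 0.0
    return weighted_sum / total_weight  # assumed normalization by total weight


rubric = {1: 0.0, 2: 0.5, 3: 1.0}
results = {
    "accuracy": MetricResult(rubric_score=3, score=1.0),
    "tone": MetricResult(rubric_score=2, score=0.5),
}
value = weighted_score(
    results,
    weights={"accuracy": 2.0, "tone": 1.0},
    score_mapping={"accuracy": rubric, "tone": rubric},
)
```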