
Aggregation

Aggregation module for GEval metrics.

This module provides metric aggregation utilities for polarity-aware pass/fail scoring. Extend MetricsAggregator and override compute_score, compute_issues, or compute_success to customize behavior per evaluator.

AggregationResult(aggregate_success, aggregate_score, possible_issues=list()) dataclass

Result of metric aggregation.

Attributes:

Name Type Description
aggregate_success bool

AND-gate of all metric success values. False when the input dict is empty.

aggregate_score float

Mean quality score with polarity inversion. 0.0 when the input dict is empty.

possible_issues list[str]

List of issue strings. [Issue.EVAL_ISSUE] when the input dict is empty.
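The signature and attribute descriptions above can be pictured with a small stand-in dataclass. This is a sketch for illustration only, not the module's source: the string "eval_issue" is a placeholder for whatever value Issue.EVAL_ISSUE actually carries.

```python
from dataclasses import dataclass, field


@dataclass
class AggregationResult:
    """Stand-in mirroring the documented fields; not the library's source."""

    aggregate_success: bool
    aggregate_score: float
    # default_factory gives each instance its own empty list.
    possible_issues: list[str] = field(default_factory=list)


# Documented empty-input values; "eval_issue" is a placeholder for Issue.EVAL_ISSUE.
empty_result = AggregationResult(False, 0.0, ["eval_issue"])

# Omitting possible_issues falls back to an empty list.
default_issues = AggregationResult(True, 0.9).possible_issues
```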

MetricMappingMetricsAggregator(retrieval_metrics, generation_metrics)

Bases: MetricsAggregator

MetricsAggregator that maps failed metrics to Issue labels. Used by GEvalGenerationEvaluator.

Overrides compute_issues to emit Issue.RETRIEVAL_ISSUE or Issue.GENERATION_ISSUE based on which metric category failed.

Attributes:

Name Type Description
retrieval_metrics

Frozenset of metric names considered retrieval metrics.

generation_metrics

Frozenset of metric names considered generation metrics.

Initialize MetricMappingMetricsAggregator.

Parameters:

Name Type Description Default
retrieval_metrics frozenset[str]

Frozenset of retrieval metric names.

required
generation_metrics frozenset[str]

Frozenset of generation metric names.

required

compute_issues(named_results)

Compute issues based on metric success flags.

Parameters:

Name Type Description Default
named_results dict[str, GEvalMetricResult]

Dictionary mapping metric names to GEvalMetricResult objects.

required

Returns:

Type Description
list[str]

List of Issue enum values for failed metric categories.
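One way the category mapping described above might work is sketched below. The names FakeResult and compute_issues_sketch, the metric names, and the enum's string values are all hypothetical; only the idea of emitting one label per failed category comes from this page.

```python
from dataclasses import dataclass
from enum import Enum


class Issue(str, Enum):
    # Placeholder string values; the real enum's values are not shown on this page.
    RETRIEVAL_ISSUE = "retrieval_issue"
    GENERATION_ISSUE = "generation_issue"


@dataclass
class FakeResult:
    """Hypothetical stand-in for GEvalMetricResult with only a success flag."""

    success: bool


def compute_issues_sketch(named_results, retrieval_metrics, generation_metrics):
    """Return one issue label per metric category that has a failed metric."""
    failed = {name for name, result in named_results.items() if not result.success}
    issues = []
    if failed & retrieval_metrics:
        issues.append(Issue.RETRIEVAL_ISSUE.value)
    if failed & generation_metrics:
        issues.append(Issue.GENERATION_ISSUE.value)
    return issues


results = {
    "context_relevance": FakeResult(success=False),
    "faithfulness": FakeResult(success=True),
}
issues = compute_issues_sketch(
    results,
    retrieval_metrics=frozenset({"context_relevance"}),
    generation_metrics=frozenset({"faithfulness"}),
)
```

Here only the retrieval metric failed, so the sketch reports a single retrieval issue.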

MetricsAggregator

Aggregator for GEval metrics.

Computes aggregate_success (AND-gate), aggregate_score (polarity-aware mean), and possible_issues (empty by default). Subclass and override any of the three compute methods to customize behavior per evaluator.

aggregate(named_results)

Aggregate GEval metric results.

Parameters:

Name Type Description Default
named_results dict[str, GEvalMetricResult]

Dictionary mapping metric names to GEvalMetricResult objects.

required

Returns:

Type Description
AggregationResult

AggregationResult with aggregate_success, aggregate_score, and possible_issues.
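The sketch below illustrates how aggregate might compose the three compute methods and how a subclass can override one of them. It is a re-creation under assumptions, not the real class: FakeResult, SketchAggregator, and MajorityAggregator are hypothetical names, the empty-input short-circuit is inferred from the AggregationResult field descriptions, and polarity handling is left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class FakeResult:
    """Hypothetical stand-in for GEvalMetricResult."""

    success: bool
    score: float


@dataclass
class AggregationResult:
    aggregate_success: bool
    aggregate_score: float
    possible_issues: list[str] = field(default_factory=list)


class SketchAggregator:
    """Illustrative re-creation of the documented behavior, not the real class."""

    def aggregate(self, named_results):
        if not named_results:
            # Documented empty-input values; "eval_issue" stands in for Issue.EVAL_ISSUE.
            return AggregationResult(False, 0.0, ["eval_issue"])
        return AggregationResult(
            self.compute_success(named_results),
            self.compute_score(named_results),
            self.compute_issues(named_results),
        )

    def compute_success(self, named_results):
        # AND-gate: every metric must pass.
        return all(r.success for r in named_results.values())

    def compute_score(self, named_results):
        # Plain mean; polarity inversion is omitted to keep the sketch short.
        return sum(r.score for r in named_results.values()) / len(named_results)

    def compute_issues(self, named_results):
        return []


class MajorityAggregator(SketchAggregator):
    """Hypothetical subclass: pass when more than half of the metrics pass."""

    def compute_success(self, named_results):
        passed = sum(r.success for r in named_results.values())
        return passed * 2 > len(named_results)


results = {
    "a": FakeResult(True, 1.0),
    "b": FakeResult(True, 0.5),
    "c": FakeResult(False, 0.0),
}
strict = SketchAggregator().aggregate(results)
lenient = MajorityAggregator().aggregate(results)
```

With one of three metrics failing, the AND-gate version reports failure while the majority-vote override reports success; the score and issues are computed identically in both.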

compute_issues(named_results)

Return list of possible issues. Empty by default; override in subclasses.

Parameters:

Name Type Description Default
named_results dict[str, GEvalMetricResult]

Dictionary mapping metric names to GEvalMetricResult objects.

required

Returns:

Type Description
list[str]

Empty list.

compute_score(named_results)

Polarity-aware mean of metric scores.

Parameters:

Name Type Description Default
named_results dict[str, GEvalMetricResult]

Dictionary mapping metric names to GEvalMetricResult objects.

required

Returns:

Type Description
float

Mean of quality-adjusted scores, or 0.0 if empty.
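A minimal sketch of a polarity-aware mean follows. It assumes scores are normalized to [0, 1] and that each result exposes a polarity flag; the field name higher_is_better and the 1 - score inversion are assumptions made for illustration, since this page does not show how polarity is stored or applied.

```python
from dataclasses import dataclass


@dataclass
class FakeResult:
    """Hypothetical stand-in for GEvalMetricResult."""

    score: float                   # assumed normalized to [0, 1]
    higher_is_better: bool = True  # hypothetical polarity flag


def polarity_aware_mean(named_results):
    """Mean of scores after flipping lower-is-better metrics onto a quality scale."""
    if not named_results:
        return 0.0
    adjusted = [
        r.score if r.higher_is_better else 1.0 - r.score
        for r in named_results.values()
    ]
    return sum(adjusted) / len(adjusted)


results = {
    "faithfulness": FakeResult(0.75),
    "hallucination": FakeResult(0.25, higher_is_better=False),
}
mean_quality = polarity_aware_mean(results)
```

A hallucination score of 0.25 on a lower-is-better metric counts as a quality of 0.75, so both metrics contribute equally in this example.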

compute_success(named_results)

AND-gate of all metric success flags.

Parameters:

Name Type Description Default
named_results dict[str, GEvalMetricResult]

Dictionary mapping metric names to GEvalMetricResult objects.

required

Returns:

Type Description
bool

True if every metric passed, False otherwise.

WeightedMetricsAggregator(weights, score_mapping)

Bases: MetricsAggregator

MetricsAggregator with weighted scoring. Used by QTEvaluator.

Overrides compute_score to apply per-metric score mappings then a weighted sum.

IMPORTANT: Uses result.rubric_score (pre-threshold integer), not result.score (normalized float), because score_mapping keys are {1, 2, 3} rubric integers.

Attributes:

Name Type Description
weights

Dictionary mapping metric names to their weights.

score_mapping

Dictionary mapping metric names to either:
- dict[int, float]: Maps rubric score integers to normalized floats.
- Callable[[float], float]: Function to transform rubric scores.

Initialize WeightedMetricsAggregator.

Parameters:

Name Type Description Default
weights dict[str, float]

Dictionary mapping metric names to their weights.

required
score_mapping dict[str, dict[int, float] | Callable[[float], float]]

Dictionary mapping metric names to score transformations.

required

compute_score(named_results)

Compute weighted aggregate score using rubric_score lookups.

Parameters:

Name Type Description Default
named_results dict[str, GEvalMetricResult]

Dictionary mapping metric names to GEvalMetricResult objects.

required

Returns:

Type Description
float

Weighted aggregate score, or 0.0 if no results or zero total weight.
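The weighted scoring described in this section might look roughly like the sketch below. Several details are assumptions rather than documented behavior: the result is normalized by the total weight (suggested by the "zero total weight" case), metrics lacking a weight or mapping are skipped, and FakeResult, weighted_score, and the metric names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FakeResult:
    """Hypothetical stand-in for GEvalMetricResult."""

    rubric_score: int  # pre-threshold rubric integer, e.g. 1, 2 or 3


def weighted_score(named_results, weights, score_mapping):
    """Map each rubric score, then combine the mapped values by weight."""
    total_weight = 0.0
    weighted_sum = 0.0
    for name, result in named_results.items():
        if name not in weights or name not in score_mapping:
            continue  # assumption: skip metrics without a weight or mapping
        mapping = score_mapping[name]
        # A mapping is either a callable or a dict keyed by rubric integer.
        if callable(mapping):
            mapped = mapping(result.rubric_score)
        else:
            mapped = mapping[result.rubric_score]
        weighted_sum += weights[name] * mapped
        total_weight += weights[name]
    # Assumption: normalize by total weight; 0.0 when there is nothing to weigh.
    return weighted_sum / total_weight if total_weight else 0.0


weights = {"clarity": 3.0, "tone": 1.0}
score_mapping = {
    "clarity": {1: 0.0, 2: 0.5, 3: 1.0},  # dict form
    "tone": lambda s: (s - 1) / 2,         # callable form
}
results = {"clarity": FakeResult(3), "tone": FakeResult(1)}
score = weighted_score(results, weights, score_mapping)
```

In this example clarity maps to 1.0 with weight 3.0 and tone maps to 0.0 with weight 1.0, giving 3.0 / 4.0.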