GEval Generation Evaluator
GEvalGenerationEvaluator(models=None, metrics=None, aggregation_method=None, max_concurrent_judges=None, run_parallel=True, refusal_metric=None, batch_status_check_interval=DefaultValues.BATCH_STATUS_CHECK_INTERVAL, batch_max_iterations=DefaultValues.BATCH_MAX_ITERATIONS, metrics_aggregator=None, fallback_models=None)
Bases: BaseGenerationEvaluator
Evaluates the output generated by an AI system or component, using one or more judge models to score a set of metrics.
Default expected input
- input (str): The input provided to the AI system or component (e.g., a query, prompt, or instruction).
- retrieved_context (str): Supporting context used during generation (e.g., retrieved documents).
- expected_output (str): The reference output used for comparison.
- actual_output (str): The output generated by the AI system or component to evaluate.
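As an illustration of the default input shape, the sketch below builds one record with the four fields listed above and checks that each is present as a string. The helper `validate_record`, the constant `DEFAULT_FIELDS`, and the sample values are hypothetical and not part of the library. The sketch also assumes that all four fields are supplied, which the list above does not state explicitly.

```python
# Hypothetical helper for illustration; not part of the evaluator's API.
DEFAULT_FIELDS = ("input", "retrieved_context", "expected_output", "actual_output")


def validate_record(record: dict) -> list[str]:
    """Return the default field names that are missing or not strings."""
    return [
        field
        for field in DEFAULT_FIELDS
        if not isinstance(record.get(field), str)
    ]


# Sample values are invented for the example.
record = {
    "input": "What is the capital of France?",
    "retrieved_context": "Paris is the capital and largest city of France.",
    "expected_output": "Paris",
    "actual_output": "The capital of France is Paris.",
}

problems = validate_record(record)
print(problems)  # an empty list means every default field is present
```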
Attributes:
| Name | Type | Description |
|---|---|---|
| `name` | `str` | The name of the evaluator. |
| `metrics` | `List[BaseMetric]` | The list of metrics to evaluate. |
| `run_parallel` | `bool` | Whether to run the metrics in parallel. |
| `metrics_aggregator` | `MetricsAggregator` | The aggregator for polarity-aware binary scoring. |
Initialize the GEval Generation Evaluator.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `models` | `BaseLMInvoker \| list[BaseLMInvoker] \| None` | Judge models for single-judge/multi-judge evaluation. | `None` |
| `metrics` | `list[BaseMetric] \| None` | Metric instances to evaluate. If None, uses | `None` |
| `aggregation_method` | `AggregationSelector \| None` | Strategy used to aggregate judge results. Defaults to None. | `None` |
| `max_concurrent_judges` | `int \| None` | Maximum number of judges to run concurrently. Defaults to None. | `None` |
| `run_parallel` | `bool` | Whether to run the metrics in parallel. | `True` |
| `refusal_metric` | `GEvalRefusalMetric \| None` | The refusal metric to use. If None, the default refusal metric will be used. Defaults to GEvalRefusalMetric. | `None` |
| `batch_status_check_interval` | `float` | Time between batch status checks in seconds. Defaults to 30.0. | `BATCH_STATUS_CHECK_INTERVAL` |
| `batch_max_iterations` | `int` | Maximum number of status check iterations before timeout. Defaults to 120 (60 minutes with default interval). | `BATCH_MAX_ITERATIONS` |
| `metrics_aggregator` | `MetricsAggregator \| None` | The aggregator for polarity-aware binary scoring. If None, a default MetricsAggregator is used. Defaults to None. | `None` |
| `fallback_models` | `list[BaseLMInvoker] \| None` | Ordered fallback invoker chain propagated to every metric. Defaults to None. | `None` |
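The two batch parameters together bound how long a batch job is polled: with the documented defaults, 120 status checks at 30.0 seconds each add up to 3600 seconds, or 60 minutes. The sketch below shows that arithmetic alongside a generic polling loop driven by the same two settings. It illustrates the general pattern only and is not the evaluator's actual implementation; `max_wait_seconds`, `poll_until_done`, `is_done`, and `sleep` are hypothetical names.

```python
import time
from typing import Callable


def max_wait_seconds(interval: float, max_iterations: int) -> float:
    """Upper bound on total polling time, in seconds."""
    return interval * max_iterations


def poll_until_done(
    is_done: Callable[[], bool],
    interval: float = 30.0,
    max_iterations: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Check `is_done` up to `max_iterations` times.

    Sleeps `interval` seconds after each unsuccessful check. Returns the
    number of checks performed, or raises TimeoutError if the job never
    reports completion.
    """
    for attempt in range(1, max_iterations + 1):
        if is_done():
            return attempt
        sleep(interval)
    raise TimeoutError(f"not finished after {max_iterations} status checks")


# Documented defaults: 30.0 s interval, 120 iterations.
print(max_wait_seconds(30.0, 120) / 60)  # 60.0 (minutes)
```

Passing `sleep` as a parameter is a design choice for the sketch: it lets the loop be exercised in tests without actually waiting.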