GEval Generation Evaluator

GEval Generation Evaluator.

Authors

Surya Mahadi (made.r.s.mahadi@gdplabs.id)

GEvalGenerationEvaluator(metrics=None, enabled_metrics=None, model=DefaultValues.MODEL, model_credentials=None, model_config=None, run_parallel=True, rule_book=None, generation_rule_engine=None, judge=None, refusal_metric=None)

Bases: GenerationEvaluator

GEval Generation Evaluator.

This evaluator assesses a model's generated response against the configured metrics.

Default expected input
  • query (str): The query that the model was asked to answer.
  • retrieved_context (str): The retrieved context that was available when the response was generated.
  • expected_response (str): The reference response that the generated response is compared against.
  • generated_response (str): The model's generated response, which is the text being evaluated.
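
As an illustration of the input shape listed above, the following is a minimal sketch of one input record and a small check that all four fields are present and are strings. The helper `check_generation_input` and the sample values are hypothetical and are not part of the library; only the four field names come from this page.

```python
from typing import Any, Dict, List

# Field names taken from the "Default expected input" list.
REQUIRED_FIELDS = ("query", "retrieved_context", "expected_response", "generated_response")


def check_generation_input(data: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one input record (hypothetical helper)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in data:
            problems.append(f"missing field: {field}")
        elif not isinstance(data[field], str):
            problems.append(f"field {field} must be str, got {type(data[field]).__name__}")
    return problems


# Sample record; the values are made up for illustration.
example_input = {
    "query": "What is the capital of France?",
    "retrieved_context": "Paris is the capital and most populous city of France.",
    "expected_response": "Paris",
    "generated_response": "The capital of France is Paris.",
}
```

Running a check like this before evaluation can surface a missing or mistyped field early, although the evaluator may perform its own validation.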

Attributes:

  • name (str): The name of the evaluator.
  • metrics (List[BaseMetric]): The list of metrics to evaluate.
  • run_parallel (bool): Whether to run the metrics in parallel.
  • rule_book (RuleBook | None): The rule book.
  • generation_rule_engine (GenerationRuleEngine | None): The generation rule engine.
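
To illustrate what a flag such as run_parallel typically controls, here is a minimal sketch that runs stand-in metrics either concurrently or one after another using asyncio. It assumes asynchronous metric calls; the functions `score_metric` and `run_metrics` and the metric names are hypothetical, and the evaluator's actual scheduling may differ.

```python
import asyncio
from typing import Dict, List, Tuple


async def score_metric(name: str, delay: float) -> Tuple[str, float]:
    """Stand-in for one metric evaluation (hypothetical; returns a fixed score)."""
    await asyncio.sleep(delay)  # simulates waiting on a model call
    return name, 1.0


async def run_metrics(metric_names: List[str], run_parallel: bool = True) -> Dict[str, float]:
    """Run all stand-in metrics either concurrently or sequentially."""
    if run_parallel:
        # Concurrent: total wait is roughly that of the slowest metric.
        pairs = await asyncio.gather(*(score_metric(n, 0.01) for n in metric_names))
    else:
        # Sequential: total wait is roughly the sum over all metrics.
        pairs = [await score_metric(n, 0.01) for n in metric_names]
    return dict(pairs)


results = asyncio.run(run_metrics(["metric_a", "metric_b"], run_parallel=True))
```

Both modes produce the same scores in this sketch; only the elapsed time differs.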

Initialize the GEval Generation Evaluator.

Parameters:

  • metrics (List[BaseMetric] | None): The list of metrics to evaluate. Defaults to None.
  • enabled_metrics (List[type[BaseMetric] | str] | None): The metrics to enable, given as metric classes or strings. Defaults to None.
  • model (str | ModelId | BaseLMInvoker): The model to use for the metrics. Defaults to DefaultValues.MODEL.
  • model_credentials (str | None): The model credentials to use for the metrics. Defaults to None.
  • model_config (dict[str, Any] | None): The model config to use for the metrics. Defaults to None.
  • run_parallel (bool): Whether to run the metrics in parallel. Defaults to True.
  • rule_book (RuleBook | None): The rule book. Defaults to None.
  • generation_rule_engine (GenerationRuleEngine | None): The generation rule engine. Defaults to None.
  • judge (MultipleLLMAsJudge | None): Optional multi-LLM judge for ensemble evaluation. Defaults to None.
  • refusal_metric (type[BaseMetric] | None): The refusal metric to use. If None, the default refusal metric, GEvalRefusalMetric, is used. Defaults to None.
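
The refusal_metric parameter follows a common "None means use the default" convention. The sketch below shows that pattern in isolation. All classes here are local stand-ins that only borrow the names used on this page, and `resolve_refusal_metric` is a hypothetical helper, not the library's implementation.

```python
from typing import Optional, Type


class BaseMetric:
    """Stand-in base class (hypothetical; not the library's BaseMetric)."""


class GEvalRefusalMetric(BaseMetric):
    """Stand-in for the default refusal metric named in the parameter description."""


class CustomRefusalMetric(BaseMetric):
    """Hypothetical user-supplied refusal metric."""


def resolve_refusal_metric(
    refusal_metric: Optional[Type[BaseMetric]] = None,
) -> Type[BaseMetric]:
    """Return the given metric class, or the default class when None is passed."""
    return GEvalRefusalMetric if refusal_metric is None else refusal_metric
```

Under this convention, omitting the argument and passing None explicitly are equivalent, while passing a class such as `CustomRefusalMetric` overrides the default.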