
GEval Groundedness

GEval Groundedness Metric.

This metric uses GEval to evaluate how well the generated output is grounded in the retrieved context.

GEvalGroundednessMetric(*args, threshold=1.0, **kwargs)

Bases: DeepEvalGEvalMetric

GEval Groundedness Metric.

This metric evaluates how well the generated output is grounded in the retrieved context.

Available Fields
  • query (str): The input query that the generated response answers.
  • generated_response (str): The model's generated response, whose groundedness is evaluated.
  • retrieved_context (str): The retrieved context against which the generated response is checked.
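The three fields above can be assembled into a single record before evaluation. The sketch below is illustrative only: the helper build_groundedness_input is hypothetical and not part of the library, and passing the fields as a plain dict is an assumption, since this page does not show how the metric receives its input.

```python
def build_groundedness_input(
    query: str, generated_response: str, retrieved_context: str
) -> dict[str, str]:
    """Hypothetical helper: bundle the three documented fields into one dict."""
    record = {
        "query": query,
        "generated_response": generated_response,
        "retrieved_context": retrieved_context,
    }
    # All three fields are documented as str, so reject anything else early.
    for field_name, value in record.items():
        if not isinstance(value, str):
            raise TypeError(f"{field_name} must be a str, got {type(value).__name__}")
    return record


sample = build_groundedness_input(
    query="What is the capital of France?",
    generated_response="The capital of France is Paris.",
    retrieved_context="Paris is the capital and largest city of France.",
)
```

Validating types up front keeps a malformed record from reaching the evaluating model, where the failure would be harder to trace.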
Scoring
  • [0, 1] (Continuous): Normalized score range. The native 1-3 rubric value is stored in the rubric_score field.
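One way to picture the relationship between the native 1-3 rubric value and the normalized [0, 1] score is a min-max mapping. This is a sketch under stated assumptions, not the library's implementation: the page does not give the normalization formula, and the functions normalize_rubric_score and apply_threshold are hypothetical. Comparing the normalized score against the constructor's threshold, and binarizing on that comparison when strict_mode is True, are likewise assumptions.

```python
def normalize_rubric_score(rubric_score: int, low: int = 1, high: int = 3) -> float:
    """Assumed min-max mapping from the native rubric range to [0, 1]."""
    if not low <= rubric_score <= high:
        raise ValueError(f"rubric_score must be in [{low}, {high}]")
    return (rubric_score - low) / (high - low)


def apply_threshold(
    score: float, threshold: float = 1.0, strict_mode: bool = False
) -> tuple[float, bool]:
    """Assumed pass/fail logic; returns (reported score, passed)."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0 inclusive")
    passed = score >= threshold
    if strict_mode:
        # Assumption: strict mode collapses the score to 1.0 or 0.0.
        return (1.0 if passed else 0.0), passed
    return score, passed


normalized = [normalize_rubric_score(value) for value in (1, 2, 3)]
```

Under this assumed mapping, the default threshold of 1.0 would be met only by the top rubric value of 3.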
Cookbook Example

Please refer to example_geval_groundedness.py in the gen-ai-sdk-cookbook repository.

Initializes the GEvalGroundednessMetric class.

Parameters:

Name (Type): Description. Default.
  • name (str | None): The name of the metric. Defaults to "groundedness". Default: required
  • evaluation_params (list[LLMTestCaseParams] | None): The evaluation parameters. Defaults to [INPUT, ACTUAL_OUTPUT, RETRIEVAL_CONTEXT]. Default: required
  • models (BaseLMInvoker | list[BaseLMInvoker] | None): The model invoker(s) to use for the metric. Default: required
  • criteria (str | None): The criteria to use for the metric. Defaults to GROUNDEDNESS_CRITERIA. Default: required
  • evaluation_steps (list[str] | None): The evaluation steps to use for the metric. Defaults to GROUNDEDNESS_EVALUATION_STEPS. Default: required
  • rubric (list[Rubric] | None): The rubric to use for the metric. Defaults to GROUNDEDNESS_RUBRIC. Default: required
  • threshold (float): The threshold to use for the metric. Defaults to 1.0. Must be between 0.0 and 1.0 inclusive. Default: 1.0
  • additional_context (str | None): Additional context like few-shot examples. Defaults to GROUNDEDNESS_FEW_SHOT. Default: required
  • batch_status_check_interval (float): Time between batch status checks in seconds. Defaults to 30.0. Default: required
  • batch_max_iterations (int): Maximum number of status check iterations before timeout. Defaults to 120. Default: required
  • strict_mode (bool): If True, binarizes score to 1.0 or 0.0. Defaults to False. Default: required
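The batch_status_check_interval and batch_max_iterations parameters together bound how long a batch evaluation is polled: with the documented defaults, 120 checks spaced 30.0 seconds apart give a ceiling of roughly 3600 seconds. The loop below is a minimal sketch of that kind of polling, not the library's code. The function wait_for_batch, the check_status callable, and the status strings "completed" and "failed" are all hypothetical.

```python
import time
from typing import Callable


def wait_for_batch(
    check_status: Callable[[], str],
    interval: float = 30.0,
    max_iterations: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll check_status until a terminal status or the iteration budget runs out."""
    for _ in range(max_iterations):
        status = check_status()
        # Hypothetical terminal states; a real batch API may use other names.
        if status in ("completed", "failed"):
            return status
        sleep(interval)
    raise TimeoutError(
        f"batch not finished after {max_iterations} checks "
        f"({interval * max_iterations:.0f} s)"
    )


# Upper bound on waiting time with the documented defaults.
max_wait_seconds = 30.0 * 120
```

Injecting the sleep function keeps the loop testable without real delays; a caller that expects long-running batches would raise either the interval or the iteration count.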