Ragas

Ragas metric integration.

RAGASMetric(metric, name=None, callbacks=None, timeout=None, models=None, fallback_models=None, aggregation_method=DefaultValues.AGGREGATION_METHOD, max_concurrent_judges=None)

Bases: BaseMetric

RAGAS Metric.

RAGASMetric wraps a Ragas single-turn metric for evaluating the quality of RAG systems.

Available Fields
  • input (str): The query being evaluated. Similar to user_input in SingleTurnSample.
  • actual_output (str, optional): The generated response being evaluated. Similar to response in SingleTurnSample.
  • expected_output (str, optional): The expected (reference) response. Similar to reference in SingleTurnSample.
  • expected_context (str | list[str], optional): The expected retrieved context. Similar to reference_contexts in SingleTurnSample. A str is converted into a single-element list.
  • retrieved_context (str | list[str], optional): The context actually retrieved. Similar to retrieved_contexts in SingleTurnSample. A str is converted into a single-element list.
  • rubrics (dict[str, str], optional): The rubrics used for scoring. Similar to rubrics in SingleTurnSample.
Scoring
  • 0.0-1.0 (Continuous): A score evaluating the RAG aspect being tested.
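
The field correspondence listed under Available Fields, including the str-to-list conversion for the two context fields, can be sketched in plain Python. This is only an illustration: `FIELD_MAP`, `LIST_FIELDS`, and `to_sample_fields` are hypothetical names and are not part of the library, whose actual conversion code may differ.

```python
from __future__ import annotations

from typing import Any

# Hypothetical mapping from this metric's field names to the
# SingleTurnSample field names they are documented as similar to.
FIELD_MAP = {
    "input": "user_input",
    "actual_output": "response",
    "expected_output": "reference",
    "expected_context": "reference_contexts",
    "retrieved_context": "retrieved_contexts",
    "rubrics": "rubrics",
}

# Fields documented as accepting either a str or a list[str].
LIST_FIELDS = {"expected_context", "retrieved_context"}


def to_sample_fields(data: dict[str, Any]) -> dict[str, Any]:
    """Rename fields and wrap a bare context string in a one-element list."""
    sample: dict[str, Any] = {}
    for key, target in FIELD_MAP.items():
        value = data.get(key)
        if value is None:
            continue  # optional field was not supplied
        if key in LIST_FIELDS and isinstance(value, str):
            value = [value]
        sample[target] = value
    return sample


example = to_sample_fields(
    {
        "input": "What is the capital of France?",
        "actual_output": "Paris.",
        "retrieved_context": "Paris is the capital of France.",
    }
)
```

Here `example["retrieved_contexts"]` is a list containing the single context string, and fields that were not supplied are omitted.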

Initialize the RAGASMetric.

Parameters:

  • metric (SingleTurnMetric, required): The Ragas metric to use.
  • name (str, optional): The name of the metric. Defaults to None, in which case the name of the Ragas metric is used.
  • callbacks (Callbacks, optional): The callbacks to use. Defaults to None.
  • timeout (int, optional): The timeout for the metric. Defaults to None.
  • models (BaseLMInvoker | list[BaseLMInvoker] | None, optional): The model invoker(s) to use for multi-judge evaluation. Defaults to None.
  • fallback_models (list[BaseLMInvoker] | None, optional): Ordered list of fallback invokers, tried in sequence when the primary judge fails. Defaults to None.
  • aggregation_method (AggregationSelector, optional): The aggregation method to use for the metric. Defaults to DefaultValues.AGGREGATION_METHOD.
  • max_concurrent_judges (int | None, optional): The maximum number of judges to run concurrently. Defaults to None.
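
To make the fallback_models and max_concurrent_judges descriptions concrete, here is a minimal asyncio sketch of "try fallbacks in order" and "cap concurrent judges". It does not use the library: the `Judge` callable type and the functions `judge_with_fallbacks` and `run_judges` are assumptions made for illustration, and the real implementation may handle errors, timeouts, and aggregation differently.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable, Optional

# Hypothetical judge signature: takes a prompt, returns a score.
Judge = Callable[[str], Awaitable[float]]


async def judge_with_fallbacks(primary: Judge, fallbacks: list[Judge], prompt: str) -> float:
    """Try the primary judge, then each fallback in order, until one succeeds."""
    last_error: Optional[Exception] = None
    for judge in [primary, *fallbacks]:
        try:
            return await judge(prompt)
        except Exception as exc:  # a real implementation would likely catch narrower errors
            last_error = exc
    raise RuntimeError("all judges failed") from last_error


async def run_judges(judges: list[Judge], prompt: str, max_concurrent: Optional[int] = None) -> list[float]:
    """Run every judge on the prompt, optionally capping how many run at once."""
    semaphore = asyncio.Semaphore(max_concurrent) if max_concurrent else None

    async def run_one(judge: Judge) -> float:
        if semaphore is None:
            return await judge(prompt)
        async with semaphore:
            return await judge(prompt)

    return list(await asyncio.gather(*(run_one(j) for j in judges)))


async def flaky(_: str) -> float:
    raise TimeoutError("primary judge timed out")


async def steady(_: str) -> float:
    return 0.8


# The primary judge fails, so the first fallback supplies the score.
score = asyncio.run(judge_with_fallbacks(flaky, [steady], "Is the answer grounded?"))
```

A semaphore is used here because it bounds concurrency without changing the order of the results returned by `asyncio.gather`.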

evaluate(data) async

Evaluate with custom prompt lifecycle support.

Overrides BaseMetric.evaluate() to add custom prompt application and state management, so that custom prompts are applied before evaluation and the original state is restored afterward.

When a batch is passed and all items share the same custom prompts, the items are processed together in parallel. When items have different custom prompts, evaluation falls back to per-item processing.

Parameters:

  • data (EvalInput | list[EvalInput], required): Single data item or list of data items to evaluate.

Returns:

  • MetricResult | list[MetricResult]: Evaluation results with scores namespaced by metric name.
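
The apply-then-restore lifecycle and the batch fast path described for evaluate() can be illustrated with a small synchronous sketch. Everything here is hypothetical: `FakeMetric`, the `custom_prompt` key, and `evaluate_batch` are stand-ins invented for the example, results are plain strings rather than MetricResult objects, and the parallelism of the real fast path is omitted.

```python
from __future__ import annotations

from contextlib import contextmanager
from typing import Any, Iterator, Optional


class FakeMetric:
    """Hypothetical stand-in for a metric that holds a mutable prompt."""

    def __init__(self) -> None:
        self.prompt = "default prompt"

    def score_batch(self, items: list[dict[str, Any]]) -> list[str]:
        # Record which prompt was active for each item.
        return [f"{self.prompt}:{item['input']}" for item in items]


@contextmanager
def applied_prompt(metric: FakeMetric, custom_prompt: Optional[str]) -> Iterator[FakeMetric]:
    """Apply a custom prompt for the duration of the block, then restore."""
    original = metric.prompt
    if custom_prompt is not None:
        metric.prompt = custom_prompt
    try:
        yield metric
    finally:
        metric.prompt = original  # restored even if evaluation raises


def evaluate_batch(metric: FakeMetric, items: list[dict[str, Any]]) -> list[str]:
    prompts = {item.get("custom_prompt") for item in items}
    if len(prompts) <= 1:
        # Fast path: every item shares one prompt, so apply it once.
        shared = next(iter(prompts), None)
        with applied_prompt(metric, shared):
            return metric.score_batch(items)
    # Fallback: prompts differ, so apply and restore per item.
    results: list[str] = []
    for item in items:
        with applied_prompt(metric, item.get("custom_prompt")):
            results.extend(metric.score_batch([item]))
    return results
```

The `try`/`finally` in `applied_prompt` is the part that guarantees the metric's state is restored whether or not scoring succeeds.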

build_ragas_llm(models, fallback_models=None)

Build a RAGAS LLM wrapper from configured model invokers.

Parameters:

  • models (BaseLMInvoker | list[BaseLMInvoker] | None, required): The model invoker(s) to use for evaluation. If a list is provided, the first invoker is used. If None or an empty list is provided, the default LM invoker is used.
  • fallback_models (list[BaseLMInvoker] | None, optional): Ordered fallback invokers. Defaults to None.

Returns:

  • RagasLLMWrapper: The RAGAS-compatible LLM wrapper.
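
The selection rule stated for the models parameter (first element of a list, default invoker for None or an empty list) can be written as a short helper. This is a sketch of the documented rule only: `pick_primary` and its `default` argument are hypothetical, and strings stand in for BaseLMInvoker instances.

```python
from __future__ import annotations

from typing import Any, Optional, Union


def pick_primary(models: Optional[Union[Any, list[Any]]], default: Any) -> Any:
    """Return the invoker that would be used as the primary model."""
    if models is None:
        return default  # nothing configured: use the default invoker
    if isinstance(models, list):
        # First invoker wins; an empty list behaves like None.
        return models[0] if models else default
    return models  # a single invoker was passed directly


primary = pick_primary(["invoker-a", "invoker-b"], default="default-invoker")
```

With the inputs above, `primary` is `"invoker-a"`; passing `None` or `[]` yields `"default-invoker"`.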