Ragas
Ragas metric integration.
RAGASMetric(metric, name=None, callbacks=None, timeout=None, models=None, fallback_models=None, aggregation_method=DefaultValues.AGGREGATION_METHOD, max_concurrent_judges=None)
Bases: BaseMetric
RAGAS Metric.
Ragas is a framework for evaluating the quality of RAG (retrieval-augmented generation) systems; this class wraps a single Ragas metric.
Available Fields
- input (str): The query to evaluate the metric. Similar to user_input in SingleTurnSample.
- actual_output (str, optional): The generated response to evaluate the metric. Similar to response in SingleTurnSample.
- expected_output (str, optional): The expected response to evaluate the metric. Similar to reference in SingleTurnSample.
- expected_context (str | list[str], optional): The expected retrieved context to evaluate the metric. Similar to reference_contexts in SingleTurnSample. If the expected retrieved context is a str, it will be converted into a list with a single element.
- retrieved_context (str | list[str], optional): The retrieved context to evaluate the metric. Similar to retrieved_contexts in SingleTurnSample. If the retrieved context is a str, it will be converted into a list with a single element.
- rubrics (dict[str, str], optional): The rubrics to evaluate the metric. Similar to rubrics in SingleTurnSample.
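The field correspondence and the str-to-list conversion described above can be sketched in plain Python. This is only an illustration: `FIELD_MAP`, `as_list`, and `to_sample_kwargs` are hypothetical names, not part of the library, and the sketch builds a plain dict rather than a real SingleTurnSample.

```python
from typing import Any, Union

# Left: field name used by this metric. Right: the SingleTurnSample field it resembles.
# The pairs are taken from the field list above; the dict itself is hypothetical.
FIELD_MAP = {
    "input": "user_input",
    "actual_output": "response",
    "expected_output": "reference",
    "expected_context": "reference_contexts",
    "retrieved_context": "retrieved_contexts",
    "rubrics": "rubrics",
}

# Fields documented as accepting either a str or a list[str].
CONTEXT_FIELDS = {"expected_context", "retrieved_context"}


def as_list(value: Union[str, list[str]]) -> list[str]:
    """Wrap a bare string into a single-element list; copy lists unchanged."""
    return [value] if isinstance(value, str) else list(value)


def to_sample_kwargs(data: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: rename fields and normalize context fields to lists."""
    kwargs: dict[str, Any] = {}
    for source_name, sample_name in FIELD_MAP.items():
        value = data.get(source_name)
        if value is None:
            continue  # optional field not supplied
        if source_name in CONTEXT_FIELDS:
            value = as_list(value)
        kwargs[sample_name] = value
    return kwargs
```

With this sketch, a `retrieved_context` given as a single string ends up as a one-element `retrieved_contexts` list, while list inputs pass through unchanged.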
Scoring
- 0.0-1.0 (Continuous): A score evaluating the RAG aspect being tested.
Initialize the RAGASMetric.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `metric` | `SingleTurnMetric` | The Ragas metric to use. | required |
| `name` | `str` | The name of the metric. Default is the name of the metric. | `None` |
| `callbacks` | `Callbacks` | The callbacks to use. Default is None. | `None` |
| `timeout` | `int` | The timeout for the metric. Default is None. | `None` |
| `models` | `BaseLMInvoker \| list[BaseLMInvoker] \| None` | The model invoker(s) to use for multi-judge evaluation. Defaults to None. | `None` |
| `fallback_models` | `list[BaseLMInvoker] \| None` | Ordered list of fallback invokers tried in sequence when the primary judge fails. Defaults to None. | `None` |
| `aggregation_method` | `AggregationSelector` | The aggregation method to use for the metric. Defaults to DefaultValues.AGGREGATION_METHOD. | `DefaultValues.AGGREGATION_METHOD` |
| `max_concurrent_judges` | `int \| None` | The maximum number of concurrent judges to use for the metric. Defaults to None. | `None` |
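To make the interplay of `fallback_models` and `max_concurrent_judges` more concrete, here is a minimal sketch of ordered fallback combined with a concurrency cap. It is not the library's implementation: `Judge`, `score_with_fallback`, and `run_judges` are hypothetical, judges are modelled as plain async callables instead of BaseLMInvoker objects, and the assumption that each judge falls back independently is mine. Aggregating the returned scores is left out because the aggregation method is not described here.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Sequence

# Hypothetical stand-in for a model invoker: takes a prompt, returns a score.
Judge = Callable[[str], Awaitable[float]]


async def score_with_fallback(
    prompt: str, primary: Judge, fallbacks: Sequence[Judge] = ()
) -> float:
    """Try the primary judge first, then each fallback in the given order."""
    last_error: Optional[Exception] = None
    for judge in (primary, *fallbacks):
        try:
            return await judge(prompt)
        except Exception as exc:  # any failure moves on to the next judge
            last_error = exc
    raise RuntimeError("all judges failed") from last_error


async def run_judges(
    prompt: str,
    judges: Sequence[Judge],
    fallbacks: Sequence[Judge] = (),
    max_concurrent: Optional[int] = None,
) -> list[float]:
    """Run every judge, allowing at most `max_concurrent` calls in flight."""
    limit = max_concurrent or len(judges) or 1
    semaphore = asyncio.Semaphore(limit)

    async def bounded(judge: Judge) -> float:
        async with semaphore:
            return await score_with_fallback(prompt, judge, fallbacks)

    # gather preserves the order of `judges` in its result list.
    return await asyncio.gather(*(bounded(judge) for judge in judges))
```

In this sketch a cap of `None` means no limit beyond the number of judges; whether the real parameter behaves the same way is an assumption.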
evaluate(data)
async
Evaluate with custom prompt lifecycle support.
Overrides BaseMetric.evaluate() to add custom prompt application and state management. This ensures custom prompts are applied before evaluation and state is restored after.
For batch processing, uses efficient parallel processing when all items have the same custom prompts. Falls back to per-item processing when items have different custom prompts.
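The apply-then-restore lifecycle and the two batch paths can be illustrated with a small self-contained sketch. Everything in it is hypothetical: `SketchMetric`, its `prompts` dict, and the `custom_prompts` item key are stand-ins chosen for the example, and the real class may store and apply prompts differently.

```python
import asyncio
from contextlib import contextmanager
from typing import Any, Iterator, Optional


class SketchMetric:
    """Toy metric that only records which prompt was active while scoring."""

    def __init__(self) -> None:
        self.prompts = {"default": "built-in prompt"}

    @contextmanager
    def applied(self, custom: Optional[dict[str, str]]) -> Iterator[None]:
        """Apply custom prompts, then restore the previous state on exit."""
        saved = dict(self.prompts)
        self.prompts.update(custom or {})
        try:
            yield
        finally:
            self.prompts = saved

    async def _score(self, item: dict[str, Any]) -> dict[str, Any]:
        await asyncio.sleep(0)  # placeholder for the real model call
        return {"input": item["input"], "prompt": self.prompts["default"]}

    async def evaluate(self, items: list[dict[str, Any]]) -> list[dict[str, Any]]:
        custom = [item.get("custom_prompts") for item in items]
        if all(c == custom[0] for c in custom):
            # Same prompts everywhere: apply once and score all items in parallel.
            with self.applied(custom[0] if custom else None):
                return list(await asyncio.gather(*(self._score(i) for i in items)))
        # Mixed prompts: score one item at a time so each sees its own prompts.
        results = []
        for item, item_prompts in zip(items, custom):
            with self.applied(item_prompts):
                results.append(await self._score(item))
        return results
```

The per-item path is sequential in this sketch because the prompts live on shared state; running items with different prompts concurrently would let them overwrite each other.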
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `EvalInput \| list[EvalInput]` | Single data item or list of data items to evaluate. | required |
Returns:
| Type | Description |
|---|---|
| `MetricResult \| list[MetricResult]` | Evaluation results with scores namespaced by metric name. |
build_ragas_llm(models, fallback_models=None)
Build a RAGAS LLM wrapper from configured model invokers.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `models` | `BaseLMInvoker \| list[BaseLMInvoker] \| None` | The model invoker(s) to use for evaluation. If a list is provided, the first invoker is used. If None or an empty list is provided, the default LM invoker is used. | required |
| `fallback_models` | `list[BaseLMInvoker] \| None` | Ordered fallback invokers. Defaults to None. | `None` |
Returns:
| Name | Type | Description |
|---|---|---|
| `RagasLLMWrapper` | `RagasLLMWrapper` | The RAGAS-compatible LLM wrapper. |
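The selection rule documented for `models` (single invoker, first element of a list, or the default when None or empty) can be written as a short function. This is a sketch of the rule only: `pick_primary` is a hypothetical name, the default invoker is passed in explicitly because how the library obtains it is not described here, and no wrapper object is constructed.

```python
from typing import Optional, TypeVar, Union

T = TypeVar("T")


def pick_primary(models: Optional[Union[T, list[T]]], default: T) -> T:
    """Hypothetical helper mirroring the documented rule for `models`."""
    if models is None:
        return default  # nothing configured: use the default invoker
    if isinstance(models, list):
        # First invoker wins; an empty list behaves like None.
        return models[0] if models else default
    return models  # a single invoker was passed directly
```

Under this rule, passing several invokers to `build_ragas_llm` does not by itself produce multi-judge behaviour, since only the first one is selected.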