Ragas
Ragas metric integration.
References
[1] https://github.com/explodinggradients/ragas
RAGASMetric(metric, name=None, callbacks=None, timeout=None)
Bases: BaseMetric
RAGAS metric.
Ragas [1] is a framework of metrics for evaluating the quality of retrieval-augmented generation (RAG) systems.
Attributes:

| Name | Type | Description |
|---|---|---|
| `metric` | `SingleTurnMetric` | The Ragas metric to use. |
| `name` | `str` | The name of the metric. |
| `callbacks` | `Callbacks` | The callbacks to use. |
| `timeout` | `int` | The timeout for the metric. |
Available Fields (see the sketch after this list):

- query (str): The query to evaluate. Similar to user_input in SingleTurnSample.
- generated_response (str | list[str], optional): The generated response to evaluate. Similar to response in SingleTurnSample. If a list is given, the responses are concatenated into a single string.
- expected_response (str | list[str], optional): The expected response to evaluate against. Similar to reference in SingleTurnSample. If a list is given, the responses are concatenated into a single string.
- expected_retrieved_context (str | list[str], optional): The expected retrieved context. Similar to reference_contexts in SingleTurnSample. A str is converted into a single-element list.
- retrieved_context (str | list[str], optional): The retrieved context. Similar to retrieved_contexts in SingleTurnSample. A str is converted into a single-element list.
- rubrics (dict[str, str], optional): The rubrics to evaluate against. Similar to rubrics in SingleTurnSample.
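For illustration, a single evaluation item might look like the following. This is a sketch only: the concrete MetricInput shape is assumed here to be a plain dict keyed by the field names above.

```python
# Hypothetical single evaluation item; field names follow the list above.
data = {
    "query": "What is the capital of France?",
    "generated_response": "Paris is the capital of France.",
    "expected_response": "Paris",
    # A plain str here would be converted into a single-element list.
    "retrieved_context": ["Paris is the capital and largest city of France."],
}
```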
Initialize the RAGASMetric.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `metric` | `SingleTurnMetric` | The Ragas metric to use. | *required* |
| `name` | `str` | The name of the metric. Defaults to the name of the wrapped Ragas metric. | `None` |
| `callbacks` | `Callbacks` | The callbacks to use. | `None` |
| `timeout` | `int` | The timeout for the metric. | `None` |
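A minimal construction sketch, assuming a recent Ragas release that exposes the Faithfulness metric class and the LangchainLLMWrapper helper (both present in Ragas 0.2+); the ChatOpenAI model name is illustrative:

```python
from langchain_openai import ChatOpenAI
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import Faithfulness

# RAGASMetric is the class documented on this page (import path omitted).
# `name` is omitted, so it defaults to the wrapped metric's own name
# (here, "faithfulness").
evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini"))
metric = RAGASMetric(metric=Faithfulness(llm=evaluator_llm), timeout=60)
```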
evaluate(data)
async
Evaluate with custom prompt lifecycle support.
Overrides BaseMetric.evaluate() to add custom prompt application and state management. This ensures custom prompts are applied before evaluation and state is restored after.
For batch inputs, items are evaluated in parallel when they all share the same custom prompts; when prompts differ across items, evaluation falls back to per-item processing.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `MetricInput \| list[MetricInput]` | Single data item or list of data items to evaluate. | *required* |

Returns:

| Type | Description |
|---|---|
| `MetricOutput \| list[MetricOutput]` | Evaluation results with scores namespaced by metric name. |
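An end-to-end usage sketch under the same assumptions as the construction example above. Both items use the metric's default prompts, so the batch takes the parallel path described earlier; the printed scores are illustrative.

```python
import asyncio

from langchain_openai import ChatOpenAI
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import Faithfulness

# RAGASMetric is the class documented on this page (import path omitted).

async def main() -> None:
    evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini"))
    metric = RAGASMetric(metric=Faithfulness(llm=evaluator_llm))

    batch = [
        {
            "query": "What is the capital of France?",
            "generated_response": "Paris is the capital of France.",
            "retrieved_context": ["Paris is the capital and largest city of France."],
        },
        {
            "query": "Who wrote Hamlet?",
            "generated_response": "Hamlet was written by William Shakespeare.",
            "retrieved_context": ["Hamlet is a tragedy written by William Shakespeare."],
        },
    ]
    # A list in -> a list of MetricOutput; scores are keyed by the metric
    # name, e.g. {"faithfulness": 1.0} (illustrative values).
    results = await metric.evaluate(batch)
    for result in results:
        print(result)

asyncio.run(main())
```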