Deepeval
DeepEval Metric Integration.
DeepEvalMetric(metric, name, num_judges=DefaultValues.NUM_JUDGES, aggregation_method=DefaultValues.AGGREGATION_METHOD, max_concurrent_judges=None)
Bases: BaseMetric
DeepEval Metric.
A wrapper for DeepEval metrics.
Available Fields
- input (str): The query to evaluate.
- actual_output (str, optional): The generated response to evaluate.
- expected_output (str, optional): The expected (reference) response.
- expected_context (str | list[str], optional): The expected retrieved context. A str is converted into a single-element list.
- retrieved_context (str | list[str], optional): The retrieved contexts. A str is converted into a single-element list.
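
The str-to-list conversion noted for expected_context and retrieved_context can be pictured with a minimal sketch. The helper name `normalize_context` is hypothetical and not part of the documented API; it only illustrates the behavior the field descriptions state.

```python
from typing import List, Optional, Union


def normalize_context(
    value: Optional[Union[str, List[str]]],
) -> Optional[List[str]]:
    """Hypothetical helper: wrap a lone string in a list, copy lists as-is."""
    if value is None:
        # Optional field left unset: nothing to normalize.
        return None
    if isinstance(value, str):
        # A single string becomes a list with exactly one element.
        return [value]
    # Already a list of strings: return a shallow copy.
    return list(value)


single = normalize_context("Paris is the capital of France.")
many = normalize_context(["chunk one", "chunk two"])
```

Under this assumption, passing one context string and passing a one-element list are equivalent inputs to the metric.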
Scoring
- 0.0-1.0 (Continuous), or Boolean, depending on the wrapped DeepEval metric.
Initializes the DeepEvalMetric class.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| metric | BaseMetric | The DeepEval metric to wrap. | required |
| name | str | The name of the metric. | required |
| num_judges | int | The number of judges to use for the metric. Defaults to 1. | NUM_JUDGES |
| aggregation_method | AggregationSelector | The aggregation method to use for the metric. Defaults to DefaultValues.AGGREGATION_METHOD. | AGGREGATION_METHOD |
| max_concurrent_judges | int \| None | The maximum number of concurrent judges to use for the metric. Defaults to None. | None |
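
How num_judges, aggregation_method, and max_concurrent_judges might interact can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: `run_judges` and `fake_judge` are hypothetical names, the mean stands in for whatever DefaultValues.AGGREGATION_METHOD actually selects, and a value of None is assumed to mean "no concurrency cap".

```python
import asyncio
from statistics import mean
from typing import Awaitable, Callable, List, Optional


async def run_judges(
    judge: Callable[[], Awaitable[float]],
    num_judges: int = 1,
    max_concurrent_judges: Optional[int] = None,
    aggregate: Callable[[List[float]], float] = mean,
) -> float:
    """Illustrative only: run `judge` num_judges times and aggregate the scores."""
    if num_judges < 1:
        raise ValueError("num_judges must be at least 1")
    # Assumption: None means every judge may run at the same time.
    limit = max_concurrent_judges or num_judges
    semaphore = asyncio.Semaphore(limit)

    async def guarded() -> float:
        # At most `limit` judge calls are in flight at any moment.
        async with semaphore:
            return await judge()

    scores = await asyncio.gather(*(guarded() for _ in range(num_judges)))
    return aggregate(list(scores))


async def fake_judge() -> float:
    # Stand-in for one evaluation pass; a real judge would call an LLM.
    await asyncio.sleep(0)
    return 0.5


result = asyncio.run(run_judges(fake_judge, num_judges=3, max_concurrent_judges=2))
```

Capping concurrency this way trades wall-clock time for a lower peak request rate against the judge model.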
DeepEvalMetricFactory(name, model, model_credentials, model_config, batch_status_check_interval=DefaultValues.BATCH_STATUS_CHECK_INTERVAL, batch_max_iterations=DefaultValues.BATCH_MAX_ITERATIONS, num_judges=DefaultValues.NUM_JUDGES, aggregation_method=DefaultValues.AGGREGATION_METHOD, max_concurrent_judges=None, **kwargs)
Bases: DeepEvalMetric, ABC
DeepEval Metric Factory.
Abstract base class for creating DeepEval metrics with a shared model invoker.
Available Fields
- (Dynamic): Depends on the specific DeepEval metric being created.
Scoring
- (Dynamic): Depends on the specific DeepEval metric.
Initializes the metric, handling common model invoker creation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The name for the metric. | required |
| model | Union[str, ModelId, BaseLMInvoker] | The model identifier or an existing LM invoker instance. | required |
| model_credentials | Optional[str] | Credentials for the model, required if | required |
| model_config | Optional[Dict[str, Any]] | Configuration for the model. | required |
| batch_status_check_interval | float | Time between batch status checks in seconds. Defaults to 30.0. | BATCH_STATUS_CHECK_INTERVAL |
| batch_max_iterations | int | Maximum number of status check iterations before timeout. Defaults to 120. | BATCH_MAX_ITERATIONS |
| aggregation_method | AggregationSelector | The aggregation method to use for repeated-judge evaluation. Defaults to DefaultValues.AGGREGATION_METHOD. | AGGREGATION_METHOD |
| **kwargs | | Additional arguments for the specific DeepEval metric constructor. | {} |
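
With the documented defaults, batch_status_check_interval (30.0 s) and batch_max_iterations (120) together bound the wait at roughly 30.0 × 120 = 3600 seconds, or about one hour. The loop below is a minimal sketch of that kind of polling. `wait_for_batch` and `is_done` are hypothetical names, and the library's real loop may differ, for example in whether it sleeps after the final check.

```python
import time
from typing import Callable


def wait_for_batch(
    is_done: Callable[[], bool],
    check_interval: float = 30.0,
    max_iterations: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Hypothetical polling loop; returns the number of checks performed."""
    for iteration in range(1, max_iterations + 1):
        if is_done():
            return iteration
        # Not finished yet: wait before the next status check.
        sleep(check_interval)
    raise TimeoutError(
        f"batch not finished after {max_iterations} checks "
        f"(~{check_interval * max_iterations:.0f} s)"
    )


# Usage with a fake status source and a no-op sleep so the example runs instantly.
statuses = iter([False, False, True])
checks = wait_for_batch(lambda: next(statuses), sleep=lambda _seconds: None)
```

Injecting `sleep` keeps the sketch testable without waiting; lowering either parameter shortens the maximum wait proportionally.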