Deepeval

DeepEval Metric Integration.

DeepEvalMetric(metric, name, num_judges=DefaultValues.NUM_JUDGES, aggregation_method=DefaultValues.AGGREGATION_METHOD, max_concurrent_judges=None)

Bases: BaseMetric

DeepEval Metric.

A wrapper for DeepEval metrics.

Available Fields
  • input (str): The query being evaluated.
  • actual_output (str, optional): The generated response to evaluate.
  • expected_output (str, optional): The expected (reference) response.
  • expected_context (str | list[str], optional): The expected retrieved context. A str is converted into a single-element list.
  • retrieved_context (str | list[str], optional): The retrieved contexts. A str is converted into a single-element list.
Scoring
  • 0.0-1.0 (Continuous), or Boolean, depending on the wrapped DeepEval metric.
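
The str-to-list conversion noted for expected_context and retrieved_context can be pictured with a small sketch. The helper name normalize_context is hypothetical and not part of the library; it only illustrates the documented behaviour on an example record that uses the field names listed above.

```python
from typing import Optional, Union


def normalize_context(value: Optional[Union[str, list[str]]]) -> Optional[list[str]]:
    """Hypothetical helper: coerce a context field into a list of strings."""
    if value is None:
        return None  # optional field left unset
    if isinstance(value, str):
        return [value]  # a bare string becomes a single-element list
    return list(value)  # copy the list so the caller's object is untouched


# Example record using the documented field names.
record = {
    "input": "What is the capital of France?",
    "actual_output": "Paris",
    "retrieved_context": "Paris is the capital of France.",
}
record["retrieved_context"] = normalize_context(record["retrieved_context"])
```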

Initializes the DeepEvalMetric class.

Parameters:

  • metric (BaseMetric): The DeepEval metric to wrap. Default: required.
  • name (str): The name of the metric. Default: required.
  • num_judges (int): The number of judges to use for the metric. Defaults to 1. Default: NUM_JUDGES.
  • aggregation_method (AggregationSelector): The aggregation method to use for the metric. Default: AGGREGATION_METHOD.
  • max_concurrent_judges (int | None): The maximum number of concurrent judges to use for the metric. Default: None.
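
One plausible reading of how num_judges, aggregation_method, and max_concurrent_judges interact is sketched below. This is not the library's implementation: run_judges, the judge callable, the choice of the mean as the aggregation, and the treatment of None as "no cap" are all assumptions made for illustration.

```python
import asyncio
from statistics import mean
from typing import Awaitable, Callable, Optional


async def run_judges(
    judge: Callable[[], Awaitable[float]],
    num_judges: int = 1,
    aggregate: Callable[[list[float]], float] = mean,
    max_concurrent_judges: Optional[int] = None,
) -> float:
    """Hypothetical sketch: call `judge` num_judges times and aggregate the scores."""
    if num_judges < 1:
        raise ValueError("num_judges must be at least 1")
    # Assumption: None means "no cap", so every judge may run at once.
    limit = num_judges if max_concurrent_judges is None else max_concurrent_judges
    if limit < 1:
        raise ValueError("max_concurrent_judges must be at least 1")
    semaphore = asyncio.Semaphore(limit)

    async def one_call() -> float:
        async with semaphore:  # at most `limit` judge calls in flight
            return await judge()

    scores = await asyncio.gather(*(one_call() for _ in range(num_judges)))
    return aggregate(list(scores))


# Stand-in judge that yields fixed scores instead of calling a model.
_scores = iter([0.25, 0.5, 0.75])


async def fake_judge() -> float:
    await asyncio.sleep(0)
    return next(_scores)


result = asyncio.run(run_judges(fake_judge, num_judges=3, max_concurrent_judges=2))
```

Passing the aggregation as a plain callable keeps the sketch independent of whatever AggregationSelector actually accepts.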

DeepEvalMetricFactory(name, model, model_credentials, model_config, batch_status_check_interval=DefaultValues.BATCH_STATUS_CHECK_INTERVAL, batch_max_iterations=DefaultValues.BATCH_MAX_ITERATIONS, num_judges=DefaultValues.NUM_JUDGES, aggregation_method=DefaultValues.AGGREGATION_METHOD, max_concurrent_judges=None, **kwargs)

Bases: DeepEvalMetric, ABC

DeepEval Metric Factory.

Abstract base class for creating DeepEval metrics with a shared model invoker.

Available Fields
  • (Dynamic): Depends on the specific DeepEval metric being created.
Scoring
  • (Dynamic): Depends on the specific DeepEval metric.

Initializes the metric, handling common model invoker creation.

Parameters:

  • name (str): The name for the metric. Default: required.
  • model (Union[str, ModelId, BaseLMInvoker]): The model identifier or an existing LM invoker instance. Default: required.
  • model_credentials (Optional[str]): Credentials for the model, required if model is a string. Default: required.
  • model_config (Optional[Dict[str, Any]]): Configuration for the model. Default: required.
  • batch_status_check_interval (float): Time between batch status checks in seconds. Defaults to 30.0. Default: BATCH_STATUS_CHECK_INTERVAL.
  • batch_max_iterations (int): Maximum number of status check iterations before timeout. Defaults to 120. Default: BATCH_MAX_ITERATIONS.
  • aggregation_method (AggregationSelector): The aggregation method to use for repeated-judge evaluation. Default: AGGREGATION_METHOD.
  • **kwargs: Additional arguments for the specific DeepEval metric constructor. Default: {}.
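
To illustrate how batch_status_check_interval and batch_max_iterations could bound the wait for a batch job, here is a minimal sketch that assumes a fixed-interval polling loop. wait_for_batch, is_done, and the injectable sleep argument are hypothetical names; the library's actual loop may differ.

```python
import time
from typing import Callable


def wait_for_batch(
    is_done: Callable[[], bool],
    check_interval: float = 30.0,
    max_iterations: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Hypothetical polling loop: return the number of checks made, or raise on timeout."""
    for iteration in range(1, max_iterations + 1):
        if is_done():
            return iteration
        sleep(check_interval)  # wait before the next status check
    raise TimeoutError(
        f"batch not finished after {max_iterations} checks "
        f"(about {check_interval * max_iterations:.0f} s)"
    )
```

Under this assumption, the documented defaults (30.0 seconds and 120 iterations) cap the wait at roughly 30.0 × 120 = 3600 seconds.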