Deepeval

DeepEval Metric Integration.

DeepEvalMetric(metric, name, models=None, aggregation_method=DefaultValues.AGGREGATION_METHOD, max_concurrent_judges=None)

Bases: BaseMetric

DeepEval Metric.

A wrapper for DeepEval metrics.

Available Fields
  • input (str): The query being evaluated.
  • actual_output (str, optional): The generated response being evaluated.
  • expected_output (str, optional): The expected (reference) response.
  • expected_context (str | list[str], optional): The expected retrieved context. A str is converted into a single-element list.
  • retrieved_context (str | list[str], optional): The retrieved contexts. A str is converted into a single-element list.
Scoring
  • 0.0-1.0 (Continuous) or Boolean, depending on the wrapped DeepEval metric.
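
The str-to-list conversion described for expected_context and retrieved_context can be pictured with a minimal sketch. The helper name normalize_context is hypothetical and is not part of the library; it only illustrates the documented behaviour of wrapping a single string in a one-element list.

```python
from typing import Optional, Union


def normalize_context(value: Optional[Union[str, list[str]]]) -> Optional[list[str]]:
    """Hypothetical helper (not the library's code): coerce a context field to a list."""
    if value is None:
        # Optional fields that were not supplied stay unset.
        return None
    if isinstance(value, str):
        # A single string becomes a list with one element.
        return [value]
    # Lists pass through; copy so the caller's list is not shared.
    return list(value)


single = normalize_context("Paris is the capital of France.")
multiple = normalize_context(["chunk one", "chunk two"])
print(single)
print(multiple)
```

Either form can therefore be passed for the two context fields, and downstream code only has to handle the list case.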

Initializes the DeepEvalMetric class.

Parameters:

  • metric (BaseMetric, required): The DeepEval metric to wrap.
  • name (str, required): The name of the metric.
  • models (BaseLMInvoker | list[BaseLMInvoker] | None, default None): The model invoker(s) to use for multi-judge evaluation.
  • aggregation_method (AggregationSelector, default DefaultValues.AGGREGATION_METHOD): The aggregation method to use for the metric.
  • max_concurrent_judges (int | None, default None): The maximum number of concurrent judges to use for the metric.
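
How models, aggregation_method and max_concurrent_judges might interact can be sketched without the library itself. Everything below is an assumption made for illustration: make_judge stands in for a BaseLMInvoker, run_judges is a hypothetical runner, and the arithmetic mean is only one possible aggregation (the actual default behind DefaultValues.AGGREGATION_METHOD is not stated here).

```python
import asyncio
from statistics import mean
from typing import Awaitable, Callable, Optional

Judge = Callable[[str], Awaitable[float]]


def make_judge(score: float) -> Judge:
    """Hypothetical stand-in for a model invoker: always returns `score`."""
    async def judge(prompt: str) -> float:
        await asyncio.sleep(0)  # yield control, as a real model call would
        return score
    return judge


async def run_judges(
    judges: list[Judge], prompt: str, max_concurrent: Optional[int] = None
) -> list[float]:
    """Run every judge with at most `max_concurrent` calls in flight at once."""
    limit = max_concurrent or len(judges)  # None means no extra cap
    semaphore = asyncio.Semaphore(limit)

    async def guarded(judge: Judge) -> float:
        async with semaphore:
            return await judge(prompt)

    return await asyncio.gather(*(guarded(j) for j in judges))


# The same judge repeated three times, and two distinct judges.
homogeneous = [make_judge(0.75)] * 3
heterogeneous = [make_judge(0.5), make_judge(1.0)]

prompt = "Is the answer faithful to the context?"
homogeneous_scores = asyncio.run(run_judges(homogeneous, prompt, max_concurrent=2))
heterogeneous_scores = asyncio.run(run_judges(heterogeneous, prompt))

# Assumed aggregation: plain mean over the judges' scores.
homogeneous_mean = mean(homogeneous_scores)
heterogeneous_mean = mean(heterogeneous_scores)
print(homogeneous_mean, heterogeneous_mean)
```

The semaphore is what bounds concurrency in this sketch: with three judges and a cap of two, the third call starts only after one of the first two finishes.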

DeepEvalMetricFactory(name, batch_status_check_interval=DefaultValues.BATCH_STATUS_CHECK_INTERVAL, batch_max_iterations=DefaultValues.BATCH_MAX_ITERATIONS, models=None, fallback_models=None, aggregation_method=DefaultValues.AGGREGATION_METHOD, max_concurrent_judges=None, **kwargs)

Bases: DeepEvalMetric, ABC

DeepEval Metric Factory.

Abstract base class for creating DeepEval metrics with a shared model invoker.

Available Fields
  • (Dynamic): Depends on the specific DeepEval metric being created.
Scoring
  • (Dynamic): Depends on the specific DeepEval metric.

Initializes the metric, handling common model invoker creation.

Parameters:

  • name (str, required): The name for the metric.
  • batch_status_check_interval (float, default DefaultValues.BATCH_STATUS_CHECK_INTERVAL): Time between batch status checks, in seconds. Defaults to 30.0.
  • batch_max_iterations (int, default DefaultValues.BATCH_MAX_ITERATIONS): Maximum number of status check iterations before timeout. Defaults to 120.
  • models (BaseLMInvoker | list[BaseLMInvoker] | None, default None): The model invoker(s) to use for multi-judge evaluation.
      • None (default): single-judge mode using the default invoker.
      • [invoker] * N: homogeneous, the same model N times.
      • [invoker_a, invoker_b]: heterogeneous, distinct models.
  • fallback_models (list[BaseLMInvoker] | None, default None): Ordered list of fallback invokers tried in sequence when the primary judge fails.
  • aggregation_method (AggregationSelector, default DefaultValues.AGGREGATION_METHOD): The aggregation method to use for repeated-judge evaluation.
  • max_concurrent_judges (int | None, default None): The maximum number of concurrent judges to use for the metric.
  • **kwargs (default {}): Additional arguments for the specific DeepEval metric constructor.
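
The two batch parameters describe a bounded polling loop: check the batch status, wait batch_status_check_interval seconds, and give up after batch_max_iterations checks. The sketch below is an assumption about that general pattern, not the library's implementation; wait_for_batch, the get_status callback and the "completed" status string are all hypothetical names chosen for illustration.

```python
import time
from typing import Callable


def wait_for_batch(
    get_status: Callable[[], str],
    check_interval: float = 30.0,
    max_iterations: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll `get_status` until it returns "completed"; return the number of checks made.

    Raises TimeoutError once `max_iterations` checks have been made without success.
    """
    for iteration in range(1, max_iterations + 1):
        if get_status() == "completed":
            return iteration
        sleep(check_interval)  # wait before the next status check
    raise TimeoutError(f"batch not completed after {max_iterations} status checks")


# Fake batch that completes on the third check. Sleeping is replaced by a
# recorder so the demo runs instantly and shows how long it would have waited.
statuses = iter(["in_progress", "in_progress", "completed"])
waited: list[float] = []
checks = wait_for_batch(lambda: next(statuses), sleep=waited.append)
print(checks, waited)

# Upper bound on time spent sleeping with the documented defaults, in seconds.
max_wait_seconds = 30.0 * 120
print(max_wait_seconds)
```

Under this reading, the documented defaults of 30.0 seconds and 120 iterations allow roughly one hour of waiting before a timeout, not counting the time the status checks themselves take.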