Skip to content

Types

Types for the evaluator.

This module contains the types for the evaluator.

Authors

Surya Mahadi (made.r.s.mahadi@gdplabs.id)

References

NONE

AgentData

Bases: RAGData

Agent data.

A data for agent evaluation such as AgentEvals, etc. Extends RAGData to support both agent trajectory and generation evaluation.

Attributes:

Name Type Description
agent_trajectory list[dict[str, Any]]

The agent trajectory.

expected_agent_trajectory list[dict[str, Any]] | None

The expected agent trajectory.

query str | None

The query for generation evaluation.

expected_response str | list[str] | None

The expected response for generation evaluation.

generated_response str | list[str] | None

The generated response for generation evaluation.

retrieved_context str | list[str] | None

The retrieved context for generation evaluation.

AttachmentConfig

Bases: BaseModel, ABC

Base configuration for loading attachments.

Attributes:

Name Type Description
type str

The type of attachment source.

validate_config() abstractmethod

Validate configuration (can be overridden if needed).

EvaluationResult

Bases: TypedDict

Structured result from the evaluate function.

This provides a unified return type that includes both evaluation results and experiment tracker URLs/paths.

Attributes:

Name Type Description
run_id str

The run ID for this evaluation.

results list[list[EvaluationOutput]] | None

The evaluation results from all evaluators.

experiment_urls ExperimentUrls

URLs and paths for accessing experiment data.

dataset_name str | None

The name of the dataset that was evaluated.

timestamp str

The timestamp of the evaluation in ISO 8601 format.

num_samples int

The number of samples in the dataset.

metadata dict[str, Any]

The metadata of the evaluation.

ExperimentUrls

Bases: TypedDict

Experiment URLs and paths for different experiment trackers.

This TypedDict provides a unified interface for experiment tracker URLs/paths. Different trackers will populate different fields based on their capabilities.

Attributes:

Name Type Description
run_url str | None

URL to view the experiment results. For Langfuse: session URL. For Simple: local file path to experiment results.

leaderboard_url str | None

URL to view the leaderboard. For Langfuse: dataset run URL. For Simple: local file path to leaderboard CSV.

GoogleDriveAttachmentConfig

Bases: AttachmentConfig

Configuration for loading attachments from Google Drive.

Attributes:

Name Type Description
type Literal[GDRIVE]

Always "gdrive" for this implementation.

client_email str | None

Google service account client email.

private_key str | None

Google service account private key.

folder_id str

Google Drive folder ID.

service_account_file str | None

Path to service account JSON file (alternative to client_email/private_key).

validate_config()

Google Drive-specific validation.

LocalAttachmentConfig

Bases: AttachmentConfig

Configuration for loading attachments from local directory.

Attributes:

Name Type Description
type Literal[LOCAL]

Always "local" for this implementation.

local_directory str

Local directory path.

validate_config()

Local-specific validation.

MetricResult

Bases: BaseModel

Metric Output Pydantic Model.

A structured output for metric results with score and explanation.

Attributes:

Name Type Description
score MetricValue

The evaluation score. Can be continuous (float), discrete (int), or categorical (str).

explanation str | None

A detailed explanation of the evaluation result.

QAData

Bases: TypedDict

QA data.

A data for QA evaluation such as GenerationEvaluator, etc.

Attributes:

Name Type Description
query str | None

The query.

expected_response str | list[str] | None

The expected response.

generated_response str | list[str] | None

The generated response.

RAGData

Bases: QAData

RAG data.

Extends QAData.

Attributes:

Name Type Description
query str | None

The query.

expected_response str | None

The expected response. For multiple responses, use list[str].

generated_response str | None

The generated response. For multiple responses, use list[str].

retrieved_context str | list[str] | None

The retrieved context. If the retrieved context is concatenated from multiple contexts, use str. Otherwise, use list[str].

expected_retrieved_context str | list[str] | None

The expected retrieved context. If the expected retrieved context is concatenated from multiple contexts, use str. Otherwise, use list[str].

rubrics dict[str, str] | None

Evaluation rubric for the sample.

is_refusal bool | None

Whether the sample should be treated as a refusal response.

RetrievalData

Bases: TypedDict

Retrieval data.

A data for retrieval evaluation such as ClassicalRetrievalEvaluator, etc.

Attributes:

Name Type Description
retrieved_chunks dict[str, float]

The retrieved chunks and their scores.

ground_truth_chunk_ids list[str]

The ground truth chunk IDs.

S3AttachmentConfig

Bases: AttachmentConfig

Configuration for loading attachments from S3.

Attributes:

Name Type Description
type Literal[S3]

Always "s3" for this implementation.

s3_bucket str

S3 bucket name.

s3_prefix str | None

S3 prefix (optional).

aws_access_key_id str

AWS access key ID.

aws_secret_access_key str

AWS secret access key.

aws_region str

AWS region.

validate_config()

S3-specific validation (if needed beyond Pydantic's).

create_attachment_config(config_dict)

Factory function to create the appropriate AttachmentConfig.

Parameters:

Name Type Description Default
config_dict dict[str, Any]

Configuration dictionary with 'type' field.

required

Returns:

Type Description
AttachmentConfig

The appropriate AttachmentConfig subclass instance.

Raises:

Type Description
ValidationError

If configuration is invalid.

Example

config = create_attachment_config({ ... "type": "s3", ... "s3_bucket": "my-bucket", ... "aws_access_key_id": "...", ... "aws_secret_access_key": "...", ... "aws_region": "us-east-1" ... }) isinstance(config, S3AttachmentConfig) True

validate_metric_result(parsed_response)

Validate the response.

Parameters:

Name Type Description Default
parsed_response dict

The response to validate.

required

Returns:

Name Type Description
dict dict

The validated response.

Raises:

Type Description
ValueError

If the response is not valid.