Types
This module contains the types used by the evaluator.
AgentData
Bases: RAGData
Agent data.
Data for agent evaluation (for example, with AgentEvals). Extends RAGData to support both agent trajectory evaluation and generation evaluation.
Attributes:
| Name | Type | Description |
|---|---|---|
| `agent_trajectory` | `list[dict[str, Any]]` | The agent trajectory. |
| `expected_agent_trajectory` | `list[dict[str, Any]] \| None` | The expected agent trajectory. |
| `query` | `str \| None` | The query for generation evaluation. |
| `expected_response` | `str \| list[str] \| None` | The expected response for generation evaluation. |
| `generated_response` | `str \| list[str] \| None` | The generated response for generation evaluation. |
| `retrieved_context` | `str \| list[str] \| None` | The retrieved context for generation evaluation. |
AttachmentConfig
Bases: BaseModel, ABC
Base configuration for loading attachments.
Attributes:
| Name | Type | Description |
|---|---|---|
| `type` | `str` | The type of attachment source. |
validate_config() `abstractmethod`
Validate configuration (can be overridden if needed).
EvaluationResult
Bases: TypedDict
Structured result from the evaluate function.
Provides a unified return type that includes both the evaluation results and the experiment tracker URLs or paths.
Attributes:
| Name | Type | Description |
|---|---|---|
| `run_id` | `str` | The run ID for this evaluation. |
| `results` | `list[list[EvaluationOutput]] \| None` | The evaluation results from all evaluators. |
| `experiment_urls` | `ExperimentUrls` | URLs and paths for accessing experiment data. |
| `dataset_name` | `str \| None` | The name of the dataset that was evaluated. |
| `timestamp` | `str` | The timestamp of the evaluation in ISO 8601 format. |
| `num_samples` | `int` | The number of samples in the dataset. |
| `metadata` | `dict[str, Any]` | The metadata of the evaluation. |
ExperimentUrls
Bases: TypedDict
Experiment URLs and paths for different experiment trackers.
This TypedDict provides a unified interface for experiment tracker URLs and paths. Each tracker populates the fields that its capabilities support.
Attributes:
| Name | Type | Description |
|---|---|---|
| `run_url` | `str \| None` | URL to view the experiment results. For Langfuse: session URL. For Simple: local file path to experiment results. |
| `leaderboard_url` | `str \| None` | URL to view the leaderboard. For Langfuse: dataset run URL. For Simple: local file path to leaderboard CSV. |
GoogleDriveAttachmentConfig
Bases: AttachmentConfig
Configuration for loading attachments from Google Drive.
Attributes:
| Name | Type | Description |
|---|---|---|
| `type` | `Literal[GDRIVE]` | Always "gdrive" for this implementation. |
| `client_email` | `str \| None` | Google service account client email. |
| `private_key` | `str \| None` | Google service account private key. |
| `folder_id` | `str` | Google Drive folder ID. |
| `service_account_file` | `str \| None` | Path to service account JSON file (alternative to client_email/private_key). |
validate_config()
Google Drive-specific validation.
LocalAttachmentConfig
Bases: AttachmentConfig
Configuration for loading attachments from a local directory.
Attributes:
| Name | Type | Description |
|---|---|---|
| `type` | `Literal[LOCAL]` | Always "local" for this implementation. |
| `local_directory` | `str` | Local directory path. |
validate_config()
Local-specific validation.
MetricResult
Bases: BaseModel
Metric Output Pydantic Model.
A structured output for metric results with score and explanation.
Attributes:
| Name | Type | Description |
|---|---|---|
| `score` | `MetricValue` | The evaluation score. Can be continuous (float), discrete (int), or categorical (str). |
| `explanation` | `str \| None` | A detailed explanation of the evaluation result. |
QAData
Bases: TypedDict
QA data.
Data for QA evaluation (for example, with GenerationEvaluator).
Attributes:
| Name | Type | Description |
|---|---|---|
| `query` | `str \| None` | The query. |
| `expected_response` | `str \| list[str] \| None` | The expected response. |
| `generated_response` | `str \| list[str] \| None` | The generated response. |
RAGData
Bases: QAData
RAG data.
Extends QAData.
Attributes:
| Name | Type | Description |
|---|---|---|
| `query` | `str \| None` | The query. |
| `expected_response` | `str \| None` | The expected response. For multiple responses, use list[str]. |
| `generated_response` | `str \| None` | The generated response. For multiple responses, use list[str]. |
| `retrieved_context` | `str \| list[str] \| None` | The retrieved context. If the retrieved context is concatenated from multiple contexts, use str. Otherwise, use list[str]. |
| `expected_retrieved_context` | `str \| list[str] \| None` | The expected retrieved context. If the expected retrieved context is concatenated from multiple contexts, use str. Otherwise, use list[str]. |
| `rubrics` | `dict[str, str] \| None` | Evaluation rubric for the sample. |
| `is_refusal` | `bool \| None` | Whether the sample should be treated as a refusal response. |
RetrievalData
Bases: TypedDict
Retrieval data.
Data for retrieval evaluation (for example, with ClassicalRetrievalEvaluator).
Attributes:
| Name | Type | Description |
|---|---|---|
| `retrieved_chunks` | `dict[str, float]` | The retrieved chunks and their scores. |
| `ground_truth_chunk_ids` | `list[str]` | The ground truth chunk IDs. |
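
To show how the two fields relate, here is a minimal sketch that computes precision and recall at k from a `RetrievalData`-shaped dict. It is an illustration only and not necessarily what ClassicalRetrievalEvaluator computes. The `RetrievalData` class below is a local stand-in, `precision_recall_at_k` is a hypothetical helper, and the code assumes the keys of `retrieved_chunks` are chunk IDs and that a higher score means a more relevant chunk.

```python
from typing import TypedDict


class RetrievalData(TypedDict):  # local stand-in for the documented type
    retrieved_chunks: dict[str, float]
    ground_truth_chunk_ids: list[str]


def precision_recall_at_k(data: RetrievalData, k: int) -> tuple[float, float]:
    """Rank chunks by descending score and compare the top k to the ground truth."""
    scores = data["retrieved_chunks"]
    ranked = sorted(scores, key=scores.get, reverse=True)
    top_k = ranked[:k]
    relevant = set(data["ground_truth_chunk_ids"])
    hits = sum(1 for chunk_id in top_k if chunk_id in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


data: RetrievalData = {
    "retrieved_chunks": {"chunk-a": 0.9, "chunk-b": 0.7, "chunk-c": 0.2},
    "ground_truth_chunk_ids": ["chunk-a", "chunk-c"],
}
precision, recall = precision_recall_at_k(data, k=2)
```

With `k=2` the top-ranked chunks are `chunk-a` and `chunk-b`, so one of the two retrieved chunks is relevant and one of the two ground truth chunks is found.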
S3AttachmentConfig
Bases: AttachmentConfig
Configuration for loading attachments from S3.
Attributes:
| Name | Type | Description |
|---|---|---|
| `type` | `Literal[S3]` | Always "s3" for this implementation. |
| `s3_bucket` | `str` | S3 bucket name. |
| `s3_prefix` | `str \| None` | S3 prefix (optional). |
| `aws_access_key_id` | `str` | AWS access key ID. |
| `aws_secret_access_key` | `str` | AWS secret access key. |
| `aws_region` | `str` | AWS region. |
validate_config()
S3-specific validation (if needed beyond Pydantic's).
create_attachment_config(config_dict)
Factory function to create the appropriate AttachmentConfig.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `config_dict` | `dict[str, Any]` | Configuration dictionary with 'type' field. | required |
Returns:
| Type | Description |
|---|---|
| `AttachmentConfig` | The appropriate AttachmentConfig subclass instance. |
Raises:
| Type | Description |
|---|---|
| `ValidationError` | If configuration is invalid. |
Example
```python
>>> config = create_attachment_config({
...     "type": "s3",
...     "s3_bucket": "my-bucket",
...     "aws_access_key_id": "...",
...     "aws_secret_access_key": "...",
...     "aws_region": "us-east-1"
... })
>>> isinstance(config, S3AttachmentConfig)
True
```
validate_metric_result(parsed_response)
Validate the response.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `parsed_response` | `dict` | The response to validate. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `dict` | `dict` | The validated response. |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If the response is not valid. |
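
The reference does not spell out the validation rules, so the following is only a sketch of what a check in this style might look like, based on the `MetricResult` fields (`score` and an optional `explanation`). The function `check_metric_result` is hypothetical, the accepted score types are inferred from the `score` description, and rejecting `bool` scores is an assumption.

```python
from typing import Any


def check_metric_result(parsed_response: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical validator: require a score, allow an optional string explanation."""
    if "score" not in parsed_response:
        raise ValueError("Missing required field: 'score'")
    score = parsed_response["score"]
    # bool is a subclass of int, so it is rejected explicitly (an assumption).
    if isinstance(score, bool) or not isinstance(score, (int, float, str)):
        raise ValueError(f"Unsupported score type: {type(score).__name__}")
    explanation = parsed_response.get("explanation")
    if explanation is not None and not isinstance(explanation, str):
        raise ValueError("'explanation' must be a string or None")
    return {"score": score, "explanation": explanation}


validated = check_metric_result({"score": 0.8, "explanation": "Mostly correct."})
```

Raising `ValueError` for every failure mirrors the documented contract, so callers need only one `except` clause.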