Dataset

Dataset module for loading and managing evaluation datasets.

This module provides dataset classes and utilities for loading evaluation data from various sources including HuggingFace datasets, Langfuse, Google Sheets, CSV files, and JSONL files. It includes a factory pattern for automatic dataset type detection and creation.

Supported dataset types:
  • BaseDataset: Abstract base class for all datasets
  • DictDataset: Dictionary-based dataset implementation
  • HuggingFaceDataset: Load datasets from HuggingFace Hub
  • LangfuseDataset: Load datasets from Langfuse platform
  • SpreadsheetDataset: Load datasets from Google Sheets or CSV files

BaseDataset(version=None, hash=None, name=None, description=None, schema=None, additional_metadata=None, attachments_config=None)

Bases: ABC, Iterable

Base class for all datasets.

Attributes:

Name Type Description
dataset list[MetricInput]

The dataset to evaluate.

version str | None

The version of the dataset.

hash str | None

The hash of the dataset.

name str | None

The name of the dataset.

description str | None

The description of the dataset.

schema type[BaseModel] | None

The schema of the dataset.

additional_metadata dict[str, Any] | None

Additional metadata of the dataset.

Initialize the dataset.

Parameters:

Name Type Description Default
version str | None

The version of the dataset. Defaults to None.

None
hash str | None

The hash of the dataset. Defaults to None.

None
name str | None

The name of the dataset. Defaults to None.

None
description str | None

The description of the dataset. Defaults to None.

None
schema type[BaseModel] | None

The schema of the dataset. Defaults to None.

None
additional_metadata dict[str, Any] | None

Additional metadata of the dataset. Defaults to None.

None
attachments_config AttachmentConfig | dict[str, Any] | None

Configuration for loading attachments. Defaults to None.

None

__getitem__(index)

Get the item at the given index.

Parameters:

Name Type Description Default
index int

The index of the item to get.

required

Returns:

Type Description
MetricInput | list[MetricInput]

MetricInput | list[MetricInput]: The item at the given index or a list of items if the index is a list.

__iter__()

Iterate over the dataset.

Returns:

Type Description
Iterator[MetricInput]

Iterator[MetricInput]: An iterator over the dataset.

__len__()

Get the length of the dataset.

Returns:

Name Type Description
int int

The length of the dataset.

filter(filter_fn)

Filter the dataset.

Parameters:

Name Type Description Default
filter_fn Callable[[MetricInput], bool]

The filter function.

required

load() abstractmethod

Load the dataset.

Returns:

Type Description
list[MetricInput]

list[MetricInput]: The loaded dataset.

Raises:

Type Description
NotImplementedError

If the load method is not implemented.

map(map_fn)

Map the dataset.

Parameters:

Name Type Description Default
map_fn Callable[[MetricInput], MetricInput]

The map function.

required
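
The callable shapes that filter and map expect can be illustrated on a plain list. This is a minimal sketch, not the library's implementation: `Row` is a stand-in for `MetricInput`, the helper names `has_response` and `normalize_query` are hypothetical, and whether the real methods modify the dataset in place or return a new one is not stated above.

```python
from typing import Any

Row = dict[str, Any]  # stand-in for the library's MetricInput

rows: list[Row] = [
    {"query": "What is 2 + 2?", "generated_response": "4"},
    {"query": "Capital of France?", "generated_response": ""},
]


def has_response(row: Row) -> bool:
    """filter_fn shape: one row in, a keep/drop decision out."""
    return bool(row.get("generated_response"))


def normalize_query(row: Row) -> Row:
    """map_fn shape: one row in, one row out (a new dict, input untouched)."""
    return {**row, "query": row["query"].strip().lower()}


# Applied to a plain list; a dataset would be passed the same callables.
kept = [row for row in rows if has_response(row)]
mapped = [normalize_query(row) for row in rows]
```

Returning a new dict from the map function keeps the original rows intact, which makes it easier to compare rows before and after the transformation.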

prepare_row_for_inference(row, **kwargs) async

Prepare a single row for inference.

This method allows dataset-specific preprocessing before inference. For example, LangfuseDataset might create/sync dataset items here.

Parameters:

Name Type Description Default
row dict[str, Any]

The row to prepare.

required
**kwargs Any

Additional arguments (e.g., dataset_item_id for Langfuse).

{}

Returns:

Type Description
dict[str, Any]

dict[str, Any]: The prepared row.
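
The hook pattern described here can be sketched with two stand-in classes. Both class names are hypothetical and neither is a library class; the sketch assumes the default behavior is a pass-through and that a subclass may attach an `item_id` taken from the `dataset_item_id` keyword argument.

```python
import asyncio
from typing import Any


class PassThroughDataset:
    """Hypothetical stand-in: the default hook returns the row unchanged."""

    async def prepare_row_for_inference(self, row: dict[str, Any], **kwargs: Any) -> dict[str, Any]:
        return row


class TaggingDataset(PassThroughDataset):
    """Hypothetical override: copies the row and attaches an item ID if given."""

    async def prepare_row_for_inference(self, row: dict[str, Any], **kwargs: Any) -> dict[str, Any]:
        prepared = dict(row)  # copy so the caller's row is not mutated
        item_id = kwargs.get("dataset_item_id")
        if item_id is not None:
            prepared["item_id"] = item_id
        return prepared


original = {"query": "hi"}
unchanged = asyncio.run(PassThroughDataset().prepare_row_for_inference(original))
prepared = asyncio.run(
    TaggingDataset().prepare_row_for_inference(original, dataset_item_id="item-1")
)
```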

sample(n=3)

Sample n items from the dataset.

Parameters:

Name Type Description Default
n int

The number of items to sample.

3

Returns:

Type Description
list[MetricInput]

list[MetricInput]: The sampled items.

shuffle()

Shuffle the dataset.

to_standard_format()

Convert dataset to standard format for inference.

For most datasets, this is a no-op as they already return standard format. LangfuseDataset overrides this to convert from Langfuse format.

Returns:

Type Description
list[MetricInput]

list[MetricInput]: Dataset in standard format.

validate() abstractmethod

Validate the dataset.

Raises:

Type Description
NotImplementedError

If the validate method is not implemented.
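
The contract documented for BaseDataset (two abstract methods plus container behavior) can be sketched with a stand-in class. `SketchDataset` and `InMemoryDataset` are hypothetical names used only for illustration; the real base class may store rows differently, call `load()` at a different time, and accept more constructor arguments.

```python
import random
from abc import ABC, abstractmethod
from collections.abc import Iterable, Iterator
from typing import Any, Optional

Row = dict[str, Any]  # stand-in for the library's MetricInput


class SketchDataset(ABC, Iterable):
    """Hypothetical stand-in mirroring part of the documented surface."""

    def __init__(self, name: Optional[str] = None) -> None:
        self.name = name
        self.dataset: list[Row] = []

    @abstractmethod
    def load(self) -> list[Row]:
        """Return the rows; subclasses must implement this."""

    @abstractmethod
    def validate(self) -> None:
        """Raise if the rows are malformed; subclasses must implement this."""

    def __len__(self) -> int:
        return len(self.dataset)

    def __iter__(self) -> Iterator[Row]:
        return iter(self.dataset)

    def __getitem__(self, index: int) -> Row:
        return self.dataset[index]

    def sample(self, n: int = 3) -> list[Row]:
        # Assumption: sampling without replacement, capped at the dataset size.
        return random.sample(self.dataset, min(n, len(self.dataset)))


class InMemoryDataset(SketchDataset):
    """Hypothetical concrete subclass backed by a list of dicts."""

    def __init__(self, rows: list[Row], name: Optional[str] = None) -> None:
        super().__init__(name=name)
        self._rows = rows
        self.validate()
        self.dataset = self.load()

    def load(self) -> list[Row]:
        return list(self._rows)

    def validate(self) -> None:
        if not isinstance(self._rows, list) or not all(
            isinstance(row, dict) for row in self._rows
        ):
            raise ValueError("dataset must be a list of dict rows")


ds = InMemoryDataset([{"query": "q1"}, {"query": "q2"}], name="demo")
```

Because `load` and `validate` are abstract, instantiating `SketchDataset` directly raises `TypeError`; only subclasses that implement both can be created.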

DictDataset(dataset, dataset_name=None, attachments_config=None)

Bases: BaseDataset

Dict-Based Dataset.

A BaseDataset subclass that stores the dataset as a list of dictionaries.

Attributes:

Name Type Description
dataset list[dict]

The dataset to evaluate.

Initialize the DictDataset class.

Parameters:

Name Type Description Default
dataset list[MetricInput]

The dataset to use for the evaluation.

required
dataset_name str | None

The name of the dataset.

None
attachments_config AttachmentConfig | dict[str, Any] | None

Configuration for loading attachments. Defaults to None.

None

from_csv(path, dataset_name=None, attachments_config=None, **kwargs) classmethod

Load a dataset from a CSV file.

Parameters:

Name Type Description Default
path str

The path to the CSV file.

required
dataset_name str | None

The name of the dataset. If None, defaults to the filename. Defaults to None.

None
attachments_config AttachmentConfig | dict[str, Any] | None

Configuration for loading attachments. Defaults to None.

None
**kwargs Any

Additional arguments to pass to pandas read_csv.

{}

Returns:

Name Type Description
DictDataset DictDataset

The loaded dataset.
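
What a CSV loader of this kind roughly does can be sketched as follows. The documented method forwards `**kwargs` to pandas `read_csv`; this sketch uses the standard-library `csv` module instead so that it runs without extra dependencies. `rows_from_csv` is a hypothetical helper, and the fallback name (filename without extension) is an assumption.

```python
import csv
import tempfile
from pathlib import Path
from typing import Optional


def rows_from_csv(
    path: str, dataset_name: Optional[str] = None
) -> tuple[str, list[dict[str, str]]]:
    """Hypothetical helper: read a CSV into row dicts and pick a dataset name."""
    csv_path = Path(path)
    with csv_path.open(newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))  # one dict per data row, keyed by header
    # Assumption: the fallback name is the filename without its extension.
    return dataset_name or csv_path.stem, rows


# Write a tiny CSV to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "qa_samples.csv"
    sample_path.write_text(
        "query,expected_response\nWhat is 2 + 2?,4\n", encoding="utf-8"
    )
    name, rows = rows_from_csv(str(sample_path))
```

Note that the `csv` module returns every value as a string, whereas pandas infers column types, so the two approaches can yield different value types for the same file.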

from_jsonl(path, dataset_name=None, attachments_config=None, **kwargs) classmethod

Load a dataset from a JSONL file.

Parameters:

Name Type Description Default
path str

The path to the JSONL file.

required
dataset_name str | None

The name of the dataset. If None, defaults to the filename. Defaults to None.

None
attachments_config AttachmentConfig | dict[str, Any] | None

Configuration for loading attachments. Defaults to None.

None
**kwargs Any

Additional arguments to pass to the constructor (deprecated, use attachments_config instead).

{}

Returns:

Name Type Description
DictDataset DictDataset

The loaded dataset.
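
A JSONL file holds one JSON object per line, so loading it reduces to parsing each line separately. The sketch below shows that idea with a hypothetical helper, `rows_from_jsonl`; skipping blank lines and rejecting non-object lines are assumptions, not documented behavior.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def rows_from_jsonl(path: str) -> list[dict[str, Any]]:
    """Hypothetical helper: parse a JSONL file into a list of row dicts."""
    rows: list[dict[str, Any]] = []
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # Assumption: blank lines are skipped.
            record = json.loads(line)
            if not isinstance(record, dict):
                raise ValueError(f"line {line_number}: expected a JSON object")
            rows.append(record)
    return rows


# Write two records (with a blank line between them) and read them back.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "qa_samples.jsonl"
    sample_path.write_text(
        '{"query": "What is 2 + 2?", "expected_response": "4"}\n'
        "\n"
        '{"query": "Capital of France?", "expected_response": "Paris"}\n',
        encoding="utf-8",
    )
    jsonl_rows = rows_from_jsonl(str(sample_path))
```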

load()

Load the dataset.

Returns:

Type Description
list[MetricInput]

list[MetricInput]: The loaded dataset.

validate()

Validate the dataset.

Raises:

Type Description
ValueError

If the dataset is not a list of MetricInput.

HuggingFaceDataset(dataset, dataset_name=None, attachments_config=None)

Bases: BaseDataset

Hugging Face dataset class for the evaluator.

Attributes:

Name Type Description
dataset list[MetricInput]

The dataset to use for the evaluation.

Initialize the HuggingFaceDataset class.

Parameters:

Name Type Description Default
dataset Dataset

The dataset to use for the evaluation.

required
dataset_name str | None

The name of the dataset. Defaults to None.

None
attachments_config AttachmentConfig | dict[str, Any] | None

Configuration for loading attachments. Defaults to None.

None

from_hub(path_or_name, split, dataset_name=None, attachments_config=None, **kwargs) staticmethod

Create a HuggingFaceDataset from a dataset on the Hugging Face Hub.

Parameters:

Name Type Description Default
path_or_name str

The path or name of the dataset.

required
split str

The split of the dataset.

required
dataset_name str | None

The name of the dataset. If None, defaults to path_or_name. Defaults to None.

None
attachments_config AttachmentConfig | dict[str, Any] | None

Configuration for loading attachments. Defaults to None.

None
**kwargs Any

Additional arguments to pass to the load function.

{}

Returns:

Name Type Description
HuggingFaceDataset HuggingFaceDataset

The created dataset.

from_list(dataset, dataset_name=None, attachments_config=None) staticmethod

Create a HuggingFaceDataset from a list of MetricInput.

Parameters:

Name Type Description Default
dataset list[MetricInput]

The dataset to create.

required
dataset_name str | None

The name of the dataset. Defaults to None.

None
attachments_config AttachmentConfig | dict[str, Any] | None

Configuration for loading attachments. Defaults to None.

None

Returns:

Name Type Description
HuggingFaceDataset HuggingFaceDataset

The created dataset.

load()

Load the dataset.

Returns:

Type Description
list[MetricInput]

list[MetricInput]: The loaded dataset.

validate()

Validate the dataset.

Raises:

Type Description
ValueError

If the dataset is not a list of MetricInput.

LangfuseDataset(dataset, langfuse_client, dataset_name=None, expected_output_key='expected_response', mapping=None, attachments_config=None)

Bases: BaseDataset

Langfuse dataset class for the evaluator.

Attributes:

Name Type Description
dataset list[MetricInput]

The dataset to use for the evaluation.

langfuse_client Langfuse

The Langfuse client instance.

dataset_name str

The name of the dataset in Langfuse.

expected_output_key str | None

The key for expected output. Defaults to "expected_response".

mapping dict[str, Any] | None

Optional mapping for field keys. Defaults to None.

Initialize the LangfuseDataset class.

Parameters:

Name Type Description Default
dataset List[MetricInput]

The dataset to use for the evaluation.

required
langfuse_client Langfuse

The Langfuse client instance.

required
dataset_name Optional[str]

The name of the dataset in Langfuse.

None
expected_output_key str | None

The key for expected output. Defaults to "expected_response".

'expected_response'
mapping dict[str, Any] | None

Optional mapping for field keys. Defaults to None.

None
attachments_config AttachmentConfig | dict[str, Any] | None

Configuration for loading attachments. Defaults to None.

None

convert_to_standard_dataset(expected_output_key=None, mapping=None)

Convert the dataset to the standard format.

Parameters:

Name Type Description Default
expected_output_key str | None

The key for expected output. Defaults to None.

None
mapping dict[str, Any] | None

Optional mapping for field keys. Defaults to None.

None

Returns:

Type Description
List[MetricInput]

List[MetricInput]: The converted dataset.

from_csv(path, langfuse_client, dataset_name=None, dataset_description='', metadata=None, is_append=False, attachments_config=None, **kwargs) staticmethod

Create a LangfuseDataset from a CSV file.

Parameters:

Name Type Description Default
path str

The path to the CSV file.

required
langfuse_client Langfuse

The Langfuse client instance.

required
dataset_name str

The name to register this dataset under in Langfuse. If None, defaults to the CSV filename without extension. Defaults to None.

None
dataset_description str

The description of the dataset. Defaults to an empty string.

''
metadata dict

Optional metadata for the dataset. Defaults to None.

None
is_append bool

If True, append items to an existing dataset. If False, create the dataset only if it does not already exist.

False
attachments_config AttachmentConfig | dict[str, Any] | None

Configuration for loading attachments. Defaults to None.

None
**kwargs Any

Additional arguments to pass to pandas read_csv.

{}

Returns:

Name Type Description
LangfuseDataset LangfuseDataset

The created dataset.

from_dict(dataset, langfuse_client, dataset_name, dataset_description='', mapping=None, metadata=None, is_append=False, attachments_config=None) staticmethod

Create a LangfuseDataset from a list of MetricInput.

Parameters:

Name Type Description Default
dataset List[MetricInput]

The dataset to create.

required
langfuse_client Langfuse

The Langfuse client instance.

required
dataset_name str

The name of the dataset in Langfuse.

required
dataset_description str

The description of the dataset. Defaults to an empty string.

''
mapping dict[str, Any] | None

Optional mapping for field keys. Defaults to None.

None
metadata dict

Optional metadata for the dataset. Defaults to None.

None
is_append bool

If True, append items to an existing dataset. If False, create the dataset only if it does not already exist.

False
attachments_config AttachmentConfig | dict[str, Any] | None

Configuration for loading attachments. Defaults to None.

None

Returns:

Name Type Description
LangfuseDataset LangfuseDataset

The created dataset.

from_gsheets(sheet_id, worksheet_name, client_email, private_key, langfuse_client, dataset_name=None, dataset_description='', mapping=None, metadata=None, is_append=False, attachments_config=None) async staticmethod

Create a LangfuseDataset from Google Sheets.

Parameters:

Name Type Description Default
sheet_id str

The ID of the Google Sheet.

required
worksheet_name str

The name of the worksheet within the Google Sheet.

required
client_email str

The client email for Google Sheets API.

required
private_key str

Base64-encoded private key for Google Sheets API.

required
langfuse_client Langfuse

The Langfuse client instance.

required
dataset_name str

The name of the dataset in Langfuse.

None
dataset_description str

The description of the dataset. Defaults to an empty string.

''
mapping dict[str, Any] | None

Optional mapping for field keys. Defaults to None.

None
metadata dict

Optional metadata for the dataset. Defaults to None.

None
is_append bool

If True, append items to an existing dataset. If False, create the dataset only if it does not already exist.

False
attachments_config AttachmentConfig | dict[str, Any] | None

Configuration for loading attachments. Defaults to None.

None

Returns:

Name Type Description
LangfuseDataset LangfuseDataset

The created dataset.

from_jsonl(path, langfuse_client, dataset_name=None, dataset_description='', metadata=None, is_append=False, attachments_config=None, **kwargs) staticmethod

Create a LangfuseDataset from a JSONL file.

Parameters:

Name Type Description Default
path str

The path to the JSONL file.

required
langfuse_client Langfuse

The Langfuse client instance.

required
dataset_name str

The name of the dataset in Langfuse.

None
dataset_description str

The description of the dataset. Defaults to an empty string.

''
metadata dict

Optional metadata for the dataset. Defaults to None.

None
is_append bool

If True, append items to an existing dataset. If False, create the dataset only if it does not already exist.

False
attachments_config AttachmentConfig | dict[str, Any] | None

Configuration for loading attachments. Defaults to None.

None
**kwargs Any

Additional arguments to pass to the constructor (deprecated, use attachments_config instead).

{}

Returns:

Name Type Description
LangfuseDataset LangfuseDataset

The created dataset.

from_langfuse(langfuse_client, dataset_name, mapping=None, attachments_config=None) staticmethod

Load a dataset from Langfuse.

Parameters:

Name Type Description Default
langfuse_client Langfuse

The Langfuse client instance.

required
dataset_name str

The name of the dataset in Langfuse.

required
mapping dict[str, Any] | None

Optional mapping for field keys. Defaults to None.

None
attachments_config AttachmentConfig | dict[str, Any] | None

Configuration for loading attachments. Defaults to None.

None

Returns:

Name Type Description
LangfuseDataset LangfuseDataset

The loaded dataset.

Raises:

Type Description
ValueError

If the dataset is not found or has no data.

load()

Load the dataset.

Returns:

Type Description
List[MetricInput]

List[MetricInput]: The loaded dataset with proper Langfuse structure.

prepare_row_for_inference(row, dataset_name=None, dataset_item_id=None, metadata=None, **kwargs) async

Prepare row for inference by syncing with Langfuse.

This creates or syncs the dataset item in Langfuse before inference.

Parameters:

Name Type Description Default
row dict[str, Any]

The row to prepare.

required
dataset_name str | None

Name of the Langfuse dataset. Uses self._dataset_name if None.

None
dataset_item_id str | None

Optional dataset item ID.

None
metadata dict[str, Any] | None

Additional metadata.

None
**kwargs Any

Additional arguments.

{}

Returns:

Type Description
dict[str, Any]

dict[str, Any]: The prepared row, potentially with item_id added.

to_standard_format()

Convert Langfuse format dataset to standard format.

Returns:

Type Description
list[MetricInput]

list[MetricInput]: Dataset in standard format.

validate()

Validate the dataset.

Raises:

Type Description
ValueError

If the dataset is not a list of MetricInput or if required fields are missing.

SpreadsheetDataset(dataset, dataset_name=None, attachments_config=None)

Bases: BaseDataset

Spreadsheet dataset class for the evaluator.

Attributes:

Name Type Description
dataset list[MetricInput]

The dataset to use for the evaluation.

Initialize the SpreadsheetDataset class.

Parameters:

Name Type Description Default
dataset Dataset

The dataset to use for the evaluation.

required
dataset_name str | None

The name of the dataset. Defaults to None.

None
attachments_config AttachmentConfig | dict[str, Any] | None

Configuration for loading attachments. Defaults to None.

None

from_gsheets(sheet_id, worksheet_name, client_email, private_key, dataset_name=None, attachments_config=None) async staticmethod

Load the dataset from Google Sheets.

Parameters:

Name Type Description Default
sheet_id str

The ID of the Google Sheet.

required
worksheet_name str

The name of the worksheet within the Google Sheet.

required
client_email str

The client email for Google Sheets API.

required
private_key str

Base64-encoded private key for Google Sheets API.

required
dataset_name str | None

The name of the dataset. If None, defaults to worksheet_name. Defaults to None.

None
attachments_config AttachmentConfig | dict[str, Any] | None

Configuration for loading attachments. Defaults to None.

None

Returns:

Name Type Description
SpreadsheetDataset SpreadsheetDataset

The loaded dataset.
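
Since `private_key` is documented as Base64-encoded, the key text has to be encoded before it is passed in. The following is a minimal sketch that assumes standard Base64 over the UTF-8 bytes of the PEM text is what is expected; the PEM string is a placeholder, not a real key.

```python
import base64

# Placeholder text standing in for a service-account private key (not a real key).
pem_text = "-----BEGIN PRIVATE KEY-----\nplaceholder\n-----END PRIVATE KEY-----\n"

# Encode to a single-line ASCII string that is safe to keep in an environment variable.
encoded_key = base64.b64encode(pem_text.encode("utf-8")).decode("ascii")

# Decoding restores the original PEM text, including its newlines.
decoded_key = base64.b64decode(encoded_key).decode("utf-8")
```

Encoding the key this way avoids problems with literal newlines in the PEM text when the value is stored in configuration files or environment variables.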

load()

Load the dataset.

Returns:

Type Description
list[MetricInput]

list[MetricInput]: The loaded dataset.

validate()

Validate the dataset.

Raises:

Type Description
ValueError

If the dataset is not a list of MetricInput.

get_dataset(dataset, **kwargs)

Detect the dataset type and create an instance.

Supported dataset string formats:
  • hf/ (Hugging Face dataset)
  • langfuse/ (Langfuse dataset) Required parameters: langfuse_client
  • gs/ (Google Sheets dataset) Required parameters: sheet_id, client_email, private_key
  • (JSONL file)
  • (CSV file)

Parameters:

Name Type Description Default
dataset str | BaseDataset

The dataset to detect.

required
**kwargs Any

Additional arguments to pass to the dataset constructor.

{}

Returns:

Name Type Description
BaseDataset BaseDataset

The detected dataset.

str str

The dataset name.

Raises:

Type Description
ValueError

If the dataset is not supported.
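
The detection step can be sketched as a prefix and extension check. `detect_dataset_kind` is a hypothetical helper, not the library's function: it only classifies the string and does not construct a dataset. Matching JSONL and CSV files by file extension is an assumption, as are the returned labels.

```python
from pathlib import Path

# Prefixes taken from the list above; the labels on the right are illustrative.
PREFIX_KINDS = {
    "hf/": "huggingface",
    "langfuse/": "langfuse",
    "gs/": "google_sheets",
}


def detect_dataset_kind(spec: str) -> str:
    """Hypothetical helper: classify a dataset string by prefix or extension."""
    for prefix, kind in PREFIX_KINDS.items():
        if spec.startswith(prefix):
            return kind
    # Assumption: local files are recognized by their extension.
    suffix = Path(spec).suffix.lower()
    if suffix == ".jsonl":
        return "jsonl"
    if suffix == ".csv":
        return "csv"
    raise ValueError(f"Unsupported dataset: {spec!r}")


kinds = [
    detect_dataset_kind(spec)
    for spec in ("hf/squad", "langfuse/my-eval", "gs/Sheet1", "data/eval.JSONL", "qa.csv")
]
```

Checking the prefixes before the extension means a string such as `hf/name.csv` is treated as a Hugging Face dataset in this sketch; the real factory may resolve such cases differently.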

load_simple_agent_dataset()

Load the simple agent dataset from the local CSV file.

The dataset contains agent interaction data with trajectories, questions, and responses, suitable for both agent trajectory evaluation and generation quality evaluation.

Returns:

Name Type Description
DictDataset DictDataset

The loaded simple agent dataset containing MetricInput dictionaries with keys: 'query', 'generated_response', 'expected_response', 'agent_trajectory', 'expected_agent_trajectory'.

Raises:

Type Description
FileNotFoundError

If the CSV file doesn't exist.

ValueError

If the CSV file is empty or malformed.

load_simple_qa_dataset()

Load the simple QA dataset from the local CSV file.

The dataset contains question-answer pairs with generated responses and contexts, suitable for RAG evaluation and testing.

Returns:

Name Type Description
DictDataset DictDataset

The loaded simple QA dataset containing MetricInput dictionaries with keys: 'query', 'generated_response', 'expected_output', 'retrieved_context'.

Raises:

Type Description
FileNotFoundError

If the CSV file doesn't exist.

ValueError

If the CSV file is empty or malformed.