Dataset
Dataset module for loading and managing evaluation datasets.
This module provides dataset classes and utilities for loading evaluation data from various sources including HuggingFace datasets, Langfuse, Google Sheets, CSV files, and JSONL files. It includes a factory pattern for automatic dataset type detection and creation.
Supported dataset types:
- BaseDataset: Abstract base class for all datasets
- DictDataset: Dictionary-based dataset implementation
- HuggingFaceDataset: Load datasets from HuggingFace Hub
- LangfuseDataset: Load datasets from Langfuse platform
- SpreadsheetDataset: Load datasets from Google Sheets or CSV files
BaseDataset(version=None, hash=None, name=None, description=None, schema=None, additional_metadata=None, attachments_config=None)
Bases: ABC, Iterable
Base class for all datasets.
Attributes:
| Name | Type | Description |
|---|---|---|
| `dataset` | `list[MetricInput]` | The dataset to evaluate. |
| `version` | `str \| None` | The version of the dataset. |
| `hash` | `str \| None` | The hash of the dataset. |
| `name` | `str \| None` | The name of the dataset. |
| `description` | `str \| None` | The description of the dataset. |
| `schema` | `type[BaseModel] \| None` | The schema of the dataset. |
| `additional_metadata` | `dict[str, Any] \| None` | Additional metadata of the dataset. |
Initialize the dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `version` | `str \| None` | The version of the dataset. Defaults to None. | `None` |
| `hash` | `str \| None` | The hash of the dataset. Defaults to None. | `None` |
| `name` | `str \| None` | The name of the dataset. Defaults to None. | `None` |
| `description` | `str \| None` | The description of the dataset. Defaults to None. | `None` |
| `schema` | `type[BaseModel] \| None` | The schema of the dataset. Defaults to None. | `None` |
| `additional_metadata` | `dict[str, Any] \| None` | Additional metadata of the dataset. Defaults to None. | `None` |
| `attachments_config` | `AttachmentConfig \| dict[str, Any] \| None` | Configuration for loading attachments. Defaults to None. | `None` |
__getitem__(index)
Get the item at the given index.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `index` | `int` | The index of the item to get. | required |

Returns:

| Type | Description |
|---|---|
| `MetricInput \| list[MetricInput]` | The item at the given index or a list of items if the index is a list. |
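The int-or-list indexing described above could look roughly like the following sketch. `TinyDataset` is a hypothetical stand-in written for illustration, not the library's `BaseDataset`, and rows are assumed to be plain dictionaries in place of `MetricInput`.

```python
from __future__ import annotations

from typing import Any

Row = dict[str, Any]  # stand-in for MetricInput (assumption)


class TinyDataset:
    """Hypothetical container that mimics the documented indexing."""

    def __init__(self, rows: list[Row]) -> None:
        self.dataset = rows

    def __getitem__(self, index: int | list[int]) -> Row | list[Row]:
        # A list of indices returns a list of rows; an int returns one row.
        if isinstance(index, list):
            return [self.dataset[i] for i in index]
        return self.dataset[index]

    def __len__(self) -> int:
        return len(self.dataset)

    def __iter__(self):
        return iter(self.dataset)


rows = [{"query": "a"}, {"query": "b"}, {"query": "c"}]
ds = TinyDataset(rows)
first = ds[0]         # single row
picked = ds[[0, 2]]   # list of rows
```

Defining `__len__` and `__iter__` alongside `__getitem__` lets the container work with `len()` and `for` loops, which matches the three dunder methods documented for the base class.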
__iter__()
Iterate over the dataset.
Returns:
| Type | Description |
|---|---|
| `Iterator[MetricInput]` | An iterator over the dataset. |
__len__()
Get the length of the dataset.
Returns:
| Name | Type | Description |
|---|---|---|
| `int` | `int` | The length of the dataset. |
filter(filter_fn)
Filter the dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `filter_fn` | `Callable[[MetricInput], bool]` | The filter function. | required |
load()
abstractmethod
Load the dataset.
Returns:
| Type | Description |
|---|---|
| `list[MetricInput]` | The loaded dataset. |

Raises:

| Type | Description |
|---|---|
| `NotImplementedError` | If the load method is not implemented. |
map(map_fn)
Map the dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `map_fn` | `Callable[[MetricInput], MetricInput]` | The map function. | required |
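To illustrate the callable shapes that `filter` and `map` accept, here is a minimal sketch. The function names and sample rows are hypothetical, rows are assumed to be plain dictionaries, and the callables are applied with ordinary list comprehensions because this reference does not state whether the methods modify the dataset in place or return a new one.

```python
from typing import Any

Row = dict[str, Any]  # stand-in for MetricInput (assumption)


def has_expected_response(row: Row) -> bool:
    """filter_fn shape: takes one row, returns True to keep it."""
    return bool(row.get("expected_response"))


def strip_query(row: Row) -> Row:
    """map_fn shape: takes one row, returns a row (here a modified copy)."""
    return {**row, "query": row["query"].strip()}


rows: list[Row] = [
    {"query": "  What is 2 + 2? ", "expected_response": "4"},
    {"query": "Unlabelled question", "expected_response": ""},
]

# Plain-Python equivalents of filtering and mapping over the rows.
kept = [row for row in rows if has_expected_response(row)]
cleaned = [strip_query(row) for row in kept]
```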
prepare_row_for_inference(row, **kwargs)
async
Prepare a single row for inference.
This method allows dataset-specific preprocessing before inference. For example, LangfuseDataset might create/sync dataset items here.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `row` | `dict[str, Any]` | The row to prepare. | required |
| `**kwargs` | `Any` | Additional arguments (e.g., dataset_item_id for Langfuse). | `{}` |

Returns:

| Type | Description |
|---|---|
| `dict[str, Any]` | The prepared row. |
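The hook pattern described above, a pass-through by default with dataset-specific overrides, might be sketched as follows. Both classes are hypothetical stand-ins; the override only copies an ID from `kwargs` onto the row and does not talk to Langfuse or any other service.

```python
import asyncio
from typing import Any


class PassThroughDataset:
    """Hypothetical base: the hook returns the row unchanged."""

    async def prepare_row_for_inference(
        self, row: dict[str, Any], **kwargs: Any
    ) -> dict[str, Any]:
        return row


class TaggingDataset(PassThroughDataset):
    """Hypothetical override: copies an ID from kwargs onto the row."""

    async def prepare_row_for_inference(
        self, row: dict[str, Any], **kwargs: Any
    ) -> dict[str, Any]:
        item_id = kwargs.get("dataset_item_id")
        if item_id is None:
            return row
        return {**row, "item_id": item_id}


async def main() -> list[dict[str, Any]]:
    row = {"query": "hello"}
    plain = await PassThroughDataset().prepare_row_for_inference(row)
    tagged = await TaggingDataset().prepare_row_for_inference(
        row, dataset_item_id="item-1"
    )
    return [plain, tagged]


prepared = asyncio.run(main())
```

Because the hook is a coroutine, callers need to `await` it (or drive it with `asyncio.run` as shown) even when a particular dataset does no I/O.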
sample(n=3)
Sample n items from the dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `n` | `int` | The number of items to sample. | `3` |

Returns:

| Type | Description |
|---|---|
| `list[MetricInput]` | The sampled items. |
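Sampling `n` rows can be approximated with the standard library, as in the sketch below. The helper name, the `seed` parameter, and the clamping of `n` to the dataset size are assumptions made for illustration; this reference does not say how the library behaves when `n` exceeds the number of rows.

```python
from __future__ import annotations

import random
from typing import Any


def sample_rows(
    rows: list[dict[str, Any]], n: int = 3, seed: int | None = None
) -> list[dict[str, Any]]:
    """Return up to n rows chosen without replacement."""
    rng = random.Random(seed)
    # random.sample raises ValueError if n exceeds the population size,
    # so clamp n to the number of available rows.
    return rng.sample(rows, min(n, len(rows)))


rows = [{"id": i} for i in range(5)]
picked = sample_rows(rows, seed=0)   # three distinct rows
small = sample_rows(rows[:2])        # only two rows are available
```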
shuffle()
Shuffle the dataset.
to_standard_format()
Convert dataset to standard format for inference.
For most datasets, this is a no-op as they already return standard format. LangfuseDataset overrides this to convert from Langfuse format.
Returns:
| Type | Description |
|---|---|
| `list[MetricInput]` | Dataset in standard format. |
validate()
abstractmethod
Validate the dataset.
Raises:
| Type | Description |
|---|---|
| `NotImplementedError` | If the validate method is not implemented. |
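The contract above (an abstract, iterable base with `load()` and `validate()` left to subclasses) can be sketched with the standard library alone. `SketchBaseDataset` and `InMemoryDataset` are hypothetical names, not the library's classes, and rows are assumed to be plain dictionaries.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from collections.abc import Iterable, Iterator
from typing import Any

Row = dict[str, Any]  # stand-in for MetricInput (assumption)


class SketchBaseDataset(ABC, Iterable):
    """Hypothetical base mirroring the documented 'Bases: ABC, Iterable'."""

    def __init__(self, name: str | None = None) -> None:
        self.name = name
        self.dataset: list[Row] = []

    @abstractmethod
    def load(self) -> list[Row]:
        """Return the rows of the dataset."""

    @abstractmethod
    def validate(self) -> None:
        """Raise ValueError if the data is not usable."""

    def __iter__(self) -> Iterator[Row]:
        return iter(self.dataset)

    def __len__(self) -> int:
        return len(self.dataset)


class InMemoryDataset(SketchBaseDataset):
    """Hypothetical subclass that wraps a list of dictionaries."""

    def __init__(self, rows: Any, name: str | None = None) -> None:
        super().__init__(name=name)
        self._rows = rows

    def validate(self) -> None:
        if not isinstance(self._rows, list) or not all(
            isinstance(row, dict) for row in self._rows
        ):
            raise ValueError("dataset must be a list of dictionaries")

    def load(self) -> list[Row]:
        self.validate()
        self.dataset = list(self._rows)
        return self.dataset


ds = InMemoryDataset([{"query": "q"}], name="demo")
loaded = ds.load()
```

In this sketch the base class cannot be instantiated directly, because Python refuses to construct a class that still has abstract methods.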
DictDataset(dataset, dataset_name=None, attachments_config=None)
Bases: BaseDataset
Dict-Based Dataset.
A BaseDataset subclass that stores the dataset as a list of dictionaries.
Attributes:
| Name | Type | Description |
|---|---|---|
| `dataset` | `list[dict]` | The dataset to evaluate. |
Initialize the DictDataset class.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `list[MetricInput]` | The dataset to use for the evaluation. | required |
| `dataset_name` | `str \| None` | The name of the dataset. | `None` |
| `attachments_config` | `AttachmentConfig \| dict[str, Any] \| None` | Configuration for loading attachments. Defaults to None. | `None` |
from_csv(path, dataset_name=None, attachments_config=None, **kwargs)
classmethod
Load a dataset from a CSV file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str` | The path to the CSV file. | required |
| `dataset_name` | `str \| None` | The name of the dataset. If None, defaults to filename. Defaults to None. | `None` |
| `attachments_config` | `AttachmentConfig \| dict[str, Any] \| None` | Configuration for loading attachments. Defaults to None. | `None` |
| `**kwargs` | `Any` | Additional arguments to pass to pandas read_csv. | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `DictDataset` | `DictDataset` | The loaded dataset. |
from_jsonl(path, dataset_name=None, attachments_config=None, **kwargs)
classmethod
Load a dataset from a JSONL file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str` | The path to the JSONL file. | required |
| `dataset_name` | `str \| None` | The name of the dataset. If None, defaults to filename. Defaults to None. | `None` |
| `attachments_config` | `AttachmentConfig \| dict[str, Any] \| None` | Configuration for loading attachments. Defaults to None. | `None` |
| `**kwargs` | `Any` | Additional arguments to pass to the constructor (deprecated, use attachments_config instead). | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `DictDataset` | `DictDataset` | The loaded dataset. |
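For readers unfamiliar with JSONL, the loading step can be pictured with the sketch below, which parses one JSON object per line using only the standard library. `read_jsonl` is a hypothetical helper, not the library's implementation, and deriving the default name from the file stem (the filename without extension) is an assumption about what "defaults to filename" means.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path
from typing import Any


def read_jsonl(
    path: str, dataset_name: str | None = None
) -> tuple[str, list[dict[str, Any]]]:
    """Parse one JSON object per non-empty line; name defaults to the file stem."""
    file_path = Path(path)
    rows: list[dict[str, Any]] = []
    with file_path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                rows.append(json.loads(line))
    return dataset_name or file_path.stem, rows


# Write a small throwaway file so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "qa_demo.jsonl"
    demo_path.write_text(
        '{"query": "q1", "expected_response": "a1"}\n'
        "\n"
        '{"query": "q2", "expected_response": "a2"}\n',
        encoding="utf-8",
    )
    name, rows = read_jsonl(str(demo_path))
```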
load()
Load the dataset.
Returns:
| Type | Description |
|---|---|
| `list[MetricInput]` | The loaded dataset. |
validate()
Validate the dataset.
Raises:
| Type | Description |
|---|---|
| `ValueError` | If the dataset is not a list of MetricInput. |
HuggingFaceDataset(dataset, dataset_name=None, attachments_config=None)
Bases: BaseDataset
Hugging Face dataset class for the evaluator.
Attributes:
| Name | Type | Description |
|---|---|---|
| `dataset` | `list[MetricInput]` | The dataset to use for the evaluation. |
Initialize the HuggingFaceDataset class.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `Dataset` | The dataset to use for the evaluation. | required |
| `dataset_name` | `str \| None` | The name of the dataset. Defaults to None. | `None` |
| `attachments_config` | `AttachmentConfig \| dict[str, Any] \| None` | Configuration for loading attachments. Defaults to None. | `None` |
from_hub(path_or_name, split, dataset_name=None, attachments_config=None, **kwargs)
staticmethod
Create a HuggingFaceDataset from a Hugging Face dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path_or_name` | `str` | The path or name of the dataset. | required |
| `split` | `str` | The split of the dataset. | required |
| `dataset_name` | `str \| None` | The name of the dataset. If None, defaults to path_or_name. Defaults to None. | `None` |
| `attachments_config` | `AttachmentConfig \| dict[str, Any] \| None` | Configuration for loading attachments. Defaults to None. | `None` |
| `**kwargs` | `Any` | Additional arguments to pass to the load function. | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `HuggingFaceDataset` | `HuggingFaceDataset` | The created dataset. |
from_list(dataset, dataset_name=None, attachments_config=None)
staticmethod
Create a HuggingFaceDataset from a list of MetricInput.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `list[MetricInput]` | The dataset to create. | required |
| `dataset_name` | `str \| None` | The name of the dataset. Defaults to None. | `None` |
| `attachments_config` | `AttachmentConfig \| dict[str, Any] \| None` | Configuration for loading attachments. Defaults to None. | `None` |

Returns:

| Name | Type | Description |
|---|---|---|
| `HuggingFaceDataset` | `HuggingFaceDataset` | The created dataset. |
load()
Load the dataset.
Returns:
| Type | Description |
|---|---|
| `list[MetricInput]` | The loaded dataset. |
validate()
Validate the dataset.
Raises:
| Type | Description |
|---|---|
| `ValueError` | If the dataset is not a list of MetricInput. |
LangfuseDataset(dataset, langfuse_client, dataset_name=None, expected_output_key='expected_response', mapping=None, attachments_config=None)
Bases: BaseDataset
Langfuse dataset class for the evaluator.
Attributes:
| Name | Type | Description |
|---|---|---|
| `dataset` | `list[MetricInput]` | The dataset to use for the evaluation. |
| `langfuse_client` | `Langfuse` | The Langfuse client instance. |
| `dataset_name` | `str` | The name of the dataset in Langfuse. |
| `expected_output_key` | `str \| None` | The key for expected output. Defaults to "expected_response". |
| `mapping` | `dict[str, Any] \| None` | Optional mapping for field keys. Defaults to None. |
Initialize the LangfuseDataset class.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `List[MetricInput]` | The dataset to use for the evaluation. | required |
| `langfuse_client` | `Langfuse` | The Langfuse client instance. | required |
| `dataset_name` | `Optional[str]` | The name of the dataset in Langfuse. | `None` |
| `expected_output_key` | `str \| None` | The key for expected output. Defaults to "expected_response". | `'expected_response'` |
| `mapping` | `dict[str, Any] \| None` | Optional mapping for field keys. Defaults to None. | `None` |
| `attachments_config` | `AttachmentConfig \| dict[str, Any] \| None` | Configuration for loading attachments. Defaults to None. | `None` |
convert_to_standard_dataset(expected_output_key=None, mapping=None)
Convert the dataset to the standard format.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `expected_output_key` | `str \| None` | The key for expected output. Defaults to None. | `None` |
| `mapping` | `dict[str, Any] \| None` | Optional mapping for field keys. Defaults to None. | `None` |

Returns:

| Type | Description |
|---|---|
| `List[MetricInput]` | The converted dataset. |
from_csv(path, langfuse_client, dataset_name=None, dataset_description='', metadata=None, is_append=False, attachments_config=None, **kwargs)
staticmethod
Create a LangfuseDataset from a CSV file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str` | The path to the CSV file. | required |
| `langfuse_client` | `Langfuse` | The Langfuse client instance. | required |
| `dataset_name` | `str` | The name to register this dataset under in Langfuse. If None, defaults to the CSV filename without extension. Defaults to None. | `None` |
| `dataset_description` | `str` | The description of the dataset. Defaults to an empty string. | `''` |
| `metadata` | `dict` | Optional metadata for the dataset. Defaults to None. | `None` |
| `is_append` | `bool` | If True, append items to existing dataset. If False, only create if dataset doesn't exist. | `False` |
| `attachments_config` | `AttachmentConfig \| dict[str, Any] \| None` | Configuration for loading attachments. Defaults to None. | `None` |
| `**kwargs` | `Any` | Additional arguments to pass to pandas read_csv. | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `LangfuseDataset` | `LangfuseDataset` | The created dataset. |
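The `is_append` flag described above can be pictured with an in-memory stand-in for the remote dataset store. This is a sketch under stated assumptions: `upsert_items` and the dictionary store are hypothetical, no Langfuse API is called, and leaving an existing dataset untouched when `is_append` is False is one reading of "only create if dataset doesn't exist".

```python
from typing import Any

Row = dict[str, Any]  # stand-in for MetricInput (assumption)


def upsert_items(
    store: dict[str, list[Row]],
    dataset_name: str,
    rows: list[Row],
    is_append: bool = False,
) -> int:
    """Return how many rows were written to the hypothetical store."""
    if dataset_name not in store:
        store[dataset_name] = list(rows)  # create on first use
        return len(rows)
    if is_append:
        store[dataset_name].extend(rows)  # add to the existing dataset
        return len(rows)
    return 0  # dataset exists and is_append is False: leave it untouched


store: dict[str, list[Row]] = {}
created = upsert_items(store, "demo", [{"query": "q1"}])
skipped = upsert_items(store, "demo", [{"query": "q2"}])
appended = upsert_items(store, "demo", [{"query": "q3"}], is_append=True)
```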
from_dict(dataset, langfuse_client, dataset_name, dataset_description='', mapping=None, metadata=None, is_append=False, attachments_config=None)
staticmethod
Create a LangfuseDataset from a list of MetricInput.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `List[MetricInput]` | The dataset to create. | required |
| `langfuse_client` | `Langfuse` | The Langfuse client instance. | required |
| `dataset_name` | `str` | The name of the dataset in Langfuse. | required |
| `dataset_description` | `str` | The description of the dataset. Defaults to an empty string. | `''` |
| `mapping` | `dict[str, Any] \| None` | Optional mapping for field keys. Defaults to None. | `None` |
| `metadata` | `dict` | Optional metadata for the dataset. Defaults to None. | `None` |
| `is_append` | `bool` | If True, append items to existing dataset. If False, only create if dataset doesn't exist. | `False` |
| `attachments_config` | `AttachmentConfig \| dict[str, Any] \| None` | Configuration for loading attachments. Defaults to None. | `None` |

Returns:

| Name | Type | Description |
|---|---|---|
| `LangfuseDataset` | `LangfuseDataset` | The created dataset. |
from_gsheets(sheet_id, worksheet_name, client_email, private_key, langfuse_client, dataset_name=None, dataset_description='', mapping=None, metadata=None, is_append=False, attachments_config=None)
async
staticmethod
Create a LangfuseDataset from Google Sheets.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `sheet_id` | `str` | The ID of the Google Sheet. | required |
| `worksheet_name` | `str` | The name of the worksheet within the Google Sheet. | required |
| `client_email` | `str` | The client email for Google Sheets API. | required |
| `private_key` | `str` | Base64-encoded private key for Google Sheets API. | required |
| `langfuse_client` | `Langfuse` | The Langfuse client instance. | required |
| `dataset_name` | `str` | The name of the dataset in Langfuse. | `None` |
| `dataset_description` | `str` | The description of the dataset. Defaults to an empty string. | `''` |
| `mapping` | `dict[str, Any] \| None` | Optional mapping for field keys. Defaults to None. | `None` |
| `metadata` | `dict` | Optional metadata for the dataset. Defaults to None. | `None` |
| `is_append` | `bool` | If True, append items to existing dataset. If False, only create if dataset doesn't exist. | `False` |
| `attachments_config` | `AttachmentConfig \| dict[str, Any] \| None` | Configuration for loading attachments. Defaults to None. | `None` |

Returns:

| Name | Type | Description |
|---|---|---|
| `LangfuseDataset` | `LangfuseDataset` | The created dataset. |
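Since `private_key` is documented as Base64-encoded, a caller may need to encode the key text before passing it in. The sketch below shows the round trip with the standard library; it assumes standard Base64 over UTF-8 text, and the key material is a placeholder string rather than a real credential. What the library does with the decoded value is not covered by this reference.

```python
import base64


def encode_private_key(key_text: str) -> str:
    """Encode key text as a Base64 string suitable for passing around."""
    return base64.b64encode(key_text.encode("utf-8")).decode("ascii")


def decode_private_key(private_key_b64: str) -> str:
    """Decode a Base64 string back into the original key text."""
    return base64.b64decode(private_key_b64).decode("utf-8")


# Placeholder text, not a real key.
key_text = "placeholder key material\nsecond line\n"
encoded = encode_private_key(key_text)
decoded = decode_private_key(encoded)
```

Encoding the key this way keeps multi-line key text on a single line, which is convenient for environment variables.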
from_jsonl(path, langfuse_client, dataset_name=None, dataset_description='', metadata=None, is_append=False, attachments_config=None, **kwargs)
staticmethod
Create a LangfuseDataset from a JSONL file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str` | The path to the JSONL file. | required |
| `langfuse_client` | `Langfuse` | The Langfuse client instance. | required |
| `dataset_name` | `str` | The name of the dataset in Langfuse. | `None` |
| `dataset_description` | `str` | The description of the dataset. Defaults to an empty string. | `''` |
| `metadata` | `dict` | Optional metadata for the dataset. Defaults to None. | `None` |
| `is_append` | `bool` | If True, append items to existing dataset. If False, only create if dataset doesn't exist. | `False` |
| `attachments_config` | `AttachmentConfig \| dict[str, Any] \| None` | Configuration for loading attachments. Defaults to None. | `None` |
| `**kwargs` | `Any` | Additional arguments to pass to the constructor (deprecated, use attachments_config instead). | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `LangfuseDataset` | `LangfuseDataset` | The created dataset. |
from_langfuse(langfuse_client, dataset_name, mapping=None, attachments_config=None)
staticmethod
Load a dataset from Langfuse.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `langfuse_client` | `Langfuse` | The Langfuse client instance. | required |
| `dataset_name` | `str` | The name of the dataset in Langfuse. | required |
| `mapping` | `dict[str, Any] \| None` | Optional mapping for field keys. Defaults to None. | `None` |
| `attachments_config` | `AttachmentConfig \| dict[str, Any] \| None` | Configuration for loading attachments. Defaults to None. | `None` |

Returns:

| Name | Type | Description |
|---|---|---|
| `LangfuseDataset` | `LangfuseDataset` | The loaded dataset. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If the dataset is not found or has no data. |
load()
Load the dataset.
Returns:
| Type | Description |
|---|---|
| `List[MetricInput]` | The loaded dataset with proper Langfuse structure. |
prepare_row_for_inference(row, dataset_name=None, dataset_item_id=None, metadata=None, **kwargs)
async
Prepare row for inference by syncing with Langfuse.
This creates or syncs the dataset item in Langfuse before inference.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `row` | `dict[str, Any]` | The row to prepare. | required |
| `dataset_name` | `str \| None` | Name of the Langfuse dataset. Uses self._dataset_name if None. | `None` |
| `dataset_item_id` | `str \| None` | Optional dataset item ID. | `None` |
| `metadata` | `dict[str, Any] \| None` | Additional metadata. | `None` |
| `**kwargs` | `Any` | Additional arguments. | `{}` |

Returns:

| Type | Description |
|---|---|
| `dict[str, Any]` | The prepared row, potentially with item_id added. |
to_standard_format()
Convert Langfuse format dataset to standard format.
Returns:
| Type | Description |
|---|---|
| `list[MetricInput]` | Dataset in standard format. |
validate()
Validate the dataset.
Raises:
| Type | Description |
|---|---|
| `ValueError` | If the dataset is not a list of MetricInput or if required fields are missing. |
SpreadsheetDataset(dataset, dataset_name=None, attachments_config=None)
Bases: BaseDataset
Spreadsheet dataset class for the evaluator.
Attributes:
| Name | Type | Description |
|---|---|---|
| `dataset` | `list[MetricInput]` | The dataset to use for the evaluation. |
Initialize the SpreadsheetDataset class.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `Dataset` | The dataset to use for the evaluation. | required |
| `dataset_name` | `str \| None` | The name of the dataset. Defaults to None. | `None` |
| `attachments_config` | `AttachmentConfig \| dict[str, Any] \| None` | Configuration for loading attachments. Defaults to None. | `None` |
from_gsheets(sheet_id, worksheet_name, client_email, private_key, dataset_name=None, attachments_config=None)
async
staticmethod
Load the dataset from Google Sheets.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
sheet_id |
str
|
The ID of the Google Sheet. |
required |
worksheet_name |
str
|
The name of the worksheet within the Google Sheet. |
required |
client_email |
str
|
The client email for Google Sheets API. |
required |
private_key |
str
|
Base64-encoded private key for Google Sheets API. |
required |
dataset_name |
str | None
|
The name of the dataset. If None, defaults to worksheet_name. Defaults to None. |
None
|
attachments_config |
AttachmentConfig | dict[str, Any] | None
|
Configuration for loading attachments. Defaults to None. |
None
|
Returns:
| Name | Type | Description |
|---|---|---|
SpreadsheetDataset |
SpreadsheetDataset
|
The loaded dataset. |
load()
Load the dataset.
Returns:
| Type | Description |
|---|---|
| `list[MetricInput]` | The loaded dataset. |
validate()
Validate the dataset.
Raises:
| Type | Description |
|---|---|
| `ValueError` | If the dataset is not a list of MetricInput. |
get_dataset(dataset, **kwargs)
Detect the dataset type and create an instance.
Supported dataset string formats:
- hf/ (Hugging Face dataset)
- langfuse/ (Langfuse dataset). Required parameters: langfuse_client
- gs/ (Google Sheets dataset). Required parameters: sheet_id, client_email, private_key
- (JSONL file)
- (CSV file)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `str \| BaseDataset` | The dataset to detect. | required |
| `**kwargs` | `Any` | Additional arguments to pass to the dataset constructor. | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `BaseDataset` | `BaseDataset` | The detected dataset. |
| `str` | `str` | The dataset name. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If the dataset is not supported. |
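The prefix-based detection described above might be sketched as a small dispatcher. `classify_dataset_string` is a hypothetical helper that only classifies the string and builds no dataset; the `hf/`, `langfuse/`, and `gs/` prefixes come from the list above, while detecting JSONL and CSV files by extension is an assumption, and the example names are made up.

```python
def classify_dataset_string(dataset: str) -> tuple[str, str]:
    """Return (kind, remainder) for a dataset string, or raise ValueError."""
    prefixes = {
        "hf/": "huggingface",
        "langfuse/": "langfuse",
        "gs/": "gsheets",
    }
    for prefix, kind in prefixes.items():
        if dataset.startswith(prefix):
            # Strip the prefix; the rest identifies the dataset.
            return kind, dataset[len(prefix):]

    # Assumption: local files are recognised by their extension.
    lowered = dataset.lower()
    if lowered.endswith(".jsonl"):
        return "jsonl", dataset
    if lowered.endswith(".csv"):
        return "csv", dataset

    raise ValueError(f"Unsupported dataset: {dataset!r}")


hf_kind = classify_dataset_string("hf/example-dataset")
file_kind = classify_dataset_string("data/eval.jsonl")
```

Keeping the prefix table in one dictionary makes it straightforward to see which string forms are recognised and to reject everything else with a single `ValueError`.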
load_simple_agent_dataset()
Load the simple agent dataset from the local CSV file.
The dataset contains agent interaction data with trajectories, questions, and responses, suitable for both agent trajectory evaluation and generation quality evaluation.
Returns:
| Name | Type | Description |
|---|---|---|
| `DictDataset` | `DictDataset` | The loaded simple agent dataset containing MetricInput dictionaries with keys: 'query', 'generated_response', 'expected_response', 'agent_trajectory', 'expected_agent_trajectory'. |

Raises:

| Type | Description |
|---|---|
| `FileNotFoundError` | If the CSV file doesn't exist. |
| `ValueError` | If the CSV file is empty or malformed. |
load_simple_qa_dataset()
Load the simple QA dataset from the local CSV file.
The dataset contains question-answer pairs with generated responses and contexts, suitable for RAG evaluation and testing.
Returns:
| Name | Type | Description |
|---|---|---|
| `DictDataset` | `DictDataset` | The loaded simple QA dataset containing MetricInput dictionaries with keys: 'query', 'generated_response', 'expected_output', 'retrieved_context'. |

Raises:

| Type | Description |
|---|---|
| `FileNotFoundError` | If the CSV file doesn't exist. |
| `ValueError` | If the CSV file is empty or malformed. |
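The "empty or malformed" condition above could be checked along the lines of this sketch. `parse_qa_csv` is a hypothetical helper that uses the standard-library `csv` module on in-memory text; the required column names are taken from the key list above, and treating a missing column as "malformed" is an assumption.

```python
import csv
import io

# Column names taken from the documented keys of the simple QA dataset.
QA_KEYS = {"query", "generated_response", "expected_output", "retrieved_context"}


def parse_qa_csv(text: str) -> list[dict[str, str]]:
    """Parse CSV text and raise ValueError if it has no rows or lacks columns."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    if not rows:
        raise ValueError("CSV has no data rows")
    missing = QA_KEYS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"CSV is missing columns: {sorted(missing)}")
    return rows


sample = (
    "query,generated_response,expected_output,retrieved_context\n"
    "q1,g1,e1,c1\n"
)
rows = parse_qa_csv(sample)
```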