Schema
Schema package for gllm_training.
This module re-exports all schema components from various modules for backward compatibility.
DPOColumnMappingConfig
Bases: BaseModel
Defines the column mapping configuration for DPO datasets.
Attributes:
| Name | Type | Description |
|---|---|---|
| `input_columns` | `Dict[str, str]` | Mapping of template variables to DataFrame column names. |
| `chosen` | `Optional[str]` | Column name for the chosen (preferred) response. |
| `rejected` | `Optional[str]` | Column name for the rejected response. |
| `image_columns` | `Optional[Dict[str, str]]` | Mapping of template variables to DataFrame column names containing image paths or URLs. |
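To illustrate how such a mapping might be applied, the sketch below uses plain dictionaries instead of the actual `DPOColumnMappingConfig` class. Only the field names `input_columns`, `chosen`, and `rejected` come from the table above; the prompt template, the row contents, and the helper `build_dpo_example` are hypothetical.

```python
# Hypothetical stand-in for a DPOColumnMappingConfig, expressed as a plain dict.
mapping = {
    "input_columns": {"question": "user_question"},  # template variable -> column name
    "chosen": "good_answer",
    "rejected": "bad_answer",
}

# Hypothetical prompt template and a single dataset row.
template = "Answer the question: {question}"
row = {
    "user_question": "What is LoRA?",
    "good_answer": "A low-rank adaptation method.",
    "bad_answer": "A programming language.",
}


def build_dpo_example(row, mapping, template):
    """Resolve template variables through the mapping and pick chosen/rejected."""
    variables = {var: row[col] for var, col in mapping["input_columns"].items()}
    return {
        "prompt": template.format(**variables),
        "chosen": row[mapping["chosen"]],
        "rejected": row[mapping["rejected"]],
    }


example = build_dpo_example(row, mapping, template)
```

The keys of `input_columns` are the placeholders used in the template, and the values are the column names looked up in each row.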
ExperimentConfig
Bases: BaseModel
Defines the configuration for a single fine-tuning experiment.
Attributes:
| Name | Type | Description |
|---|---|---|
| `experiment_id` | `str` | The ID of the experiment. |
| `hyperparameters` | `Hyperparameters` | The hyperparameters for the experiment. |
| `hyperparameters_id` | `str` | The ID of the hyperparameters. |
| `topic` | `str` | The topic of the experiment. |
| `model_name` | `str` | The name of the model. |
| `framework` | `FinetuningLibraryTypes` | The fine-tuning framework to use. |
| `finetuning_technique` | `FinetuningTechniques` | The fine-tuning technique to use. Options: "sft", "grpo". Defaults to "sft". |
| `multimodal` | `bool` | Whether the model is multimodal. |
| `datasets_path` | `str \| None` | The path to the datasets directory. |
| `train_filename` | `str \| None` | CSV filename for training data. |
| `validation_filename` | `str \| None` | CSV filename for validation data. |
| `prompt_filename` | `str \| None` | CSV filename for prompt templates. |
| `spreadsheet_id` | `str \| None` | The ID of the Google Sheets spreadsheet. |
| `google_sheets_client_email` | `str \| None` | Google Sheets client email. |
| `google_sheets_private_key` | `str \| None` | Google Sheets private key. |
| `google_scopes` | `list[str] \| None` | Google scopes for service account. |
| `google_token_uri` | `str \| None` | Google token URI for service account. |
| `train_sheet` | `str` | The name of the training sheet. |
| `validation_sheet` | `str \| None` | The name of the validation sheet. |
| `prompt_sheet` | `str` | The name of the prompt sheet. |
| `prompt_name` | `str \| None` | The name of the prompt. |
| `dataset_text_field` | `str` | The text field in the dataset. |
| `sft_column_mapping_config` | `dict[str, Any] \| None` | Custom configuration for column mappings. |
| `save_processed_dataset` | `bool` | Whether to save the processed dataset. |
| `output_processed_dir` | `str` | The output directory for the processed dataset. |
| `model_path` | `str \| None` | The path to the fine-tuned adapter or full model. |
FinetunedHyperparameters
Bases: BaseModel
Defines the hyperparameters for a fine-tuning experiment.
Attributes:
| Name | Type | Description |
|---|---|---|
| `hyperparameters_id` | `str` | The ID of the hyperparameters. |
| `model_settings` | `ModelConfig` | The model configuration. |
| `lora_config` | `LoraConfig` | The LoRA configuration. |
| `training_config` | `TrainingConfig` | The training configuration. |
| `storage_config` | `StorageConfig` | The storage configuration. |
FrameworkType
Bases: StrEnum
Defines valid fine-tuning frameworks.
GRPOColumnMappingConfig
Bases: BaseModel
Defines the column mapping configuration for GRPO datasets.
Attributes:
| Name | Type | Description |
|---|---|---|
| `input_columns` | `Dict[str, str]` | Mapping of template variables to DataFrame column names. |
| `label_columns` | `Optional[str]` | Column name for labels. |
| `image_columns` | `Optional[Dict[str, str]]` | Mapping of template variables to DataFrame column names containing image paths or URLs. |
GoogleSheetsAuthentication
Bases: BaseModel
Defines valid authentication parameters for Google Sheets API interaction.
Attributes:
| Name | Type | Description |
|---|---|---|
| `spreadsheet_id` | `str` | The ID of the Google Sheets spreadsheet. |
| `client_email` | `str` | The client email for the Google Sheets API. |
| `private_key` | `str` | The private key for the Google Sheets API. |
| `google_token_uri` | `str` | The Google token URI for the Google Sheets API. |
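As a rough illustration, the fields above resemble the entries of a Google service account key. The sketch below rearranges them into a dictionary with the key names used in that JSON key format (`client_email`, `private_key`, `token_uri`). Whether gllm_training builds its credentials this way is an assumption; the helper `to_service_account_info` and all values are hypothetical placeholders.

```python
# Placeholder values only; real credentials should come from a secret store.
auth = {
    "spreadsheet_id": "example-spreadsheet-id",
    "client_email": "trainer@example-project.iam.gserviceaccount.com",
    "private_key": "-----BEGIN PRIVATE KEY-----\n<redacted>\n-----END PRIVATE KEY-----\n",
    "google_token_uri": "https://oauth2.googleapis.com/token",
}


def to_service_account_info(auth):
    """Rename the documented fields to service-account-style keys (hypothetical helper)."""
    return {
        "type": "service_account",
        "client_email": auth["client_email"],
        "private_key": auth["private_key"],
        "token_uri": auth["google_token_uri"],
    }


info = to_service_account_info(auth)
```

The spreadsheet ID is not part of the credentials; it identifies which spreadsheet to read once authentication has succeeded.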
SFTColumnMappingConfig
Bases: BaseModel
Defines the column mapping configuration for SFT datasets.
Attributes:
| Name | Type | Description |
|---|---|---|
| `input_columns` | `Dict[str, str]` | Mapping of template variables to DataFrame column names. |
| `dataset_column_names` | `str` | Column name that will be created in the final processed dataset. This column contains the formatted prompt text and maps to `dataset_text_field`. |
| `label_columns` | `Optional[str]` | Column name for labels. |
| `image_columns` | `Optional[Dict[str, str]]` | Mapping of template variables to DataFrame column names containing image paths or URLs. |
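The relationship between `input_columns` and `dataset_column_names` can be sketched with plain Python data structures. This is not the library's implementation: the template, the rows, and the helper `format_rows` are hypothetical, and a list of dicts stands in for a DataFrame.

```python
# Hypothetical stand-in for an SFTColumnMappingConfig, expressed as a plain dict.
mapping = {
    "input_columns": {"text": "review"},     # template variable -> source column
    "dataset_column_names": "prompt_text",   # column created in the processed dataset
    "label_columns": "sentiment",
}

template = "Classify the sentiment of this review: {text}"
rows = [
    {"review": "Great battery life.", "sentiment": "positive"},
    {"review": "Stopped working after a week.", "sentiment": "negative"},
]


def format_rows(rows, mapping, template):
    """Add the formatted prompt to each row under the configured output column."""
    out_col = mapping["dataset_column_names"]
    processed = []
    for row in rows:
        variables = {var: row[col] for var, col in mapping["input_columns"].items()}
        new_row = dict(row)  # copy so the input rows are left unchanged
        new_row[out_col] = template.format(**variables)
        processed.append(new_row)
    return processed


processed = format_rows(rows, mapping, template)
```

In this sketch the new `prompt_text` column is the one a trainer would read, which is the role the table assigns to `dataset_text_field`.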
StorageConfig
Bases: BaseModel
Defines the storage configuration for model uploads.
Attributes:
| Name | Type | Description |
|---|---|---|
| `bucket_name` | `str \| None` | The name of the bucket. |
| `upload_to_cloud` | `bool` | Whether to upload the model to the cloud. |
| `object_prefix` | `str` | The object prefix in the bucket. |
| `endpoint_url` | `str \| None` | The complete URL to use for the storage client. |
| `provider` | `str` | The cloud provider. |
| `replace_existing_artifact` | `bool` | When True, upload proceeds even if the remote artifact already exists, relying on the provider's overwrite semantics. When False (default), uploading to an existing remote path raises FileExistsError before the upload is attempted. |
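The behavior described for `replace_existing_artifact` can be sketched as follows. An in-memory dictionary stands in for the remote bucket, and `upload_artifact` is a hypothetical helper, not a function from gllm_training.

```python
def upload_artifact(remote_store, remote_path, data, replace_existing_artifact=False):
    """Upload data, refusing to overwrite unless explicitly allowed."""
    # The existence check runs before any upload is attempted.
    if remote_path in remote_store and not replace_existing_artifact:
        raise FileExistsError(f"Remote artifact already exists: {remote_path}")
    remote_store[remote_path] = data


# A fake bucket that already holds an artifact at the target path.
store = {"models/exp-1/adapter.bin": b"old"}

try:
    upload_artifact(store, "models/exp-1/adapter.bin", b"new")
    blocked = False
except FileExistsError:
    blocked = True  # default behavior: the existing artifact is protected

# Opting in to replacement lets the upload overwrite the existing object.
upload_artifact(store, "models/exp-1/adapter.bin", b"new", replace_existing_artifact=True)
```

Keeping the default at False means an experiment rerun cannot silently replace a previously uploaded model.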
VllmSamplingParamsConfig
Bases: BaseModel
Defines the sampling parameters for vLLM.
Attributes:
| Name | Type | Description |
|---|---|---|
| `min_p` | `float` | The minimum probability for a token to be considered, relative to the probability of the most likely token. |
| `top_p` | `float` | The cumulative probability threshold for top-p (nucleus) sampling. |
| `top_k` | `int` | The number of highest-probability tokens to consider during sampling. |
| `seed` | `int` | The seed for the random number generator. |
| `max_tokens` | `int` | The maximum number of tokens to generate. |
| `stop` | `list[str]` | The stop strings; generation ends when one of them is produced. |
| `include_stop_str_in_output` | `bool` | Whether to include the stop string in the output. |
| `temperature` | `float` | The temperature for sampling. |
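Because `min_p` is easy to confuse with `top_p`, the following sketch shows the idea behind min-p filtering on a toy distribution. It illustrates the concept only and is not vLLM's implementation; the function `min_p_filter` and the token probabilities are hypothetical.

```python
def min_p_filter(probs, min_p):
    """Keep tokens whose probability is at least min_p times the top probability."""
    threshold = min_p * max(probs.values())
    kept = {tok: p for tok, p in probs.items() if p >= threshold}
    total = sum(kept.values())
    # Renormalize so the surviving probabilities sum to 1.
    return {tok: p / total for tok, p in kept.items()}


# Toy next-token distribution (hypothetical values).
probs = {"the": 0.5, "a": 0.3, "an": 0.15, "zebra": 0.05}

# With min_p=0.2 the threshold is 0.2 * 0.5 = 0.1, so "zebra" is dropped.
filtered = min_p_filter(probs, min_p=0.2)
```

Unlike `top_p`, which cuts off at a cumulative probability mass, the threshold here scales with the model's confidence in its most likely token.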