Schema

Schema package for gllm_training.

This module re-exports all schema components from various modules for backward compatibility.

`DPOColumnMappingConfig`

Bases: BaseModel

Defines the column mapping configuration for DPO (preference-pair) datasets.

Attributes:

Name	Type	Description
`input_columns`	`Dict[str, str]`	Mapping of template variables to DataFrame column names.
`chosen`	`Optional[str]`	Name of the column holding the chosen (preferred) response.
`rejected`	`Optional[str]`	Name of the column holding the rejected response.
`image_columns`	`Optional[Dict[str, str]]`	Mapping of template variables to DataFrame column names containing image paths or URLs.
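
To illustrate how these fields relate to one another, the following is a minimal sketch that uses a plain dictionary in place of the model. The column names, the template, and the `build_dpo_record` helper are hypothetical and are not part of gllm_training; the library's own processing may differ.

```python
from typing import Any, Dict

# Hypothetical values; the keys mirror the attributes documented above.
dpo_mapping: Dict[str, Any] = {
    "input_columns": {"question": "user_question"},
    "chosen": "preferred_answer",
    "rejected": "discarded_answer",
    "image_columns": None,
}


def build_dpo_record(row: Dict[str, str], template: str, mapping: Dict[str, Any]) -> Dict[str, str]:
    """Turn one DataFrame-like row into a prompt/chosen/rejected record (illustrative helper)."""
    # Resolve each template variable to the value stored in its mapped column.
    variables = {var: row[col] for var, col in mapping["input_columns"].items()}
    return {
        "prompt": template.format(**variables),
        "chosen": row[mapping["chosen"]],
        "rejected": row[mapping["rejected"]],
    }


row = {
    "user_question": "What is LoRA?",
    "preferred_answer": "A low-rank adapter method.",
    "discarded_answer": "A radio protocol.",
}
record = build_dpo_record(row, "Question: {question}", dpo_mapping)
print(record)
```

The keys of `input_columns` are template variables and the values are column names, so renaming a column in the data only requires changing the mapping, not the template.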

`ExperimentConfig`

Bases: BaseModel

Defines the configuration for a single fine-tuning experiment.

Attributes:

Name	Type	Description
`experiment_id`	`str`	The ID of the experiment.
`hyperparameters`	`Hyperparameters`	The hyperparameters for the experiment.
`hyperparameters_id`	`str`	The ID of the hyperparameters.
`topic`	`str`	The topic of the experiment.
`model_name`	`str`	The name of the model.
`framework`	`FinetuningLibraryTypes`	The fine-tuning framework to use.
`finetuning_technique`	`FinetuningTechniques`	The fine-tuning technique to use. Options: "sft", "grpo". Defaults to "sft".
`multimodal`	`bool`	Whether the model is multimodal.
`datasets_path`	`str | None`	The path to the datasets directory.
`train_filename`	`str | None`	CSV filename for training data.
`validation_filename`	`str | None`	CSV filename for validation data.
`prompt_filename`	`str | None`	CSV filename for prompt templates.
`spreadsheet_id`	`str | None`	The ID of the Google Sheets spreadsheet.
`google_sheets_client_email`	`str | None`	Google Sheets client email.
`google_sheets_private_key`	`str | None`	Google Sheets private key.
`google_scopes`	`list[str] | None`	Google scopes for service account.
`google_token_uri`	`str | None`	Google token URI for service account.
`train_sheet`	`str`	The name of the training sheet.
`validation_sheet`	`str | None`	The name of the validation sheet.
`prompt_sheet`	`str`	The name of the prompt sheet.
`prompt_name`	`str | None`	The name of the prompt.
`dataset_text_field`	`str`	The text field in the dataset.
`sft_column_mapping_config`	`dict[str, Any] | None`	Custom configuration for column mappings.
`save_processed_dataset`	`bool`	Whether to save the processed dataset.
`output_processed_dir`	`str`	The output directory for the processed dataset.
`model_path`	`str | None`	The path to the fine-tuned adapter or full model.
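
The attributes above fall into two groups: CSV-based inputs (`datasets_path`, `train_filename`, ...) and Google Sheets inputs (`spreadsheet_id`, `train_sheet`, ...). The sketch below shows one plausible way a caller could decide which group is in use. The `resolve_data_source` helper and its precedence rule are assumptions made for illustration; this page does not document how gllm_training chooses between the two.

```python
from typing import Any, Dict


def resolve_data_source(config: Dict[str, Any]) -> str:
    """Return which input group a config dict populates (hypothetical helper)."""
    # Assumed rule: a local CSV source needs both a directory and a training file.
    if config.get("datasets_path") and config.get("train_filename"):
        return "csv"
    # Assumed rule: otherwise a spreadsheet ID selects Google Sheets.
    if config.get("spreadsheet_id"):
        return "google_sheets"
    raise ValueError("Neither CSV nor Google Sheets inputs are configured.")


# Hypothetical example values.
csv_config = {"datasets_path": "data/", "train_filename": "train.csv"}
sheets_config = {"spreadsheet_id": "example-sheet-id", "train_sheet": "train"}

print(resolve_data_source(csv_config))
print(resolve_data_source(sheets_config))
```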

`FinetunedHyperparameters`

Bases: BaseModel

Defines the hyperparameters for a fine-tuning experiment.

Attributes:

Name	Type	Description
`hyperparameters_id`	`str`	The ID of the hyperparameters.
`model_settings`	`ModelConfig`	The model configuration.
`lora_config`	`LoraConfig`	The LoRA configuration.
`training_config`	`TrainingConfig`	The training configuration.
`storage_config`	`StorageConfig`	The storage configuration.

`FrameworkType`

Bases: StrEnum

Defines valid fine-tuning frameworks.

`GRPOColumnMappingConfig`

Bases: BaseModel

Defines the column mapping configuration for GRPO datasets.

Attributes:

Name	Type	Description
`input_columns`	`Dict[str, str]`	Mapping of template variables to DataFrame column names.
`label_columns`	`Optional[str]`	Column name for labels.
`image_columns`	`Optional[Dict[str, str]]`	Mapping of template variables to DataFrame column names containing image paths or URLs.

`GoogleSheetsAuthentication`

Bases: BaseModel

Defines valid authentication parameters for Google Sheets API interaction.

Attributes:

Name	Type	Description
`spreadsheet_id`	`str`	The ID of the Google Sheets spreadsheet.
`client_email`	`str`	The client email for the Google Sheets API.
`private_key`	`str`	The private key for the Google Sheets API.
`google_token_uri`	`str`	The Google token URI for the Google Sheets API.

`SFTColumnMappingConfig`

Bases: BaseModel

Defines the column mapping configuration for SFT (supervised fine-tuning) datasets.

Attributes:

Name	Type	Description
`input_columns`	`Dict[str, str]`	Mapping of template variables to DataFrame column names.
`dataset_column_names`	`str`	Name of the column created in the final processed dataset. This column holds the formatted prompt text and corresponds to `dataset_text_field`.
`label_columns`	`Optional[str]`	Column name for labels.
`image_columns`	`Optional[Dict[str, str]]`	Mapping of template variables to DataFrame column names containing image paths or URLs.
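
As a rough illustration of how `input_columns` and `dataset_column_names` fit together, the sketch below renders a prompt template per row and stores the result under the configured output column. The template, the column names, the `build_sft_rows` helper, and the choice to append the label to the prompt are all assumptions, not documented gllm_training behavior.

```python
from typing import Any, Dict, List

# Hypothetical values; the keys mirror the attributes documented above.
sft_mapping: Dict[str, Any] = {
    "input_columns": {"context": "passage", "question": "query"},
    "dataset_column_names": "text",
    "label_columns": "answer",
    "image_columns": None,
}

TEMPLATE = "Context: {context}\nQuestion: {question}\nAnswer: "


def build_sft_rows(rows: List[Dict[str, str]], template: str, mapping: Dict[str, Any]) -> List[Dict[str, str]]:
    """Format each row into a single text column (illustrative helper)."""
    out_col = mapping["dataset_column_names"]
    processed = []
    for row in rows:
        # Resolve each template variable to the value stored in its mapped column.
        variables = {var: row[col] for var, col in mapping["input_columns"].items()}
        text = template.format(**variables)
        # Assumption: the label, when configured, is appended as the completion.
        label_col = mapping.get("label_columns")
        if label_col:
            text += row[label_col]
        processed.append({out_col: text})
    return processed


rows = [{"passage": "Paris is in France.", "query": "Where is Paris?", "answer": "France"}]
processed = build_sft_rows(rows, TEMPLATE, sft_mapping)
print(processed[0]["text"])
```

In this sketch the value of `dataset_column_names` ("text") would be the same name passed as `dataset_text_field` in `ExperimentConfig`.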

`StorageConfig`

Bases: BaseModel

Defines the storage configuration for model uploads.

Attributes:

Name	Type	Description
`bucket_name`	`str | None`	The name of the bucket. `None` (default) means no bucket is configured and cloud uploads will be skipped.
`upload_to_cloud`	`bool`	Whether to upload the model to the cloud.
`object_prefix`	`str`	The object prefix in the bucket.
`endpoint_url`	`str | None`	The complete URL to use for the storage client.
`provider`	`str`	The cloud provider.
`replace_existing_artifact`	`bool`	When True, upload proceeds even if the remote artifact already exists, relying on the provider's overwrite semantics. When False (default), uploading to an existing remote path raises FileExistsError before the upload is attempted.
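
The interaction between `bucket_name`, `upload_to_cloud`, and `replace_existing_artifact` described above can be sketched as follows. The `should_upload` and `guard_upload` helpers and the `exists` callback are hypothetical stand-ins for whatever the storage client provides; only the documented semantics (skip when no bucket is set, raise `FileExistsError` before uploading unless replacement is allowed) are taken from the table.

```python
from typing import Callable, Optional


def should_upload(bucket_name: Optional[str], upload_to_cloud: bool) -> bool:
    """Uploads are skipped when no bucket is configured."""
    return bool(upload_to_cloud and bucket_name is not None)


def guard_upload(
    remote_path: str,
    exists: Callable[[str], bool],
    replace_existing_artifact: bool = False,
) -> None:
    """Raise before any upload is attempted if the artifact exists and must not be replaced."""
    if exists(remote_path) and not replace_existing_artifact:
        raise FileExistsError(f"Remote artifact already exists: {remote_path}")


# A set stands in for the remote bucket listing (hypothetical paths).
existing = {"models/run-1/adapter.safetensors"}

print(should_upload(None, True))  # no bucket configured, so the upload is skipped
guard_upload("models/run-2/adapter.safetensors", existing.__contains__)  # new path, passes
guard_upload("models/run-1/adapter.safetensors", existing.__contains__, replace_existing_artifact=True)  # overwrite allowed
```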

`VllmSamplingParamsConfig`

Bases: BaseModel

Defines the sampling parameters for vLLM.

Attributes:

Name	Type	Description
`min_p`	`float`	The minimum probability a token must have to be considered, relative to the probability of the most likely token.
`top_p`	`float`	The top-p value for sampling.
`top_k`	`int`	The top-k value for sampling.
`seed`	`int`	The seed for the random number generator.
`max_tokens`	`int`	The maximum number of tokens to generate.
`stop`	`list[str]`	The strings that stop generation when they are produced.
`include_stop_str_in_output`	`bool`	Whether to include the stop string in the output.
`temperature`	`float`	The temperature for sampling.
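
To make the difference between `top_k`, `top_p`, and `min_p` concrete, here is a small pure-Python sketch that applies the three filters to a toy token distribution. It is an illustration of the general techniques only: the `filter_tokens` helper is hypothetical, and vLLM itself works on logits over the full vocabulary and may apply the filters in a different order.

```python
from typing import Dict


def filter_tokens(probs: Dict[str, float], top_k: int, top_p: float, min_p: float) -> Dict[str, float]:
    """Apply top-k, then top-p, then min-p filtering and renormalize (illustrative)."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)

    # top-k: keep only the k most likely tokens (values <= 0 disable it in this sketch).
    if top_k > 0:
        ranked = ranked[:top_k]

    # top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # min-p: drop tokens below min_p times the probability of the most likely token.
    threshold = min_p * kept[0][1]
    kept = [(token, p) for token, p in kept if p >= threshold]

    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


probs = {"the": 0.5, "a": 0.3, "an": 0.15, "xyz": 0.05}
filtered = filter_tokens(probs, top_k=3, top_p=0.9, min_p=0.4)
print(filtered)
```

Here `top_k=3` removes "xyz", `top_p=0.9` keeps the remaining three tokens, and `min_p=0.4` sets a threshold of 0.4 × 0.5 = 0.2, which removes "an". Sampling would then choose between "the" and "a" after renormalization.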