Caption
Schema for captioning operations in Gen AI applications.
This module defines the data structures for representing results from captioning operations (image, video, etc.). It provides:

1. Result class for captions
2. Support for multiple caption types
3. Metadata storage
4. Domain knowledge integration
5. External context support through attachments
Caption
Bases: BaseModel
Result class for captioning operations (image, video, etc.).
This class provides a structured format for captioning results, supporting:

- Multiple caption types (one-liner, detailed, domain-specific)
- Caption count tracking
- Metadata storage for processing details
Attributes:
| Name | Type | Description |
|---|---|---|
| text_one_liner | str | Brief, single-sentence summary of the content. Defaults to empty string if not provided. |
| text_context | str | Detailed, multi-sentence description of the content. Defaults to empty string if not provided. |
| domain_knowledge | str | Domain-specific interpretation or context. Defaults to empty string if not provided. |
| number_of_captions | int | Total number of distinct captions generated. Defaults to 0 if no captions are generated. |
| media_metadata | dict[str, Any] | Additional information about the media such as location. |
| multimodal_context | list[Attachment \| str] | Optional list of external context objects (files, bytes, or pre-processed inputs) or raw strings that can enrich captioning results. Bytes are automatically converted into Attachment objects. |
| output_schema | str | Output schema. Defaults to empty string if not provided. |
| schema_description | str | Schema description. Defaults to empty string if not provided. |
| language | str | Language of the captions. Defaults to "Indonesian" if not provided. |
Deprecated
- image_one_liner: Use text_one_liner instead. Will be removed in 0.4.0.
- image_description: Use text_context instead. Will be removed in 0.4.0.
- image_metadata: Use media_metadata instead. Will be removed in 0.4.0.
handle_deprecated_fields(values)
classmethod
Map deprecated field names to their replacements and emit warnings.
Deprecated
- image_one_liner: Use text_one_liner instead. Will be removed in 0.4.0.
- image_description: Use text_context instead. Will be removed in 0.4.0.
- image_metadata: Use media_metadata instead. Will be removed in 0.4.0.
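The renaming described above can be sketched in plain Python. This is only an illustration of the documented mapping, not the library's implementation: `map_deprecated_fields` and `DEPRECATED_FIELDS` are hypothetical names, and the choice to let an explicitly supplied new-style value win over the deprecated one is an assumption.

```python
import warnings

# Mapping taken from the deprecation notes: old name -> replacement.
DEPRECATED_FIELDS = {
    "image_one_liner": "text_one_liner",
    "image_description": "text_context",
    "image_metadata": "media_metadata",
}


def map_deprecated_fields(values: dict) -> dict:
    """Return a copy of `values` with deprecated keys renamed (hypothetical helper)."""
    result = dict(values)
    for old_name, new_name in DEPRECATED_FIELDS.items():
        if old_name not in result:
            continue
        warnings.warn(
            f"{old_name} is deprecated; use {new_name} instead. "
            "It will be removed in 0.4.0.",
            DeprecationWarning,
            stacklevel=2,
        )
        old_value = result.pop(old_name)
        # Assumption: a value already given under the new name takes precedence.
        result.setdefault(new_name, old_value)
    return result
```

Calling `map_deprecated_fields({"image_one_liner": "A cat."})` would emit one `DeprecationWarning` and return a dict keyed by `text_one_liner`.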
handle_multimodal_context(multimodal_value)
classmethod
Normalize and validate multimodal_context.
This method ensures that the multimodal_context field is a list of Attachment objects or strings. It handles multiple input cases:

- None -> returns an empty list
- list[bytes] -> converts each item into an Attachment via Attachment.from_bytes
- list[Attachment] -> keeps as-is
- list[str] -> keeps each string as-is unless it is a valid image/binary source, in which case it is converted to an Attachment
- list[mixed] -> normalizes supported types
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| multimodal_value | Any | Input value to normalize. | required |
Returns:
| Type | Description |
|---|---|
| Any | list[Attachment \| str]: The normalized list. |
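The normalization cases listed for this validator could look roughly like the sketch below. Several parts are assumptions made for illustration: the `Attachment` dataclass is a stand-in for the library's class, `looks_like_media_source` is a hypothetical check (the real detection rule is not documented here), and raising `TypeError` for unsupported input is a guess at the error behavior.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Attachment:
    """Stand-in for the library's Attachment class, for illustration only."""

    data: bytes = b""
    source: str = ""

    @classmethod
    def from_bytes(cls, data: bytes) -> "Attachment":
        return cls(data=data)


def looks_like_media_source(value: str) -> bool:
    """Hypothetical check for strings that point at image/binary content."""
    return value.lower().startswith(("http://", "https://", "data:image/"))


def normalize_multimodal_context(value: Any) -> list:
    """Normalize input into a list of Attachment objects or plain strings."""
    if value is None:
        return []
    if not isinstance(value, list):
        # Assumption: anything other than a list or None is rejected.
        raise TypeError("multimodal_context must be a list or None")
    normalized = []
    for item in value:
        if isinstance(item, Attachment):
            normalized.append(item)  # already in the target form
        elif isinstance(item, (bytes, bytearray)):
            normalized.append(Attachment.from_bytes(bytes(item)))
        elif isinstance(item, str):
            if looks_like_media_source(item):
                normalized.append(Attachment(source=item))
            else:
                normalized.append(item)  # ordinary text stays a string
        else:
            raise TypeError(f"Unsupported item type: {type(item).__name__}")
    return normalized
```

With this sketch, a mixed list such as `[b"...", "a note"]` comes back as one `Attachment` followed by the untouched string.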
handle_none_metadata(metadata_value)
classmethod
Handle None values for media_metadata by using empty dict.
handle_none_number_of_captions(caption_value)
classmethod
Handle None values for number_of_captions by using default.
handle_none_values(str_value)
classmethod
Handle None values by converting them to default values.
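The three None-handling validators above share one pattern: replace None with the field's default and pass every other value through. A minimal sketch of that pattern follows, using hypothetical standalone functions rather than the class's actual validators; the defaults follow the attribute table (empty dict, 0, empty string).

```python
from typing import Optional


def none_to_empty_dict(metadata_value: Optional[dict]) -> dict:
    """Use an empty dict when media_metadata is None."""
    return {} if metadata_value is None else metadata_value


def none_to_zero(caption_value: Optional[int]) -> int:
    """Use 0 when number_of_captions is None."""
    return 0 if caption_value is None else caption_value


def none_to_default_str(str_value: Optional[str], default: str = "") -> str:
    """Use the field's default when a string field is None."""
    return default if str_value is None else str_value
```

For the language field, the same helper would be called with `default="Indonesian"`, matching the documented default.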