Caption
Schema for image captioning operations in Gen AI applications.
This module defines the data structures for representing results from image captioning operations. It provides:
1. Result class for image captions
2. Support for multiple caption types
3. Metadata storage
4. Domain knowledge integration
5. External context support through attachments
Caption
Bases: BaseModel
Result class for image captioning operations.
This class extends ImageToTextResult to provide a structured format for image captioning results, supporting:
- Multiple caption types (one-liner, detailed, domain-specific)
- Caption count tracking
- Metadata storage for processing details
Attributes:
| Name | Type | Description |
|---|---|---|
| `image_one_liner` | `str` | Brief, single-sentence summary of the image. Defaults to empty string if not provided. |
| `image_description` | `str` | Detailed, multi-sentence description of the image. Defaults to empty string if not provided. |
| `domain_knowledge` | `str` | Domain-specific interpretation or context. Defaults to empty string if not provided. |
| `number_of_captions` | `int` | Total number of distinct captions generated. Defaults to 0 if no captions are generated. |
| `image_metadata` | `dict[str, Any]` | Additional information about the image, such as its location. |
| `attachments_context` | `list[Attachment]` | Optional list of external context objects (files, bytes, or pre-processed inputs) that can enrich captioning results. Bytes are automatically converted into Attachment objects. |
| `output_schema` | `str` | Output schema. Defaults to empty string if not provided. |
| `schema_description` | `str` | Schema description. Defaults to empty string if not provided. |
| `language` | `str` | Language of the captions. Defaults to "Indonesian" if not provided. |
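The documented fields and defaults can be illustrated with a small stand-in. This is a minimal sketch using a standard-library dataclass, not the library's actual Pydantic model: `CaptionSketch` is a hypothetical name, and the empty-container defaults for `image_metadata` and `attachments_context` are assumptions inferred from the validators described on this page.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class CaptionSketch:
    """Hypothetical stand-in mirroring the documented Caption fields."""

    image_one_liner: str = ""
    image_description: str = ""
    domain_knowledge: str = ""
    number_of_captions: int = 0
    # Assumed default: an empty dict (the table does not state one).
    image_metadata: dict[str, Any] = field(default_factory=dict)
    # Assumed default: an empty list; real entries would be Attachment objects.
    attachments_context: list[Any] = field(default_factory=list)
    output_schema: str = ""
    schema_description: str = ""
    language: str = "Indonesian"


# Only the provided fields change; everything else falls back to its default.
caption = CaptionSketch(
    image_one_liner="A cat sleeping on a windowsill.",
    number_of_captions=1,
)
```

Fields left unset keep the defaults listed in the table, so `caption.language` stays `"Indonesian"` unless it is overridden.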
handle_none_attachments(attachments_value)
Normalize and validate attachments_context.
This method ensures that the attachments_context field is always a list of Attachment objects. It handles multiple input cases:
- None -> returns an empty list
- list[bytes] -> converts each item into an Attachment via Attachment.from_bytes
- list[Attachment] -> keeps as-is
- list[mixed] -> normalizes supported types, raises error on unsupported types
- any other type -> raises TypeError
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `attachments_value` | `Any` | Input value provided to `attachments_context`. | required |
Returns:
| Type | Description |
|---|---|
| `Any` | list[Attachment]: A normalized list of Attachment objects. |
Raises:
| Type | Description |
|---|---|
| `TypeError` | If an unsupported type is provided (e.g., str, dict). |
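The normalization cases listed for `handle_none_attachments` can be sketched as a plain function. This is an illustration under stated assumptions, not the library's implementation: the `Attachment` class below is a hypothetical stand-in with a one-argument `from_bytes`, `normalize_attachments` is an invented name, and raising `TypeError` for unsupported items inside a list is an assumption (the page only says an error is raised).

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Attachment:
    """Hypothetical stand-in for the library's Attachment class."""

    data: bytes

    @classmethod
    def from_bytes(cls, data: bytes) -> "Attachment":
        # Assumed signature; the real from_bytes may take more arguments.
        return cls(data=data)


def normalize_attachments(attachments_value: Any) -> list[Attachment]:
    """Return a list of Attachment objects for the documented input cases."""
    if attachments_value is None:
        return []
    if not isinstance(attachments_value, list):
        raise TypeError(
            f"Unsupported attachments type: {type(attachments_value).__name__}"
        )
    normalized: list[Attachment] = []
    for item in attachments_value:
        if isinstance(item, Attachment):
            normalized.append(item)  # already the target type, keep as-is
        elif isinstance(item, bytes):
            normalized.append(Attachment.from_bytes(item))
        else:
            # Assumption: unsupported items also raise TypeError.
            raise TypeError(f"Unsupported attachment item: {type(item).__name__}")
    return normalized
```

In a Pydantic model, logic like this would typically run in a field validator before type validation, so callers can pass `None`, raw bytes, or `Attachment` objects interchangeably.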
handle_none_metadata(metadata_value)
classmethod
Handle None values for image_metadata by using an empty dict.
handle_none_number_of_captions(caption_value)
classmethod
Handle None values for number_of_captions by using the default (0).
handle_none_values(str_value)
classmethod
Handle None values by converting them to default values.
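The three None-handling validators follow the same pattern: replace `None` with the field's documented default and pass every other value through unchanged. A minimal sketch of that pattern follows, assuming the defaults from the attributes table; the function names and the `STRING_DEFAULTS` mapping are hypothetical and do not reflect how the library wires its validators.

```python
from typing import Any

# Assumed per-field string defaults, taken from the attributes table.
STRING_DEFAULTS = {
    "image_one_liner": "",
    "image_description": "",
    "domain_knowledge": "",
    "output_schema": "",
    "schema_description": "",
    "language": "Indonesian",
}


def metadata_or_default(metadata_value: Any) -> Any:
    """Mirror handle_none_metadata: None becomes an empty dict."""
    return {} if metadata_value is None else metadata_value


def caption_count_or_default(caption_value: Any) -> Any:
    """Mirror handle_none_number_of_captions: None becomes 0."""
    return 0 if caption_value is None else caption_value


def string_or_default(field_name: str, str_value: Any) -> Any:
    """Mirror handle_none_values: None becomes the field's default string."""
    return STRING_DEFAULTS[field_name] if str_value is None else str_value
```

Checking `is None` rather than truthiness matters here: an explicit empty string or a count of 0 is a valid value and is returned unchanged.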