
Caption

Schema for image captioning operations in Gen AI applications.

This module defines the data structures for representing results from image captioning operations. It provides:

  1. A result class for image captions
  2. Support for multiple caption types
  3. Metadata storage
  4. Domain knowledge integration
  5. External context support through attachments

Caption

Bases: BaseModel

Result class for image captioning operations.

This class provides a structured format for image captioning results, supporting:

  • Multiple caption types (one-liner, detailed, domain-specific)
  • Caption count tracking
  • Metadata storage for processing details

Attributes:

  • image_one_liner (str): Brief, single-sentence summary of the image. Defaults to an empty string if not provided.
  • image_description (str): Detailed, multi-sentence description of the image. Defaults to an empty string if not provided.
  • domain_knowledge (str): Domain-specific interpretation or context. Defaults to an empty string if not provided.
  • number_of_captions (int): Total number of distinct captions generated. Defaults to 0 if no captions are generated.
  • image_metadata (dict[str, Any]): Additional information about the image, such as its location. Defaults to an empty dict if not provided.
  • attachments_context (list[Attachment]): Optional list of external context objects (files, bytes, or pre-processed inputs) that can enrich captioning results. Bytes are automatically converted into Attachment objects via Attachment.from_bytes, and None becomes an empty list.
  • output_schema (str): Output schema. Defaults to an empty string if not provided.
  • schema_description (str): Schema description. Defaults to an empty string if not provided.
  • language (str): Language of the captions. Defaults to "Indonesian" if not provided.
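The attributes and defaults listed above can be sketched as a plain data class. This is a minimal, dependency-free approximation, not the library's implementation: the real Caption is a Pydantic BaseModel, whereas this sketch uses Python's standard dataclasses module, and the name CaptionSketch is hypothetical. Using field(default_factory=...) gives each instance its own dict and list instead of sharing one mutable default.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class CaptionSketch:
    """Hypothetical stand-in for Caption, mirroring the documented defaults."""

    image_one_liner: str = ""
    image_description: str = ""
    domain_knowledge: str = ""
    number_of_captions: int = 0
    # default_factory creates a fresh dict per instance.
    image_metadata: dict[str, Any] = field(default_factory=dict)
    # The real field holds Attachment objects; a plain list keeps this sketch standalone.
    attachments_context: list[Any] = field(default_factory=list)
    output_schema: str = ""
    schema_description: str = ""
    language: str = "Indonesian"


caption = CaptionSketch(image_one_liner="A cat sleeping on a sofa.", number_of_captions=1)
print(caption.language)                 # Indonesian
print(caption.image_description == "")  # True
```

Only the fields you pass are set; everything else falls back to the documented default, including "Indonesian" for language.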

handle_none_attachments(attachments_value)

Normalize and validate attachments_context.

This method ensures that the attachments_context field is always a list of Attachment objects. It handles multiple input cases:

  • None -> returns an empty list
  • list[bytes] -> converts each item into an Attachment via Attachment.from_bytes
  • list[Attachment] -> keeps as-is
  • list[mixed] -> normalizes supported types, raises TypeError on unsupported types
  • any other type -> raises TypeError

Parameters:

  • attachments_value (Any, required): Input value provided to attachments_context.

Returns:

  • list[Attachment] (annotated as Any): A normalized list of Attachment objects.

Raises:

  • TypeError: If an unsupported type is provided (e.g., str, dict).
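A minimal sketch of the normalization rules listed for handle_none_attachments, assuming that only bytes and Attachment items are supported inside the list. Attachment here is a hypothetical stand-in exposing only a from_bytes classmethod, and normalize_attachments is a hypothetical free function; in the real class this logic presumably runs as a Pydantic validator on attachments_context.

```python
from typing import Any


class Attachment:
    """Hypothetical minimal stand-in for the library's Attachment class."""

    def __init__(self, data: bytes) -> None:
        self.data = data

    @classmethod
    def from_bytes(cls, data: bytes) -> "Attachment":
        return cls(data)


def normalize_attachments(attachments_value: Any) -> list[Attachment]:
    """Sketch of the rules described for handle_none_attachments."""
    if attachments_value is None:
        return []  # None -> empty list
    if not isinstance(attachments_value, list):
        # Any non-list value (e.g. str, dict) is rejected.
        raise TypeError(
            f"attachments_context must be a list, got {type(attachments_value).__name__}"
        )
    normalized: list[Attachment] = []
    for item in attachments_value:
        if isinstance(item, Attachment):
            normalized.append(item)  # already an Attachment: keep as-is
        elif isinstance(item, bytes):
            normalized.append(Attachment.from_bytes(item))  # bytes -> Attachment
        else:
            # Assumption: every other item type is unsupported.
            raise TypeError(f"Unsupported attachment type: {type(item).__name__}")
    return normalized


existing = Attachment(b"\x89PNG")
result = normalize_attachments([b"raw image bytes", existing])
print(len(result), result[1] is existing)  # 2 True
```

Mixed lists work because each item is checked individually: existing Attachment objects are passed through untouched, and only raw bytes are wrapped.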

handle_none_metadata(metadata_value) classmethod

Handle None values for image_metadata by replacing them with an empty dict.

handle_none_number_of_captions(caption_value) classmethod

Handle None values for number_of_captions by replacing them with the default value (0).

handle_none_values(str_value) classmethod

Handle None values for string fields by converting them to the field's default value.
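The three None handlers can be approximated with small helper functions. This is a sketch, not the library's code: the defaults are copied from the attribute descriptions, and while the documented handle_none_values takes only str_value, this version passes field_name explicitly as a hypothetical simplification for looking up each string field's default (for example, "Indonesian" for language).

```python
from typing import Any, Optional

# Hypothetical per-field defaults, copied from the attribute descriptions.
STRING_DEFAULTS: dict[str, str] = {
    "image_one_liner": "",
    "image_description": "",
    "domain_knowledge": "",
    "output_schema": "",
    "schema_description": "",
    "language": "Indonesian",
}
DEFAULT_NUMBER_OF_CAPTIONS = 0


def handle_none_values(field_name: str, str_value: Optional[str]) -> str:
    """Replace None with the documented default for the given string field."""
    return STRING_DEFAULTS[field_name] if str_value is None else str_value


def handle_none_metadata(metadata_value: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Replace None with a fresh empty dict; other values pass through unchanged."""
    return {} if metadata_value is None else metadata_value


def handle_none_number_of_captions(caption_value: Optional[int]) -> int:
    """Replace None with the default caption count (0)."""
    return DEFAULT_NUMBER_OF_CAPTIONS if caption_value is None else caption_value


print(handle_none_values("language", None))            # Indonesian
print(handle_none_values("image_one_liner", "A dog"))  # A dog
print(handle_none_metadata(None))                      # {}
print(handle_none_number_of_captions(None))            # 0
```

Each handler only intervenes on None, so explicitly supplied values (including empty strings or 0) are never overwritten.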