Video caption result

Schema for video caption metadata.

Keyframe

Bases: BaseModel

Represents a keyframe extracted from a video segment.

Attributes:

time_offset (float): Time within the segment where the keyframe occurs.

caption (str | None): Text description of this specific keyframe.

Segment

Bases: BaseModel

Represents a video segment with its captions, transcripts, and keyframes.

Attributes:

start_time (float | None): The segment's starting time in seconds.

end_time (float | None): The segment's ending time in seconds.

transcripts (list[AudioTranscript]): Optional list of transcripts for the segment.

segment_caption (list[str]): The single, rich description of the segment's action/plot.

keyframes (list[Keyframe]): Optional list of keyframes extracted from the segment.

ensure_caption()

Ensure the segment has a caption, falling back to its keyframes or transcripts if needed.
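
The fallback idea can be sketched with plain lists. This is an illustration only, not the library's implementation: the function name `fallback_caption`, the order of the fallbacks (keyframe captions first, then transcript text), and the representation of transcripts as plain strings are all assumptions.

```python
def fallback_caption(segment_caption, keyframe_captions, transcript_texts):
    """Return a usable caption list, falling back only when needed.

    Hypothetical helper: the fallback order and the use of plain strings
    for transcripts are assumptions, not documented behaviour.
    """
    # Keep the segment's own caption entries if any are non-empty.
    cleaned = [c.strip() for c in segment_caption if c and c.strip()]
    if cleaned:
        return cleaned

    # Otherwise try keyframe captions (a keyframe caption may be None).
    from_keyframes = [c.strip() for c in keyframe_captions if c and c.strip()]
    if from_keyframes:
        return from_keyframes

    # Last resort: use whatever transcript text is available.
    return [t.strip() for t in transcript_texts if t and t.strip()]
```

With an empty `segment_caption`, a call such as `fallback_caption([], [None, "A plate on a table."], ["hello"])` returns the keyframe caption rather than the transcript text.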

ensure_keyframes()

Ensure every keyframe's time offset is non-negative.

ensure_transcripts()

Ensure every transcript's time offset is non-negative.
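
A non-negativity check of this kind might look like the sketch below. The name `check_non_negative_offsets` is hypothetical, and raising `ValueError` is an assumption: the documentation does not say whether the real validators reject negative offsets or correct them.

```python
def check_non_negative_offsets(offsets, label="keyframe"):
    """Return the offsets unchanged if every one is >= 0.

    Hypothetical helper: raising on a negative offset is an assumption,
    since the documented validators may instead correct the value.
    """
    for index, offset in enumerate(offsets):
        if offset < 0:
            raise ValueError(
                f"{label} {index} has a negative time offset: {offset}"
            )
    return list(offsets)
```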

VideoCaptionMetadata

Bases: BaseModel

Metadata for video captioning results.

Attributes:

video_summary (str): A high-level summary of the entire video's plot, topic, or main events.

segments (list[Segment]): List of video segments with their captions and metadata.
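
As a rough illustration of how the documented fields nest, the dictionary below mirrors the attribute names of VideoCaptionMetadata, Segment, and Keyframe. All values are made up for the example, and `transcripts` is left empty because the fields of AudioTranscript are not described here.

```python
import json

# Illustrative payload only: field names follow the attributes documented
# above, while every value is invented for the example.
payload = {
    "video_summary": "A cooking tutorial that walks through preparing a pasta dish.",
    "segments": [
        {
            "start_time": 0.0,
            "end_time": 12.5,
            "transcripts": [],  # AudioTranscript fields are not documented here
            "segment_caption": ["A chef introduces the ingredients on a counter."],
            "keyframes": [
                {"time_offset": 3.0, "caption": "Tomatoes and basil on a cutting board."},
                {"time_offset": 9.5, "caption": None},  # caption is optional
            ],
        }
    ],
}

print(json.dumps(payload, indent=2))
```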

ensure_segment_end_time_greater_than_start_time()

Ensure each segment's end time is greater than its start time.

If a segment's end time is equal to or lower than its start time, the next segment's start time minus 1 is used as the end time.
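
The repair rule can be sketched on plain `(start, end)` tuples. This is a minimal illustration under stated assumptions, not the library's code: the name `repair_end_times` is hypothetical, and leaving a segment untouched when it is the last one or when a time is `None` is an assumption, since the documentation does not cover those cases.

```python
def repair_end_times(segments):
    """Repair (start, end) pairs, given in playback order, in seconds.

    Hypothetical helper: segments with no following segment, or with
    missing times, are left unchanged (an assumption).
    """
    repaired = []
    for index, (start, end) in enumerate(segments):
        invalid = start is not None and end is not None and end <= start
        has_next = index + 1 < len(segments)
        if invalid and has_next and segments[index + 1][0] is not None:
            # Documented rule: next segment's start time minus 1.
            end = segments[index + 1][0] - 1
        repaired.append((start, end))
    return repaired
```

For example, `repair_end_times([(0.0, 0.0), (10.0, 20.0)])` yields `[(0.0, 9.0), (10.0, 20.0)]`. If the next segment starts less than one second after the current one, the documented rule would still produce an end time that does not exceed the start time; the sketch does not attempt to handle that case.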

VideoCaptionResult

Bases: VideoCaptionMetadata

Backward-compatible model alias for video caption result payload.