INPUT TEXT


(English Caption from OCR)

TOKENIZATION & LANGUAGE TAGGING


(en_XX source language token inserted)
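The tagging step can be sketched as follows. This is a toy illustration only, assuming a layout in which the source language token is placed before the text and an end-of-sentence marker after it. The whitespace split, the `tag_source` helper, and the `</s>` marker are hypothetical stand-ins for the real subword tokenizer, which the diagram does not specify.

```python
def tag_source(text, src_lang="en_XX", eos="</s>"):
    """Toy tagging: [src_lang] + tokens + [eos] (assumed layout)."""
    tokens = text.split()  # stand-in for real subword tokenization
    return [src_lang] + tokens + [eos]


tagged = tag_source("A dog runs on the beach")
print(tagged)
```

The language token gives the encoder an explicit signal about which language it is reading, so one shared model can serve several source languages.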

ENCODER (12 Layers)


Multi-Head Attention + Feed-Forward Network

Residual + LayerNorm applied at every block
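A minimal sketch of the residual and layer normalization step, in plain Python. It assumes the post-norm arrangement `LayerNorm(x + Sublayer(x))`; the diagram does not say whether normalization comes before or after the sublayer. The learned gain and bias are omitted, and `toy_sublayer` is a hypothetical placeholder for attention or the feed-forward network.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and (approximately) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def residual_block(x, sublayer):
    """Assumed post-norm form: LayerNorm(x + Sublayer(x))."""
    y = sublayer(x)
    return layer_norm([a + b for a, b in zip(x, y)])


def toy_sublayer(x):
    # Hypothetical stand-in for attention or the feed-forward network.
    return [0.5 * v for v in x]


out = residual_block([1.0, 2.0, 3.0, 4.0], toy_sublayer)
print(out)
```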

Encoded Multilingual Representation
TARGET LANGUAGE FORCED BOS TOKEN


(ta_IN / te_IN / ml_IN during decoding)
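Forcing the target language token can be sketched with a toy greedy decoder. Everything here is illustrative: `greedy_decode`, `toy_step`, and the placeholder tokens `w1` and `w2` are hypothetical, and a real decoder would score tokens using the encoder output rather than a fixed script.

```python
def greedy_decode(step_fn, forced_bos, eos="</s>", max_len=10):
    """Greedy decoding where the first output token is forced, not predicted."""
    output = [forced_bos]  # e.g. "ta_IN" selects Tamil as the target language
    while len(output) < max_len:
        scores = step_fn(output)  # maps candidate token -> score
        next_token = max(scores, key=scores.get)
        output.append(next_token)
        if next_token == eos:
            break
    return output


def toy_step(prefix):
    # Hypothetical scorer: emits two placeholder tokens, then end-of-sentence.
    script = ["w1", "w2", "</s>"]
    target = script[min(len(prefix) - 1, len(script) - 1)]
    return {tok: (1.0 if tok == target else 0.0) for tok in script}


decoded = greedy_decode(toy_step, "ta_IN")
print(decoded)
```

Swapping `"ta_IN"` for `"te_IN"` or `"ml_IN"` changes only the forced first token, which is how a single decoder can be steered toward different target languages.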

DECODER (12 Layers)


Self-Attention (Masked) + Cross-Attention (Encoder) + Feed-Forward Network

Residual Connections + Layer Normalization
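The masking in decoder self-attention can be illustrated with a small causal mask. This is a sketch under simplifying assumptions: a single attention head, raw scores given directly, and hypothetical helper names `causal_mask` and `masked_softmax`.

```python
import math


def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def masked_softmax(scores, mask_row):
    """Softmax over scores, with disallowed positions set to -inf first."""
    masked = [s if keep else -math.inf for s, keep in zip(scores, mask_row)]
    peak = max(masked)
    exps = [math.exp(s - peak) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]


mask = causal_mask(3)
# Attention weights for position 1: it can see positions 0 and 1, not 2.
attn = masked_softmax([0.2, 1.5, 0.7], mask[1])
print(attn)
```

Because future positions receive zero weight, each output token depends only on tokens already generated. Cross-attention over the encoder output uses no such mask.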

LINEAR + SOFTMAX LAYERS


(Generate Next Token Probability)
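The final projection can be sketched as a linear map followed by a softmax. The sizes and weights below are made up for illustration (a 2-dimensional hidden state and a 3-entry vocabulary); a real model uses learned weights and a far larger vocabulary.

```python
import math


def linear(hidden, weights, bias):
    """One logit per vocabulary entry: dot(row, hidden) + bias."""
    return [sum(h * w for h, w in zip(hidden, row)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


hidden = [1.0, -1.0]                             # toy decoder hidden state
weights = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]   # 3 vocab entries x 2 dims
bias = [0.0, 0.0, 0.0]

probs = softmax(linear(hidden, weights, bias))
print(probs)
```

The resulting probabilities sum to 1, and the decoder picks (or samples) the next token from them.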

TRANSLATED OUTPUT SENTENCE


(Tamil / Telugu / Malayalam Caption)

architecture

by Tanuja
