The NLP Pipeline: Text, Speech, Vision, and Multimodality
Text
Tokenization & Embedding
Syntax, Semantics, Context
Summarization, Translation, QA
Speech
Audio
ASR (Speech to Text)
Transcription
Vision
Image
Vision Encoder
Captioning, Visual QA
Multimodality
Text
Speech
Image
Multimodal Transformer
Dialogue Systems, Multimodal Assistants (e.g., GPT-4V, Gemini)
Language Understanding Core
Encoder
Decoder
Input tokens
Positional Encoding
Encoder
Embeddings
Positional Encoding
Self-Attention
Feed-Forward
Add & Norm
Output tokens
Masked Self-Attention
Encoder-Decoder Attention
Feed-Forward
Add & Norm
Softmax
Task-Specific Model
Feed-Forward
Output tokens
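
The Self-Attention, Masked Self-Attention, and Softmax blocks named above can be illustrated with a minimal sketch. It assumes scaled dot-product attention with identity projections (queries, keys, and values are all the raw token vectors), which is a simplification: a real Transformer layer applies learned weight matrices and multiple heads. The function names `softmax` and `self_attention`, the `causal` flag, and the toy input are all hypothetical and not taken from the diagram.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, causal=False):
    """Scaled dot-product self-attention with identity projections.

    x is a list of token vectors of equal length. With causal=True each
    position attends only to itself and earlier positions, which is the
    masking used on the decoder side (assumption: Q = K = V = x).
    """
    n, d = len(x), len(x[0])
    outputs, weights = [], []
    for i, query in enumerate(x):
        limit = i + 1 if causal else n
        # Similarity of this query to every visible key, scaled by sqrt(d).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
            for key in x[:limit]
        ]
        # Hidden positions receive zero attention weight.
        row = softmax(scores) + [0.0] * (n - limit)
        weights.append(row)
        # Output is the attention-weighted sum of the value vectors.
        outputs.append(
            [sum(row[j] * x[j][c] for j in range(n)) for c in range(d)]
        )
    return outputs, weights


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy embeddings
masked_out, masked_w = self_attention(tokens, causal=True)
full_out, full_w = self_attention(tokens)
```

With the causal mask, the first token can only attend to itself, so its output equals its input; without the mask, every row of weights spreads across all three tokens.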

test1

by Husni
