Livoa
Input Stage
Natural Language Query


Simple chat interface for text-based questions

File Upload


Direct upload of PDF/DOCX documents

Image Input


Drag-and-drop screenshots, photos, diagrams

Audio Input


Microphone recording or audio file upload

Processing Stage
Document Processing


PDF/DOCX text extraction → intelligent chunking
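
The flowchart does not say what the chunking step does beyond its name. As a minimal sketch, assuming it means keeping paragraphs intact while respecting a size limit, the function below greedily packs whole paragraphs into chunks. The name `chunk_text` and the `max_chars` parameter are hypothetical, not taken from the source.

```python
def chunk_text(text: str, max_chars: int = 500) -> list[str]:
    """Greedily pack whole paragraphs into chunks of at most max_chars.

    Assumption: paragraphs are separated by blank lines in the extracted text.
    """
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
            continue
        # The paragraph does not fit: close the current chunk first.
        if current:
            chunks.append(current)
            current = ""
        # A paragraph longer than the budget is cut at hard character boundaries.
        while len(para) > max_chars:
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        current = para
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be embedded separately. A production pipeline might split on sentences or headings instead of raw character counts.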

Image Processing


CLIP vision-language embeddings → metadata extraction

Audio Processing


Whisper offline speech-to-text → transcript generation → text embedding conversion

Unified Storage
Vector Database


FAISS/Qdrant storing embeddings from all modalities in a shared semantic space

Metadata Index


Page numbers, timestamps, file paths, access permissions

Retrieval & Generation
Semantic Search


Vector similarity across all modalities
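
To illustrate what vector similarity across modalities could look like, here is a pure-Python sketch that ranks stored embeddings by cosine similarity to a query embedding. It stands in for the vector database and is not the FAISS or Qdrant API. The `index` layout (ID mapped to vector) and the function names are assumptions made for this example.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(query: list[float], index: dict[str, list[float]], k: int = 3):
    """Return the k (item_id, score) pairs most similar to the query, best first."""
    scored = [(item_id, cosine(query, vec)) for item_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy index: because all modalities share one embedding space,
# a text query can match a document chunk, an image, or a transcript.
index = {
    "doc:1": [1.0, 0.0],
    "img:7": [0.0, 1.0],
    "audio:3": [1.0, 1.0],
}
top = search([1.0, 0.0], index, k=2)
```

A real deployment would delegate this ranking to the vector database, which uses approximate nearest-neighbor indexes rather than a full scan.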

Cross-Modal Fusion


Merge and rank results from text, images, and audio
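
The flowchart does not name a fusion method. One common option, used here purely as an illustration, is reciprocal rank fusion (RRF): each item scores 1 / (k + rank) in every per-modality ranking where it appears, and the scores are summed. The function name, the input layout, and the constant k = 60 are assumptions.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists: dict[str, list[str]], k: int = 60) -> list[str]:
    """Merge per-modality rankings into one list, best first.

    ranked_lists maps a modality name to item IDs ordered best-first.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists.values():
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Sort by descending score; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))

fused = reciprocal_rank_fusion({
    "text": ["a", "b"],
    "image": ["b", "c"],
    "audio": ["c", "b"],
})
```

RRF only needs ranks, not raw scores, which helps when similarity scores from different modalities are not directly comparable.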

Context Assembly


Build a diverse, relevant context window for the LLM
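
One way to read "diverse, relevant" is a greedy pass over the fused ranking that caps how many items any single modality may contribute and stops adding items once a size budget is reached. The sketch below makes those assumptions explicit. `Hit`, `max_words`, and `per_modality` are hypothetical names, and word count stands in for a real token count.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    item_id: str
    modality: str  # e.g. "text", "image", "audio"
    text: str      # chunk text, image caption, or transcript excerpt

def assemble_context(hits: list[Hit], max_words: int = 50, per_modality: int = 2) -> list[Hit]:
    """Select hits best-first under a word budget and a per-modality cap.

    Assumes hits are already sorted by relevance.
    """
    selected: list[Hit] = []
    used = 0
    counts: dict[str, int] = {}
    for hit in hits:
        n_words = len(hit.text.split())
        if counts.get(hit.modality, 0) >= per_modality:
            continue  # keep one modality from crowding out the others
        if used + n_words > max_words:
            continue  # skip items that would exceed the budget
        selected.append(hit)
        used += n_words
        counts[hit.modality] = counts.get(hit.modality, 0) + 1
    return selected

hits = [
    Hit("t1", "text", "a b c"),
    Hit("t2", "text", "d e"),
    Hit("t3", "text", "f"),
    Hit("i1", "image", "g h i j"),
    Hit("a1", "audio", "k l m n o p"),
]
context = assemble_context(hits, max_words=10, per_modality=2)
```

Here the third text hit is dropped by the modality cap and the audio hit by the word budget, leaving two text items and one image item.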

Output Stage
Cited Response


Generated answer with numbered citations
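
As a sketch of how numbered citations might be attached, the function below takes answer sentences paired with the source each one relies on, assigns each distinct source a number in order of first use, and appends a reference list. The input shape and the name `format_with_citations` are assumptions. The source labels would plausibly come from the metadata index (file path, page number, timestamp).

```python
def format_with_citations(answer_parts: list[tuple[str, str]]) -> str:
    """Render (sentence, source_label) pairs as text with [n] markers and a reference list."""
    numbers: dict[str, int] = {}
    sentences: list[str] = []
    for sentence, source in answer_parts:
        # Reuse the existing number if this source was already cited.
        n = numbers.setdefault(source, len(numbers) + 1)
        sentences.append(f"{sentence} [{n}]")
    references = [f"[{n}] {source}" for source, n in numbers.items()]
    return " ".join(sentences) + "\n\n" + "\n".join(references)

rendered = format_with_citations([
    ("Revenue grew.", "report.pdf, p. 4"),
    ("The chart confirms it.", "chart.png"),
    ("Margins held.", "report.pdf, p. 4"),
])
```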

RBAC Control


Role-based access ensuring users only see
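
A role check of this kind could be applied to retrieved items before they reach the LLM. In the sketch below, each item carries an `allowed_roles` set, assumed to come from the access permissions in the metadata index, and an item is kept only if the user holds at least one of those roles. The field and function names are hypothetical.

```python
def filter_by_role(hits: list[dict], user_roles: set[str]) -> list[dict]:
    """Keep only hits whose allowed_roles overlap the user's roles.

    Items with no allowed_roles are dropped (deny by default).
    """
    return [hit for hit in hits if user_roles & hit.get("allowed_roles", set())]

hits = [
    {"id": "doc:1", "allowed_roles": {"analyst", "admin"}},
    {"id": "doc:2", "allowed_roles": {"admin"}},
    {"id": "doc:3", "allowed_roles": set()},
]
visible = filter_by_role(hits, {"analyst"})
```

Filtering before context assembly, rather than after generation, keeps restricted content out of the prompt entirely.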
