USER INTERFACE
(Web App: Image / Audio / Mic)
IMAGE INPUT
AUDIO FILE INPUT
MICROPHONE INPUT
IMAGE PREPROCESSING
Grayscale, Contrast, Sharpening
AUDIO PREPROCESSING
Noise Reduction
Resampling (16kHz)
REAL-TIME AUDIO
Noise Filtering
Voice Capture
Text Extraction
ASR (Audio)
Wav2Vec2
ASR (Mic)
Whisper
Text Normalization
Context Classification
Multilingual Translation
Output
by Tanuja
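
The flow in the diagram above can be sketched as a small routing table. This is a minimal illustration, not the author's implementation: the names `BRANCH_STAGES`, `SHARED_STAGES`, `plan`, `run`, and `normalize_text` are hypothetical, and the pairing of each input with its preprocessing and recognition step is an assumption read off the order of the labels. The normalization rule (NFKC, whitespace collapse, lowercasing) is likewise only one plausible choice. Real stages such as Wav2Vec2 or Whisper inference would be supplied as handlers.

```python
import re
import unicodedata
from typing import Any, Callable, Dict, List

# Assumed mapping: each input type has its own preprocessing and
# recognition steps before joining the shared text stages.
BRANCH_STAGES: Dict[str, List[str]] = {
    "image": ["IMAGE PREPROCESSING", "Text Extraction"],
    "audio_file": ["AUDIO PREPROCESSING", "ASR (Audio): Wav2Vec2"],
    "microphone": ["REAL-TIME AUDIO", "ASR (Mic): Whisper"],
}

# Stages that every branch passes through after text is produced.
SHARED_STAGES: List[str] = [
    "Text Normalization",
    "Context Classification",
    "Multilingual Translation",
    "Output",
]


def plan(input_kind: str) -> List[str]:
    """Return the ordered stage names for one input type."""
    try:
        branch = BRANCH_STAGES[input_kind]
    except KeyError:
        raise ValueError(f"unsupported input kind: {input_kind!r}") from None
    return branch + SHARED_STAGES


def normalize_text(text: str) -> str:
    """Illustrative normalization: NFKC, collapse whitespace, lowercase."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def run(input_kind: str, payload: Any,
        handlers: Dict[str, Callable[[Any], Any]]) -> Any:
    """Pass the payload through each planned stage.

    Stages without a registered handler are skipped, so the sketch
    runs even when only some stages are implemented.
    """
    data = payload
    for stage in plan(input_kind):
        handler = handlers.get(stage)
        if handler is not None:
            data = handler(data)
    return data


if __name__ == "__main__":
    print(plan("microphone"))
    print(run("audio_file", "  Hi   There ",
              {"Text Normalization": normalize_text}))
```

Keeping the stage order in data rather than in nested conditionals means a new input type only needs one more entry in the table, and each stage can be swapped or tested on its own.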