• Dataset Loading
• Text Cleaning
• Audio Processing
• Text Tokenization
• Mel Spectrogram Extraction
• Speaker Embedding Extraction
• Text Encoder
• Speaker Conditioning
• Speech Decoder
• Postnet
• Loss Calculation
• Optimization
• Checkpointing
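Several of the training steps above can be made concrete with a small, dependency-free Python sketch. Everything in it is an illustrative assumption, not the author's implementation: the character vocabulary and abbreviation table, the HTK mel formula with unnormalized triangular filters, and the loss combination (L1 on the mel output before and after the postnet plus a binary cross-entropy stop-token term; Tacotron 2, for comparison, uses mean squared error instead of L1). A real cleaner would also expand numbers, and the STFT itself would normally come from a library such as librosa or torchaudio.

```python
import math
import re

# --- Text cleaning and tokenization (hypothetical character-level scheme) ---
# Illustrative abbreviation table; a real cleaner would be larger and language-specific.
ABBREVIATIONS = {"dr.": "doctor", "mr.": "mister", "st.": "street"}
SYMBOLS = "abcdefghijklmnopqrstuvwxyz' .,?!"
PAD_ID, EOS_ID = 0, 1  # reserved IDs; real symbols start at 2
SYMBOL_TO_ID = {s: i + 2 for i, s in enumerate(SYMBOLS)}


def clean_text(text: str) -> str:
    """Lowercase, expand abbreviations, drop unsupported characters, collapse spaces."""
    text = text.lower()
    for abbr, full in ABBREVIATIONS.items():
        # \b stops "st." from matching inside words such as "first."
        text = re.sub(r"\b" + re.escape(abbr), full, text)
    text = re.sub(r"[^a-z' .,?!]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> list[int]:
    """Map cleaned characters to integer IDs and append an end-of-sequence token."""
    return [SYMBOL_TO_ID[c] for c in clean_text(text)] + [EOS_ID]


# --- Mel filterbank (HTK mel formula) for turning STFT magnitudes into mel bins ---
def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sample_rate, n_fft, n_mels, fmin=0.0, fmax=None):
    """Return n_mels triangular filters, each with n_fft // 2 + 1 weights in [0, 1]."""
    fmax = sample_rate / 2 if fmax is None else fmax
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    lo, hi = hz_to_mel(fmin), hz_to_mel(fmax)
    # n_mels + 2 edge frequencies, equally spaced on the mel scale
    edges = [mel_to_hz(lo + i * (hi - lo) / (n_mels + 1)) for i in range(n_mels + 2)]
    bank = []
    for left, center, right in zip(edges, edges[1:], edges[2:]):
        row = []
        for f in bin_hz:
            if left <= f <= center:
                row.append((f - left) / (center - left))  # rising slope
            elif center < f <= right:
                row.append((right - f) / (right - center))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


# --- Loss: L1 on mel frames before and after the postnet, plus stop-token BCE ---
def l1_loss(pred, target):
    """Mean absolute error over all frames and mel bins."""
    diffs = [abs(p - t) for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    return sum(diffs) / len(diffs)


def bce_with_logits(logits, labels):
    """Numerically stable binary cross-entropy on raw stop-token logits."""
    terms = [max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
             for x, y in zip(logits, labels)]
    return sum(terms) / len(terms)


def tts_loss(mel_before, mel_after, mel_target, stop_logits, stop_labels):
    return (l1_loss(mel_before, mel_target)
            + l1_loss(mel_after, mel_target)
            + bce_with_logits(stop_logits, stop_labels))
```

Multiplying each STFT magnitude frame by the filterbank (one dot product per filter) gives one mel frame; log compression usually follows before the frames are used as training targets. Supervising both the pre- and post-postnet outputs lets the decoder learn a coarse spectrogram while the postnet learns a residual refinement.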

• Input Text
• Tokenization
• Model Generation
• Vocoder (HiFi-GAN)
• Audio Output
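The inference steps above amount to composing the trained components in order. A minimal sketch, assuming the tokenizer, the acoustic model (taking the speaker embedding alongside the token IDs, by analogy with the speaker conditioning used in training) and the HiFi-GAN vocoder are available as plain Python callables; `synthesize`, `float_to_pcm16` and `write_wav` are hypothetical helpers, and the only real library used is the standard `wave` module for writing 16-bit mono WAV output.

```python
import io
import struct
import wave
from typing import Callable, Sequence

# Hypothetical component signatures: swap in the real tokenizer, acoustic model and vocoder.
Tokenizer = Callable[[str], list]
AcousticModel = Callable[[list, Sequence[float]], list]  # (token_ids, speaker_embedding) -> mel frames
Vocoder = Callable[[list], Sequence[float]]  # mel frames -> waveform samples in [-1, 1]


def synthesize(text: str, tokenize: Tokenizer, generate_mel: AcousticModel,
               vocoder: Vocoder, speaker_embedding: Sequence[float]) -> list[float]:
    """Input text -> token IDs -> speaker-conditioned mel frames -> waveform."""
    token_ids = tokenize(text)
    mel = generate_mel(token_ids, speaker_embedding)
    return list(vocoder(mel))


def float_to_pcm16(samples: Sequence[float]) -> bytes:
    """Clip to [-1, 1] and encode as little-endian signed 16-bit PCM."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    return struct.pack(f"<{len(clipped)}h", *(int(round(s * 32767)) for s in clipped))


def write_wav(samples: Sequence[float], sample_rate: int, target) -> None:
    """Write mono 16-bit audio to a path or binary file object."""
    with wave.open(target, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(sample_rate)
        wf.writeframes(float_to_pcm16(samples))


if __name__ == "__main__":
    # Dummy stand-ins that only demonstrate the data flow (not a real model or vocoder).
    buf = io.BytesIO()
    audio = synthesize(
        "hello",
        tokenize=lambda t: [ord(c) for c in t],
        generate_mel=lambda ids, spk: [[0.0] * 80 for _ in ids],
        vocoder=lambda mel: [0.0] * (256 * len(mel)),  # assumed hop length of 256 samples
        speaker_embedding=[0.0] * 512,
    )
    write_wav(audio, 16000, buf)
    print(len(audio), len(buf.getvalue()))
```

Passing the components in as callables keeps the glue code independent of any particular framework; the sample rate given to `write_wav` must match the rate the vocoder was trained at.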
by ss