Pretraining
Vast Text Corpora: Diverse language data (including emotional expression and social dialogue)
Base LLM: Learns baseline patterns of human language, including emotional expression

Supervised Fine-Tuning (SFT)
Labeled Datasets: Example conversations for compassionate responses
Fine-Tuned LLM: Refines style and improves task performance and empathic responses

RLHF (Reinforcement Learning from Human Feedback)
Fine-Tuned LLM (from SFT)
Human Raters: Evaluate responses, preferring helpful, safe, and supportive outputs
Reward Model & Policy Optimization
Aligned LLM (empathic style)
Training pipeline for an aligned language model: Pretraining → Supervised Fine-Tuning → Reinforcement Learning from Human Feedback
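
The figure does not say how the human raters' preferences are turned into a training signal for the reward model. One common formulation, assumed here rather than taken from the source, is a pairwise (Bradley-Terry style) loss that pushes the reward of the preferred response above the reward of the rejected one. The sketch below is illustrative only; the function name `preference_loss` and the scalar reward inputs are hypothetical.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise preference loss: -log(sigmoid(reward_chosen - reward_rejected)).

    Assumption (not from the source): the reward model assigns one scalar
    score per response, and a human rater has marked one response of the
    pair as preferred.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow
    # when the margin is large in either direction.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    # The loss is small when the preferred response already scores higher,
    # and large when the reward model ranks the pair the wrong way round.
    print(preference_loss(2.0, 0.0))
    print(preference_loss(0.0, 2.0))
```

Under this assumption, the policy optimization step would then adjust the fine-tuned LLM to produce responses that the trained reward model scores highly.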