[Figure: stages of language-model training, including Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback. Recovered labels follow.]
Diverse language data (including emotional expression and social dialogue): learns baseline patterns of human language, including emotional expression.
Example conversations for compassionate responses: refines style and improves task performance and empathic responses.
Reinforcement Learning from Human Feedback, with labels "(from SFT)" and "(empathic style)": evaluate responses, preferring helpful, safe, and supportive outputs.
by Alastair