Pretraining
Vast Text Corpora (diverse language data, including emotional expression and social dialogue) → Base LLM (learns baseline patterns of human language, including emotional expression)

Supervised Fine-Tuning (SFT)
Labeled Datasets (example conversations for compassionate responses) → Fine-Tuned LLM (refines style and improves task performance and empathic responses)

RLHF (Reinforcement Learning from Human Feedback)
Fine-Tuned LLM (from SFT) → Reward Model & Policy Optimization → Aligned LLM (empathic style)
Human Raters: evaluate responses, preferring helpful, safe, and supportive outputs

Training pipeline for an aligned language model: Pretraining → Supervised Fine-Tuning → Reinforcement Learning from Human Feedback
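
The diagram does not say how the human raters' preferences become a training signal for the reward model. One common choice, assumed here rather than taken from the diagram, is a pairwise Bradley-Terry loss: the reward model scores two responses to the same prompt, and the loss is small when the response the rater preferred gets the higher score. The function name and the example scores below are hypothetical, for illustration only.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise preference loss: -log(sigmoid(reward_chosen - reward_rejected)).

    Assumption: a Bradley-Terry preference model. The diagram only says that
    raters prefer helpful, safe, and supportive outputs; it names no loss.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch on the sign of the
    # margin so that exp() never receives a large positive argument.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical reward-model scores for two responses to the same prompt.
agrees_with_rater = preference_loss(reward_chosen=2.0, reward_rejected=0.5)
disagrees_with_rater = preference_loss(reward_chosen=0.5, reward_rejected=2.0)
print(agrees_with_rater < disagrees_with_rater)
```

In a setup like this, minimizing the loss over many rated pairs pushes the reward model to score preferred responses higher, and the policy optimization step then tunes the fine-tuned LLM toward responses that earn higher reward.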

by Alastair
