1. Dataset Collection and Preparation
Toxic and non-toxic comments combined from publicly available datasets
Convert each comment to a string, then tokenize it
Model: Flan-T5-small, a pre-trained sequence-to-sequence model
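The preparation steps above can be sketched in Python using only the standard library. This is a minimal sketch built on several assumptions. The sample comments, the 1/0 label encoding, the "classify toxicity:" prompt prefix and the "toxic"/"non-toxic" target words are all hypothetical. The whitespace tokenizer is only a stand-in for Flan-T5-small's actual SentencePiece-based tokenizer. In practice the real tokenizer would typically be loaded through Hugging Face Transformers with AutoTokenizer.from_pretrained("google/flan-t5-small"), which handles subword splitting, padding and truncation itself.

```python
from typing import Dict, List, Tuple

PAD, UNK = "<pad>", "<unk>"

def combine_sources(toxic: List[object], non_toxic: List[object]) -> List[Tuple[str, int]]:
    """Merge both sources into (text, label) pairs; 1 = toxic, 0 = non-toxic (assumed encoding)."""
    pairs = [(c, 1) for c in toxic] + [(c, 0) for c in non_toxic]
    # Drop missing comments, then cast everything else to str so that
    # numeric or other non-string entries can still be tokenized.
    return [(str(c), label) for c, label in pairs if c is not None]

def to_text_to_text(text: str, label: int) -> Tuple[str, str]:
    """Phrase an example as an (input, target) string pair for a seq2seq model.
    The prefix and target words are hypothetical choices."""
    return f"classify toxicity: {text}", "toxic" if label == 1 else "non-toxic"

def build_vocab(texts: List[str]) -> Dict[str, int]:
    """Toy lowercase whitespace vocabulary (stand-in for the real SentencePiece vocab)."""
    vocab = {PAD: 0, UNK: 1}
    for text in texts:
        for tok in text.lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab

def encode(text: str, vocab: Dict[str, int], max_length: int) -> List[int]:
    """Map tokens to ids (unknown tokens -> UNK), truncate to max_length, right-pad with PAD."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in text.lower().split()][:max_length]
    return ids + [vocab[PAD]] * (max_length - len(ids))

# Hypothetical sample rows: one numeric entry and one missing entry.
toxic = ["You are an idiot", 404]
non_toxic = ["Thanks for the help", None]

data = combine_sources(toxic, non_toxic)
pairs = [to_text_to_text(text, label) for text, label in data]
vocab = build_vocab([src for src, _ in pairs])
batch = [encode(src, vocab, max_length=5) for src, _ in pairs]
print(batch)
```

Missing comments are dropped before the str() cast on purpose: str(None) would otherwise turn a missing value into the literal word "None", which the model would then learn from as if it were real text.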
by chai