DATASET
SPLIT THE DATASET INTO TRAIN/TEST/VAL
DATA TRANSFORMS AND DATA LOADER
CREATE A PRE-TRAINED VISION TRANSFORMER MODEL
REMOVE CLASSIFIER HEAD OR LAYER
EXTRACT FEATURE EMBEDDINGS FOR A BATCH OR THE ENTIRE DATASET
BUILD HYBRID MODEL (VIT FEATURES + CNN PROCESSED FEATURES)
TRAIN HYBRID MODEL
EVALUATE HYBRID MODEL TO FIND BEST MODEL
PREDICTION TAKES PLACE WITH BEST HYBRID MODEL
  ONLY IMAGE -- CORRECT PREDICTION
  GRADCAM -- WRONG PREDICTION
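
The dataset split step in the outline can be sketched as follows. This is a minimal illustration only, not the author's implementation: the function name `split_dataset`, the 70/15/15 fractions, the fixed seed, and the file names are all assumptions chosen for the example, and it uses only the Python standard library rather than any particular deep learning framework.

```python
import random


def split_dataset(items, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle `items` reproducibly and split them into train/val/test lists.

    The fractions are illustrative assumptions; whatever is left after the
    train and val portions becomes the test portion.
    """
    if not 0 < train_frac < 1 or not 0 <= val_frac < 1:
        raise ValueError("fractions must lie in [0, 1)")
    if train_frac + val_frac >= 1:
        raise ValueError("train_frac + val_frac must leave room for a test set")

    shuffled = list(items)                 # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded RNG makes the split repeatable

    n = len(shuffled)
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Hypothetical file names standing in for the real image paths.
paths = [f"img_{i:03d}.jpg" for i in range(100)]
train, val, test = split_dataset(paths)
print(len(train), len(val), len(test))
```

With 100 items and the assumed fractions, the three lists hold 70, 15, and 15 items, and no item appears in more than one list. A class-balanced (stratified) split may be preferable for imbalanced image datasets; this sketch does not attempt that.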
by rey