Input Layer
3-channel RGB image
224 x 224 pixels
Tensor Shape: [3 x 224 x 224]

Stem Block
Conv 4x4, stride 4
Layer Normalization
Output: [96 x 56 x 56]

Stage 1
3 x ConvNeXt Blocks
Output: [96 x 56 x 56]

Stage 2
Downsampling (2x2 conv, stride 2)
3 x ConvNeXt Blocks
Output: [192 x 28 x 28]

Stage 3
Downsampling (2x2 conv, stride 2)
9 x ConvNeXt Blocks
Output: [384 x 14 x 14]

Stage 4
Downsampling (2x2 conv, stride 2)
3 x ConvNeXt Blocks
Output: [768 x 7 x 7]
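
The output shapes listed for the stem and the four stages can be checked with the standard convolution output-size formula. The sketch below is only an illustration: the helper names `conv_out` and `trace_shapes` are hypothetical, and it assumes no padding and no dilation in the stem and downsampling convolutions, which the diagram does not state.

```python
def conv_out(size, kernel, stride, padding=0):
    """Spatial output size of a convolution (no dilation, floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(in_size=224, dims=(96, 192, 384, 768)):
    """Return (name, (C, H, W)) after the stem and each of the four stages."""
    shapes = []
    # Stem: 4x4 conv, stride 4 (padding 0 is an assumption).
    size = conv_out(in_size, kernel=4, stride=4)
    shapes.append(("Stem", (dims[0], size, size)))
    # Stage 1 has no downsampling step; its blocks keep the shape unchanged.
    shapes.append(("Stage 1", (dims[0], size, size)))
    # Stages 2-4 each start with a 2x2 conv, stride 2, then shape-preserving blocks.
    for stage, channels in enumerate(dims[1:], start=2):
        size = conv_out(size, kernel=2, stride=2)
        shapes.append((f"Stage {stage}", (channels, size, size)))
    return shapes


for name, shape in trace_shapes():
    print(name, shape)
```

Under these assumptions the spatial size goes 224 -> 56 -> 28 -> 14 -> 7, matching the Output lines above, while the channel count doubles at each downsampling step.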

ConvNeXt Block Details
- 7x7 Depthwise Conv
- Layer Normalization
- 1x1 Pointwise Conv (expand channels x4)
- GELU Activation
- 1x1 Pointwise Conv (project channels back)
- Residual Connection
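
To get a feel for where the weights of one block sit, the layer list above can be turned into a rough parameter count. This is a sketch under stated assumptions, not a description of any particular implementation: `convnext_block_params` is a hypothetical helper, every convolution is assumed to have a bias, Layer Normalization is assumed to have a learned scale and shift per channel, and no other learned parameters are counted (some implementations add extras that the diagram does not show).

```python
def convnext_block_params(channels, expansion=4, kernel=7):
    """Approximate learned-parameter count for one block with `channels` inputs."""
    hidden = channels * expansion  # width after the x4 expansion
    counts = {
        # Depthwise conv: one kernel x kernel filter per channel, plus bias.
        "depthwise_conv": channels * kernel * kernel + channels,
        # Layer norm: scale and shift per channel (assumed affine).
        "layer_norm": 2 * channels,
        # 1x1 conv expanding channels -> hidden, plus bias.
        "pointwise_expand": channels * hidden + hidden,
        # 1x1 conv projecting hidden -> channels, plus bias.
        "pointwise_project": hidden * channels + channels,
        # GELU and the residual connection have no learned parameters.
    }
    counts["total"] = sum(counts.values())
    return counts


for width in (96, 192, 384, 768):
    print(width, convnext_block_params(width)["total"])
```

With these assumptions the two pointwise convolutions dominate the count, while the 7x7 depthwise convolution stays cheap because each filter sees only one channel.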

Classification Head
Global Average Pooling
Layer Normalization
Linear Layer (4 classes)
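
The three head steps can be sketched in plain Python on a toy feature map. Everything here is illustrative: the function names are hypothetical, the feature map uses 8 channels instead of 768 to keep the example small, the weights are placeholders rather than trained values, and the layer norm omits the learned scale and shift for brevity.

```python
import math


def global_avg_pool(fmap):
    """Average each channel's H x W grid to one value: [C][H][W] -> [C]."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


def layer_norm(x, eps=1e-6):
    """Normalize a vector to zero mean and unit variance (no learned affine)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def linear(x, weight, bias):
    """weight is [num_classes][C]; returns one score per class."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


# Toy sizes: 8 channels of 7 x 7 (the diagram's final feature map is 768 x 7 x 7).
C, H, W, NUM_CLASSES = 8, 7, 7, 4
fmap = [[[float(c + h + w) for w in range(W)] for h in range(H)] for c in range(C)]
# Placeholder weights and biases, not trained values.
weight = [[0.01 * (k + 1) * c for c in range(C)] for k in range(NUM_CLASSES)]
bias = [0.0] * NUM_CLASSES

pooled = global_avg_pool(fmap)                      # length C
scores = linear(layer_norm(pooled), weight, bias)   # length NUM_CLASSES
print(len(pooled), len(scores))
```

Pooling removes the two spatial dimensions, so the linear layer only needs one weight per channel per class.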

Final Output
4-class scores
Softmax probabilities
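
Converting the 4-class scores into softmax probabilities can be sketched as follows. The score values are made up for illustration, and subtracting the maximum score before exponentiating is a common numerical-stability choice rather than something the diagram specifies.

```python
import math


def softmax(scores):
    """Map raw class scores to probabilities that sum to 1."""
    m = max(scores)  # shift by the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for the 4 classes.
probs = softmax([2.0, 1.0, 0.1, -1.0])
predicted_class = probs.index(max(probs))
print(probs, predicted_class)
```

Softmax preserves the ordering of the scores, so the predicted class is the same whether it is read from the scores or from the probabilities.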
