3-channel RGB image
224 x 224 pixels
Tensor Shape: [3 x 224 x 224]
Conv 4x4, stride 4
Layer Normalization
Output: [96 x 56 x 56]
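The 56 x 56 spatial size follows from the usual convolution output-size formula. A minimal sketch, assuming no padding and no dilation in the stem (the diagram does not state either); `conv_out_size` is a hypothetical helper name, not part of any library:

```python
def conv_out_size(size: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Spatial output size of a convolution along one dimension (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


# Stem from the diagram: 4x4 kernel, stride 4, applied to a 224-pixel side.
# Padding of 0 is an assumption; the diagram does not give it.
stem_side = conv_out_size(224, kernel=4, stride=4)
print(stem_side)  # 56
```

Because the kernel size equals the stride, the stem splits the image into non-overlapping 4 x 4 patches, which is why 224 divides down to exactly 56.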
3 x ConvNeXt Blocks
Downsampling (2x2 conv, stride 2)
Output: [192 x 28 x 28]
9 x ConvNeXt Blocks
Output: [384 x 14 x 14]
Output: [768 x 7 x 7]
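The listed output shapes can be reproduced by applying the same downsampling step three times. A rough sketch, assuming that every stage transition uses the 2x2, stride-2 convolution shown once in the diagram and that each transition doubles the channel count (this is inferred from the listed shapes, not stated); `downsample` is a hypothetical helper:

```python
def downsample(shape):
    """Apply a 2x2, stride-2 conv (no padding assumed) that doubles channels."""
    channels, height, width = shape
    return (2 * channels, (height - 2) // 2 + 1, (width - 2) // 2 + 1)


shape = (96, 56, 56)  # output of the stem
shapes = [shape]
for _ in range(3):  # three stage transitions, assumed identical
    shape = downsample(shape)
    shapes.append(shape)

for s in shapes:
    print(s)
```

The ConvNeXt blocks between transitions are left out of this trace because, as drawn, they do not change the tensor shape.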
ConvNeXt Block:
- 7x7 Depthwise Conv
- Layer Normalization
- 1x1 Pointwise Conv (expand channels x4)
- GELU Activation
- 1x1 Pointwise Conv (project channels back)
- Residual Connection
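To give a sense of where a block's weights sit, the learnable parameters of the layers listed above can be counted by hand. This is only a sketch under stated assumptions: every convolution has a bias, Layer Normalization has a per-channel scale and shift, and the block has no other learnable parameters beyond those listed. `convnext_block_params` is a hypothetical helper name.

```python
def convnext_block_params(channels: int, expansion: int = 4, kernel: int = 7) -> dict:
    """Per-layer parameter counts for one block (biases assumed on all convs)."""
    hidden = channels * expansion
    return {
        # Depthwise: one kernel x kernel filter per channel, plus bias.
        "depthwise_conv": channels * kernel * kernel + channels,
        # LayerNorm: scale and shift per channel.
        "layer_norm": 2 * channels,
        # 1x1 conv expanding channels by `expansion`, plus bias.
        "pointwise_expand": channels * hidden + hidden,
        # 1x1 conv projecting back to `channels`, plus bias.
        "pointwise_project": hidden * channels + channels,
    }


counts = convnext_block_params(96)  # first-stage width from the diagram
total = sum(counts.values())
print(counts)
print(total)  # 79200
```

Under these assumptions the two pointwise convolutions hold most of the parameters, while the 7x7 depthwise convolution is comparatively cheap. GELU and the residual connection add no parameters.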
Global Average Pooling
Linear Layer (4 classes)
4-class scores
Softmax probabilities
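The head's two parameter-free steps can be sketched in plain Python: global average pooling reduces each channel's spatial map to a single mean, and softmax turns the 4 class scores into probabilities that sum to 1. The example scores below are made up for illustration, and the linear layer between the two steps is omitted.

```python
import math


def global_average_pool(feature_map):
    """Average each channel's 2-D map to one value: [C x H x W] -> [C]."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def softmax(scores):
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative 4-class scores (hypothetical values, not from the source).
probs = softmax([2.0, 1.0, 0.1, -1.0])
print(probs)
```

Softmax preserves the ranking of the scores, so the predicted class is the same whether it is read from the scores or from the probabilities.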
by nbv