Livoa
Self Adaptive Vision Transformer model

Raw Image: I ∈ ℝ^(H×W×C), where H = W = 512

Normalization: norm = (I - μ) / σ, where μ = [0.485, 0.456, 0.406], σ = [0.229, 0.224, 0.225]
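The normalization step subtracts a per-channel mean and divides by a per-channel standard deviation. The following is a minimal sketch in plain Python, assuming the image has already been scaled to [0, 1] and that channels are in RGB order (neither is stated in the diagram). The function names are hypothetical.

```python
# Per-channel statistics as listed in the diagram (RGB order is an assumption).
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb):
    """Apply (value - mu) / sigma to one RGB pixel with values in [0, 1]."""
    return [(value - m) / s for value, m, s in zip(rgb, MEAN, STD)]


def normalize_image(image):
    """Normalize an H x W x 3 image stored as nested lists."""
    return [[normalize_pixel(pixel) for pixel in row] for row in image]


# A pixel equal to the channel means maps to zero in every channel.
print(normalize_pixel(MEAN))  # [0.0, 0.0, 0.0]
```

In practice the same operation would be applied to a whole tensor at once; the nested-list form is only meant to make the arithmetic explicit.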
CNN Stem
  Conv 7×7, stride=2, padding=3, x ∈ ℝ^(64×3×7×7)
  BatchNorm + ReLU
  MaxPool 3×3, stride=2
  Output: F₁ ∈ ℝ^(256×256×64)
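The spatial sizes in the CNN stem can be checked with the standard output-size formula for convolution and pooling layers. This is a small sketch; the helper name is hypothetical, and the pooling padding of 1 is an assumption because the diagram does not give it.

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a conv or pooling layer (floor mode, dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


# 7x7 convolution, stride 2, padding 3, on a 512-pixel side.
after_conv = conv_output_size(512, kernel=7, stride=2, padding=3)
print(after_conv)  # 256

# 3x3 max pooling, stride 2; padding=1 is assumed, not taken from the diagram.
after_pool = conv_output_size(after_conv, kernel=3, stride=2, padding=1)
print(after_pool)  # 128
```

The 7×7 convolution alone yields the 256×256 spatial size listed for F₁, and a stride-2 pool with this assumed padding would halve it again. The diagram does not say at which point the F₁ shape is measured.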
Patch Embedding
  Patch Extraction: P = reshape(F₁)
  Position Encoding: Z₀ = P + E_pos
  Output: Z₀ ∈ ℝ^(256×768)
Transformer Block
  Multi-Head Attention: Q, K, V = Z W_q,k,v; Attention = Softmax(QK^T / √d_k) V
  Add & Norm: Z₁ = LN(Z₀ + Attention)
  MLP: FFN(x) = max(0, xW₁ + b₁) W₂ + b₂
  Repeat L times
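The attention and feed-forward formulas of the transformer block can be sketched in plain Python as follows. This is a single-head illustration only: the split into multiple heads, the residual connection, and layer normalization are omitted, and all function names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)


def ffn(x, w1, b1, w2, b2):
    """Position-wise feed-forward layer: max(0, x W1 + b1) W2 + b2."""
    hidden = [[max(0.0, h + b) for h, b in zip(row, b1)] for row in matmul(x, w1)]
    return [[o + b for o, b in zip(row, b2)] for row in matmul(hidden, w2)]


# Two tokens with d_k = 2. Identical keys give uniform attention weights,
# so every output row is the mean of the value rows.
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 1.0], [1.0, 1.0]]
v = [[2.0, 0.0], [0.0, 4.0]]
print(attention(q, k, v))  # [[1.0, 2.0], [1.0, 2.0]]
```

In a full block, Q, K, and V would first be obtained by multiplying the token matrix with learned projection weights, and the block would be repeated L times as the diagram indicates.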
ASPP Module
  Parallel Convolutions
  Global Average Pooling + Upsampling
  Feature Concatenation
  Output: F_aspp ∈ ℝ^(256×256×256)
Boundary Refinement
  Reverse Attention
  Progressive Upsampling
Output
  Segmentation Mask (Head): Y_seg = σ(conv x_i d_final)
  Boundary Detection: Y_boundary = σ(conv x_i Bdy_refined)
Loss Calculation
Loss Function: Total Loss (Boundary Loss, BCE Loss, Dice Loss)
Ground Truth mask
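The loss stage names a total loss alongside boundary, BCE, and Dice terms but gives no formulas. The following is one possible sketch under stated assumptions: the total is taken as a weighted sum, the weights default to 1, and the boundary term is modeled as BCE on the boundary map. None of these choices is confirmed by the diagram, and all names are hypothetical.

```python
import math


def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy over flattened probabilities and 0/1 labels."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(pred)


def dice_loss(pred, target, smooth=1.0):
    """Soft Dice loss: 1 - (2 * overlap + s) / (sum(pred) + sum(target) + s)."""
    overlap = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2.0 * overlap + smooth) / (sum(pred) + sum(target) + smooth)


def total_loss(seg_pred, seg_true, bdy_pred, bdy_true,
               w_bce=1.0, w_dice=1.0, w_bdy=1.0):
    """Weighted sum of the three terms (weights and boundary BCE are assumptions)."""
    return (w_bce * bce_loss(seg_pred, seg_true)
            + w_dice * dice_loss(seg_pred, seg_true)
            + w_bdy * bce_loss(bdy_pred, bdy_true))


# Flattened 2x2 masks: predicted probabilities against the ground-truth mask.
seg_pred, seg_true = [0.9, 0.8, 0.1, 0.2], [1, 1, 0, 0]
bdy_pred, bdy_true = [0.7, 0.6, 0.3, 0.1], [1, 1, 0, 0]
print(total_loss(seg_pred, seg_true, bdy_pred, bdy_true))
```

Here the predictions are flattened lists of probabilities after the sigmoid, compared against the ground-truth mask and a boundary map derived from it.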

Self-adaptive Vision Transformer model for burst polyp detection and diagnosis

by Mahfuz
