Self-Adaptive Vision Transformer model
Raw image: $I \in \mathbb{R}^{H \times W \times C}$, where $H = W = 512$ and $C = 3$ (RGB).
Normalization: $\hat{I} = (I - \mu) / \sigma$, applied per channel, with $\mu = [0.485, 0.456, 0.406]$ and $\sigma = [0.229, 0.224, 0.225]$ (the standard ImageNet channel statistics).
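As a minimal sketch of the normalization step, the snippet below applies the listed μ and σ channel by channel in plain Python. It assumes pixel values have already been scaled from 0 to 255 down to [0, 1], which is the usual convention for these ImageNet statistics. The helper names are hypothetical.

```python
# Per-channel normalization with the ImageNet statistics listed above.
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb):
    """Normalize one RGB pixel whose values are already scaled to [0, 1]."""
    return tuple((v - m) / s for v, m, s in zip(rgb, MEAN, STD))

def normalize_image(image):
    """Apply normalize_pixel to every pixel of an H x W x 3 nested list."""
    return [[normalize_pixel(px) for px in row] for row in image]

# Example: an 8-bit pure-red pixel is first divided by 255, then normalized.
red = normalize_pixel((255 / 255, 0 / 255, 0 / 255))
```

A pixel equal to the channel means maps to zero in every channel. Each unit of output corresponds to one standard deviation of the ImageNet channel distribution.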
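CNN stem: 7×7 convolution with 64 filters (stride 2, padding 3; weights $W \in \mathbb{R}^{64 \times 3 \times 7 \times 7}$), then BatchNorm + ReLU, then 3×3 max pooling (stride 2). The convolution maps the 512×512×3 input to 256×256×64. The stride-2 max pool then halves the spatial size again, so with padding 1 the stem output is $F_1 \in \mathbb{R}^{128 \times 128 \times 64}$ (the listed output of 256×256×64 matches the size before pooling).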
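The stem's spatial sizes can be checked with the standard output-size formula for convolution and pooling layers. This is the same formula given in the PyTorch `Conv2d` and `MaxPool2d` documentation. This sketch assumes padding 1 for the max pool, which the diagram does not specify. `conv_out` is a hypothetical helper.

```python
def conv_out(size, kernel, stride=1, padding=0, dilation=1):
    """Output length along one spatial axis of a convolution or pooling layer."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

h_in = 512
h_conv = conv_out(h_in, kernel=7, stride=2, padding=3)    # 7x7 conv, stride 2, padding 3
h_pool = conv_out(h_conv, kernel=3, stride=2, padding=1)  # 3x3 max pool, stride 2 (padding 1 assumed)
stem_weights = 64 * 3 * 7 * 7                             # entries of W in the 64x3x7x7 kernel tensor
print(h_conv, h_pool, stem_weights)                       # 256 128 9408
```

Without padding, the pool would give 127 instead of 128. The stride-2 convolution alone is what produces 256.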
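Patch embedding: patch extraction $P = \mathrm{reshape}(F_1)$ splits $F_1$ into $N = 256$ flattened patches. These are projected to $D = 768$ dimensions, as in standard ViT patch embedding. Positional encoding gives $Z_0 = P + E_{pos}$, with output $Z_0 \in \mathbb{R}^{256 \times 768}$.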
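The following plain-Python sketch covers patch extraction and the positional-encoding sum $Z_0 = P + E_{pos}$. The diagram does not give a patch size, so it is a parameter here. For example, a 128×128 map with 8×8 patches and a 256×256 map with 16×16 patches both give 16 × 16 = 256 tokens. The linear projection to 768 dimensions is omitted, and the helper names are hypothetical.

```python
def extract_patches(feature_map, patch):
    """Split an H x W x C nested list into non-overlapping patch x patch tiles,
    flattening each tile (row-major, channels last) into one token vector."""
    h, w = len(feature_map), len(feature_map[0])
    assert h % patch == 0 and w % patch == 0, "H and W must be divisible by the patch size"
    tokens = []
    for i in range(0, h, patch):          # tiles are visited row by row
        for j in range(0, w, patch):
            token = []
            for di in range(patch):
                for dj in range(patch):
                    token.extend(feature_map[i + di][j + dj])  # append all C channels
            tokens.append(token)
    return tokens

def add_positional(tokens, pos_table):
    """Z0 = P + E_pos: element-wise sum of each token with its position vector."""
    return [[t + p for t, p in zip(tok, pe)] for tok, pe in zip(tokens, pos_table)]

def num_tokens(size, patch):
    """Number of tokens for a square size x size map."""
    return (size // patch) ** 2
```

In the full model, `pos_table` would have one 768-dimensional vector per token and be added after the projection. Whether it is learned or fixed is not stated in the diagram.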
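Transformer block (repeated L times): multi-head attention with $Q = Z W_Q$, $K = Z W_K$, $V = Z W_V$ and $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}(Q K^\top / \sqrt{d_k})\, V$; add & norm $Z_1 = \mathrm{LN}(Z_0 + \mathrm{Attention})$; MLP $\mathrm{FFN}(x) = \max(0, x W_1 + b_1) W_2 + b_2$.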
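As a minimal single-head sketch of the block's components, the code below implements scaled dot-product attention, the post-norm add & norm step, and the ReLU FFN. It assumes a single head, since multi-head attention runs several of these in parallel on split dimensions and concatenates the results. It also assumes LayerNorm without its learned scale and shift. All function names are hypothetical.

```python
import math

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v):
    """Single-head softmax(Q K^T / sqrt(d_k)) V; q, k, v are lists of token rows."""
    d_k = len(k[0])
    scores = [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]
    # Each output row is a weighted average of the value rows.
    out = [[sum(w * vj[c] for w, vj in zip(w_row, v)) for c in range(len(v[0]))]
           for w_row in weights]
    return out, weights

def layer_norm(x, eps=1e-5):
    """LayerNorm over one token vector, without learned scale and shift."""
    mu = sum(x) / len(x)
    var = sum((val - mu) ** 2 for val in x) / len(x)
    return [(val - mu) / math.sqrt(var + eps) for val in x]

def add_and_norm(z, sublayer_out):
    """Post-norm residual step LN(Z + sublayer(Z)), applied token by token."""
    return [layer_norm([a + b for a, b in zip(zi, si)]) for zi, si in zip(z, sublayer_out)]

def linear(x, w, b):
    """x (length n_in) times w (n_in x n_out) plus b (length n_out)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]

def ffn(x, w1, b1, w2, b2):
    """FFN(x) = max(0, x W1 + b1) W2 + b2 for one token vector."""
    hidden = [max(0.0, h) for h in linear(x, w1, b1)]
    return linear(hidden, w2, b2)
```

When all keys are identical, every query attends uniformly. The output is then the plain mean of the value rows, which makes a convenient sanity check.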
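ASPP (Atrous Spatial Pyramid Pooling) module: parallel atrous (dilated) convolutions and a global-average-pooling branch upsampled back to the feature-map size, followed by feature concatenation; output $F_{aspp} \in \mathbb{R}^{256 \times 256 \times 256}$.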
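Concatenating the ASPP branches only works if every branch keeps the same spatial size. The sketch below checks this for 3×3 atrous convolutions with padding equal to the dilation rate, and also shows the global-average-pooling branch. The diagram does not list dilation rates, so (6, 12, 18) is a hypothetical choice borrowed from DeepLabv3. The helper names are also hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0, dilation=1):
    """Output length along one spatial axis for a convolution."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

def effective_kernel(kernel, dilation):
    """Span covered by a dilated kernel: k + (k - 1)(r - 1)."""
    return kernel + (kernel - 1) * (dilation - 1)

def gap_branch(channel, size):
    """Global average pooling to 1x1, then upsampling back to size x size.
    Upsampling a single value just replicates it."""
    mean = sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
    return [[mean] * size for _ in range(size)]

SIZE = 256              # spatial size of F_aspp in the diagram
RATES = (6, 12, 18)     # hypothetical dilation rates (DeepLabv3 defaults)

# 1x1 branch plus one 3x3 atrous branch per rate, each padded by its rate.
branch_sizes = [conv_out(SIZE, 1)] + [conv_out(SIZE, 3, padding=r, dilation=r) for r in RATES]
print(branch_sizes)                             # [256, 256, 256, 256]
print([effective_kernel(3, r) for r in RATES])  # [13, 25, 37]
```

Larger rates widen the receptive field (13, 25, and 37 pixels here) without extra weights or downsampling. This is what lets ASPP mix context at several scales.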
Boundary refinement: reverse attention, followed by progressive upsampling.
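The diagram does not define its reverse attention. The sketch below assumes the common PraNet-style form, where features are scaled by $1 - \sigma(S)$ for a coarse prediction map $S$. This shifts attention toward regions not yet marked as foreground, such as object edges. Progressive upsampling is shown as repeated 2× nearest-neighbour upsampling for brevity, although bilinear interpolation is also common. Maps are single-channel, and the helper names are hypothetical.

```python
import math

def sigmoid(x):
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def reverse_attention(features, coarse_logits):
    """Scale each feature value by 1 - sigmoid(coarse logit) at the same pixel."""
    return [[f * (1.0 - sigmoid(s)) for f, s in zip(f_row, s_row)]
            for f_row, s_row in zip(features, coarse_logits)]

def upsample2x(grid):
    """Nearest-neighbour 2x upsampling of a 2-D grid."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append(list(wide))                     # repeat each row
    return out

def progressive_upsample(grid, steps):
    """Apply 2x upsampling `steps` times (e.g. 3 steps = 8x)."""
    for _ in range(steps):
        grid = upsample2x(grid)
    return grid
```

Pixels the coarse map already predicts confidently as polyp (large positive logits) are suppressed toward zero. Uncertain pixels (logits near 0) keep about half their feature strength.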
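Output heads: segmentation mask $Y_{seg} = \sigma(\mathrm{conv}(d_{final}))$ and boundary detection $Y_{boundary} = \sigma(\mathrm{conv}(\mathrm{Bdy}_{refined}))$, where $\sigma$ here denotes the sigmoid function (not the standard deviation used in normalization).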
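A sketch of one output head: a convolution that collapses the feature channels to a single logit map, then a sigmoid to get per-pixel probabilities. The diagram does not give the kernel size, so this assumes a 1×1 convolution, which is just a weighted sum over channels at each pixel. It also assumes a 0.5 threshold for turning probabilities into a binary mask. Names are hypothetical.

```python
import math

def sigmoid(x):
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def conv1x1(feature_maps, weights, bias):
    """Hypothetical 1x1 conv: feature_maps is C x H x W, weights has C entries."""
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    return [[sum(wt * fm[i][j] for wt, fm in zip(weights, feature_maps)) + bias
             for j in range(w)] for i in range(h)]

def predict_mask(logits, threshold=0.5):
    """Sigmoid probabilities and a binary mask (1 where p >= threshold)."""
    probs = [[sigmoid(v) for v in row] for row in logits]
    mask = [[1 if p >= threshold else 0 for p in row] for row in probs]
    return probs, mask
```

The segmentation and boundary heads would share this shape but have separate weights. Each is applied to its own input feature map.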
Loss calculation: the total loss combines a BCE loss, a Dice loss, and a boundary loss, computed against the ground-truth mask.
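A minimal sketch of the three loss terms, under stated assumptions. The diagram gives neither the loss weights nor the exact boundary loss. This sketch assumes equal weights. It also assumes the boundary loss is BCE between $Y_{boundary}$ and a boundary map derived from the ground-truth mask, marking foreground pixels with a background 4-neighbour. Function names are hypothetical.

```python
import math

def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy; predictions are clamped away from 0 and 1."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(pred)

def dice_loss(pred, target, smooth=1.0):
    """1 - soft Dice coefficient; `smooth` avoids division by zero on empty masks."""
    inter = sum(p * t for p, t in zip(pred, target))
    return 1 - (2 * inter + smooth) / (sum(pred) + sum(target) + smooth)

def boundary_map(mask):
    """Foreground pixels with at least one in-image 4-neighbour that is background."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            if mask[i][j] != 1:
                continue
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w and mask[ni][nj] == 0:
                    out[i][j] = 1
                    break
    return out

def flatten(grid):
    return [v for row in grid for v in row]

def total_loss(seg_probs, bdy_probs, gt_mask):
    """Hypothetical equal-weight sum: BCE + Dice on the mask, BCE on the boundary."""
    gt = flatten(gt_mask)
    gt_bdy = flatten(boundary_map(gt_mask))
    seg, bdy = flatten(seg_probs), flatten(bdy_probs)
    return bce(seg, gt) + dice_loss(seg, gt) + bce(bdy, gt_bdy)
```

BCE scores each pixel independently. Dice measures overlap and is less affected by the small size of many polyps relative to the background. The boundary term adds supervision specifically at object edges.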
Self-adaptive vision transformer model for burst polyp detection and diagnosis
by Mahfuz