[Figure 1 diagram. The inputs 𝒳 (9 input features) feed a gating MLP (feature attention): Linear (9 → 32), Linear (32 → 9), Sigmoid. The gate outputs g ∈ (0, 1), drawn as g1 through g8, form the gated inputs Xg = X ⊙ (1 + g), which gives each feature a scale factor in [1, 2]. The main MLP (Dense + Activation) then produces the prediction Ŷ (S11).]

Fig. 1: Overview of the proposed method.
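The gating stage in Fig. 1 can be illustrated with a minimal NumPy sketch. It assumes the figure's operator is an element-wise product (consistent with the "scale factor" label), a ReLU between the two linear layers (the figure shows none), and small random weights. The names `init_gate_params` and `gate_inputs` are hypothetical and not taken from the paper.

```python
import numpy as np


def init_gate_params(n_features=9, hidden=32, seed=0):
    """Random weights for the gating MLP (9 -> 32 -> 9); the init scale is an assumption."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, size=(n_features, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.1, size=(hidden, n_features)),
        "b2": np.zeros(n_features),
    }


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gate_inputs(X, params):
    """Return the gated inputs X * (1 + g) and the gates g for a batch X of shape (n, 9)."""
    # Hidden layer; the ReLU is an assumption, the figure lists only the linear layers.
    h = np.maximum(X @ params["W1"] + params["b1"], 0.0)
    # One gate per input feature, squashed into (0, 1).
    g = sigmoid(h @ params["W2"] + params["b2"])
    # Each feature is scaled by a factor between 1 and 2, so no feature is suppressed.
    return X * (1.0 + g), g


X = np.random.default_rng(1).normal(size=(4, 9))  # 4 samples, 9 features
params = init_gate_params()
X_gated, g = gate_inputs(X, params)
print(X_gated.shape, g.shape)
```

Because the factor 1 + g never drops below 1, a gate of this form can only amplify features relative to one another; it cannot zero one out. The gated batch `X_gated` would then be passed to the main MLP.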

[Figure diagram. TabPFN is trained on synthetic data to take entire datasets as inputs and predict in a forward pass. Panel labels: TabPFN neural network parameterized by θ; 2D TabPFN layer (12x); predictions ŷtest; training loss -log qθ(ytest|...), optimized across millions of datasets.]