Livoa
Router: Noisy Top-k Gating
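[Diagram: data flow of the noisy top-k gating router]

Input: multi-head self-attention output of shape (B, T, C).
Routing branch: Linear layer -> output tensor (logits) of shape (B, T, N_Experts).
Noise branch: Linear layer for noise -> noise logits tensor of shape (B, T, N_Experts) -> softplus -> element-wise multiplication with Gaussian noise (N(0, 1)) of shape (B, T, N_Experts) -> noise tensor of shape (B, T, N_Experts).
Combination: element-wise addition of the logits and the noise tensor -> noise-added logits.
Gating: keep the top k values in the last dimension and zero out the rest, shape (B, T, N_Experts) -> softmax along the last dimension.
Output: router/gating network output, a sparse tensor with the top k elements along the last dimension populated.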
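The data flow in the diagram can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the diagram author's implementation: the function name `noisy_topk_router`, the weight names (`w_route`, `b_route`, `w_noise`, `b_noise`) and the toy sizes are hypothetical. The sketch also assumes that the non-top-k entries are masked with negative infinity before the softmax, so that they come out as exact zeros; this is one common way to realize the "zero out rest" step.

```python
import numpy as np

def softplus(x):
    # Numerically stable softplus: log(1 + exp(x)).
    return np.logaddexp(0.0, x)

def noisy_topk_router(x, w_route, b_route, w_noise, b_noise, top_k, rng):
    """x has shape (B, T, C). Returns (gates, indices).

    gates:   (B, T, N_Experts), sparse, each row sums to 1
    indices: (B, T, top_k), the selected experts per token
    """
    logits = x @ w_route + b_route        # routing branch, (B, T, N_Experts)
    noise_logits = x @ w_noise + b_noise  # noise branch,   (B, T, N_Experts)

    # Gaussian noise N(0, 1), scaled element-wise by softplus(noise_logits).
    noise = rng.standard_normal(logits.shape) * softplus(noise_logits)
    noisy_logits = logits + noise         # noise-added logits

    # Indices of the top_k largest values along the last dimension.
    indices = np.argsort(noisy_logits, axis=-1)[..., -top_k:]

    # Assumption: mask everything else with -inf so softmax yields exact zeros.
    masked = np.full_like(noisy_logits, -np.inf)
    top_values = np.take_along_axis(noisy_logits, indices, axis=-1)
    np.put_along_axis(masked, indices, top_values, axis=-1)

    # Softmax along the last dimension (shifted by the row max for stability).
    masked = masked - masked.max(axis=-1, keepdims=True)
    exp = np.exp(masked)
    gates = exp / exp.sum(axis=-1, keepdims=True)
    return gates, indices

# Toy usage with hypothetical sizes.
rng = np.random.default_rng(0)
B, T, C, n_experts, top_k = 2, 4, 8, 4, 2
x = rng.standard_normal((B, T, C))
w_route = rng.standard_normal((C, n_experts)) * 0.1
w_noise = rng.standard_normal((C, n_experts)) * 0.1
b_route = np.zeros(n_experts)
b_noise = np.zeros(n_experts)

gates, indices = noisy_topk_router(x, w_route, b_route, w_noise, b_noise, top_k, rng)
print(gates.shape)  # (2, 4, 4)
```

Each token's row in `gates` has exactly `top_k` nonzero weights that sum to 1, which matches the sparse router output described in the diagram. The softplus keeps the learned noise scale positive, so the noise can perturb which experts land in the top k without flipping sign arbitrarily.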

nai

by lilith
