Router: Noisy top-k Gating
[Figure: flow diagram of the router. Legend: "+" is element-wise addition, "×" is element-wise multiplication.]
Multi-head self-attention output (B,T,C)
-> Linear layer -> output tensor of shape (B,T,N_Experts)
-> Linear layer for noise -> noise logits tensor of shape (B,T,N_Experts) -> softplus
Gaussian noise (N(0,1)) of shape (B,T,N_Experts) × softplus(noise logits) -> noise tensor of shape (B,T,N_Experts)
Output tensor + noise tensor -> noise-added logits
-> keep top k in last dimension, zero out rest (B,T,N_Experts)
-> softmax along last dim
-> router/gating network output: sparse tensor with the top k elements along the last dimension populated
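The flow in the diagram can be sketched in plain Python for a single token and then mapped over the batch and time dimensions. This is a minimal sketch under stated assumptions, not the design author's implementation: the function names (`softplus`, `linear`, `noisy_topk_gate`, `route`), the weight layout, and the tiny sizes are all hypothetical. One detail is assumed as well: entries outside the top k are given zero gate weight directly, which matches masking them to negative infinity before the softmax. Literally zeroing the logits before the softmax would leave every expert with a non-zero weight, so the output would not be sparse.

```python
import math
import random


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def linear(x, weight, bias):
    # x: C floats; weight: N_Experts rows of C floats; bias: N_Experts floats.
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def noisy_topk_gate(x, w_route, b_route, w_noise, b_noise, k, rng):
    """Gate one token vector x of length C; returns N_Experts gate weights."""
    logits = linear(x, w_route, b_route)        # router logits
    noise_logits = linear(x, w_noise, b_noise)  # noise logits
    # Gaussian noise N(0,1) scaled element-wise by softplus(noise logits),
    # then added element-wise to the router logits.
    noisy = [l + rng.gauss(0.0, 1.0) * softplus(n)
             for l, n in zip(logits, noise_logits)]
    # Indices of the k largest noise-added logits.
    top = sorted(range(len(noisy)), key=lambda i: noisy[i], reverse=True)[:k]
    # Softmax over the kept entries only; all other experts get exactly 0.
    # (Assumption: equivalent to masking the rest to -inf before the softmax.)
    m = max(noisy[i] for i in top)
    exps = {i: math.exp(noisy[i] - m) for i in top}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(noisy))]


def route(batch, w_route, b_route, w_noise, b_noise, k, seed=0):
    # batch: nested lists of shape (B, T, C); result has shape (B, T, N_Experts).
    rng = random.Random(seed)
    return [[noisy_topk_gate(tok, w_route, b_route, w_noise, b_noise, k, rng)
             for tok in seq] for seq in batch]


# Tiny usage example with hypothetical sizes.
B, T, C, N_EXPERTS, K = 2, 3, 4, 5, 2
init = random.Random(42)


def rand_matrix(rows, cols):
    return [[init.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


w_route, w_noise = rand_matrix(N_EXPERTS, C), rand_matrix(N_EXPERTS, C)
b_route, b_noise = [0.0] * N_EXPERTS, [0.0] * N_EXPERTS
batch = [rand_matrix(T, C) for _ in range(B)]
gates = route(batch, w_route, b_route, w_noise, b_noise, K)
```

Each row of `gates` has exactly `K` non-zero entries that sum to 1, which is the sparse router output the diagram ends with. In a tensor library the same steps would normally be vectorised over (B,T) rather than looped.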
nai
by lilith