[Figure: a neural network parameterized by θ takes datasets as inputs and predicts in a forward pass; the training loss is optimized across millions of datasets.]
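The idea of a model that takes a whole dataset as input, predicts in one forward pass, and is trained on a loss averaged over many datasets can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the source: the synthetic prior in `sample_dataset`, the single-parameter kernel smoother standing in for the neural network, the squared-error loss, and the grid search used in place of gradient-based training.

```python
import numpy as np

def sample_dataset(rng, n=32, d=2):
    """Draw one synthetic dataset (hypothetical prior: random linear function plus noise)."""
    w = rng.normal(size=d)
    X = rng.normal(size=(n, d))
    y = X @ w + 0.1 * rng.normal(size=n)
    return X, y

def forward(theta, X_train, y_train, X_test):
    """Single forward pass: the training set is an input, nothing is fitted per dataset.

    The stand-in "network" is an attention-style kernel smoother whose only
    parameter theta is the log-bandwidth (an assumption, not the source's model).
    """
    # Squared distances between every test point and every training point.
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    logits = -d2 / np.exp(theta)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ y_train

def training_loss(theta, n_datasets=200, seed=0):
    """Mean held-out squared error, averaged across many sampled datasets."""
    rng = np.random.default_rng(seed)
    losses = []
    for _ in range(n_datasets):
        X, y = sample_dataset(rng)
        X_tr, y_tr, X_te, y_te = X[:24], y[:24], X[24:], y[24:]
        pred = forward(theta, X_tr, y_tr, X_te)
        losses.append(np.mean((pred - y_te) ** 2))
    return float(np.mean(losses))

# Grid search over theta stands in for gradient-based optimization.
thetas = np.linspace(-3.0, 3.0, 13)
losses = [training_loss(t) for t in thetas]
best_theta = float(thetas[int(np.argmin(losses))])
```

The point of the sketch is the structure of the objective: θ is shared across all datasets, and each dataset contributes one held-out loss term computed from a single call to `forward`. A real system would use a far larger model, far more datasets, and stochastic gradient descent.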