It's generally not, since the exact results of floating-point operations depend on operation order, and in most modern frameworks the exact calculations during training aren't fully deterministic, for performance reasons. You'll get slightly different results/gradients depending on whether you run the same matrix multiplication on a CPU, a GPU, a different GPU model, split across multiple GPUs, etc.
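A minimal sketch of the underlying issue (PyTorch here, but plain NumPy shows the same thing): float addition isn't associative, so summing the same values in a different order can change the low bits.

```python
import torch

# Float addition is not associative, so reduction order changes the low bits.
x = torch.randn(1_000_000, dtype=torch.float32)
s_forward = x.sum()          # one reduction order
s_reverse = x.flip(0).sum()  # the same values, summed in reverse
print(s_forward.item(), s_reverse.item())
print((s_forward - s_reverse).item())  # typically a small nonzero difference
```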
The general view is that those variations shouldn't impact model accuracy (other concerns mentioned here, like randomization for initialization, dropout, or sample selection, do affect accuracy, so there are tools to make them reproducible from random seeds). And we care a lot about training performance unless accuracy is impacted, so not much engineering attention goes into ensuring the model weights are exactly identical and verifiable; most users wouldn't accept a performance hit for that.
This reflects my experience as well. Some frameworks, like PyTorch, have a reproducibility setting that makes everything execute deterministically, at the expense of performance.
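For reference, a sketch of what that looks like in PyTorch; these are the documented knobs, though the exact set you need varies by version and by which ops your model uses:

```python
import os
import torch

# Required by deterministic cuBLAS on CUDA 10.2+; set before CUDA initializes.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

torch.manual_seed(0)                      # seed the CPU and CUDA RNGs
torch.use_deterministic_algorithms(True)  # error out on nondeterministic ops
torch.backends.cudnn.benchmark = False    # stop cuDNN autotuning from picking
                                          # different kernels run to run
```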
I've done lots of ensembling work where we train multiple copies of a model, and we would generally start with a different seed each time. If we start with the same seed but don't force the training to be deterministic, the results are typically different on each training run, though I haven't actually explored whether they are "less different" than runs where everything is initialized with different random seeds. There's also that loss landscape paper that looks at how the weights vary under different kinds of perturbations; it would be interesting to try the same thing with GPU thread noise as the only source of randomness and see what happens.
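If anyone wants to poke at this, a hypothetical toy version of that experiment: train twice from the same seed and measure how far apart the weights end up. On CPU with deterministic ops the difference should be exactly zero; on a GPU with nondeterministic kernels it generally won't be.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def train_once(seed: int, device: str = "cpu") -> torch.Tensor:
    """Train a toy regressor and return its flattened weights (hypothetical setup)."""
    torch.manual_seed(seed)
    model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1)).to(device)
    opt = torch.optim.SGD(model.parameters(), lr=0.05)
    data = torch.randn(256, 32, device=device)    # same seed -> same data and init
    target = torch.randn(256, 1, device=device)
    for _ in range(200):
        opt.zero_grad()
        F.mse_loss(model(data), target).backward()
        opt.step()
    return torch.cat([p.detach().flatten().cpu() for p in model.parameters()])

w_a = train_once(seed=0)
w_b = train_once(seed=0)  # same seed, separate run
print("max |w_a - w_b|:", (w_a - w_b).abs().max().item())
```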
To what extent could the differences in weights between training runs/architectures be bounded by some epsilon? This type of attack might still be possible with small changes to the weights, but a bound like that might at least make it harder.
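Operationally the check itself is trivial; something like this hypothetical helper would verify that two independently trained checkpoints agree parameter-by-parameter within epsilon. The hard part is that nondeterministic training gives you no guarantee such a bound exists.

```python
import torch

def within_epsilon(sd_a: dict, sd_b: dict, eps: float = 1e-5) -> bool:
    """True if every parameter in two state dicts matches within eps
    (hypothetical helper; assumes identical architectures and keys)."""
    return all((sd_a[k] - sd_b[k]).abs().max().item() <= eps for k in sd_a)
```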
He might mean features like dropout or out-of-sample training: randomness that's introduced during training. I believe you could reproduce a run if you were able to duplicate everything exactly, but I don't think libraries make that a priority.