> We train each GQN model simultaneously on 4 NVidia K80 GPUs for 2 million gradient steps. The values of the hyper-parameters used for optimisation are detailed in Table S1, and we show the effect of model size on final performance in Fig. S4.
> The values of all hyper-parameters were selected by performing informal search. We did not perform a systematic grid search owing to the high computational cost.
I would say having an obscene amount of compute is definitely a big competitive advantage, especially over smaller academic research labs.
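For anyone wondering what "informal search" usually means in practice: trying a handful of hand-picked or randomly sampled configurations rather than every point on a grid, because each configuration costs a full multi-day training run. A rough Python sketch of that idea (the ranges and the `train_and_evaluate` stub here are made up for illustration; the actual hyper-parameter values are in the paper's Table S1):

```python
import random

# Illustrative ranges only -- the real hyper-parameter values live in the paper's Table S1.
search_space = {
    "learning_rate": [5e-5, 1e-4, 5e-4],
    "batch_size": [24, 36, 72],
    "generation_steps": [8, 12],
}

def train_and_evaluate(config):
    # Stand-in for a full training run (2 million gradient steps on 4 GPUs).
    # Here it just returns a dummy score so the sketch runs end to end.
    return random.random()

def informal_search(space, num_trials=5, seed=0):
    # Try a handful of sampled configs instead of all 3 * 3 * 2 = 18 grid points,
    # since each point would cost a multi-day training run.
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(num_trials):
        config = {name: rng.choice(values) for name, values in space.items()}
        score = train_and_evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

best, _ = informal_search(search_space)
```

Even the "cheap" version burns one full training run per trial, which is exactly why the grid was out of reach for them and is out of reach for most labs.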