But then you aren't really throwing "random noise" at it, are you? It's more like you're throwing generated datasets with abstract structures at it, and using the randomization to keep it from overfitting on other, accidental structures that might be present in any individual image: since everything else is randomized, there are no other structures to speak of in the "average" (which does sound like a very sensible way to train a network on abstract structures). Or am I misunderstanding the method here?
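To make the "average" argument concrete, here's a toy sketch. It's my own illustration, not the paper's method, and `make_sample` / `average_image` are made-up names. Each hypothetical 1-D "image" carries the same fixed structure (a bright bar in the middle) drawn over uniformly random background pixels. Averaged over many samples, the background flattens out to roughly 0.5 everywhere while the bar stays at 1.0, so the only consistent structure left is the one you put there on purpose.

```python
import random

def make_sample(width, rng):
    """Hypothetical toy generator: a fixed 'structure' (a bright bar in the
    middle) drawn on top of uniformly random background pixels."""
    img = [rng.random() for _ in range(width)]  # random nuisance pixels in [0, 1)
    mid = width // 2
    for i in range(mid - 2, mid + 2):  # the structure we actually care about
        img[i] = 1.0
    return img

def average_image(n, width, seed=0):
    """Per-pixel mean over n independently generated samples."""
    rng = random.Random(seed)
    total = [0.0] * width
    for _ in range(n):
        for i, v in enumerate(make_sample(width, rng)):
            total[i] += v
    return [t / n for t in total]

avg = average_image(n=10_000, width=16)
# Background pixels land near 0.5; the bar pixels (indices 6..9) stay at 1.0.
print([round(v, 2) for v in avg])
```

Any accidental pattern in a single sample's background is just noise that washes out over the whole generated set.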
Oh, for sure, and I don't mean to accuse the authors; if pop-sci articles spread confusion about their work, that's not their fault. I just want to clear things up for myself.