>>14186721
Consider this:
A typical midrange network these days can be trained on a graphics card, but needs a card with at least 10GB of VRAM, and the larger the network you want to train, the more memory you need. The state of the network is shuttled between VRAM and the GPU cores at easily over 100GB/s on a high end card, with a latency of a few hundred nanoseconds. Even reaching into system RAM over the PCIe bus, at a bandwidth as high as 30GB/s, is by comparison considered too slow to be worthwhile as working memory. A good residential internet connection tops out around 1Gb/s, i.e. 0.125GB/s, with a latency of maybe a few milliseconds if you're accessing a fairly local server. That's orders of magnitude slower. The latency especially will cause a lot of problems if you're trying to propagate back and forth through many separate computers. You could easily dip into seconds or even minutes per iteration.
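A quick back-of-envelope in Python makes the gap concrete. The 10GB payload is just an illustrative assumption (a full copy of the network state), and the latencies are the rough figures above:

```python
# Back-of-envelope: time to move one full copy of a network's state
# over each kind of link. Speeds/latencies are rough figures, and the
# 10 GB payload size is an assumption for illustration.

def transfer_time_s(size_gb, bandwidth_gb_s, latency_s=0.0):
    """Seconds to move `size_gb` gigabytes over a link with the
    given bandwidth (GB/s) and one-way latency (s)."""
    return latency_s + size_gb / bandwidth_gb_s

payload_gb = 10  # about what fits in a midrange card's VRAM

links = {
    "VRAM <-> GPU cores": (100.0, 300e-9),  # ~100 GB/s, ~hundreds of ns
    "system RAM via PCIe": (30.0, 1e-6),    # ~30 GB/s
    "1 Gb/s internet":     (0.125, 3e-3),   # 0.125 GB/s, ~few ms
}

for name, (bw, lat) in links.items():
    t = transfer_time_s(payload_gb, bw, lat)
    print(f"{name}: {t:.3f} s per full transfer")
```

The internet link comes out around 80 seconds per 10GB transfer versus a tenth of a second on-card, roughly 800x slower before you even count the per-hop latency stacking up across machines.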
Maybe with some very specialized architecture that divides the network into mostly isolated nodes it might kinda-sorta-maybe work. But why bother, when you can buy processors these days that are designed from the start to train giant networks efficiently?