>>14148465
A lot of modern machine learning techniques are based on having multiple independent networks in parallel that process the input data in different ways to extract different information, and then other networks downstream that combine the outputs of the preceding networks to accomplish some task. Each of these sub-networks can be (at least in part) trained separately, and have its specific shape and training technique optimized for its sub-task. The idea is that these segmented, hand-crafted topologies can be trained much faster than monolithic networks trained from scratch to accomplish something really abstract, complicated and hard to quantify.
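Roughly what I mean, as a minimal sketch (assuming PyTorch; all the module names, shapes and the frozen-branch choice are made up for illustration): two parallel branches extract different features from the same input, a downstream head fuses them, and one branch can be trained or frozen independently of the rest.

import torch
import torch.nn as nn

class BranchA(nn.Module):          # e.g. a small convolutional feature extractor
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten())
    def forward(self, x):
        return self.net(x)         # -> (batch, 16)

class BranchB(nn.Module):          # e.g. a global-statistics branch over raw pixels
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 16), nn.ReLU())
    def forward(self, x):
        return self.net(x)         # -> (batch, 16)

class Combiner(nn.Module):         # downstream network that fuses both branches
    def __init__(self, n_classes=10):
        super().__init__()
        self.branch_a, self.branch_b = BranchA(), BranchB()
        self.head = nn.Linear(16 + 16, n_classes)
    def forward(self, x):
        feats = torch.cat([self.branch_a(x), self.branch_b(x)], dim=1)
        return self.head(feats)

model = Combiner()
# Pretend branch_a was pretrained separately: freeze it and train only the rest.
for p in model.branch_a.parameters():
    p.requires_grad = False
logits = model(torch.randn(4, 3, 32, 32))   # -> (4, 10)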
There's a lot of wizardry involved in crafting good datasets, since it's often impossible to get sufficiently large hand-labelled sets, and for certain problems there isn't really anything to label, so alternative techniques are needed. A lot of work has gone into finding the specific advantages and weaknesses of each of those techniques. I remember a "curiosity"-based learning system that was rewarded whenever it received novel input; it performed remarkably well at hard-to-score tasks, but could be thrown completely off by any source of random input.
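The usual way that kind of curiosity bonus gets implemented (this is a sketch of the prediction-error flavour, not necessarily the exact system I'm remembering; dimensions and names are made up) is to reward the agent by how badly a learned forward model predicts the next observation, which also makes the "random input breaks it" failure obvious:

import torch
import torch.nn as nn

obs_dim, act_dim = 8, 4
forward_model = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                              nn.Linear(64, obs_dim))
opt = torch.optim.Adam(forward_model.parameters(), lr=1e-3)

def curiosity_reward(obs, act, next_obs):
    # Intrinsic reward = squared error of the forward model's prediction.
    pred = forward_model(torch.cat([obs, act], dim=-1))
    error = ((pred - next_obs) ** 2).mean(dim=-1)
    # Train the model so familiar transitions stop being rewarding.
    opt.zero_grad()
    error.mean().backward()
    opt.step()
    return error.detach()          # high for novel transitions, low for seen ones

obs = torch.zeros(1, obs_dim)
act = torch.zeros(1, act_dim)
print(curiosity_reward(obs, act, torch.zeros(1, obs_dim)))   # shrinks with repetition
print(curiosity_reward(obs, act, torch.randn(1, obs_dim)))   # pure noise stays "novel" forever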
My point is that a lot of progress has been made, not just by running larger networks on more powerful computers, but by exploring the specific strengths and weaknesses of each network architecture, how best to combine them, and how to train them. We're not just throwing giant labelled datasets at giant convolutional classifier networks.