>>13321780
You have good intuitions here.
To drive this home, minimizing the negative log likelihood is literally a special case of policy gradients: treat the label as the action that was taken and give it a reward of 1. And you can basically recast any machine learning objective function as an RL one if you fiddle with it enough.
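To make the NLL claim concrete, here is a rough numerical check. It is only a sketch: the logits and label are made up, and the function names (`nll_grad`, `reinforce_grad`) are mine, not from any library. It treats the softmax as a policy, the label as the action, and the reward as 1. Under those assumptions the REINFORCE surrogate gradient and the NLL gradient come out as the same vector.

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def nll_grad(logits, label):
    # Gradient of -log softmax(logits)[label] w.r.t. logits: p - onehot(label)
    g = softmax(logits)
    g[label] -= 1.0
    return g

def reinforce_grad(logits, action, reward):
    # Gradient of the surrogate loss -reward * log pi(action) w.r.t. logits,
    # using grad log pi(action) = onehot(action) - p
    grad_log_pi = -softmax(logits)
    grad_log_pi[action] += 1.0
    return -reward * grad_log_pi

logits = np.array([0.5, -1.2, 2.0])   # made-up values
label = 1                             # made-up label

g_nll = nll_grad(logits, label)
g_pg = reinforce_grad(logits, action=label, reward=1.0)
print(np.allclose(g_nll, g_pg))       # prints True
```

The caveat is that in on-policy RL the action is sampled from the policy itself, whereas here the dataset hands you the action. That is exactly the "fiddling" part.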
More broadly, however, the key to understanding RL is thinking about what we are really doing when we optimize.
In standard supervised learning, we construct an objective function, and the system optimizes that function. However, we don't really care about the objective function; we care about performance on a given task. To put this concretely, you don't really give two shits about the value of the objective function of your trained neural network, you care about its accuracy. Generally speaking you can't derive a useful gradient for "accuracy" (it is piecewise constant in the parameters, so its gradient is zero almost everywhere), so you use a proxy objective function.
Reinforcement learning is special because it avoids this problem. You are directly optimizing task performance (the expected reward), not a proxy measurement. This might not seem like a big deal, but it is very significant.
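As a toy illustration of optimizing the task metric itself, here is a minimal REINFORCE sketch where the reward is literally "was the sampled prediction correct". Everything in it is an assumption for the sake of the example: the two-blob dataset, the linear softmax policy, the learning rate, and the batch-mean baseline. No cross-entropy loss appears anywhere; the only training signal is the 0/1 reward.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up toy data: two Gaussian blobs in 2D, labelled 0 and 1.
n = 200
X = np.vstack([rng.normal(-1.0, 0.5, size=(n, 2)),
               rng.normal(+1.0, 0.5, size=(n, 2))])
y = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])

W = np.zeros((2, 2))  # linear policy: logits = X @ W

def policy(X, W):
    z = X @ W
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def accuracy(X, y, W):
    return float(np.mean(np.argmax(X @ W, axis=1) == y))

lr = 0.5
for step in range(200):
    p = policy(X, W)
    # Sample one prediction (action) per example from the policy.
    actions = (rng.random(len(X)) < p[:, 1]).astype(int)
    reward = (actions == y).astype(float)     # 1 if correct, else 0
    advantage = reward - reward.mean()        # simple batch-mean baseline
    # grad of log pi(a|x) w.r.t. logits is onehot(a) - p
    grad_logp = -p
    grad_logp[np.arange(len(X)), actions] += 1.0
    # REINFORCE step: ascend the mean of advantage * grad log pi
    W += lr * X.T @ (grad_logp * advantage[:, None]) / len(X)

final_acc = accuracy(X, y, W)
print(final_acc)
```

On a problem this easy it will be noisier and slower than just minimizing cross-entropy. The point is only that the thing being pushed up is expected accuracy, not a stand-in for it.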
I agree with your notion that an AGI will be modular. And in my mind that just strengthens the arguments for RL. The "central AGI model" can dole out rewards to the subsystems based on their contribution to task performance. You don't need to waste time tinkering with getting the right objective functions.
The other thing RL really highlights is that any "intelligent agent" is ultimately defined by its environment, or its task. I don't think it would be possible to define an objective function for AGI, but it is conceivable that we can design environments that only something with general intelligence could thrive in. This makes the problem far more tractable.
I am basically in agreement with you, but it's just these little nuances that make all the difference for me. And in general I do think representation learning is a valuable line of research.