what's missing from machine learning is a roadmap for getting from the mass of data already collected to useful abstractions.
for humans, image detection is a small part of a bigger pie we call understanding or knowledge. no current AI system has an analogous function. they are either doing verification proofs of a heuristic function, fitting observed data to trained floating point numbers, or working within very well-defined scopes with limited choices, where they are allowed to develop interesting strategies through self-learning (playing a maximisation game with floor or ceiling functions based on some scoring system).
humans are biointegrated: we have multiple sensations and feelings, and embedded structural hierarchies of inputs and outputs. the richness of that entire experience must be a part of our intelligence, from feeling your skin to sensing a dull pressure behind your eyeballs to slightly shifting your weight and adjusting your posture. a machine complex enough to simulate all this competing information is beyond the scale of current technology, so it comes down to useful abstractions. which bits do you need for intelligence? so far the field has agreed you could probably get away with just language, which is why computer languages developed the way they did.
it's a giant gulf to cross from natural languages to machine languages, and work has been stalled there since the 1960s.
what AI is great at is human aid devices, like letting you see other spectrums of light, sensing distant vibrations, or keying you up to a satellite and giving you brain internet. what it's terrible at is actual intelligence. if intelligence is the goal, AI has so far only shown the capacity to sharpen or dull the human intellect, depending on how you feel about load-sharing technologies.