>>12498165
Consider a probability distribution p_1, ..., p_n
-log(p_i) can be interpreted as a degree of surprise: the lower the probability, the higher the surprise
Entropy is the expected/mean surprise of the distribution.
The less informative the distribution, the higher the entropy.
Entropy is maximum when all p_i are equal (uniform distribution)
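
Quick sketch in Python if you want to check this numerically. The function names and the example distributions are made up for illustration, and it assumes natural log (so the result is in nats) and the usual convention that outcomes with p = 0 contribute nothing:

```python
import math

def surprise(p):
    """Surprise of an outcome with probability p, i.e. -log(p), in nats."""
    return -math.log(p)

def entropy(probs):
    """Expected surprise of a discrete distribution.

    Outcomes with probability 0 are skipped (convention: 0 * log(0) = 0).
    """
    return sum(p * surprise(p) for p in probs if p > 0)

# Hypothetical example distributions over 4 outcomes
uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.7, 0.1, 0.1, 0.1]
certain = [1.0, 0.0, 0.0, 0.0]

print(entropy(uniform))  # the maximum for n = 4, equal to log(4)
print(entropy(skewed))   # lower than the uniform case
print(entropy(certain))  # no surprise at all when one outcome is certain
```

Swap math.log for math.log2 if you want the answer in bits instead of nats.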
This generalizes to infinite/continuous distributions in pretty much the same way.