Can someone help me understand Shannon Entropy?
So I get the basic concept: entropy is a measure of the uncertainty over all the possible outcomes of a message.
However, the books I've read then go into the surprise of a message. English doesn't have uniformly distributed letters, so an English-language message carries less surprise than 26^n equally likely possible messages would imply, since some letters tend to follow others.
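To put a number on that first part, here's a toy Python sketch (the skewed frequencies below are invented for illustration, not real English counts): just skewing the letter distribution already drops the per-letter entropy below log2(26).

```python
import math

def entropy_bits(dist):
    """Shannon entropy in bits: H = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Uniform over 26 letters: every letter equally likely.
uniform = {chr(ord('a') + i): 1 / 26 for i in range(26)}
print(entropy_bits(uniform))   # log2(26) ~= 4.70 bits per letter

# Made-up skewed distribution (NOT real English frequencies),
# just to show that skew alone lowers the per-letter entropy.
skewed = {'e': 0.30, 't': 0.20, 'a': 0.15, 'o': 0.10, 'n': 0.10, 's': 0.10, 'z': 0.05}
print(entropy_bits(skewed))    # ~2.61 bits, less than log2(7) ~= 2.81
```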
The problem I have conceptualizing this is that the text then went into how letters tend to appear in particular words, and words tend to appear together as well, so the surprise is actually even lower.
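I think that "letters follow letters" part is conditional entropy: once you condition on the previous symbol, the average per-symbol surprise can only drop. Another toy sketch, with a made-up two-symbol source:

```python
import math

def entropy_bits(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Toy two-symbol source; all numbers invented for illustration.
p_symbol = {'a': 0.5, 'b': 0.5}                 # marginal distribution: 1 bit/symbol
p_next_given = {'a': {'a': 0.9, 'b': 0.1},      # 'a' is usually followed by another 'a'
                'b': {'a': 0.1, 'b': 0.9}}      # 'b' is usually followed by another 'b'

h_marginal = entropy_bits(p_symbol.values())
# Conditional entropy: H(next | prev) = sum over prev of p(prev) * H(next given prev)
h_conditional = sum(p_symbol[prev] * entropy_bits(p_next_given[prev].values())
                    for prev in p_symbol)

print(h_marginal)     # 1.0 bit per symbol if you ignore context
print(h_conditional)  # ~0.47 bits per symbol once the previous letter is known
```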
This got me thinking: isn't surprise also totally reliant on what you know about whoever you're getting the message from?
>"Fuck, I just tapped to traps again"
Has little surprise value on /r9k/, but very high value when it is delivered by a pastor at church.
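The way I'd put numbers on that: the surprisal of one message is -log2 of the probability your model of the sender assigns it, so the same string scores completely differently under different models. (Both probabilities below are obviously invented.)

```python
import math

def surprisal_bits(p):
    """Surprisal of a single outcome: -log2 p(outcome)."""
    return -math.log2(p)

# Completely made-up probabilities for the same sentence under two source models.
p_on_r9k      = 0.05   # pretty typical post there
p_from_pastor = 1e-7   # wildly out of character

print(surprisal_bits(p_on_r9k))       # ~4.3 bits
print(surprisal_bits(p_from_pastor))  # ~23.3 bits
```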
In a deterministic universe, the God's eye view of things would see zero surprise in any message, and the whole idea of physics as information starts to seem silly.
The reason I think I might be getting it wrong is that a passage basically said you could get your distribution of letters wrong and end up with the wrong value for surprise. But to my mind, there is no single "right" value of surprise short of the God's-eye case where it's zero: if you know exactly who is talking to you and what they are about to say, that's a lot different from randomly generated text.
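If I'm reading it right, that "wrong distribution" passage sounds like cross-entropy: if the data really come from p but you model them with q, your average surprise is -sum p log2 q, which is never below the true entropy of p. Toy sketch with made-up numbers:

```python
import math

def cross_entropy_bits(p, q):
    """Average surprise in bits if data come from p but you model them with q."""
    return -sum(p[x] * math.log2(q[x]) for x in p)

# Made-up true source p, and a mistaken uniform model q, over three symbols.
p       = {'a': 0.7, 'b': 0.2, 'c': 0.1}
q_wrong = {'a': 1/3, 'b': 1/3, 'c': 1/3}

print(cross_entropy_bits(p, p))        # ~1.16 bits: the true entropy of p
print(cross_entropy_bits(p, q_wrong))  # ~1.58 bits: always >= the true entropy
```

So maybe the "right" value the book means is just relative to the true source distribution, not to any particular listener?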
Second, how does information science deal with synonyms? I can't find this anywhere. There are tons of ways to say the exact same thing in English with different words. Entropy can be a total measure of possible messages, but it can't be a total measure of possible information when strings of the same length can represent the exact same idea in multiple of their possible configurations, right? I still get that the concept is useful; it just doesn't seem like the hard limit it is described as.
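The closest I can get on my own is treating "meaning" as a many-to-one function of the string: if several strings collapse to the same idea, the entropy over meanings can't exceed the entropy over strings. Toy numbers, all invented:

```python
import math
from collections import defaultdict

def entropy_bits(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Made-up distribution over four strings, two of which mean the same thing.
p_string   = {'big': 0.25, 'large': 0.25, 'small': 0.30, 'red': 0.20}
meaning_of = {'big': 'BIG', 'large': 'BIG', 'small': 'SMALL', 'red': 'RED'}

# Collapse synonyms: "meaning" is a many-to-one function of the string.
p_meaning = defaultdict(float)
for s, p in p_string.items():
    p_meaning[meaning_of[s]] += p

print(entropy_bits(p_string))   # ~1.99 bits over strings
print(entropy_bits(p_meaning))  # ~1.49 bits over meanings: never more than the string entropy
```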