Self-information

From Wikipedia, the free encyclopedia

In information theory, self-information is a measure of the information content associated with the outcome of a random variable. It is expressed in a unit of information such as the bit; the unit is determined by the base of the logarithm used, with base 2 giving bits.

By definition, the amount of self-information contained in a probabilistic event depends only on the probability of that event: the smaller its probability, the larger the self-information associated with learning that the event has indeed occurred.

Further, by definition, the measure of self-information has the following property. If an event C is the joint occurrence of two mutually independent events A and B, then the amount of information conveyed by the proclamation that C has happened equals the sum of the amounts of information conveyed by the proclamations of event A and event B respectively.

Taking into account these properties, the self-information $I(\omega_n)$ (measured in bits) associated with outcome $\omega_n$ is:

$I(\omega_n) = \log_2 \left(\frac{1}{\Pr(\omega_n)} \right) = - \log_2(\Pr(\omega_n))$

This definition, using the binary logarithm function, complies with the above conditions.
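As a minimal sketch of the definition above, the following Python function computes self-information in bits and checks both conditions numerically. The function name `self_information` and the example probabilities are illustrative assumptions, not part of any standard library.

```python
import math

def self_information(p: float) -> float:
    """Return the self-information, in bits, of an outcome with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in the interval (0, 1]")
    return math.log2(1.0 / p)

# Condition 1: the smaller the probability, the larger the self-information.
assert self_information(0.1) > self_information(0.9)

# Condition 2: for independent events A and B, I(A and B) = I(A) + I(B).
p_a, p_b = 0.5, 0.25  # hypothetical probabilities chosen for illustration
assert math.isclose(self_information(p_a * p_b),
                    self_information(p_a) + self_information(p_b))
```

A probability of zero is rejected here because the logarithm is undefined at that point; an outcome that is certain (p = 1) carries 0 bits.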

This measure has also been called surprisal, as it represents the "surprise" of seeing the outcome (a highly probable outcome is not surprising). This term was coined by Myron Tribus in his 1961 book Thermostatics and Thermodynamics.

The information entropy of a random variable is the expected value of the self-information of its outcomes.
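The relationship between entropy and self-information can be sketched for a discrete distribution as follows. The helper name `entropy` and the two example distributions are assumptions made for illustration; the sketch weights the self-information of each outcome by its probability and sums the results.

```python
import math

def entropy(probabilities):
    """Expected self-information, in bits, of a discrete distribution."""
    if not math.isclose(sum(probabilities), 1.0):
        raise ValueError("probabilities must sum to 1")
    # Outcomes with zero probability contribute nothing to the expectation.
    return sum(p * math.log2(1.0 / p) for p in probabilities if p > 0)

fair_coin = entropy([0.5, 0.5])   # every outcome carries 1 bit
fair_die = entropy([1 / 6] * 6)   # every outcome carries log2(6) bits
```

For a uniform distribution every outcome has the same self-information, so the entropy equals that common value.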

Self-information is an example of a proper scoring rule.

Examples

On tossing a coin, the chance of 'tail' is 0.5. When it is proclaimed that indeed 'tail' occurred, this amounts to

I('tail') = log₂ (1/0.5) = log₂ 2 = 1 bit of information.

When throwing a die, the probability of 'four' is 1/6. When it is proclaimed that 'four' has been thrown, the amount of self-information is

I('four') = log₂ (1/(1/6)) = log₂ (6) ≈ 2.585 bits.

When, independently, two dice are thrown, the amount of information associated with {throw 1 = 'two' & throw 2 = 'four'} equals

I('throw 1 is two & throw 2 is four') = log₂ (1/P(throw 1 = 'two' & throw 2 = 'four')) = log₂ (1/(1/36)) = log₂ (36) ≈ 5.170 bits.

This result equals the sum of the individual amounts of self-information associated with {throw 1 = 'two'} and {throw 2 = 'four'}; namely 2.585 + 2.585 = 5.170 bits.
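The three examples above can be reproduced with a short Python sketch. The function and variable names are hypothetical, and a fair coin and fair dice are assumed.

```python
import math

def self_information(p: float) -> float:
    """Self-information in bits of an outcome with probability p (0 < p <= 1)."""
    return math.log2(1.0 / p)

tail = self_information(0.5)             # coin toss shows 'tail'
four = self_information(1 / 6)           # single die shows 'four'
two_and_four = self_information(1 / 36)  # two independent dice show 'two' and 'four'

print(f"{tail:.3f} {four:.3f} {two_and_four:.3f}")  # prints "1.000 2.585 5.170"
```

Because the two throws are independent, `two_and_four` agrees with `2 * four` up to floating-point rounding.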