Because protocols need to be changed, improved, and fixed from time to time, it is essential to have a protocol negotiation step at the start of every networked interaction, and protocol requirements at the start of every store-and-forward communication.

But we also want anyone, anywhere, to be able to introduce new protocols without having to coordinate with everyone else, because attempts to coordinate the introduction of new protocols have ground to a halt as more and more people have become involved in coordination and decision-making. The IETF is paralyzed and moribund.

So we need a large enough address space that anyone can give his protocol an identifier without fear of stepping on someone else's identifier. But this involves inefficiently long protocol identifiers, which can become painful if we have lots of protocol negotiation, where one system asks another system what protocols it supports. We might have lots of protocols in lots of variants each with long names.
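
How large is large enough? Here is a rough sketch. It assumes each protocol author picks an identifier uniformly at random, which the text does not specify. It uses the standard birthday approximation for the chance that any two of n identifiers of a given bit length collide. The function name `collision_probability` is hypothetical.

```python
import math


def collision_probability(n_ids: int, id_bits: int) -> float:
    """Approximate probability that at least two of n_ids identifiers,
    each drawn uniformly at random from 2**id_bits values, are equal
    (the birthday approximation)."""
    space = 2.0 ** id_bits
    pairs = n_ids * (n_ids - 1) / 2.0
    # P(collision) ~= 1 - exp(-pairs / space); expm1 keeps precision when tiny.
    return -math.expm1(-pairs / space)


for bits in (32, 64, 128):
    print(bits, collision_probability(1_000_000, bits))
```

With a million randomly named protocols, 32-bit identifiers almost certainly collide. At 64 bits the chance is about 3 in 100 million, and at 128 bits it is negligible. This is why identifiers that can safely be chosen without coordination end up long.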

So our system forms a guess as to the likelihood of a protocol, and then sends or requests enough bits to reliably identify that protocol. But this means it must estimate probabilities from limited data. If one's data is limited, priors matter, and thus a Bayesian approach is required.
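
Information theory gives a natural yardstick for "enough bits". An outcome with estimated probability p has an ideal code length of about -log2 p bits. The sketch below assumes a hypothetical rule that rounds this up and adds a fixed safety margin. The margin is an illustration, not something the text specifies.

```python
import math


def bits_to_send(p: float, margin_bits: int = 0) -> int:
    """Hypothetical rule: identifier bits to send for a protocol whose
    estimated probability is p, i.e. ceil(-log2 p) plus a safety margin."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return math.ceil(-math.log2(p)) + margin_bits


print(bits_to_send(0.5))                   # 1
print(bits_to_send(1e-6))                  # 20
print(bits_to_send(1e-6, margin_bits=16))  # 36
```

A common protocol costs a bit or two and a rare one costs more. How many bits get spent therefore depends directly on how good the probability estimate is.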

The Bayesian prior is the probability of a probability, or, if this recursion is philosophically troubling, the probability of a frequency. We have an urn containing a very large number of samples, from which we have taken few or no samples. What proportion of samples in the urn will be discovered to have property X?

Let $P_{\text{prior}}(\rho)$ be our prior estimate of the probability that the proportion of samples in the urn that are X is $\rho$.

Then our prior estimate of the chance $P_X$ that the first sample will be X is

$$P_X = \int_0^1 \rho \, P_{\text{prior}}(\rho) \, d\rho$$
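
The integral $P_X = \int_0^1 \rho \, P_{\text{prior}}(\rho) \, d\rho$ can be checked numerically. The minimal sketch below uses the midpoint rule on [0, 1]. The name `prob_first_is_x` is hypothetical, and `prior` is any density on [0, 1] supplied as a Python function.

```python
def prob_first_is_x(prior, n: int = 100_000) -> float:
    """P_X = integral over [0, 1] of rho * prior(rho), by the midpoint rule."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        rho = (i + 0.5) * h
        total += rho * prior(rho)
    return total * h


def uniform(rho: float) -> float:
    return 1.0


def rising(rho: float) -> float:
    # A prior that favours high proportions: density 2 * rho.
    return 2.0 * rho


print(prob_first_is_x(uniform))  # close to 1/2
print(prob_first_is_x(rising))   # close to 2/3
```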

Then if we take one sample out of the urn, and it is indeed X, then we update our prior by:

$$P_{\text{new}}(\rho) = \frac{\rho \, P_{\text{prior}}(\rho)}{P_X}$$
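
The same grid can carry out the update itself. This is a minimal sketch assuming a discretized prior; the grid size and the name `update_on_x` are illustrative. For a uniform prior, one observed X turns the density 1 into 2ρ.

```python
def update_on_x(prior, n: int = 10_000):
    """Bayes update after one sample that is X, on a midpoint grid over [0, 1]:
    P_new(rho) = rho * P_prior(rho) / P_X."""
    h = 1.0 / n
    grid = [(i + 0.5) * h for i in range(n)]
    unnormalized = [rho * prior(rho) for rho in grid]
    p_x = sum(unnormalized) * h  # P_X, the normalizing constant
    posterior = [u / p_x for u in unnormalized]
    return grid, posterior, p_x


def uniform(rho: float) -> float:
    return 1.0


grid, posterior, p_x = update_on_x(uniform)
print(p_x)                                                    # close to 0.5
print(max(abs(p - 2 * r) for r, p in zip(grid, posterior)))   # tiny
```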

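The Beta distribution is

$$P_{\alpha\beta}(\rho) = \frac{\rho^{\alpha-1}\,(1-\rho)^{\beta-1}}{B(\alpha,\beta)}$$

where $B(\alpha,\beta)$ is the normalization

$$B(\alpha,\beta) = \int_0^1 \rho^{\alpha-1}\,(1-\rho)^{\beta-1}\, d\rho$$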

$$B(\alpha,\beta) = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha+\beta)}$$

$$\Gamma(\alpha) = (\alpha-1)! \quad \text{for positive integer } \alpha$$
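
These formulas translate directly into code. The minimal sketch below uses Python's `math.lgamma` (the log of the Gamma function) to compute B(α, β) without overflow. It checks the result against the factorial form for integer arguments and confirms numerically that the density integrates to 1. The helper names are hypothetical.

```python
import math


def beta_fn(a: float, b: float) -> float:
    """B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b), via log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def beta_pdf(rho: float, a: float, b: float) -> float:
    """Beta(a, b) density rho^(a-1) (1-rho)^(b-1) / B(a, b), for 0 < rho < 1."""
    return rho ** (a - 1) * (1 - rho) ** (b - 1) / beta_fn(a, b)


# For integers, Gamma(k) = (k-1)!, so B(3, 4) = 2! * 3! / 6! = 1/60.
print(beta_fn(3, 4), 1 / 60)

# The density should integrate to 1 (midpoint rule).
n = 100_000
h = 1.0 / n
total = sum(beta_pdf((i + 0.5) * h, 2.5, 4.0) for i in range(n)) * h
print(total)  # close to 1
```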

It is convenient to take our prior to be a Beta distribution. If our prior for the proportion of samples that are X is the Beta distribution with parameters α, β, and we take three samples, one of which is X and two of which are not, then our new distribution is the Beta distribution with parameters α+1, β+2. In general, each X adds one to α and each non-X adds one to β.
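
Because the Beta family is closed under this update, the bookkeeping reduces to adding counts. Below is a minimal sketch (`beta_update` is a hypothetical name). It is cross-checked against Bayes' rule applied numerically on a grid, using the text's example of one X and two non-X samples.

```python
import math


def beta_pdf(rho: float, a: float, b: float) -> float:
    """Beta(a, b) density on (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return rho ** (a - 1) * (1 - rho) ** (b - 1) * math.exp(log_norm)


def beta_update(alpha: float, beta: float, n_x: int, n_not_x: int):
    """A Beta(alpha, beta) prior plus n_x X's and n_not_x non-X's
    gives a Beta(alpha + n_x, beta + n_not_x) posterior."""
    return alpha + n_x, beta + n_not_x


a0, b0 = 2.0, 2.0
a1, b1 = beta_update(a0, b0, n_x=1, n_not_x=2)

# Bayes' rule on a grid: posterior is proportional to
# the likelihood rho^1 (1-rho)^2 times the prior.
n = 10_000
h = 1.0 / n
grid = [(i + 0.5) * h for i in range(n)]
unnorm = [r * (1 - r) ** 2 * beta_pdf(r, a0, b0) for r in grid]
z = sum(unnorm) * h
max_err = max(abs(u / z - beta_pdf(r, a1, b1)) for r, u in zip(grid, unnorm))
print(a1, b1, max_err)  # 3.0 4.0, then a very small number
```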

If our distribution is the Beta distribution with parameters α, β, then the probability that the next sample will be X is

$$\frac{\alpha}{\alpha+\beta}$$
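
This is the mean of the Beta distribution, that is, the integral $\int_0^1 \rho \, P(\rho)\, d\rho$ with the Beta distribution in place of the prior. The minimal sketch below checks the closed form numerically. For the uniform prior α = β = 1, it also recovers Laplace's rule of succession, (s + 1)/(n + 2) after s X's in n samples. The names are hypothetical.

```python
import math


def prob_next_is_x(alpha: float, beta: float) -> float:
    """Predictive probability that the next sample is X under Beta(alpha, beta)."""
    return alpha / (alpha + beta)


def beta_pdf(rho: float, a: float, b: float) -> float:
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return rho ** (a - 1) * (1 - rho) ** (b - 1) * math.exp(log_norm)


# Closed form versus the integral of rho * P(rho) for Beta(3, 5).
n = 100_000
h = 1.0 / n
numeric = sum(((i + 0.5) * h) * beta_pdf((i + 0.5) * h, 3, 5) for i in range(n)) * h
print(prob_next_is_x(3, 5), numeric)  # both close to 0.375

# Uniform prior, then s = 7 X's out of 10 samples: Beta(8, 4).
s, total = 7, 10
print(prob_next_is_x(1 + s, 1 + total - s))  # 8/12, about 0.667
```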

If α and β are large, the Beta distribution approximates a delta function centered on α/(α+β).

If α and β equal 1, then the Beta distribution assumes all probabilities equally likely.

If α and β are very small, the Beta distribution concentrates its weight near 0 and 1. It assumes that the proportion is probably extreme: almost all samples are X, or almost none of them are.
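
The three regimes show up in the variance of the distribution, αβ / ((α+β)²(α+β+1)). The minimal sketch below covers symmetric priors, α = β. The variance shrinks toward 0 as α grows, which is the delta-function limit. It equals 1/12 in the uniform case. As α shrinks toward 0, it approaches 1/4, the variance of a distribution split evenly between 0 and 1. The function name is hypothetical.

```python
def beta_variance(alpha: float, beta: float) -> float:
    """Variance of Beta(alpha, beta)."""
    s = alpha + beta
    return alpha * beta / (s * s * (s + 1))


for a in (100.0, 1.0, 0.01):
    # Symmetric prior: the mean is always 1/2; only the spread changes.
    print(a, beta_variance(a, a))
```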

α and β must be greater than zero.

The principle of maximum entropy tells us to choose our prior to be α=1, β=1. In practice, however, we usually have some reason to believe that all samples are alike, so we need a prior that gives weight to this possibility. Realistically, there is a nonzero probability that all samples are X, or that none are. No Beta distribution describes this case, because every Beta distribution assigns zero probability to any single value of ρ, including ρ = 1 and ρ = 0.

If our prior for the question “what proportion of men are mortal?” were a Beta distribution, we would not be convinced that all men are mortal until we had checked every man. So a Beta distribution is not always a plausible prior.
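
This point can be made quantitative. Take a finite population of N members, each of which is X independently with probability ρ. A uniform Beta(1, 1) prior on ρ then makes the number K of X members uniformly distributed on 0..N. If the first n members checked are all X, the posterior probability that all N are X is Laplace's (n + 1)/(N + 1). Below is a minimal sketch with an exact brute-force check over K. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def prob_all_x(n_checked: int, population: int) -> float:
    """P(all members are X | the n_checked examined were all X),
    uniform prior on the number of X members: (n + 1) / (N + 1)."""
    return (n_checked + 1) / (population + 1)


def prob_all_x_exact(n_checked: int, population: int) -> Fraction:
    """Brute force: uniform prior over K = 0..N; the likelihood of drawing
    n_checked X's without replacement is C(K, n) / C(N, n)."""
    weights = [Fraction(comb(k, n_checked), comb(population, n_checked))
               for k in range(population + 1)]
    return weights[population] / sum(weights)


print(prob_all_x_exact(5, 20), prob_all_x(5, 20))  # 2/7 and about 0.2857
# Hypothetical numbers: a million checked out of eight billion.
print(prob_all_x(1_000_000, 8_000_000_000))        # about 1.25e-4
```

With these illustrative numbers, having checked a million men still leaves "all men are mortal" at about one chance in eight thousand.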

The weight of evidence is the negative of the entropy of $P(\rho)$, that is, the integral

$$\int_0^1 P_{\text{prior}}(\rho) \ln P_{\text{prior}}(\rho) \, d\rho$$

The lower the entropy, the more we know about the distribution $P(\rho)$. Hence the principle of maximum entropy: our distribution should faithfully represent the weight of our evidence, no stronger and no weaker.
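
The minimal sketch below computes the entropy -∫ P ln P dρ for a few symmetric Beta distributions with the midpoint rule. The helper names are hypothetical. The uniform distribution, α = β = 1, has entropy 0, the maximum possible for a density on [0, 1]. As α and β grow and the distribution narrows, the entropy falls and the weight of evidence rises.

```python
import math


def beta_pdf(rho: float, a: float, b: float) -> float:
    """Beta(a, b) density on (0, 1), computed in log space."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp((a - 1) * math.log(rho) + (b - 1) * math.log(1 - rho) + log_norm)


def beta_entropy(a: float, b: float, n: int = 20_000) -> float:
    """Differential entropy -integral of p ln p over (0, 1), midpoint rule."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        p = beta_pdf((i + 0.5) * h, a, b)
        if p > 0.0:  # tails may underflow to exactly 0
            total -= p * math.log(p)
    return total * h


results = {}
for a in (1.0, 2.0, 10.0, 100.0):
    results[a] = beta_entropy(a, a)
    print(a, results[a])
```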

The principle of maximum entropy leaves us with the question of what counts as evidence. To apply it, we need to take into account *all* evidence, and everything in the universe has some relevance.

Thus, to answer the question “what proportion of men are mortal?”, the principle of maximum entropy, naively applied, leads to the conclusion that we cannot be sure that all men are mortal until we have checked every man. If, however, we include amongst our priors the fact that all men are kin, then the propositions that all men are X, or that no men are X, must receive considerably higher prior weight than the proposition that fifty percent of men are X.
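
One way to encode "all men are kin" is a mixture prior. It puts finite probability on exactly ρ = 1 and exactly ρ = 0, and spreads the remainder as a Beta distribution. Below is a minimal sketch. The weights (a quarter each on "all X" and "no X") and the Beta(1, 1) remainder are illustrative assumptions, not values from the text.

```python
import math


def log_beta(a: float, b: float) -> float:
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def mixture_posterior(n_x: int, n_not_x: int, w_all: float = 0.25,
                      w_none: float = 0.25, alpha: float = 1.0, beta: float = 1.0):
    """Posterior weights on (all X, no X, Beta part) and P(next is X) for the
    prior w_all*delta(rho=1) + w_none*delta(rho=0) + rest*Beta(alpha, beta)."""
    w_beta = 1.0 - w_all - w_none
    # Likelihood of the observed samples under each component.
    like_all = 1.0 if n_not_x == 0 else 0.0
    like_none = 1.0 if n_x == 0 else 0.0
    like_beta = math.exp(log_beta(alpha + n_x, beta + n_not_x) - log_beta(alpha, beta))
    joint = [w_all * like_all, w_none * like_none, w_beta * like_beta]
    z = sum(joint)
    post_all, post_none, post_beta = (j / z for j in joint)
    a1, b1 = alpha + n_x, beta + n_not_x
    p_next = post_all + post_beta * a1 / (a1 + b1)
    return post_all, post_none, post_beta, p_next


for n in (0, 1, 10, 100):
    post_all, _, _, p_next = mixture_posterior(n_x=n, n_not_x=0)
    print(n, round(post_all, 4), round(p_next, 4))
```

Unlike a pure Beta prior, this one makes "all X" the leading hypothesis after a handful of consistent observations. A single counterexample rules it out entirely.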

The Beta distribution is mathematically convenient, but unrealistic. That the universe exists, and that we can observe it, already gives us more information than the uniform distribution encodes, so the principle of maximum entropy is not easy to apply.

These documents are licensed under the Creative Commons Attribution-Share Alike 3.0 License