How to learn from experience
Bayesian probability theory is a quantification of how we, as rational creatures, actually respond to situations
What is the probability that, if you toss a coin, it will land “heads”? What is the probability that it will rain today? What is the probability that India will win the next cricket series in Australia? What is the probability that there is life on Mars?
These are some of the ways we use the word “probability” in everyday life. To statisticians, some of these usages are meaningful, some less so, and some are very contentious; the literature abounds in colourful language. In recent years, “Bayesian” statisticians have gained the upper hand in this debate over their better-established “frequentist” colleagues. With contributions from physicists, biologists, economists and others, Bayesian probability has ushered in a revolution in the application of probability to everyday life. So what is Bayesian probability, and why should you care?
In the previously dominant “frequentist” view, the probability of an event is the fraction of “successes” in a large number of “trials.” So one can toss a coin many times to learn that “heads” occurs 50 per cent of the time. But there is only one planet like Mars, so repeated trials are impossible, and a “probability,” in the traditional view, is a meaningless concept. Rainy days and cricket matches are repeated, but not under identical circumstances, so it is again hard to talk of probabilities in the frequentist language.
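The frequentist recipe can be tried out on a computer. The short sketch below simulates a fair coin and reports the fraction of heads for a growing number of tosses. The function name `heads_fraction` and the fixed random seed are our own illustrative choices, and a simulated coin is of course only a stand-in for a real one.

```python
import random

def heads_fraction(n_tosses, seed=0):
    """Fraction of heads in n_tosses simulated tosses of a fair coin."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

# The fraction settles near 0.5 only as the number of tosses grows.
for n in (10, 1000, 100000):
    print(n, heads_fraction(n))
```

With only ten tosses the fraction can easily stray far from one half, which is exactly the difficulty the frequentist definition faces when trials are few.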
But we all need to make judgment calls in non-repeatable situations — should I carry an umbrella? Will the bowler's next ball be a yorker? Should a doctor start antibiotic treatment without waiting for a culture report? A frequentist may give up, but we cannot: we tend to trust a “gut instinct” about such things. The goal of the Bayesian approach, one can say, is to formalise that “gut instinct.” To a Bayesian, probability is something that reflects a degree of belief in any proposition, repeatable or not.
Imagine that you were allowed to toss the coin, not hundreds of times, but only ten times. You find that six of these are heads. Do you then conclude that the probability of heads was 6/10, or 60 per cent? Most people would not, because they have a strong prior belief that the coin is fair, and 6/10 is a credible result from a fair coin. If 8/10 or 9/10 were heads, most people would start doubting the fairness of the coin.
A Bayesian would allow you to put a number to your “prior belief” that the coin is fair, and use it to calculate a “posterior belief” after tossing the coin. We do that with a formula called “Bayes' theorem,” named after an 18th-century clergyman who discovered a limited form of it; it was generalised by the 19th-century mathematician Laplace. Remarkably, this is the only way to update beliefs if we insist that our reasoning should conform to some minimal notions of rationality.
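To see how such a calculation might go, here is a minimal sketch in Python. It assumes, purely for illustration, that there are only two possibilities: the coin is fair, or it is biased so that heads comes up 80 per cent of the time. It also assumes a prior belief of 95 per cent that the coin is fair. Both numbers, and the function name `posterior_fair`, are our own choices and not part of any standard recipe.

```python
from math import comb

def posterior_fair(heads, tosses, prior_fair=0.95, biased_p=0.8):
    """Posterior probability that the coin is fair, from Bayes' theorem.

    Assumes only two hypotheses: a fair coin (p = 0.5) and a
    hypothetical biased coin with P(heads) = biased_p.
    """
    def likelihood(p):
        # Binomial probability of seeing this many heads if P(heads) = p
        return comb(tosses, heads) * p**heads * (1 - p)**(tosses - heads)

    weight_fair = prior_fair * likelihood(0.5)
    weight_biased = (1 - prior_fair) * likelihood(biased_p)
    # Bayes' theorem: prior times likelihood, divided by the total
    return weight_fair / (weight_fair + weight_biased)

for heads in (6, 8, 9):
    print(heads, "heads out of 10:", round(posterior_fair(heads, 10), 3))
```

Under these assumptions, six heads out of ten leaves the belief in a fair coin slightly stronger than before, while nine heads out of ten pulls it below one half, in line with the intuition described above.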
The chief frequentist criticism of Bayesian probability is that this “prior,” which quantifies our belief, is subjective, since it exists “prior” to the data under consideration. At the start of an India-Bangladesh match, with India winning the toss and Sehwag and Tendulkar in to bat, our belief may be very strong that India is going to win the match. However, 15 overs in, with all the stars gone and not many runs on the board, our belief in an Indian victory may be considerably reduced. In Bayesian terms, this sequence of states of belief involves the prior (our belief before the match started), the data (the course of the match till 15 overs) and the posterior (our modified state of belief, taking into account both the prior and the data, calculated using Bayes' theorem).
How do we assign a prior? As an Indian supporter, your belief in an Indian victory might be considerably different from that of a die-hard Bangladeshi supporter, who believes very strongly in a David slaying a Goliath. It is this apparent “subjectivity” in assigning prior beliefs that has been the bone of contention between Bayesians and frequentists. Bayesians would argue that even a subjective prior is better than no prior (or “uniform” priors).
In many situations, however, there are two powerful principles, known as the principle of indifference and the principle of maximum entropy, that ensure that people with identical states of knowledge will assign the same prior degrees of belief. Entropy quantifies how much information we lack about a system: the rule is that we should choose the prior that maximises our ignorance, subject to “constraints” (the things we already know). This idea sounds logical today, but was highly contentious when it was introduced by E.T. Jaynes in the 1950s. The principle of indifference says that when we know nothing that distinguishes the alternatives, we should give them equal prior probabilities; this is just a special case of the maximum entropy principle.
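A die makes a convenient illustration. If all we know is that it has six faces, maximum entropy gives each face a probability of 1/6, which is the principle of indifference. If we also know that the average score over many throws is 4.5 rather than 3.5, the maximum-entropy answer takes the form of probabilities proportional to exp(λ × face), a standard result from the method of Lagrange multipliers. The sketch below finds λ by simple bisection. The function names and the target mean of 4.5 are our own illustrative assumptions.

```python
from math import exp, log

FACES = range(1, 7)

def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * log(p) for p in probs if p > 0)

def maxent_die(target_mean, lo=-10.0, hi=10.0, iters=200):
    """Maximum-entropy probabilities for faces 1..6 with a given mean.

    The solution is proportional to exp(lam * face); the mean grows
    steadily with lam, so we can bisect on lam to match the target.
    """
    def dist(lam):
        weights = [exp(lam * face) for face in FACES]
        total = sum(weights)
        return [w / total for w in weights]

    def mean(lam):
        return sum(face * p for face, p in zip(FACES, dist(lam)))

    for _ in range(iters):
        mid = (lo + hi) / 2
        if mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return dist((lo + hi) / 2)

uniform = maxent_die(3.5)  # no extra knowledge beyond an ordinary die
loaded = maxent_die(4.5)   # constraint: the average score is 4.5
print([round(p, 3) for p in uniform], round(entropy(uniform), 3))
print([round(p, 3) for p in loaded], round(entropy(loaded), 3))
```

The constrained distribution leans towards the higher faces and has lower entropy than the uniform one: the extra knowledge has reduced our ignorance, but only by as much as the constraint demands.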
The assignment of priors, then, is not arbitrary and subjective, but it does require us to declare very carefully what our current state of knowledge is about the proposition under discussion. A prior is not based on an absence of data: it is based on the entire life experience of the person making the judgment. Moreover, the posterior probabilities that we calculate can then serve as the priors the next time we see a similar situation. In a way, Bayesian probability theory is a quantification of how we, as rational creatures, actually respond to situations: we do not treat each situation as new, but use everything we have already learned to make decisions.
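The way a posterior becomes the next prior can be sketched in a few lines of Python. The example assumes, for illustration only, that a coin is either fair or a hypothetical biased coin that lands heads 80 per cent of the time, and that we start out 95 per cent sure it is fair. The toss sequence and the function name `update` are likewise made up for the example.

```python
def update(prior_fair, toss, biased_p=0.8):
    """One application of Bayes' theorem for a single toss ('H' or 'T')."""
    like_fair = 0.5
    like_biased = biased_p if toss == "H" else 1 - biased_p
    weight_fair = prior_fair * like_fair
    weight_biased = (1 - prior_fair) * like_biased
    return weight_fair / (weight_fair + weight_biased)

belief = 0.95  # assumed starting belief that the coin is fair
for toss in "HHTHHHTHHH":  # hypothetical sequence: 8 heads, 2 tails
    belief = update(belief, toss)  # yesterday's posterior is today's prior
    print(toss, round(belief, 3))
```

Updating one toss at a time in this way ends at the same belief as a single update on all ten tosses at once, so nothing is lost by learning as we go.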
The applications of Bayesian probability and the maximum entropy principle are many. Cell phones use them to predict typed text, Google uses them for its translation and transliteration programs (and to filter your spam), astronomers use them to clean up images of faint distant galaxies, and biologists use them to find genes in DNA sequences millions of bases long. Bayesian methods can harness powerful computer algorithms, first invented for studying nuclear reactions, to search the human genome for mutations that cause deadly diseases like cancer.
Laplace said that probability theory is “common sense reduced to calculation.” The modern Bayesians have shown how it can be done.
(The authors are in the physics group at the Institute of Mathematical Sciences, Chennai)