Statistics & Bayesian probability
This chapter describes the major concepts from statistics and Bayesian probability theory. It provides extra sections on statistical models and on the computational algorithms that are shared by both domains. There is also a section on information geometry, which gives rigorous mathematical descriptions of the intrinsic properties shared by both domains, using the coordinate-free and domain-independent language of differential geometry.
Statistics
Statistics is the part of mathematics that models stochastic systems, i.e., a world where the outcome of events is not deterministically predictable but can only be described in terms of probabilities. Statistics is a very rich and extensive research domain, but it has never received a thorough axiomatic basis; it is better described as a large toolbox of definitions, algorithms and procedures.
Bayesian probability theory
Bayesian probability theory is a full paradigm (with an axiomatic basis) for information processing under uncertainty. It shares many concepts, definitions and algorithms with statistics, but places its own fundamental emphases: invariance under coordinate and other transformations, the importance of prior and model knowledge for all inference problems, and the central place of Bayes' rule for inference.
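As a small illustration of the central place of Bayes' rule, the following sketch updates a prior over two hypotheses after one observation. The coin scenario, the hypothesis names and the function `bayes_update` are assumptions made up for this example; they are not taken from the chapter.

```python
from fractions import Fraction


def bayes_update(prior, likelihood):
    """Return the posterior P(h | data) given P(h) and P(data | h).

    Both arguments are dicts keyed by hypothesis name (hypothetical layout).
    """
    # Numerator of Bayes' rule: P(data | h) * P(h) for every hypothesis h.
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    # Denominator: the evidence P(data), summed over all hypotheses.
    evidence = sum(unnormalized.values())
    return {h: p / evidence for h, p in unnormalized.items()}


# Assumed prior knowledge: the coin is equally likely to be fair or biased.
prior = {"fair": Fraction(1, 2), "biased": Fraction(1, 2)}
# Assumed model knowledge: probability of observing heads under each hypothesis.
likelihood_heads = {"fair": Fraction(1, 2), "biased": Fraction(9, 10)}

posterior = bayes_update(prior, likelihood_heads)
print(posterior)  # fair: 5/14, biased: 9/14
```

The example uses exact fractions so that the role of the prior and of the model (the likelihood) in the result stays visible: changing either one changes the posterior.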
Information geometry
This domain is based on the work of people such as Cencov, Amari, Campbell, …, who describe Bayesian information processing with concepts and terminology from differential geometry. The goal of this rather theoretical approach is to emphasize the intrinsic properties of information processing (in a fully coordinate-independent way), as well as the common properties of the wide range of existing inference algorithms. (Both statistics and Bayesian theory have a rather poor tradition of trying to find and highlight the common fundamentals of their concepts and tools.)
Graphical models
Complex systems contain dozens, hundreds or even thousands of variables that all influence the system's behaviour in one way or another. A stochastic model of such a system therefore requires probability density functions with all these variables as arguments, which quickly leads to mathematical representations of the system that are computationally intractable and unintelligible to humans. Over the past couple of decades, researchers have introduced graphical representations for the interdependencies of all the variables in the system model.
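To make the intractability argument concrete, the sketch below counts the free parameters of a model over binary variables, once as a full joint table and once factored along a graph. It assumes a directed graphical model in which each variable depends only on its listed parents; the chain structure and the function names are hypothetical choices for illustration.

```python
def full_joint_parameters(n_variables):
    """Free parameters of an unstructured joint table over binary variables."""
    # 2**n table entries, minus one because the probabilities sum to 1.
    return 2 ** n_variables - 1


def factored_parameters(parents):
    """Free parameters when each binary variable depends only on its parents.

    `parents` maps a variable name to the list of its parent variables.
    """
    # One free probability per configuration of the parents of each variable.
    return sum(2 ** len(p) for p in parents.values())


# Assumed example: a chain x0 -> x1 -> ... -> x19 of 20 binary variables.
chain = {f"x{i}": ([f"x{i - 1}"] if i > 0 else []) for i in range(20)}

print(full_joint_parameters(20))   # 1048575
print(factored_parameters(chain))  # 39
```

The graph records which variables interact directly, and this is what reduces the representation from about a million numbers to a few dozen in this assumed example.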
Computational algorithms
Only the simplest, low-dimensional systems have inference problems that can be solved exactly and efficiently. In most practical cases, many approximations must be introduced, which leads to a large set of computational algorithms.
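As one example of such an approximation, the sketch below compares a quantity that can still be computed exactly (a tail probability of a one-dimensional standard normal distribution) with a Monte Carlo estimate of the same quantity. Monte Carlo sampling is chosen here only as a familiar representative of approximate algorithms; the function names, sample size and seed are assumptions for this illustration.

```python
import math
import random


def exact_tail_probability(threshold):
    """Exact P(X > threshold) for a standard normal X, via the error function."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))


def monte_carlo_tail_probability(threshold, n_samples, seed=0):
    """Approximate P(X > threshold) by the fraction of samples above it."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    hits = sum(1 for _ in range(n_samples) if rng.gauss(0.0, 1.0) > threshold)
    return hits / n_samples


exact = exact_tail_probability(1.0)
estimate = monte_carlo_tail_probability(1.0, 100_000)
print(f"exact={exact:.5f} estimate={estimate:.5f}")
```

In one dimension the exact answer is cheap, so the approximation is unnecessary; the point is that the sampling approach still works when no closed form is available, at the price of a statistical error that shrinks only slowly with the number of samples.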