# Mathematical structures of probabilistic programming

The design of operational and denotational models for programming languages has historically been a rather post-hoc endeavour, justified by practical concerns such as the validation of program transformations. If one rather views algorithms as “the idiom of modern science” (http://www.cs.princeton.edu/~chazelle/pubs/algorithm.html), then formal semantics is the precondition for models as programs to be sound objects of mathematical scrutiny. Dually, the structures naturally arising in the mathematical universe of the domain of interest should inform the design of programming languages. In this abstract, we take a look from this perspective to probabilistic programming and see how our work on on a category-theoretical approach to measure theory and statistics is relevant to this blooming field. We focus on two recent contributions, namely pointless Bayesian inversion [1,2] and natural transformations in probability and statistics [3,4,5].

## Pointless Bayesian inversion

Following Kozen (Semantics of probabilistic programs, J. Comput. Syst. Sci.) and Moggi (Notions of computation and monads, Information and Computation), terms in probabilistic programs can be given a denotational semantics as arrows in the Kleisli category of the Giry monad, i.e. kernels. Mathematical models of probabilistic programming have all relied on this pointful, kernel-centric view—including for the key operation in Bayesian learning, namely Bayesian inversion.
We argue that a pointless, operator-based approach to Bayesian inversion is both more general, simpler and offers a more structured view of Bayesian machine learning.

Culbertson & Sturtz (A categorical foundation for Bayesian probability, Applied Categorical Structures) present Bayesian inversion as a mapping associating each pair consisting of a prior $p$ and a likelihood $f$ with $(p, f) \in G(H) \times (H \rightarrow G(D))$ to a pair $(q, f^\dagger) \in G(D) \times (D \rightarrow G(H))$ consisting of the posterior $f^\dagger$ and the marginal likelihood $q$ (which is just the pushforward of $p$ through $f$). The kernel $f^\dagger$ is characterised as the $q$-almost surely unique one satisfying an infinitary generalisation of Bayes’ law. We identify the following problems with this approach:

• it relies on the disintegration theorem, hence on rather strong topological assumptions on the underlying spaces;
• it is unstructured: the map from $(p, f)$ to $(q, f^\dagger)$ is not even a function, as $f^\dagger$ is only defined $q$-almost surely.

In [1,2], we establish that this operation can be recast as the familiar operation of adjunction of operators. This is formalised in a categeory of $\omega$-complete cones following earlier work (see Selinger, Towards a semantics for higher-order quantum computation and Chaput et al., Approximating Markov processes by averaging). We construct a functorial representation from a category of typed kernels to a category of linear operators (viewing these kernels as generalised predicate transformers, as in (Kozen, A Probabilistic PDL). We show that pointful inversion coincides with adjunction whenever the former is defined (e.g. on standard Borel spaces). This move presents the following advantages:

1.  no topological assumptions are required;
2. adjunction is structured: it corresponds to an equivalence of categories between so-called abstract Markov kernels and Markov operators.

Besides being a ground for denotational semantics, we foresee that this pointless setting will be amenable to the application of the approximation techniques for Markov processes developed
in Approximating Markov processes by averaging to Bayesian learning.

## Natural transformations in probability and statistics

Modelling effects functorially, as in Moggi’s approach to computational lambda calculi, makes the study of natural transformations between those functors a useful source of potential programming primitives. In [3,4] we constructed a general theorem (called the Machine) allowing to reduce the proof of existence of natural transformations between a wide family of endofunctors of the category of Polish spaces to their components at finite spaces. These transformations include the monadic data of the Giry functor, the $iid$ distribution, the normalisation of a nonzero measure to a probability, the Poisson point process, the Dirichlet process and last but not least, the de Finetti representation theorem. Further, one of our results in [4] called rigidity gives a sufficient condition for there to be at most one natural transformation between two well-chosen functors. We believe the Machine is relevant to probabilistic programming, as it eases the exploration of potential primitives to be added in existing or future languages. On this matter, we derived in [5] some results which establish the basis for a compositional language of natural transformations seen as robustly parameterised probabilistic models. Further, we expect that our techniques can address questions of computability of such natural stochastic processes by reducing them to the finite-dimensional case.

## Outlook

Ongoing work includes importing more results of measure theory in our framework, so as to bridge the Machine with our cone-theoretic developments. Our long term plan is to obtain a cohesive category-theoretical view of measure theory and statistics, where both semantics and a sound theory of approximation for probabilistic models coexist.

## References

[1] Bayesian inversion by $\omega$-complete cone duality (CONCUR 2016)

[2] Pointless learning (accepted to FOSSACS 2017, draft available here)

[3] Dirichlet is natural (MFPS 31)

[4] Giry and the Machine (MFPS 32)

[5] Robustly parameterised higher-order probabilistic models (CONCUR 2016)

This entry was posted in Uncategorized. Bookmark the permalink.