## Welcome

Welcome to PPS, workshop on probabilistic programming semantics, on Tuesday, 17 January 2017, colocated right before POPL. This informal workshop aims to bring programming-language and machine-learning researchers together to advance the semantic foundations of probabilistic programming.

We are delighted that Gordon Plotkin has accepted our invitation to give a talk “Towards a metric semantics for probabilistic programming“. In addition, as listed on the posted schedule, we have accepted 21 extended abstracts submitted by a wide range of contributors. We accepted 10 submissions as posters and 11 as talks, not on the basis of reviewer scores but based on which medium we thought would be most effective in conveying the material. So, some highly ranked submissions that are more technical in nature are accepted as posters.

To foster collaboration and establish common ground, we ask all accepted contributors to post their revised extended abstracts on this site, along with any other materials such as preprints they want to share.

Everyone is welcome to post comments, questions, and other discussion on the posts. Because probabilistic programming is a research area that bridges multiple communities with different vocabularies, comments of the flavor “I don’t understand what you mean by X” are particularly valuable!

Posted in Uncategorized | Comments Off on Welcome

## An Exponential Family Basis for Probabilistic Programming

Many common distributional families take the form of exponential families.
The functional form of the conditional densities of these families gives them convenient compositional properties, yielding opportunities for powerful reasoning and optimization.

We present a Haskell-based approach for expressing probabilistic models in terms of free arrows, over a basis of exponential families. Arrows are more restrictive than the more common monadic approach, but this sacrifice in expressiveness is balanced with broader opportunities for inference, for example in terms of the dependency graph of random variables. Moreover, any monadic inference method is easily applied to arrow-based models.

The extended abstract is available at http://pps2017.sice.indiana.edu/arrow-ppl/.

Posted in Uncategorized | 1 Comment

## Encapsulating models and approximate inference programs in probabilistic modules

We introduce the ‘probabilistic module’ interface, which allows encapsulation of complex probabilistic models with latent variables alongside custom stochastic
approximate inference machinery, and provides a platform-agnostic abstraction
barrier separating the model internals from the host probabilistic inference system.
The interface can be seen as a stochastic generalization of a standard simulation
and density interface for probabilistic primitives. We show that sound approximate
inference algorithms can be constructed for networks of probabilistic modules, and
we demonstrate that the interface can be implemented using learned stochastic
inference networks and MCMC and SMC approximate inference programs.

The interface has two procedures, ‘simulate’ and ‘regenerate’:

• ‘Simulate’ runs the original stochastic computation being encapsulated and returns the output value along with a harmonic mean estimate of the output density at the returned value.
• ‘Regenerate’ proposes an execution history of the computation that is responsible for a given output, and returns a importance sampling estimate of the output density at the given value.

The two estimators must be defined using the same auxiliary probabilistic computation that samples execution histories of the original computation, given the output. This proposal program is called the ‘regenerator’. The optimal regenerator samples from the conditional distribution on execution histories given output. Generally, we resort to suboptimal, approximate regenerators. Regenerators can be learnt from data or constructed from stochastic approximate inference algorithms such as sequential Monte Carlo.

Encapsulating a stochastic computation such as a simulator for a generative probabilistic model with an accurate regenerator serves to ‘approximately collapse out’ the latent variables in the computation. As the accuracy of the regenerator increases, the module behaves more like a collapsed computation with an available density.

Poster [pdf]

Paper [pdf]

In a related work, we show that the same interface is sufficient for estimating bounds on the KL divergence between the output distributions of stochastic computations, and we apply this to estimate the approximation error of SMC approximate inference samplers including those using MCMC transitions.

The interface is also related to the ‘stochastic procedure’ interface used in the Venture probabilistic programming platform. There are also connections to pseudo-marginal MCMC and particle MCMC.

Authors: Marco Cusumano-Towner, Vikash Mansinghka

Posted in Uncategorized | Comments Off on Encapsulating models and approximate inference programs in probabilistic modules

## Probabilistic Logic Programs: Unifying Program Trace and Possible World Semantics

While statistical relational learning (SRL) and probabilistic programming (PP) both develop rich representation languages and reasoning tools for probabilistic models that naturally deal with a variable number of objects as well as the relationships amongst them, in the past 5 to 8 years SRL and PP have been studied almost in isolation and now have a quite different focus. In probabilistic programming, the focus is on functional and imperative programs, on modeling continuous random variables, on (Markov Chain) Monte-Carlo techniques for probabilistic inference, and on a Bayesian machine learning perspective. In contrast, in statistical relational artificial intelligence, the focus is on logical and database representations, on discrete distributions, on knowledge compilation and lifted inference (reasoning at an abstract level — without having to ground out variables over domains), and on learning the structure of the model.

We argue that probabilistic logic programming (PLP), whose rich history goes back to the early 90s with results by David Poole and Taisuke Sato, is uniquely positioned in that it naturally connects these two views into a single formalism with rigorously defined semantics, and thus opens up ways to bridge the gap between the two communities and to connect their results. More specifically, in this note we show how probabilistic logic programs possess not only a possible world semantics as common in SRL but also a program trace semantics as common in PP, which they inherit from traditional logic programs.  Furthermore, as extensions of Prolog, probabilistic logic programming languages are Turing equivalent, a property that they share with probabilistic programming languages extending traditional functional or imperative programming languages, and which distinguishes them from many of the statistical relational learning formalisms such as probabilistic databases and  Markov Logic.

Authors: Angelika Kimmig and Luc De Raedt

Posted in Uncategorized | Comments Off on Probabilistic Logic Programs: Unifying Program Trace and Possible World Semantics

## The semantics of Subroutines and Iteration in the Bayesian Programming language ProBT

Authors: R. Laurent, K. Mekhnacha, E. Mazer and P. Bessière

Abstract: Bayesian models are tools of choice when solving problems with incomplete information. Bayesian networks provide a first but limited approach to address such problems. For real world applications, additional semantics is needed to construct more complex models, especially those with repetitive structures or substructures. ProBT, a Bayesian a programming language, provides a set of constructs for developing and applying complex models with substructures and repetitive structures.
The goal of this paper is to present and discuss the semantics associated to these constructs.

PPS_2017_Laurent_et_al_semantics_in_ProBT

Posted in Uncategorized | Comments Off on The semantics of Subroutines and Iteration in the Bayesian Programming language ProBT

## GraPPa: Spanning the Expressivity vs. Efficiency Continuum

An important trade-off in the design of PPLs is between the expressiveness of the language of generative models and the power and efficiency of the inference methods. This trade-off between expressiveness and efficiency requires users to choose a different language, and associated toolchain, de- pending on the particular problem they are trying to solve. Additionally, even knowing where a particular problem lies on the continuum between expressiveness and efficiency, or where it is likely to lie as the problem evolves, can require a non-trivial amount of expertise and understanding of machine learning.

In this talk, we will describe ongoing work on GraPPa, the Galois Probabilistic Programming language, that addresses this concern. GraPPa, which is implemented as an embedded domain- specific language (EDSL) in Haskell, is a single PPL that allows users to choose where each model lies on the continuum between expressiveness and efficiency, simply by choosing what sorts of random variables to use in a model. The key technical idea that enables this approach is an encoding, in the type of each model, of the set of random variables and associated distributions used in that model. This approach is compositional, meaning that a model with random variables in one set can be combined with a model with random variables in another set, and the type of the resulting model will contain the union of the two sets.

See our full extended abstract for details: grappa-pps-2017

Posted in Uncategorized | Comments Off on GraPPa: Spanning the Expressivity vs. Efficiency Continuum

## Metropolis-Hastings for Mixtures of Conditional Distributions

Models with embedded conditioning operations — especially with conditioning within conditional branches — are a challenge for Monte-Carlo Markov Chain (MCMC) inference. They are out of scope of the popular Wingate et al. algorithm or many of its variations. Computing the MCMC acceptance ratio in this case has been an open problem.

We demonstrate why we need such models. Second, we derive the acceptance ratio formula. The corresponding MH algorithm is implemented in the Hakaru10 system, which thus can handle mixtures of conditional distributions.

PPS2017

Posted in Uncategorized | Comments Off on Metropolis-Hastings for Mixtures of Conditional Distributions

## Exchangeable Random Processes and Data Abstraction

What is the right notion of module for probabilistic programs?

Purely probabilistic program modules (i.e., those not using internal state) are exchangeable: calls can be commuted and discarded without changing the behaviour.

In general, memory access is neither commutative nor discardable. However, there are many interesting probabilistic program modules that involve state privately, but from the perspective of the user are exchangeable. A source of examples are the Exchangeable Random Processes (XRPs), such as Polya’s urn implementation of the Beta-Bernoulli process.

Suppose a module’s external behavior is exchangeable. Does that mean that there is an implementation of that module that is purely probabilistic, i.e., without using state even internally?

Authors: Nate Ackerman, Cameron Freer, Dan Roy, Sam Staton, Hongseok Yang.

Extended abstract: [PDF]

Posted in Uncategorized | Comments Off on Exchangeable Random Processes and Data Abstraction

## Mathematical structures of probabilistic programming

The design of operational and denotational models for programming languages has historically been a rather post-hoc endeavour, justified by practical concerns such as the validation of program transformations. If one rather views algorithms as “the idiom of modern science” (http://www.cs.princeton.edu/~chazelle/pubs/algorithm.html), then formal semantics is the precondition for models as programs to be sound objects of mathematical scrutiny. Dually, the structures naturally arising in the mathematical universe of the domain of interest should inform the design of programming languages. In this abstract, we take a look from this perspective to probabilistic programming and see how our work on on a category-theoretical approach to measure theory and statistics is relevant to this blooming field. We focus on two recent contributions, namely pointless Bayesian inversion [1,2] and natural transformations in probability and statistics [3,4,5].

## Pointless Bayesian inversion

Following Kozen (Semantics of probabilistic programs, J. Comput. Syst. Sci.) and Moggi (Notions of computation and monads, Information and Computation), terms in probabilistic programs can be given a denotational semantics as arrows in the Kleisli category of the Giry monad, i.e. kernels. Mathematical models of probabilistic programming have all relied on this pointful, kernel-centric view—including for the key operation in Bayesian learning, namely Bayesian inversion.
We argue that a pointless, operator-based approach to Bayesian inversion is both more general, simpler and offers a more structured view of Bayesian machine learning.

Culbertson & Sturtz (A categorical foundation for Bayesian probability, Applied Categorical Structures) present Bayesian inversion as a mapping associating each pair consisting of a prior $p$ and a likelihood $f$ with $(p, f) \in G(H) \times (H \rightarrow G(D))$ to a pair $(q, f^\dagger) \in G(D) \times (D \rightarrow G(H))$ consisting of the posterior $f^\dagger$ and the marginal likelihood $q$ (which is just the pushforward of $p$ through $f$). The kernel $f^\dagger$ is characterised as the $q$-almost surely unique one satisfying an infinitary generalisation of Bayes’ law. We identify the following problems with this approach:

• it relies on the disintegration theorem, hence on rather strong topological assumptions on the underlying spaces;
• it is unstructured: the map from $(p, f)$ to $(q, f^\dagger)$ is not even a function, as $f^\dagger$ is only defined $q$-almost surely.

In [1,2], we establish that this operation can be recast as the familiar operation of adjunction of operators. This is formalised in a categeory of $\omega$-complete cones following earlier work (see Selinger, Towards a semantics for higher-order quantum computation and Chaput et al., Approximating Markov processes by averaging). We construct a functorial representation from a category of typed kernels to a category of linear operators (viewing these kernels as generalised predicate transformers, as in (Kozen, A Probabilistic PDL). We show that pointful inversion coincides with adjunction whenever the former is defined (e.g. on standard Borel spaces). This move presents the following advantages:

1.  no topological assumptions are required;
2. adjunction is structured: it corresponds to an equivalence of categories between so-called abstract Markov kernels and Markov operators.

Besides being a ground for denotational semantics, we foresee that this pointless setting will be amenable to the application of the approximation techniques for Markov processes developed
in Approximating Markov processes by averaging to Bayesian learning.

## Natural transformations in probability and statistics

Modelling effects functorially, as in Moggi’s approach to computational lambda calculi, makes the study of natural transformations between those functors a useful source of potential programming primitives. In [3,4] we constructed a general theorem (called the Machine) allowing to reduce the proof of existence of natural transformations between a wide family of endofunctors of the category of Polish spaces to their components at finite spaces. These transformations include the monadic data of the Giry functor, the $iid$ distribution, the normalisation of a nonzero measure to a probability, the Poisson point process, the Dirichlet process and last but not least, the de Finetti representation theorem. Further, one of our results in [4] called rigidity gives a sufficient condition for there to be at most one natural transformation between two well-chosen functors. We believe the Machine is relevant to probabilistic programming, as it eases the exploration of potential primitives to be added in existing or future languages. On this matter, we derived in [5] some results which establish the basis for a compositional language of natural transformations seen as robustly parameterised probabilistic models. Further, we expect that our techniques can address questions of computability of such natural stochastic processes by reducing them to the finite-dimensional case.

## Outlook

Ongoing work includes importing more results of measure theory in our framework, so as to bridge the Machine with our cone-theoretic developments. Our long term plan is to obtain a cohesive category-theoretical view of measure theory and statistics, where both semantics and a sound theory of approximation for probabilistic models coexist.

## References

[1] Bayesian inversion by $\omega$-complete cone duality (CONCUR 2016)

[2] Pointless learning (accepted to FOSSACS 2017, draft available here)

[3] Dirichlet is natural (MFPS 31)

[4] Giry and the Machine (MFPS 32)

[5] Robustly parameterised higher-order probabilistic models (CONCUR 2016)

Posted in Uncategorized | Comments Off on Mathematical structures of probabilistic programming

## Support and influence analysis for visualizing posteriors of probabilistic programs

A common way to interpret the results of any computational model is to visualize its output. For probabilistic programming, this often means visualizing a posterior probability distribution. The webppl language has a visualization library called webppl-viz that facilitates this process. A useful feature of webppl-viz is that it does some amount of automatic visualization—the user simply passes in the posterior and the library tries to construct a useful visual representation of it. For instance, consider this posterior, with a discrete component b and continuous component m:

var dist = Infer(
{method: 'MCMC', samples: 1000},
function () {
var b = flip(0.7)
var m = gaussian(0, 1)
var y = gaussian(m, b ? 4 : 2)
condition(y > 0.3)
return {b: b, m: m}
});


We visualize it by calling viz(dist), which gives us this picture:

This is a reasonable choice. There are two density curves for m—an orange curve for when b is true, and a blue curve for when b is false. webppl-viz often produces helpful graphs but, as I will show, it can also produce graphs with obvious flaws. One reason for this is that webppl-viz defines a limited set of variable types for visualization and it uses heuristics to guess the types particular components in a posterior sample. Another issue is that webppl-viz does not scale well with the number of components in a posterior. There are only a handful of ways to visually encode data, and webppl-viz gives up if the dimensionality of the posterior exceeds the number of available visual channels. In this article, I argue that methods from programming languages research suggest solutions to these two problems.

Extended abstract: Ouyang abstract

Posted in Uncategorized | Comments Off on Support and influence analysis for visualizing posteriors of probabilistic programs

## Reasoning about Inference in Probabilistic Programs

Some of the fundamental errors that occur in probabilistic programs are due
to incorrect statistical models, ignoring dependence between random variables,
or, using wrong hyper-parameters (such as sample size) for inference. To prevent these errors, the solution most probabilistic programming languages adopt is to keep inference external to the core language semantics. While this limitation avoids potential pitfalls, it prevents programmers
from invoking inference at arbitrary points in the program and composing the
results.

However, designers of large systems often need to use inference for practical
reasons, such as, to efficiently transmit a probability distribution across the
network or to decompose an intractable inference problem into
smaller pieces.

We present a programming model, FLEXI, that extends a typical probabilistic programming
language with new constructs for invoking inference. To debug problems that arise from the use of inference, we use a Decorated Bayesian network to represent the program’s distribution and its inference operations. The representation lets us detect and explain bugs from two categories: approximation and dependence.

List of Authors: Chandrakana Nandi, Adrian Sampson, Dan Grossman, Todd Mytkowicz, Kathryn S. McKinley