The functional form of the conditional densities of these families gives them convenient compositional properties, yielding opportunities for powerful reasoning and optimization.

We present a Haskell-based approach for expressing probabilistic models in terms of *free arrows*, over a basis of exponential families. Arrows are more restrictive than the more common monadic approach, but this sacrifice in expressiveness is balanced by broader opportunities for inference, for example in terms of the dependency graph of random variables. Moreover, any monadic inference method is easily applied to arrow-based models.

The extended abstract is available at http://pps2017.sice.indiana.edu/arrow-ppl/.

The proposed interface encapsulates approximate inference machinery and provides a platform-agnostic abstraction barrier separating the model internals from the host probabilistic inference system. The interface can be seen as a stochastic generalization of a standard simulation-and-density interface for probabilistic primitives. We show that sound approximate inference algorithms can be constructed for networks of probabilistic modules, and we demonstrate that the interface can be implemented using learned stochastic inference networks as well as MCMC and SMC approximate inference programs.

The interface has two procedures, ‘simulate’ and ‘regenerate’:

- ‘Simulate’ runs the original stochastic computation being encapsulated and returns the output value along with a harmonic mean estimate of the output density at the returned value.
- ‘Regenerate’ proposes an execution history of the computation that is responsible for a given output, and returns an importance sampling estimate of the output density at the given value.

The two estimators must be defined using the same auxiliary probabilistic computation that samples execution histories of the original computation, given the output. This proposal program is called the ‘regenerator’. The optimal regenerator samples from the conditional distribution on execution histories given the output. In general, we resort to suboptimal, approximate regenerators. Regenerators can be learned from data or constructed from stochastic approximate inference algorithms such as sequential Monte Carlo.
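To make the two estimators concrete, here is a minimal Python sketch for a toy module with latent x ~ N(0, 1) and output y | x ~ N(x, 1). The module, its method names, and the choice of the exact posterior as the regenerator are illustrative assumptions, not details from the abstract:

```python
import math
import random

def normpdf(v, mean, sd):
    # Density of N(mean, sd^2) at v.
    return math.exp(-((v - mean) ** 2) / (2 * sd * sd)) / (sd * math.sqrt(2 * math.pi))

class GaussianModule:
    """Toy module: latent x ~ N(0, 1), output y | x ~ N(x, 1).

    The regenerator q(x | y) is taken to be the exact posterior
    N(y/2, 1/2), so both density estimates happen to be exact here.
    """

    def joint(self, x, y):
        return normpdf(x, 0.0, 1.0) * normpdf(y, x, 1.0)

    def regenerator_density(self, x, y):
        return normpdf(x, y / 2.0, math.sqrt(0.5))

    def simulate(self):
        # Run the original computation; the execution history x is then a
        # sample from p(x | y), so joint/q is a single-sample harmonic
        # mean estimate of the output density p(y).
        x = random.gauss(0.0, 1.0)
        y = random.gauss(x, 1.0)
        return y, self.joint(x, y) / self.regenerator_density(x, y)

    def regenerate(self, y):
        # Propose an execution history from the regenerator; the
        # importance weight joint/q is an unbiased estimate of p(y).
        x = random.gauss(y / 2.0, math.sqrt(0.5))
        return self.joint(x, y) / self.regenerator_density(x, y)

mod = GaussianModule()
y, est = mod.simulate()
exact = normpdf(y, 0.0, math.sqrt(2.0))  # true marginal: p(y) = N(0, 2)
print(abs(est - exact) < 1e-9)  # True: exact regenerator gives exact density
```

Because the regenerator here is the exact conditional, both estimates recover the marginal density exactly; with an approximate regenerator, ‘regenerate’ would only be unbiased and ‘simulate’ would only be a harmonic mean estimate.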

Encapsulating a stochastic computation such as a simulator for a generative probabilistic model with an accurate regenerator serves to ‘approximately collapse out’ the latent variables in the computation. As the accuracy of the regenerator increases, the module behaves more like a collapsed computation with an available density.

Poster [pdf]

Paper [pdf]

In a related work, we show that the same interface is sufficient for estimating bounds on the KL divergence between the output distributions of stochastic computations, and we apply this to estimate the approximation error of SMC approximate inference samplers including those using MCMC transitions.

The interface is also related to the ‘stochastic procedure’ interface used in the Venture probabilistic programming platform. There are also connections to pseudo-marginal MCMC and particle MCMC.

Authors: Marco Cusumano-Towner, Vikash Mansinghka

We argue that probabilistic logic programming (PLP), whose rich history goes back to the early 90s with results by David Poole and Taisuke Sato, is uniquely positioned in that it naturally connects these two views into a single formalism with rigorously defined semantics, and thus opens up ways to bridge the gap between the two communities and to connect their results. More specifically, in this note we show how probabilistic logic programs possess not only a possible world semantics as common in SRL but also a program trace semantics as common in PP, which they inherit from traditional logic programs. Furthermore, as extensions of Prolog, probabilistic logic programming languages are Turing equivalent, a property that they share with probabilistic programming languages extending traditional functional or imperative programming languages, and which distinguishes them from many of the statistical relational learning formalisms such as probabilistic databases and Markov Logic.

Authors: Angelika Kimmig and Luc De Raedt

Extended abstract: Probabilistic Logic Programs: Unifying Program Trace and Possible World Semantics

Abstract: Bayesian models are the tools of choice when solving problems with incomplete information. Bayesian networks provide a first but limited approach to such problems. For real-world applications, additional semantics is needed to construct more complex models, especially those with repetitive structures or substructures. ProBT, a Bayesian programming language, provides a set of constructs for developing and applying complex models with substructures and repetitive structures.

The goal of this paper is to present and discuss the semantics associated with these constructs.

An important trade-off in the design of PPLs is between the expressiveness of the language of generative models and the power and efficiency of the inference methods. This trade-off requires users to choose a different language, and associated toolchain, depending on the particular problem they are trying to solve. Moreover, even knowing where a particular problem lies on the continuum between expressiveness and efficiency, or where it is likely to lie as the problem evolves, can require a non-trivial amount of expertise and understanding of machine learning.

In this talk, we will describe ongoing work on GraPPa, the Galois Probabilistic Programming language, that addresses this concern. GraPPa, which is implemented as an embedded domain-specific language (EDSL) in Haskell, is a single PPL that allows users to choose where each model lies on the continuum between expressiveness and efficiency, simply by choosing what sorts of random variables to use in a model. The key technical idea that enables this approach is an encoding, in the type of each model, of the set of random variables and associated distributions used in that model. This approach is compositional, meaning that a model with random variables in one set can be combined with a model with random variables in another set, and the type of the resulting model will contain the union of the two sets.
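GraPPa does this bookkeeping statically in Haskell's type system; as a loose value-level analogy (an editorial illustration, not GraPPa's actual API), one can sketch the compositional idea in Python by attaching to each model the set of distribution families it uses:

```python
import random

class Model:
    # A model carries its sampler together with the set of distribution
    # families it uses; composing models unions the sets, so the "type"
    # of a composite records every primitive involved.
    def __init__(self, families, sampler):
        self.families = frozenset(families)
        self.sampler = sampler

    def compose(self, other, combine):
        return Model(self.families | other.families,
                     lambda rng: combine(self.sampler(rng), other.sampler(rng)))

coin = Model({"bernoulli"}, lambda rng: rng.random() < 0.5)
noise = Model({"gaussian"}, lambda rng: rng.gauss(0.0, 1.0))
mixed = coin.compose(noise, lambda b, x: x if b else 0.0)
print(sorted(mixed.families))  # ['bernoulli', 'gaussian']
```

In GraPPa the analogous information lives in the model's type, so an inference backend can be selected, or rejected, at compile time rather than at run time.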

See our full extended abstract for details: grappa-pps-2017

We first demonstrate why we need such models, and then derive the corresponding acceptance ratio formula. The resulting MH algorithm is implemented in the Hakaru10 system, which can thus handle mixtures of conditional distributions.
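For reference, the standard Metropolis-Hastings acceptance ratio that such a derivation generalizes can be sketched as follows (a generic sketch, not the Hakaru10 formula for mixtures of conditional distributions):

```python
import math

def mh_acceptance(target_density, proposal_density, x, x_new):
    # Standard Metropolis-Hastings acceptance probability:
    #   alpha = min(1, p(x') q(x | x') / (p(x) q(x' | x)))
    num = target_density(x_new) * proposal_density(x_new, x)
    den = target_density(x) * proposal_density(x, x_new)
    return min(1.0, num / den)

# Unnormalised standard normal target with a symmetric proposal,
# whose density cancels in the ratio:
target = lambda v: math.exp(-v * v / 2.0)
proposal = lambda frm, to: 1.0

print(mh_acceptance(target, proposal, 0.0, 1.0))  # exp(-0.5) ~ 0.6065
```

The subtlety Hakaru10 addresses is that when the target is a mixture of conditional distributions, the densities in this ratio may live on different supports, so the naive formula above no longer applies directly.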

Purely probabilistic program modules (i.e., those not using internal state) are exchangeable: calls can be commuted and discarded without changing the behaviour.

In general, memory access is neither commutative nor discardable. However, many interesting probabilistic program modules use state privately yet are exchangeable from the perspective of the user. One source of examples is the exchangeable random processes (XRPs), such as the Polya urn implementation of the Beta-Bernoulli process.

Suppose a module’s external behavior is exchangeable. Does that mean that there is an implementation of that module that is purely probabilistic, i.e., without using state even internally?
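The Polya urn example can be made concrete. In the following Python sketch (our illustration, not from the abstract), the urn updates internal state after every draw, yet the probability it assigns to a sequence depends only on the counts, not the order, which is exactly external exchangeability:

```python
from fractions import Fraction

def sequence_probability(draws, a=1, b=1):
    # Polya urn for the Beta(a, b)-Bernoulli process: start with a
    # "one" balls and b "zero" balls, and after each draw add a ball
    # matching the outcome. The urn is stateful internally.
    ones, zeros = Fraction(a), Fraction(b)
    prob = Fraction(1)
    for d in draws:
        total = ones + zeros
        if d == 1:
            prob *= ones / total
            ones += 1
        else:
            prob *= zeros / total
            zeros += 1
    return prob

# Exchangeability: any reordering of the draws has the same probability.
print(sequence_probability([1, 1, 0]) == sequence_probability([0, 1, 1]))  # True
```

Here the stateful urn is externally indistinguishable from the stateless implementation that first samples a bias from Beta(a, b) and then flips independent coins; the question above asks whether such a purely probabilistic reimplementation always exists.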

Authors: Nate Ackerman, Cameron Freer, Dan Roy, Sam Staton, Hongseok Yang.

Extended abstract: [PDF]

Following Kozen (*Semantics of probabilistic programs, J. Comput. Syst. Sci.*) and Moggi (*Notions of computation and monads, Information and Computation*), terms in probabilistic programs can be given a denotational semantics as arrows in the Kleisli category of the Giry monad, i.e. *kernels*. Mathematical models of probabilistic programming have all relied on this pointful, *kernel-centric* view, including for the key operation in Bayesian learning, namely *Bayesian inversion*.

We argue that a *pointless*, operator-based approach to Bayesian inversion is both more general and simpler, and that it offers a more structured view of Bayesian machine learning.

Culbertson & Sturtz (*A categorical foundation for Bayesian probability, Applied Categorical Structures*) present Bayesian inversion as a mapping that associates to each pair consisting of a *prior* and a *likelihood* a pair consisting of the *posterior* and the *marginal likelihood* (which is just the pushforward of the prior through the likelihood). The posterior kernel is characterised as the almost surely unique one (with respect to the marginal likelihood) satisfying an infinitary generalisation of Bayes’ law. We identify the following problems with this approach:

- it relies on the disintegration theorem, hence on rather strong topological assumptions on the underlying spaces;
- it is *unstructured*: the map from priors and likelihoods to posteriors is not even a function, as the posterior is only defined almost surely (with respect to the marginal likelihood).

In [1,2], we establish that this operation can be recast as the familiar operation of *adjunction* of operators. This is formalised in a category of ω-complete cones, following earlier work (see Selinger, *Towards a semantics for higher-order quantum computation* and Chaput et al., *Approximating Markov processes by averaging*). We construct a functorial representation from a category of typed kernels to a category of linear operators, viewing these kernels as generalised predicate transformers as in Kozen, *A Probabilistic PDL*. We show that pointful inversion coincides with adjunction whenever the former is defined (e.g. on standard Borel spaces). This move presents the following advantages:

- no topological assumptions are required;
- adjunction is *structured*: it corresponds to an equivalence of categories between so-called *abstract Markov kernels* and *Markov operators*.
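In the finite case the correspondence is easy to check by hand. The following Python sketch (our illustration, not the paper's construction) builds the Bayesian inverse of a kernel over two-element spaces and verifies the adjunction identity p(x) k(x, y) = q(y) k†(y, x):

```python
# Finite Bayesian inversion: prior p on X, kernel k : X -> Y.
p = [0.3, 0.7]                      # prior over two latent states
k = [[0.9, 0.1],                    # k[x][y]: probability of y given x
     [0.2, 0.8]]

# Marginal likelihood q = pushforward of p through k.
q = [sum(p[x] * k[x][y] for x in range(2)) for y in range(2)]

# Bayesian inverse k_dagger[y][x] = p(x) k(x, y) / q(y)  (Bayes' law).
k_dagger = [[p[x] * k[x][y] / q[y] for x in range(2)] for y in range(2)]

# Adjunction identity: p(x) k(x, y) == q(y) k_dagger(y, x) for all x, y.
ok = all(abs(p[x] * k[x][y] - q[y] * k_dagger[y][x]) < 1e-12
         for x in range(2) for y in range(2))
print(ok)  # True
```

The point of the operator-based view is that this identity, rather than the pointwise Bayes formula, survives when the division by q(y) is ill-defined, e.g. on general measurable spaces.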

Besides being a ground for denotational semantics, we foresee that this pointless setting will be amenable to the application of the approximation techniques for Markov processes developed in *Approximating Markov processes by averaging* to Bayesian learning.

Modelling effects functorially, as in Moggi’s approach to computational lambda calculi, makes the study of natural transformations between those functors a useful source of potential programming primitives. In [3,4] we proved a general theorem (called the *Machine*) which allows one to reduce the proof of existence of natural transformations between a wide family of endofunctors on the category of Polish spaces to their components at finite spaces. These transformations include the monadic data of the Giry functor, the distribution, the normalisation of a nonzero measure to a probability, the Poisson point process, the Dirichlet process, and, last but not least, the de Finetti representation theorem. Further, one of our results in [4], called *rigidity*, gives a sufficient condition for there to be at most *one* natural transformation between two well-chosen functors. We believe the *Machine* is relevant to probabilistic programming, as it eases the exploration of potential primitives to be added to existing or future languages. On this matter, we derived in [5] results which establish the basis for a compositional language of natural transformations seen as robustly parameterised probabilistic models. Further, we expect that our techniques can address questions of computability of such natural stochastic processes by reducing them to the finite-dimensional case.

Ongoing work includes importing more results of measure theory in our framework, so as to bridge the Machine with our cone-theoretic developments. Our long term plan is to obtain a cohesive category-theoretical view of measure theory and statistics, where both semantics and a sound theory of approximation for probabilistic models coexist.

[1] Bayesian inversion by ω-complete cone duality (CONCUR 2016)

[2] Pointless learning (accepted to FOSSACS 2017, draft available here)

[3] Dirichlet is natural (MFPS 31)

[4] Giry and the Machine (MFPS 32)

[5] Robustly parameterised higher-order probabilistic models (CONCUR 2016)

Consider a simple WebPPL model whose posterior has a discrete component `b` and a continuous component `m`:

```javascript
var dist = Infer({method: 'MCMC', samples: 1000}, function () {
  var b = flip(0.7)
  var m = gaussian(0, 1)
  var y = gaussian(m, b ? 4 : 2)
  condition(y > 0.3)
  return {b: b, m: m}
});
```

We visualize it by calling `viz(dist)`, which gives us this picture:

This is a reasonable choice. There are two density curves for `m`: an orange curve for when `b` is true, and a blue curve for when `b` is false. webppl-viz often produces helpful graphs but, as I will show, it can also produce graphs with obvious flaws. One reason for this is that webppl-viz defines a limited set of variable types for visualization and uses heuristics to guess the types of particular components in a posterior sample. Another issue is that webppl-viz does not scale well with the number of components in a posterior. There are only a handful of ways to visually encode data, and webppl-viz gives up if the dimensionality of the posterior exceeds the number of available visual channels. In this article, I argue that methods from programming languages research suggest solutions to these two problems.

Extended abstract: Ouyang abstract

Misusing inference can lead to incorrect statistical models, for example by ignoring dependence between random variables or by using wrong hyper-parameters (such as sample size) for inference. To prevent these errors, most probabilistic programming languages keep inference external to the core language semantics. While this limitation avoids potential pitfalls, it prevents programmers from invoking inference at arbitrary points in the program and composing the results.

However, designers of large systems often need to use inference for practical reasons, such as to efficiently transmit a probability distribution across the network or to decompose an intractable inference problem into smaller pieces.

We present a programming model, FLEXI, that extends a typical probabilistic programming language with new constructs for invoking inference. To debug problems that arise from the use of inference, we use a Decorated Bayesian network to represent the program’s distribution and its inference operations. This representation lets us detect and explain bugs from two categories: approximation and dependence.
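To illustrate the kind of composition at stake (a hypothetical Python sketch, not FLEXI syntax), an inner inference call can summarize a sub-model, and an outer model can then consume the approximate result as if it were exact:

```python
import random

def infer_coin_bias(heads, tails, grid=101):
    # Inner inference: posterior mean of a coin's bias under a uniform
    # prior, computed by grid enumeration. A real system might return
    # samples instead, making this an approximation the outer model
    # silently depends on.
    weights = [(i / (grid - 1)) ** heads * (1 - i / (grid - 1)) ** tails
               for i in range(grid)]
    total = sum(weights)
    return sum((i / (grid - 1)) * w for i, w in enumerate(weights)) / total

def outer_model(rng):
    # Outer model: treats the inner inference result as the true bias.
    bias = infer_coin_bias(heads=8, tails=2)
    return rng.random() < bias

rng = random.Random(0)
bias = infer_coin_bias(8, 2)
draw = outer_model(rng)
print(round(bias, 3))  # posterior mean of Beta(9, 3) is 9/12 = 0.75
```

A Decorated Bayesian network would record both the dependence of the outer draw on the inner result and the fact that the inner result is an approximation, which is what enables the two bug categories above to be detected.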

List of Authors: Chandrakana Nandi, Adrian Sampson, Dan Grossman, Todd Mytkowicz, Kathryn S. McKinley

Camera ready extended abstract:
