# Semantics of probabilistic programming: a gentle introduction

Fredrik Dahlqvist, Alexandra Silva and Dexter Kozen

The first part of this volume, on semantics, consists of four chapters on various aspects of the formal semantics of probabilistic programming languages.

## Introduction

Moreover, one can compute the expected number of iterations of the body of the loop. The Cantor distribution is also an example of a continuous probability distribution: it assigns probability zero to every element of the state space.
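As a concrete illustration of the last point (a minimal Python sketch; the sampler and its parameters are made up, not code from the chapter), one can draw n-digit approximations of a Cantor-distributed sample by choosing each ternary digit to be 0 or 2 with a fair coin; any two runs almost surely produce distinct reals, reflecting that every individual point carries probability zero.

```python
import random

def cantor_sample(n_digits, rng):
    # Each ternary digit is 0 or 2 with probability 1/2, so the sample
    # lies in the Cantor set; any fixed point has probability zero.
    return sum((2 if rng.random() < 0.5 else 0) / 3 ** k
               for k in range(1, n_digits + 1))

rng = random.Random(0)
samples = [cantor_sample(30, rng) for _ in range(1000)]
```

With 30 ternary digits there are 2^30 possible approximations, so 1000 draws are almost surely pairwise distinct, mirroring the "probability zero for every point" phenomenon at finite precision.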

## Measure theory: What you need to know

We conclude that the semantics of the whole loop is the operator sending an input measure μ to the corresponding output measure. The loop of §1.3 is instructive as a warm-up exercise for computing the denotational semantics of this frequently occurring iteration pattern.

## Introduction

We begin in section 2.2 with an introduction to the general approach to probabilistic programming and informally consider various aspects of the semantics of probabilistic programs. In section 2.3 we develop the informal semantics from a measure-theoretic perspective and show with examples why a naive semantics is not so simple (§2.3.3).

## Informal semantics for probabilistic programming

### A first example: discrete samples, discrete observation

(i) I have forgotten what time it is. (ii) The rate of bicycles per hour is determined by a function of the time of day. Recall how density functions (as opposed to probabilities) are to be understood: although the probability that the time is exactly 5:30 is zero, we can assign a probability to the time lying in a given interval, however small (more generally, a measurable set), namely the integral of the density function over that set.
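This reading of densities can be made concrete with a short Python sketch (the density below is a made-up stand-in for the time-of-day density, chosen so the answer is easy to check): the probability of an interval is the integral of the density over it.

```python
def integrate(f, a, b, n=10_000):
    # Trapezoidal approximation of the integral of f over [a, b].
    h = (b - a) / n
    ys = [f(a + i * h) for i in range(n + 1)]
    return h * (sum(ys) - (ys[0] + ys[-1]) / 2)

def density(t):
    # Hypothetical density on [0, 1] (f(t) = 2t); the true mass of
    # [a, b] under it is b**2 - a**2, which makes checking easy.
    return 2 * t

def prob_interval(a, b):
    # The probability of the event "the value lies in [a, b]".
    return integrate(density, a, b)
```

Any single point has probability zero under this density, yet `prob_interval(0.25, 0.5)` returns a positive mass, exactly the distinction drawn in the text.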

## Introduction to measurability issues

If we run a weighted simulation k times, picking seeds ω1, …, ωk ∈ Ω, we get an empirical posterior probability that the result lies in a set U: the sum of the weights of those seeds whose result lands in U, divided by the sum of all the weights. This is well motivated by the illustration in Figure 2.7: the probability of hitting 20 is the sum of the probabilities of hitting the three regions shown. In Section 2.2.4 we discussed the point that although the denominator can be 0 or ∞, for an entire program this almost never happens.
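The estimator just described can be sketched as follows (illustrative Python; the model, the weight function and the set U are made-up assumptions): the empirical posterior probability of U is the total weight of the seeds whose result lands in U, divided by the total weight of all seeds.

```python
import random

def posterior_prob(result, weight, in_U, omegas):
    # Self-normalised weighted-simulation estimate of the posterior
    # probability that result(omega) lies in U, over seeds omega_1..omega_k.
    num = sum(weight(w) for w in omegas if in_U(result(w)))
    den = sum(weight(w) for w in omegas)
    return num / den

rng = random.Random(1)
omegas = [rng.random() for _ in range(20_000)]
# Hypothetical model: weight 2 below 1/2 and weight 1 above; the true
# posterior probability of U = [0, 1/2) is then (2 * 1/2) / (2 * 1/2 + 1/2) = 2/3.
est = posterior_prob(lambda w: w,
                     lambda w: 2.0 if w < 0.5 else 1.0,
                     lambda x: x < 0.5,
                     omegas)
```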

It turns out that while some infinite measures are definable in a probabilistic programming language, the counting measure on ℝ is not definable – we show this in section 2.5.2.

## Formal semantics of probabilistic programs as measures

We will show later (§2.5.1) that the order of evaluation does not matter, so we could use (t0, t1) as unambiguous syntactic sugar, but it simplifies the formal semantics to insist that the order of evaluation be given explicitly. (Some distribution families are not defined for all parameters, e.g. the standard deviation should be positive; we make safe ad hoc choices rather than using exceptions or subtypes.) The fact that let x = t in u denotes an s-finite kernel is Lemma 2.7: this is the trickiest part of the semantics.

In section 2.3 we considered an even more detailed approach, where a program ⊢ t : A is interpreted as a measurable function Ω → A, i.e. a random variable on a basic probability space, together with a likelihood function Ω → [0, ∞). The example is fictitious but based on real observations from the Bells on Bloor count in Toronto (Koehl et al., 2017). The density function t ↦ f(t) · e^(−0.016·f(t)) is not normalized, but we can divide it by the normalizing constant to obtain a true posterior density function. In general, we cannot naively use density functions for a fully compositional semantics, because some basic programs do not have density functions.
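The normalization step can be made concrete with a minimal Python sketch; only the shape f(t)·e^(−0.016·f(t)) comes from the text, while the rate function `rate` below is a made-up stand-in for the chapter's f.

```python
import math

def integrate(f, a, b, n=10_000):
    # Trapezoidal approximation of the integral of f over [a, b].
    h = (b - a) / n
    ys = [f(a + i * h) for i in range(n + 1)]
    return h * (sum(ys) - (ys[0] + ys[-1]) / 2)

def rate(t):
    # Hypothetical bicycles-per-hour rate as a function of the time of
    # day t in [0, 24]; not the chapter's actual f.
    return 60 + 40 * math.sin(math.pi * t / 12)

def unnormalised(t):
    # The unnormalised density from the text: f(t) * exp(-0.016 * f(t)).
    return rate(t) * math.exp(-0.016 * rate(t))

Z = integrate(unnormalised, 0, 24)   # normalizing constant

def posterior(t):
    # Dividing by Z yields a genuine probability density on [0, 24].
    return unnormalised(t) / Z
```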

## Overview

To this end, we outline a coding of computable distributions in a fragment of Haskell and show how topological domains can be used to model the resulting PCF-like language. Our next step is to find mathematical structures that can be used to model the implementation faithfully. In this section, we make the connection between implementation and mathematics more concrete by using the previously described constructs to give both (algorithmic) sampling and distribution semantics to a core PCF-like language, extended with reals and continuous distributions (via a probability monad), called λCD. The sampling semantics can be used to guide implementation, while the distribution semantics can be used for reasoning. (v) What are the implications of taking a (Type-2) computable viewpoint for Bayesian inference (Section 3.6)?

4 Remember that the PCF (Programming Computable Functions) language is a core calculus that can be used to model typed functional languages such as Haskell.

## Computability Revisited

Example 3.10. (ℝ, d_Euclid, ℚ) is a computable metric space for the real numbers, using the rationals ℚ as the approximating elements. Fortunately, we can also think of a distribution on a computable metric space (X, d, S) in terms of sampling, i.e. as a Type-2 computable function 2^ω → X. The sampling representation is (computably) equivalent to the representation of a computable distribution as a point in a suitable computable metric space of distributions.
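A minimal sketch of the sampling representation (illustrative Python, not code from the chapter): a sampler consumes a point of 2^ω bit by bit, and finitely many input bits determine the sample to finite precision, which is the essence of the Type-2 view.

```python
def uniform_approx(bits):
    # Dyadic approximation of a uniform [0, 1] sample from finitely many
    # bits of a point of 2^omega: bit k contributes bits[k] / 2**(k+1),
    # so reading more bits refines the approximation.
    return sum(b / 2 ** k for k, b in enumerate(bits, start=1))
```

Reading one extra bit moves the approximation by at most 2^(-(n+1)), so a finite prefix of the bit stream suffices for any requested output precision.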

A computable probability space (X, μ) is a pair where X is a computable metric space and μ is a computable distribution (see Hoyrup and Rojas, 2009, Def. 5.0.1).

## A Library for Computable Distributions

The module CompDistLib provides the function mkSamp to convert an arbitrary Haskell function of the correct type into a value of type Samp α. Continuous distributions. We now fill in the previously presented sketch of the standard uniform distribution. Once we have the standard uniform distribution, we can code other primitive distributions (e.g. normal, exponential, etc.) as transformations of the uniform distribution, as in standard statistics, using return and bind.

For example, we give an encoding of the standard normal distribution using the Marsaglia polar transform.
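The chapter's encoding lives in Haskell on top of CompDistLib; as a free-standing illustration of the transform itself, here is a Python sketch (the generator below and its parameters are assumptions, not the chapter's code): draw a point uniformly in the square, reject points outside the open unit disc, and rescale to obtain two independent standard-normal samples.

```python
import math
import random

def marsaglia_normals(rng):
    # Marsaglia polar transform: consume uniforms pairwise, rejecting
    # pairs outside the unit disc, and emit standard-normal samples.
    while True:
        x = 2 * rng.random() - 1
        y = 2 * rng.random() - 1
        s = x * x + y * y
        if 0 < s < 1:
            m = math.sqrt(-2 * math.log(s) / s)
            yield x * m
            yield y * m

rng = random.Random(7)
gen = marsaglia_normals(rng)
samples = [next(gen) for _ in range(20_000)]
```

The rejection loop makes this a transformation of the uniform distribution in exactly the sense of the text: normals are obtained by post-processing uniform samples.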

## Mathematical Structures for Modeling the Library

(Compare this figure with the corresponding one for CPOs (Abramsky and Jung, 1994, p. 46).) In the figure, one symbol indicates that the category is closed under the construct in question, and the symbol + additionally indicates that the closure matches the appropriate categorical construct. The primary motivation for doing so is that we gain a different perspective on computability (in addition to the topological and order-theoretic ones), one that emphasizes the connection with constructive mathematics. We therefore obtain another method of giving semantics, in addition to the traditional order-theoretic one.

Proposition 3.33 (Lietz, 2004, Prop. 3.3.2). If ⊩ ∼ ⊩∗, then the interpretations of B0 in the categories Asm(K2) and Asmt(K2) (i.e., the truth-value and the classical interpretations) give computably equivalent realizability structures.

## A Semantics for a Core Language

The interpretation of the distribution type Dist τ is a pair of a sampler and a distribution such that the sampler realizes the distribution. We can check that the valuation given by the semantics is in fact the pushforward along the sampler. The structure of the argument showing that the term-denotation function is well defined is similar to the argument showing that the CPO semantics of PCF is well defined.

We can check that the denotation of the first one is the same as botSamp, with E⟦alwaysDiv⟧ρ(U) = 0 for every measurable U.

## Bayesian Inference

Thus, computing the conditional distribution P(N | X) is equivalent to determining the value of the program variable corresponding to N given the value of the program variable corresponding to X. The idea is that we condition on Y (the noisy data) instead of U (the ideal data) when computing the posterior distribution over the model parameters; this is a limitation shared by other systems (see e.g. Goodman et al., 2008; Wood et al., 2014). Thus, conditioning takes a sampleable distribution and a bounded computable density that describes how the observations have been corrupted, and returns a sampleable distribution representing the conditional distribution.
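One standard way to realize such a conditioning operation is rejection sampling, sketched below in Python (the uniform-prior example and all names here are illustrative assumptions, not the chapter's implementation): given a sampleable prior and a conditional density with a known bound, accept a sample x with probability density(x) / bound.

```python
import random

def condition(sample_prior, density, bound, rng):
    # Rejection sampling: turn a sampleable prior and a conditional
    # density bounded by `bound` into a sampler for the conditional
    # distribution, by accepting x with probability density(x) / bound.
    while True:
        x = sample_prior(rng)
        if rng.random() * bound <= density(x):
            return x

rng = random.Random(3)
# Hypothetical example: uniform prior on [0, 1] with density(x) = x
# (bound 1); the conditioned distribution then has density 2x, mean 2/3.
post = [condition(lambda r: r.random(), lambda x: x, 1.0, rng)
        for _ in range(10_000)]
```

The boundedness assumption in the text is exactly what makes the acceptance probability density(x) / bound well defined and at most 1.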

Note that the bounded conditional density does not correspond to the joint distribution of the model (over both the model parameters and the data); it describes only how the observations have been corrupted.

## Summary and Further Directions

In particular, a universality result would crystallize the idea that Turing-complete probabilistic programming languages express Type-2 computable distributions. Now that we are back on familiar ground with regard to computability, we can turn our attention to the design of probabilistic programming languages.


## Introduction

Summary: This chapter is intended as a gentle introduction to probabilistic λ-calculi in their two major variations, namely randomized λ-calculi and Bayesian λ-calculi. We focus our attention on the operational semantics, expressive power and termination properties of randomized λ-calculi, and provide only some hints and references on denotational models and Bayesian λ-calculi. Randomized λ-calculi will be the subject of Section 4.3, while Bayesian λ-calculi will be introduced in Section 4.4.

We are mainly interested in randomized λ-calculi, and give some hints about Bayesian λ-calculi in Section 4.4.

## A Bird’s Eye View on Probabilistic Lambda Calculi

In the remainder of this chapter, we focus on the first three aspects, leaving the denotational semantics of probabilistic λ-calculi to the references. It is clear that the result of the evaluation process cannot be just one value, but rather a distribution over values. Finite derivations can, however, be used to derive approximations of the operational semantics of the underlying expression, as we will see in Section 4.3.2 below.

The choice of a reduction strategy does not affect the final result of the computation, although not every strategy is guaranteed to reach a normal form.

## A Typed λ -Calculus with Binary Probabilistic Choice

The rules deriving the one-step and step-indexed reduction relations are given in Figure 4.3 and are to be interpreted inductively. Nothing in the definition of an almost-surely terminating term ensures that the expected number of reduction steps of the term is finite. What makes call-by-value more attractive is the possibility of implicitly "memoizing" the result of probabilistic choices by way of sequencing, something that is not available in call-by-name.
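The contrast can be simulated directly (an illustrative Python sketch, not a λ-calculus implementation): under call-by-value, a term like let x = coin in x == x samples the choice once and shares the result, so the comparison always succeeds; a call-by-name reading re-evaluates the choice at each occurrence of x.

```python
import random

def coin(rng):
    # A fair probabilistic choice.
    return rng.random() < 0.5

def cbv(rng):
    # Call-by-value: evaluate the probabilistic choice once, share it.
    x = coin(rng)
    return x == x          # always True

def cbn(rng):
    # Call-by-name: each occurrence of x re-evaluates the choice.
    return coin(rng) == coin(rng)   # True with probability 1/2

rng = random.Random(5)
cbv_results = [cbv(rng) for _ in range(1_000)]
cbn_results = [cbn(rng) for _ in range(1_000)]
```

The two evaluation orders thus denote genuinely different distributions over results, which is the point the text makes about sequencing.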

Various techniques for proving that probabilistic imperative programs terminate have been introduced, based on the notions of ranking martingale and Lyapunov function, the natural probabilistic analogues of so-called ranking functions (Bournez and Garnier, 2005).

## Sampling and Conditioning

On the other hand, it is indeed possible to interpret PCF⊕ in denotational models based on probabilistic coherence spaces (Danos and Ehrhard, 2011), and this also leads to full abstraction results (Ehrhard et al., 2014). What is non-trivial, however, is making sense of these calculi, even in terms of an operational semantics. Subsequently, quasi-Borel spaces were also shown to give rise to a category of domains, thus accounting for the presence of recursion in the underlying calculus (Vákár et al., 2019).

Typing induces a family {CT_τ}_τ, where CT_τ is the set of all closed terms to which the type τ can be assigned in the empty environment.