A Beginner's Guide to Probability Densities and Masses
We will answer the questions:

What is a probability mass?

What is probability density, and how is it related to probability mass?

Why do we need both probability masses and probability densities?

How does discretization bridge the gap between density and mass?
What is a probability mass?
A probability mass is exactly what you think of when anyone says the word ‘probability’.

It is what you mean when you say that the probability of a coin coming up tails is 0.5

It is what you mean when you say there is a 25% chance of rain tomorrow.

The total probability mass contained in any (properly normalized) probability distribution is 1

The mean of a probability distribution can be found at the distribution’s ‘center of mass’

i.e., the mean locates the point on the x-axis at which the probability mass to its left and right just balances
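The 'center of mass' idea can be sketched in a few lines of Python. The distribution used here (number of heads in three fair coin tosses) is an illustrative assumption, not an example from this guide. 'Balance' is meant in the physical sense: each mass is weighted by its distance from the mean, like weights on a seesaw.

```python
# Hypothetical example: number of heads in 3 tosses of a fair coin.
outcomes = [0, 1, 2, 3]
masses = [1 / 8, 3 / 8, 3 / 8, 1 / 8]  # one probability mass per outcome

# A properly normalized distribution contains a total mass of 1.
total_mass = sum(masses)

# The mean is the center of mass: each position weighted by its mass.
mean = sum(x * m for x, m in zip(outcomes, masses))

# Balance check: (distance from the mean) x mass, summed over all outcomes,
# cancels out, just as torques cancel at the pivot of a balanced seesaw.
net_torque = sum((x - mean) * m for x, m in zip(outcomes, masses))

print(total_mass, mean, net_torque)
```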

The famous statistical p-value (Fig. 1) is also a probability mass.
In Fig. 1, a p-value is computed from the probability distribution over possible data means
 This distribution is theoretical, computed before seeing the data

The p-value is the area under the distribution over all values at least as far from zero as the experimentally observed data mean
 hence it is based on possible data mean values that are positive and negative

The p-value can only be computed after seeing the experimental data

‘Area under a probability distribution that is associated with certain abscissa regions’ is another way of saying:
 ‘the probability mass associated with those abscissa regions’

‘the proportion of total probability mass found along those regions of the abscissa’
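As a minimal sketch of a p-value as a probability mass, the snippet below assumes that the distribution over possible data means has been standardized to a standard normal (that assumption, and the function names, are not from this guide). The p-value is then the area in both tails at least as far from zero as the observed value.

```python
import math

def standard_normal_cdf(z):
    """Probability mass to the left of z under a standard normal density."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_sided_p_value(z_observed):
    """Mass in both tails at least as far from zero as the observed value."""
    one_tail = 1.0 - standard_normal_cdf(abs(z_observed))
    return 2.0 * one_tail  # positive and negative tails, by symmetry

# Hypothetical observed (standardized) data mean.
p = two_sided_p_value(1.96)
print(p)  # close to 0.05
```

The factor of two is what makes the result depend on both positive and negative possible data means.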
What is probability density, and how is it related to probability mass?
Probability density is what you are plotting when you draw most familiar probability distributions, although there are both probability mass and density distributions:
 Continuous probability distributions such as the Gaussian, Cauchy, exponential, F-distribution, t-distribution, etc. are all probability density functions

Discrete probability distributions, such as the sampling distributions for the Poisson, binomial, inverse binomial, multinomial, etc. are all probability mass functions.
Probability mass and probability density are related in exactly the way that physical mass and density are related.

The physical density of an object tells you the rate at which mass accumulates over a unit of length of the material in that object

Thus, if you had a rectangular bar of metal with a known density, you could use that density to calculate the mass of any section of the bar.

If you plotted the density of the bar along its length, it would be a uniform density function

If you could take a slice as fine as a mathematical line (with zero width) through the bar, you would have removed none of its mass.

In other words, any punctate point or slice through the bar has zero mass


However, if you took a small length of the bar (shaded region), then its mass would be the density multiplied by the volume of that section, i.e., density times cross-sectional area times length.

This is the same as saying it is the cross section of the bar multiplied by the area under the length-density function.

The latter description generalizes to a bar with nonuniform density.

For example, if it had a bell-shaped density function (as in the p-value example), you would still define the mass in terms of the area under the density function.
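The bar analogy can be made concrete with a rough numerical sketch. Every number here (the cross section, the density values, the bell shape and its width) is a hypothetical choice for illustration. The mass of a section is the cross section multiplied by the area under the length-density function, approximated by summing many thin slices.

```python
import math

# Hypothetical bar: 1 cm x 1 cm cross section, positions measured in metres.
CROSS_SECTION_M2 = 1e-4

def uniform_density(x):
    """Same density (kg per cubic metre) everywhere along the bar."""
    return 7850.0

def bell_density(x):
    """Assumed nonuniform density: densest at x = 0.5 m, thinner either side."""
    return 7850.0 * math.exp(-((x - 0.5) ** 2) / (2 * 0.1 ** 2))

def section_mass(density, start, end, n_slices=10_000):
    """Cross section times the area under the density between start and end."""
    width = (end - start) / n_slices
    area_under_density = sum(
        density(start + (i + 0.5) * width) * width for i in range(n_slices)
    )
    return CROSS_SECTION_M2 * area_under_density

print(section_mass(uniform_density, 0.4, 0.6))  # density x cross section x length
print(section_mass(bell_density, 0.4, 0.6))     # same recipe, nonuniform bar
print(section_mass(uniform_density, 0.5, 0.5))  # zero-width slice: no mass
```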

The same is true of probability density functions

You can compute probability masses over some region of the abscissa of a density function, such as in Fig. 1.

However, any single point along the abscissa of a density function, such as any one point on the abscissa of Fig. 1, has zero probability (mass).
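To see a point mass vanish numerically, the sketch below assumes a standard normal density (an illustrative choice) and computes the mass over ever narrower intervals centered on the same point. For a narrow interval the mass is approximately the density multiplied by the width, so it shrinks to zero with the width.

```python
import math

def normal_pdf(x):
    """Density (height of the curve) of a standard normal at x."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    """Probability mass to the left of x."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def mass_between(a, b):
    """Probability mass (area under the density) from a to b."""
    return normal_cdf(b) - normal_cdf(a)

center = 1.0  # hypothetical point on the abscissa
widths = [1.0, 0.1, 0.001, 0.0]
masses = [mass_between(center - w / 2, center + w / 2) for w in widths]

for w, m in zip(widths, masses):
    # The density at the point never changes; only the mass shrinks.
    print(w, m, normal_pdf(center) * w)
```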
Why do we need both probability masses and probability densities?
Probability mass is what we mean when we talk about probability, so why do we need all this extra business with densities and computing areas under probability density functions?

The reason we need probability densities is to represent the probabilities of continuous quantities.

Take, for example, the probability distribution over the binomial (coin-tossing) rate parameter in Fig. 3.

This probability distribution describes how probable each possible value of the rate at which the coin shows heads is.

This rate can be any value between 0 and 1


Thus, even though it is truncated (it does not extend to infinity at either end of its range), it is nevertheless a continuous probability distribution


The probability distribution in Fig. 3 is a probability density function. The values on the ordinate are densities rather than masses.

The reason is that there is an infinity of possible values of the rate between 0 and 1 on the abscissa (and an infinity between 0.5 and 1, and between 0.75 and 1, etc.)
If the numbers in the probability distribution were masses, then there would be an infinite number of those masses.
 and since all of the masses are greater than 0, the total mass would be infinite
 there would be no way to normalize the probability distribution (make the total mass one)
 therefore it could not be a probability (mass) function

However, just as with a physical material, a density can be defined at every one of an infinity of punctate points (or slices) through the object

and those densities can be converted directly to a mass based on how much of the object we are concerned with (the area under the density function).


Thus, when we plot a continuous probability distribution we are always plotting probability density.

The probability masses corresponding to continuous distributions can only be defined over sections of the abscissa, not single points on the abscissa.
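The normalization argument above can be checked numerically. As a sketch, assume the density over the coin's rate is 12 r^2 (1 - r) on the interval from 0 to 1 (a hypothetical stand-in; the actual curve in Fig. 3 is not specified). Treating the plotted heights as if they were masses gives a total that grows without bound as more points are included, while height multiplied by width stays close to 1.

```python
def density(r):
    """Assumed density over the rate r; the area under it from 0 to 1 is 1."""
    return 12.0 * r**2 * (1.0 - r)

def totals(n_points):
    """Compare 'heights as masses' with 'areas as masses' on an even grid."""
    width = 1.0 / n_points
    centers = [(i + 0.5) * width for i in range(n_points)]
    sum_of_heights = sum(density(r) for r in centers)        # not normalizable
    sum_of_areas = sum(density(r) * width for r in centers)  # stays near 1
    return sum_of_heights, sum_of_areas

results = {n: totals(n) for n in (10, 100, 1000)}
for n, (heights, areas) in results.items():
    print(n, heights, areas)
```

The sum of heights scales with the number of grid points, which is why the ordinate of a continuous distribution cannot be read as probability mass.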

How does discretization bridge the gap between density and mass?
Taking another look at the continuous distribution over binomial rate, let’s imagine computing masses over different subsections of the abscissa.

Fig. 3b shows the mass that you get when you compute the area under the density function over the range of abscissa values from 0 to 1.

Since this is the full range of abscissa values, the total probability mass must be 1.


Fig. 3c shows what happens when you segment the abscissa into a larger number of subareas.

Now, although the total mass is still 1, the smaller masses associated with smaller segments of the abscissa give the distribution more of the shape of the density function.


Fig. 3d and 3e are visually almost indistinguishable, although Fig. 3e has twice as many segments as Fig. 3d.

Not only are these last two functions indistinguishable from one another, they are both visually identical to the density function in Fig. 3a.


Thus, by discretizing the abscissa we can create a probability mass function that captures the shape of the density function.



If the discretized probability distribution has the same shape as the density function, then it will have approximately the same ‘moments’ (mean, variance, etc.), and the approximation improves as the segments get narrower.
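A minimal sketch of this discretization, again assuming a hypothetical density 12 r^2 (1 - r) over the rate (its exact mean is 0.6 and its exact variance is 0.04): each segment's mass is the area under the density over that segment, and the moments are computed from the segment centers and masses.

```python
def cdf(r):
    """Area under the assumed density 12 r^2 (1 - r) from 0 up to r."""
    return 4.0 * r**3 - 3.0 * r**4

def discretize(n_bins):
    """Split [0, 1] into equal segments; return their centers and masses."""
    edges = [i / n_bins for i in range(n_bins + 1)]
    centers = [(lo + hi) / 2.0 for lo, hi in zip(edges, edges[1:])]
    masses = [cdf(hi) - cdf(lo) for lo, hi in zip(edges, edges[1:])]
    return centers, masses

def mean_and_variance(centers, masses):
    """First two moments of the discretized (mass) distribution."""
    mean = sum(c * m for c, m in zip(centers, masses))
    variance = sum((c - mean) ** 2 * m for c, m in zip(centers, masses))
    return mean, variance

summary = {}
for n_bins in (1, 4, 16, 64):
    centers, masses = discretize(n_bins)
    summary[n_bins] = (sum(masses), *mean_and_variance(centers, masses))
    print(n_bins, summary[n_bins])
```

With a single segment all of the mass sits at 0.5, so the shape is lost entirely; as the number of segments grows, the total mass stays at 1 while the mean and variance approach those of the density.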

The only additional wrinkle to be aware of arises when there are additional infinities in the density function. The binomial rate density in Fig. 3 has a single infinity, associated with its abscissa being continuous. Notice that the abscissa is truncated at 0 and 1, so there are no infinities associated with the function extending to infinity in either direction.

If you have a continuous function that extends to infinity in either direction, then you must also truncate your discretized function if you want it to be normalized.

This is not generally problematic as long as you go far enough into the tails before you truncate

e.g., you will rarely go wrong if you truncate the tails of a Gaussian distribution several standard deviations from the mean.
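How much mass truncation discards can be checked directly. The sketch below assumes a standard normal and a few hypothetical cut-offs k (in standard deviations); none of these values is a recommendation from this guide. It reports the mass left outside the range from -k to +k, then builds a discretized, renormalized mass function over the truncated range.

```python
import math

def normal_cdf(x):
    """Probability mass to the left of x under a standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def mass_outside(k):
    """Mass in both tails beyond k standard deviations from the mean."""
    return math.erfc(k / math.sqrt(2.0))

def truncated_pmf(k, n_bins):
    """Discretize [-k, k] into equal segments and renormalize to total mass 1."""
    width = 2.0 * k / n_bins
    edges = [-k + i * width for i in range(n_bins + 1)]
    raw = [normal_cdf(hi) - normal_cdf(lo) for lo, hi in zip(edges, edges[1:])]
    total = sum(raw)  # slightly less than 1 because the tails were cut off
    return [m / total for m in raw]

lost = {k: mass_outside(k) for k in (1, 2, 3, 4, 5)}
for k, m in lost.items():
    print(k, m)

pmf = truncated_pmf(k=4, n_bins=80)
print(sum(pmf))
```

The discarded mass falls off very quickly with k, which is why renormalizing after a generous truncation barely changes the shape of the distribution.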


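Fig. 1: the p-value as an area under the distribution over possible data means.
Fig. 2
Fig. 3: (a) the density function over the binomial rate; (b) to (e) probability masses computed over progressively more segments of the abscissa.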