The German tank problem

2024-02-24

Probability Statistics

Contents

  1 Introduction
  2 Assumptions
  3 Frequentist approach
  4 Bayesian approach
  5 Conclusion
  6 References


1 Introduction

During WW2, the Allies faced the problem of estimating the monthly rate of German tank production from very limited data. Under a set of assumptions which were justified at the time, it was possible for them to achieve this accurately using statistical methods.  In this article, I will outline some of these methods.

The phrase German tank problem has come to refer to the general problem of estimating the maximum of a discrete uniform distribution from sampling without replacement.

1.1 Abbreviations and symbols used throughout this post

The following is a list of abbreviations I use throughout this post:

Abbreviation   Meaning
GTP            German tank problem
LF             Likelihood function
MAP            Maximum a posteriori estimator
MLE            Maximum likelihood estimator
MVUE           Minimum variance unbiased estimator
PMF            Probability mass function

The following is a list of symbols I use throughout this post:

Symbol            Meaning
$n$               Total number of tanks produced (the quantity we wish to estimate)
$k$               Number of tanks observed (sample size)
$x_1, \dots, x_k$   Observed serial numbers
$m$               Maximum observed serial number, $m = \max(x_1, \dots, x_k)$

2 Assumptions

Let's make the following assumptions:

  1. The serial numbers indeed count the number of tanks. E.g. the 1st tank manufactured had serial number 1 (or probably something resembling #00001), the 2nd had serial number 2, the $i$th had serial number $i$.
  2. The probability of observing a given serial number $x$ from a total population of $n$ tanks follows a discrete uniform distribution, i.e. $P(X = x \mid n) = \tfrac{1}{n}$ for $x \in \{1, \dots, n\}$.

With these assumptions in place, let's proceed with our analysis.
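
To make the data-generating process these assumptions imply concrete, here is a minimal simulation sketch in Python (the function name and the particular values of $n$ and $k$ below are mine, chosen purely for illustration): capturing $k$ tanks corresponds to drawing $k$ distinct serial numbers uniformly at random from $\{1, \dots, n\}$.

```python
import random

def capture_tanks(n, k, seed=None):
    """Simulate observing k distinct serial numbers from a fleet of n tanks.

    Sampling is without replacement, and every serial number 1..n is equally
    likely to be observed (Assumptions 1 and 2).
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, n + 1), k))

# Hypothetical example: a fleet of 200 tanks, 5 of which are captured.
print(capture_tanks(n=200, k=5, seed=42))
```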

There are 2 key approaches we can take, corresponding to the 2 popular interpretations of probability theory:

  1. Frequentist
  2. Bayesian

3 Frequentist approach

In the frequentist approach, we:

  1. Ask for a single point estimate of $n$; and
  2. Ask how confident we are in our estimate.

To answer Question 1, we need an estimator for $n$. We will consider 2 estimators: The maximum likelihood estimator (MLE) and the minimum variance unbiased estimator (MVUE).

To answer Question 2, we calculate confidence intervals for our selected estimator. For the sake of brevity, I will not discuss this in this post.

3.1 Maximum likelihood estimator

In frequentist analysis, a popular estimator is the maximum likelihood estimator, which is the value of the parameter that maximises the likelihood function. The likelihood function looks identical to the joint probability density of the observed data, except that we consider it not as a function of the data (like the joint PMF is), but as a function of the parameters. It is a term that appears in Bayes' theorem.

What is our data, and what are the parameters of our density? Our data are the observed serial numbers. The single parameter - the one we wish to estimate - is $n$, the upper bound of the uniform PMF we have assumed in Assumption 2.

To derive the likelihood function, first consider the case where we have observed the serial number of only a single tank. In this case, under Assumption 2, the joint density (in this case, just the 'density') is simply the univariate discrete uniform PMF with bounds $1$ and $n$:

$$P(X_1 = x_1 \mid n) = \frac{1}{n}, \qquad x_1 \in \{1, \dots, n\}.$$

This PMF is familiar, and for a fixed $n$ looks like:

[Figure: the discrete uniform PMF $P(X_1 = x_1 \mid n)$, plotted against $x_1$]

This of course is a function of $x_1$. Now, to go from this to the likelihood function, instead of fixing the parameter $n$ and considering it a function of $x_1$, we fix an observation $x_1$ and consider it as a function of $n$. We write $\mathcal{L}(n \mid x_1)$ to refer to the likelihood.

For example, suppose we have observed a single serial number $x_1 = 60$. This sole data point is our dataset. The likelihood function looks like this:

[Figure: the likelihood function $\mathcal{L}(n \mid x_1)$ for a single observation, plotted against $n$]

Note that, because the LF is a function of the parameter $n$ and not a function of the data $x_1$, the LF is not a PMF and thus is not normalised.

Now, consider the case where we have observed the serial numbers of 2 tanks. If, hypothetically, we were sampling with replacement (capturing a tank, recording the serial number, and then returning it to the enemy!), the samples would be independent and the joint density would be simply the bivariate discrete uniform PMF with bounds $1$ and $n$.

But this, of course, is NOT what we're doing. We are sampling without replacement. So the samples are NOT independent. The density is thus:

$$P(X_1 = x_1, X_2 = x_2 \mid n) = \frac{1}{n} \cdot \frac{1}{n-1}, \qquad x_1 \neq x_2 \in \{1, \dots, n\}.$$

In the general case of $k$ tanks, the density is:

$$P(X_1 = x_1, \dots, X_k = x_k \mid n) = \frac{1}{n} \cdot \frac{1}{n-1} \cdots \frac{1}{n-k+1} = \frac{(n-k)!}{n!},$$

where the $x_i$ are distinct elements of $\{1, \dots, n\}$ and $m := \max(x_1, \dots, x_k)$.

To get the likelihood function from this, you simply consider it as a function of $n$ instead of $X$. We write:

$$\mathcal{L}(n \mid x_1, \dots, x_k) = \begin{cases} \dfrac{(n-k)!}{n!} & n \geq m, \\ 0 & n < m. \end{cases}$$

For example, suppose $k = 5$ tanks have been observed with maximum serial number $m = 60$. The following figure displays the likelihood function in this case.

[Figure: the likelihood function $\mathcal{L}(n \mid x_1, \dots, x_k)$ for $k = 5$ observations, plotted against $n$]
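
As a quick sketch of how this likelihood can be evaluated in practice (the function name and the particular serial numbers below are mine and purely hypothetical; only $k = 5$ and $m = 60$ match the example), note that $\mathcal{L}$ depends on the data only through $k$ and $m$:

```python
from math import perm

def likelihood(n, observed):
    """Likelihood of a total of n tanks given the observed serial numbers.

    Equals (n-k)!/n! = 1/perm(n, k) when n >= max(observed), and 0 otherwise.
    """
    k, m = len(observed), max(observed)
    return 1.0 / perm(n, k) if n >= m else 0.0

# Hypothetical sample with k = 5 and maximum m = 60 (the actual serial
# numbers do not matter: the likelihood depends only on k and m).
sample = [12, 25, 33, 47, 60]
for n in (59, 60, 61, 70, 100):
    print(n, likelihood(n, sample))
```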

3.2 Suppose we fix the maximum observed serial number m. How does the likelihood function behave with sample size k?

Let's normalise the two likelihood functions and superimpose them:

[Figure: the normalised likelihood functions for $k = 1$ and $k = 5$, superimposed]

Compared to the case of a single observation ($k = 1$), the LF decays more rapidly when $k = 5$. Intuitively, this is because the larger your sample size, the more confident you are that its maximum is the true maximum.

Finally, the MLE is simply the value of $n$ that maximises the LF. Having derived the likelihood function, we can immediately see that the MLE is simply the sample maximum $m$, in this case 60.
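
A short sketch of the comparison above (the names and the truncation point are mine): normalise the likelihoods for $k = 1$ and $k = 5$ over a finite range of $n$ (for $k = 1$ the raw likelihood $1/n$ is not even summable, so some truncation is needed for plotting anyway), then confirm that the $k = 5$ curve concentrates much more tightly near $m$, and that the argmax, i.e. the MLE, is the sample maximum in both cases.

```python
from math import perm

def likelihood(n, k, m):
    """L(n | k observations with maximum m): 1/perm(n, k) for n >= m, else 0."""
    return 1.0 / perm(n, k) if n >= m else 0.0

m, n_max = 60, 300   # truncate at n_max purely for normalisation/plotting
for k in (1, 5):
    raw = [likelihood(n, k, m) for n in range(m, n_max + 1)]
    total = sum(raw)
    normalised = [v / total for v in raw]
    mle = m + normalised.index(max(normalised))   # argmax over n
    mass_near_m = sum(normalised[:21])            # mass on n in [m, m + 20]
    print(f"k={k}: MLE = {mle}, normalised-likelihood mass on n <= {m + 20}: {mass_near_m:.2f}")
```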

3.3 Minimum variance unbiased estimator

Now, there's a problem with the MLE in this case: It is biased, which means that its expected value is not equal to the true value. In fact, 'most of the time', the MLE will underpredict the true value, 'sometimes' it will get it right, but it can NEVER overpredict it. It is 'biased downward'.

So instead, we look for an unbiased estimator.

3.4 Derivation of an unbiased estimator

An unbiased estimator $\hat{n}$ for $n$ is one whose expected value is equal to $n$.

To derive this, first, let's consider the expected value of the sample maximum, $M = \max(X_1, \dots, X_k)$. It is possible to show that:

$$E[M] = \frac{k(n+1)}{k+1}.$$

See, for example, Example 5.1.2 of Hogg et al. (2019). Rearranging this,

$$n = \frac{k+1}{k} E[M] - 1.$$

Now, from the linearity of the expectation operator, we can bring the factor $\frac{k+1}{k}$ and the $-1$ inside the expectation operator:

$$n = E\!\left[\frac{k+1}{k} M - 1\right] = E\!\left[M + \frac{M}{k} - 1\right].$$

Thus, we have found something whose expected value is equal to $n$. Thus, the estimator

$$\hat{n} = M + \frac{M}{k} - 1$$

is an unbiased estimator for $n$.
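
The unbiasedness claim is easy to check numerically. Below is a quick Monte Carlo sketch (the function name and the particular $n$, $k$ and trial count are mine): average the estimator $M + M/k - 1$ over many simulated samples and compare with the true $n$, alongside the (downward-biased) sample maximum.

```python
import random

def check_unbiasedness(n=200, k=5, trials=100_000, seed=0):
    rng = random.Random(seed)
    mle_sum = mvue_sum = 0.0
    for _ in range(trials):
        m = max(rng.sample(range(1, n + 1), k))   # sample maximum M
        mle_sum += m                              # MLE: biased downward
        mvue_sum += m + m / k - 1                 # unbiased estimator
    print(f"true n = {n}")
    print(f"average MLE  ~ {mle_sum / trials:.1f}")    # noticeably below n
    print(f"average MVUE ~ {mvue_sum / trials:.1f}")   # close to n

check_unbiasedness()
```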

3.5 Showing this is in fact the minimum variance unbiased estimator

As it turns out, this estimator is in fact the minimum variance unbiased estimator (MVUE) of . The proof is beyond the scope of this post, but essentially what you need to do is:

  1. Show that the sample maximum is a sufficient statistic for the population maximum.
  2. Show that it is also a complete statistic for the population maximum.
  3. Invoke the Lehmann–Scheffé theorem, which states that any unbiased estimator that depends on the data only through a complete, sufficient statistic is the unique MVUE.

The details are left to the reader.

In this example, the MVUE for $n$ is $\hat{n} = 60 + \frac{60}{5} - 1 = 71$ tanks.

In summary, for the frequentist approach, we have:

Estimator   Value
MLE         60
MVUE        71

4 Bayesian approach

In the Bayesian approach, instead of being interested in a single point estimate of $n$, we ask for a probability distribution of $n$ given our assumptions and data. Specifically, we ask for the posterior PMF of $n$ having observed $m$ and $k$, i.e. $P(n \mid m, k)$. We can then use various descriptive statistics of this PMF as our estimators. In particular, the mode of the posterior PMF is called the maximum a posteriori (MAP) estimator.

The posterior PMF can be expressed using Bayes' rule as:

$$P(n \mid m, k) = \frac{P(m \mid n, k)\, P(n \mid k)}{P(m \mid k)}.$$

I have decided to omit the details from this post, but in summary: the likelihood $P(m \mid n, k)$ is the probability of observing sample maximum $m$ given $n$ and $k$, namely $\binom{m-1}{k-1} / \binom{n}{k}$ for $n \geq m$; the prior $P(n \mid k)$ is taken to be uniform over $n$; and the evidence $P(m \mid k)$ is the normalising constant obtained by summing the numerator over $n$.

Substituting these into Bayes' rule,

$$P(n \mid m, k) = \frac{k-1}{k} \cdot \frac{\binom{m-1}{k-1}}{\binom{n}{k}}, \qquad n \geq m,$$

and zero otherwise.
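
As a sketch of how this posterior can be tabulated and summarised (the function names and the truncation point n_max are mine, and the uniform-prior form above is assumed; the output is consistent with the example below):

```python
from math import comb

def posterior(n, m, k):
    """P(total = n | sample max = m, sample size = k), uniform prior, k >= 2."""
    return (k - 1) / k * comb(m - 1, k - 1) / comb(n, k) if n >= m else 0.0

def summarise(m, k, n_max=10_000):
    """Posterior mode, mean and median, truncating the (very light) tail at n_max."""
    probs = {n: posterior(n, m, k) for n in range(m, n_max + 1)}
    mode = max(probs, key=probs.get)
    mean = sum(n * p for n, p in probs.items())
    cumulative, median = 0.0, None
    for n in range(m, n_max + 1):
        cumulative += probs[n]
        if cumulative >= 0.5:
            median = n
            break
    return mode, mean, median

print(summarise(m=60, k=5))   # -> roughly (60, 78.7, 70); the mean rounds to 79
```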

Example

Suppose we observe $k = 5$ tanks with maximum serial number $m = 60$, as before. The posterior PMF in this case is shown in the following figure.

[Figure: the posterior PMF $P(n \mid m, k)$ for $m = 60$, $k = 5$, plotted against $n$]

The posterior mode (MAP), median and mean are shown in the following table.

Estimator              Value
Posterior mode (MAP)   60
Posterior mean         79
Posterior median       70

4.1 Comparison between frequentist and Bayesian approaches

In summary, we have:

Approach       Estimator              Value
Frequentist    MLE                    60
Frequentist    MVUE                   71
Bayesian       Posterior mode (MAP)   60
Bayesian       Posterior mean         79
Bayesian       Posterior median       70

Observations:

  1. The MLE and the posterior mode (MAP) coincide at the sample maximum, 60.
  2. The remaining estimators all exceed the sample maximum, reflecting the fact that the true $n$ can be no smaller than $m$ and is 'probably' larger.

4.2 Exploring the Bayesian posterior PMF

In this last section, I will ask and answer some basic questions to provide some intuition behind the Bayesian approach.

4.3 Suppose we 'get lucky' on the first sample and it happens to contain the highest serial number. How does the posterior behave in this case as we collect more data?

The results of a quick simulation of this case are shown below. What we can say is that, given fixed $m$, with increasing $k$, the probability mass at $n = m$ increases. As in the case of the likelihood function in the frequentist case from before, this is intuitive, since a larger sample size gives you more confidence that the observed maximum serial number is the total number of tanks. In the limit $k \to n$, essentially all of the probability mass concentrates in a single spike at $n = m$.

[Figure: posterior PMFs for fixed $m$ and increasing $k$]
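
A sketch of this simulation (using the same uniform-prior posterior as above; the names and the particular true $n$ are mine): fix the true fleet size, suppose the observed maximum already equals it, and watch the posterior mass at $n = m$ grow as $k$ increases.

```python
from math import comb

def posterior(n, m, k):
    """P(total = n | sample max = m, sample size = k), uniform prior, k >= 2."""
    return (k - 1) / k * comb(m - 1, k - 1) / comb(n, k) if n >= m else 0.0

# 'Lucky' case: the true fleet size is n_true and the observed maximum m
# already equals n_true.  The mass at n = m grows towards 1 as k increases.
n_true = 60
for k in (2, 5, 10, 20, 40, 60):
    print(f"k = {k:>2}:  P(n = {n_true} | data) = {posterior(n_true, n_true, k):.3f}")
```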

4.4 Suppose the highest serial number observed in a sample is equal to the sample size. How does the posterior behave in this case as a function of sample size?

The results of a quick simulation of this case are shown below. What we can say is that, setting $m = k$, the credibility that $n = m$ increases with increasing $k$, as shown by the fact that there is more mass at $n = m$ (the PMF isn't merely shifted to the right).  This is because, from Bayes' rule, the posterior PMF is proportional to the LF, which, as we've seen in the frequentist approach, becomes 'steeper' with increasing sample size.

We can interpret this intuitively as follows: Suppose we don't know $n$ and we draw the sample $\{1, 2\}$. While this is compelling evidence that $n = 2$, it isn't as compelling as the evidence that $n = 10$ would be if the sample we drew were $\{1, 2, \dots, 10\}$.

[Figure: posterior PMFs for $m = k$ at several sample sizes]
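
And a sketch of this second simulation (same assumptions and caveats as above): with $m = k$ the sample is necessarily exactly $\{1, \dots, k\}$, and the posterior mass at $n = m$ grows with the sample size.

```python
from math import comb

def posterior(n, m, k):
    """P(total = n | sample max = m, sample size = k), uniform prior, k >= 2."""
    return (k - 1) / k * comb(m - 1, k - 1) / comb(n, k) if n >= m else 0.0

# m = k: the sample must be exactly {1, ..., k}.  The evidence that n = k
# strengthens as k grows, e.g. from 0.5 at k = 2 towards 1.
for k in (2, 3, 5, 10, 20):
    print(f"k = m = {k:>2}:  P(n = {k} | data) = {posterior(k, k, k):.3f}")
```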

5 Conclusion

In this post, we have explored two approaches to the German tank problem - the problem of estimating the maximum of a discrete uniform distribution from sampling without replacement. We have explored frequentist and Bayesian approaches and compared their results.

6 References

Hogg, R. V., McKean, J. W., & Craig, A. T. (2019). Introduction to Mathematical Statistics (8th ed.). Pearson.

Lehmann–Scheffé theorem. Wikipedia. https://en.wikipedia.org/wiki/Lehmann%E2%80%93Scheff%C3%A9_theorem
