
5. The Matching Problem

Definitions and Notation

The Matching Experiment

The matching experiment is a random experiment that can be formulated in a number of colorful ways. Let $n \in \mathbb{N}_+$.

  1. Suppose that $n$ male-female couples are at a party and that the males and females are randomly paired for a dance. A match occurs if a couple happens to be paired together.
  2. An absent-minded secretary prepares $n$ letters and envelopes to send to $n$ different people, but then randomly stuffs the letters into the envelopes. A match occurs if a letter is inserted in the proper envelope.
  3. $n$ people with hats have had a bit too much to drink at a party. As they leave the party, each person randomly grabs a hat. A match occurs if a person gets his or her own hat.

The experiments in [1] are equivalent from a mathematical point of view, and correspond to selecting a random permutation $X = (X_1, X_2, \ldots, X_n)$ of the population $D_n = \{1, 2, \ldots, n\}$.

  1. Number the couples from 1 to $n$. Then $X_i$ is the number of the woman paired with the $i$th man.
  2. Number the letters and corresponding envelopes from 1 to $n$. Then $X_i$ is the number of the envelope containing the $i$th letter.
  3. Number the people and their corresponding hats from 1 to $n$. Then $X_i$ is the number of the hat chosen by the $i$th person.

Our modeling assumption, of course, is that $X$ is uniformly distributed on the set of permutations of $D_n$. The number of objects $n$ is the basic parameter of the experiment. We will also consider the case of sampling with replacement from the population $D_n$, because the analysis is much easier but still provides insight. In this case, $X$ is a sequence of independent random variables, each uniformly distributed over $D_n$.

Matches

A match occurs at position $j \in \{1, 2, \ldots, n\}$ if $X_j = j$. So the number of matches is the random variable $N_n$ defined by $$N_n = \sum_{j=1}^n I_j$$ where $I_j = \mathbf{1}(X_j = j)$ is the indicator variable for the event of a match at position $j$.

Our problem is to compute the probability distribution of the number of matches. This is an old and famous problem in probability that was first considered by Pierre Rémond de Montmort; it is sometimes referred to as Montmort's matching problem in his honor.

Sampling With Replacement

First let's solve the matching problem in the easy case, when the sampling is with replacement. Of course, this is not the way that the matching game is usually played, but the analysis will give us some insight.

$(I_1, I_2, \ldots, I_n)$ is a sequence of $n$ Bernoulli trials, with success probability $\frac{1}{n}$.

Details:

The variables are independent since the sampling is with replacement. Since $X_j$ is uniformly distributed, $P(I_j = 1) = P(X_j = j) = \frac{1}{n}$.

The number of matches $N_n$ has the binomial distribution with trial parameter $n$ and success parameter $\frac{1}{n}$: $$P(N_n = k) = \binom{n}{k} \left(\frac{1}{n}\right)^k \left(1 - \frac{1}{n}\right)^{n-k}, \quad k \in \{0, 1, \ldots, n\}$$

Details:

This follows immediately from [4].
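
Here is a minimal Python sketch of this result (the choice $n = 10$ and the number of runs are arbitrary illustrative values): it simulates sampling with replacement from $D_n$ and compares the empirical distribution of the number of matches to the binomial probabilities above.

    import random
    from math import comb

    n = 10              # population size (arbitrary illustrative value)
    runs = 100_000      # number of simulated samples

    counts = [0] * (n + 1)
    for _ in range(runs):
        # sample with replacement from D_n = {1, 2, ..., n}
        x = [random.randrange(1, n + 1) for _ in range(n)]
        matches = sum(1 for j, xj in enumerate(x, start=1) if xj == j)
        counts[matches] += 1

    for k in range(5):
        empirical = counts[k] / runs
        binomial = comb(n, k) * (1 / n) ** k * (1 - 1 / n) ** (n - k)
        print(f"k={k}: empirical {empirical:.4f}, binomial {binomial:.4f}")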

The mean and variance of the number of matches are

  1. $E(N_n) = 1$
  2. $\text{var}(N_n) = \frac{n-1}{n}$
Details:

These results follow from [5]. Recall that the binomial distribution with parameters $n \in \mathbb{N}_+$ and $p \in [0, 1]$ has mean $np$ and variance $np(1 - p)$.

The distribution of the number of matches converges to the Poisson distribution with parameter 1 as $n \to \infty$: $$P(N_n = k) \to \frac{e^{-1}}{k!} \text{ as } n \to \infty \text{ for } k \in \mathbb{N}$$

Details:

This is a special case of the convergence of the binomial distribution to the Poisson. For a direct proof, note that $$P(N_n = k) = \frac{1}{k!} \frac{n^{(k)}}{n^k} \left(1 - \frac{1}{n}\right)^{n-k}$$ where $n^{(k)} = n (n - 1) \cdots (n - k + 1)$ is the falling power. But $\frac{n^{(k)}}{n^k} \to 1$ as $n \to \infty$, and $\left(1 - \frac{1}{n}\right)^{n-k} \to e^{-1}$ as $n \to \infty$ by a famous limit from calculus.
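
A quick numeric check of this convergence, as a Python sketch (the values of $n$ shown are arbitrary): the binomial probabilities for $k = 0, 1, 2, 3$ are printed next to the Poisson limit $e^{-1}/k!$.

    from math import comb, exp, factorial

    for n in (5, 20, 100, 1000):
        row = [comb(n, k) * (1 / n) ** k * (1 - 1 / n) ** (n - k) for k in range(4)]
        print(n, [round(p, 5) for p in row])

    # Poisson(1) limit for comparison
    print("limit", [round(exp(-1) / factorial(k), 5) for k in range(4)])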

Sampling Without Replacement

Now let's consider the case of real interest, when the sampling is without replacement, so that $X$ is a random permutation of the elements of $D_n = \{1, 2, \ldots, n\}$.

Counting Permutations with Matches

To find the probability density function of $N_n$, we need to count the number of permutations of $D_n$ with a specified number of matches. This will turn out to be easy once we have counted the number of permutations with no matches; these are called derangements of $D_n$. We will denote the number of permutations of $D_n$ with exactly $k$ matches by $b_n(k) = \#\{N_n = k\}$ for $k \in \{0, 1, \ldots, n\}$. In particular, $b_n(0)$ is the number of derangements of $D_n$.

The number of derangements is $$b_n(0) = n! \sum_{j=0}^n \frac{(-1)^j}{j!}$$

Details:

By the complement rule for counting measure, $b_n(0) = n! - \#\left(\bigcup_{i=1}^n \{X_i = i\}\right)$. From the inclusion-exclusion formula, $$b_n(0) = n! - \sum_{j=1}^n (-1)^{j-1} \sum_{J \subseteq D_n, \, \#(J) = j} \#\{X_i = i \text{ for all } i \in J\}$$ But if $J \subseteq D_n$ with $\#(J) = j$ then $\#\{X_i = i \text{ for all } i \in J\} = (n - j)!$. Finally, the number of subsets $J$ of $D_n$ with $\#(J) = j$ is $\binom{n}{j}$. Substituting into the displayed equation and simplifying gives the result.
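
The formula can be checked against a brute-force count over all permutations, at least for small $n$. A sketch in Python (the function names are our own):

    from itertools import permutations
    from math import factorial

    def derangements_formula(n):
        # b_n(0) = n! * sum_{j=0}^{n} (-1)^j / j!
        return round(factorial(n) * sum((-1) ** j / factorial(j) for j in range(n + 1)))

    def derangements_brute(n):
        # count permutations of {0, ..., n-1} with no fixed points
        return sum(1 for p in permutations(range(n)) if all(p[i] != i for i in range(n)))

    for n in range(1, 8):
        print(n, derangements_formula(n), derangements_brute(n))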

The number of permutations with exactly $k$ matches is $$b_n(k) = \frac{n!}{k!} \sum_{j=0}^{n-k} \frac{(-1)^j}{j!}, \quad k \in \{0, 1, \ldots, n\}$$

Details:

The following is a two-step procedure that generates all permutations with exactly $k$ matches: First select the $k$ integers that will match. The number of ways of performing this step is $\binom{n}{k}$. Second, select a permutation of the remaining $n - k$ integers with no matches. The number of ways of performing this step is $b_{n-k}(0)$. By the multiplication principle of combinatorics it follows that $b_n(k) = \binom{n}{k} b_{n-k}(0)$. Using [8] and simplifying gives the result.

The Probability Density Function

The probability density function of the number of matches is $$P(N_n = k) = \frac{1}{k!} \sum_{j=0}^{n-k} \frac{(-1)^j}{j!}, \quad k \in \{0, 1, \ldots, n\}$$

Details:

This follows directly from [9], since $P(N_n = k) = \#\{N_n = k\} / n!$.
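
As a sketch, the probability density function is easy to compute directly from this formula; the helper below (our own naming, with an arbitrary choice of $n$) also checks that the probabilities sum to 1.

    from math import factorial

    def match_pmf(n, k):
        # P(N_n = k) = (1/k!) * sum_{j=0}^{n-k} (-1)^j / j!
        return sum((-1) ** j / factorial(j) for j in range(n - k + 1)) / factorial(k)

    n = 5   # arbitrary illustrative value
    pmf = [match_pmf(n, k) for k in range(n + 1)]
    print([round(p, 4) for p in pmf], "sum =", round(sum(pmf), 10))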

In the matching experiment, vary the parameter n and note the shape and location of the probability density function. For selected values of n, run the simulation 1000 times and compare the empirical density function to the true probability density function.

$P(N_n = n - 1) = 0$.

Details:

A simple probabilistic proof is to note that the event is impossible: if there are $n - 1$ matches, then there must be $n$ matches. An algebraic proof can also be constructed from the probability density function in [10].

The distribution of the number of matches converges to the Poisson distribution with parameter 1 as $n \to \infty$: $$P(N_n = k) \to \frac{e^{-1}}{k!} \text{ as } n \to \infty, \quad k \in \mathbb{N}$$

Details:

From the power series for the exponential function, $$\sum_{j=0}^{n-k} \frac{(-1)^j}{j!} \to \sum_{j=0}^{\infty} \frac{(-1)^j}{j!} = e^{-1} \text{ as } n \to \infty$$ So the result follows from the probability density function in [10].

The convergence is remarkably rapid.

In the matching experiment, increase n and note how the probability density function stabilizes rapidly. For selected values of n, run the simulation 1000 times and compare the relative frequency function to the probability density function.
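
For a numerical sense of how rapid the convergence is, the sketch below prints the exact probabilities from [10] for $n = 5$ and $n = 10$ (arbitrary choices) next to the Poisson limit; the helper function is the same one used in the earlier sketch.

    from math import exp, factorial

    def match_pmf(n, k):
        # exact pmf of the number of matches, from [10]
        return sum((-1) ** j / factorial(j) for j in range(n - k + 1)) / factorial(k)

    print("k   n=5        n=10       Poisson(1)")
    for k in range(4):
        print(k, round(match_pmf(5, k), 7), round(match_pmf(10, k), 7),
              round(exp(-1) / factorial(k), 7))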

Moments

The mean and variance of the number of matches could be computed directly from the distribution. However, it is much better to use the representation in terms of indicator variables. The exchangeable property is an important tool in this section.

$E(I_j) = \frac{1}{n}$ for $j \in \{1, 2, \ldots, n\}$.

Details:

$X_j$ is uniformly distributed on $D_n$ for each $j$, so $P(I_j = 1) = P(X_j = j) = \frac{1}{n}$.

$E(N_n) = 1$ for each $n$.

Details:

This follows from [15] and the additive property of expected value.

So the expected number of matches is 1, regardless of $n$, just as when the sampling is with replacement.

$\text{var}(I_j) = \frac{n-1}{n^2}$ for $j \in \{1, 2, \ldots, n\}$.

Details:

This follows from $P(I_j = 1) = \frac{1}{n}$, since the variance of an indicator variable with success probability $p$ is $p(1 - p)$.

A match in one position would seem to make it more likely that there would be a match in another position. Thus, we might guess that the indicator variables are positively correlated.

For distinct $j, k \in \{1, 2, \ldots, n\}$,

  1. $\text{cov}(I_j, I_k) = \frac{1}{n^2 (n - 1)}$
  2. $\text{cor}(I_j, I_k) = \frac{1}{(n - 1)^2}$
Details:

Note that $I_j I_k$ is the indicator variable of the event of a match in position $j$ and a match in position $k$. Hence by the exchangeable property, $P(I_j I_k = 1) = P(I_j = 1) P(I_k = 1 \mid I_j = 1) = \frac{1}{n} \cdot \frac{1}{n - 1}$. As before, $P(I_j = 1) = P(I_k = 1) = \frac{1}{n}$. The results now follow from standard computational formulas for covariance and correlation.

Note that when $n = 2$, the event that there is a match in position 1 is perfectly correlated with the event that there is a match in position 2. This makes sense, since there will either be 0 matches or 2 matches.

$\text{var}(N_n) = 1$ for every $n \in \{2, 3, \ldots\}$.

Details:

This follows from [17] and [18], and basic properties of covariance. Recall that $\text{var}(N_n) = \sum_{j=1}^n \sum_{k=1}^n \text{cov}(I_j, I_k)$.
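
A short exact-arithmetic check of this result, summing the variances and covariances of the indicator variables (a sketch; the formulas are those in [17] and [18], and the range of $n$ is arbitrary):

    from fractions import Fraction

    for n in range(2, 8):
        var_I = Fraction(n - 1, n ** 2)            # var(I_j)
        cov_I = Fraction(1, n ** 2 * (n - 1))      # cov(I_j, I_k) for j != k
        var_N = n * var_I + n * (n - 1) * cov_I    # sum over the covariance matrix
        print(n, var_N)                            # prints 1 for every n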

In the matching experiment, vary the parameter n and note the shape and location of the mean ± standard deviation bar. For selected values of the parameter, run the simulation 1000 times and compare the sample mean and standard deviation to the distribution mean and standard deviation.

For distinct $j, k \in \{1, 2, \ldots, n\}$, $\text{cov}(I_j, I_k) \to 0$ as $n \to \infty$.

So the event that a match occurs in position $j$ is nearly independent of the event that a match occurs in position $k$ if $n$ is large. For large $n$, the indicator variables behave nearly like $n$ Bernoulli trials with success probability $\frac{1}{n}$, which, of course, is what happens when the sampling is with replacement.

A Recursion Relation

In this subsection, we will give an alternate derivation of the distribution of the number of matches, in a sense by embedding the experiment with parameter $n$ into the experiment with parameter $n + 1$.

The probability density function of the number of matches satisfies the following recursion relation and initial condition:

  1. $P(N_n = k) = (k + 1) P(N_{n+1} = k + 1)$ for $k \in \{0, 1, \ldots, n\}$.
  2. $P(N_1 = 1) = 1$.
Details:

First, consider the random permutation $(X_1, X_2, \ldots, X_n, X_{n+1})$ of $D_{n+1}$. Note that $(X_1, X_2, \ldots, X_n)$ is a random permutation of $D_n$ if and only if $X_{n+1} = n + 1$, that is, if and only if $I_{n+1} = 1$. It follows that $$P(N_n = k) = P(N_{n+1} = k + 1 \mid I_{n+1} = 1), \quad k \in \{0, 1, \ldots, n\}$$ From the definition of conditional probability, $$P(N_n = k) = P(N_{n+1} = k + 1) \frac{P(I_{n+1} = 1 \mid N_{n+1} = k + 1)}{P(I_{n+1} = 1)}, \quad k \in \{0, 1, \ldots, n\}$$ But $P(I_{n+1} = 1) = \frac{1}{n+1}$ and $P(I_{n+1} = 1 \mid N_{n+1} = k + 1) = \frac{k+1}{n+1}$. Substituting into the last displayed equation gives the recurrence relation. The initial condition is obvious, since if $n = 1$ we must have one match.

This result can be used to obtain the probability density function of $N_n$ recursively for any $n$.
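
One way to implement the recursion, as a sketch under our own conventions: rewrite the recurrence as $P(N_{n+1} = k + 1) = P(N_n = k) / (k + 1)$ to get the positive values, and obtain $P(N_{n+1} = 0)$ from the requirement that the probabilities sum to 1.

    from fractions import Fraction

    pmf = {1: Fraction(1)}          # P(N_1 = 1) = 1, the initial condition
    n = 1
    while n < 6:
        # positive values: P(N_{n+1} = k+1) = P(N_n = k) / (k + 1)
        new = {k + 1: p / (k + 1) for k, p in pmf.items()}
        # the value at 0 comes from normalization
        new[0] = 1 - sum(new.values())
        pmf, n = new, n + 1

    # pmf of N_6; keys not listed (here k = n - 1 = 5) have probability 0
    print({k: float(p) for k, p in sorted(pmf.items())})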

The Probability Generating Function

Next recall that the probability generating function of $N_n$ is given by $$G_n(t) = E\left(t^{N_n}\right) = \sum_{j=0}^n P(N_n = j) t^j, \quad t \in \mathbb{R}$$

The family of probability generating functions satisfies the following differential equations and ancillary conditions:

  1. $G_{n+1}'(t) = G_n(t)$ for $t \in \mathbb{R}$ and $n \in \mathbb{N}_+$
  2. $G_n(1) = 1$ for $n \in \mathbb{N}_+$

Note also that $G_1(t) = t$ for $t \in \mathbb{R}$. Thus, the system of differential equations can be used to compute $G_n$ for any $n \in \mathbb{N}_+$.

In particular, for $t \in \mathbb{R}$,

  1. $G_2(t) = \frac{1}{2} + \frac{1}{2} t^2$
  2. $G_3(t) = \frac{1}{3} + \frac{1}{2} t + \frac{1}{6} t^3$
  3. $G_4(t) = \frac{3}{8} + \frac{1}{3} t + \frac{1}{4} t^2 + \frac{1}{24} t^4$
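
These can be reproduced symbolically by repeated integration, since the differential equation and ancillary condition give $G_{n+1}(t) = 1 + \int_1^t G_n(s) \, ds$. A sketch using sympy (an assumed dependency):

    import sympy as sp

    t, s = sp.symbols("t s")
    G = t                        # G_1(t) = t
    for n in range(2, 5):
        # G_n(t) = 1 + integral from 1 to t of G_{n-1}(s) ds
        G = 1 + sp.integrate(G.subs(t, s), (s, 1, t))
        print(f"G_{n}(t) =", sp.expand(G))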

For $k, n \in \mathbb{N}_+$ with $k < n$, $$G_n^{(k)}(t) = G_{n-k}(t), \quad t \in \mathbb{R}$$

Details:

This follows from the differential equation in [23].

For $n \in \mathbb{N}_+$, $$P(N_n = k) = \frac{1}{k!} P(N_{n-k} = 0), \quad k \in \{0, 1, \ldots, n - 1\}$$

Details:

This follows from [25] and basic properties of generating functions.

Examples and Applications

A secretary randomly stuffs 5 letters into 5 envelopes. Find each of the following:

  1. The number of outcomes with exactly $k$ matches, for each $k \in \{0, 1, 2, 3, 4, 5\}$.
  2. The probability density function of the number of matches.
  3. The covariance and correlation of a match in one envelope and a match in another envelope.
Details:
  1. k         0       1       2       3       4   5
     b_5(k)    44      45      20      10      0   1
  2. k             0        1        2        3        4   5
     P(N_5 = k)    0.3667   0.3750   0.1667   0.0833   0   0.0083
  3. Covariance: $\frac{1}{100}$; correlation: $\frac{1}{16}$.
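
A brute-force check of parts (a) and (b), as a short Python sketch: count the permutations of $\{1, \ldots, 5\}$ by number of fixed points and convert the counts to probabilities.

    from itertools import permutations
    from math import factorial

    n = 5
    counts = [0] * (n + 1)
    for p in permutations(range(n)):
        counts[sum(p[i] == i for i in range(n))] += 1   # number of fixed points

    print("b_5(k):", counts)                                        # 44, 45, 20, 10, 0, 1
    print("pmf:   ", [round(c / factorial(n), 4) for c in counts])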

Ten married couples are randomly paired for a dance. Find each of the following:

  1. The probability density function of the number of matches.
  2. The mean and variance of the number of matches.
  3. The probability of at least 3 matches.
Details:
  1. k      $P(N_{10} = k)$
     0      16481/44800 ≈ 0.3678795
     1      16687/45360 ≈ 0.3678792
     2      2119/11520 ≈ 0.1839410
     3      103/1680 ≈ 0.06130952
     4      53/3456 ≈ 0.01533565
     5      11/3600 ≈ 0.003055556
     6      1/1920 ≈ 0.0005208333
     7      1/15120 ≈ 0.00006613757
     8      1/80640 ≈ 0.00001240079
     9      0
     10     1/3628800 ≈ 2.755732 × 10^{-7}
  2. $E(N_{10}) = 1$, $\text{var}(N_{10}) = 1$
  3. $P(N_{10} \ge 3) = \frac{145697}{1814400} \approx 0.08030037$
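
Part (c) can be checked with exact arithmetic, since $P(N_{10} \ge 3) = 1 - P(N_{10} = 0) - P(N_{10} = 1) - P(N_{10} = 2)$. A sketch (the helper, our own naming, implements the pmf in [10]):

    from fractions import Fraction
    from math import factorial

    def match_pmf(n, k):
        # exact pmf of the number of matches, from [10]
        s = sum(Fraction((-1) ** j, factorial(j)) for j in range(n - k + 1))
        return s / factorial(k)

    p = 1 - sum(match_pmf(10, k) for k in range(3))
    print(p, float(p))    # 145697/1814400 ≈ 0.0803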

In the matching experiment, set $n = 10$. Run the experiment 1000 times and compare the following for the number of matches:

  1. The true probabilities
  2. The relative frequencies from the simulation
  3. The limiting Poisson probabilities
Details:
  1. See part (a) of [28].
  2. k      $e^{-1}/k!$
     0      0.3678794
     1      0.3678794
     2      0.1839397
     3      0.06131324
     4      0.01532831
     5      0.003065662
     6      0.0005109437
     7      0.00007299195
     8      9.123994 × 10^{-6}
     9      1.013777 × 10^{-6}
     10     1.013777 × 10^{-7}