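Derivation of the Cramér-Rao Bound

Ryan D. Reece

October 1, 2009

Abstract

I give a pedagogical derivation of the Cramér-Rao bound, which places a lower bound on the variance of estimators used in statistical point estimation, and which is commonly used to give numerical estimates of the systematic uncertainties in a measurement.

1 Derivation

For estimators $\hat{\theta}_i$ of parameters $\theta_i$ in a given model with likelihood function $L$, the bias $b_i$ of these estimators is defined as

$$ b_i \equiv E\left[\hat{\theta}_i(x) - \theta_i\right] \equiv \int dx \, L(x, \theta) \left(\hat{\theta}_i(x) - \theta_i\right) $$

Note that the estimators $\hat{\theta}_i$ depend on the observable $x$, and the likelihood function $L$ depends on $x$ and the parameters of the model $\theta$. For ease of notation these dependencies will now be left implicit. The observable could be a tuple of many measurements $x = \{x_1, x_2, \ldots, x_N\}$, in which case the integral over $x$ actually denotes integrals over each independent measurement.

$$ \int dx = \int dx_1 \int dx_2 \cdots \int dx_N $$

In that case, the likelihood function is the joint likelihood function, which is a product of the likelihood functions for each measurement.

$$ L = \prod_{i=1}^{N} L_i $$

Taking the derivative of the bias with respect to its corresponding parameter gives

$$ \frac{\partial b_i}{\partial \theta_i} = \int dx \, (\hat{\theta}_i - \theta_i) \underbrace{\frac{\partial L}{\partial \theta_i}}_{L \, \frac{\partial \ln L}{\partial \theta_i}} - \underbrace{\int dx \, L}_{1} $$

$$ 1 + \frac{\partial b_i}{\partial \theta_i} = \int dx \, (\hat{\theta}_i - \theta_i) \, L \, \frac{\partial \ln L}{\partial \theta_i} $$

$$ \left(1 + \frac{\partial b_i}{\partial \theta_i}\right)\left(1 + \frac{\partial b_j}{\partial \theta_j}\right) = \left[\int dx \, (\hat{\theta}_i - \theta_i) \, L \, \frac{\partial \ln L}{\partial \theta_i}\right] \left[\int dx \, (\hat{\theta}_j - \theta_j) \, L \, \frac{\partial \ln L}{\partial \theta_j}\right] $$

Then we use the Cauchy-Schwarz inequality:

$$ \left| \int dx \, f(x) \, g(x) \right|^2 \leq \int dx \, |f(x)|^2 \cdot \int dx \, |g(x)|^2 $$

For $i = j$ this applies directly with $f = \sqrt{L} \, (\hat{\theta}_i - \theta_i)$ and $g = \sqrt{L} \, \partial \ln L / \partial \theta_i$. Written with general indices, the inequality reads

$$ \left(1 + \frac{\partial b_i}{\partial \theta_i}\right)\left(1 + \frac{\partial b_j}{\partial \theta_j}\right) \leq \left[\int dx \, L \, (\hat{\theta}_i - \theta_i)(\hat{\theta}_j - \theta_j)\right] \left[\int dx \, L \, \frac{\partial \ln L}{\partial \theta_i} \frac{\partial \ln L}{\partial \theta_j}\right] $$

The first integral is the covariance of the estimators.

$$ V_{ij} \equiv \mathrm{Cov}[\hat{\theta}_i, \hat{\theta}_j] \equiv \int dx \, L \, (\hat{\theta}_i - \theta_i)(\hat{\theta}_j - \theta_j) $$

The second integral is defined as the Fisher information matrix.

$$ I_{ij} \equiv E\left[\frac{\partial \ln L}{\partial \theta_i} \frac{\partial \ln L}{\partial \theta_j}\right] \equiv \int dx \, L \, \frac{\partial \ln L}{\partial \theta_i} \frac{\partial \ln L}{\partial \theta_j} $$

A more convenient and equivalent expression for $I_{ij}$ can be derived by considering the expectation value of the second derivative of $\ln L$: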
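$$
\begin{aligned}
E\left[\frac{\partial^2 \ln L}{\partial \theta_i \, \partial \theta_j}\right]
&= E\left[\frac{\partial}{\partial \theta_i}\left(\frac{1}{L} \frac{\partial L}{\partial \theta_j}\right)\right] \\
&= E\left[-\frac{1}{L^2} \frac{\partial L}{\partial \theta_i} \frac{\partial L}{\partial \theta_j} + \frac{1}{L} \frac{\partial}{\partial \theta_i} \frac{\partial L}{\partial \theta_j}\right] \\
&= -E\left[\frac{\partial \ln L}{\partial \theta_i} \frac{\partial \ln L}{\partial \theta_j}\right] + E\left[\frac{1}{L} \frac{\partial}{\partial \theta_i}\left(L \, \frac{\partial \ln L}{\partial \theta_j}\right)\right] \\
&= -I_{ij} + \int dx \, L \, \frac{1}{L} \frac{\partial}{\partial \theta_i}\left(L \, \frac{\partial \ln L}{\partial \theta_j}\right) \\
&= -I_{ij} + \frac{\partial}{\partial \theta_i} \int dx \, L \, \frac{\partial \ln L}{\partial \theta_j} \\
&= -I_{ij} + \frac{\partial}{\partial \theta_i} \int dx \, L \, \frac{1}{L} \frac{\partial L}{\partial \theta_j} \\
&= -I_{ij} + \frac{\partial}{\partial \theta_i} \frac{\partial}{\partial \theta_j} \int dx \, L \\
&= -I_{ij} + \frac{\partial}{\partial \theta_i} \frac{\partial}{\partial \theta_j} (1) \\
&= -I_{ij}
\end{aligned}
$$

Therefore

$$ I_{ij} \equiv E\left[\frac{\partial \ln L}{\partial \theta_i} \frac{\partial \ln L}{\partial \theta_j}\right] = E\left[-\frac{\partial^2 \ln L}{\partial \theta_i \, \partial \theta_j}\right] $$

One can see that the Fisher information matrix measures the curvature of the log-likelihood function in parameter space, averaged over the possible observed data. Intuitively, this means that the larger the value of $I_{ij}$, the more sharply peaked the likelihood function is in that region, and the more sensitive the experiment is to the parameters of the model.

Plugging these expressions back into the inequality we derived from the Cauchy-Schwarz inequality, we have the general expression for the Cramér-Rao bound.

$$ V_{ij} \geq \frac{\left(1 + \frac{\partial b_i}{\partial \theta_i}\right)\left(1 + \frac{\partial b_j}{\partial \theta_j}\right)}{E\left[-\frac{\partial^2 \ln L}{\partial \theta_i \, \partial \theta_j}\right]} \tag{1} $$

It is often the case that the bias of a well-chosen estimator is asymptotically zero in the large sample limit. In that case, the bound for the covariance matrix of unbiased estimators simplifies to the following.

$$ V_{ij} \geq \left(E\left[-\frac{\partial^2 \ln L}{\partial \theta_i \, \partial \theta_j}\right]\right)^{-1} $$

The diagonal elements give the variances of the estimators.

$$ V_{ii} = \sigma^2_{\hat{\theta}_i} \geq \left(E\left[-\frac{\partial^2 \ln L}{\partial \theta_i^2}\right]\right)^{-1} $$

The efficiency of an unbiased estimator is defined as

$$ \varepsilon(\hat{\theta}_i) \equiv \frac{\left(E\left[-\frac{\partial^2 \ln L}{\partial \theta_i^2}\right]\right)^{-1}}{\sigma^2_{\hat{\theta}_i}} $$

The efficiency is therefore less than or equal to one, and equal to one in the case that the Cramér-Rao bound becomes an equality. If the estimators are unbiased and efficient ($\varepsilon = 1$), the covariance matrix of the estimators is given by the following.

$$ V_{ij} = \left(E\left[-\frac{\partial^2 \ln L}{\partial \theta_i \, \partial \theta_j}\right]\right)^{-1} $$

It can be shown that in the large sample limit, Maximum Likelihood Estimators (MLE) are asymptotically unbiased and efficient.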
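The equivalence of the two expressions for the Fisher information can be checked numerically. The following is a minimal sketch, not part of the original derivation. It assumes a single Gaussian measurement and takes the width $\sigma$ as the parameter of interest, for which both expectation values should equal $2/\sigma^2$. The function name and the chosen values of `mu`, `sigma`, and `n_samples` are illustrative assumptions.

```python
import random


def fisher_information_two_ways(mu=1.0, sigma=2.0, n_samples=200_000, seed=0):
    """Monte Carlo estimates of both forms of the Fisher information for sigma.

    Assumes one Gaussian measurement x ~ N(mu, sigma^2), so that
    ln L = -ln(sigma) - 0.5*ln(2*pi) - (x - mu)^2 / (2*sigma^2).
    Returns (E[(d lnL/d sigma)^2], E[-d^2 lnL/d sigma^2]).
    """
    rng = random.Random(seed)
    sum_score_sq = 0.0
    sum_neg_curvature = 0.0
    for _ in range(n_samples):
        x = rng.gauss(mu, sigma)
        r2 = (x - mu) ** 2
        # First derivative of ln L with respect to sigma.
        score = -1.0 / sigma + r2 / sigma**3
        # Second derivative of ln L with respect to sigma.
        curvature = 1.0 / sigma**2 - 3.0 * r2 / sigma**4
        sum_score_sq += score * score
        sum_neg_curvature += -curvature
    return sum_score_sq / n_samples, sum_neg_curvature / n_samples


score_form, curvature_form = fisher_information_two_ways()
exact = 2.0 / 2.0**2  # analytic value 2 / sigma^2 for sigma = 2
print(score_form, curvature_form, exact)
```

The two sample averages agree with each other and with the analytic value only up to Monte Carlo fluctuations, which shrink as `n_samples` grows.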
Since integrating the second derivatives of the log-likelihood function over all possible observables is often prohibitive in practice, one often estimates this expectation value by the numerically determined second derivatives evaluated at the point of maximum likelihood.

$$ V_{ij} \approx \left(-\left.\frac{\partial^2 \ln L}{\partial \theta_i \, \partial \theta_j}\right|_{\theta = \hat{\theta}}\right)^{-1} $$

Therefore, the variance of the estimators can be estimated by

$$ \sigma^2_{\hat{\theta}_i} \approx \left(-\left.\frac{\partial^2 \ln L}{\partial \theta_i^2}\right|_{\theta_i = \hat{\theta}_i}\right)^{-1} \tag{2} $$

2 Example

The purpose of equation 2 is to estimate the variance of an estimator when an analytic calculation is not practical. In this example, however, we will study a case where an analytic calculation of the variance is trivial, so that the validity of equation 2 becomes apparent.

Consider an experiment with $N$ repeated measurements that are Gaussian distributed. The likelihood function is therefore

$$ L = \prod_{i=1}^{N} \frac{1}{\sigma \sqrt{2\pi}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right) $$

The MLE for the mean, $\hat{\mu}$, can be found by maximizing the likelihood function, or equivalently, its natural logarithm.

$$ \ln L = -N \ln(\sigma \sqrt{2\pi}) - \sum_{i=1}^{N} \frac{(x_i - \mu)^2}{2\sigma^2} $$

$$ 0 = \frac{\partial \ln L}{\partial \mu} = \sum_{i=1}^{N} \frac{(x_i - \mu)}{\sigma^2} \quad \Rightarrow \quad \hat{\mu} = \frac{1}{N} \sum_{i=1}^{N} x_i $$

Therefore the MLE of $\mu$ is just the mean of the sample, as one might expect. Calculating the second derivative of the log-likelihood gives

$$ \frac{\partial^2 \ln L}{\partial \mu^2} = -\frac{N}{\sigma^2} $$

Therefore, equation 2 gives the following for the variance of this estimator

$$ \sigma^2_{\hat{\mu}} = \frac{\sigma^2}{N} \tag{3} $$

which can easily be shown to be the variance of the sample mean of any distribution with mean $\mu$ and variance $\sigma^2$, as follows. Let

$$ \bar{x} \equiv \frac{1}{N} \sum_{i=1}^{N} x_i $$

$$ E[\bar{x}] = E\left[\frac{1}{N} \sum_{i=1}^{N} x_i\right] = \frac{1}{N} \sum_{i=1}^{N} E[x_i] = \frac{1}{N} \sum_{i=1}^{N} \mu = \mu $$

Therefore, $\bar{x}$ is an unbiased estimator of the mean. Now we calculate the variance.
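$$
\begin{aligned}
V[\bar{x}] &= E\left[\bar{x}^2\right] - \left(E[\bar{x}]\right)^2 \\
&= E\left[\left(\frac{1}{N} \sum_{i=1}^{N} x_i\right)\left(\frac{1}{N} \sum_{j=1}^{N} x_j\right)\right] - \mu^2 \\
&= \frac{1}{N^2}\left(\sum_{i=1}^{N} E\left[x_i^2\right] + \sum_{i \neq j} E[x_i \, x_j]\right) - \mu^2
\end{aligned}
$$

For the first sum, note that for a single measurement $x$

$$ V[x] = \sigma^2 = E\left[x^2\right] - \left(E[x]\right)^2 = E\left[x^2\right] - \mu^2 \quad \Rightarrow \quad E\left[x^2\right] = \mu^2 + \sigma^2 $$

For the second sum, recall that the individual measurements $x_i$ are made independently, therefore for $i \neq j$

$$ E[x_i \, x_j] = E[x_i] \, E[x_j] = \mu^2 $$

The first sum has $N$ terms and the second has $N^2 - N$ terms. Therefore

$$ V[\bar{x}] = \frac{1}{N^2}\left(N(\mu^2 + \sigma^2) + (N^2 - N)\mu^2\right) - \mu^2 = \frac{\sigma^2}{N} $$

in agreement with the variance we calculated using the Cramér-Rao bound (equation 3), implying that $\hat{\mu}$ is indeed an efficient estimator.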
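The numerical procedure described by equation 2 can be sketched for this Gaussian example as follows. This is an illustrative sketch under stated assumptions, not the author's code: the function names, the finite-difference step size, the seed, and the simulated sample (1000 draws with true mean 5 and known $\sigma = 2$) are all hypothetical choices. The second derivative of $\ln L$ is estimated with a central finite difference at the MLE and compared with $\sigma^2/N$.

```python
import math
import random


def log_likelihood(mu, data, sigma):
    """ln L for N independent Gaussian measurements with known sigma."""
    n = len(data)
    chi2 = sum((x - mu) ** 2 for x in data) / sigma**2
    return -n * math.log(sigma * math.sqrt(2.0 * math.pi)) - 0.5 * chi2


def variance_from_curvature(data, sigma, step=1e-2):
    """Estimate the variance of the MLE of mu from the curvature of ln L.

    The MLE is the sample mean; the second derivative at the MLE is
    approximated by a central finite difference with spacing `step`.
    """
    mu_hat = sum(data) / len(data)
    d2 = (
        log_likelihood(mu_hat + step, data, sigma)
        - 2.0 * log_likelihood(mu_hat, data, sigma)
        + log_likelihood(mu_hat - step, data, sigma)
    ) / step**2
    return mu_hat, -1.0 / d2


rng = random.Random(1)
sigma = 2.0
data = [rng.gauss(5.0, sigma) for _ in range(1000)]
mu_hat, var_est = variance_from_curvature(data, sigma)
print(mu_hat, var_est, sigma**2 / len(data))
```

Because $\ln L$ is exactly quadratic in $\mu$ here, the finite-difference estimate reproduces $\sigma^2/N$ up to floating-point rounding. For a likelihood that is not quadratic, the step size would need more care.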