Parameter Estimation#
Learning Objectives
Identify beneficial properties of the Maximum Likelihood Estimator.
Calculate the likelihood for a model with a set of parameters given the observed data.
Construct a procedure to select parameters to increase the likelihood of a model.
What is a Parameter Estimator?#
A parameter estimator is a function of the sample approximating a parameter of the distribution.
Example:
sample mean is an estimator for the mean of the normal distribution
sample variance is an estimator for the variance of the normal distribution
Method of Moments Estimator#
In section Parametric Distributions we used formulas based on sample statistics such as sample mean, variance, and skewness (centered moments) to obtain the model parameters. Those formulas were obtained by matching the sample moments to the theoretical moments of the distribution, and the procedure is called method of moments.
Maximum Likelihood Estimator#
Given an observed sample \(x\) and a probability model with a density function \(p_{\theta}(x)\), the likelihood function is defined as a function of the parameter:
When the sample consists of \(N\) independent observations \(x_1,...,x_N\), the likelihood is:
When the random variable is discrete, the likelihood of a parameter coincides of with the probability of observing the sample under that distribution:
We want to find the parameter \(\theta\) which maximizes this probability!
This approach provides a good estimator even if the variable is continuous!
Evaluating Estimators#
MSE - Mean Squared Error
Properties of an estimator:
unbiased: the expected value of the estimator is equal to the true parameter
consistent: as sample size grows, the estimator converges to the true parameter
efficient: the estimator has minimum variance
Note
MLE is Minimum Variance Unbiased Estimator as sample size grows.
MLE Widget#
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
# loading steps
!wget --no-check-certificate 'https://docs.google.com/uc?export=download&confirm=t&id=1466snzjsXPVTlKnzkkCkdOgwoO5Zvutq' -O background.wav
from scipy.io import wavfile
# reading background data
samplerate, sound = wavfile.read('background.wav')
import numpy as np
# first we split small intervals of 0.1s
sound_split = np.split(sound[:(len(sound)-len(sound)%samplerate)], len(sound[:(len(sound)-len(sound)%samplerate)])/samplerate*10)
# we calculate RMS for each interval
RMS_split = [np.sqrt(np.mean(np.square(group.astype('float')))) for group in sound_split]
# define the r.v. as X
X = 20*np.log10(RMS_split)
# sample mean and variance
mean = np.mean(X)
std = np.std(X)
print(f"Mean: {mean:.2f}")
print(f"Std: {std:.2f}")
Mean: 41.92
Std: 1.82
# gaussian likelihood
gaussian_likelihood = stats.norm.pdf(X, loc=mean, scale=std)
L = np.prod(gaussian_likelihood)
print(L)
0.0
Caution
The likelihood evaluation involves a product of many numbers which often leads to numerical errors: either a rather big number or zero. Instead, we can maximize the logarithm of the likelihood, since log is a monotone function. The logarithm of the product is the sum of the logarithms.
# gaussian log-likelihood
logL = np.sum(np.log(gaussian_likelihood))
print(logL)
-1191.6218498671105
Note
Note that even if one sample point is outside of the range of the model’s probability density function, the evaluation at that point will be zero, and due to the independence assumption the whole likelihood will be zero. That in a way makes sense: if we observe a point which has probability zero for a specific model, then it means that this model has not produced this observation. In practice, there will be always outliers we may not want to fit to.
skewnorm_likelihood = stats.skewnorm.pdf(X, a=2, scale=std, loc=mean)
logL = np.sum(np.log(skewnorm_likelihood))
print(logL)
-1718.7251573821795
Exercise
Plug-in the numbers from the fitting widget and compare the log-likelihood of the two distributions.
out = interact(plot_skewnorm_density_L, a = shape_slider, scale = scale_slider, loc = loc_slider)
Exercise
Select a combination of parameters which “maximizes” the log-likelihood.
Note
In practice, usually the maximum likelihood estimate is found numerically through an optimization procedure. In rare cases, such as for the parameters of the Gaussian distribution, it can be solved analytically and corresponds to the sample mean and standard deviation. Try to derive it yourself!