Parameter Estimation#

Learning Objectives

  • Identify beneficial properties of the Maximum Likelihood Estimator.

  • Calculate the likelihood for a model with a set of parameters given the observed data.

  • Construct a procedure to select parameters to increase the likelihood of a model.

What is a Parameter Estimator?#

A parameter estimator is a function of the sample approximating a parameter of the distribution.

Example:

  • sample mean is an estimator for the mean of the normal distribution

  • sample variance is an estimator for the variance of the normal distribution

Method of Moments Estimator#

In section Parametric Distributions we used formulas based on sample statistics such as sample mean, variance, and skewness (centered moments) to obtain the model parameters. Those formulas were obtained by matching the sample moments to the theoretical moments of the distribution, and the procedure is called method of moments.

Maximum Likelihood Estimator#

Given an observed sample \(x\) and a probability model with a density function \(p_{\theta}(x)\), the likelihood function is defined as a function of the parameter:

\[\mathcal{L}(\theta| x) = p_{\theta}(x)\]

When the sample consists of \(N\) independent observations \(x_1,...,x_N\), the likelihood is:

\[\mathcal{L}(\theta| x) = \prod_{i=1}^Np_{\theta}(x_i)\]

When the random variable is discrete, the likelihood of a parameter coincides of with the probability of observing the sample under that distribution:

\[P_{\theta}(X_1 = x_1, X_2 = x_2,..., X_N=x_N) = \prod_{i=1}^N p_{\theta}(x_i) = \mathcal{L}(\theta| x)\]

We want to find the parameter \(\theta\) which maximizes this probability!

\[\hat{\theta}_{MLE} = \underset{\theta}{\operatorname*{arg max}} {\textrm{ } \mathcal{L}(\theta | x)}\]

This approach provides a good estimator even if the variable is continuous!

Evaluating Estimators#

MSE - Mean Squared Error

\[ MSE(\hat \theta) = \mathbb{E}_{\theta}|\hat \theta - \theta|^2\]
\[MSE(\hat \theta) = bias(\hat\theta)^2 + Var(\hat\theta)\]

Properties of an estimator:

  • unbiased: the expected value of the estimator is equal to the true parameter

  • consistent: as sample size grows, the estimator converges to the true parameter

  • efficient: the estimator has minimum variance

Note

MLE is Minimum Variance Unbiased Estimator as sample size grows.

MLE Widget#

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
# loading steps

!wget --no-check-certificate 'https://docs.google.com/uc?export=download&confirm=t&id=1466snzjsXPVTlKnzkkCkdOgwoO5Zvutq' -O background.wav

from scipy.io import wavfile

# reading background data
samplerate, sound = wavfile.read('background.wav')

import numpy as np

# first we split small intervals of 0.1s
sound_split = np.split(sound[:(len(sound)-len(sound)%samplerate)], len(sound[:(len(sound)-len(sound)%samplerate)])/samplerate*10)

# we calculate RMS for each interval
RMS_split = [np.sqrt(np.mean(np.square(group.astype('float')))) for group in sound_split]

# define the r.v. as X
X = 20*np.log10(RMS_split)

Hide code cell output

--2025-05-16 12:29:38--  https://docs.google.com/uc?export=download&confirm=t&id=1466snzjsXPVTlKnzkkCkdOgwoO5Zvutq
Resolving docs.google.com (docs.google.com)... 142.250.217.78, 2607:f8b0:400a:804::200e
Connecting to docs.google.com (docs.google.com)|142.250.217.78|:443... connected.
HTTP request sent, awaiting response... 303 See Other
Location: https://drive.usercontent.google.com/download?id=1466snzjsXPVTlKnzkkCkdOgwoO5Zvutq&export=download [following]
--2025-05-16 12:29:38--  https://drive.usercontent.google.com/download?id=1466snzjsXPVTlKnzkkCkdOgwoO5Zvutq&export=download
Resolving drive.usercontent.google.com (drive.usercontent.google.com)... 142.251.33.97, 2607:f8b0:400a:80b::2001
Connecting to drive.usercontent.google.com (drive.usercontent.google.com)|142.251.33.97|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5759054 (5.5M) [audio/wav]
Saving to: ‘background.wav’

background.wav      100%[===================>]   5.49M  --.-KB/s    in 0.08s   

2025-05-16 12:29:41 (64.7 MB/s) - ‘background.wav’ saved [5759054/5759054]
# sample mean and variance
mean = np.mean(X)
std = np.std(X)
print(f"Mean: {mean:.2f}")
print(f"Std: {std:.2f}")
Mean: 41.92
Std: 1.82
# gaussian likelihood
gaussian_likelihood = stats.norm.pdf(X, loc=mean, scale=std)
L = np.prod(gaussian_likelihood)
print(L)
0.0

Caution

The likelihood evaluation involves a product of many numbers which often leads to numerical errors: either a rather big number or zero. Instead, we can maximize the logarithm of the likelihood, since log is a monotone function. The logarithm of the product is the sum of the logarithms.

# gaussian log-likelihood

logL = np.sum(np.log(gaussian_likelihood))
print(logL)
-1191.6218498671105

Note

Note that even if one sample point is outside of the range of the model’s probability density function, the evaluation at that point will be zero, and due to the independence assumption the whole likelihood will be zero. That in a way makes sense: if we observe a point which has probability zero for a specific model, then it means that this model has not produced this observation. In practice, there will be always outliers we may not want to fit to.

skewnorm_likelihood = stats.skewnorm.pdf(X, a=2, scale=std, loc=mean)
logL = np.sum(np.log(skewnorm_likelihood))
print(logL)
-1718.7251573821795

Exercise

Plug-in the numbers from the fitting widget and compare the log-likelihood of the two distributions.

Hide code cell source

def plot_skewnorm_density_L(a, scale, loc):
  h = plt.hist(X,bins=100, density=True, alpha=0.5)

  # evaluate the function at the histogram bins
  skewnorm_density = stats.skewnorm.pdf(h[1], a=a, scale=scale, loc=loc)

  # evaluation the function at the observations
  skewnorm_likelihood = stats.skewnorm.pdf(X, a=a, scale=scale, loc=loc)
  L = np.sum(np.log(skewnorm_likelihood))

  plt.plot(h[1], skewnorm_density)
  plt.title(f"Log-Likelihood {L:.10f}")

Hide code cell source

from ipywidgets import interact
import ipywidgets as widgets

Hide code cell source

shape_slider = widgets.FloatSlider(
    value=0,
    min=0.5,
    max=10.0,
    step=0.5,
    description='Shape:',
    #disabled=False,
    continuous_update=True,
    orientation='horizontal',
    readout=True,
    readout_format='.1f',
)
scale_slider = widgets.FloatSlider(
    value=std,
    min=std-1,
    max=std+1,
    step=0.01,
    description='Scale:',
    #disabled=False,
    continuous_update=True,
    orientation='horizontal',
    readout=True,
    readout_format='.1f',
)
loc_slider = widgets.FloatSlider(
    value=mean,
    min=mean-2,
    max=mean+2,
    step=0.01,
    description='Offset:',
    # disabled=False,
    continuous_update=True,
    orientation='horizontal',
    readout=True,
    readout_format='.1f',
)
out = interact(plot_skewnorm_density_L, a = shape_slider, scale = scale_slider, loc = loc_slider)

Exercise

Select a combination of parameters which “maximizes” the log-likelihood.

Note

In practice, usually the maximum likelihood estimate is found numerically through an optimization procedure. In rare cases, such as for the parameters of the Gaussian distribution, it can be solved analytically and corresponds to the sample mean and standard deviation. Try to derive it yourself!