
# Poisson Model, Hurdle Model, Likelihood In Machine Learning

### Data

The spreadsheet CLGoals.xlsx contains the number of goals scored in each UEFA Champions League game to date this season (three match weeks of sixteen games, 48 games in total). The data are count data that take the values 0, 1, 2, ....

### Modeling

Poisson Model

The Poisson distribution is probably the most standard model for count data.

The Poisson model, with parameter λ, assumes that P{X = x} = λ^x exp(−λ)/x! for x = 0, 1, 2, .... Thus, P{X = 0} = exp(−λ), P{X = 1} = λ exp(−λ), P{X = 2} = λ^2 exp(−λ)/2, ...

The expected value (mean) of the Poisson distribution is λ and the variance is also λ (thus, the standard deviation is √λ).
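As a quick check of the formula and the moments above, the following R sketch compares the hand-written Poisson probabilities with R's built-in `dpois()` and recovers the mean and variance numerically. The rate `lambda = 2.5` and the truncation of the support at 100 are arbitrary choices made for illustration, not values taken from the data.

```r
# Illustrative rate only (an assumption, not estimated from CLGoals.xlsx)
lambda <- 2.5
x <- 0:100  # truncated support; the probability mass beyond 100 is negligible here

# Probabilities written out from the Poisson formula
pmf_formula <- lambda^x * exp(-lambda) / factorial(x)

# The same probabilities from R's built-in Poisson density
pmf_builtin <- dpois(x, lambda)

# Should report TRUE: the two agree up to floating-point error
all.equal(pmf_formula, pmf_builtin)

# Mean and variance computed from the probabilities; both should be close to lambda
pois_mean <- sum(x * pmf_builtin)
pois_var  <- sum((x - pois_mean)^2 * pmf_builtin)
c(mean = pois_mean, variance = pois_var, sd = sqrt(pois_var))
```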

Hurdle Model

The Hurdle model, with parameters θ and λ, assumes that P{X = 0} = θ and P{X = x} = (1 − θ)λ^x exp(−λ)/(x!(1 − exp(−λ))) for x = 1, 2, .... Thus, P{X = 0} = θ, P{X = 1} = (1 − θ)λ exp(−λ)/(1 − exp(−λ)), ...

If θ = e^−λ, then the Hurdle model is the same as the Poisson model. If θ < e^−λ, then zeros are less likely than under a Poisson model. If θ > e^−λ, then zeros are more likely than under a Poisson model.
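The reduction to the Poisson model can be worked through directly. The derivation below assumes the general Hurdle term for positive counts is the zero-truncated Poisson probability scaled by 1 − θ, which is the form consistent with the P{X = 1} expression given above.

```latex
% Hurdle probabilities for positive counts (assumed general form)
P\{X = x\} = (1-\theta)\,\frac{\lambda^{x} e^{-\lambda}}{x!\,\bigl(1 - e^{-\lambda}\bigr)},
\qquad x = 1, 2, \ldots

% The positive counts carry total probability 1 - theta, because
\sum_{x \ge 1} \frac{\lambda^{x} e^{-\lambda}}{x!} = 1 - e^{-\lambda}
\quad\Longrightarrow\quad
\sum_{x \ge 1} P\{X = x\} = 1 - \theta .

% Substituting theta = e^{-lambda} cancels the truncation factor
\theta = e^{-\lambda}
\quad\Longrightarrow\quad
P\{X = 0\} = e^{-\lambda},
\qquad
P\{X = x\} = \frac{\lambda^{x} e^{-\lambda}}{x!},
\qquad x = 1, 2, \ldots
```

So the extra parameter θ only changes how much probability sits at zero; the relative probabilities of the positive counts are the same as under a Poisson model with the same λ.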

Likelihood

The likelihood function is defined to be the probability of the observed data for a given parameter value. If we have independent observations x1, x2, . . . , xn, then the likelihood is

L = P{X = x1} × P{X = x2} × ... × P{X = xn}.

The log-likelihood is the (natural) logarithm of the likelihood, thus it takes the form

log L = log P{X = x1} + log P{X = x2} + ... + log P{X = xn}.

Task 1

1. Read the data into R.

Hint: The read.xlsx() function in the openxlsx R package is useful for doing this.

2. Produce a table that tabulates frequency of each number of goals.

3. Produce a plot of the frequency of each number of goals.

4. Calculate the mean and the standard deviation of the number of goals.
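The four steps above could be sketched in R roughly as follows. This is a minimal sketch, not a definitive solution: the helper name `summarise_goals()` is hypothetical, and the code assumes the goal counts sit in the first column of the first sheet of CLGoals.xlsx, which the source does not confirm.

```r
# Hypothetical helper: frequency table, mean and standard deviation of the counts
summarise_goals <- function(goals) {
  list(
    freq = table(goals),  # frequency of each observed number of goals
    mean = mean(goals),
    sd   = sd(goals)
  )
}

# Read the spreadsheet only if the file and the openxlsx package are available.
# ASSUMPTION: the counts are in the first column of the first sheet.
if (file.exists("CLGoals.xlsx") && requireNamespace("openxlsx", quietly = TRUE)) {
  dat   <- openxlsx::read.xlsx("CLGoals.xlsx")
  goals <- dat[[1]]

  s <- summarise_goals(goals)
  print(s$freq)
  barplot(s$freq, xlab = "Goals per game", ylab = "Number of games")
  cat("mean =", s$mean, " sd =", s$sd, "\n")
}
```

Note that `table()` lists only the values that actually occur, so a goal count that never appears in the data is omitted from the table and the plot rather than shown with frequency zero.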

Task 2

1. Write a function that calculates the log-likelihood function (for a specified value of λ) for the Poisson model for the UEFA Champions League data.

2. Plot the log-likelihood function for a range of values of λ.

Hint: Make sure that λ = x̄ (the sample mean of the number of goals) is in the range.

3. Add a vertical line to the plot at the value x̄ and visually verify that this maximizes the log-likelihood function.

4. Simulate 48 values from a Poisson model with λ = x̄ and summarize the resulting values (contrasting them with the summaries produced in Task 1).

5. Simulate 48 values from a Poisson model for other values of λ and summarize the resulting values.
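One way to approach these steps is sketched below. The function name `pois_loglik()` is hypothetical, and the `goals` vector is made-up placeholder data included only so the sketch runs on its own; it should be replaced by the counts read from CLGoals.xlsx. The grid limits and the random seed are likewise arbitrary.

```r
# Poisson log-likelihood for a single value of lambda, given the counts x
pois_loglik <- function(lambda, x) {
  sum(dpois(x, lambda, log = TRUE))
}

# PLACEHOLDER data (made up for illustration; not the Champions League data)
goals <- c(0, 1, 2, 2, 3, 3, 4, 5)
xbar  <- mean(goals)

# Evaluate the log-likelihood on a grid of lambda values that contains xbar
lambda_grid <- seq(0.5 * xbar, 1.5 * xbar, length.out = 201)
ll <- vapply(lambda_grid, function(l) pois_loglik(l, goals), numeric(1))

plot(lambda_grid, ll, type = "l",
     xlab = expression(lambda), ylab = "Log-likelihood")
abline(v = xbar, lty = 2)  # vertical line at the sample mean

# Simulate 48 values (the number of games in the source) with lambda = xbar
set.seed(1)  # arbitrary seed, for reproducibility only
sim <- rpois(48, lambda = xbar)
c(mean = mean(sim), sd = sd(sim))
table(sim)
```

The curve should peak at the dashed line: setting the derivative of the Poisson log-likelihood with respect to λ to zero gives λ = x̄. Repeating the simulation with other values of λ only requires changing the `lambda` argument of `rpois()`.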

Task 3

1. Create a dHurdle() function that has arguments x, param that computes P{X = x} for the Hurdle model, where the first element of the vector param is θ and the second element of the vector param is λ. Ensure that the function can handle x being a vector of values.

2. Write a function that calculates the log-likelihood function (for a specified value of param) for the Hurdle model for the UEFA Champions League goal data.

3. Use the optim function to find the values of θ and λ that maximize the log-likelihood.

Hint: optim minimizes functions by default, so you may want to write a function that computes minus the log-likelihood and minimize that. Alternatively, you can set control=list(fnscale=-1) as an argument in optim to make it maximize.

4. Comment on the value of θ found and compare the log-likelihood values found for the Poisson and Hurdle models.
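The steps above can be sketched as follows. This is one possible approach under stated assumptions rather than the intended solution: `hurdle_loglik()` is a hypothetical name, the positive counts are assumed to follow the zero-truncated Poisson form scaled by 1 − θ, the `goals` vector is made-up placeholder data, and the choice of the bounded `"L-BFGS-B"` method (to keep θ inside (0, 1) and λ positive) is a design choice not stated in the source.

```r
# P{X = x} under the Hurdle model; param = c(theta, lambda). Vectorised over x.
dHurdle <- function(x, param) {
  theta  <- param[1]
  lambda <- param[2]
  # Zero-truncated Poisson probability for positive counts, scaled by (1 - theta)
  pos <- (1 - theta) * dpois(x, lambda) / (1 - exp(-lambda))
  ifelse(x == 0, theta, pos)
}

# Hurdle log-likelihood for a given param = c(theta, lambda) and counts x
hurdle_loglik <- function(param, x) {
  sum(log(dHurdle(x, param)))
}

# PLACEHOLDER data (made up for illustration; not the Champions League data)
goals <- c(0, 0, 0, 1, 2, 2, 3, 3, 4, 5)

# Maximise the log-likelihood; fnscale = -1 turns optim's minimiser into a maximiser
fit <- optim(
  par     = c(0.5, 1),            # arbitrary starting values for (theta, lambda)
  fn      = hurdle_loglik,
  x       = goals,                # passed through to hurdle_loglik()
  method  = "L-BFGS-B",
  lower   = c(1e-6, 1e-6),
  upper   = c(1 - 1e-6, Inf),
  control = list(fnscale = -1)
)

fit$par    # estimated (theta, lambda)
fit$value  # maximised Hurdle log-likelihood

# Maximised Poisson log-likelihood (attained at lambda = sample mean) for comparison
pois_ll <- sum(dpois(goals, mean(goals), log = TRUE))
c(hurdle = fit$value, poisson = pois_ll)
```

Because the Poisson model is the special case θ = e^−λ of the Hurdle model, the maximised Hurdle log-likelihood can never be smaller than the Poisson one; the size of the gap, together with how far the fitted θ is from exp(−λ), indicates whether the extra parameter is doing useful work.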
