Perform a multivariate OLS regression with Python. Be sure to do the following:
For each categorical variable, create dummy variables that are suitable for use in an OLS regression. Note that you will need to avoid perfect multicollinearity (the dummy variable trap), for example by omitting one reference level per categorical variable.
Given the available regressors, explain whether or not you would keep all of the categorical variables you identified above. Why or why not?
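As a minimal sketch of the dummy-variable step, the example below encodes two categorical columns with pandas. The data frame and its column names (salary, department, gender) are hypothetical placeholders, not the assignment's dataset.

```python
import pandas as pd

# Hypothetical data: column names and values are illustrative only.
df = pd.DataFrame({
    "salary": [52000, 61000, 58000, 75000, 49000, 68000],
    "department": ["sales", "it", "sales", "hr", "it", "hr"],
    "gender": ["f", "m", "m", "f", "f", "m"],
})

# drop_first=True omits one reference level per categorical variable.
# Keeping every level would make the dummies sum to the intercept column,
# which is perfect multicollinearity (the dummy variable trap).
dummies = pd.get_dummies(
    df, columns=["department", "gender"], drop_first=True, dtype=int
)

print(dummies.columns.tolist())
```

The omitted levels (here "hr" and "f") become the baseline, so each dummy coefficient is read as a difference from that baseline.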
Run Regressions and Evaluate Results
Using the two regressors that were added (time on the job, year of birth), run a multivariate regression. Be sure to run a model that includes all possible (or a large number of) regressors.
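One way to fit the full model is the statsmodels formula interface, sketched below on synthetic data. The column names (salary, years_on_job, birth_year, department) and the data-generating process are assumptions made for illustration; substitute the real dataset and regressors.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the real data (all names are hypothetical).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "years_on_job": rng.uniform(0, 30, n),
    "birth_year": rng.integers(1960, 2000, n),
    "department": rng.choice(["sales", "it", "hr"], n),
})
df["salary"] = 40000 + 1200 * df["years_on_job"] + rng.normal(0, 5000, n)

# C(...) tells the formula parser to treat the column as categorical;
# it creates the dummies and drops one reference level automatically.
full_model = smf.ols(
    "salary ~ years_on_job + birth_year + C(department)", data=df
).fit()

print(full_model.summary())
```

The summary table reports each coefficient with its standard error, t-statistic, and p-value, along with R-squared, adjusted R-squared, AIC, and BIC.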
Based on your regressions, determine which regressors are significant. Why?
Indicate which regressors are worth keeping and explain why.
Rerun the regression with the reduced complexity model.
Inspect the residuals and determine whether or not they are normally distributed. Be sure to provide a rationale for your determination.
Compare the results of your regression to the results of the univariate regression you previously performed. If there is an improvement in the key metrics, is the increased complexity justified? Which metric would you use to justify this?
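To compare models of different complexity, metrics that penalize extra parameters (adjusted R-squared, AIC, BIC) are generally more informative than plain R-squared, which can never decrease when a regressor is added. The sketch below tabulates these for a univariate and a multivariate fit. The data are synthetic and the choice of years_on_job as the univariate regressor is an assumption.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data in which both regressors affect the response.
rng = np.random.default_rng(3)
n = 300
df = pd.DataFrame({
    "years_on_job": rng.uniform(0, 30, n),
    "birth_year": rng.integers(1960, 2000, n),
})
df["salary"] = (
    40000
    + 1200 * df["years_on_job"]
    - 300 * (df["birth_year"] - 1980)
    + rng.normal(0, 5000, n)
)

uni = smf.ols("salary ~ years_on_job", data=df).fit()
multi = smf.ols("salary ~ years_on_job + birth_year", data=df).fit()

# Side-by-side comparison of fit metrics.
comparison = pd.DataFrame(
    {
        "r2": [uni.rsquared, multi.rsquared],
        "adj_r2": [uni.rsquared_adj, multi.rsquared_adj],
        "aic": [uni.aic, multi.aic],
        "bic": [uni.bic, multi.bic],
    },
    index=["univariate", "multivariate"],
)
print(comparison)
```

A higher adjusted R-squared or a lower AIC/BIC for the larger model suggests the added regressors earn their extra complexity.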
If you had unlimited programming skills, determine the additional regressors that you would create from the available data. Why?
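As one illustration of derived regressors, the sketch below builds age, age at hire, and a squared tenure term from year of birth and time on the job. The column names, the sample values, and the reference year of 2024 are all assumptions; the actual dataset may call for different features.

```python
import pandas as pd

# Hypothetical rows; column names are placeholders.
df = pd.DataFrame({
    "birth_year": [1970, 1985, 1992],
    "years_on_job": [20.0, 8.0, 2.5],
})

REFERENCE_YEAR = 2024  # assumption: the year the data were collected

# Age is a linear function of birth_year, so include one or the other
# in a model, never both (they are perfectly collinear).
df["age"] = REFERENCE_YEAR - df["birth_year"]

# Age at hire combines both original regressors into new information.
df["age_at_hire"] = df["age"] - df["years_on_job"]

# A squared term lets the model capture diminishing returns to tenure.
df["years_on_job_sq"] = df["years_on_job"] ** 2

print(df)
```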
Describe how your interpretations of the regression can help inform business decisions.
What to Submit:
A Jupyter notebook in HTML format with annotations that explain your work