
Machine Learning With R | Sample Assignment | Assignment Help

Updated: Mar 26, 2021



Answer each of the following five questions. All of the questions are based on one dataset, which is an extract from the 2018 European Social Survey.


The dataframe ess2018 will have the following variables:


Variable: satisfaction_life
Level: 0-10
Description: How satisfied are you with your life?

Variable: satisfaction_economy
Level: 0-10
Description: How satisfied are you with the present state of the economy?

Variable: satisfaction_government
Level: 0-10
Description: Thinking about the [country] government, how satisfied are you with the way it is doing its job?

Variable: satisfaction_democracy
Level: 0-10
Description: How satisfied are you with the way democracy works in [country]?

Variable: satisfaction_education
Level: 0-10
Description: Please say what you think overall about the state of education in [country]

Variable: satisfaction_health_services
Level: 0-10
Description: Please say what you think overall about the state of health services in [country] nowadays?


Variable: immigration_same_ethnicity
Level: 1-4
Description: To what extent do you think [country] should allow people of the same race or ethnic group as most [country]’s people to come and live here?

Variable: immigration_diff_ethnicity
Level: 1-4
Description: To what extent do you think [country] should allow people of a different race or ethnic group from most [country]’s people to come and live here?

Variable: immigration_world_poor
Level: 1-4
Description: To what extent do you think [country] should allow people from the poorer countries outside Europe to come and live here?


Variable: country
Level: Nominal
Description: Country name

Variable: gender
Level: Nominal
Description: Male or Female

Variable: age
Level: 15-90
Description: Age in years

Variable: degree
Level: Nominal
Description: TRUE = holds university degree

Variable: weight
Level: Interval
Description: ESS survey weight


The 0-10 scales for the first four satisfaction variables run from 0 (“Very dissatisfied”) to 10 (“Very satisfied”), and the scales for the last two run from 0 (“Very bad”) to 10 (“Very good”). No labels were provided to survey respondents for the intermediate numerical values, only for 0 and 10. The immigration questions use a 1-4 scale: (1) “Allow many to come and live here”, (2) “Allow some”, (3) “Allow a few”, (4) “Allow none”.


You may find it useful to extract the satisfaction_ questions into a separate data frame using this command:

satisfaction_questions <- ess2018[,grep("satisfaction",names(ess2018))]
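As a minimal sketch of what this command does, the example below applies the same pattern to a small made-up data frame. The name toy_ess and its values are hypothetical stand-ins for ess2018, not real survey data. grep() returns the positions of the column names that contain "satisfaction", and those positions are then used to index the columns.

```r
# Toy stand-in for ess2018 (hypothetical values, not real ESS data)
toy_ess <- data.frame(
  satisfaction_life    = c(7, 3, 9),
  satisfaction_economy = c(5, 2, 6),
  age                  = c(34, 51, 27)
)

# grep() returns the indices of the column names that match the pattern
idx <- grep("satisfaction", names(toy_ess))
print(idx)

# Index the columns by position, as in the command above
toy_satisfaction <- toy_ess[, idx]
print(names(toy_satisfaction))
```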


Many of the questions below have more than one right answer. If I ask you to identify or describe “one” or “two” of something, that does not mean that there are only that many good answers to the question.


Question 1

Consider the following four measures, constructed for each respondent i from their responses to the three immigration_ questions:


A. the mean of the individual’s responses to immigration_same_ethnicity, immigration_diff_ethnicity, and immigration_world_poor

B. the difference between the individual’s responses: immigration_same_ethnicity - immigration_diff_ethnicity

C. the difference between the individual’s responses: immigration_diff_ethnicity - immigration_world_poor

D. the minimum value of the individual’s responses to immigration_same_ethnicity, immigration_diff_ethnicity, and immigration_world_poor


Note that for the last of these, minimum means the numerical minimum value; see the statement above for how the numerical levels relate to the survey responses.

State the following for each of the four measures:

  • What is the range of the measure?

  • What are the units of the measure?

  • What assumptions are made in the construction of the measure?

  • What concept does the measure come closest to measuring?
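To make the four constructions concrete, here is a minimal sketch that computes measures A to D on a few made-up respondents. The data frame toy and its values are hypothetical, and the responses are assumed to be coded 1-4 as in the scale description above. The sketch only shows the arithmetic; it does not answer the questions about range, units, assumptions, or concept.

```r
# Toy responses on the 1-4 scale (hypothetical values, not real ESS data)
toy <- data.frame(
  immigration_same_ethnicity = c(1, 2, 4, 1),
  immigration_diff_ethnicity = c(2, 2, 4, 3),
  immigration_world_poor     = c(3, 2, 4, 4)
)

# A: mean of the three responses for each respondent
measure_A <- rowMeans(toy)

# B and C: differences between pairs of responses
measure_B <- toy$immigration_same_ethnicity - toy$immigration_diff_ethnicity
measure_C <- toy$immigration_diff_ethnicity - toy$immigration_world_poor

# D: numerical minimum across the three responses, taken row by row
measure_D <- pmin(toy$immigration_same_ethnicity,
                  toy$immigration_diff_ethnicity,
                  toy$immigration_world_poor)

print(data.frame(A = measure_A, B = measure_B, C = measure_C, D = measure_D))
```

Because 1 means “Allow many” and 4 means “Allow none”, the numerical minimum in D picks out each respondent's most permissive answer.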

Question 2

A: Construct a set of histograms for the satisfaction_ variables and calculate the means of all the variables as well. Comment on what we learn from the means of the different variables.

B: Construct an equal weight index of the six satisfaction_ variables.

C: If you had to put a label on the concept measured by this equal weight index, what would it be? Please provide a short justification.

D: Identify an alternative concept that you might have measured with a subset of these indicators. What is the concept, and which indicators would you use to measure that concept?
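One possible way to set up parts A and B is sketched below on simulated data. The data frame toy_sat is a hypothetical stand-in for satisfaction_questions, filled with random 0-10 responses, so the printed means carry no substantive meaning. Using na.rm = TRUE is an assumption of this sketch: it averages over whichever items a respondent answered, which is only one of several ways to handle missing responses.

```r
set.seed(1)  # reproducible toy data only
n <- 200
items <- c("life", "economy", "government", "democracy",
           "education", "health_services")

# Simulated 0-10 responses (hypothetical; swap in satisfaction_questions)
toy_sat <- as.data.frame(
  matrix(sample(0:10, n * length(items), replace = TRUE), nrow = n)
)
names(toy_sat) <- paste0("satisfaction_", items)

# Part A: one histogram per variable, with one bar per response value
par(mfrow = c(2, 3))
for (v in names(toy_sat)) {
  hist(toy_sat[[v]], breaks = seq(-0.5, 10.5, by = 1),
       main = v, xlab = "Response")
}

# Part A: mean of each variable
print(round(colMeans(toy_sat, na.rm = TRUE), 2))

# Part B: equal weight index = unweighted mean of the six items per respondent
toy_index <- rowMeans(toy_sat, na.rm = TRUE)
print(summary(toy_index))
```

Since all six items share the same 0-10 scale, a plain row mean gives each item equal weight without any rescaling.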


Question 3

A: Fit a linear regression (lm()) for the equal weight index with dummy variables for countries, and include a weights = ess2018$weight argument so that you are using the survey weights. Describe general patterns in which countries’ citizens have higher and lower values of the satisfaction index.


B: Now, extend the analysis in part A to separately analyse the equal weight index for survey respondents with and without degrees, in each country.

There are many ways you could do this, but whichever analysis you do, state clearly what you have done, and describe what we learn about the relationship between educational background and the index, and how it varies across European countries. You will likely find it helpful to present the results of your analysis in a table and/or figure.
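The sketch below shows one way, among many, to set up both parts on simulated data. Everything in toy is hypothetical: the three country codes, the random index values, and the random weights stand in for the real ess2018 columns. lm() takes the survey weights through its weights argument and expands a factor such as country into dummy variables automatically.

```r
set.seed(2)  # reproducible toy data only
n <- 600
toy <- data.frame(
  country = factor(sample(c("AT", "BE", "CH"), n, replace = TRUE)),  # hypothetical codes
  degree  = sample(c(TRUE, FALSE), n, replace = TRUE),
  weight  = runif(n, 0.5, 1.5),
  index   = runif(n, 0, 10)  # stand-in for the equal weight index
)

# Part A: country dummies with survey weights.
# The intercept is the weighted mean for the baseline country (first factor
# level); the other coefficients are differences from that baseline.
fit_country <- lm(index ~ country, data = toy, weights = weight)
print(round(coef(fit_country), 3))

# Part B (one option): drop the intercept and use only the interaction, so
# each coefficient is the weighted mean index in one country-by-degree cell.
fit_cells <- lm(index ~ 0 + country:degree, data = toy, weights = weight)
print(round(coef(fit_cells), 3))
```

An alternative for part B is index ~ country * degree, which reports the degree gap in the baseline country plus country-specific deviations from it. The cell-mean version is used here only because its coefficients can be read directly as a table.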


Question 4

A: Use cor() to examine the pairwise correlations between the satisfaction_ variables. Describe any major patterns that you see.


B: Use prcomp() to do principal components analysis on the satisfaction_ variables. Examine the coefficients and give an interpretation for the first principal component.


C: Examine the coefficients and give an interpretation for the second principal component.


D: Create the scree plot for this principal components analysis. What do we learn from this?


E: Explain how applying factor analysis to these data would be similar to, or different from, using PCA as you have done in the preceding items.

You do not need to run the factor analysis; describe the similarities and differences at a conceptual level.
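The mechanics of parts A to D can be sketched as follows on simulated data. The data frame toy_sat is hypothetical and is generated from a single shared component, so its first principal component loads on all items by construction; the real data may look different. Setting scale. = TRUE and dropping incomplete rows with na.omit() are choices made for this sketch, not requirements stated in the assignment.

```r
set.seed(3)  # reproducible toy data only
n <- 300
items <- c("life", "economy", "government", "democracy",
           "education", "health_services")

# Simulated responses driven by one shared component (an assumption of this
# toy example), rounded and clipped to the 0-10 scale
common <- rnorm(n)
toy_sat <- as.data.frame(sapply(items, function(item) {
  pmin(pmax(round(5 + 1.5 * common + rnorm(n)), 0), 10)
}))
names(toy_sat) <- paste0("satisfaction_", items)

# Part A: pairwise correlations
print(round(cor(toy_sat, use = "pairwise.complete.obs"), 2))

# Parts B and C: PCA on complete rows; the rotation matrix holds the
# coefficients (loadings) of each principal component
pca <- prcomp(na.omit(toy_sat), center = TRUE, scale. = TRUE)
print(round(pca$rotation[, 1:2], 2))

# Part D: scree plot and the share of variance explained by each component
screeplot(pca, type = "lines", main = "Scree plot (toy data)")
print(round(summary(pca)$importance["Proportion of Variance", ], 3))
```

The sign of a principal component is arbitrary, so a component with all-negative coefficients carries the same interpretation as one with all-positive coefficients.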


Question 5

Use the following commands to do a k-means clustering on the satisfaction_ variables where k = 4:

set.seed(42) # this will ensure that you get the same clusters as everyone else


kmeans_4 <- kmeans(satisfaction_questions,centers=4)


What do the four clusters correspond to? Do whatever analysis you need in order to establish what distinguishes the four clusters, and to explain how they relate to the underlying indicators and to the previous analysis that we did with principal components analysis. Write 2-3 paragraphs answering these questions, with supporting tables and figures as required.
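A possible starting point for this analysis is sketched below on simulated data. The data frame toy_sat is a hypothetical stand-in for satisfaction_questions, so the clusters it produces say nothing about the real survey. The sketch assumes incomplete rows are dropped first, because kmeans() does not accept missing values; whether the real extract contains any is not stated here.

```r
set.seed(4)  # reproducible toy data only
n <- 300
items <- c("life", "economy", "government", "democracy",
           "education", "health_services")

# Simulated 0-10 responses with one shared component (toy assumption)
common <- rnorm(n)
toy_sat <- as.data.frame(sapply(items, function(item) {
  pmin(pmax(round(5 + 1.5 * common + rnorm(n)), 0), 10)
}))
names(toy_sat) <- paste0("satisfaction_", items)

toy_complete <- na.omit(toy_sat)  # kmeans() fails on missing values

set.seed(42)
km <- kmeans(toy_complete, centers = 4)

# Cluster sizes, and the mean response on each item within each cluster
print(table(km$cluster))
print(round(km$centers, 2))

# Relate the clusters to the PCA: mean PC1 and PC2 score in each cluster
pca <- prcomp(toy_complete, center = TRUE, scale. = TRUE)
pc_by_cluster <- aggregate(pca$x[, 1:2],
                           by = list(cluster = km$cluster), FUN = mean)
print(round(pc_by_cluster, 2))
```

Comparing the rows of km$centers shows which items separate the clusters, and the table of mean component scores shows whether the clusters differ mainly along the first principal component or along later ones as well.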



