
Machine Learning Exercise


Machine Learning Practice Exercise 1

1) DESCRIPTION

80% of people who purchase car insurance are men. If 9 car insurance owners are selected at random, use the binomial distribution to find the probability that exactly X of them are men.

  • Read a number X from a line of input

  • Print the output rounded to 4 decimal places


Example:

Sample Input:

6

Sample Output:

0.1762



2) DESCRIPTION

If profit and loss on an investment are equally likely, use the geometric distribution to find the probability that an investor’s k-th investment is his first profit.

  • Read k from a line of input

  • Print the output rounded to three decimal places

Example:

Sample Input:

4

Sample Output:

0.062
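A short sketch of the geometric probability, assuming the "number of trials until the first success" convention (support starts at k = 1), which is also the convention of `scipy.stats.geom.pmf`. The function name is hypothetical. With p = 0.5 and k = 4 the result is 0.5^3 × 0.5 = 0.0625; Python's `round` resolves this exact tie to the even digit, giving 0.062 as in the sample, whereas a literal "round up" would give 0.063.

```python
def prob_first_profit_at(k: int, p: float = 0.5) -> float:
    """Geometric PMF: probability that the first profit happens on the
    k-th investment, i.e. k - 1 losses followed by one profit."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return (1 - p) ** (k - 1) * p
```

Usage: `print(round(prob_first_profit_at(int(input())), 3))`.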



3) DESCRIPTION

Conditional Probability

  • A conditional probability is the probability of an event given that another event has occurred

  • Conditional Probability = P(A|B) = P(A and B)/P(B)

  • P(A|B) is the probability of event A occurring, given that event B occurs

You have the Member dataset, which is an input data file Members.csv present at the location /data/training/blackfriday.csv

This dataset contains information about people. Here’s a brief description of the columns in the sample dataset:


Dataset Description:

The dataset contains 8 rows and 4 columns:

Gender: whether the particular person is male or female

Height: Height of the person

Weight: Weight of the person

Foot-size: Foot-size of the person

This is a preview of the data under consideration:













Question:

Calculate the probability that a member's height is greater than 5, given that the member is female

Input Format:

The file to be read will be Members.csv, which contains the data as mentioned above. This file is in .csv format.

Example:

Sample Input:

https://media-doselect.com/Members.csv

Sample Output:

0.52
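A minimal pandas sketch of P(A|B) = P(A and B)/P(B) with A = "Height > 5" and B = "member is female". The function name is hypothetical, and the exact spelling of the female label in the Gender column of Members.csv is an assumption, so the comparison is made case-insensitively.

```python
import pandas as pd


def prob_height_above_given_female(df: pd.DataFrame, threshold: float = 5.0,
                                   female_label: str = "female") -> float:
    """P(Height > threshold | Gender is female) = P(A and B) / P(B)."""
    # Assumed label: match Gender case-insensitively against female_label.
    is_female = df["Gender"].str.strip().str.lower() == female_label.lower()
    is_tall = df["Height"] > threshold
    p_b = is_female.mean()                    # P(B): share of female rows
    p_a_and_b = (is_female & is_tall).mean()  # P(A and B): female and taller
    return p_a_and_b / p_b
```

Usage, assuming the file location is read from one input line and two-decimal output as in the sample: `print(round(prob_height_above_given_female(pd.read_csv(input().strip())), 2))`.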



4) DESCRIPTION

Write a program to perform the following operations:

1. Read a number X from a line of input, where X must be a float value

2. Input X represents the probability of a person being hit by a falling meteorite

3. Calculate the odds of a person being hit by a falling meteorite

4. Print the output rounded to 3 decimal places

Example:

Sample Input:

0.07

Sample Output:

7.527
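A sketch of the odds calculation, odds = X / (1 - X). For X = 0.07 this gives 0.07 / 0.93 ≈ 0.07527; the sample output 7.527 equals that value multiplied by 100, so the expected output appears to express the odds as a percentage. The function name is hypothetical.

```python
def odds_from_probability(x: float) -> float:
    """Odds in favour of an event that occurs with probability x."""
    if not 0 <= x < 1:
        raise ValueError("probability must be in [0, 1)")
    return x / (1 - x)
```

To reproduce the sample format: `print(round(100 * odds_from_probability(float(input())), 3))`.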

5) DESCRIPTION

The average number of apples in a carton is 25, with a variance of 36. Using the normal distribution, calculate the probability that a carton contains fewer than X apples.

  • Read a number X from a line of input

  • Print the output rounded to four decimal places

Example:

Sample Input:

28

Sample Output:

0.6915
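A sketch using the standard library's `statistics.NormalDist`. The point to watch is that the distribution takes the standard deviation, sqrt(36) = 6, not the variance; `scipy.stats.norm.cdf(x, loc=25, scale=6)` is the equivalent SciPy call. For X = 28, z = (28 - 25) / 6 = 0.5 and Φ(0.5) ≈ 0.6915. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

MEAN = 25
VARIANCE = 36


def prob_fewer_than(x: float) -> float:
    """P(N < x) for N ~ Normal(mean=25, variance=36), i.e. sigma = 6."""
    return NormalDist(mu=MEAN, sigma=sqrt(VARIANCE)).cdf(x)
```

Usage: `print(f"{prob_fewer_than(float(input())):.4f}")`.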


6) DESCRIPTION

Black Friday falls on the Friday following Thanksgiving Day and is used by many stores as an occasion to offer heavily promoted sales.

You have the Black Friday dataset, which is an input data file blackfriday.csv present at the location /data/training/blackfriday.csv

This dataset contains information about purchases made in a retail store on Black Friday sale. Here’s a brief description of the columns in the sample dataset:

  • USER_ID: ID of the user

  • Gender: F or M

  • Age: Age group to which the customer belongs

  • Occupation: ID of occupation of the customer

  • City_Category: A or B or C

  • Stay_In_Current_City_Years: 0 to 4+

  • Marital_Status: 0: Unmarried, 1: Married

  • Purchase: Purchase amount in dollars

This is a preview of the data under consideration:




The retailer wants to analyse this data and improve its future sales based on the analysis. All the questions in this assignment are based on this data.

  • Purchases made by customers on Black Friday sale are stored in the column named Purchase

  • Age represents the age group the customer belongs to out of 0-17, 18-25, 26-35, 36-45, 46-50, 51-55 and 55+

  • Gender represents the gender of the customer as F or M

  • City_Category represents the category of city the customer belongs to as A, B or C


In this question, we have to perform calculations on the above data as explained below.


Question:

Given that a customer's age group is 18-25, calculate the probability that the customer purchased more than 10000.


Input Format:

The file to be read will be blackfriday.csv, which contains the data as mentioned above. This file is in .csv format.


Hint:

Count each customer only once (ignore repeat entries for the same customer)


Example:

Sample Input:

https://media-doselect.s3.amazonaws.com/generic/3M8qkrpOgMEwqevMR5kPon3v/blackfriday.csv

Sample Output:

0.3276
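A sketch of P(Purchase > 10000 | Age = 18-25) computed as count(A and B) / count(B). The hint is interpreted here as "keep one row per USER_ID" (the first one, via `drop_duplicates`); another reading, such as totalling each customer's purchases, could give a different figure, and this sketch has not been checked against the 0.3276 sample. The function name is hypothetical.

```python
import pandas as pd


def prob_purchase_above_given_age(df: pd.DataFrame, age_group: str = "18-25",
                                  amount: float = 10000) -> float:
    """P(Purchase > amount | Age == age_group), counting each USER_ID once."""
    # Assumption: "avoid repetitive customers" = keep the first row per USER_ID.
    customers = df.drop_duplicates(subset="USER_ID")
    in_group = customers["Age"] == age_group   # event B
    above = customers["Purchase"] > amount      # event A
    return (in_group & above).sum() / in_group.sum()
```

Usage: `print(round(prob_purchase_above_given_age(pd.read_csv(input().strip())), 4))`.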


7) DESCRIPTION

Write a Python code to perform the following operations:


1. Create a list having 10 elements that are positive integer values

  • Read the 10 values from a single line of input, separated by spaces


2. Convert the list into a pandas Series


3. Find the population mean and population standard deviation of the series using pandas

  • On the first output line: print the population mean and population standard deviation, rounded to 3 decimal places and separated by a space


4. Draw a sample of 5 from the series

  • Use Series.sample (the Series counterpart of pandas.DataFrame.sample) with the parameters n=sample_size, random_state=1


5. Find the sample mean and sample standard deviation of the drawn sample using pandas

  • On the second output line: print the sample mean and sample standard deviation, rounded to 3 decimal places and separated by a space

Example:

Sample Input:

98 63 23 697 136 35 09 343 23 1

Sample Output:

142.8 219.589
53.4 60.111
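A sketch of the steps above with pandas. Worth noting: `Series.std()` defaults to `ddof=1`, and that default is what reproduces the sample value 219.589 for the full series; a strict population standard deviation (`ddof=0`) of the same ten values would be about 208.32. The sketch follows the sample output and uses the default for both lines. The function name is hypothetical.

```python
import pandas as pd


def describe_population_and_sample(values, sample_size=5, seed=1):
    """Return (population mean, population std, sample mean, sample std),
    each rounded to 3 decimal places. Both std values use pandas' default
    ddof=1, which matches the exercise's sample output."""
    series = pd.Series(values)
    sample = series.sample(n=sample_size, random_state=seed)
    return (round(float(series.mean()), 3), round(float(series.std()), 3),
            round(float(sample.mean()), 3), round(float(sample.std()), 3))
```

Usage:

```python
values = [int(v) for v in input().split()]
pop_mean, pop_std, samp_mean, samp_std = describe_population_and_sample(values)
print(pop_mean, pop_std)
print(samp_mean, samp_std)
```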


8) DESCRIPTION

Dataset: mpg.csv

Dataset Description:

The data set contains 398 observations of 8 variables.

Here’s a preview of the data under consideration:




Problem Statement

Based on this data set, write a Python code to perform the following operations:

1. Using pandas, load the data set from the file location provided as input

2. Read a string on the second input line which specifies a quantitative data column name in the data set


3. Find the population mean and population standard deviation of the specified quantitative data column using pandas


4. Draw a sample of 200 from the specified quantitative data column

  • Use Series.sample (the Series counterpart of pandas.DataFrame.sample) with the parameters n=sample_size, random_state=1


5. Find the sample mean and sample standard deviation of the drawn sample using pandas


6. Find the difference between the sample mean and the population mean, and between the sample standard deviation and the population standard deviation

  • On the first output line: print the difference <sample mean> - <population mean>, rounded to 3 decimal places

  • On the second output line: print the difference <sample std deviation> - <population std deviation>, rounded to 3 decimal places

Example:

Sample Input:

https://media-doselect.com/mpg.csv
weight

Sample Output:

22.07
34.815
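A sketch of steps 3 to 6, assuming the file path and the column name arrive on two separate input lines. Both standard deviations use pandas' default `ddof=1`; the exercise does not say which convention it expects, so treat that as an assumption. The function name is hypothetical.

```python
import pandas as pd


def mean_std_differences(df: pd.DataFrame, column: str,
                         sample_size: int = 200, seed: int = 1):
    """Return (sample mean - population mean, sample std - population std),
    each rounded to 3 decimal places."""
    population = df[column]
    sample = population.sample(n=sample_size, random_state=seed)
    mean_diff = sample.mean() - population.mean()
    std_diff = sample.std() - population.std()  # pandas default ddof=1 for both
    return round(float(mean_diff), 3), round(float(std_diff), 3)
```

Usage:

```python
path = input().strip()
column = input().strip()
mean_diff, std_diff = mean_std_differences(pd.read_csv(path), column)
print(mean_diff)
print(std_diff)
```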


9) DESCRIPTION

A food delivery company gets cancellations on x orders in a day out of 900 total orders. Each customer can make only one cancellation in a day. The company assumes that all customers are independent of each other.

Write a Python code to perform the following operations:

1. Read an integer input which specifies the number of cancelled orders

2. Find out the margin of error using scipy.stats.norm.ppf

  • On the first output line: print the margin of error, rounded to 5 decimal places

3. Determine an approximate 95% confidence interval for the proportion of orders cancelled in a day

  • On the second output line: print the lower and upper bounds of the confidence interval, rounded to 5 decimal places and separated by a space

Note:

  • Margin of Error = Critical Value*Standard Error of Statistic

  • Confidence Interval = Sample Statistic ± Margin of Error

Example: Let's say 300 out of 900 orders were cancelled

Sample Input:

300

The margin of error and confidence interval values should be printed as:

Sample Output:

0.02585
0.30749 0.35918
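A sketch of the normal-approximation interval for a proportion, with standard error sqrt(p̂(1 - p̂)/n). One caveat from the arithmetic: the sample output is reproduced with the critical value `norm.ppf(0.95)` ≈ 1.645, which is the two-sided critical value for a 90% interval. A conventional two-sided 95% interval uses `norm.ppf(0.975)` ≈ 1.960, which for 300 of 900 gives a margin of about 0.0308. The `quantile` parameter and function name are hypothetical; the default follows the sample output.

```python
from math import sqrt

from scipy.stats import norm


def proportion_interval(cancelled: int, total: int = 900, quantile: float = 0.95):
    """Return (margin of error, (lower, upper)) for the cancellation proportion."""
    p_hat = cancelled / total                          # sample statistic
    standard_error = sqrt(p_hat * (1 - p_hat) / total)
    critical_value = norm.ppf(quantile)                # z critical value
    margin = critical_value * standard_error           # margin of error
    return margin, (p_hat - margin, p_hat + margin)
```

Usage: `margin, (low, high) = proportion_interval(int(input()))`, then `print(round(margin, 5))` and `print(round(low, 5), round(high, 5))`.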


10) DESCRIPTION

Dataset: Property.csv

Dataset Description:

The data set contains 21613 observations of 21 variables.

Here’s a preview of the data under consideration:




Problem Statement

Based on this data set, write a Python code to perform the following operations:

1. Using pandas, load the data set from the file location provided as input

2. Read a string input which specifies a quantitative data column name in the data set


3. Find the population mean and population standard deviation of the specified quantitative data column using pandas

  • On the first output line: print (1) the population mean and (2) the population standard deviation, rounded to 3 decimal places and separated by a space


4. Draw a sample of 100 from the specified quantitative data column

  • Use Series.sample (the Series counterpart of pandas.DataFrame.sample) with the parameters n=sample_size, random_state=4


5. Find the sample mean and sample standard deviation of the drawn sample using pandas

  • On the second output line: print (1) the sample mean and (2) the sample standard deviation, rounded to 3 decimal places and separated by a space


6. Check if the sample mean differs from the population mean using Hypothesis Testing

a) The hypothesis is stated as follows:

  • Null hypothesis = sample mean does not differ from the population mean

  • Alternate hypothesis = sample mean differs from the population mean


b) Perform a test at 95% confidence level and find out the z-statistic and critical value

  • On the third output line: print (1) the z-statistic and (2) the critical value, rounded to 3 decimal places and separated by a space


c) Conclude the relationship between the sample mean and the population mean

  • On the fourth output line: print the hypothesis that holds, as per Point 1 in the Note below

Note:

Point 1:

If the z-statistic is:

  • Less than the critical value: fail to reject the null hypothesis

  • Greater than the critical value: reject the null hypothesis

Point 2:

Make sure your code prints the hypothesis exactly as given above (i.e., in lowercase letters with spaces between words)

Example:

Sample Input:

https://media-doselect.s3.amazonaws.com/generic/RkzkY87b8Y1QNRwG3QKwe94v/Property.csv
price

Sample Output:

540088.142 367127.196
515254.41 280175.923
-0.676 1.645
fail to reject the null hypothesis
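A sketch of the z-test, z = (sample mean - population mean) / (population std / sqrt(n)). Two observations from the numbers: the sample's critical value 1.645 is `NormalDist().inv_cdf(0.95)`, the one-tailed 95% value, whereas a two-sided test of "differs" at 95% would use the 0.975 quantile (about 1.960); and because the alternative is two-sided, this sketch compares |z| rather than z with the critical value. For the sample's z = -0.676 both rules reach the same conclusion. The `quantile` default follows the sample output; function names are hypothetical, and pandas' default `ddof=1` is assumed for both standard deviations.

```python
from math import sqrt
from statistics import NormalDist

import pandas as pd


def z_test(sample_mean, pop_mean, pop_std, n, quantile=0.95):
    """Return (z, critical value, decision) for H0: sample mean == population mean."""
    z = (sample_mean - pop_mean) / (pop_std / sqrt(n))
    critical = NormalDist().inv_cdf(quantile)
    # Compare |z| so that a large negative z also leads to rejection.
    decision = ("reject the null hypothesis" if abs(z) > critical
                else "fail to reject the null hypothesis")
    return z, critical, decision


def run(df: pd.DataFrame, column: str, sample_size: int = 100, seed: int = 4):
    """Print the four output lines described in the exercise."""
    population = df[column]
    sample = population.sample(n=sample_size, random_state=seed)
    pop_mean, pop_std = population.mean(), population.std()
    samp_mean, samp_std = sample.mean(), sample.std()
    z, critical, decision = z_test(samp_mean, pop_mean, pop_std, sample_size)
    print(round(pop_mean, 3), round(pop_std, 3))
    print(round(samp_mean, 3), round(samp_std, 3))
    print(round(z, 3), round(critical, 3))
    print(decision)
```

Usage: read the path and column name (`path = input().strip()`, `column = input().strip()`), then call `run(pd.read_csv(path), column)`.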

11) DESCRIPTION

Write a Python code to perform the following operations:

1. Use the following list:

  • 763, 667, 593, 402, 348, 278, 123

2. Create another list having 7 elements that are positive integer values

  • Read the 7 values from a single line of input, separated by spaces

3. Check whether there is a relationship between the means of the two lists using hypothesis testing

a) The hypothesis is stated as follows:

  • Null hypothesis = there is no relationship (independent)

  • Alternate hypothesis = there is a relationship

b) Perform a t-test using stats.ttest_ind and find out the p-value

  • On the first output line: print the p-value, rounded to 5 decimal places

c) Conclude the relationship between means of the two lists

  • On the second output line: print the hypothesis that holds, as per Point 1 in the Note below

Note:

Point 1:

If the p-value is:

  • Less than the significance level (0.05): there is a relationship

  • Greater than the significance level (0.05): there is no relationship (independent)

Point 2:

Make sure your code prints the hypothesis exactly as given above (i.e., in lowercase letters with spaces between words)

Example:

Sample Input:

23 56 86 99 116 294 366

Sample Output:

0.00976
there is a relationship
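A sketch of the t-test step with `scipy.stats.ttest_ind`, which by default assumes equal variances (`equal_var=True`) and returns a two-sided p-value. For the sample input the pooled t-statistic is about 3.07 with 12 degrees of freedom, giving p ≈ 0.00976, below 0.05. The constant and function names are hypothetical.

```python
from scipy import stats

REFERENCE = [763, 667, 593, 402, 348, 278, 123]


def compare_means(other, alpha=0.05):
    """Two-sample t-test of REFERENCE against `other`; returns (p-value, conclusion)."""
    result = stats.ttest_ind(REFERENCE, other)
    p_value = float(result.pvalue)
    conclusion = ("there is a relationship" if p_value < alpha
                  else "there is no relationship (independent)")
    return p_value, conclusion
```

Usage: `p, conclusion = compare_means([int(v) for v in input().split()])`, then `print(round(p, 5))` and `print(conclusion)`.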
