Product Sold Data Analysis | Sample Assignment

Pushkar Nandgaonkar
Apr 26, 2022
Updated: May 10, 2022

Question 1

You are provided with a “Historical_Data.csv” from a company named ABC which sells products online.

The dataset (historical data) contains daily sales records from different countries.

Use Historical_Data.csv to build different time-series models for each Article ID. This file is located in res/Historical_Data.csv.

To perform the above exercise, carry out the following tasks:

1. Print the number of days on which more than 3 units were sold.

Hint: The total for a day is summed across all countries. For example, given these rows:

01/08/2017 | IN | 2

01/08/2017 | FR | 3

the total sales for the day will be (3 + 2) = 5.

2. Print the sales of the country (FR) in the month of August.

3. Print the total units sold in the country (AT).

Hint: Pre-process the date column to the DateTime type.
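The three tasks above could be sketched in pandas roughly as follows. This is only an illustration: the column names `Date` and `Country_Code`, the day-first date format, and the small inline sample (used here in place of res/Historical_Data.csv) are assumptions, and "sales of FR in August" is read as the total units sold in FR across every August in the data.

```python
import io
import pandas as pd

# Hypothetical sample standing in for res/Historical_Data.csv.
# The column names "Date" and "Country_Code" are assumptions.
SAMPLE = """Date,Country_Code,Article_ID,Sold_Units
01/08/2017,IN,101,2
01/08/2017,FR,102,3
02/08/2017,FR,101,1
03/09/2017,AT,103,4
04/09/2017,AT,101,2
"""

def summarize(df):
    df = df.copy()
    # Convert the date column to a datetime type, as the hint suggests.
    # Day-first parsing (01/08/2017 = 1 August 2017) is an assumption.
    df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")

    # Task 1: total units per day across all countries, then count days > 3.
    daily_total = df.groupby("Date")["Sold_Units"].sum()
    days_over_3 = int((daily_total > 3).sum())

    # Task 2: units sold in FR during August.
    fr_august = df[(df["Country_Code"] == "FR") & (df["Date"].dt.month == 8)]
    fr_august_units = int(fr_august["Sold_Units"].sum())

    # Task 3: total units sold in AT.
    at_units = int(df.loc[df["Country_Code"] == "AT", "Sold_Units"].sum())
    return days_over_3, fr_august_units, at_units

answers = summarize(pd.read_csv(io.StringIO(SAMPLE)))
print(answers)
```

In the sample, 1 August totals 5 units and 3 September totals 4, so two days exceed 3 units.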

Output Format:

Perform the above operations and write your output to a file named output.csv, which should be present at the location output/output.csv
output.csv should contain the answer to each question on consecutive rows.
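One possible way to write the answers on consecutive rows uses Python's standard csv module. The helper name `write_answers` is hypothetical, and whether the grader expects a header row is not stated in the assignment, so none is written here.

```python
import csv
import tempfile
from pathlib import Path

def write_answers(answers, path="output/output.csv"):
    """Write each answer on its own row of a CSV file."""
    path = Path(path)
    # Create the output/ directory if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        for answer in answers:
            writer.writerow([answer])
    return path

# Demonstration in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    written = write_answers([2, 4, 6], Path(tmp) / "output" / "output.csv")
    rows = written.read_text().splitlines()
print(rows)
```

Inside solution(), the same helper could be called with its default path once all answers have been collected.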

Note: Write code only in the solution() function and do not pass any additional arguments to the function. For the predefined stub, refer to stub.py.

Question 2

You are provided with a “Historical_Data.csv” from a company named ABC which sells products online.

Historical data contains daily sales records from different countries.

Use Historical_Data.csv to build different time-series models for each Article ID.

This CSV file is located in res/Historical_Data.csv.

You will observe that no sales were made on some dates. For such dates, add 0 as ‘Sold_Units’ and ‘Article_ID’.

Example: If a sale for country ‘FR’ was made on 2017-03-02 and the next sale on 2017-03-04, then for 2017-03-03 fill 0 for ‘Sold_Units’ and ‘Article_ID’ for country ‘FR’.
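A minimal sketch of this gap-filling step in pandas is shown below. It assumes columns named `Date` and `Country_Code` (not given in the assignment) and assumes the gaps are filled only between each country's own first and last sale date; the function name `fill_missing_days` is hypothetical.

```python
import pandas as pd

def fill_missing_days(df):
    """Add a zero row for every date on which a country had no sale."""
    df = df.copy()
    df["Date"] = pd.to_datetime(df["Date"])
    filler_frames = []
    for country, group in df.groupby("Country_Code"):
        # Every calendar day between the country's first and last sale.
        all_days = pd.date_range(group["Date"].min(), group["Date"].max(), freq="D")
        missing = all_days.difference(pd.DatetimeIndex(group["Date"]))
        if len(missing):
            filler_frames.append(pd.DataFrame({
                "Date": missing,
                "Country_Code": country,
                "Article_ID": 0,
                "Sold_Units": 0,
            }))
    filled = pd.concat([df] + filler_frames, ignore_index=True)
    return filled.sort_values(["Country_Code", "Date"]).reset_index(drop=True)

# The FR example from the text: sales on 2017-03-02 and 2017-03-04.
sample = pd.DataFrame({
    "Date": ["2017-03-02", "2017-03-04"],
    "Country_Code": ["FR", "FR"],
    "Article_ID": [101, 102],
    "Sold_Units": [5, 1],
})
filled = fill_missing_days(sample)
print(filled)
```

The result gains one row for 2017-03-03 with 0 in both ‘Sold_Units’ and ‘Article_ID’.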

a. Once data pre-processing is done, print the date of the first sale for ‘FR’.

Example: If the first sale of country ‘FR’ was on 2018-02-04, then the output (in the same format) should be:

Output: 2018-02-04

b. Print the number of non-selling days for the country (‘AT’).

Example: If the total number of non-selling days for AT is 150, then the output should be:

Output: 150
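Parts a and b might be computed along these lines once the zero rows are in place. The helper names, the `Date` and `Country_Code` column names, and the small pre-filled sample are assumptions made for illustration; a non-selling day is taken to be a date whose total ‘Sold_Units’ for the country is 0.

```python
import pandas as pd

def first_sale_date(filled, country):
    """Earliest date with Sold_Units > 0 for the country, as YYYY-MM-DD."""
    rows = filled[(filled["Country_Code"] == country) & (filled["Sold_Units"] > 0)]
    return rows["Date"].min().strftime("%Y-%m-%d")

def non_selling_days(filled, country):
    """Number of distinct dates on which the country sold nothing."""
    rows = filled[filled["Country_Code"] == country]
    daily_units = rows.groupby("Date")["Sold_Units"].sum()
    return int((daily_units == 0).sum())

# Hypothetical data that already contains the zero-filled rows.
filled = pd.DataFrame({
    "Date": pd.to_datetime([
        "2018-02-04", "2018-02-05", "2018-02-06",
        "2018-02-01", "2018-02-02", "2018-02-03", "2018-02-04",
    ]),
    "Country_Code": ["FR", "FR", "FR", "AT", "AT", "AT", "AT"],
    "Article_ID": [101, 0, 102, 103, 0, 0, 101],
    "Sold_Units": [3, 0, 2, 1, 0, 0, 4],
})
fr_start = first_sale_date(filled, "FR")
at_gaps = non_selling_days(filled, "AT")
print(fr_start)
print(at_gaps)
```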

Output Format:

Perform the above operations and write your output to a file named output.csv, which should be present at the location output/output.csv
output.csv should contain the answer to each question on consecutive rows.

Note: Write code only in the solution() function and do not pass any arguments to the function. For the predefined stub, refer to stub.py.

Question 3

You are provided with a “Historical_Data.csv” from a company named ABC which sells products online.

Historical data contains daily sales records from different countries.

Use Historical_Data.csv to build different time-series models for each Country Code.

This file is located in res/Historical_Data.csv.

NOTE: Use Auto Arima for prediction.

For the above scenario, perform backtesting for each time series on the last 10 values and report the Mean Absolute Error country-wise for each of the last 10 dates (up to 3 decimal places).

Steps to be followed:

1. Load the data from “Historical_Data.csv”, taking the date column as the index and sorting by it, and save it in a dataframe (df).

2. From the above dataframe (df), extract the data for “Sold units” country-wise and store it in another dataframe (say “df2”) using a for loop. Each iteration extracts the sold units for one country, irrespective of “Article_id”.

3. Then split the extracted “df2”: take the last 10 rows as the “test” dataframe and the rest as the “train” dataframe. (This is known as backtesting.)

4. Fit the ARIMA model on the “train” dataframe, predict the values for the “test” dataframe, and store the predictions in a separate dataframe (say “Final_df”). “Final_df” will hold the “Sold_units” and “Predicted_units” for each country.

Syntax of Auto-Arima:

from pmdarima.arima import auto_arima
# Search for a suitable ARIMA order on the training series
model = auto_arima(train, trace=True, suppress_warnings=True, error_action='ignore')
model.fit(train)

5. Finally, calculate the Mean Absolute Error from the Sold_units and Predicted_units for each country.

Hint: You can use the “sklearn.metrics” package.

NOTE: All the above tasks should be done in a loop wherein each iteration represents a particular country irrespective of Article Id.
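The loop described in the steps above could be sketched as follows. To keep the example quick and runnable without pmdarima, the demonstration uses a hypothetical naive forecaster that repeats the last training value; `auto_arima_forecast` shows where Auto-Arima would plug in instead. The `Date` and `Country_Code` column names and the toy data are assumptions, and the sketch reports one MAE per country over the 10 held-out values, which is one reading of the requirement. The MAE is computed by hand here; `sklearn.metrics.mean_absolute_error` gives the same value before rounding.

```python
import pandas as pd

def naive_forecast(train, horizon):
    """Stand-in forecaster: repeat the last training value."""
    return [float(train.iloc[-1])] * horizon

def auto_arima_forecast(train, horizon):
    """Forecast with pmdarima's auto_arima (requires pmdarima installed)."""
    from pmdarima.arima import auto_arima
    model = auto_arima(train, suppress_warnings=True, error_action="ignore")
    return list(model.predict(n_periods=horizon))

def backtest_by_country(df, forecast, holdout=10):
    """Return {country: MAE} over the last `holdout` daily totals."""
    results = {}
    for country, group in df.groupby("Country_Code"):
        # Sold units per day for this country, irrespective of Article_ID.
        series = group.groupby("Date")["Sold_Units"].sum().sort_index()
        train, test = series.iloc[:-holdout], series.iloc[-holdout:]
        predicted = forecast(train, len(test))
        errors = [abs(actual - guess) for actual, guess in zip(test, predicted)]
        results[country] = round(float(sum(errors) / len(errors)), 3)
    return results

# Toy data: 12 days per country, so 2 training points and 10 test points.
dates = pd.date_range("2017-01-01", periods=12, freq="D")
sample = pd.concat([
    pd.DataFrame({"Date": dates, "Country_Code": "FR",
                  "Article_ID": 101, "Sold_Units": list(range(1, 13))}),
    pd.DataFrame({"Date": dates, "Country_Code": "AT",
                  "Article_ID": 102, "Sold_Units": 4}),
], ignore_index=True)
mae = backtest_by_country(sample, naive_forecast)
print(mae)
```

Passing `auto_arima_forecast` instead of `naive_forecast` would run the same backtest with Auto-Arima, given enough training data per country.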

Output Format:

Perform the above operations and write your output to a file named output.csv, which should be present at the location output/output.csv
output.csv should contain the answer to each question on consecutive rows.

Note: The forecasting model might take some time to generate the output on the test dataset; please be patient in the meantime.

Note: Write code only in the solution() function and do not pass any arguments to the function. For the predefined stub, refer to stub.py. This question will be manually evaluated, and the score will be allotted accordingly.