top of page

Working With Weather Database

Updated: Mar 25, 2021



Step 1 – Import the Weather Database into your SQL Server Management Studio

1. Open SQL Server Management Studio.

Note: a recorded copy of the following steps can be found at https://njit.webex.com/mw3300/mywebex/nbrshared.do

2. Right click on database in the navigation window and select New Database



3. When the screen below appears, enter Weather as shown in the circle below and press the OK button


4. Next right click the Weather database and select Tasks and then Import Data as shown below


5. The import wizard will start and the following window will open. Click the NEXT button



6. When the window below opens, select FLAT FILE as the data source and the window will expand to have the following additional selections

7. Click the Browse button and navigate to where you unzipped the zip file in step 2 and select the AQS_Sires.txt file as shown below


8. Select the COLUMNS option in the left hand navigation tab to make sure the columns have been properly identified. The screen should look like the following


9. Next select the ADVANCED option and the following screen will appear. Once it does, you need to change the OUTPUT COLUMN WIDTH to 255 by click on the original width (shown as 50) and overwriting it with 255. This must be done for the Met_Site_Type, Met_Site_Direction, Owning_Agency, Local_Site_Name, Address, City Name, CBSA_Name and Tribe_Name and then click the NEXT button


10. Next, change the Destination to SQL Server Native Client and make sure the Server and Authentication information is correct and then hit NEXT


11. Hit NEXT when the following window appears


12. Hit the FINISH button when the following windows appears


13. Click FINISH when the following screen appears


14. Hit FINISH and the following screen will show the progress and have the below information when it finishes


Database Description

1. The database contains 2 tables


a. Aqs_sites – contains information regarding the sites where the temperature information in the Temperature table was collected. All the column should be self explanatory. The linkage between the 2 tables use the State_Code, County_Code and Site_Number columns


b. Temperature – contains the daily temperature information collected at the site for 2 decades. The important columns for the assignment are:

  • i. State_Code, County_Code and Site_Number are used to join to the aqs_sites table

  • ii. Date_Local – The date that the sample was collected

  • iii. Average_Temp – The average temperature for that particular date

  • iv. Daily_High_Temp – The highest temperature for the day

  • v. All temperatures are in degrees Fahrenheit

2. Suggestion: Due to the large number of records in the database, your queries may take several minutes to execute. If your queries are taking a long time to run, I suggest you make a table containing a small subset of the data in the Temperature table to use for writing and debugging your queries. After they all execute successfully, you can change the queries to use the full Temperature table


3. Grading – You will receive separate grades for Part 3, Part 4 and Part 5 (if you complete Part 5 for the extra credit). Both parts 3 and 4 will be worth 100 points each and each question in the step will be worth a proportionate value (1/# of questions in the step) Parts 3 and 4 will make up 50% of the Project grade. The extra credit (Part 5) will be worth 15% extra.



Creating Geospatial Data

Your last concern is how long will it take to travel back home to visit friends and family after you move. Since the Weather database has latitude and longitude information, you have decided to convert this information into a new column with a GEOGRAPHY data type and populate the new column with a set command and one of the following formula (Depending on the data type for latitude and longitude) The example below is for when latitude and longitude are varchar.


Use Weather
go
IF NOT EXISTS(
 SELECT *
 FROM sys.columns 
 WHERE Name = N'GeoLocation'
 AND Object_ID = Object_ID(N'AQS_Sites'))
BEGIN
 ALTER TABLE AQS_Sites ADD GeoLocation Geography NULL
END
go
UPDATE aqs_sites
SET GeoLocation = geography::STPointFromText('POINT(' + LONGITUDE + ' ' + 
 [Latitude] + ')', 4326)
where (LATITUDE is not null and 
 Longitude is not null) and 
 Longitude <> '0' and 
Longitude <>''

Submission 1 – Problems

You are trying to decide where in the US to reside. The most important factor to you is temperature, you hate cold weather. Answer the following questions to help you make your decision. For all problems show all columns included in the examples. Note that the term temperature applies to the average daily temperature unless otherwise stated.

1. Determine the date range of the records in the Temperature table

First Date Last Date

1986-01-01 2017-05-09

2. Find the minimum, maximum and average of the average temperature column for each state sorted by state name.

State_Name Minimum Temp Maximum Temp Average Temp

Alabama -4.662500 88.383333 59.328094

Alaska -43.875000 80.791667 29.146757

Arizona -99.000000 135.500000 67.039050

3. The results from question #2 show issues with the database. Obviously, a temperature of -99 degrees Fahrenheit in Arizona is not an accurate reading as most likely is 135.5 degrees. Write the queries to find all suspect temperatures (below -39o and above 105o). Sort your output by State Name and Average Temperature.

State_Name state_code County_Code Site_Number average_Temp date_local

Wisconsin 55 059 0002 -58.000000 2002-03-28

Washington 53 009 0013 -50.000000 2012-10-17

Texas 48 141 0050 106.041667 1991-07-28

Texas 48 141 0050 106.291667 1991-07-25

4. You noticed that the average temperatures become questionable below -39 o and above 125 o and that it is unreasonable to have temperatures over 105 o for state codes 30, 29, 37, 26, 18, 38. You also decide that you are only interested in living in the United States, not Canada or the US territories. Create a view that combines the data in the AQS_Sites and Temperature tables. The view should have the appropriate SQL to exclude the data above. You should use this view for all subsequent queries. My view returned 5,616,112 rows. The view includes the State_code, State_Name, County_Code, Site_Number, Make sure you include schema binding in your view for later problems.

5. Using the SQL RANK statement, rank the states by Average Temperature

State_Name Minimum Temp Maximum Temp Average Temp State_rank

Florida 35.96 88.00 73.348137 1

Texas -1.13 122.60 68.793757 2

Mississippi 22.23 91.16 68.493975 3

6. At this point, you’ve started to become annoyed at the amount of time each query is taking to run. You’ve heard that creating indexes can speed up queries. Create an index for your view. You are required to create an index with the unique and clustered parameters and the index will be on the State_Code, County_Code, Site_Number, Date_Local columns.

Note: There are a couple of thousand duplicate rows that you must delete before you can create a unique index. I used the Rownumber parameter in a partition statement and deleted any row where the row number was greater than 1.

To see if the indexing help, add print statements that write the start and stop time for the query in question #2 and run the query before and after the indexes are created. Note the differences in the times. Also make sure that the create index steps include a check to see if the index exists before trying to create it.

The following is a sample of the output that should appear in the messages tab that you will need to calculate the difference in execution times before and after the indexes are created

Begin Question 6 before Index Create At - 13:40:03

(777 row(s) affected)

Complete Question 6 before Index Create At - 13:45:18

7. You’ve decided that you want to see the ranking of each high temperatures for each city in each state to see if that helps you decide where to live. Write a query that ranks (using the rank function) the states by averages temperature and then ranks the cities in each state. The ranking of the cities should restart at 1 when the query returns a new state. You also want to only show results for the 15 states with the highest average temperatures.

Note: you will need to use multiple nested queries to get the State and City rankings, join them together and then apply a where clause to limit the state ranks shown.

State_Rank State_Name State_City_Rank City_Name Average Temp

1 Florida 1 Not in a City 73.975759

1 Florida 2 Pinellas Park 72.878784

1 Florida 3 Valrico 71.729440

1 Florida 4 Saint Marks 69.594272

2 Texas 1 McKinney 76.662423

2 Texas 2 Mission 74.701098

8. You notice in the results that sites with Not in a City as the City Name are include but do not provide you useful information. Exclude these sites from all future answers. You can do this by either adding it to the where clause in the remaining queries or updating the view you created in #4


9. You’ve decided that the results in #8 provided too much information and you only want to 2 cities with the highest temperatures and group the results by state rank then city rank.

State_Rank State_Name State_City_Rank City_Name Average Temp

1 Florida 1 Pinellas Park 72.878784

1 Florida 2 Valrico 71.729440

2 Louisiana 1 Baton Rouge 69.704466

2 Louisiana 2 Laplace (La Place) 68.115400

10. You decide you like the average temperature to be in the 80's. Pick 2 cities that meets this condition and calculate the average temperature by month for those 2 cities. You also decide to include a count of the number of records for each of the cities to make sure your comparisons are being made with comparable data for each city.

Hint, use the datepart function to identify the month for your calculations.

City_Name Month # of Records Average Temp

Mission 1 620 60.794048

Mission 2 565 64.403861

Mission 3 588 69.727512

11. You assume that the temperatures follow a normal distribution and that the majority of the temperatures will fall within the 40% to 60% range of the cumulative distribution. Using the CUME_DIST function, show the temperatures for the same 3 cities that fall within the range.

City_Name Avg_Temp Temp_Cume_Dist

Mission 73.916667 0.400686891814539

Mission 73.956522 0.400829994275902

Mission 73.958333 0.402404121350887

12. You decide this is helpful, but too much information. You decide to write a query that shows the first temperature and the last temperature that fall within the 40% and 60% range for the 3 cities your focusing on.

City_Name 40 Percentile Temp 60 Percentile Temp

Mission 73.956522 80.083333

Pinellas Park 71.958333 78.125000

Tucson 63.750000 74.250000

13. You remember from your statistics classes that to get a smoother distribution of the temperatures and eliminate the small daily changes that you should use a moving average instead of the actual temperatures. Using the windowing within a ranking function to create a 4 day moving average, calculate the moving average for each day of the year.

Hint: You will need to datepart to get the day of the year for your moving average. You moving average should use the 3 days prior and 1 day after for the moving average.

City_Name Day of the Year Rolling_Avg_Temp

Mission 1 59.022719

Mission 2 58.524868

Mission 3 58.812967

Mission 364 60.657749

Mission 365 61.726333

Mission 366 61.972514



We are also providing other database assignments help like, MongoDB, Oracle, PostgreSQL, MySQL, etc. If you need or looking for any database assignments or project help then you can please contact us at: contact@codersarts.com

Comments


bottom of page