Welcome to the Turing Machine Data Scientist Program: Use-case 2 - Exploratory Data Analysis
Q1: What is the standard deviation of the maximum windspeed across all the days?
Solution: 1
# Import the libraries and read the data file for this hands-on.
import pandas as pd
import numpy as np
#dataDOTcsv = "https://hr-projects-assets-prod.s3.amazonaws.com/c3pde3c3lgm/963fbab228e2896e79fc09e385ab377d/data.csv"
fl = pd.read_csv("data.csv")
# Task 1: What is the standard deviation of maximum windspeed across all the days
# np.std uses ddof=0 (population standard deviation); fl[...].std() would use ddof=1
temp = np.std(fl["Maximum windspeed (mph)"])
q1 = round(temp, 2)
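np.std and the pandas Series.std method use different default degrees of freedom. The sketch below runs both on a small, made-up series so the difference is visible. The values are hypothetical and not taken from data.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical windspeed values, used only to illustrate the ddof difference
s = pd.Series([10.0, 12.0, 14.0, 16.0])

pop_std = np.std(s)           # np.std forwards ddof=0 -> population standard deviation
sample_std = s.std()          # Series.std defaults to ddof=1 -> sample standard deviation
explicit_pop = s.std(ddof=0)  # matches np.std(s)

print(round(pop_std, 2), round(sample_std, 2), round(explicit_pop, 2))  # → 2.24 2.58 2.24
```

When a question does not say which definition it expects, state the ddof you use explicitly.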
Q2: What is the difference between the 50th and 75th percentiles of the average temperature?
Solution: 2
# Import the libraries and read the data file for this hands-on.
import pandas as pd
import numpy as np
fl = pd.read_csv("data.csv")
# Task 2: Difference between the 75th and 50th percentiles of average temperature
a = fl["Average temperature (°F)"].quantile(0.75)
b = fl["Average temperature (°F)"].quantile(0.50)
q2 = round(a-b, 2)
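By default, Series.quantile interpolates linearly between neighbouring sorted values. The short check below uses four hypothetical temperatures to show how each percentile is computed.

```python
import pandas as pd

# Hypothetical average temperatures; quantile() uses linear interpolation by default
temps = pd.Series([1.0, 2.0, 3.0, 4.0])

p50 = temps.quantile(0.50)  # position (n-1)*0.50 = 1.5  -> halfway between 2.0 and 3.0
p75 = temps.quantile(0.75)  # position (n-1)*0.75 = 2.25 -> 3.0 + 0.25*(4.0-3.0)
diff = round(p75 - p50, 2)
print(p50, p75, diff)  # → 2.5 3.25 0.75
```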
Q3: What is the Pearson correlation between the average dew point and the average temperature?
Solution: 3
# Task 3: Pearson correlation between average dew point and average temperature (Series.corr defaults to method='pearson')
temp = fl["Average dewpoint (°F)"].corr(fl["Average temperature (°F)"])
q3 = round(temp, 2)
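To see what Series.corr computes, the sketch below evaluates the Pearson formula r = cov(x, y) / (std(x) * std(y)) by hand. It uses a few hypothetical dew point and temperature pairs and compares the result with pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical dew point / temperature pairs (not from data.csv)
dew = pd.Series([30.0, 35.0, 40.0, 50.0, 55.0])
temp = pd.Series([45.0, 50.0, 58.0, 66.0, 70.0])

# Pearson r from centred values; the ddof factors cancel in the ratio
dx = dew - dew.mean()
dy = temp - temp.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

r_pandas = dew.corr(temp)  # method='pearson' is the default
print(round(r_manual, 4), round(r_pandas, 4))
```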
Q4: Out of all the available records, which month has the lowest average humidity?
Solution: 4
# Task 4: Out of all the available records, which month has the lowest average humidity?
# Take the day with the minimum average humidity and report its month.
humidity_col = 'Average humidity (%)'
day_col = "Day"
lowest_idx = fl[humidity_col].idxmin()
# 'Day' is stored as dd/mm/yyyy, so parse it explicitly instead of letting pandas guess
lowest_day = pd.to_datetime(fl.loc[lowest_idx, day_col], format='%d/%m/%Y')
q4 = lowest_day.month
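The Day column uses the dd/mm/yyyy format, which is the format passed to pd.to_datetime in Task 7. If pandas is left to guess, it reads ambiguous dates month-first. The sketch below shows that pitfall and then looks up a month by idxmin on a hypothetical three-row frame.

```python
import pandas as pd

# '03/01/2010' written as dd/mm/yyyy means 3 January 2010
raw = "03/01/2010"
guessed = pd.to_datetime(raw)                      # default dayfirst=False -> read as 1 March 2010
explicit = pd.to_datetime(raw, format="%d/%m/%Y")  # explicit format -> 3 January 2010
print(guessed.month, explicit.month)  # → 3 1

# Hypothetical mini-frame: month of the row with the minimum humidity
df = pd.DataFrame({
    "Day": ["03/01/2010", "15/02/2010", "20/03/2010"],
    "Average humidity (%)": [60, 35, 50],
})
row = df["Average humidity (%)"].idxmin()
month = pd.to_datetime(df.loc[row, "Day"], format="%d/%m/%Y").month
print(month)  # → 2
```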
Q7: What is the average temperature between March 2010 and May 2012?
Solution: 7
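# Task 7: Average temperature between March 2010 and May 2012 (inclusive)
fl["Day_New"] = pd.to_datetime(fl['Day'], format='%d/%m/%Y')
ans = fl[(fl["Day_New"] >= '2010-03-01') & (fl["Day_New"] <= '2012-05-31')]
# Select the column by name; describe().loc["mean"][0] depended on column order
# and on positional Series indexing, which recent pandas versions deprecate
avg_temp = ans["Average temperature (°F)"].mean()
# fl.drop(columns=["Day_New"], inplace=True)  # would delete the helper column, but Task 9 still uses it
ans = round(avg_temp, 2)
print(ans)
q7 = ans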
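As an alternative to two comparisons joined with &, Series.between can express the same inclusive date window. The sketch below applies it to a few hypothetical dd/mm/yyyy records, two of which fall just outside the window.

```python
import pandas as pd

# Hypothetical daily records in dd/mm/yyyy format
df = pd.DataFrame({
    "Day": ["28/02/2010", "01/03/2010", "15/06/2011", "31/05/2012", "01/06/2012"],
    "Average temperature (°F)": [30.0, 40.0, 70.0, 60.0, 80.0],
})
df["Day_New"] = pd.to_datetime(df["Day"], format="%d/%m/%Y")

# between() is inclusive on both ends by default, like a >= / <= pair
mask = df["Day_New"].between("2010-03-01", "2012-05-31")
avg_temp = round(df.loc[mask, "Average temperature (°F)"].mean(), 2)
print(avg_temp)  # → 56.67
```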
Q8: What is the range of the average temperature in December 2010?
Solution: 8
# Task 8: Range of the average temperature in December 2010
fl["Day_New"] = pd.to_datetime(fl['Day'], format='%d/%m/%Y')
temp = fl[(fl["Day_New"] >= '2010-12-01') & (fl["Day_New"] <= '2010-12-31')]
# Range = maximum - minimum of the daily average temperature within the month
mx = temp["Average temperature (°F)"].max()
mn = temp["Average temperature (°F)"].min()
temp_range_dec = round(mx - mn, 2)
print(temp_range_dec)
q8 = temp_range_dec
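You can also select a single calendar month with the .dt accessor instead of string bounds. With bounds, a timestamp that carries a time of day on 31 December would fall after '2010-12-31' and be dropped. The .dt accessor avoids that case. This is a sketch on hypothetical data.

```python
import pandas as pd

# Hypothetical records spanning November 2010 to January 2011
df = pd.DataFrame({
    "Day_New": pd.to_datetime(
        ["30/11/2010", "01/12/2010", "15/12/2010", "31/12/2010", "01/01/2011"],
        format="%d/%m/%Y",
    ),
    "Average temperature (°F)": [50.0, 41.5, 28.0, 35.0, 10.0],
})

# Keep only December 2010 by comparing year and month components
dec = df[(df["Day_New"].dt.year == 2010) & (df["Day_New"].dt.month == 12)]
col = dec["Average temperature (°F)"]
temp_range = round(col.max() - col.min(), 2)
print(temp_range)  # → 13.5
```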
Q9: Out of all the available records, which day has the highest difference between maximum_pressure and minimum_pressure?
Solution: 9
# Task 9: Day with the highest difference between maximum and minimum pressure
# Note: the pressure column names end with a trailing space, as they appear in data.csv
fl['pressure_diff'] = fl['Maximum pressure '] - fl['Minimum pressure ']
max_press_indx = fl['pressure_diff'].idxmax()
# 'Day_New' is the parsed date column created in Task 7 / Task 8
max_press_date = fl.loc[max_press_indx, 'Day_New']
ans = max_press_date.strftime('%Y-%m-%d')
print(ans)
q9 = ans
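The pattern used here is to compute a difference column, take its idxmax label, and look the row up with .loc. The sketch below isolates that pattern on a hypothetical three-day frame. Its column names are illustrative only and do not match data.csv.

```python
import pandas as pd

# Hypothetical pressure readings; column names here are illustrative only
df = pd.DataFrame({
    "Day_New": pd.to_datetime(["2010-01-01", "2010-01-02", "2010-01-03"]),
    "max_pressure": [30.1, 30.5, 30.2],
    "min_pressure": [29.9, 29.8, 29.9],
})

diff = df["max_pressure"] - df["min_pressure"]
# idxmax() returns the index label of the first maximum, so use it with .loc
best = diff.idxmax()
day = df.loc[best, "Day_New"].strftime("%Y-%m-%d")
print(day)  # → 2010-01-02
```

If several days tie for the largest difference, idxmax reports only the first one.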
Q10: How many days fall on the median (i.e. are equal to the median value) of the barometer reading?
Solution: 10
# Task 10: How many days fall on the median (i.e. are equal to the median value) of the barometer reading?
medn = fl["Average barometer (in)"].median()
# Vectorised comparison: True where a reading equals the median; summing counts the True values
ans = int((fl["Average barometer (in)"] == medn).sum())
print(ans)
q10 = ans
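For an even number of readings, the median is the mean of the two middle values, so it may not match any observed reading. The hypothetical helper below shows both a tied case and a case where the count is zero.

```python
import pandas as pd

# Hypothetical barometer readings
ties = pd.Series([29.9, 30.0, 30.0, 30.1])  # median = 30.0, observed twice
gap = pd.Series([29.9, 30.0, 30.1, 30.2])   # median falls between 30.0 and 30.1

def days_at_median(s: pd.Series) -> int:
    """Hypothetical helper: count readings exactly equal to the median."""
    return int((s == s.median()).sum())

print(days_at_median(ties), days_at_median(gap))  # → 2 0
```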
Q11: Out of all the available records, how many days are within one standard deviation of the average temperature?
Solution: 11
# Task 11: Out of all the available records, how many days are within one standard deviation of the average temperature?
avg_temp_col = fl["Average temperature (°F)"]
avg_temp_std = round(avg_temp_col.std(), 2)    # Series.std uses ddof=1 (sample standard deviation)
avg_temp_mean = round(avg_temp_col.mean(), 2)
# between() is inclusive on both ends by default, matching the original >= / <= comparisons
num_days_std = int(avg_temp_col.between(avg_temp_mean - avg_temp_std, avg_temp_mean + avg_temp_std).sum())
print('num_days_std =', num_days_std)
q11 = num_days_std
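The "within one standard deviation" count reduces to an inclusive interval test around the mean. The sketch below runs that test on five hypothetical values, where the interval is roughly [1.42, 4.58].

```python
import pandas as pd

temps = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])    # hypothetical values
mean, std = temps.mean(), temps.std()           # std uses ddof=1: sqrt(2.5) ≈ 1.58
within = temps.between(mean - std, mean + std)  # inclusive bounds
print(int(within.sum()))  # → 3
```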
Q5 - Q6: Out of all the available records, which month has the highest median for maximum_gust_speed?
Solution: 5-6
### Out of all the available records, which month has the highest median for maximum_gust_speed?
# Also find the respective value. Hint: group by month.
# Try to write the code for this problem yourself; if you face any issue, feel free to post in the comment box.
# If your solution is correct, you will get the following response:
# q5 = 34.50
# q6 = 2
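The hint "group by month" can be sketched on made-up data. The column name "gust" is a hypothetical stand-in, and the real gust-speed column name in data.csv may differ. The sketch groups by calendar month (1 to 12), so the same month from different years is pooled. If "month" is meant per year, grouping by `day.dt.to_period("M")` would be the variant to try.

```python
import pandas as pd

# Hypothetical gust-speed records; "gust" is an illustrative column name
df = pd.DataFrame({
    "Day": ["05/01/2010", "20/01/2010", "03/02/2010", "17/02/2010",
            "09/03/2010", "21/03/2010", "10/03/2011"],
    "gust": [20.0, 30.0, 22.0, 28.0, 36.0, 44.0, 31.0],
})
day = pd.to_datetime(df["Day"], format="%d/%m/%Y")

# Median gust speed per calendar month, then the month with the largest median
monthly_median = df["gust"].groupby(day.dt.month).median()
best_month = monthly_median.idxmax()
best_value = monthly_median.max()
print(best_month, best_value)  # → 3 36.0
```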