Welcome to the Turing Machine Data Scientist Program: Use-case 2 - Exploratory Data Analysis
Q1: What is the standard deviation of maximum windspeed across all the days?
Solution: 1
# Import the data file for this hands-on.
import pandas as pd
import numpy as np

#dataDOTcsv = "https://hr-projects-assets-prod.s3.amazonaws.com/c3pde3c3lgm/963fbab228e2896e79fc09e385ab377d/data.csv"
fl = pd.read_csv("data.csv")

# Task 1: What is the standard deviation of maximum windspeed across all the days?
# np.std uses the population formula (ddof=0).
q1 = round(np.std(fl["Maximum windspeed (mph)"]), 2)
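NumPy and pandas disagree on the default standard deviation: `np.std` divides by n (ddof=0), while `Series.std` divides by n - 1 (ddof=1). The sketch below uses made-up wind speeds, not the hands-on data, to show how far apart the two can be on a small sample.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for fl["Maximum windspeed (mph)"]; not the hands-on data.
speeds = pd.Series([10.0, 12.0, 14.0, 16.0])

pop_std = float(np.std(speeds.to_numpy()))  # NumPy default ddof=0: divide by n
sample_std = float(speeds.std())            # pandas default ddof=1: divide by n - 1

print(round(pop_std, 2), round(sample_std, 2))  # 2.24 2.58
```

On thousands of daily records the gap shrinks, but it can still change the second decimal place, so it is worth knowing which convention an expected answer uses.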
Q2: What is the difference between the 50th and 75th percentiles of average temperature?
Solution: 2
# Import the data file for this hands-on.
import pandas as pd

fl = pd.read_csv("data.csv")

# Task 2: What is the difference between the 50th and 75th percentiles of average temperature?
a = fl["Average temperature (°F)"].quantile(0.75)  # 75th percentile
b = fl["Average temperature (°F)"].quantile(0.50)  # 50th percentile (median)
q2 = round(a - b, 2)
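`Series.quantile` uses linear interpolation by default, so a percentile can fall between two observed values. A small sketch on made-up temperatures (not the hands-on data) makes the arithmetic visible.

```python
import pandas as pd

# Made-up daily average temperatures in °F; not the hands-on data.
temps = pd.Series([40.0, 50.0, 60.0, 70.0])

q50 = temps.quantile(0.50)  # position 0.50 * (4 - 1) = 1.5  -> 50 + 0.5 * 10 = 55.0
q75 = temps.quantile(0.75)  # position 0.75 * (4 - 1) = 2.25 -> 60 + 0.25 * 10 = 62.5
print(q50, q75, round(q75 - q50, 2))  # 55.0 62.5 7.5
```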
Q3: What is the Pearson correlation between average dew point and average temperature?
Solution: 3
# Task 3: What is the Pearson correlation between average dew point and average temperature?
# Series.corr uses the Pearson method by default.
temp = fl["Average dewpoint (°F)"].corr(fl["Average temperature (°F)"])
q3 = round(temp, 2)
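To see what `Series.corr` computes, here is a sketch of Pearson's r written out as the sum of co-deviations divided by the square root of the product of the squared deviations. The helper name `pearson` and the sample values are made up for illustration, and unlike `.corr` it assumes there are no missing values.

```python
import numpy as np
import pandas as pd

def pearson(x: pd.Series, y: pd.Series) -> float:
    """Pearson r written out explicitly; assumes x and y have no NaNs."""
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# Made-up dew point / temperature pairs; not the hands-on data.
dew = pd.Series([30.0, 35.0, 40.0, 50.0])
temp = pd.Series([45.0, 52.0, 55.0, 66.0])
print(round(pearson(dew, temp), 2), round(dew.corr(temp), 2))
```

Both values printed should agree; `.corr` is still preferable on real data because it skips rows with missing values pairwise.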
Q4: Out of all the available records, which month has the lowest average humidity?
Solution: 4
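# Task 4: Out of all the available records, which month has the lowest average humidity?
# This takes the month of the single day with the lowest "Average humidity (%)" reading.
humidity_col = "Average humidity (%)"
lowest_day = fl.loc[fl[humidity_col].idxmin(), "Day"]  # first day with the minimum value
# "Day" is stored as day/month/year, so pass the format explicitly instead of letting
# pandas guess (a guess would read "01/03/2010" month-first, i.e. as January).
q4 = pd.to_datetime(lowest_day, format="%d/%m/%Y").month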
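The question can also be read as "which calendar month has the lowest mean humidity", which is not always the month of the single lowest day. A minimal sketch of that alternative reading, assuming months are pooled across years; the helper `month_with_lowest_mean` and the demo rows are hypothetical.

```python
import pandas as pd

def month_with_lowest_mean(df, value_col, day_col="Day", day_format="%d/%m/%Y"):
    """Return the calendar month (1-12) whose mean of value_col is lowest.

    Months are pooled across years (all Januaries together, and so on).
    """
    months = pd.to_datetime(df[day_col], format=day_format).dt.month
    monthly_mean = df[value_col].groupby(months).mean()
    return int(monthly_mean.idxmin())

# Made-up rows: January holds the single lowest day, February the lowest mean.
demo = pd.DataFrame({
    "Day": ["01/01/2010", "15/01/2010", "01/02/2010", "15/02/2010"],
    "Average humidity (%)": [80.0, 20.0, 40.0, 45.0],
})
print(month_with_lowest_mean(demo, "Average humidity (%)"))  # 2
```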
Q7: What is the average temperature from March 2010 to May 2012?
Solution: 7
# Task 7: Average temperature from March 2010 to May 2012 (both months inclusive)
fl["Day_New"] = pd.to_datetime(fl["Day"], format="%d/%m/%Y")
in_range = fl[(fl["Day_New"] >= "2010-03-01") & (fl["Day_New"] <= "2012-05-31")]
# Select the temperature column by name rather than relying on column order.
avg_temp = in_range["Average temperature (°F)"].mean()
ans = round(avg_temp, 2)
print(ans)
q7 = ans
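The same date filter can be wrapped in a reusable helper built on `Series.between`, which includes both endpoints by default. This is a sketch; the name `mean_between` and the demo rows are hypothetical.

```python
import pandas as pd

def mean_between(df, value_col, start, end, day_col="Day", day_format="%d/%m/%Y"):
    """Mean of value_col over rows dated from start to end, both ends inclusive."""
    days = pd.to_datetime(df[day_col], format=day_format)
    mask = days.between(pd.Timestamp(start), pd.Timestamp(end))  # inclusive on both ends by default
    return round(float(df.loc[mask, value_col].mean()), 2)

# Made-up rows: the first and last fall outside the range and are excluded.
demo = pd.DataFrame({
    "Day": ["28/02/2010", "01/03/2010", "31/05/2012", "01/06/2012"],
    "Average temperature (°F)": [100.0, 40.0, 60.0, 100.0],
})
print(mean_between(demo, "Average temperature (°F)", "2010-03-01", "2012-05-31"))  # 50.0
```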
Q8: What is the range of average temperature in December 2010?
Solution: 8
# Task 8: Find the range (max - min) of average temperature in December 2010
fl["Day_New"] = pd.to_datetime(fl["Day"], format="%d/%m/%Y")
temp = fl[(fl["Day_New"] >= "2010-12-01") & (fl["Day_New"] <= "2010-12-31")]
mx = temp["Average temperature (°F)"].max()
mn = temp["Average temperature (°F)"].min()
dec_temp_range = round(mx - mn, 2)
print(dec_temp_range)
q8 = dec_temp_range
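Instead of spelling out first and last dates, the month can be selected with the `.dt.year` and `.dt.month` accessors, which avoids getting month lengths wrong. A sketch with a hypothetical helper `value_range_in_month` and made-up rows:

```python
import pandas as pd

def value_range_in_month(df, value_col, year, month, day_col="Day", day_format="%d/%m/%Y"):
    """max - min of value_col over rows dated in the given year and month."""
    days = pd.to_datetime(df[day_col], format=day_format)
    in_month = (days.dt.year == year) & (days.dt.month == month)
    subset = df.loc[in_month, value_col]
    return round(float(subset.max() - subset.min()), 2)

# Made-up rows: only the three December 2010 values count.
demo = pd.DataFrame({
    "Day": ["15/11/2010", "01/12/2010", "15/12/2010", "31/12/2010", "05/12/2011"],
    "Average temperature (°F)": [10.0, 30.0, 45.5, 38.0, 90.0],
})
print(value_range_in_month(demo, "Average temperature (°F)", 2010, 12))  # 15.5
```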
Q9: Out of all available records, which day has the highest difference between maximum pressure and minimum pressure?
Solution: 9
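# Task 9: Out of all available records, which day has the highest difference between maximum and minimum pressure?
# Recreate Day_New so this cell does not depend on Tasks 7-8 having run first.
fl["Day_New"] = pd.to_datetime(fl["Day"], format="%d/%m/%Y")
# Note the trailing space in the pressure column names.
fl["pressure_diff"] = fl["Maximum pressure "] - fl["Minimum pressure "]
max_press_indx = fl["pressure_diff"].idxmax()
max_press_date = fl.loc[max_press_indx, "Day_New"]
ans = max_press_date.strftime("%Y-%m-%d")
print(ans)
q9 = ans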
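`idxmax` returns the index label of the first maximum, not a row position, so the date should be looked up with `.loc`. The sketch below uses a non-default index to make that visible; the helper `day_of_max_spread` and the readings are made up.

```python
import pandas as pd

def day_of_max_spread(df, high_col, low_col, day_col="Day", day_format="%d/%m/%Y"):
    """Return the date (YYYY-MM-DD) of the row with the largest high_col - low_col."""
    spread = df[high_col] - df[low_col]
    label = spread.idxmax()  # index label of the first maximum, not a position
    day = pd.to_datetime(df.loc[label, day_col], format=day_format)
    return day.strftime("%Y-%m-%d")

# Made-up readings with a non-default index to show label-based lookup.
demo = pd.DataFrame(
    {
        "Day": ["01/01/2010", "02/01/2010", "03/01/2010"],
        "Maximum pressure ": [30.1, 30.5, 30.2],
        "Minimum pressure ": [29.9, 29.6, 30.0],
    },
    index=[10, 20, 30],
)
print(day_of_max_spread(demo, "Maximum pressure ", "Minimum pressure "))  # 2010-01-02
```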
Q10: How many days have an average barometer reading equal to the median value?
Solution: 10
# Task 10: How many days have an average barometer reading equal to the median value?
medn = fl["Average barometer (in)"].median()
# Compare every reading to the median and count the matches.
ans = int((fl["Average barometer (in)"] == medn).sum())
print(ans)
q10 = ans
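Two things can make this count surprising: with an even number of readings the median is the average of the two middle values and may match no day at all, and exact `==` on floats can miss values that differ only by rounding noise. A sketch that swaps in a tolerance-based comparison via `np.isclose`; the helper name and the `tol=1e-9` default are assumptions, not part of the exercise.

```python
import numpy as np
import pandas as pd

def count_equal_to_median(values: pd.Series, tol: float = 1e-9) -> int:
    """Count entries within an absolute tolerance of the median."""
    med = values.median()
    return int(np.isclose(values.to_numpy(), med, rtol=0.0, atol=tol).sum())

odd = pd.Series([29.9, 30.0, 30.0, 30.1, 30.2])   # median is 30.0, an observed value
even = pd.Series([29.9, 30.0, 30.1, 30.2])        # median is about 30.05, not observed
print(count_equal_to_median(odd), count_equal_to_median(even))  # 2 0
```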
Q11: Out of all the available records, how many days are within one standard deviation of the average temperature?
Solution: 11
# Task 11: Out of all the available records, how many days are within one standard deviation of the average temperature?
avg_temp = fl["Average temperature (°F)"]
avg_temp_std = round(avg_temp.std(), 2)
avg_temp_mean = round(avg_temp.mean(), 2)
# between() is inclusive on both ends, matching the original >= / <= comparisons.
num_days_std = int(avg_temp.between(avg_temp_mean - avg_temp_std, avg_temp_mean + avg_temp_std).sum())
print("num_days_std =", num_days_std)
q11 = num_days_std
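The same check generalizes to k standard deviations. A sketch with a hypothetical helper `count_within_k_std` on made-up temperatures; it uses pandas' default sample standard deviation (ddof=1) and does not round the mean or standard deviation first.

```python
import pandas as pd

def count_within_k_std(values: pd.Series, k: float = 1.0) -> int:
    """Count values in [mean - k*std, mean + k*std]; std uses pandas' default ddof=1."""
    mean, std = values.mean(), values.std()
    return int(values.between(mean - k * std, mean + k * std).sum())

# Made-up temperatures: mean 70, sample std about 15.81.
temps = pd.Series([50.0, 60.0, 70.0, 80.0, 90.0])
print(count_within_k_std(temps))  # 3
```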
Q5 - Q6: Which month has the highest median maximum gust speed out of all the available records, and what is that median?
Solution: 5-6
### Which month has the highest median for maximum_gust_speed out of all the available records?
# Also find the respective value. Hint: group by month.
# Try to write the code for this problem, and if you face any issue, feel free to write in the Comment-Box.
# If your solution is correct, you will get the response below.
# q5 = 34.50
# q6 = 2
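One possible approach, following the "group by month" hint, is sketched below. It assumes months are pooled across years and that `Day` uses the same day/month/year format as the other tasks. The gust-speed column name is a hypothetical placeholder, so check `fl.columns` for the real header. The demo rows are made up and are not expected to reproduce the q5/q6 values above.

```python
import pandas as pd

def month_with_highest_median(df, value_col, day_col="Day", day_format="%d/%m/%Y"):
    """Return (median, month) for the calendar month whose median of value_col is highest.

    Months are pooled across years (all Februaries together, and so on).
    """
    months = pd.to_datetime(df[day_col], format=day_format).dt.month
    monthly_median = df[value_col].groupby(months).median()
    best_month = int(monthly_median.idxmax())
    return round(float(monthly_median.loc[best_month]), 2), best_month

# Hypothetical column name and made-up rows; not the hands-on data.
demo = pd.DataFrame({
    "Day": ["03/01/2010", "20/01/2010", "05/02/2010", "18/02/2010", "09/02/2011"],
    "Maximum gust speed (mph)": [20.0, 24.0, 30.0, 39.0, 33.0],
})
q5, q6 = month_with_highest_median(demo, "Maximum gust speed (mph)")
print(q5, q6)  # 33.0 2
```

On the real data the call would be `month_with_highest_median(fl, "<gust speed column>")`.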