Learn data cleaning in Python while working with DataFrames: the datetime tools, uses of iloc and loc, applying lambda functions to columns, and more.
Welcome to Turing Machine Data Scientist Program: Use-case 2 - Exploratory Data Analysis

Q1: What is the standard deviation of the maximum windspeed across all the days?

Solution: 1

# import the data file for this hands-on.
import pandas as pd
import numpy as np

fl = pd.read_csv("data.csv")

# Task 1: What is the standard deviation of the maximum windspeed across all the days?

# np.std uses ddof=0 by default, i.e. the population standard deviation.
temp = np.std(fl["Maximum windspeed (mph)"])
q1 = round(temp, 2)
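
The choice of function matters here: np.std divides by n by default, while the pandas Series.std method divides by n - 1. The sketch below shows the difference on a small made-up series (the values are hypothetical and not taken from data.csv).

```python
import numpy as np
import pandas as pd

# Hypothetical wind speeds (mph), used only for illustration.
speeds = pd.Series([10.0, 12.0, 14.0, 16.0, 18.0])

# Population standard deviation (divide by n), the np.std default.
pop_std = np.std(speeds.to_numpy())

# Sample standard deviation (divide by n - 1), the pandas default.
sample_std = speeds.std()

print(round(pop_std, 2), round(sample_std, 2))
```

If the expected answer was computed with the sample formula, fl["Maximum windspeed (mph)"].std() would be the call to use instead.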


Q2: What is the difference between the 50th percentile and the 75th percentile of average temperature?

Solution: 2

# import the data file for this hands-on.
import pandas as pd
import numpy as np
fl = pd.read_csv("data.csv")

# Task 2: What is the difference between the 50th percentile and the 75th percentile of average temperature?

a = fl["Average temperature (°F)"].quantile(0.75)
b = fl["Average temperature (°F)"].quantile(0.50)
q2 = round(a-b, 2)
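
As a small variation, both percentiles can be requested in a single quantile call by passing a list. This is only a sketch on made-up temperatures, not the values in data.csv.

```python
import pandas as pd

# Hypothetical average temperatures (°F), used only for illustration.
temps = pd.Series([50.0, 60.0, 70.0, 80.0, 90.0])

# Passing a list returns a Series indexed by the requested quantiles.
q = temps.quantile([0.50, 0.75])
diff = round(q.loc[0.75] - q.loc[0.50], 2)

print(diff)
```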


Q3: What is the Pearson correlation between average dew point and average temperature?

Solution: 3

# Task 3: What is the Pearson correlation between average dew point and average temperature?

temp = fl["Average dewpoint (°F)"].corr(fl["Average temperature (°F)"])
q3 = round(temp, 2)
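
Series.corr uses the Pearson method by default. To see what it computes, the sketch below compares it with the textbook formula (the sum of products of deviations divided by the square root of the product of the summed squared deviations) on made-up numbers.

```python
import numpy as np
import pandas as pd

# Hypothetical dew point and temperature readings (°F), for illustration only.
dew = pd.Series([30.0, 35.0, 40.0, 50.0, 55.0])
temp = pd.Series([40.0, 44.0, 52.0, 60.0, 66.0])

# Built-in Pearson correlation (the default method of Series.corr).
r_pandas = dew.corr(temp)

# The same quantity from the definition.
dx = dew - dew.mean()
dy = temp - temp.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

print(round(r_pandas, 4), round(r_manual, 4))
```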

Q4: Out of all the available records, which month has the lowest average humidity?

Solution: 4

# Task 4: Out of all the available records, which month has the lowest average humidity?
# This takes the month of the single day with the lowest average humidity.
humidity_col = "Average humidity (%)"
date_col = "Day"

lowest_avg = fl[humidity_col].min()

# Day of the first record that matches the minimum.
day = fl.loc[fl[humidity_col] == lowest_avg, date_col].iloc[0]

# Parse with an explicit day-first format so that 01/02/2010 is read as 1 February.
q4 = pd.to_datetime(day, format="%d/%m/%Y").month
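
The question can also be read as "which calendar month has the lowest mean humidity", and the two readings do not always agree. The sketch below contrasts them on a made-up table; it reuses the column names from this exercise, and the Day_New column and the values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical records in the same day/month/year format as the Day column.
toy = pd.DataFrame({
    "Day": ["05/01/2010", "20/01/2010", "03/02/2010", "18/02/2010", "07/03/2010"],
    "Average humidity (%)": [80.0, 70.0, 40.0, 90.0, 60.0],
})
toy["Day_New"] = pd.to_datetime(toy["Day"], format="%d/%m/%Y")

# Reading 1: month with the lowest mean humidity over all of its days.
monthly = toy.groupby(toy["Day_New"].dt.month)["Average humidity (%)"].mean()
lowest_month = int(monthly.idxmin())

# Reading 2: month of the single day with the lowest humidity.
single_day_month = toy.loc[toy["Average humidity (%)"].idxmin(), "Day_New"].month

print(lowest_month, single_day_month)
```

On this toy data the two readings give different months, so it is worth checking which one the question intends.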


Q7: What is the average temperature between March 2010 and May 2012?

Solution: 7

# Task 7: Average temperature between March 2010 and May 2012

fl["Day_New"] = pd.to_datetime(fl['Day'], format='%d/%m/%Y')
ans = fl[(fl["Day_New"] >= '2010/03/01') & (fl["Day_New"] <= '2012/05/31')]

# Name the column explicitly instead of relying on the column order of describe().
avg_temp = ans["Average temperature (°F)"].mean()
# fl.drop(columns=["Day_New"], inplace=True)  # to delete the new column

ans = round(avg_temp, 2)
q7 = ans
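
The same date filter can be written with Series.between, which is inclusive on both ends by default. A minimal sketch on made-up rows (the dates and temperatures are hypothetical):

```python
import pandas as pd

# Hypothetical rows straddling the March 2010 to May 2012 window.
toy = pd.DataFrame({
    "Day": ["28/02/2010", "01/03/2010", "15/06/2011", "31/05/2012", "01/06/2012"],
    "Average temperature (°F)": [30.0, 40.0, 70.0, 60.0, 90.0],
})
toy["Day_New"] = pd.to_datetime(toy["Day"], format="%d/%m/%Y")

# Boolean mask that is True for days inside the window, endpoints included.
in_range = toy["Day_New"].between("2010-03-01", "2012-05-31")
avg = round(toy.loc[in_range, "Average temperature (°F)"].mean(), 2)

print(avg)
```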


Q8: Find the range of average temperature in December 2010.

Solution: 8

# Task 8: Find the range (maximum minus minimum) of average temperature in December 2010

fl["Day_New"] = pd.to_datetime(fl['Day'], format='%d/%m/%Y')
temp = fl[(fl["Day_New" ] >= '2010/12/01') & ( fl["Day_New" ] <= '2010/12/31' )]
# avg_temp_dec = temp.describe().loc["mean"][0]

mx = temp["Average temperature (°F)"].max()
mn = temp["Average temperature (°F)"].min()
avg_temp_dec  = round(mx-mn, 2)
q8 = avg_temp_dec

Q9: Out of all available records, which day has the highest difference between maximum_pressure and minimum_pressure?

Solution: 9

# Task 9: Out of all available records, which day has the highest difference between maximum_pressure and minimum_pressure?
# Note: this uses the Day_New column created in Task 7.

fl['pressure_diff'] = fl['Maximum pressure '] - fl['Minimum pressure ']

max_press_indx = fl['pressure_diff'].idxmax()
max_press_date = fl['Day_New'][max_press_indx]
ans = max_press_date.strftime('%Y-%m-%d')
q9 = ans
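
The solution above relies on Day_New having been created earlier. A self-contained sketch of the same idxmax pattern, on made-up pressure readings that reuse this exercise's column names (including their trailing space):

```python
import pandas as pd

# Hypothetical pressure readings, for illustration only.
toy = pd.DataFrame({
    "Day": ["01/01/2010", "02/01/2010", "03/01/2010"],
    "Maximum pressure ": [30.1, 30.4, 30.2],
    "Minimum pressure ": [29.9, 29.6, 30.0],
})
toy["Day_New"] = pd.to_datetime(toy["Day"], format="%d/%m/%Y")

# Daily spread, then the index label of the largest spread.
diff = toy["Maximum pressure "] - toy["Minimum pressure "]
widest_day = toy.loc[diff.idxmax(), "Day_New"].strftime("%Y-%m-%d")

print(widest_day)
```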

Q10: How many days fall on the median (i.e. are equal to the median value) of the barometer reading?

Solution: 10

# Task 10: How many days fall on the median (i.e. are equal to the median value) of the barometer reading?

medn = fl["Average barometer (in)"].median()
# Count the rows whose reading equals the median exactly.
ans = int((fl["Average barometer (in)"] == medn).sum())
q10 = ans
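
One caveat with counting exact matches: when a series has an even number of values, the median is the midpoint of the two middle values and may not occur in the data at all. A small sketch on made-up readings:

```python
import pandas as pd

# Hypothetical barometer readings (in); odd count, so the median is an actual value.
odd = pd.Series([29.9, 30.0, 30.0, 30.1, 30.2])
odd_count = int((odd == odd.median()).sum())

# Even count: the median is the midpoint of the two middle values.
even = pd.Series([1.0, 2.0, 3.0, 4.0])
even_count = int((even == even.median()).sum())

print(odd_count, even_count)
```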

Q11: Out of all the available records, how many days are within one standard deviation of the average temperature?

Solution: 11

# Task 11: Out of all the available records, how many days are within one standard deviation of the average temperature?

# Refer to the column by name throughout rather than mixing names and positions.
temp_col = fl["Average temperature (°F)"]
avg_temp_std = round(temp_col.std(), 2)
avg_temp_mean = round(temp_col.mean(), 2)

num_days_std = len(fl[(temp_col >= avg_temp_mean - avg_temp_std) & (temp_col <= avg_temp_mean + avg_temp_std)])

print('num_days_std =',num_days_std)
q11 = num_days_std
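
The same count can be sketched with Series.between on made-up temperatures. Note that Series.std here is the sample standard deviation (ddof=1), and this version does not round the mean and standard deviation before comparing.

```python
import pandas as pd

# Hypothetical average temperatures (°F), for illustration only.
temps = pd.Series([50.0, 60.0, 70.0, 80.0, 90.0])

mean = temps.mean()
std = temps.std()  # sample standard deviation, ddof=1

# True for values in [mean - std, mean + std], endpoints included.
within = temps.between(mean - std, mean + std)
num_days = int(within.sum())

print(num_days)
```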


Q5 - Q6: Which month has the highest median for maximum_gust_speed out of all the available records, and what is that median?

Solution: 5-6

### Which month has the highest median for maximum_gust_speed out of all the available records?
# Also find the respective value - hint: group by month

# Try to write the code for this problem yourself. If you face any issue, feel free to ask in the comment box.

# If your solution is correct, you should get the values below.

# q5 = 34.50
# q6 = 2
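
As a starting point for the hint, here is a sketch of the group-by-month pattern on made-up data. The column name "Maximum gust speed (mph)" is an assumption (check the actual header in data.csv), and the toy values will not reproduce the answers above.

```python
import pandas as pd

# Hypothetical gust speeds; the column name is assumed, not taken from data.csv.
toy = pd.DataFrame({
    "Day": ["05/01/2010", "20/01/2010", "03/02/2010", "18/02/2010",
            "07/03/2010", "21/03/2010"],
    "Maximum gust speed (mph)": [20.0, 30.0, 40.0, 50.0, 60.0, 62.0],
})
toy["Day_New"] = pd.to_datetime(toy["Day"], format="%d/%m/%Y")

# Median gust speed for each calendar month (1 = January, ..., 12 = December).
monthly_median = toy.groupby(toy["Day_New"].dt.month)["Maximum gust speed (mph)"].median()

top_value = round(monthly_median.max(), 2)  # highest monthly median
top_month = int(monthly_median.idxmax())    # month in which it occurs

print(top_value, top_month)
```

Grouping on dt.month pools the same month across different years, which matches the "out of all the available records" wording.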


