Mini-Project for T3 Wings - Exploratory Data Analysis Fresco Play Hands-on Solution HackerRank

Master Exploratory Data Analysis (EDA) with key functions & data preprocessing techniques to detect patterns, anomalies, and insights efficiently.

Wings - Machine First AI - Exploratory Data Analysis

Instructions:
  • The data required for this task has been provided in the file 'data.csv'
  • Read the questions provided for each cell and assign your answers to respective variables provided in the following cell.
  • If an answer is a floating-point number, round it off to two decimal places. For example, 10.546 should be read as 10.55, 10.544 as 10.54, and 10.1 as 10.10.
  • The pandas and numpy packages are preinstalled, and these packages should be sufficient to solve this task.
  • Please don't change the variable names meant to hold your answers.
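
The rounding rule above can be sketched in plain Python. This is only an illustration: the names `examples`, `rounded` and `formatted` are hypothetical and not part of the exercise. Note that `round` returns a float, so 10.1 is displayed as 10.10 only once it is formatted as text.

```python
# Illustrative values taken from the instructions above.
examples = [10.546, 10.544, 10.1]

# round(x, 2) keeps at most two decimals and returns a float.
rounded = [round(x, 2) for x in examples]

# Trailing zeros exist only in the formatted string, not in the float itself.
formatted = [f"{x:.2f}" for x in rounded]

print(rounded)
print(formatted)
```
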
#Run this cell to import the Packages.
import pandas as pd
import numpy as np
### Read the data (this will not be graded)
df = pd.read_csv('data.csv')
df.head()

Task 1: What is the standard deviation of maximum windspeed across all the days?

Note: ws_std should be of type float.

ws_std =  round(df['Maximum windspeed (mph)'].std(),2)
ws_std
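
One detail worth knowing here: pandas' `std()` computes the sample standard deviation (`ddof=1`) by default, whereas `numpy.std` defaults to the population form (`ddof=0`). The sketch below uses made-up wind speeds, not values from data.csv, to show how the two can differ.

```python
import numpy as np
import pandas as pd

# Hypothetical wind speeds, not taken from data.csv.
speeds = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# pandas default: ddof=1 (sample standard deviation).
sample_std = round(float(speeds.std()), 2)

# numpy default: ddof=0 (population standard deviation).
population_std = round(float(np.std(speeds.to_numpy())), 2)

print(sample_std, population_std)
```

Which of the two the grader expects is not stated in the task; the solution above relies on the pandas default.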

Task 2: What is the difference between the 50th percentile and the 75th percentile of average temperature?

Note: p_range should be of type float

p50 = np.percentile(df['Average temperature (°F)'], 50)
p75= df['Average temperature (°F)'].quantile(0.75)

p_range = float(round(p75-p50, 2))
p_range 
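
The solution mixes `np.percentile` and `Series.quantile`. Both use linear interpolation by default, so they agree on complete data, but they treat missing values differently. A minimal sketch with hypothetical temperatures (the names `temps` and `with_gap` are assumptions for illustration):

```python
import numpy as np
import pandas as pd

temps = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])  # hypothetical temperatures

p50 = np.percentile(temps.to_numpy(), 50)  # percentile given on a 0-100 scale
p75 = temps.quantile(0.75)                 # quantile given on a 0-1 scale
p_range = float(round(p75 - p50, 2))

# With a missing value, np.percentile propagates NaN while quantile skips it.
with_gap = pd.Series([10.0, 20.0, np.nan, 40.0])
np_result = np.percentile(with_gap.to_numpy(), 50)
pd_result = with_gap.quantile(0.5)

print(p_range, np_result, pd_result)
```

If the temperature column could contain gaps, using `quantile` for both percentiles (or `np.nanpercentile`) would keep the two values consistent.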

Task 3: What is the Pearson correlation between average dew point and average temperature?

Note: corr should be of type float


Solution 1  
correlation_matrix = df[['Average temperature (°F)', 'Average dewpoint (°F)']].corr(method='pearson')
corr = float(round(correlation_matrix.loc['Average temperature (°F)', 'Average dewpoint (°F)'],2))
print(corr)


Solution 2
import scipy.stats as stats
cor_coeff_sts, p_value = stats.pearsonr(df['Average temperature (°F)'], df['Average dewpoint (°F)'])
corr = float( round(cor_coeff_sts,2))
print(corr)
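
For reference, the Pearson coefficient is the covariance of the two columns divided by the product of their spreads. The sketch below checks that definition against `Series.corr`, which uses the Pearson method by default. The data is hypothetical and only meant to illustrate the formula.

```python
import numpy as np
import pandas as pd

x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical temperatures
y = pd.Series([2.0, 4.0, 5.0, 4.0, 5.0])  # hypothetical dew points

# Definition: sum of products of deviations over the root of the
# product of the summed squared deviations.
dx, dy = x - x.mean(), y - y.mean()
manual = float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# Series.corr defaults to method="pearson".
builtin = float(x.corr(y))

print(round(manual, 2), round(builtin, 2))
```

`Series.corr` on two columns is a shorter alternative to building the full correlation matrix when only one pair is needed.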

Task 4: Out of all the available records, which month has the lowest average humidity?

- Assign your answer as the month index; for example, if it's July, the index is 7. Note: dew_month should be of type int

df["Day"] = pd.to_datetime(df["Day"], format="%d/%m/%Y")

# Extract month
df["Month"] = df["Day"].dt.month

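# Month that contains the single lowest daily 'Average humidity (%)' record.
# If the task means the lowest monthly mean instead, replace .min() with .mean().
dew_month = int(df.groupby("Month")["Average humidity (%)"].min().idxmin())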
dew_month 
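
The question can be read in two ways: the month containing the lowest single daily reading, or the month with the lowest mean humidity. The two can disagree, as this sketch on a hypothetical four-row frame shows (the frame `toy` and its values are invented for illustration):

```python
import pandas as pd

toy = pd.DataFrame({
    "Day": ["01/01/2010", "02/01/2010", "01/02/2010", "02/02/2010"],
    "Average humidity (%)": [80, 30, 50, 52],
})
toy["Day"] = pd.to_datetime(toy["Day"], format="%d/%m/%Y")
toy["Month"] = toy["Day"].dt.month

by_month = toy.groupby("Month")["Average humidity (%)"]

# January contains the lowest single reading (30).
month_of_lowest_reading = int(by_month.min().idxmin())

# February has the lower monthly mean (51 versus 55).
month_of_lowest_mean = int(by_month.mean().idxmin())

print(month_of_lowest_reading, month_of_lowest_mean)
```

Checking both results against the real data is a cheap way to see whether the choice of interpretation matters for the graded answer.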

Task 5: Which month has the highest median for maximum_gust_speed out of all the available records? Also find the respective value. Hint: group by month.

Note:
max_gust_value should be of type float
max_gust_month should be of type int
  
  
# Compute the per-month median once and reuse it for both answers.
monthly_gust_median = df.groupby("Month")["Maximum gust speed (mph)"].median()
max_gust_value = float(round(monthly_gust_median.max(), 2))
max_gust_month = int(monthly_gust_median.idxmax())

max_gust_value , max_gust_month

Task 6: Determine the average temperature between March 2010 and May 2012 (including both months). Note: avg_temp should be of type float

# "Day" was converted to datetime in Task 4, so df2 gets a DatetimeIndex.
df2 = df.set_index("Day")
avg_temp = float(round(df2.loc["2010-03":"2012-05"]['Average temperature (°F)'].mean(),2))
avg_temp
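
The slice above relies on pandas partial string indexing: on a sorted DatetimeIndex, a month string such as "2012-05" stands for the whole month, and both ends of the slice are included. A small sketch with a hypothetical daily series (the values are just a running counter, not real temperatures):

```python
import numpy as np
import pandas as pd

days = pd.date_range("2010-01-01", "2012-12-31", freq="D")
temps = pd.Series(np.arange(len(days), dtype=float), index=days)  # hypothetical

# Month strings expand to whole months; the end label is inclusive.
window = temps.loc["2010-03":"2012-05"]
first_day, last_day = window.index.min(), window.index.max()

print(first_day.date(), last_day.date(), len(window))
```

If the rows in data.csv are not in chronological order, calling `sort_index()` before slicing is a safe precaution.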

Task 7: Find the range of average temperature in Dec 2010. Note: temp_range should be of type float

# Range = maximum minus minimum over all days of December 2010.
dec_2010 = df2.loc["2010-12"]['Average temperature (°F)']
temp_range = float(round(dec_2010.max() - dec_2010.min(), 2))
temp_range
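
The range here is simply the maximum minus the minimum over the selected month. The sketch below, on hypothetical data, selects one month with a partial date string and cross-checks the result with `numpy.ptp` ("peak to peak"):

```python
import numpy as np
import pandas as pd

days = pd.date_range("2010-11-29", "2011-01-02", freq="D")
temps = pd.Series(np.arange(len(days), dtype=float), index=days)  # hypothetical

december = temps.loc["2010-12"]  # every day of December 2010
temp_range = float(round(december.max() - december.min(), 2))

# np.ptp returns max - min of an array in one call.
same_range = float(np.ptp(december.to_numpy()))

print(len(december), temp_range, same_range)
```
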

Task 8: Out of all available records, which day has the highest difference between maximum_pressure and minimum_pressure? Assign the date in string format as 'yyyy-mm-dd'. Make sure you enclose it in single quotes.

df2['Pressure_Diff'] = df2['Maximum pressure '] - df2['Minimum pressure ']
max_p_range_day = df2['Pressure_Diff'].idxmax().strftime('%Y-%m-%d')
# Alternative, if the quotes must be part of the string itself:
# max_p_range_day = f"'{df2['Pressure_Diff'].idxmax().strftime('%Y-%m-%d')}'"
print(max_p_range_day)
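
Because the frame is indexed by date, `idxmax()` returns the index label of the largest value, which is a pandas Timestamp, and `strftime` turns it into text. A minimal sketch with hypothetical pressure readings (the column names `max_p` and `min_p` are invented for illustration):

```python
import pandas as pd

toy = pd.DataFrame(
    {"max_p": [30.2, 30.5, 30.1], "min_p": [30.0, 29.7, 30.0]},
    index=pd.to_datetime(["2011-02-02", "2011-02-03", "2011-02-04"]),
)

spread = toy["max_p"] - toy["min_p"]
widest_day = spread.idxmax()               # a pandas Timestamp
as_text = widest_day.strftime("%Y-%m-%d")  # a plain Python str

print(as_text)
```

When several days share the maximum, `idxmax()` returns the first one in index order.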

Task 9: How many days fall on the median (i.e., are equal to the median value) of the barometer reading? Note: median_b_days should be of type int

median_b_days = int((df['Average barometer (in)'] == df['Average barometer (in)'].median()).sum())
median_b_days  
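
Comparing floats with `==` works here when the median is one of the recorded values, which is the case for an odd number of rows. With an even number of rows the median is the average of the two middle values and may match no record at all. A sketch on hypothetical readings, with `np.isclose` shown as a tolerance-based alternative:

```python
import numpy as np
import pandas as pd

# Odd length: the median (30.0) is an actual value in the series.
odd = pd.Series([29.9, 30.0, 30.0, 30.1, 30.2])
exact_matches = int((odd == odd.median()).sum())
close_matches = int(np.isclose(odd, odd.median()).sum())

# Even length: the median (2.5) lies between two values and matches none.
even = pd.Series([1.0, 2.0, 3.0, 4.0])
even_matches = int((even == even.median()).sum())

print(exact_matches, close_matches, even_matches)
```
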

Task 10: Out of all the available records, how many days are within one standard deviation of the mean of average temperature? Note: num_days_std should be of type int

m, s =  df['Average temperature (°F)'].agg(['mean','std'])
ub = m+s
lb = m-s
print(ub, lb)

num_days_std = int(df[(df['Average temperature (°F)']>=lb) & (df['Average temperature (°F)']<=ub)].shape[0])
num_days_std  
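
The same filter can be written with `Series.between`, which is inclusive at both ends by default and so matches the `>=` and `<=` comparisons above. A minimal sketch on hypothetical temperatures:

```python
import pandas as pd

temps = pd.Series([50.0, 60.0, 70.0, 80.0, 90.0])  # hypothetical temperatures

mean, std = temps.mean(), temps.std()  # std is the sample form (ddof=1)

# Boolean mask: True where the value lies in [mean - std, mean + std].
within = temps.between(mean - std, mean + std)
num_days_std = int(within.sum())

print(num_days_std)
```

Summing the boolean mask counts the matching rows without building a filtered copy of the frame.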

About the author

D Shwari
I'm a professor in the Department of Computer Science at National University. My main areas are data science and data analysis, along with project management for many computer science-related sectors. My next project is on AI with deep learning.
