Welcome to Turing Machine Data Scientist Program: Use-case 3
Mini Project - Data Visualization
Welcome to your third case study on Data Visualization!
- In this task you will be asked to draw a few plots of the weather data you scraped earlier.
- The data is provided in the data.csv file in the current working directory.
- You will specifically use the seaborn package to draw the plots.
- Your plots must match the expected plots provided in each task.
Task 1: Run the cell below to import the necessary packages.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
Task 2: Draw a box plot of the average_temperature column for each year.
- use seaborn.boxplot()
- set the height and width to 8 and 15 respectively.
- assign the plot object to the variable plot1.
Part A:
import pandas as pd
# Read the dataset from the data.csv file
df = pd.read_csv('/projects/challenge/question/data.csv')
df["Day"] = pd.to_datetime(df["Day"])
df.set_index("Day", inplace=True)
df.head()
Part B:
# Start code here
import matplotlib.pyplot as plt
df2 = df.copy()
df2["year"] = df2.index.year
print(df2.shape)
df2.head(5)
# Set figure size
plt.figure(figsize=(15, 8))
# Create the box plot
plot1 = sns.boxplot(x="year", y="Average temperature (°F)", data=df2)
# Set labels and title
plt.xlabel("Year")  # the x-axis holds the year derived from the Day index
plt.ylabel("Average temperature (°F)")
plt.title("Average temperature across years")
# Show the plot
plt.show()
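The size requirement in Task 2 can be checked on the returned plot object itself. The following is a minimal sketch, assuming a synthetic two-year stand-in for the weather data (the `demo` frame and `demo_plot` name are hypothetical, not part of the exercise); it creates the figure with `plt.subplots` and passes the axes to seaborn explicitly, which is one common alternative to calling `plt.figure` first.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the weather data: two years of daily readings
rng = np.random.default_rng(0)
days = pd.date_range("2019-01-01", "2020-12-31", freq="D")
demo = pd.DataFrame(
    {"Average temperature (°F)": rng.normal(55, 15, len(days))}, index=days
)
demo["year"] = demo.index.year  # derive the grouping column from the DatetimeIndex

# figsize is (width, height), so width 15 and height 8
fig, ax = plt.subplots(figsize=(15, 8))
demo_plot = sns.boxplot(x="year", y="Average temperature (°F)", data=demo, ax=ax)

# The returned object is the matplotlib Axes; its figure carries the size
width, height = demo_plot.figure.get_size_inches()
n_years = demo["year"].nunique()
```

Note that `figsize` takes the width first, which is why the task's "height and width 8 and 15" becomes `(15, 8)`.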
Task 3: Draw a correlation heatmap for all the available features.
- set the height and width to 10 each.
- make sure the correlation values are annotated for each combination of features.
- assign the plot object to the variable plot2
###Start code here
import matplotlib.pyplot as plt
# Compute the correlation matrix
correlation_matrix = df.corr()
# Set figure size
plt.figure(figsize=(10, 10))
# Create the heatmap
plot2 = sns.heatmap(correlation_matrix, annot=True, cmap="coolwarm", fmt=".2f", linewidths=0.5)
# Set title
plt.title("Correlation")
# Show the plot
plt.show()
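The correlation step above assumes every column in the frame is numeric. In pandas 2.0 and later, `DataFrame.corr()` raises an error when a non-numeric column is present, so a defensive variant selects the numeric columns first. This is only a sketch on made-up data; the `Station` column and the `demo` frame are hypothetical and may not correspond to anything in data.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with two numeric features and one text column
rng = np.random.default_rng(1)
demo = pd.DataFrame(
    {
        "Average temperature (°F)": rng.normal(55, 15, 50),
        "Maximum pressure": rng.normal(30, 0.2, 50),
        "Station": ["hypothetical"] * 50,  # non-numeric, excluded below
    }
)

# Keep only numeric columns, then compute pairwise Pearson correlations
numeric_features = demo.select_dtypes(include="number")
corr = numeric_features.corr()
```

The resulting `corr` matrix can be passed to `sns.heatmap` exactly as `correlation_matrix` is in the cell above.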
Task 4: Draw a distribution plot using seaborn for the average_temperature column.
- set bins to 20
- set the height and width to 8 each.
- assign the plot object to the variable plot3
###Start code here
import matplotlib.pyplot as plt
# Set figure size
plt.figure(figsize=(8, 8))
# Create the distribution plot
# Note: sns.distplot is deprecated since seaborn 0.11; sns.histplot is its replacement
plot3 = sns.distplot(df["Average temperature (°F)"], bins=20, kde=True)
plt.xlabel("Average temperature")
plt.title("Average temperature distribution")
# Show the plot
plt.show()
Task 5: Draw a seaborn violin plot of the maximum_pressure column.
- set gridsize to 100
- set figsize to 8,8
- assign the plot object to the variable plot4
###Start code here
import matplotlib.pyplot as plt
# Set figure size
plt.figure(figsize=(8, 8))
# Create violin plot for "Maximum pressure"
plot4 = sns.violinplot(x = df["Maximum pressure"], gridsize=100)
plt.xlabel("Maximum pressure")
plt.title("Violin plot")
# Show the plot
plt.show()
Task 6: Run the cell below to save your plot objects.
import pickle
with open("plot1.pickle", "wb") as file:
    pickle.dump(plot1, file)
with open("plot2.pickle", "wb") as file:
    pickle.dump(plot2, file)
with open("plot3.pickle", "wb") as file:
    pickle.dump(plot3, file)
with open("plot4.pickle", "wb") as file:
    pickle.dump(plot4, file)
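To confirm that a pickled plot object can be read back, a round trip can be tried on a throwaway axes. The sketch below is an illustration under assumptions: it uses a plain matplotlib line plot and an in-memory buffer instead of the plot1 to plot4 objects and their files, and the title text is made up.

```python
import io
import pickle

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Throwaway plot standing in for one of the seaborn plot objects
fig, ax = plt.subplots(figsize=(8, 8))
ax.plot([0, 1, 2], [0, 1, 4])
ax.set_title("Round-trip demo")

# Serialize the Axes to an in-memory buffer, then load it back
buffer = io.BytesIO()
pickle.dump(ax, buffer)
buffer.seek(0)
restored = pickle.load(buffer)

# The restored Axes keeps its title and its figure keeps its size
restored_title = restored.get_title()
restored_size = tuple(restored.figure.get_size_inches())
```

The same `pickle.load` call works on a file opened with mode `"rb"`, for example one of the `.pickle` files written above.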