Mini-Project for Data Scientist - Data Visualization Fresco Play Hands-on Solution

Learn data visualization with Matplotlib and Seaborn, using plots such as scatter plots, histograms, box plots, and heatmaps to draw insights from data.

Welcome to Turing Machine Data Scientist Program: Use-case 3

Mini Project - Data Visualization

Welcome to your third case study on Data Visualization!
  • In this task you will be asked to draw a few plots on the weather data you scraped earlier.
  • The data is provided in the data.csv file in the current working directory.
  • You will specifically use the seaborn package to draw the plots.
  • Your plots must match the expected plots provided in each task.

Task 1: Run the below cell to import the necessary packages.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

Task 2: Draw a box plot on the average_temperature column across each year.

  • use seaborn.boxplot()
  • set the height and width to 8 and 15 respectively.
  • assign the plot object to variable plot1.

Part A:

import pandas as pd
# Read the dataset from the data.csv file
df = pd.read_csv('/projects/challenge/question/data.csv')

df["Day"] = pd.to_datetime(df["Day"])
df.set_index("Day", inplace=True)

df.head()
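
Part A turns the Day column into a datetime index, which is what makes df2.index.year available in Part B. Here is a minimal sketch of that step on a tiny made-up table; the three rows and their values are assumptions for illustration only and are not taken from data.csv.

```python
import pandas as pd

# Hypothetical stand-in for data.csv; the real file has more rows and columns.
sample = pd.DataFrame({
    "Day": ["2019-12-30", "2019-12-31", "2020-01-01"],
    "Average temperature (°F)": [41.0, 39.5, 36.2],
})

# Parse the text dates and make them the row index, as in Part A.
sample["Day"] = pd.to_datetime(sample["Day"])
sample = sample.set_index("Day")

# A DatetimeIndex exposes calendar parts such as .year directly.
years = sample.index.year.tolist()
print(years)  # [2019, 2019, 2020]
```

Without the pd.to_datetime call the index would hold plain strings, and .year would raise an AttributeError.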

Part B:

# Start code here
import matplotlib.pyplot as plt
df2 = df.copy()
df2["year"] = df2.index.year

print(df2.shape)
df2.head(5)

# Set figure size
plt.figure(figsize=(15, 8))

# Create the box plot
plot1 = sns.boxplot(x="year", y="Average temperature (°F)", data=df2)
# Set labels and title
plt.xlabel("Day")
plt.ylabel("Average temperature (°F)")
plt.title("Average temperature across years")

# Show the plot
plt.show()
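
The task states the size as height 8 and width 15, while matplotlib's figsize takes (width, height), which is why the cell above passes (15, 8). The sketch below checks this on made-up data; the toy values and the Agg backend are assumptions chosen so it runs without the real dataset or a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data: three readings for each of two years.
toy = pd.DataFrame({
    "year": [2019, 2019, 2019, 2020, 2020, 2020],
    "Average temperature (°F)": [40.0, 55.0, 70.0, 42.0, 58.0, 73.0],
})

fig = plt.figure(figsize=(15, 8))  # figsize is (width, height) in inches
ax = sns.boxplot(x="year", y="Average temperature (°F)", data=toy)

# seaborn.boxplot draws on the current figure and returns its matplotlib Axes.
width, height = fig.get_size_inches()
print(type(ax).__name__, width, height)
```

Because the returned object is an ordinary Axes, it can be stored in plot1 and inspected later like any other matplotlib plot.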

Task 3: Draw correlation heatmap for all the available features.

  • set the height and width to 10 each.
  • make sure the correlation values are annotated for each combination of features.
  • assign the plot object to variable plot2
###Start code here
import matplotlib.pyplot as plt
# Compute the correlation matrix
correlation_matrix = df.corr()

# Set figure size
plt.figure(figsize=(10, 10))

# Create the heatmap
plot2 = sns.heatmap(correlation_matrix, annot=True, cmap="coolwarm", fmt=".2f", linewidths=0.5)

# Set title
plt.title("Correlation")

# Show the plot
plt.show()
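
To see what annot=True and fmt=".2f" do, here is a small sketch on three made-up columns whose correlations are known in advance. The column names a, b, and c are hypothetical and stand in for the weather features.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so no display is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy features: b rises with a, c falls as a rises.
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],
    "c": [4.0, 3.0, 2.0, 1.0],
})

corr = toy.corr()  # Pearson correlation between every pair of columns

plt.figure(figsize=(10, 10))
ax = sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", linewidths=0.5)

# With annot=True, seaborn writes one text label per cell of the matrix.
labels = [t.get_text() for t in ax.texts]
print(len(labels), sorted(set(labels)))
```

A 3 x 3 matrix yields nine labels, each formatted to two decimals. If the real dataset contains non-numeric columns, recent pandas versions (1.5 and later) accept df.corr(numeric_only=True) to skip them; whether that is needed depends on the file and the installed pandas.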

Task 4: Draw a distribution plot using seaborn for average_temperature column.

  • set bins to 20
  • set the height and width to 8,8.
  • assign the plot object to plot3 variable
###Start code here
import matplotlib.pyplot as plt
# Set figure size
plt.figure(figsize=(8, 8))

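# Create the distribution plot.
# distplot is deprecated since seaborn 0.11 (the warning is hidden by the
# filterwarnings call in Task 1); sns.histplot(..., kde=True) is its replacement.
plot3 = sns.distplot(df["Average temperature (°F)"], bins=20, kde=True)
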
plt.xlabel("Average temperature")
plt.title("Average temperature distribution")
# Show the plot
plt.show()

Task 5: Draw a seaborn violin plot on the maximum_pressure column.

  • set gridsize to 100
  • set figsize to 8,8
  • assign the plot object to variable plot4
###Start code here
import matplotlib.pyplot as plt
# Set figure size
plt.figure(figsize=(8, 8))

# Create violin plot for "Maximum pressure"
plot4 = sns.violinplot(x=df["Maximum pressure"], gridsize=100)
plt.xlabel("Maximum pressure")
plt.title("Violin plot")

# Show the plot
plt.show()
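
The gridsize argument sets how many points seaborn uses to evaluate the density curve that forms the violin outline, so a larger value gives a smoother shape. The sketch below compares two settings on made-up pressure readings. The helper violin_vertex_count is hypothetical, and it assumes the violin body is the first collection on the Axes, which holds for the seaborn versions I am aware of but is not a documented guarantee.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so no display is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical pressure readings standing in for the real column.
pressure = pd.Series(
    [29.6, 29.8, 29.9, 30.0, 30.0, 30.1, 30.2, 30.3, 30.5],
    name="Maximum pressure",
)

def violin_vertex_count(gridsize):
    """Draw a violin and count the vertices of its outline (illustrative helper)."""
    fig, ax = plt.subplots(figsize=(8, 8))
    sns.violinplot(x=pressure, gridsize=gridsize, ax=ax)
    # Assumption: the filled violin body is the first collection on the Axes.
    n_vertices = len(ax.collections[0].get_paths()[0].vertices)
    plt.close(fig)
    return n_vertices

coarse = violin_vertex_count(50)
fine = violin_vertex_count(100)
print(coarse, fine)
```

The finer grid produces an outline with more vertices, which is the only visible effect of gridsize on the plot.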

Task 6: Run the below cell to save your plot objects.

import pickle
with open("plot1.pickle", "wb") as file:
    pickle.dump(plot1, file)

with open("plot2.pickle", "wb") as file:
    pickle.dump(plot2, file)

with open("plot3.pickle", "wb") as file:
    pickle.dump(plot3, file)

with open("plot4.pickle", "wb") as file:
    pickle.dump(plot4, file)
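
To confirm that a saved plot object survives the round trip, it can be loaded back and inspected. This is a minimal sketch using an in-memory buffer and a throwaway line plot instead of the plot1 to plot4 files, so the data and the title are assumptions.

```python
import io
import pickle

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so no display is needed
import matplotlib.pyplot as plt

# Throwaway plot standing in for plot1 ... plot4.
fig, ax = plt.subplots(figsize=(8, 8))
ax.plot([1, 2, 3], [29.8, 30.1, 29.9])
ax.set_title("Round-trip check")

# Serialise the Axes to bytes, then load it back as a new object.
buffer = io.BytesIO()
pickle.dump(ax, buffer)
buffer.seek(0)
restored = pickle.load(buffer)

restored_title = restored.get_title()
restored_size = tuple(float(v) for v in restored.figure.get_size_inches())
print(restored_title, restored_size)
```

Pickling an Axes also stores the figure it belongs to, so the title and figure size are both available after loading.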

About the author

D Shwari
I'm a professor at National University's Department of Computer Science. My main areas are data science and data analysis, along with project management for many computer science-related sectors. My next project is on AI with deep learning.
