Welcome to the Turing Machine Data Scientist Program: Use-case 3

Mini Project - Data Visualization

Welcome to your third case study on Data Visualization!
  • In this task you will be asked to draw a few plots of the weather data you scraped earlier.
  • The data is provided in the data.csv file in the current working directory.
  • You will use the seaborn package specifically to draw the plots.
  • Your plots must match the expected plots provided in each task.

Task 1: Run the cell below to import the necessary packages.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import warnings

Task 2: Draw a box plot of the average_temperature column for each year.

  • use seaborn.boxplot()
  • set the height and width to 8 and 15 respectively.
  • assign the plot object to variable plot1.

Part A:

import pandas as pd
# Read the dataset from the data.csv file
df = pd.read_csv('/projects/challenge/question/data.csv')

df["Day"] = pd.to_datetime(df["Day"])
df.set_index("Day", inplace=True)
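
The parsing and indexing steps in Part A can also be folded into the read itself. The sketch below assumes a tiny in-memory sample in place of data.csv; the values and the exact column set are hypothetical, chosen only to mirror the column names used later in this notebook.

```python
import io

import pandas as pd

# Hypothetical two-row sample standing in for data.csv; the real file's
# columns and values may differ.
sample_csv = io.StringIO(
    "Day,Average temperature (°F),Maximum pressure\n"
    "2009-01-01,37.8,29.8\n"
    "2009-01-02,43.2,30.1\n"
)

# parse_dates converts "Day" to datetimes and index_col makes it the index,
# which is the same result as calling to_datetime and set_index separately.
sample_df = pd.read_csv(sample_csv, parse_dates=["Day"], index_col="Day")

# A DatetimeIndex exposes .year, which Task 2 relies on.
print(sample_df.index.year.tolist())
```

Either form gives a DatetimeIndex, so df.index.year works the same way afterwards.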


Part B:

# Start code here
import matplotlib.pyplot as plt
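# Work on a copy so the original DataFrame stays unchanged
df2 = df.copy()
# Extract the year from the datetime index to group the boxes by year
df2["year"] = df2.index.year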

# Set figure size
plt.figure(figsize=(15, 8))

# Create the box plot
plot1 = sns.boxplot(x="year", y="Average temperature (°F)", data=df2)
# Set labels and title
plt.ylabel("Average temperature (°F)")
plt.title("Average temperature across years")

# Show the plot
plt.show()
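
The size requirement in Task 2 is stated as height first, then width, while matplotlib's figsize expects (width, height). The sketch below is a small check of that ordering using plain matplotlib; the Agg backend is an assumption made here so that it runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless backend so no window is needed
import matplotlib.pyplot as plt

# figsize is (width, height) in inches, so "height 8, width 15" is (15, 8).
fig = plt.figure(figsize=(15, 8))
ax = fig.add_subplot(111)

# Any Axes can report the size of the figure it belongs to.
width, height = ax.figure.get_size_inches()
print(width, height)
```

The same check should apply to the plot object from this task, for example plot1.figure.get_size_inches().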

Task 3: Draw a correlation heatmap for all the available features.

  • set the height and width to 10 each.
  • make sure the correlation values are annotated for each combination of features.
  • assign the plot object to variable plot2.

# Start code here
import matplotlib.pyplot as plt
# Compute the correlation matrix
correlation_matrix = df.corr()

# Set figure size
plt.figure(figsize=(10, 10))

# Create the heatmap
plot2 = sns.heatmap(correlation_matrix, annot=True, cmap="coolwarm", fmt=".2f", linewidths=0.5)

# Set title
plt.title("Correlation heatmap")

# Show the plot
plt.show()
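
If the dataset contains any text columns, df.corr() raises an error in pandas 2.0 and later, because non-numeric columns are no longer dropped by default. The sketch below shows one way to guard against that, assuming a hypothetical toy frame; the column names and values are made up, and the real data.csv may well be entirely numeric.

```python
import pandas as pd

# Hypothetical frame with one text column, for illustration only.
toy = pd.DataFrame(
    {
        "temp": [30.0, 45.0, 60.0, 75.0],
        "dew_point": [20.0, 30.0, 40.0, 50.0],
        "events": ["Rain", "None", "Snow", "None"],
    }
)

# Keep only the numeric columns before computing pairwise Pearson correlations.
numeric_corr = toy.select_dtypes(include="number").corr()
print(numeric_corr.round(2))
```

The resulting matrix can be passed to sns.heatmap in place of correlation_matrix.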

Task 4: Draw a distribution plot using seaborn for average_temperature column.

  • set bins to 20
  • set the height and width to 8 each.
  • assign the plot object to plot3 variable

# Start code here
import matplotlib.pyplot as plt
# Set figure size
plt.figure(figsize=(8, 8))

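# Create the distribution plot
# Note: sns.distplot is deprecated since seaborn 0.11; sns.histplot is the
# axes-level replacement if distplot is unavailable in your environment.
plot3 = sns.distplot(df["Average temperature (°F)"], bins=20, kde=True)
plt.xlabel("Average temperature")
plt.title("Average temperature distribution")
# Show the plot
plt.show()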

Task 5: Draw a seaborn violin plot of the maximum_pressure column.

  • set gridsize to 100
  • set figsize to 8,8
  • assign the plot object to variable plot4

# Start code here
import matplotlib.pyplot as plt
# Set figure size
plt.figure(figsize=(8, 8))

# Create violin plot for "Maximum pressure"
plot4 = sns.violinplot(x=df["Maximum pressure"], gridsize=100)
plt.xlabel("Maximum pressure")
plt.title("Violin plot")

# Show the plot
plt.show()

Task 6: Run the cell below to save your plot objects.

import pickle
with open("plot1.pickle", "wb") as file:
    pickle.dump(plot1, file)

with open("plot2.pickle", "wb") as file:
    pickle.dump(plot2, file)

with open("plot3.pickle", "wb") as file:
    pickle.dump(plot3, file)

with open("plot4.pickle", "wb") as file:
    pickle.dump(plot4, file)
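
Before submitting, it can be worth confirming that a pickled plot object loads back intact. The sketch below does a round trip in memory with a plain matplotlib Axes as a stand-in for plot1 to plot4; the line data and title are hypothetical.

```python
import pickle

import matplotlib

matplotlib.use("Agg")  # assumption: headless backend so no window is needed
import matplotlib.pyplot as plt

# Stand-in Axes; in the notebook this would be one of plot1..plot4.
fig, ax = plt.subplots(figsize=(8, 8))
ax.plot([1, 2, 3], [2, 4, 8])
ax.set_title("Round-trip check")

# Serialize to bytes and restore, mirroring pickle.dump / pickle.load on a file.
restored = pickle.loads(pickle.dumps(ax))

print(restored.get_title())
print(len(restored.lines))
```

The same idea applies to the saved files: open "plot1.pickle" in "rb" mode, call pickle.load, and inspect the title or labels of the returned Axes.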

