Lab 1: Welcome to Sequence Models Hands-on: LSTM
In this hands-on you will build a model that, once trained on a piece of text data, can generate its own sequence of words in a fashion similar to the training data:
- Follow the instructions provided for each cell and code accordingly.
- In order to run a cell, press Shift+Enter.
- Make sure you have run all the cells before submitting the hands-on.
Task 1: Run the below cell to import the necessary packages
# Task 1:
import numpy as np
import pandas as pd
from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras.utils import np_utils
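Note: "from keras.utils import np_utils" fails on newer Keras releases, where the helper is exposed as keras.utils.to_categorical instead. If the import above errors out in your environment, the small workaround sketch below (optional, not part of the graded tasks) keeps the rest of the lab code unchanged:
# Workaround sketch for newer Keras versions, where np_utils has been removed.
try:
    from keras.utils import np_utils
except ImportError:
    from keras.utils import to_categorical
    class np_utils:
        # Minimal stand-in exposing only the helper this lab uses.
        to_categorical = staticmethod(to_categorical)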
Task 2: Assign the array of tokens to variable training_data
Read the text data from the story.txt file, split the text into separate tokens, and assign the array of tokens to the variable training_data:
- Expected output:-
- array(['long', 'ago', ',', 'the', 'mice', 'had', 'a', 'general', 'council', 'to']
#Task 2:
### Start code here
def get_word_vectors(file_name):
    # Read the file and collect its word/punctuation tokens across all lines.
    values = []
    with open(file_name, 'r') as file:
        for line in file:
            values.extend(line.split())
    return np.array(values)
training_data = get_word_vectors('story.txt')
###End code
training_data[:10]
#['long', 'ago', ',', 'the', 'mice', 'had', 'a', 'general', 'council', 'to']
Task 3: Generate unique tokens:
Instructions:
- Take the unique tokens in training_data, sort them in alphabetical order, and assign the sorted list to the variable words.
- Create a dictionary ind_to_word that maps each index to a word.
- Create another dictionary word_to_ind that reverse-maps each word to its respective index.
# Task 3:
####Start code here
words = sorted(set(training_data))
ind_to_word = dict(enumerate(words))
word_to_ind = {word: index for index, word in ind_to_word.items()}
###End code
print("words: ", words[:10], "\n")
print("index_to_words: ", list(ind_to_word.items())[:10], "\n")
print("word_to_index: ", list(word_to_ind.items())[:10], "\n")
# Output:-
# words: [',', '.', '?', 'a', 'about', 'ago', 'agree', 'all', 'always', 'an']
# index_to_words: [(0, ','), (1, '.'), (2, '?'), (3, 'a'), (4, 'about'), (5, 'ago'), (6, 'agree'), (7, 'all'), (8, 'always'), (9, 'an')]
# word_to_index: [('attached', 16), ('we', 100), ('if', 41), ('long', 48), ('applause', 12), ('easily', 30), ('until', 94), ('neck', 58), ('she', 76), ('while', 105)]
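As an optional sanity check (not graded), the two dictionaries should be exact inverses of each other:
# Optional check: word_to_ind should invert ind_to_word exactly.
assert all(word_to_ind[word] == idx for idx, word in ind_to_word.items())
print("vocabulary size:", len(words))  # 112 for this story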
Task 4: Write a function to generate training dataset
Instructions:
- Parameters:
- dataset: original dataset
- look_back: the window size, i.e. the number of previous values in the series used to predict the next one.
- returns: feature and target arrays
- Example 1: For window size 1:
- dataset = [1,2,3,4]
- feature = [[1],[2],[3]]
- target = [2,3,4]
- Example 2: For window size 2:
- dataset = [1,2,3,4]
- feature = [[1,2],[2,3]]
- target = [3,4]
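The two examples above are just a sliding window; here is a minimal sketch of that idea on plain integers (illustrative only; the graded function below must additionally map each word to its index using word_to_ind):
# Illustrative only: sliding-window pairing on a toy integer series.
def toy_window(series, look_back):
    feats = [series[i:i + look_back] for i in range(len(series) - look_back)]
    targs = [series[i + look_back] for i in range(len(series) - look_back)]
    return feats, targs
print(toy_window([1, 2, 3, 4], 1))  # ([[1], [2], [3]], [2, 3, 4])
print(toy_window([1, 2, 3, 4], 2))  # ([[1, 2], [2, 3]], [3, 4])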
Expected output when you call generate_dataset on training_data with look_back = 10:
input: [[48, 5, 0, 85, 56, 37, 3, 35, 28, 92], [5, 0, 85, 56, 37, 3, 35, 28, 92, 25], [0, 85, 56, 37, 3, 35, 28, 92, 25, 102]]
labels: [25, 102, 53]
#Task 4:
####Start code here
def generate_dataset(dataset, look_back=10):
    features = []
    targets = []
    # Slide a window of length look_back over the dataset.
    for i in range(len(dataset) - look_back):
        # The feature is a window of look_back consecutive tokens, encoded as indices.
        feature = dataset[i:i + look_back]
        features.append([word_to_ind.get(k) for k in feature])
        # The target is the token that immediately follows the window.
        target = dataset[i + look_back]
        targets.append(word_to_ind.get(target))
    return features, targets
inputs, labels = generate_dataset(training_data, 10)
print("input: ", inputs[:3])
print("labels: ", labels[:3])
# Output:--
input: [[48, 5, 0, 85, 56, 37, 3, 35, 28, 92], [5, 0, 85, 56, 37, 3, 35, 28, 92, 25], [0, 85, 56, 37, 3, 35, 28, 92, 25, 102]]
labels: [25, 102, 53]
Task 5: The next step is to reshape the inputs and normalize them.
#Task 5: Just run the cell.
look_back = 10
# Reshape the inputs to (samples, timesteps, features), as expected by the LSTM layers.
X_modified = np.reshape(inputs, (len(inputs), look_back, 1))
# Scale the word indices into the range [0, 1] by dividing by the vocabulary size.
X_modified = X_modified / float(len(words))
# One-hot encode the targets.
Y_modified = np_utils.to_categorical(labels)
print("X_modified shape:", X_modified.shape)
print("Y_modified shape:", Y_modified.shape)
#Output:
# X_modified shape: (194, 10, 1)
# Y_modified shape: (194, 112)
Task 6: Create a model having two LSTM blocks and one fully connected layer.
- Using the keras Sequential() class, create a model having two LSTM blocks and one fully connected layer with softmax activation.
- Apply dropout with probability p = 0.2 between the LSTM layers.
- Compile the model with categorical_crossentropy loss and adam optimizer.
#Task 6:
# Set the random seed for reproducibility
np.random.seed(51)
# Start code here
model = Sequential()
# First LSTM layer
model.add(LSTM(400, return_sequences=True, input_shape=(look_back, 1))) # Adjust input_shape as necessary
model.add(Dropout(0.2))
# Second LSTM layer
model.add(LSTM(400))
model.add(Dropout(0.2))
# Fully connected layer with softmax activation
model.add(Dense(112, activation='softmax')) # Adjust the number of units based on your class count
# End code
# Compile the model
model.compile(loss='categorical_crossentropy', optimizer='adam')
# Print the model summary
model.summary()
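A small optional robustness note: instead of hard-coding 112 output units, you can derive the class count from the one-hot targets so the final layer stays correct if the vocabulary size changes (num_classes below is just a new helper variable for illustration):
# Optional: derive the class count from the one-hot targets rather than hard-coding 112.
num_classes = Y_modified.shape[1]
print(num_classes)  # 112 for this story
# The output layer could then be written as Dense(num_classes, activation='softmax').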
Task 7: Run model.fit() on the train data with features X_modified and target Y_modified for 50 epochs and batch_size = 10.
#Task 7:
###Start code here
train_logs = model.fit(X_modified, Y_modified, epochs=50, batch_size=10)
###End code
with open("output.txt", "w+") as file:
file.write("train score {0:.2f}\n".format(train_logs.history["loss"][-1]))
Task 8: The code below takes a random sequence of words and generates a longer sequence using the model you trained above.
import numpy as np
string_mapped = inputs[50].copy()
full_string = [ind_to_word[value] for value in string_mapped]
# generate 100 more words, one at a time
for i in range(100):
    x = np.reshape(string_mapped, (1, len(string_mapped), 1))
    x = x / float(len(words))
    # Predict the index of the most likely next word.
    pred_index = np.argmax(model.predict(x, verbose=0))
    full_string.append(ind_to_word[pred_index])
    # Slide the window forward: append the prediction and drop the oldest index.
    string_mapped.append(pred_index)
    string_mapped = string_mapped[1:len(string_mapped)]
# Join the generated tokens back into a single string.
txt = ""
for word in full_string:
    txt = txt + " " + word
print(txt)
Lab 2: Welcome to GRU-Time Series Prediction
Task 1: Run the below cell to import necessary packages
# Task 1:
import numpy
import numpy as np
import matplotlib.pyplot as plt
import pandas
import pandas as pd
import math
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM, GRU
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
Task 2: Read a CSV file and assign the "passenger_count" column to a variable.
Instruction:
- Read data from the air_line.csv file using pandas and assign the values of the "passenger_count" column to the variable dataset; typecast the passenger count values to float32
- Expected output [[112.] [118.] [132.] [129.] [121.] [135.] [148.] [148.] [136.] [119.]]
# Task 2:
###Start code here
dataset = np.array(pd.read_csv("air_line.csv")["passenger_count"].astype(np.float32).values.reshape(-1, 1))
###End code
print(dataset[:10])
Task 3: Normalize the values of dataset
Instruction:
- Use MinMaxScaler to normalize the values of dataset to the range 0 to 1
- Expected output [[0.01544401] [0.02702703] [0.05405405] [0.04826255] [0.03281853] [0.05984557] [0.08494207] [0.08494207] [0.06177607] [0.02895753]]
# Task 3:
###Start code here
# Normalize the dataset between 0 and 1
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(dataset)
###End code
print(dataset[:10])
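A quick optional check that the scaling is invertible; the same fitted scaler object is reused in Task 10 to map predictions back to passenger counts:
# Optional: inverse_transform should recover the original passenger counts.
print(scaler.inverse_transform(dataset[:3]))  # approximately [[112.] [118.] [132.]]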
Task 4: Train-test split the dataset. Assign the first 100 values of dataset to the variable train and the remaining values to the variable test.
Expected output:
100 44
# Task 4:
###Start code here
train, test = (dataset[:100],dataset[100:])
###End code
print(len(train), len(test))
Task 5: Write a function to generate training dataset
Instructions:
- Parameters:
- dataset: original dataset
- look_back: the window size, i.e. the number of previous values in the series used to predict the next one.
- returns: feature and target arrays
- Example 1: For window size 1:
- dataset = [1,2,3,4]
- feature = [[1],[2],[3]]
- target = [2,3,4]
- Example 2: For window size 2:
- dataset = [1,2,3,4]
- feature = [[1,2],[2,3]]
- target = [3,4]
# Task 5:
def generate_dataset(data, look_back=1):
    X, y = [], []
    # Note: the loop stops at len(data) - look_back - 1, so the final window is left
    # out; with 100 training values and look_back = 1 this yields 98 samples.
    for i in range(len(data) - look_back - 1):
        a = data[i:(i + look_back), 0]
        X.append(a)
        y.append(data[i + look_back, 0])
    return np.array(X), np.array(y)
Task 6: The cell below uses the method you defined above to generate feature and target datasets from the train and test data.
Expected output:
- (98, 1)
- (98,)
- (42, 1)
- (42,)
- [[0.01544401] [0.02702703]]
- [0.02702703 0.05405405]
# Task 6:
look_back = 1
trainX, trainY = generate_dataset(train, look_back)
testX, testY = generate_dataset(test, look_back)
print(trainX.shape)
print(trainY.shape)
print(testX.shape)
print(testY.shape)
print(trainX[:2])
print(trainY[:2])
Task 7: Reshape the trainX and testX datasets to (number of samples, look_back, 1)
Expected output:
- (98,1,1)
- (98,)
- (42,1,1)
- (42,)
# Task 7:
###Start code here
trainX = np.reshape(trainX, (trainX.shape[0], trainX.shape[1], 1))
testX = np.reshape(testX, (testX.shape[0], testX.shape[1], 1))
###End code
print(trainX.shape)
print(trainY.shape)
print(testX.shape)
print(testY.shape)
Task 8: Using the keras Sequential() class, create a model having one GRU block (with 4 neurons) and one dense layer. Compile the model with mean_squared_error loss and the adam optimizer.
# Task 8:
np.random.seed(51)
###Start code here
model = Sequential()
# Add a GRU layer with 4 neurons
model.add(GRU(4, input_shape=(look_back, 1), name='gru_5')) # input_shape is (timesteps, features)
# Add a Dense layer
model.add(Dense(1, name='dense_6')) # Output layer with 1 neuron
# Compile the model
model.compile(loss='mean_squared_error', optimizer='adam')
###End code
model.summary()
Task 9: Run model.fit() on the train data for 30 epochs and batch_size = 1.
# Task 9:
###Start code
model.fit(trainX, trainY, epochs=30, batch_size=1)
###End code
Task 10: Using model.predict(), assign the predicted outputs on trainX and testX to the variables trainPredict and testPredict respectively. Since the data was normalized earlier, invert the values back to their original form (hint: use scaler.inverse_transform()).
# Task 10:
import numpy as np
import math
from sklearn.metrics import mean_squared_error
# Assuming model, scaler, trainX, trainY, testX, and testY are already defined
# Start code here
trainPredict = model.predict(trainX) # Predict on train data
testPredict = model.predict(testX) # Predict on test data
# Invert the predictions back to their original form
trainPredict = scaler.inverse_transform(trainPredict) # Invert train predictions
trainY = scaler.inverse_transform(trainY.reshape(-1, 1)) # Invert train targets
testPredict = scaler.inverse_transform(testPredict) # Invert test predictions
testY = scaler.inverse_transform(testY.reshape(-1, 1)) # Invert test targets
### End code
# Calculate root mean squared error
trainScore = math.sqrt(mean_squared_error(trainY, trainPredict)) # Compare full arrays
print('Train Score: %.2f RMSE' % (trainScore))
testScore = math.sqrt(mean_squared_error(testY, testPredict)) # Compare full arrays
print('Test Score: %.2f RMSE' % (testScore))
# Writing results to output file
with open("output.txt", "w+") as file:
file.write("train score {0:.2f}\n".format(trainScore))
file.write("test score {0:.2f}".format(testScore))