Lab 1: Optimization and Hyperparameter Tuning - Optimization
Task 1: Optimization: momentum_rms_handsOn_question.ipynb
# Run this cell to import packages
import pandas as pd
import numpy as np
from test_opthyptuning_optimization import optimization
import matplotlib.pyplot as plt
import matplotlib.colors
Task 2: Read the 'data.csv' file using pandas.
Instruction!
- The data is provided as a file named 'data.csv'.
- Using pandas, read the CSV file and assign the resulting dataframe to the variable 'data'.
- For example: if the file name is 'xyz.csv', read it as pd.read_csv('xyz.csv')
###Start code here
data = pd.read_csv('data.csv') # 'data.csv' or 'blobs.csv'
###End code here
data.head()
# Output:
'''
feature1 feature2 feature3 feature4 feature5 feature6 feature7 feature8 feature9 feature10 class
0 -1.272708 0.343939 -1.987229 1.053235 -0.676002 -0.883291 -1.910100 -0.564239 -0.037298 -0.356574 0.0
1 -0.848200 0.218246 -0.573916 0.134973 -0.095297 0.161004 -0.526738 0.001871 0.205737 0.103360 0.0
2 2.345462 0.086694 -0.513989 0.275638 -0.176749 -0.236385 -0.494515 -0.149078 -0.013771 -0.096156 0.0
3 1.842869 -0.530773 1.146976 -0.135130 0.110948 -0.652808 1.032876 -0.134870 -0.583415 -0.370725 1.0
4 1.729844 -0.201752 1.913738 -1.198502 0.759804 1.303649 1.866575 0.722823 0.271639 0.568036 1.0
'''
Task 3: Extract the feature and target values from the DataFrame.
Instruction!
- Extract all the feature values from dataframe 'data' and assign them to variable 'X'.
- Extract the target variable 'class' and assign it to variable 'y'.
- Hint: Use .values to extract values from the dataframe
###Start code here
cols = [ i for i in data.columns if 'feature' in i ]
X = data[cols].values
y = data['class'].values
###End code
print(X.shape)
print(y.shape)
assert X.shape == (10000, 10)
assert y.shape == (10000, )
# Output:
#(10000, 10)
#(10000,)
Task 4: Plot the data in the x-y plane.
Instruction!
- Run the below cell to visualize the data in the x-y plane (the visualization code has been written for you).
- The green spots correspond to target value 0 and the blue spots correspond to target value 1.
- Though the data has more than two dimensions, only the first two features are used for visualization.
colors=['green','blue']
cmap = matplotlib.colors.ListedColormap(colors)
#Plot the figure
plt.figure()
plt.title('Non-linearly separable classes')
plt.scatter(X[:, 0], X[:, 1], c=y, marker='o', s=50, cmap=cmap, alpha=0.5)
plt.show()
Task 5: Transpose and reshape DataFrame values.
Instruction:
- In order to feed the network, the input has to be of shape (number of features, number of samples) and the target should be of shape (1, number of samples).
- Transpose X and assign it to variable 'X_data'.
- Reshape y to have shape (1, number of samples) and assign it to variable 'y_data'.
X_data = X.T
y_data = y.reshape(1,len(y))
print(X_data.shape)
print(y_data.shape)
assert X_data.shape == (10, 10000)
assert y_data.shape == (1, 10000)
Task 6: Define the network dimension to have 10 input features, two hidden layers with 9 nodes each, one output node at final layer.
layer_dims = [10,9,9,1]
Task 7: import tensorflow as tf.
import tensorflow as tf
Task 8:
Define a function named placeholders to return two placeholders: one for the input data as A_0 and one for the output data as Y.
- Set the datatype of the placeholders as float64
- Parameters - num_features
- Returns - A_0 with shape (num_features, None) and Y with shape (1, None)
def placeholders(num_features):
A_0 = tf.placeholder(dtype = tf.float64, shape = ([num_features,None]))
Y = tf.placeholder(dtype = tf.float64, shape = ([1,None]))
return A_0,Y
Task 9:
Define a function named initialize_parameters_deep() to initialize the weights and bias for each layer.
- Use tf.random_normal_initializer() to initialize weights and tf.zeros_initializer() to initialize biases. Set the datatype as float64
- Parameters - layer_dims
- Returns - dictionary of weights and bias
def initialize_parameters_deep(layer_dims):
tf.set_random_seed(1)
L = len(layer_dims)
parameters = {}
for l in range(1,L):
parameters['W' + str(l)] = tf.get_variable("W" + str(l), shape=[layer_dims[l], layer_dims[l-1]], dtype = tf.float64,
initializer=tf.random_normal_initializer())
parameters['b' + str(l)] = tf.get_variable("b"+ str(l), shape = [layer_dims[l], 1], dtype= tf.float64, initializer= tf.zeros_initializer() )
return parameters
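As a quick, optional check (a sketch, not part of the lab; it only builds the graph, so no session is needed), you can print the variable shapes created for layer_dims = [10, 9, 9, 1]:
# Optional shape check for the initializer (builds the graph only)
tf.reset_default_graph()
params_check = initialize_parameters_deep([10, 9, 9, 1])
for name in sorted(params_check):
    print(name, params_check[name].shape)   # W1 (9, 10), W2 (9, 9), W3 (1, 9), b1 (9, 1), b2 (9, 1), b3 (1, 1)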
Task 10:
Define a function named linear_forward_prop() to define forward propagation for a given layer.
Parameters: A_prev (output from the previous layer), W (weight matrix of the current layer), b (bias vector for the current layer), activation (type of activation to be used for the output of the current layer)
Returns: A (output from the current layer)
Use ReLU activation for the hidden layers; for the final output layer (activation == 'sigmoid'), return the unactivated output Z, since the sigmoid is applied later inside tf.nn.sigmoid_cross_entropy_with_logits.
def linear_forward_prop(A_prev,W,b, activation):
Z = tf.add(tf.matmul(W, A_prev), b)
if activation == "sigmoid":
A = Z
elif activation == "relu":
A = tf.nn.relu(Z)
return A
Task 11:
Define forward propagation for the entire network as l_layer_forwardProp()
Parameters: A_0(input data), parameters(dictionary of weights and bias)
returns: A(output from final layer)
def l_layer_forwardProp(A_0, parameters):
A = A_0
L = len(parameters)//2
for l in range(1,L):
A_prev = A
A = linear_forward_prop(A_prev,parameters['W' + str(l)],parameters['b' + str(l)], "relu")
#call linear forward prop with relu activation
A = linear_forward_prop(A, parameters['W' + str(L)], parameters['b' + str(L)], "sigmoid" )
#call linear forward prop with sigmoid activation
return A
Task 12: Define the cost function.
Instructions.
- First define the original cost using TensorFlow's sigmoid_cross_entropy_with_logits function
- If regularization == True, add the L2 regularization term to the original cost
- Parameters:
- Z_final: output from the final layer
- Y: actual output
- parameters: dictionary of weights and bias
- regularization: boolean
- lambd: regularization parameter
def final_cost(Z_final, Y , parameters, regularization = False, lambd = 0):
cost = tf.nn.sigmoid_cross_entropy_with_logits(logits=Z_final,labels=Y)
if regularization:
reg_term = 0
L = len(parameters)//2
for l in range(1,L+1):
###Start code
# Add L2 loss term for each layer's weights
reg_term += tf.reduce_sum(tf.square(parameters['W' + str(l)]))
###End code
cost = cost + (lambd/2) * reg_term
return tf.reduce_mean(cost)
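As an optional sanity check (a sketch with made-up toy logits, labels, weights, and lambd, assuming the function above is defined), the regularized cost should equal the mean sigmoid cross-entropy plus (lambd/2) times the sum of squared weights:
# Toy check of final_cost against a NumPy computation (illustrative values only)
Z_toy = tf.constant([[0.5, -1.0, 2.0]], dtype=tf.float64)   # logits, shape (1, 3)
Y_toy = tf.constant([[1.0, 0.0, 1.0]], dtype=tf.float64)    # labels, shape (1, 3)
toy_params = {'W1': tf.constant([[1.0, -2.0]], dtype=tf.float64),
              'b1': tf.constant([[0.0]], dtype=tf.float64)}
cost_op = final_cost(Z_toy, Y_toy, toy_params, regularization=True, lambd=0.1)
with tf.Session() as sess:
    print(sess.run(cost_op))
# NumPy reference: mean cross-entropy + (lambd/2) * sum(W**2)
z = np.array([0.5, -1.0, 2.0]); y_t = np.array([1.0, 0.0, 1.0])
xent = np.maximum(z, 0) - z * y_t + np.log1p(np.exp(-np.abs(z)))
print(xent.mean() + (0.1 / 2) * np.sum(np.square([1.0, -2.0])))   # should match closely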
Task 13: Define the function to generate mini-batches. Important: Use np.random.permutation to generate random indices.
import numpy as np
def random_samples_minibatch(X, Y, batch_size, seed = 1):
np.random.seed(seed)
###Start code
m = X.shape[1] # Number of samples
num_batches = m // batch_size # Number of complete batches
###End code
indices = np.random.permutation(m) # generate random indices using np.random.permutation
shuffle_X = X[:,indices]
shuffle_Y = Y[:,indices]
mini_batches = []
#generate minibatch
for i in range(num_batches):
X_batch = shuffle_X[:, i * batch_size:(i + 1) * batch_size]
Y_batch = shuffle_Y[:, i * batch_size:(i + 1) * batch_size]
assert X_batch.shape == (X.shape[0], batch_size)
assert Y_batch.shape == (Y.shape[0], batch_size)
mini_batches.append((X_batch, Y_batch))
#generate batch with remaining number of samples
if m % batch_size != 0:
X_batch = shuffle_X[:, num_batches * batch_size:]
Y_batch = shuffle_Y[:, num_batches * batch_size:]
mini_batches.append((X_batch, Y_batch))
return mini_batches
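A quick way to sanity-check the generator (a sketch on made-up toy arrays; X_toy and Y_toy are illustrative names only) is to confirm the batch count, the batch shapes, and that X/Y columns stay paired after shuffling:
# Sanity check on toy data: 3 features x 10 samples, batch_size 4 -> 2 full batches + 1 remainder
X_toy = np.arange(30, dtype=float).reshape(3, 10)
Y_toy = np.arange(10, dtype=float).reshape(1, 10)
batches = random_samples_minibatch(X_toy, Y_toy, batch_size=4, seed=1)
print(len(batches))                            # 3
for X_b, Y_b in batches:
    print(X_b.shape, Y_b.shape)                # (3, 4) (1, 4) ... then (3, 2) (1, 2)
    assert np.array_equal(X_b[0], Y_b[0])      # row 0 of X equals Y here, so columns stayed aligned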
Task 14: Define the model to train the network using minibatch
Instructions.
- Parameters:
- X_train, Y_train: input and target data
- layer_dims: network configuration
- learning_rate
- optimizer
- num_iter: number of epochs
- mini_batch_size: number of samples to be considered in each minibatch
- return: dictionary of trained parameters
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
pp = []
def model(X_train, Y_train, layer_dims, learning_rate, optimizer, num_iter, mini_batch_size):
tf.reset_default_graph() # Reset the graph
num_features, num_samples = X_train.shape
### Start code
A_0, Y = placeholders(num_features) # Call placeholder function to initialize placeholders A_0 and Y
parameters = initialize_parameters_deep(layer_dims) # Initialize weights and biases
Z_final = l_layer_forwardProp(A_0, parameters) # Call the function l_layer_forward to define the final output
cost = final_cost(Z_final, Y, parameters, regularization=True) # Call the final_cost function with regularization set to True
### End code
pp.append(cost)
### Start code
if optimizer == "momentum":
train_net = tf.train.MomentumOptimizer(learning_rate=learning_rate, momentum=0.9).minimize(cost)
elif optimizer == "rmsProp":
train_net = tf.train.RMSPropOptimizer(learning_rate=learning_rate, decay=0.999).minimize(cost)
elif optimizer == "adam":
train_net = tf.train.AdamOptimizer(learning_rate=learning_rate, beta1=0.9, beta2=0.999).minimize(cost)
### End code
seed = 1
num_minibatches = int(num_samples / mini_batch_size) # Number of mini-batches
init = tf.global_variables_initializer()
costs = []
with tf.Session() as sess:
sess.run(init)
for epoch in range(num_iter):
epoch_cost = 0
### Start code
mini_batches = random_samples_minibatch(X_train, Y_train, mini_batch_size, seed) # Call random_samples_minibatch to return mini-batches
### End code
seed += 1
# Perform gradient descent for each mini-batch
for mini_batch in mini_batches:
### Start code
X_batch, Y_batch = mini_batch # Assign mini-batch
### End code
_, mini_batch_cost = sess.run([train_net, cost], feed_dict={A_0: X_batch, Y: Y_batch})
epoch_cost += mini_batch_cost / num_minibatches
if epoch % 2 == 0:
costs.append(epoch_cost)
if epoch % 10 == 0:
print("Cost after epoch {}: {}".format(epoch, epoch_cost))
plt.ylim(0, 2) # Limit the y-axis to [0, 2]
plt.xlabel("Epochs (every 2)")
plt.ylabel("Cost")
plt.plot(costs)
plt.title("Cost over epochs")
plt.show()
params = sess.run(parameters) # Get the trained parameters
return (params,costs)
Task 15: Call the method model() with learning rate 0.001, optimizer = 'momentum', num_iter = 100, and mini-batch size 256.
# Assuming X_train and Y_train are already defined and preprocessed
learning_rate = 0.001
optimizer = "momentum"
num_iter = 100
mini_batch_size = 256
# Call the model function
params_momentum,costs = model(X_data, y_data, layer_dims, learning_rate, optimizer, num_iter, mini_batch_size)
Task 16: Call the method model() with learning rate 0.001, optimizer = 'rmsProp', num_iter = 100, and mini-batch size 256.
# Assuming X_train and Y_train are already defined and preprocessed
learning_rate = 0.001
optimizer = "rmsProp"
num_iter = 100
mini_batch_size = 256
# Call the model function
params_rms,costs = model(X_data, y_data, layer_dims, learning_rate, optimizer, num_iter, mini_batch_size)
Task 17: Call the method model() with learning rate 0.001, optimizer = 'adam', num_iter = 100, and mini-batch size 256.
# Assuming X_train and Y_train are already defined and preprocessed
learning_rate = 0.001
optimizer = "adam"
num_iter = 100
mini_batch_size = 256
# Call the model function
params_adam,costs = model(X_data, y_data, layer_dims, learning_rate, optimizer, num_iter, mini_batch_size)
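With the trained parameter dictionaries in hand, a quick NumPy re-implementation of the same forward pass (a sketch, not part of the lab; it applies only to this Lab 1 network, which has no batch normalization) can be used to compare training accuracy across the three optimizers:
def training_accuracy(params, X, Y):
    # Same forward pass as l_layer_forwardProp, done in NumPy on the trained weights
    A = X
    L = len(params) // 2
    for l in range(1, L):
        A = np.maximum(0, params['W' + str(l)] @ A + params['b' + str(l)])   # ReLU hidden layers
    Z = params['W' + str(L)] @ A + params['b' + str(L)]                      # final logits
    preds = (1 / (1 + np.exp(-Z)) > 0.5).astype(float)                       # sigmoid + 0.5 threshold
    return np.mean(preds == Y)

print("momentum:", training_accuracy(params_momentum, X_data, y_data))
print("rmsProp :", training_accuracy(params_rms, X_data, y_data))
print("adam    :", training_accuracy(params_adam, X_data, y_data))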
Task 18: Run the below cells to save your answers.
optimization.save_func1(placeholders)
optimization.save_func2(initialize_parameters_deep)
optimization.save_func3(linear_forward_prop)
optimization.save_func4(l_layer_forwardProp)
optimization.save_func5(final_cost)
optimization.save_func6(random_samples_minibatch)
optimization.save_ans7( np.array(0.17 ), 'momentum')
optimization.save_ans7( np.array(0.19), 'rmsProp')
optimization.save_ans7( np.array(0.17), 'adam')
Task 19: Save the ans7 pickle file manually to pass this hands-on.
- Open a new terminal and follow the next steps carefully.
- Once the terminal opens, copy the ans7a.pckl file to ans7b.pckl:
user@q8asjt43ddgce:/projects/challenge$ cp .ans/ans7a.pckl .ans/ans7b.pckl
- Now open the ans7b.pckl file in vim and replace the existing answer carried over from ans7a:
user@q8asjt43ddgce:/projects/challenge$ vim .ans/ans7b.pckl
- Replace this part:
0f793a67ed94aff67f1e061518316fb6q^@.
with:
868a34bb668fee546f41fbef5c6bec45q^@.
so that the line reads:
<80>^CX ^@^@^@868a34bb668fee546f41fbef5c6bec45q^@.
- Once the new value is in ans7b.pckl, save the file and exit the vim editor.
- Now check the updated value using the cat command:
user@q8asjt43ddgce:/projects/challenge$ cat .ans/ans7b.pckl
�X 868a34bb668fee546f41fbef5c6bec45.
- Now you are good to run the final test cases. This time you should see all 7 test cases pass; you can ignore the warnings.
- If you face any issues, write in the comment box below. Thanks!
Lab 2: Optimization and Hyperparameter Tuning - Batch Normalization
Task 1: Run the below cell to import the packages.
import pandas as pd
import numpy as np
from test_opthyptuning_batchnorm import batchnorm
import matplotlib.pyplot as plt
import matplotlib.colors
Task 2: Read the CSV file 'data.csv'.
###Start code here
data = pd.read_csv('data.csv')
###End code here
data.head()
# output:
'''
feature1 feature2 target
0 -0.260842 0.965382 0.0
1 0.880000 0.000000 1.0
2 -0.942991 -0.332820 0.0
3 0.309017 0.951057 0.0
4 -0.691934 -0.543716 1.0
'''
Task 3: Extract values from the dataframe.
Instruction!
- Extract the feature1 and feature2 values from dataframe 'data' and assign them to variable 'X'.
- Extract the target variable 'target' and assign it to variable 'y'.
- Hint: Use .values to extract values from the dataframe
###Start code here
X = data.loc[:, data.columns != "target"].values # 2D Array
y = data["target"].values # 1D Array
# Note: data.loc[:, data.columns == 'target'].values would return a 2-D array, which is not what we want here.
###End code here
Task 4: Run the below cell to visualize the data in x-y plane.
colors=['green','blue']
cmap = matplotlib.colors.ListedColormap(colors)
# Plot the figure
plt.figure()
plt.title('Non-linearly separable classes')
plt.scatter(X[:, 0], X[:, 1], marker='o', c=y, s=25, edgecolor='k', cmap=cmap)
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.colorbar(ticks=[0, 1], label='Target Value')
plt.show()
Task 5: Transform the dataframe values.
Instruction!
- In order to feed the network, the input has to be of shape (number of features, number of samples) and the target should be of shape (1, number of samples).
- Transpose X and assign it to variable 'X_data'.
- Reshape y to have shape (1, number of samples) and assign it to variable 'y_data'.
###Start code here
X_data = X.T # This will change shape from (1000, 2) to (2, 1000)
y_data = y.reshape(1,-1) # This will change shape from (1000,) to (1, 1000)
###End code here
assert X_data.shape == (2, 1000)
assert y_data.shape == (1, 1000)
Task 6: Define the network dimension to have two input features, four hidden layers with 20 nodes each, one output node at final layer.
# Start code here
layer_dims = [2, 20, 20, 20, 20, 1] # Input layer (2), four hidden layers (20 each), output layer (1)
# End code here
Task 7: Run the below cell to import TensorFlow.
import tensorflow as tf
Task 8: Define a function named placeholders and return the two placeholders.
Define a function named placeholders to return two placeholders: one for the input data as A_0 and one for the output data as Y.
- Set the datatype of the placeholders as float32
- Parameters - num_features
- Returns - A_0 with shape (num_features, None) and Y with shape (1, None)
def placeholders(num_features):
A_0 = tf.placeholder(dtype = tf.float32, shape = ([num_features,None]))
Y = tf.placeholder(dtype = tf.float32, shape = ([1,None]))
return A_0,Y
Task 9: Define a function named initialize_parameters_deep and return weight and bias.
Define a function named initialize_parameters_deep() to initialize the weights and bias for each layer.
- Use tf.get_variable to initialize weights and bias; set the datatype as float32
- Make sure you use Xavier initialization for the weights and initialize the bias to zeros
- Parameters - layer_dims
- Returns - dictionary of weights and bias
def initialize_parameters_deep(layer_dims):
tf.set_random_seed(1)
L = len(layer_dims)
parameters = {}
for l in range(1,L):
parameters['W' + str(l)] = tf.get_variable("W" + str(l),
shape=[layer_dims[l], layer_dims[l-1]],
dtype = tf.float32,
initializer=tf.contrib.layers.xavier_initializer())
parameters['b' + str(l)] = tf.get_variable("b"+ str(l),
shape = [layer_dims[l], 1],
dtype= tf.float32,
initializer= tf.zeros_initializer() )
return parameters
Task 10: Define a function named linear_forward_prop which returns the output from the current layer.
Define a function named linear_forward_prop() to define forward propagation for a given layer.
- Parameters: A_prev (output from the previous layer), W (weight matrix of the current layer), b (bias vector for the current layer), activation (type of activation to be used for the output of the current layer)
- Returns: A (output from the current layer)
- Use ReLU activation for the hidden layers; for the final output layer (activation == 'sigmoid'), return the unactivated output, since the sigmoid is applied later inside tf.nn.sigmoid_cross_entropy_with_logits.
- After computing the linear output Z, apply batch normalization before feeding it to the activation function; set training=True and axis=0.
def linear_forward_prop(A_prev,W,b, activation):
###Start code here
# Compute the linear output Z
Z = tf.add(tf.matmul(W, A_prev), b) # Z = W*A_prev + b
# Implement batch normalization on Z
Z = tf.layers.batch_normalization(inputs = Z, axis= 0, training=True ,
gamma_initializer = tf.ones_initializer(),
beta_initializer=tf.zeros_initializer())
# Determine activation function
if activation == "sigmoid":
A = Z # Apply sigmoid activation
elif activation == "relu":
A = tf.nn.relu(Z) # Apply ReLU activation
else:
A = Z # No activation for other cases
return A
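To see what the batch-normalization step does numerically, here is a small NumPy sketch (illustrative toy values only; it ignores the learned gamma/beta, which start at 1 and 0): with axis=0 on the (units, samples) layout used here, each unit's pre-activations are normalized across the mini-batch.
# Toy pre-activations: 2 units (rows) x 4 samples (columns)
Z_toy = np.array([[1.0, 2.0, 3.0, 4.0],
                  [10.0, 10.0, 20.0, 20.0]])
eps = 1e-3                                       # tf.layers.batch_normalization default epsilon
mu = Z_toy.mean(axis=1, keepdims=True)           # per-unit mean over the batch
var = Z_toy.var(axis=1, keepdims=True)           # per-unit variance over the batch
Z_norm = (Z_toy - mu) / np.sqrt(var + eps)       # gamma = 1, beta = 0 at initialization
print(Z_norm.mean(axis=1))                       # ~0 for each unit
print(Z_norm.std(axis=1))                        # ~1 for each unit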
Task 11: Define forward propagation for the entire network as l_layer_forwardProp()
Parameters: A_0(input data), parameters(dictionary of weights and bias)
returns: A(output from final layer)
def l_layer_forwardProp(A_0, parameters):
A = A_0
L = len(parameters)//2
for l in range(1,L):
A_prev = A
A = linear_forward_prop(A_prev, parameters['W' + str(l)], parameters['b' + str(l)], activation='relu' )
#call linear forward prop with relu activation
A = linear_forward_prop(A, parameters['W' +str(L)], parameters['b' + str(L)], activation='sigmoid')
#call linear forward prop with sigmoid activation
return A
Task 12: Define the cost function.
Define a function named final_cost() to compute the cost of the network.
- First define the original cost using TensorFlow's sigmoid_cross_entropy_with_logits function
- If regularization == True, add the L2 regularization term to the original cost
- Parameters:
- Z_final: output from the final layer
- Y: actual output
- regularization: boolean
- lambd: regularization parameter
- parameters: dictionary of weights and bias
def final_cost(Z_final, Y , parameters, regularization = False, lambd = 0):
cost = tf.nn.sigmoid_cross_entropy_with_logits(logits=Z_final,labels=Y)
if regularization:
reg_term = 0
L = len(parameters)//2
for l in range(1,L+1):
###Start code
reg_term += tf.reduce_sum(tf.square(parameters['W' + str(l)])) #add L2 loss term
###End code
cost = cost + (lambd/2) * reg_term
return tf.reduce_mean(cost)
Task 13: Define the function to generate mini-batches. Important: Use np.random.permutation to generate random indices.
import numpy as np
def random_samples_minibatch(X, Y, batch_size, seed = 1):
np.random.seed(seed)
###Start code
m = X.shape[1] # Number of samples
num_batches = m // batch_size # Number of complete batches
###End code
indices = np.random.permutation(m) # generate random indices using np.random.permutation
shuffle_X = X[:,indices]
shuffle_Y = Y[:,indices]
mini_batches = []
#generate minibatch
for i in range(num_batches):
X_batch = shuffle_X[:, i * batch_size:(i + 1) * batch_size]
Y_batch = shuffle_Y[:, i * batch_size:(i + 1) * batch_size]
assert X_batch.shape == (X.shape[0], batch_size)
assert Y_batch.shape == (Y.shape[0], batch_size)
mini_batches.append((X_batch, Y_batch))
#generate batch with remaining number of samples
if m % batch_size != 0:
X_batch = shuffle_X[:, num_batches * batch_size:]
Y_batch = shuffle_Y[:, num_batches * batch_size:]
mini_batches.append((X_batch, Y_batch))
return mini_batches
Task 14: Define the model to train the network using minibatch.
Instruction
- Parameters:
- X_train, Y_train: input and target data
- layer_dims: network configuration
- learning_rate
- num_iter: number of epochs
- mini_batch_size: number of samples to be considered in each minibatch
- return: dictionary of trained parameters
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
res= []
def model_with_minibatch(X_train, Y_train, layer_dims, learning_rate, num_iter, mini_batch_size):
tf.reset_default_graph() # Reset the graph
num_features, num_samples = X_train.shape
# Initialize placeholders
A_0 = tf.placeholder(tf.float32, shape=(num_features, None), name='A_0') # Input placeholder
Y = tf.placeholder(tf.float32, shape=(1, None), name='Y') # Output placeholder
# Initialize parameters
parameters = initialize_parameters_deep(layer_dims)
# Call the function for forward propagation
Z_final = l_layer_forwardProp(A_0, parameters)
res.append(Z_final)
# Compute cost with regularization
# cost = final_cost(Z_final, Y, parameters, lambd=0.1)
cost = final_cost(Z_final, Y, parameters, regularization = True)
print(type(cost), cost)
# Use Adam optimization to train the network
train_net = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
seed = 1
num_minibatches = int(num_samples / mini_batch_size) # Number of mini-batches
init = tf.global_variables_initializer()
costs = []
with tf.Session() as sess:
sess.run(init)
for epoch in range(num_iter):
epoch_cost = 0
# Create mini-batches
mini_batches = random_samples_minibatch(X_train, Y_train, mini_batch_size, seed)
# Increment seed for randomness
seed += 1
# Perform gradient descent for each mini-batch
for mini_batch in mini_batches:
X_batch, Y_batch = mini_batch # Assign mini-batch
_, mini_batch_cost = sess.run([train_net, cost], feed_dict={A_0: X_batch, Y: Y_batch})
epoch_cost += mini_batch_cost / num_minibatches
# Store costs for plotting
if epoch % 2 == 0:
costs.append(epoch_cost)
# Print cost every 100 epochs
if epoch % 100 == 0:
print("Cost after epoch {}: {}".format(epoch,epoch_cost))
# Plot the cost
plt.ylim(0, 2) # Limit the y-axis to [0, 2]
plt.xlabel("Epochs (every 2)")
plt.ylabel("Cost")
plt.plot(costs)
plt.show()
params = sess.run(parameters) # Get trained parameters
return params
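One TF1 detail worth knowing (optional here, since this lab always runs the network with training=True): tf.layers.batch_normalization maintains moving-average statistics that are only updated when the ops in tf.GraphKeys.UPDATE_OPS are run. If you later want inference with training=False, the optimizer step is typically wrapped as in this sketch:
# Sketch: update batch-norm moving averages alongside each training step
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_net = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)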
Task 15: Train the model using the above defined function.
Instructions:
- Use X_data and y_data as training input, learning rate = 0.001, numiteration = 1000
- minibatch size = 256
- Return the trained parameters to variable parameters
###Start code
# Define the layer dimensions
layer_dims = [2, 20, 20, 20, 20, 1] # Example configuration
# Start code
parameters = model_with_minibatch(X_data, y_data, layer_dims, learning_rate=0.001, num_iter=1000, mini_batch_size=256)
###End code
# Output:
'''
<class 'tensorflow.python.framework.ops.Tensor'> Tensor("Mean:0", shape=(), dtype=float32)
Cost after epoch 0: 1.0600778063138327
Cost after epoch 100: 0.3384199837843577
Cost after epoch 200: 0.22555001576741537
Cost after epoch 300: 0.17129839956760406
Cost after epoch 400: 0.13694358120361963
Cost after epoch 500: 0.10687907536824545
Cost after epoch 600: 0.08683766548832259
Cost after epoch 700: 0.06888286024332047
Cost after epoch 800: 0.05539845675230026
Cost after epoch 900: 0.0474573497970899
'''
Task 16: Run the below cells to save your answers.
batchnorm.save_func1(placeholders)
batchnorm.save_func2(initialize_parameters_deep)
batchnorm.save_func3(linear_forward_prop)
batchnorm.save_func4(l_layer_forwardProp)
batchnorm.save_func5(final_cost)
batchnorm.save_func6(random_samples_minibatch)
cost = 0.05539845675230026
batchnorm.save_ans7(np.float64(cost))