Optimization and Hyperparameter Tuning Fresco Play HandsOn Solution

Learn optimization and hyperparameter tuning, including Adam optimization, learning rate decay, batch normalization, gradient descent with momentum, and RMSProp.

Lab 1: Optimization and Hyperparameter Tuning - Optimization

Task 1: Optimization: momentum_rms_handsOn_question.ipynb

# Run this cell to import packages
import pandas as pd
import numpy as np
from test_opthyptuning_optimization import optimization
import matplotlib.pyplot as plt
import matplotlib.colors

Task 2: Read the 'data.csv' file using pandas.

Instruction!
  • The data is provided as a file named 'data.csv'.
  • Using pandas, read the CSV file and assign the resulting DataFrame to the variable 'data'.
  • For example: if the file name is 'xyz.csv', read it as pd.read_csv('xyz.csv').

###Start code here
data = pd.read_csv('data.csv')  # 'data.csv' or 'blobs.csv'
###End code here
data.head()

# Output:
'''
   feature1	    feature2	 feature3    feature4	  feature5	 feature6	 feature7	 feature8	feature9	feature10	class
0	-1.272708	 0.343939	-1.987229	 1.053235	 -0.676002	-0.883291	-1.910100	-0.564239	-0.037298	-0.356574	0.0
1	-0.848200	 0.218246	-0.573916	 0.134973	 -0.095297 	 0.161004	-0.526738	 0.001871	0.205737	0.103360	0.0
2	 2.345462	 0.086694	-0.513989	 0.275638	 -0.176749	-0.236385	-0.494515	-0.149078	-0.013771	-0.096156	0.0
3	 1.842869	-0.530773	 1.146976	-0.135130	  0.110948	-0.652808	 1.032876	-0.134870	-0.583415	-0.370725	1.0
4	 1.729844	-0.201752	 1.913738	-1.198502	  0.759804	 1.303649	 1.866575	 0.722823	0.271639	0.568036	1.0
'''

Task 3: Extract the feature and target values from the DataFrame.

Instruction!
  • Extract all the feature values from dataframe 'data' and assign them to variable 'X'.
  • Extract the target variable 'class' and assign it to variable 'y'.
  • Hint: Use .values to extract values from the dataframe

###Start code here
cols = [col for col in data.columns if 'feature' in col]  # feature1 ... feature10
X = data[cols].values      # shape (10000, 10)
y = data['class'].values   # shape (10000,)
###End code
print(X.shape)
print(y.shape)
assert X.shape == (10000, 10)
assert y.shape == (10000, )

# Output:
#(10000, 10)
#(10000,)

Task 4: Plot the data on the x-y plane.

Instruction!
  • Run the cell below to visualize the data in the x-y plane (the visualization code has been written for you).
  • The green spots correspond to target value 0 and the blue spots correspond to target value 1.
  • Though the data has more than two dimensions, only the first two features are used for visualization.

colors=['green','blue']
cmap = matplotlib.colors.ListedColormap(colors)
#Plot the figure
plt.figure()
plt.title('Non-linearly separable classes')
plt.scatter(X[:, 0], X[:, 1], c=y, marker='o', s=50, cmap=cmap, alpha=0.5)
plt.show()

Task 5: Transpose and reshape DataFrame values.

Instruction:
  • In order to feed the network, the input has to be of shape (number of features, number of samples) and the target should be of shape (1, number of samples)
  • Transpose X and assign it to variable 'X_data'
  • Reshape y to have shape (1, number of samples) and assign to variable 'y_data'

X_data = X.T                   # (10, 10000): features x samples
y_data = y.reshape(1, len(y))  # (1, 10000)
print(X_data.shape)
print(y_data.shape)
assert X_data.shape == (10, 10000)
assert y_data.shape == (1, 10000)
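
As a quick illustration of this (features, samples) shape convention, here is a toy example (illustrative only, not part of the graded solution):

# Toy illustration of the (features, samples) layout the network expects
toy = np.arange(6).reshape(3, 2)  # 3 samples x 2 features
print(toy.shape)                  # (3, 2)
print(toy.T.shape)                # (2, 3): features x samples, ready to feed the network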

Task 6: Define the network dimensions to have 10 input features, two hidden layers with 9 nodes each, and one output node at the final layer.


layer_dims = [10, 9, 9, 1]  # 10 inputs, two hidden layers of 9 nodes each, 1 output node

Task 7: import tensorflow as tf.


import tensorflow as tf
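
Note: this lab is written against the TensorFlow 1.x API (tf.placeholder, tf.Session, tf.train.*). If your environment happens to run TensorFlow 2.x instead (an assumption about your setup, not part of the official solution), a minimal compatibility shim is:

# Only needed on TensorFlow 2.x: restore most of the 1.x graph-mode APIs used in this lab
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()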

Task 8:

Define a function named placeholders to return two placeholders: one for input data as A_0 and one for output data as Y.
  • Set the datatype of the placeholders as float64
  • Parameters - num_features
  • Returns - A_0 with shape (num_features, None) and Y with shape (1, None)

def placeholders(num_features):
    A_0 = tf.placeholder(dtype=tf.float64, shape=[num_features, None])
    Y = tf.placeholder(dtype=tf.float64, shape=[1, None])
    return A_0,Y

Task 9:

Define a function named initialize_parameters_deep() to initialize weights and biases for each layer.
  • Use tf.random_normal_initializer() to initialize weights and tf.zeros_initializer() to initialize biases. Set the datatype as float64.
  • Parameters - layer_dims
  • Returns - dictionary of weights and biases

def initialize_parameters_deep(layer_dims):
    tf.set_random_seed(1)
    L = len(layer_dims)
    parameters = {}
    for l in range(1,L):
        parameters['W' + str(l)] = tf.get_variable("W" + str(l), shape=[layer_dims[l], layer_dims[l-1]], dtype = tf.float64,
                                   initializer=tf.random_normal_initializer())
                                   
        parameters['b' + str(l)] = tf.get_variable("b"+ str(l), shape = [layer_dims[l], 1], dtype= tf.float64, initializer= tf.zeros_initializer() )
        
    return parameters 
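
An optional sanity check you could run here (illustrative only, not required by the grader) to confirm the variable shapes match layer_dims:

# Build the variables once in a fresh graph and inspect their shapes
tf.reset_default_graph()
check_params = initialize_parameters_deep([10, 9, 9, 1])
for name in sorted(check_params):
    print(name, check_params[name].shape)  # W1 (9, 10), W2 (9, 9), W3 (1, 9), b1 (9, 1), ...
tf.reset_default_graph()                   # clear the graph again before the next tasks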

Task 10:

Define a function named linear_forward_prop() to implement forward propagation for a given layer.
  • Parameters: A_prev (output from the previous layer), W (weight matrix of the current layer), b (bias vector of the current layer), activation (type of activation to be used for the output of the current layer)
  • Returns: A (output from the current layer)
  • Use ReLU activation for the hidden layers; for the final output layer (i.e. if activation is sigmoid), return the output unactivated so it can be passed to sigmoid_cross_entropy_with_logits later.
    def linear_forward_prop(A_prev,W,b, activation):
        Z = tf.add(tf.matmul(W, A_prev), b)    # Z = W.A_prev + b
        if activation == "sigmoid":
            A = Z                              # leave unactivated; the loss applies the sigmoid
        elif activation == "relu":
            A = tf.nn.relu(Z)
        return A
    

    Task 11:

    Define forward propagation for the entire network as l_layer_forwardProp().
  • Parameters: A_0 (input data), parameters (dictionary of weights and biases)
  • Returns: A (output from the final layer)
    def l_layer_forwardProp(A_0, parameters):
        A = A_0
        L = len(parameters)//2
        for l in range(1,L):
            A_prev = A
            A = linear_forward_prop(A_prev,parameters['W' + str(l)],parameters['b' + str(l)], "relu")     
            #call linear forward prop with relu activation
        A = linear_forward_prop(A, parameters['W' + str(L)], parameters['b' + str(L)], "sigmoid" )                  
        #call linear forward prop with sigmoid activation
        
        return A
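
    To see how the pieces fit together, here is an optional wiring check (illustrative only; assumes Tasks 8-11 have been run, and resets the graph before and after):

    # Illustrative wiring check: placeholders -> parameters -> forward pass
    tf.reset_default_graph()
    A_0_chk, Y_chk = placeholders(10)
    params_chk = initialize_parameters_deep([10, 9, 9, 1])
    Z_chk = l_layer_forwardProp(A_0_chk, params_chk)
    print(Z_chk.shape)      # (1, ?): one output unit, batch size not yet known
    tf.reset_default_graph()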
    

    Task 12: Define the cost function.

    Instructions.
    • First define the original cost using TensorFlow's sigmoid_cross_entropy_with_logits function
    • If regularization == True, add a regularization term to the original cost function
      • Parameters:
      • Z_final: output from the final layer
      • Y: actual output
      • parameters: dictionary of weights and biases
      • regularization : boolean
      • lambd: regularization parameter
    
    def final_cost(Z_final, Y , parameters, regularization = False, lambd = 0):
        cost = tf.nn.sigmoid_cross_entropy_with_logits(logits=Z_final,labels=Y)
        if regularization:
            reg_term = 0
            L = len(parameters)//2
            for l in range(1,L+1):
                ###Start code
                # Add L2 loss term for each layer's weights
                reg_term += tf.reduce_sum(tf.square(parameters['W' + str(l)]))
                ###End code
            cost = cost + (lambd/2) * reg_term
        return tf.reduce_mean(cost)
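
    For intuition, the regularized cost above is the mean sigmoid cross-entropy plus an L2 penalty on every weight matrix, i.e. J = mean(CE) + (lambd/2) * sum_l ||W_l||^2. A rough numpy equivalent, illustrative only and using the numerically stable cross-entropy form that TensorFlow documents:

    # Illustrative numpy version of the regularized cost (not used by the grader)
    def numpy_final_cost(Z_final, Y, weight_list, lambd=0.0):
        # stable sigmoid cross-entropy: max(z, 0) - z*y + log(1 + exp(-|z|))
        ce = np.maximum(Z_final, 0) - Z_final * Y + np.log1p(np.exp(-np.abs(Z_final)))
        reg = sum(np.sum(W ** 2) for W in weight_list)  # sum of squared weights over all layers
        return np.mean(ce) + (lambd / 2) * reg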
    

    Task 13: Define the function to generate mini-batches. Important: Use np.random.permutation to generate random indices.

    
    import numpy as np
    def random_samples_minibatch(X, Y, batch_size, seed = 1):
        np.random.seed(seed)
        
        ###Start code
        m = X.shape[1]                 # number of samples
        num_batches = m // batch_size  # number of complete batches
        ###End code
        
        indices = np.random.permutation(m)  # generate random indices
        shuffle_X = X[:,indices]
        shuffle_Y = Y[:,indices]
        mini_batches = []
        
        #generate minibatch
        for i in range(num_batches):
            X_batch = shuffle_X[:, i * batch_size:(i + 1) * batch_size]
            Y_batch = shuffle_Y[:, i * batch_size:(i + 1) * batch_size]
            
            assert X_batch.shape == (X.shape[0], batch_size)
            assert Y_batch.shape == (Y.shape[0], batch_size)
            
            mini_batches.append((X_batch, Y_batch))
        
        #generate batch with remaining number of samples
        if m % batch_size != 0:
            X_batch = shuffle_X[:, num_batches * batch_size:]
            Y_batch = shuffle_Y[:, num_batches * batch_size:]
            mini_batches.append((X_batch, Y_batch))
        return mini_batches
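
    A quick sanity check of random_samples_minibatch on toy data (shapes chosen only for illustration): 1000 samples with batch_size = 256 give three full batches plus one remainder batch of 232.

    # Toy check: 1000 samples, batch_size 256 -> 3 full batches + 1 batch of 232
    X_toy = np.random.randn(10, 1000)
    Y_toy = np.random.randint(0, 2, size=(1, 1000)).astype(float)
    toy_batches = random_samples_minibatch(X_toy, Y_toy, batch_size=256)
    print(len(toy_batches))                                   # 4
    print(toy_batches[0][0].shape, toy_batches[-1][0].shape)  # (10, 256) (10, 232)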
    

    Task 14: Define the model to train the network using mini-batches

    Instructions.
      • Parameters:
      • X_train, Y_train: input and target data
      • layer_dims: network configuration
      • learning_rate
      • optimizer
      • num_iter: number of epochs
      • mini_batch_size: number of samples to be considered in each mini-batch
    • return: dictionary of trained parameters
    
    import numpy as np
    import tensorflow as tf
    import matplotlib.pyplot as plt
    pp = []
    def model(X_train, Y_train, layer_dims, learning_rate, optimizer, num_iter, mini_batch_size):
        tf.reset_default_graph()  # Reset the graph
        num_features, num_samples = X_train.shape
        
        ### Start code
        A_0, Y = placeholders(num_features)  # Call placeholder function to initialize placeholders A_0 and Y
        parameters = initialize_parameters_deep(layer_dims)  # Initialize weights and biases
        Z_final = l_layer_forwardProp(A_0, parameters)  # Call the function l_layer_forward to define the final output
        cost = final_cost(Z_final, Y, parameters, regularization=True)  # Call the final_cost function with regularization set to True
        ### End code
        pp.append(cost)
        ### Start code
        if optimizer == "momentum":
            train_net = tf.train.MomentumOptimizer(learning_rate=learning_rate, momentum=0.9).minimize(cost)
        elif optimizer == "rmsProp":
            train_net = tf.train.RMSPropOptimizer(learning_rate=learning_rate, decay=0.999).minimize(cost)
        elif optimizer == "adam":
            train_net = tf.train.AdamOptimizer(learning_rate=learning_rate, beta1=0.9, beta2=0.999).minimize(cost)
        ### End code
        
        seed = 1
        num_minibatches = int(num_samples / mini_batch_size)  # Number of mini-batches
        init = tf.global_variables_initializer()
        costs = []
        
        with tf.Session() as sess:
            sess.run(init)
            for epoch in range(num_iter):
                epoch_cost = 0
                ### Start code
                mini_batches = random_samples_minibatch(X_train, Y_train, mini_batch_size, seed)  # Call random_samples_minibatch to return mini-batches
                ### End code
                
                seed += 1
                
                # Perform gradient descent for each mini-batch
                for mini_batch in mini_batches:
                    ### Start code
                    X_batch, Y_batch = mini_batch  # Assign mini-batch
                    ### End code
                    _, mini_batch_cost = sess.run([train_net, cost], feed_dict={A_0: X_batch, Y: Y_batch})
                    epoch_cost += mini_batch_cost / num_minibatches
                
                if epoch % 2 == 0:
                    costs.append(epoch_cost)
                if epoch % 10 == 0:
                    print("Cost after epoch {}: {}".format(epoch, epoch_cost))
    
            plt.ylim(0, 2)
            plt.xlabel("Epochs (every 2)")
            plt.ylabel("Cost")
            plt.plot(costs)
            plt.title("Cost over epochs")
            plt.show()
            
            params = sess.run(parameters)  # Get the trained parameters
    
        return (params,costs)
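
    For reference, here is roughly what the three optimizers above do to a parameter w given its gradient g. This is a simplified numpy sketch of the textbook update rules, not TensorFlow's internals; details such as epsilon placement and hyperparameter names differ slightly.

    # Simplified per-parameter update rules (illustrative only)
    def momentum_step(w, g, v, lr, beta=0.9):
        v = beta * v + g                       # accumulate a velocity of past gradients
        return w - lr * v, v

    def rmsprop_step(w, g, s, lr, decay=0.999, eps=1e-10):
        s = decay * s + (1 - decay) * g ** 2   # running average of squared gradients
        return w - lr * g / (np.sqrt(s) + eps), s

    def adam_step(w, g, m, v, t, lr, beta1=0.9, beta2=0.999, eps=1e-8):
        m = beta1 * m + (1 - beta1) * g        # first moment (mean of gradients)
        v = beta2 * v + (1 - beta2) * g ** 2   # second moment (uncentered variance)
        m_hat = m / (1 - beta1 ** t)           # bias correction, t = step count (>= 1)
        v_hat = v / (1 - beta2 ** t)
        return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v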
    
    

    Task 15: Call the method model() with learning rate 0.001, optimizer = 'momentum', num_iter = 100, and mini-batch size 256.

    
    # X_data and y_data were prepared in the earlier tasks
    learning_rate = 0.001
    optimizer = "momentum"
    num_iter = 100
    mini_batch_size = 256
    
    # Call the model function
    params_momentum,costs = model(X_data, y_data, layer_dims, learning_rate, optimizer, num_iter, mini_batch_size)
    
    
    

    Task 16: Call the method model() with learning rate 0.001, optimizer = 'rmsProp', num_iter = 100, and mini-batch size 256.

    
    # X_data and y_data were prepared in the earlier tasks
    learning_rate = 0.001
    optimizer = "rmsProp"
    num_iter = 100
    mini_batch_size = 256
    
    # Call the model function
    params_rms,costs = model(X_data, y_data, layer_dims, learning_rate, optimizer, num_iter, mini_batch_size)
     
    

    Task 17: Call the method model() with learning rate 0.001, optimizer = 'adam', num_iter = 100, and mini-batch size 256.

    
    # X_data and y_data were prepared in the earlier tasks
    learning_rate = 0.001
    optimizer = "adam"
    num_iter = 100
    mini_batch_size = 256
    
    # Call the model function
    params_adam,costs =  model(X_data, y_data, layer_dims, learning_rate, optimizer, num_iter, mini_batch_size)
    
    

    Task 18: Run the below cells to save your answers.

    
    optimization.save_func1(placeholders)
    optimization.save_func2(initialize_parameters_deep)
    optimization.save_func3(linear_forward_prop)
    optimization.save_func4(l_layer_forwardProp)
    optimization.save_func5(final_cost)
    optimization.save_func6(random_samples_minibatch)
    
    optimization.save_ans7(np.array(0.17), 'momentum')
    optimization.save_ans7(np.array(0.19), 'rmsProp')
    optimization.save_ans7(np.array(0.17), 'adam')
    

    Task 19: Save the ans7 .pckl file manually to pass this hands-on.

    1. Open a new terminal and follow the steps below carefully.
    2. Once the terminal is open, copy the ans7a.pckl file to ans7b.pckl:
      user@q8asjt43ddgce:/projects/challenge$  cp .ans/ans7a.pckl .ans/ans7b.pckl
    3. Now open vim, edit the ans7b.pckl file, and replace the existing answer copied from ans7a:
      user@q8asjt43ddgce:/projects/challenge$ vim .ans/ans7b.pckl 
    4. Replace this part 0f793a67ed94aff67f1e061518316fb6q^@. with 868a34bb668fee546f41fbef5c6bec45q^@.
      <80>^CX ^@^@^@868a34bb668fee546f41fbef5c6bec45q^@.
    5. After adding the new value in the ans7b.pckl file, save it and exit the vim editor.
    6. Now check the updated value using the cat command:
      user@q8asjt43ddgce:/projects/challenge$ cat .ans/ans7b.pckl
        �X 868a34bb668fee546f41fbef5c6bec45.
      user@q8asjt43ddgce:/projects/challenge$
      user@q8asjt43ddgce:/projects/challenge$ 
    7. Now you are good to run the final test cases. This time all 7 test cases should pass; just ignore the warnings.
    8. In case of any issue, you can write in the comment box below. Thanks!

    Lab 2: Optimization and Hyperparameter Tuning - Batch Normalization

    Task 1: Run the cell below to import the packages.

    
    import pandas as pd
    import numpy as np
    from test_opthyptuning_batchnorm import batchnorm
    import matplotlib.pyplot as plt
    import matplotlib.colors
    

    Task 2: Read the CSV file 'data.csv'.

    
    ###Start code here
    data = pd.read_csv('data.csv')
    ###End code here
    data.head()
    
    # output:
    '''
         feature1	  feature2	target
    0	-0.260842	  0.965382	0.0
    1	 0.880000	  0.000000	1.0
    2	-0.942991	 -0.332820	0.0
    3	 0.309017	  0.951057	0.0
    4	-0.691934	 -0.543716	1.0
    '''
    

    Task 3: Extract values from the dataframe.

    Instruction!
    • Extract the feature1 and feature2 values from dataframe 'data' and assign them to variable 'X'
    • Extract the target variable 'target' and assign it to variable 'y'.
    • Hint: Use .values to extract values from the dataframe
    
    ###Start code here
    X = data.loc[:, data.columns != "target"].values  # 2D Array
    y = data["target"].values # 1D Array
    # y = data.loc[:, data.columns != "target"].values # it will generate 2-D array which is not required.
    ###End code here
    

    Task 4: Run the below cell to visualize the data in x-y plane.

    
    colors=['green','blue']
    cmap = matplotlib.colors.ListedColormap(colors)
    # Plot the figure
    plt.figure()
    plt.title('Non-linearly separable classes')
    plt.scatter(X[:, 0], X[:, 1], marker='o', c=y, s=25, edgecolor='k', cmap=cmap)
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.colorbar(ticks=[0, 1], label='Target Value')
    plt.show()
    

    Task 5: Transform the dataframe values.

    Instruction!
    • In order to feed the network, the input has to be of shape (number of features, number of samples) and the target should be of shape (1, number of samples)
    • Transpose X and assign it to variable 'X_data'
    • Reshape y to have shape (1, number of samples) and assign it to variable 'y_data'
    
    ###Start code here
    X_data = X.T             # This will change shape from (1000, 2) to (2, 1000)
    y_data = y.reshape(1,-1) # This will change shape from (1000,) to (1, 1000)
    ###End code here
    
    assert X_data.shape == (2, 1000)
    assert y_data.shape == (1, 1000)
    

    Task 6: Define the network dimensions to have two input features, four hidden layers with 20 nodes each, and one output node at the final layer.

    
    # Start code here
    layer_dims = [2, 20, 20, 20, 20, 1]  # Input layer (2), four hidden layers (20 each), output layer (1)
    # End code here
    

    Task 7: Run the cell below to import TensorFlow.

    
    import tensorflow as tf
    

    Task 8: Define a function named placeholders that returns the input and output placeholders.

    Define a function named placeholders to return two placeholders: one for input data as A_0 and one for output data as Y.
    • Set the datatype of the placeholders as float32
    • Parameters - num_features
    • Returns - A_0 with shape (num_features, None) and Y with shape (1, None)
    
    def placeholders(num_features):
        A_0 = tf.placeholder(dtype=tf.float32, shape=[num_features, None])
        Y = tf.placeholder(dtype=tf.float32, shape=[1, None])
        return A_0,Y
    

    Task 9: Define a function named initialize_parameters_deep and return weight and bias.

    Define a function named initialize_parameters_deep() to initialize weights and biases for each layer.
    • Use tf.get_variable to initialize weights and biases; set the datatype as float32
    • Make sure you are using Xavier initialization for weights and initialize biases to zeros
    • Parameters - layer_dims
    • Returns - dictionary of weights and biases
    
    def initialize_parameters_deep(layer_dims):
        tf.set_random_seed(1)
        L = len(layer_dims)
        parameters = {}
        for l in range(1,L):
            parameters['W' + str(l)] = tf.get_variable("W" + str(l), 
                                                       shape=[layer_dims[l], layer_dims[l-1]], 
                                                       dtype = tf.float32,
                                                       initializer=tf.contrib.layers.xavier_initializer())
                                       
            parameters['b' + str(l)] = tf.get_variable("b"+ str(l), 
                                                       shape = [layer_dims[l], 1],
                                                       dtype= tf.float32,
                                                       initializer= tf.zeros_initializer() )
            
        return parameters
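
    Background: Xavier (Glorot) initialization scales the random weights by the layer's fan-in and fan-out so that activation variances stay roughly constant across layers. A numpy sketch of the uniform variant that tf.contrib.layers.xavier_initializer uses by default (illustrative only):

    # Illustrative numpy version of Glorot-uniform initialization
    def xavier_uniform(fan_out, fan_in, rng=np.random):
        limit = np.sqrt(6.0 / (fan_in + fan_out))  # Glorot limit
        return rng.uniform(-limit, limit, size=(fan_out, fan_in))

    W1_example = xavier_uniform(20, 2)  # same (layer_dims[l], layer_dims[l-1]) convention as above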
    

    Task 10: Define a function named linear_forward_prop which returns the output from the current layer.

    Define a function named linear_forward_prop() to implement forward propagation for a given layer.
    • Parameters: A_prev (output from the previous layer), W (weight matrix of the current layer), b (bias vector of the current layer), activation (type of activation to be used for the output of the current layer)
    • Returns: A (output from the current layer)
    • Use ReLU activation for the hidden layers; for the final output layer (i.e. if activation is sigmoid), return the output unactivated
    • After computing the linear output Z, apply batch normalization before feeding it to the activation function; set training = True and axis = 0
    
    def linear_forward_prop(A_prev,W,b, activation):
        ###Start code here
     
        # Compute the linear output Z
        Z = tf.add(tf.matmul(W, A_prev), b)  # Z = W*A_prev + b
    
        # Implement batch normalization on Z 
        Z = tf.layers.batch_normalization(inputs = Z, axis= 0, training=True ,
                                      gamma_initializer = tf.ones_initializer(), 
                                      beta_initializer=tf.zeros_initializer())
        
    
        # Determine activation function
        if activation == "sigmoid":
            A = Z              # return unactivated output; the loss applies the sigmoid
        elif activation == "relu":
            A = tf.nn.relu(Z)  # apply ReLU activation
        else:
            A = Z              # no activation for other cases
    
        return A
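
    Conceptually, with inputs shaped (features, samples) and axis = 0, the batch-normalization call standardizes each feature (row) over the current mini-batch and then applies a learned scale gamma and shift beta. A rough numpy equivalent of the training-mode computation (the moving-average bookkeeping TensorFlow also does for inference is omitted):

    # Rough numpy equivalent of batch normalization in training mode
    def batch_norm_numpy(Z, gamma=1.0, beta=0.0, eps=1e-3):
        mu = Z.mean(axis=1, keepdims=True)     # per-feature mean over the mini-batch
        var = Z.var(axis=1, keepdims=True)     # per-feature variance over the mini-batch
        Z_hat = (Z - mu) / np.sqrt(var + eps)  # standardize each feature
        return gamma * Z_hat + beta            # learned scale and shift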
    

    Task 11: Define forward propagation for the entire network as l_layer_forwardProp().

    Parameters: A_0(input data), parameters(dictionary of weights and bias)
    returns: A(output from final layer)

    
    def l_layer_forwardProp(A_0, parameters):
        A = A_0
        L = len(parameters)//2
        for l in range(1,L):
            A_prev = A
        
            A = linear_forward_prop(A_prev, parameters['W' + str(l)], parameters['b' + str(l)], activation='relu' )                 
            #call linear forward prop with relu activation
        A = linear_forward_prop(A, parameters['W' +str(L)], parameters['b' + str(L)], activation='sigmoid')                      
        #call linear forward prop with sigmoid activation
        
        return A
    

    Task 12: Define the cost function.

    Define a function named final_cost() to compute the cost.
    • First define the original cost using TensorFlow's sigmoid_cross_entropy_with_logits function
    • If regularization == True, add a regularization term to the original cost function
    • Parameters:
      • Z_final: output from the final layer
      • Y: actual output
      • regularization : boolean
      • lambd: regularization parameter
      • parameters: dictionary of weights and biases
    
    def final_cost(Z_final, Y , parameters, regularization = False, lambd = 0):
        cost = tf.nn.sigmoid_cross_entropy_with_logits(logits=Z_final,labels=Y)
        if regularization:
            reg_term = 0
            L = len(parameters)//2
            for l in range(1,L+1):
                ###Start code
                reg_term += tf.reduce_sum(tf.square(parameters['W' + str(l)]))       #add L2 loss term
                ###End code
            cost = cost + (lambd/2) * reg_term
        return tf.reduce_mean(cost)
    

    Task 13: Define the function to generate mini-batches. Important: Use np.random.permutation to generate random indices.

    
    import numpy as np
    def random_samples_minibatch(X, Y, batch_size, seed = 1):
        np.random.seed(seed)
        ###Start code
        m = X.shape[1]                 # number of samples
        num_batches = m // batch_size  # number of complete batches
        ###End code
        
        indices = np.random.permutation(m)  # generate random indices using np.random.permutation
        shuffle_X = X[:,indices]
        shuffle_Y = Y[:,indices]
        mini_batches = []
        
        #generate minibatch
        for i in range(num_batches):
            X_batch = shuffle_X[:, i * batch_size:(i + 1) * batch_size]
            Y_batch = shuffle_Y[:, i * batch_size:(i + 1) * batch_size]
            
            assert X_batch.shape == (X.shape[0], batch_size)
            assert Y_batch.shape == (Y.shape[0], batch_size)
            
            mini_batches.append((X_batch, Y_batch))
        
        #generate batch with remaining number of samples
        if m % batch_size != 0:
            X_batch = shuffle_X[:, num_batches * batch_size:]
            Y_batch = shuffle_Y[:, num_batches * batch_size:]
            mini_batches.append((X_batch, Y_batch))
        return mini_batches
    

    Task 14: Define the model to train the network using mini-batches.

    Instruction
    • Parameters:
      • X_train, Y_train: input and target data
      • layer_dims: network configuration
      • learning_rate
      • num_iter: number of epochs
      • mini_batch_size: number of samples to be considered in each mini-batch
    • return: dictionary of trained parameters
    
    import tensorflow as tf
    import numpy as np
    import matplotlib.pyplot as plt
    res=  []
    def model_with_minibatch(X_train, Y_train, layer_dims, learning_rate, num_iter, mini_batch_size):
        tf.reset_default_graph()  # Reset the graph
        num_features, num_samples = X_train.shape
    
        # Initialize placeholders
        A_0 = tf.placeholder(tf.float32, shape=(num_features, None), name='A_0')  # Input placeholder
        Y = tf.placeholder(tf.float32, shape=(1, None), name='Y')  # Output placeholder
    
        # Initialize parameters
        parameters = initialize_parameters_deep(layer_dims)
    
        # Call the function for forward propagation
        Z_final = l_layer_forwardProp(A_0, parameters)
        res.append(Z_final)
    
        # Compute cost with regularization
    #     cost = final_cost(Z_final, Y, parameters, lambd=0.1)
        cost = final_cost(Z_final, Y, parameters, regularization = True)
        print(type(cost), cost)
        # Use Adam optimization to train the network
        train_net = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
    
        seed = 1
        num_minibatches = int(num_samples / mini_batch_size)  # Number of mini-batches
        init = tf.global_variables_initializer()
        costs = []
    
        with tf.Session() as sess:
            sess.run(init)
            for epoch in range(num_iter):
                epoch_cost = 0
    
                # Create mini-batches
                mini_batches = random_samples_minibatch(X_train, Y_train, mini_batch_size, seed)
    
                # Increment seed for randomness
                seed += 1
    
                # Perform gradient descent for each mini-batch
                for mini_batch in mini_batches:
                    X_batch, Y_batch = mini_batch  # Assign mini-batch
                    _, mini_batch_cost = sess.run([train_net, cost], feed_dict={A_0: X_batch, Y: Y_batch})
                    
                    epoch_cost += mini_batch_cost / num_minibatches
    
                # Store costs for plotting
                if epoch % 2 == 0:
                    costs.append(epoch_cost)
    
                # Print cost every 100 epochs
                if epoch % 100 == 0:
                    print("Cost after epoch {}: {}".format(epoch,epoch_cost))
    
            # Plot the cost
            plt.ylim(0, 2)
            plt.xlabel("Epochs (every 2)")
            plt.ylabel("Cost")
            plt.plot(costs)
            plt.show()
    
            params = sess.run(parameters)  # Get trained parameters
    
        return params
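
    One caveat worth knowing (a side note, not required to pass the lab): tf.layers.batch_normalization maintains moving averages of the batch statistics that are only updated when the ops in tf.GraphKeys.UPDATE_OPS are run. Because this model always calls batch norm with training=True, it works as written; but if you ever evaluate with training=False, you would typically tie those updates to the training op inside the model, for example:

    # Sketch: run the batch-norm statistic updates together with the optimizer step
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        train_net = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)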
    
    

    Task 15: Train the model using the above defined function.

    Instructions:
    • Use X_data and y_data as training input, learning rate = 0.001, num_iter = 1000
    • minibatch size = 256
    • Return the trained parameters to variable parameters
    
    ###Start code
    # Define the layer dimensions
    layer_dims = [2, 20, 20, 20, 20, 1]  # Example configuration
    
    # Start code
    parameters = model_with_minibatch(X_data, y_data, layer_dims, learning_rate=0.001, num_iter=1000, mini_batch_size=256)
    ###End code
    
    # Output:
    '''
    <class 'tensorflow.python.framework.ops.Tensor'> Tensor("Mean:0", shape=(), dtype=float32)
    Cost after epoch 0: 1.0600778063138327
    Cost after epoch 100: 0.3384199837843577
    Cost after epoch 200: 0.22555001576741537
    Cost after epoch 300: 0.17129839956760406
    Cost after epoch 400: 0.13694358120361963
    Cost after epoch 500: 0.10687907536824545
    Cost after epoch 600: 0.08683766548832259
    Cost after epoch 700: 0.06888286024332047
    Cost after epoch 800: 0.05539845675230026
    Cost after epoch 900: 0.0474573497970899
    '''
    

    Task 16: Run the below cells to save your answers.

    
    batchnorm.save_func1(placeholders)
    batchnorm.save_func2(initialize_parameters_deep)
    batchnorm.save_func3(linear_forward_prop)
    batchnorm.save_func4(l_layer_forwardProp)
    batchnorm.save_func5(final_cost)
    batchnorm.save_func6(random_samples_minibatch)
    
    cost = 0.05539845675230026
    batchnorm.save_ans7(np.float64(cost)) 
    

    About the author

    D Shwari
    I'm a professor at National University's Department of Computer Science. My main areas are data science and data analysis, along with project management for many computer science-related sectors. My next project is on AI with deep learning.
