ML – A Novice Series

For the last couple of weeks, I have been trying to learn more about machine learning. The obvious path was to soak up as much as I could from various blog posts, but I wasn't getting everything that I needed. I bought Data Science from Scratch by Joel Grus. It was really good and gave me a great introduction to the concepts and terms; I read it front to back and reread some chapters many (many) times. I felt like I was off to a good start, but I needed more textbook-style content. Deep Learning with Python turned out to be that book.

Deep Learning with Python is a 2018 book by Francois Chollet, the creator of Keras. As I go through the chapters, I am going to post here about what I understood from the text, with an example where possible.

I watched a PluralSight video on ML and it talked about a Google site called Colab. I had been using Kaggle, but I think Colab has more power. The first thing I noticed in Colab was the code completion. This tool was built for engineers, so I should not have been surprised; they really did a great job. Did I mention that it is free? You only need a Google account.

Moving on. Disclaimer: I may make mistakes or omit pertinent concepts, since I am learning as I go. After going over tensors and what they are, the book jumps into a classification problem. It looks at reviews from IMDB to determine whether they are positive or negative. Since there are only two possible states (positive/negative), this is defined as a binary classification. Another cool thing about Keras is that it comes with datasets for you to experiment with.

Starting with the IMDB import, you can extract out your training and testing datasets.

from keras.datasets import imdb

(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)

The load_data call takes num_words, which, like its name says, limits the data to the most frequently used words, in this case the top 10,000. I think it is obvious that the data is broken into training and testing sets. What might not be obvious is why the data and the labels are stored as a pair. If you think about the basic linear equation, you have y = mx + b. In this case, you are given the x and the y (data/label) and you need to solve for the m and b. Thinking back to algebra, we remember that m is the slope of the line and b is the offset. We need to find the m and b that make the equation correct. This is what the neural network will do: you give it the inputs and outputs, and it solves for the remaining variables, adjusting the m and b until it gets to a certain level of accuracy. Word soup.
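If you want to see what load_data actually hands back, a quick inspection like this (my own check, not from the book) makes it concrete; each review is a list of word indices and each label is 0 (negative) or 1 (positive):

print(len(train_data), len(test_data))        # 25000 25000
print(train_data[0][:10])                     # first ten word indices of the first review
print(train_labels[0])                        # 0 = negative, 1 = positive
print(max(max(seq) for seq in train_data))    # stays below 10000 because of num_words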

Now that we have our training and test data partitioned, we can start to see how to use it to train the machine to predict whether a review is positive or negative. A network can only take numbers, and more specifically tensors, so we need to change each list of words into a vector. There is a lot of information between where we are now and where we want to be, so I will just show how we do that.

import numpy as np

def vectorize_sequences(sequences, dimension=10000):
  # one row per review, one column per possible word index, all zeros to start
  results = np.zeros((len(sequences), dimension))
  for i, sequence in enumerate(sequences):
    # set the columns for the word indices that appear in this review to 1
    results[i, sequence] = 1
  return results

This creates a matrix with one row per review and 10,000 columns, all set to zero. The loop then goes over each sequence and sets a cell to 1 wherever that word's index appears in the sequence. The sequence is simply a list of the indices of the words in the review. Now that we can turn the individual lists into tensors, we can convert our training and test data to a collection of these tensors:

x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)
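To make the one-hot idea concrete, here is a tiny toy example of my own (not from the book) using a "review" that contains only word indices 1, 3, and 5:

sample = vectorize_sequences([[1, 3, 5]], dimension=8)
print(sample)  # [[0. 1. 0. 1. 0. 1. 0. 0.]]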

We also need to make our labels a one-dimensional array of floats:

y_train = np.asarray(train_labels).astype('float32')
y_test = np.asarray(test_labels).astype('float32')
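A quick shape check (again, just my own sanity check) shows what the network will actually see: 25,000 training reviews, each turned into a 10,000-wide vector, plus a flat array of float labels.

print(x_train.shape)  # (25000, 10000)
print(y_train.shape)  # (25000,)
print(y_train.dtype)  # float32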

Our data is now prepared and ready to go, so we need to configure our network. There is a key concept here that needs to be understood: the activation function. The activation function transforms the output of each neuron before it is passed along to the next layer. There are many different types of activation functions, but in this example he uses the Rectified Linear Unit (relu) function. You can check out the link for more information.
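To get a feel for what these functions do, here is a rough numpy sketch of my own (not from the book) of relu and of the sigmoid used in the output layer below:

import numpy as np

def relu(x):
  # passes positive values through unchanged and clips negatives to zero
  return np.maximum(0, x)

def sigmoid(x):
  # squashes any real number into the range (0, 1)
  return 1 / (1 + np.exp(-x))

print(relu(np.array([-2.0, 0.5, 3.0])))     # [0.  0.5 3. ]
print(sigmoid(np.array([-2.0, 0.0, 2.0])))  # roughly [0.12 0.5 0.88]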

Since a neural network consists of an input layer, an output layer, and one or more hidden layers, we need to configure those layers.

# We have to make sure to import the model and layer objects
from keras import models
from keras import layers

model = models.Sequential()
model.add(layers.Dense(16, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(16, activation='relu')) 
model.add(layers.Dense(1, activation='sigmoid')) 

Ok, we have set up the model. Let's break it down. Since we are working in layers, we define this model to be Sequential. Then we configure two hidden layers of 16 units each; my understanding is that this means 16 neurons per layer, but I could be wrong. We also define the shape of the input data, which is our 10,000-element-wide tensor. Lastly, since this is a binary classification, we have a single output neuron, and its sigmoid activation squashes the result to a value between 0 and 1.
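If you want to confirm the layer sizes, model.summary() (my own addition here) prints each layer with its parameter count:

model.summary()
# Dense, 16 units: 10000 inputs * 16 weights + 16 biases = 160,016 parameters
# Dense, 16 units: 16 * 16 + 16 = 272 parameters
# Dense, 1 unit:   16 * 1 + 1 = 17 parameters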

Before we send our training data through the model, we need to compile it. We will compile it with the RMSprop optimizer and the binary cross-entropy loss function, which work well for binary classification problems. The metrics parameter tells Keras to also track accuracy as it trains, and we will see those per-epoch values later in the history that fit() returns.

model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])

We will set aside the first 10,000 sequences for validation and train on the rest.

x_val = x_train[:10000]
partial_x_train = x_train[10000:]

y_val = y_train[:10000]
partial_y_train = y_train[10000:]

Now we are ready to train the machine! We need to define how many epochs (full passes over the training data) it will run and the batch size. We also pass in the validation data.

history = model.fit(partial_x_train, partial_y_train, epochs=5, batch_size=512, validation_data=(x_val, y_val))

Executing this command starts the training: given an input x, the model learns to predict y. As it loops over the epochs, you should see output like:

Epoch 1/5
30/30 [==============================] - 2s 58ms/step - loss: 0.5071 - accuracy: 0.7931 - val_loss: 0.3831 - val_accuracy: 0.8645

Then for visualization we can look at the training loss and the validation loss:

import matplotlib.pyplot as plt

# history.history is a dictionary of the per-epoch values collected during fit()
history_dict = history.history
loss_values = history_dict['loss']
val_loss_values = history_dict['val_loss']

epochs = range(1, len(loss_values) + 1)

plt.plot(epochs, loss_values, 'bo', label='Training Loss')
plt.plot(epochs, val_loss_values, 'b', label='Validation Loss')

plt.title('Training and Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()

You should see a plot of the training loss and validation loss over the epochs.
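Since we told compile() to track accuracy, the same history dictionary also has 'accuracy' and 'val_accuracy' entries (the names match the epoch output above). This parallel plot is my own addition, reusing history_dict and epochs from the code above:

acc_values = history_dict['accuracy']
val_acc_values = history_dict['val_accuracy']

plt.plot(epochs, acc_values, 'bo', label='Training Accuracy')
plt.plot(epochs, val_acc_values, 'b', label='Validation Accuracy')
plt.title('Training and Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()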

Each time I ran through this example I got slightly different values, because the network starts from random initial weights and shuffles the training data. The only thing left is to test out our machine and see how it performs on the test data.

model.predict(x_test)


When I print out the predictions, the results are so-so:

[[0.06688562]
 [0.99754685]
 [0.9261165 ]
 ...
 [0.11254096]
 [0.04644495]
 [0.90793926]]
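Raw probabilities are hard to eyeball, so a quick check I added myself here is to score the whole test set and threshold the probabilities at 0.5:

results = model.evaluate(x_test, y_test)   # returns [test loss, test accuracy]
print(results)

predictions = (model.predict(x_test) > 0.5).astype('int32')
print(predictions[:3])  # 0, 1, 1 for the three reviews shown above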

Looking at the raw probabilities, you can see that some predictions look good and some look terrible. I guess that makes sense, since so much of what people write cannot simply be distilled down to positive or negative, but this is still impressive. The words in this post are mine and not copied from any other source, but all credit goes to Francois Chollet. This was a simple binary classification; the next one is a multiclass classification problem where the answer can be one of 46 different classes. Here is the colab. Anyways, I will do more reading and report back. The full code is also on github.