5. Sentiment Analysis

You should be able to do this exercise after Lecture 8.

In this exercise we use the IMDb dataset to perform sentiment analysis. The code below assumes that the data is placed in the same folder as this notebook. We see that the reviews are loaded as a pandas DataFrame, and we print the beginning of the first few reviews.

In [1]:
%matplotlib inline

import pandas as pd
import numpy as np
from numpy.random import seed
import matplotlib.pyplot as plt

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras import regularizers, optimizers
from tensorflow.keras.utils import to_categorical

from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer

reviews = pd.read_csv('reviews.txt', header=None)
labels = pd.read_csv('labels.txt', header=None)
Y = (labels=='positive').astype(np.int_)

print(type(reviews))
print(reviews.head())
<class 'pandas.core.frame.DataFrame'>
                                                   0
0  bromwell high is a cartoon comedy . it ran at ...
1  story of a man who has unnatural feelings for ...
2  homelessness  or houselessness as george carli...
3  airport    starts as a brand new luxury    pla...
4  brilliant over  acting by lesley ann warren . ...

First we load the reviews and labels, and convert the labels from positive and negative to numerical values 0 and 1.

(a) Split the reviews and labels into train, validation and test sets. The train and validation sets will be used to train your model and tune hyperparameters; the test set will be saved for final testing.

In [2]:
X_train_val, X_test, Y_train_val, Y_test = train_test_split(reviews, labels, random_state=69, stratify=labels)
X_train, X_val, Y_train, Y_val = train_test_split(X_train_val, Y_train_val, random_state=69)
print("Size of training set:{}".format(X_train.shape[0]))
print("Size of validation set:{}".format(X_val.shape[0]))
print("Size of test set:{}".format(X_test.shape[0]))
X_train.head()
Size of training set:14062
Size of validation set:4688
Size of test set:6250
Out[2]:
0
3720 i always follow the dakar so when my husband ...
21332 as with that film we follow the implausible if...
16618 one of several musicals about sailors on leave...
9428 whereas the hard boiled detective stories of ...
3067 when i first watched robotboy i found it fres...

(b) Use the CountVectorizer from sklearn.feature_extraction.text to create a Bag-of-Words representation of the reviews. (See an example of how to do this in chapter 7 of Müller and Guido.) Only use the 10,000 most frequent words (use the max_features parameter of CountVectorizer).

In [3]:
vect = CountVectorizer(max_features=10_000).fit(X_train[0])
X_train = vect.transform(X_train[0])
X_val = vect.transform(X_val[0])
X_test = vect.transform(X_test[0])

(c) Explore the representation of the reviews. How is a single word represented? How about a whole review?

Each review is vectorized into an array of counts of how many times each word in the vocabulary occurs in that review. A single word corresponds to one column index, and a whole review is a row of 10,000 counts, stored as a sparse vector since most entries are zero. For example, [0, 1, 1] would mean that the 2nd and 3rd word of the vocabulary each occur once. Below we display the count vector of the first training review together with the vocabulary learned by the vectorizer.

In [6]:
display(X_train[0].toarray())
display(vect.get_feature_names_out())
array([[0, 0, 0, ..., 0, 0, 0]], dtype=int64)
array(['aaron', 'abandon', 'abandoned', ..., 'zoom', 'zorro', 'zu'],
      dtype=object)
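
To see how a single word and a whole review are represented, we can look up a word's column index in the vectorizer's vocabulary and list the non-zero entries of one review's count vector. A minimal sketch, assuming the vect and X_train objects defined above:

In [ ]:
# A single word is just a column index in the count matrix.
idx = vect.vocabulary_['abandoned']
print("Column index of 'abandoned':", idx)

# A whole review is a (mostly zero) row of 10,000 counts; list the words that
# actually occur in the first training review together with their counts.
row = X_train[0]                      # 1 x 10,000 sparse row
words = vect.get_feature_names_out()
for col, count in zip(row.indices, row.data):
    print(words[col], count)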

(d) Train a neural network with a single hidden layer on the dataset, tuning the relevant hyperparameters to optimize accuracy.

In [7]:
X_train, X_test, Y_train, Y_test = train_test_split(reviews, labels, random_state=69, stratify=labels)

# Convert the Y_train and Y_test labels to 0/1 via dummy encoding (1 = positive)
Y_train_dummies = pd.get_dummies(Y_train)
Y_train = Y_train_dummies['0_positive'].values
Y_test_dummies = pd.get_dummies(Y_test)
Y_test = Y_test_dummies['0_positive'].values

vect = CountVectorizer(max_features=10_000).fit(X_train[0])
X_train = vect.transform(X_train[0])
X_test = vect.transform(X_test[0])

We already split the dataset into train, validation and test sets above; however, this is not strictly necessary, since Keras provides a validation_split parameter in fit(). Therefore, in the code above we simply split the dataset into train and test, and later pass validation_split=0.2 to fit() so that 20% of the training data is held out for validation.

We will also use early stopping, which halts training once the monitored quantity (here the training loss) has not improved for a number of epochs given by the patience parameter, rather than always running for the full number of epochs.

In [15]:
seed(69)
tf.random.set_seed(69)
input_size = 10_000

callback = tf.keras.callbacks.EarlyStopping(monitor = 'loss', patience = 3)

#initialize a neural network
model = Sequential()
# hidden layer
model.add(Dense(units=512, activation='tanh', input_dim=input_size, kernel_regularizer=regularizers.l2(0.001)))
# output layer
model.add(Dense(units=2, activation='softmax'))

sgd = optimizers.SGD(learning_rate = 0.1)

model.compile(loss='sparse_categorical_crossentropy', optimizer = sgd, metrics = ['accuracy'])

history = model.fit(X_train.toarray(), Y_train, epochs=50, batch_size=50, verbose=1, validation_split=0.2, callbacks=[callback])
Epoch 1/50
300/300 [==============================] - 8s 25ms/step - loss: 1.6220 - accuracy: 0.6623 - val_loss: 1.3302 - val_accuracy: 0.7987
Epoch 2/50
300/300 [==============================] - 8s 25ms/step - loss: 1.3492 - accuracy: 0.7482 - val_loss: 1.3220 - val_accuracy: 0.7208
Epoch 3/50
300/300 [==============================] - 8s 26ms/step - loss: 1.2026 - accuracy: 0.7842 - val_loss: 1.1175 - val_accuracy: 0.8069
Epoch 4/50
300/300 [==============================] - 8s 25ms/step - loss: 1.1001 - accuracy: 0.7996 - val_loss: 1.0199 - val_accuracy: 0.8237
Epoch 5/50
300/300 [==============================] - 8s 25ms/step - loss: 1.0133 - accuracy: 0.8127 - val_loss: 0.9309 - val_accuracy: 0.8565
Epoch 6/50
300/300 [==============================] - 8s 25ms/step - loss: 0.9249 - accuracy: 0.8304 - val_loss: 0.8541 - val_accuracy: 0.8563
Epoch 7/50
300/300 [==============================] - 8s 25ms/step - loss: 0.8530 - accuracy: 0.8393 - val_loss: 0.7861 - val_accuracy: 0.8688
Epoch 8/50
300/300 [==============================] - 8s 25ms/step - loss: 0.7947 - accuracy: 0.8444 - val_loss: 0.7355 - val_accuracy: 0.8691
Epoch 9/50
300/300 [==============================] - 8s 26ms/step - loss: 0.7427 - accuracy: 0.8494 - val_loss: 0.8121 - val_accuracy: 0.8000
Epoch 10/50
300/300 [==============================] - 8s 25ms/step - loss: 0.7248 - accuracy: 0.8411 - val_loss: 0.6676 - val_accuracy: 0.8723
Epoch 11/50
300/300 [==============================] - 8s 25ms/step - loss: 0.6724 - accuracy: 0.8550 - val_loss: 0.6582 - val_accuracy: 0.8539
Epoch 12/50
300/300 [==============================] - 8s 25ms/step - loss: 0.6526 - accuracy: 0.8472 - val_loss: 0.6464 - val_accuracy: 0.8520
Epoch 13/50
300/300 [==============================] - 8s 25ms/step - loss: 0.6080 - accuracy: 0.8577 - val_loss: 0.5917 - val_accuracy: 0.8661
Epoch 14/50
300/300 [==============================] - 8s 25ms/step - loss: 0.5883 - accuracy: 0.8577 - val_loss: 0.5784 - val_accuracy: 0.8603
Epoch 15/50
300/300 [==============================] - 8s 25ms/step - loss: 0.5689 - accuracy: 0.8595 - val_loss: 0.9543 - val_accuracy: 0.7096
Epoch 16/50
300/300 [==============================] - 8s 25ms/step - loss: 0.5372 - accuracy: 0.8664 - val_loss: 0.5799 - val_accuracy: 0.8341
Epoch 17/50
300/300 [==============================] - 8s 25ms/step - loss: 0.5048 - accuracy: 0.8739 - val_loss: 0.6544 - val_accuracy: 0.8024
Epoch 18/50
300/300 [==============================] - 8s 25ms/step - loss: 0.4976 - accuracy: 0.8717 - val_loss: 0.5281 - val_accuracy: 0.8520
Epoch 19/50
300/300 [==============================] - 8s 25ms/step - loss: 0.4847 - accuracy: 0.8724 - val_loss: 0.5001 - val_accuracy: 0.8627
Epoch 20/50
300/300 [==============================] - 8s 25ms/step - loss: 0.4676 - accuracy: 0.8788 - val_loss: 0.4791 - val_accuracy: 0.8725
Epoch 21/50
300/300 [==============================] - 8s 26ms/step - loss: 0.4680 - accuracy: 0.8729 - val_loss: 0.5696 - val_accuracy: 0.8179
Epoch 22/50
300/300 [==============================] - 8s 25ms/step - loss: 0.4678 - accuracy: 0.8718 - val_loss: 0.4728 - val_accuracy: 0.8688
Epoch 23/50
300/300 [==============================] - 8s 25ms/step - loss: 0.4312 - accuracy: 0.8828 - val_loss: 0.5651 - val_accuracy: 0.8189
Epoch 24/50
300/300 [==============================] - 8s 25ms/step - loss: 0.4337 - accuracy: 0.8763 - val_loss: 0.6795 - val_accuracy: 0.7277
Epoch 25/50
300/300 [==============================] - 8s 25ms/step - loss: 0.4389 - accuracy: 0.8766 - val_loss: 0.5237 - val_accuracy: 0.8291
Epoch 26/50
300/300 [==============================] - 8s 25ms/step - loss: 0.4091 - accuracy: 0.8854 - val_loss: 0.4616 - val_accuracy: 0.8643
Epoch 27/50
300/300 [==============================] - 8s 25ms/step - loss: 0.4230 - accuracy: 0.8821 - val_loss: 0.5265 - val_accuracy: 0.8363
Epoch 28/50
300/300 [==============================] - 8s 25ms/step - loss: 0.4086 - accuracy: 0.8835 - val_loss: 0.6940 - val_accuracy: 0.7435
Epoch 29/50
300/300 [==============================] - 8s 25ms/step - loss: 0.4169 - accuracy: 0.8786 - val_loss: 0.4600 - val_accuracy: 0.8621
Epoch 30/50
300/300 [==============================] - 8s 25ms/step - loss: 0.4122 - accuracy: 0.8794 - val_loss: 0.4711 - val_accuracy: 0.8515
Epoch 31/50
300/300 [==============================] - 8s 25ms/step - loss: 0.4059 - accuracy: 0.8809 - val_loss: 0.4623 - val_accuracy: 0.8557
Epoch 32/50
300/300 [==============================] - 8s 25ms/step - loss: 0.4188 - accuracy: 0.8763 - val_loss: 0.5176 - val_accuracy: 0.8325
Epoch 33/50
300/300 [==============================] - 8s 25ms/step - loss: 0.3928 - accuracy: 0.8855 - val_loss: 0.4537 - val_accuracy: 0.8629
Epoch 34/50
300/300 [==============================] - 8s 25ms/step - loss: 0.3862 - accuracy: 0.8921 - val_loss: 0.4812 - val_accuracy: 0.8528
Epoch 35/50
300/300 [==============================] - 8s 25ms/step - loss: 0.3885 - accuracy: 0.8867 - val_loss: 0.5025 - val_accuracy: 0.8400
Epoch 36/50
300/300 [==============================] - 8s 26ms/step - loss: 0.3968 - accuracy: 0.8845 - val_loss: 0.4552 - val_accuracy: 0.8571
Epoch 37/50
300/300 [==============================] - 8s 26ms/step - loss: 0.3920 - accuracy: 0.8866 - val_loss: 0.4860 - val_accuracy: 0.8493

As we can see above, the model stopped early after 37 epochs instead of the 50 we specified.

In [16]:
plt.figure()
plt.title("Loss curves")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.plot(history.history['loss'], label = 'train')
plt.plot(history.history['val_loss'], label = 'valid')
plt.legend()
plt.show()
In [17]:
plt.figure()
plt.title("Accuracy curves")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.plot(history.history['accuracy'], label = 'train')
plt.plot(history.history['val_accuracy'], label = 'valid')
plt.legend()
plt.show()

After running the model we can evaluate the loss and accuracy. The model stops with a final validation accuracy of about 85% and a validation loss of 0.4860. As can be seen in the diagrams, the validation scores vary and spike a lot; lowering the learning rate or adding more layers could help smooth out the curves.
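
As an illustration of the "more layers" option (not trained in this notebook), a deeper variant with a second, smaller hidden layer could be defined as in the sketch below; the layer size of 128 is an arbitrary choice, and the rest mirrors the model above.

In [ ]:
# Hypothetical deeper variant (not trained here): two hidden layers instead of one.
deep_model = Sequential()
deep_model.add(Dense(units=512, activation='tanh', input_dim=input_size,
                     kernel_regularizer=regularizers.l2(0.001)))
deep_model.add(Dense(units=128, activation='tanh',
                     kernel_regularizer=regularizers.l2(0.001)))
deep_model.add(Dense(units=2, activation='softmax'))
deep_model.compile(loss='sparse_categorical_crossentropy',
                   optimizer=optimizers.SGD(learning_rate=0.05),
                   metrics=['accuracy'])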

Below we instead rerun the single-hidden-layer model with a lower learning rate of 0.05 and the same 50 epochs, to isolate the effect of the learning rate.

In [18]:
input_size = 10_000

callback = tf.keras.callbacks.EarlyStopping(monitor = 'loss', patience = 3)

#initialize a neural network
model2 = Sequential()
# hidden layer
model2.add(Dense(units=512, activation='tanh', input_dim=input_size, kernel_regularizer=regularizers.l2(0.001)))
# output layer
model2.add(Dense(units=2, activation='softmax'))

sgd = optimizers.SGD(learning_rate = 0.05)

model2.compile(loss='sparse_categorical_crossentropy', optimizer = sgd, metrics = ['accuracy'])

history = model2.fit(X_train.toarray(), Y_train, epochs=50, batch_size=50, verbose=1, validation_split=0.2, callbacks=[callback])
Epoch 1/50
300/300 [==============================] - 8s 26ms/step - loss: 1.6119 - accuracy: 0.6753 - val_loss: 1.3679 - val_accuracy: 0.8187
Epoch 2/50
300/300 [==============================] - 7s 25ms/step - loss: 1.3867 - accuracy: 0.7697 - val_loss: 1.3731 - val_accuracy: 0.7459
Epoch 3/50
300/300 [==============================] - 7s 25ms/step - loss: 1.2794 - accuracy: 0.8019 - val_loss: 1.2478 - val_accuracy: 0.7968
Epoch 4/50
300/300 [==============================] - 7s 25ms/step - loss: 1.2034 - accuracy: 0.8205 - val_loss: 1.1508 - val_accuracy: 0.8365
Epoch 5/50
300/300 [==============================] - 7s 25ms/step - loss: 1.1304 - accuracy: 0.8341 - val_loss: 1.1724 - val_accuracy: 0.8083
Epoch 6/50
300/300 [==============================] - 7s 25ms/step - loss: 1.0713 - accuracy: 0.8465 - val_loss: 1.0245 - val_accuracy: 0.8640
Epoch 7/50
300/300 [==============================] - 7s 25ms/step - loss: 1.0253 - accuracy: 0.8471 - val_loss: 1.0172 - val_accuracy: 0.8448
Epoch 8/50
300/300 [==============================] - 7s 25ms/step - loss: 0.9791 - accuracy: 0.8576 - val_loss: 0.9375 - val_accuracy: 0.8699
Epoch 9/50
300/300 [==============================] - 8s 25ms/step - loss: 0.9314 - accuracy: 0.8665 - val_loss: 0.9276 - val_accuracy: 0.8581
Epoch 10/50
300/300 [==============================] - 8s 25ms/step - loss: 0.8871 - accuracy: 0.8679 - val_loss: 0.8680 - val_accuracy: 0.8763
Epoch 11/50
300/300 [==============================] - 8s 25ms/step - loss: 0.8454 - accuracy: 0.8738 - val_loss: 0.8394 - val_accuracy: 0.8739
Epoch 12/50
300/300 [==============================] - 8s 25ms/step - loss: 0.8242 - accuracy: 0.8731 - val_loss: 0.8431 - val_accuracy: 0.8595
Epoch 13/50
300/300 [==============================] - 8s 25ms/step - loss: 0.7797 - accuracy: 0.8809 - val_loss: 0.8114 - val_accuracy: 0.8616
Epoch 14/50
300/300 [==============================] - 8s 25ms/step - loss: 0.7570 - accuracy: 0.8831 - val_loss: 0.7686 - val_accuracy: 0.8683
Epoch 15/50
300/300 [==============================] - 8s 25ms/step - loss: 0.7235 - accuracy: 0.8875 - val_loss: 0.9167 - val_accuracy: 0.8011
Epoch 16/50
300/300 [==============================] - 8s 25ms/step - loss: 0.6963 - accuracy: 0.8883 - val_loss: 0.7884 - val_accuracy: 0.8288
Epoch 17/50
300/300 [==============================] - 8s 25ms/step - loss: 0.6533 - accuracy: 0.8994 - val_loss: 0.7225 - val_accuracy: 0.8637
Epoch 18/50
300/300 [==============================] - 8s 25ms/step - loss: 0.6355 - accuracy: 0.9010 - val_loss: 0.7082 - val_accuracy: 0.8613
Epoch 19/50
300/300 [==============================] - 8s 25ms/step - loss: 0.6068 - accuracy: 0.9058 - val_loss: 0.6745 - val_accuracy: 0.8707
Epoch 20/50
300/300 [==============================] - 8s 25ms/step - loss: 0.5923 - accuracy: 0.9067 - val_loss: 0.6519 - val_accuracy: 0.8749
Epoch 21/50
300/300 [==============================] - 8s 26ms/step - loss: 0.5794 - accuracy: 0.9029 - val_loss: 1.0427 - val_accuracy: 0.7077
Epoch 22/50
300/300 [==============================] - 7s 25ms/step - loss: 0.5613 - accuracy: 0.9049 - val_loss: 0.6366 - val_accuracy: 0.8701
Epoch 23/50
300/300 [==============================] - 8s 25ms/step - loss: 0.5329 - accuracy: 0.9115 - val_loss: 1.2642 - val_accuracy: 0.6667
Epoch 24/50
300/300 [==============================] - 7s 25ms/step - loss: 0.5265 - accuracy: 0.9107 - val_loss: 0.7618 - val_accuracy: 0.7971
Epoch 25/50
300/300 [==============================] - 7s 25ms/step - loss: 0.5054 - accuracy: 0.9150 - val_loss: 0.6086 - val_accuracy: 0.8632
Epoch 26/50
300/300 [==============================] - 7s 25ms/step - loss: 0.4705 - accuracy: 0.9231 - val_loss: 0.6847 - val_accuracy: 0.8344
Epoch 27/50
300/300 [==============================] - 7s 25ms/step - loss: 0.4829 - accuracy: 0.9193 - val_loss: 0.6838 - val_accuracy: 0.8376
Epoch 28/50
300/300 [==============================] - 7s 25ms/step - loss: 0.4573 - accuracy: 0.9212 - val_loss: 0.5926 - val_accuracy: 0.8691
Epoch 29/50
300/300 [==============================] - 7s 25ms/step - loss: 0.4543 - accuracy: 0.9197 - val_loss: 0.5779 - val_accuracy: 0.8688
Epoch 30/50
300/300 [==============================] - 7s 25ms/step - loss: 0.4422 - accuracy: 0.9246 - val_loss: 0.5949 - val_accuracy: 0.8547
Epoch 31/50
300/300 [==============================] - 8s 25ms/step - loss: 0.4240 - accuracy: 0.9297 - val_loss: 0.6040 - val_accuracy: 0.8571
Epoch 32/50
300/300 [==============================] - 8s 25ms/step - loss: 0.4287 - accuracy: 0.9242 - val_loss: 0.6082 - val_accuracy: 0.8501
Epoch 33/50
300/300 [==============================] - 8s 25ms/step - loss: 0.4125 - accuracy: 0.9308 - val_loss: 0.5575 - val_accuracy: 0.8680
Epoch 34/50
300/300 [==============================] - 8s 26ms/step - loss: 0.3842 - accuracy: 0.9361 - val_loss: 0.5787 - val_accuracy: 0.8557
Epoch 35/50
300/300 [==============================] - 7s 25ms/step - loss: 0.3846 - accuracy: 0.9360 - val_loss: 0.5532 - val_accuracy: 0.8525
Epoch 36/50
300/300 [==============================] - 8s 25ms/step - loss: 0.3767 - accuracy: 0.9374 - val_loss: 0.5956 - val_accuracy: 0.8461
Epoch 37/50
300/300 [==============================] - 8s 26ms/step - loss: 0.3985 - accuracy: 0.9277 - val_loss: 0.6476 - val_accuracy: 0.8224
Epoch 38/50
300/300 [==============================] - 8s 26ms/step - loss: 0.3588 - accuracy: 0.9405 - val_loss: 0.5793 - val_accuracy: 0.8467
Epoch 39/50
300/300 [==============================] - 8s 25ms/step - loss: 0.3535 - accuracy: 0.9430 - val_loss: 0.5505 - val_accuracy: 0.8621
Epoch 40/50
300/300 [==============================] - 8s 25ms/step - loss: 0.3542 - accuracy: 0.9433 - val_loss: 0.5214 - val_accuracy: 0.8584
Epoch 41/50
300/300 [==============================] - 8s 25ms/step - loss: 0.3617 - accuracy: 0.9377 - val_loss: 0.5226 - val_accuracy: 0.8653
Epoch 42/50
300/300 [==============================] - 8s 25ms/step - loss: 0.3518 - accuracy: 0.9439 - val_loss: 0.6717 - val_accuracy: 0.8069
Epoch 43/50
300/300 [==============================] - 8s 25ms/step - loss: 0.3387 - accuracy: 0.9417 - val_loss: 0.6040 - val_accuracy: 0.8341
Epoch 44/50
300/300 [==============================] - 8s 25ms/step - loss: 0.3298 - accuracy: 0.9434 - val_loss: 0.5819 - val_accuracy: 0.8475
Epoch 45/50
300/300 [==============================] - 8s 25ms/step - loss: 0.3256 - accuracy: 0.9422 - val_loss: 0.5307 - val_accuracy: 0.8408
Epoch 46/50
300/300 [==============================] - 8s 25ms/step - loss: 0.3106 - accuracy: 0.9499 - val_loss: 0.6801 - val_accuracy: 0.8144
Epoch 47/50
300/300 [==============================] - 8s 25ms/step - loss: 0.3113 - accuracy: 0.9480 - val_loss: 0.7106 - val_accuracy: 0.7659
Epoch 48/50
300/300 [==============================] - 8s 25ms/step - loss: 0.3168 - accuracy: 0.9469 - val_loss: 0.4893 - val_accuracy: 0.8643
Epoch 49/50
300/300 [==============================] - 8s 25ms/step - loss: 0.2864 - accuracy: 0.9549 - val_loss: 0.5341 - val_accuracy: 0.8619
Epoch 50/50
300/300 [==============================] - 8s 25ms/step - loss: 0.3178 - accuracy: 0.9475 - val_loss: 0.5115 - val_accuracy: 0.8643

We can see that with the lower learning rate the final validation accuracy was about 86% and the validation loss 0.5115, which is very close to the previous run.

In [19]:
plt.figure()
plt.title("Loss curves")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.plot(history.history['loss'], label = 'train')
plt.plot(history.history['val_loss'], label = 'valid')
plt.legend()
plt.show()
In [20]:
plt.figure()
plt.title("Accuracy curves")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.plot(history.history['accuracy'], label = 'train')
plt.plot(history.history['val_accuracy'], label = 'valid')
plt.legend()
plt.show()

The second run behaves much like the first: the curves are still spiky and the accuracy scores are about the same. The first run stopped early, while the second did not, so running it for longer would probably give slightly better results, as its accuracy and loss were still evening out.
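
An alternative we did not use here would be to monitor the validation loss instead of the training loss and let Keras restore the weights from the best epoch. A minimal sketch of such a callback:

In [ ]:
# Hypothetical callback: stop once the validation loss has stagnated for 3 epochs
# and roll back to the weights from the epoch with the lowest validation loss.
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss',
                                              patience=3,
                                              restore_best_weights=True)
# It would be passed to fit() just like the callback above, e.g.
# model2.fit(X_train.toarray(), Y_train, epochs=50, batch_size=50,
#            validation_split=0.2, callbacks=[early_stop])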

(e) Test your sentiment-classifier on the test set.

We can evaluate both models on the training and test data to compare their loss and accuracy: first the initial run, then the second run.

In [22]:
print("Loss + accuracy on train data: {}".format(model.evaluate(X_train, Y_train)))
print("Loss + accuracy on test data: {}".format(model.evaluate(X_test, Y_test)))
print()
print("Loss + accuracy on train data: {}".format(model2.evaluate(X_train, Y_train)))
print("Loss + accuracy on test data: {}".format(model2.evaluate(X_test, Y_test)))
586/586 [==============================] - 5s 8ms/step - loss: 0.3414 - accuracy: 0.9101
Loss + accuracy on train data: [0.3413655161857605, 0.9100800156593323]
196/196 [==============================] - 2s 8ms/step - loss: 0.4860 - accuracy: 0.8478
Loss + accuracy on test data: [0.48597320914268494, 0.8478400111198425]

586/586 [==============================] - 5s 8ms/step - loss: 0.2785 - accuracy: 0.9603
Loss + accuracy on train data: [0.27854329347610474, 0.960319995880127]
196/196 [==============================] - 2s 8ms/step - loss: 0.5076 - accuracy: 0.8693
Loss + accuracy on test data: [0.5076231360435486, 0.8692799806594849]

Doing a final evaluation on the training and test sets, we see that the second model achieves slightly higher test accuracy (86.9% vs. 84.8%), although it also fits the training data more closely (96.0% vs. 91.0% training accuracy) and has a slightly higher test loss.
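
To look beyond a single accuracy number, we could also compute a confusion matrix and per-class precision and recall on the test set. A sketch using scikit-learn, assuming the model2, X_test and Y_test objects from above:

In [ ]:
from sklearn.metrics import classification_report, confusion_matrix

# Predicted class = index of the largest softmax output (0 = negative, 1 = positive).
Y_pred = np.argmax(model2.predict(X_test.toarray()), axis=1)
Y_true = Y_test.astype(int)

print(confusion_matrix(Y_true, Y_pred))
print(classification_report(Y_true, Y_pred, target_names=['negative', 'positive']))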

(h) Use the classifier to classify a few sentences you write yourselves.

In [23]:
my_reviews = [
    "I don't like whatever the thing I am reviewing is about, it is simply terrible. Nothing about it is good it's all bad.",
    "I like whatever I am reviewing as it is very nice and fun. It has a good theme and pacing i believe."
]
my_reviews = vect.transform(my_reviews)

# Expects first review to be negative and second one positive.
model.predict(my_reviews)
1/1 [==============================] - 0s 73ms/step
Out[23]:
array([[0.99781275, 0.00218725],
       [0.05843586, 0.94156414]], dtype=float32)

As the predictions show, the model assigns the first review a probability of 99.8% of being negative and the second a probability of 94.2% of being positive (column 0 corresponds to the negative class and column 1 to the positive class), which matches our expectations.
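
To turn the softmax probabilities into readable labels, we can take the argmax of each row. A small sketch, assuming the transformed my_reviews from above:

In [ ]:
# Map each prediction to a label: column 0 is the negative class, column 1 the positive class.
for probs in model.predict(my_reviews):
    label = 'positive' if np.argmax(probs) == 1 else 'negative'
    print(label, probs)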
