Integer-Encoding-Simplernn - Ipynb - Colaboratory

The notebook shows how to tokenize text documents with the Keras Tokenizer and how to build a simple RNN to classify IMDB movie reviews. A small set of toy documents illustrates integer encoding and padding; the IMDB reviews are then padded to a fixed length and used to train a model with one SimpleRNN layer and a dense sigmoid output layer.
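In outline, the Tokenizer assigns each unique word an integer index, texts_to_sequences turns every document into a list of those indices, and pad_sequences zero-pads the lists to a common length so they can be batched. A minimal sketch of that flow (the indices in the comments depend on the fitted vocabulary and are only illustrative):

from keras.preprocessing.text import Tokenizer
from keras.utils import pad_sequences

tok = Tokenizer(oov_token='<nothing>')                           # index 1 is reserved for unseen words
tok.fit_on_texts(['go india', 'hip hip hurray'])                 # build the word-to-index vocabulary
seqs = tok.texts_to_sequences(['go india', 'hip hip hurray'])    # e.g. [[3, 4], [2, 2, 5]]
pad_sequences(seqs, padding='post')                              # -> [[3, 4, 0], [2, 2, 5]]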


import numpy as np

# toy documents used to demonstrate integer encoding
docs = ['go india',
        'india india',
        'hip hip hurray',
        'jeetega bhai jeetega india jeetega',
        'bharat mata ki jai',
        'kohli kohli',
        'sachin sachin',
        'dhoni dhoni',
        'modi ji ki jai',
        'inquilab zindabad']

from keras.preprocessing.text import Tokenizer

tokenizer = Tokenizer(oov_token='<nothing>')   # index 1 is reserved for out-of-vocabulary words

tokenizer.fit_on_texts(docs)

tokenizer.word_index

{'<nothing>': 1,
'india': 2,
'jeetega': 3,
'hip': 4,
'ki': 5,
'jai': 6,
'kohli': 7,
'sachin': 8,
'dhoni': 9,
'go': 10,
'hurray': 11,
'bhai': 12,
'bharat': 13,
'mata': 14,
'modi': 15,
'ji': 16,
'inquilab': 17,
'zindabad': 18}

tokenizer.word_counts

OrderedDict([('go', 1),
('india', 4),
('hip', 2),
('hurray', 1),
('jeetega', 3),
('bhai', 1),
('bharat', 1),
('mata', 1),
('ki', 2),
('jai', 2),
('kohli', 2),
('sachin', 2),
('dhoni', 2),
('modi', 1),
('ji', 1),
('inquilab', 1),
('zindabad', 1)])

tokenizer.document_count

10

sequences = tokenizer.texts_to_sequences(docs)
sequences

[[10, 2],
[2, 2],
[4, 4, 11],
[3, 12, 3, 2, 3],
[13, 14, 5, 6],
[7, 7],
[8, 8],
[9, 9],
[15, 16, 5, 6],
[17, 18]]
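
Because the tokenizer was created with oov_token='<nothing>', any word that did not appear during fit_on_texts is mapped to index 1 (words that did appear are ranked by frequency, as word_index above shows). A quick check with a made-up phrase, whose expected indices follow from that word_index:

tokenizer.texts_to_sequences(['jeetega hindustan'])   # 'hindustan' is unseen -> [[3, 1]]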

from keras.utils import pad_sequences

sequences = pad_sequences(sequences, padding='post')

sequences

array([[10, 2, 0, 0, 0],
[ 2, 2, 0, 0, 0],
[ 4, 4, 11, 0, 0],
[ 3, 12, 3, 2, 3],
[13, 14, 5, 6, 0],
[ 7, 7, 0, 0, 0],
[ 8, 8, 0, 0, 0],
[ 9, 9, 0, 0, 0],
[15, 16, 5, 6, 0],
[17, 18, 0, 0, 0]], dtype=int32)
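
pad_sequences pads every document with zeros up to the length of the longest one (5 here); padding='post' puts the zeros after the tokens, while the default 'pre' puts them in front. A maxlen argument caps the length instead, truncating longer sequences from the front by default. A short sketch of those options, re-deriving the unpadded lists since `sequences` now holds the padded array:

raw = tokenizer.texts_to_sequences(docs)         # the unpadded integer lists from above
pad_sequences(raw, padding='pre')                # zeros before the tokens instead of after
pad_sequences(raw, maxlen=3, truncating='pre')   # keep at most 3 timesteps, dropping from the front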

from keras.datasets import imdb
from keras import Sequential
from keras.layers import Dense, SimpleRNN, Embedding, Flatten

(X_train, y_train), (X_test, y_test) = imdb.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz


17464789/17464789 [==============================] - 1s 0us/step

X_train[0]

[1,
14,
22,
16,
43,
530,
973,
1622,
1385,
65,
458,
4468,
66,
3941,
4,
173,
36,
256,
5,
25,
100,
43,
838,
112,
50,
670,
22665,
9,
35,
480,
284,
5,
150,
4,
172,
112,
167,
21631,
336,
385,
39,
4,
172,
4536,
1111,
17,
546,
38,
13,
447,
4,
192,
50,
16,
6,
147,
2025,
19,
14,
 ...]

len(X_train[2])

141

X_train = pad_sequences(X_train, padding='post', maxlen=50)   # cap every review at 50 tokens; longer reviews are truncated
X_test = pad_sequences(X_test, padding='post', maxlen=50)

X_train[0]

array([2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22,
       21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51,
       36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38,
       1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32,
       15, 16, 5345, 19, 178, 32], dtype=int32)

model = Sequential()

model.add(SimpleRNN(32, input_shape=(50,1), return_sequences=False))   # 32 recurrent units over 50 timesteps
model.add(Dense(1, activation='sigmoid'))                              # binary sentiment output

model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 simple_rnn (SimpleRNN)      (None, 32)                1088

 dense (Dense)               (None, 1)                 33

=================================================================
Total params: 1,121
Trainable params: 1,121
Non-trainable params: 0
_________________________________________________________________
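
The parameter counts follow from the layer shapes: with 32 recurrent units and 1 input feature per timestep, the SimpleRNN holds 32×32 recurrent weights plus 32×1 input weights plus 32 biases, and the Dense layer adds 32 weights plus 1 bias:

units, features = 32, 1
rnn_params = units*units + units*features + units   # 1024 + 32 + 32 = 1088
dense_params = units*1 + 1                           # 32 + 1 = 33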

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

model.fit(X_train, y_train, epochs=5, validation_data=(X_test, y_test))

Epoch 1/5
782/782 [==============================] - 10s 12ms/step - loss: 0.6929 - accuracy: 0.5061 - va
Epoch 2/5
782/782 [==============================] - 9s 12ms/step - loss: 0.6931 - accuracy: 0.5031 - val
Epoch 3/5
782/782 [==============================] - 9s 12ms/step - loss: 0.6927 - accuracy: 0.5085 - val
Epoch 4/5
782/782 [==============================] - 9s 12ms/step - loss: 0.6927 - accuracy: 0.5090 - val
Epoch 5/5
782/782 [==============================] - 9s 12ms/step - loss: 0.6928 - accuracy: 0.5056 - val
<keras.callbacks.History at 0x7f8dc97f8810>
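
Accuracy stays near 50% largely because each word reaches the SimpleRNN as a single raw integer per timestep, and the integer IDs carry no similarity structure the network can exploit. The usual fix is to place the already-imported Embedding layer in front of the RNN so each word index is looked up as a learned dense vector. A minimal sketch, where the 10,000-word vocabulary cap and the 32-dimensional embedding are illustrative choices rather than values from this notebook:

(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=10000)  # keep the 10,000 most frequent words
X_train = pad_sequences(X_train, padding='post', maxlen=50)
X_test = pad_sequences(X_test, padding='post', maxlen=50)

model = Sequential()
model.add(Embedding(input_dim=10000, output_dim=32))   # word index -> learned 32-d vector
model.add(SimpleRNN(32))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=5, validation_data=(X_test, y_test))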
