Keras - TF2 - Book
Deep learning frameworks ranking computed by Jeff Hale, based on 11 data sources across 7 categories
With over 250,000 individual users as of mid-2018, Keras has stronger adoption in both the industry
and the research community than any other deep learning framework except TensorFlow itself (and the
Keras API is the official frontend of TensorFlow, via the tf.keras module).
You are already constantly interacting with features built with Keras -- it is in use at Netflix, Uber,
Yelp, Instacart, Zocdoc, Square, and many others. It is especially popular among startups that place
deep learning at the core of their products.
Keras is also a favorite among deep learning researchers, coming in #2 in terms of mentions in
scientific papers uploaded to the preprint server arXiv.org. Keras has also been adopted by researchers
at large scientific organizations, in particular CERN and NASA.
Keras supports multiple backend engines and does not lock you into one ecosystem
Your Keras models can be developed with a range of different deep learning backends. Importantly,
any Keras model that only leverages built-in layers will be portable across all these backends: you can
train a model with one backend, and load it with another (e.g. for deployment). Available backends
include:
• The TensorFlow backend (from Google)
• The CNTK backend (from Microsoft)
• The Theano backend
Amazon also has a fork of Keras which uses MXNet as backend.
As such, your Keras model can be trained on a number of different hardware platforms beyond CPUs:
• NVIDIA GPUs
• Google TPUs, via the TensorFlow backend and Google Cloud
• OpenCL-enabled GPUs, such as those from AMD, via the PlaidML Keras backend
Keras has strong multi-GPU support and distributed training support
• Keras has built-in support for multi-GPU data parallelism
• Horovod, from Uber, has first-class support for Keras models
• Keras models can be turned into TensorFlow Estimators and trained on clusters of GPUs on
Google Cloud
• Keras can be run on Spark via Dist-Keras (from CERN) and Elephas
Bugs present in multi-backend Keras will only be fixed until April 2020 (as part of minor releases).
For more information about the future of Keras, see the Keras meeting notes.
Guiding principles
• User friendliness. Keras is an API designed for human beings, not machines. It puts user
experience front and center. Keras follows best practices for reducing cognitive load: it offers
consistent & simple APIs, it minimizes the number of user actions required for common use
cases, and it provides clear and actionable feedback upon user error.
• Modularity. A model is understood as a sequence or a graph of standalone, fully configurable
modules that can be plugged together with as few restrictions as possible. In particular, neural
layers, cost functions, optimizers, initialization schemes, activation functions and regularization
schemes are all standalone modules that you can combine to create new models.
• Easy extensibility. New modules are simple to add (as new classes and functions), and existing
modules provide ample examples. Being able to easily create new modules allows for total
expressiveness, making Keras suitable for advanced research.
• Work with Python. No separate model configuration files in a declarative format. Models are
described in Python code, which is compact, easier to debug, and allows for ease of
extensibility.
model = Sequential()
Once your model looks good, configure its learning process with .compile():
model.compile(loss='categorical_crossentropy',
optimizer='sgd',
metrics=['accuracy'])
If you need to, you can further configure your optimizer. A core principle of Keras is to make things
reasonably simple, while allowing the user to be fully in control when they need to (the ultimate control
being the easy extensibility of the source code).
model.compile(loss=keras.losses.categorical_crossentropy,
optimizer=keras.optimizers.SGD(lr=0.01, momentum=0.9, nesterov=True))
Building a question answering system, an image classification model, a Neural Turing Machine, or any
other model is just as fast. The ideas behind deep learning are simple, so why should their
implementation be painful?
For a more in-depth tutorial about Keras, you can check out:
• Getting started with the Sequential model
• Getting started with the functional API
In the examples folder of the repository, you will find more advanced models: question-answering with
memory networks, text generation with stacked LSTMs, etc.
Installation
Before installing Keras, please install one of its backend engines: TensorFlow, Theano, or CNTK. We
recommend the TensorFlow backend.
• TensorFlow installation instructions.
• Theano installation instructions.
• CNTK installation instructions.
You may also consider installing the following optional dependencies:
• cuDNN (recommended if you plan on running Keras on GPU).
• HDF5 and h5py (required if you plan on saving Keras models to disk).
• graphviz and pydot (used by visualization utilities to plot model graphs).
Then, you can install Keras itself. There are two ways to install Keras:
• Install Keras from PyPI (recommended):
Note: These installation steps assume that you are on a Linux or Mac environment. If you are on
Windows, you will need to remove sudo to run the commands below.
sudo pip install keras
If you are using a virtualenv, you may want to avoid using sudo:
pip install keras
Support
You can ask questions and join the development discussion:
• On the Keras Google group.
• On the Keras Slack channel. Use this link to request an invitation to the channel.
You can also post bug reports and feature requests (only) in GitHub issues. Make sure to read our
guidelines first.
You can create a Sequential model by passing a list of layer instances to the constructor:
from keras.models import Sequential
from keras.layers import Dense, Activation
model = Sequential([
Dense(32, input_shape=(784,)),
Activation('relu'),
Dense(10),
Activation('softmax'),
])
You can also simply add layers via the .add() method:
model = Sequential()
model.add(Dense(32, input_dim=784))
model.add(Activation('relu'))
model = Sequential()
model.add(Dense(32, input_dim=784))
Compilation
Before training a model, you need to configure the learning process, which is done via the compile
method. It receives three arguments:
• An optimizer. This could be the string identifier of an existing optimizer (such as rmsprop or
adagrad), or an instance of the Optimizer class. See: optimizers.
• A loss function. This is the objective that the model will try to minimize. It can be the string
identifier of an existing loss function (such as categorical_crossentropy or mse), or it
can be an objective function. See: losses.
• A list of metrics. For any classification problem you will want to set this to
metrics=['accuracy']. A metric could be the string identifier of an existing metric or a
custom metric function. See: metrics.
# For a multi-class classification problem
model.compile(optimizer='rmsprop',
loss='categorical_crossentropy',
metrics=['accuracy'])
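# For a binary classification problem with a custom metric
# (mean_pred is assumed to be a metric function defined elsewhere)
model.compile(optimizer='rmsprop',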
loss='binary_crossentropy',
metrics=['accuracy', mean_pred])
Training
Keras models are trained on Numpy arrays of input data and labels. For training a model, you will
typically use the fit function. Read its documentation here.
# For a single-input model with 2 classes (binary classification):
model = Sequential()
model.add(Dense(32, activation='relu', input_dim=100))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop',
loss='binary_crossentropy',
metrics=['accuracy'])
# For a single-input model with 10 classes (categorical classification):
model = Sequential()
model.add(Dense(32, activation='relu', input_dim=100))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='rmsprop',
loss='categorical_crossentropy',
metrics=['accuracy'])
Examples
Here are a few examples to get you started!
In the examples folder, you will also find example models for real datasets:
• CIFAR10 small images classification: Convolutional Neural Network (CNN) with realtime data
augmentation
• IMDB movie review sentiment classification: LSTM over sequences of words
• Reuters newswires topic classification: Multilayer Perceptron (MLP)
• MNIST handwritten digits classification: MLP & CNN
• Character-level text generation with LSTM
...and more.
model = Sequential()
# Dense(64) is a fully-connected layer with 64 hidden units.
# in the first layer, you must specify the expected input data shape:
# here, 20-dimensional vectors.
model.add(Dense(64, activation='relu', input_dim=20))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))
model.fit(x_train, y_train,
epochs=20,
batch_size=128)
score = model.evaluate(x_test, y_test, batch_size=128)
model = Sequential()
model.add(Dense(64, input_dim=20, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
optimizer='rmsprop',
metrics=['accuracy'])
model.fit(x_train, y_train,
epochs=20,
batch_size=128)
score = model.evaluate(x_test, y_test, batch_size=128)
VGG-like convnet:
import numpy as np
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.optimizers import SGD
model = Sequential()
# input: 100x100 images with 3 channels -> (100, 100, 3) tensors.
# this applies 32 convolution filters of size 3x3 each.
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(100, 100, 3)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))
max_features = 1024
model = Sequential()
model.add(Embedding(max_features, output_dim=256))
model.add(LSTM(128))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
optimizer='rmsprop',
metrics=['accuracy'])
seq_length = 64
model = Sequential()
model.add(Conv1D(64, 3, activation='relu', input_shape=(seq_length, 100)))
model.add(Conv1D(64, 3, activation='relu'))
model.add(MaxPooling1D(3))
model.add(Conv1D(128, 3, activation='relu'))
model.add(Conv1D(128, 3, activation='relu'))
model.add(GlobalAveragePooling1D())
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
optimizer='rmsprop',
metrics=['accuracy'])
data_dim = 16
timesteps = 8
num_classes = 10
model.compile(loss='categorical_crossentropy',
optimizer='rmsprop',
metrics=['accuracy'])
data_dim = 16
timesteps = 8
num_classes = 10
batch_size = 32
model.compile(loss='categorical_crossentropy',
optimizer='rmsprop',
metrics=['accuracy'])
model.fit(x_train, y_train,
batch_size=batch_size, epochs=5, shuffle=False,
validation_data=(x_val, y_val))
This makes it possible, for instance, to quickly create models that can process sequences of inputs.
You could turn an image classification model into a video classification model in just one line.
from keras.layers import TimeDistributed
# This applies our previous model to every timestep in the input sequences.
# the output of the previous model was a 10-way softmax,
# so the output of the layer below will be a sequence of 20 vectors of size 10.
processed_sequences = TimeDistributed(model)(input_sequences)
# Headline input: meant to receive sequences of 100 integers, between 1 and 10000.
# Note that we can name any layer by passing it a "name" argument.
main_input = Input(shape=(100,), dtype='int32', name='main_input')
Here we insert the auxiliary loss, allowing the LSTM and Embedding layer to be trained smoothly even
though the main loss will be much higher in the model.
auxiliary_output = Dense(1, activation='sigmoid', name='aux_output')(lstm_out)
At this point, we feed into the model our auxiliary input data by concatenating it with the LSTM
output:
auxiliary_input = Input(shape=(5,), name='aux_input')
x = keras.layers.concatenate([lstm_out, auxiliary_input])
We compile the model and assign a weight of 0.2 to the auxiliary loss. To specify different
loss_weights or loss for each different output, you can use a list or a dictionary. Here we pass a
single loss as the loss argument, so the same loss will be used on all outputs.
model.compile(optimizer='rmsprop', loss='binary_crossentropy',
loss_weights=[1., 0.2])
We can train the model by passing it lists of input arrays and target arrays:
headline_data = np.round(np.abs(np.random.rand(12, 100) * 100))
additional_data = np.random.randn(12, 5)
# binary_crossentropy expects targets in [0, 1], so draw random 0/1 labels
headline_labels = np.random.randint(2, size=(12, 1))
additional_labels = np.random.randint(2, size=(12, 1))
model.fit([headline_data, additional_data], [headline_labels, additional_labels],
epochs=50, batch_size=32)
Since our inputs and outputs are named (we passed them a "name" argument), we could also have
compiled the model via:
model.compile(optimizer='rmsprop',
loss={'main_output': 'binary_crossentropy', 'aux_output':
'binary_crossentropy'},
loss_weights={'main_output': 1., 'aux_output': 0.2})
To use the model for inference, pass the input arrays as a list:
pred = model.predict([headline_data, additional_data])
Shared layers
Another good use for the functional API is models that use shared layers. Let's take a look at shared
layers.
Let's consider a dataset of tweets. We want to build a model that can tell whether two tweets are from
the same person or not (this can allow us to compare users by the similarity of their tweets, for
instance).
One way to achieve this is to build a model that encodes two tweets into two vectors, concatenates the
vectors and then adds a logistic regression; this outputs a probability that the two tweets share the same
author. The model would then be trained on positive tweet pairs and negative tweet pairs.
Because the problem is symmetric, the mechanism that encodes the first tweet should be reused
(weights and all) to encode the second tweet. Here we use a shared LSTM layer to encode the tweets.
Let's build this with the functional API. We will take as input for a tweet a binary matrix of shape
(280, 256), i.e. a sequence of 280 vectors of size 256, where each dimension in the 256-
dimensional vector encodes the presence/absence of a character (out of an alphabet of 256 frequent
characters).
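The input encoding described above can be sketched in plain Python. This is an illustration under stated assumptions rather than code from the Keras documentation: the alphabet is taken to be the first 256 Unicode code points, and encode_tweet is a hypothetical helper. A real pipeline would more likely pick the 256 most frequent characters in its own corpus.

```python
MAX_LEN = 280        # maximum number of characters in a tweet
ALPHABET_SIZE = 256  # assumed alphabet: code points 0..255


def encode_tweet(text):
    """Return a MAX_LEN x ALPHABET_SIZE binary matrix (list of lists).

    Row t has a single 1 at the index of the t-th character. Padding
    positions and characters outside the assumed alphabet stay all-zero.
    """
    matrix = [[0] * ALPHABET_SIZE for _ in range(MAX_LEN)]
    for t, char in enumerate(text[:MAX_LEN]):
        index = ord(char)
        if index < ALPHABET_SIZE:
            matrix[t][index] = 1
    return matrix


tweet_matrix = encode_tweet("Keras!")
print(len(tweet_matrix), len(tweet_matrix[0]))  # 280 256
```

Each encoded tweet then has the (280, 256) shape that the model inputs expect.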
import keras
from keras.layers import Input, LSTM, Dense
from keras.models import Model
To share a layer across different inputs, simply instantiate the layer once, then call it on as many inputs
as you want:
# This layer can take as input a matrix
# and will return a vector of size 64
shared_lstm = LSTM(64)
model.compile(optimizer='rmsprop',
loss='binary_crossentropy',
metrics=['accuracy'])
model.fit([data_a, data_b], labels, epochs=10)
Let's pause to take a look at how to read the shared layer's output or output shape.
lstm = LSTM(32)
encoded_a = lstm(a)
lstm = LSTM(32)
encoded_a = lstm(a)
encoded_b = lstm(b)
lstm.output
conved_b = conv(b)
# now the `.input_shape` property wouldn't work, but this does:
assert conv.get_input_shape_at(0) == (None, 32, 32, 3)
assert conv.get_input_shape_at(1) == (None, 64, 64, 3)
More examples
Code examples are still the best way to get started, so here are a few more.
Inception module
For more information about the Inception architecture, see Going Deeper with Convolutions.
from keras.layers import Conv2D, MaxPooling2D, Input
# Now let's get a tensor with the output of our vision model:
image_input = Input(shape=(224, 224, 3))
encoded_image = vision_model(image_input)
# Next, let's define a language model to encode the question into a vector.
# Each question will be at most 100 words long,
# and we will index words as integers from 1 to 9999.
question_input = Input(shape=(100,), dtype='int32')
embedded_question = Embedding(input_dim=10000, output_dim=256,
                              input_length=100)(question_input)
encoded_question = LSTM(256)(embedded_question)
The name 'gpu' might have to be changed depending on your device's identifier (e.g. gpu0, gpu1, etc).
Device parallelism
Device parallelism consists of running different parts of the same model on different devices. It works
best for models that have a parallel architecture, e.g. a model with two branches.
This can be achieved by using TensorFlow device scopes. Here is a quick example:
# Model where a shared LSTM is used to encode two different sequences in parallel
input_a = keras.Input(shape=(140, 256))
input_b = keras.Input(shape=(140, 256))
shared_lstm = keras.layers.LSTM(64)
Please also see How can I install HDF5 or h5py to save my models in Keras? for instructions on how to
install h5py.
Saving/loading only a model's architecture
If you only need to save the architecture of a model, and not its weights or its training configuration,
you can do:
# save as JSON
json_string = model.to_json()
# save as YAML
yaml_string = model.to_yaml()
The generated JSON / YAML files are human-readable and can be manually edited if needed.
You can then build a fresh model from this data:
# model reconstruction from JSON:
from keras.models import model_from_json
model = model_from_json(json_string)
Assuming you have code for instantiating your model, you can then load the weights you saved into a
model with the same architecture:
model.load_weights('my_model_weights.h5')
If you need to load the weights into a different architecture (with some layers in common), for instance
for fine-tuning or transfer-learning, you can load them by layer name:
model.load_weights('my_model_weights.h5', by_name=True)
Example:
"""
Assuming the original model looks like this:
model = Sequential()
model.add(Dense(2, input_dim=3, name='dense_1'))
model.add(Dense(3, name='dense_2'))
...
model.save_weights(fname)
"""
# new model
model = Sequential()
model.add(Dense(2, input_dim=3, name='dense_1')) # will be loaded
model.add(Dense(10, name='new_dense')) # will not be loaded
# load weights from first model; will only affect the first layer, dense_1.
model.load_weights(fname, by_name=True)
Custom objects handling works the same way for load_model, model_from_json,
model_from_yaml:
from keras.models import model_from_json
model = model_from_json(json_string, custom_objects={'AttentionLayer':
AttentionLayer})
Why is the training loss much higher than the testing loss?
A Keras model has two modes: training and testing. Regularization mechanisms, such as Dropout and
L1/L2 weight regularization, are turned off at testing time.
Besides, the training loss is the average of the losses over each batch of training data. Because your
model is changing over time, the loss over the first batches of an epoch is generally higher than over
the last batches. On the other hand, the testing loss for an epoch is computed using the model as it is at
the end of the epoch, resulting in a lower loss.
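The second effect can be reproduced with a toy example in plain Python. This is only a sketch under simplifying assumptions (a single weight, a squared-error loss, the same data reused for "testing", no regularization); it is not how Keras computes its metrics internally.

```python
def batch_loss(w, batch):
    """Mean squared error of the constant prediction w on one batch."""
    return sum((w - y) ** 2 for y in batch) / len(batch)


# Toy data: every target is 3.0, split into 5 batches of 4 samples.
batches = [[3.0] * 4 for _ in range(5)]
w = 0.0              # single trainable parameter
learning_rate = 0.1

running_losses = []
for batch in batches:
    # The batch loss is recorded with the weights as they were
    # before this batch's update.
    running_losses.append(batch_loss(w, batch))
    # Gradient of the mean squared error with respect to w.
    grad = sum(2 * (w - y) for y in batch) / len(batch)
    w -= learning_rate * grad

# The reported training loss is the average over the epoch's batches.
reported_training_loss = sum(running_losses) / len(running_losses)
# The "testing" loss is computed once, with the end-of-epoch weights.
end_of_epoch_loss = sum(batch_loss(w, b) for b in batches) / len(batches)

print(reported_training_loss > end_of_epoch_loss)  # True
```

Because the early batches are scored by a worse model, the epoch average stays above the loss of the final model even though the data is identical.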
layer_name = 'my_layer'
intermediate_layer_model = Model(inputs=model.input,
outputs=model.get_layer(layer_name).output)
intermediate_output = intermediate_layer_model.predict(data)
Alternatively, you can build a Keras function that will return the output of a certain layer given a
certain input, for example:
from keras import backend as K
How can I use Keras with datasets that don't fit in memory?
You can do batch training using model.train_on_batch(x, y) and
model.test_on_batch(x, y). See the models documentation.
Alternatively, you can write a generator that yields batches of training data and use the method
model.fit_generator(data_generator, steps_per_epoch, epochs).
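A minimal sketch of such a generator is shown below, using plain Python lists as stand-in data. The names batch_generator, samples, and labels are hypothetical; in practice each batch would be read from disk on demand. The generator loops forever, so the number of batches per epoch is given separately as steps_per_epoch.

```python
import math


def batch_generator(samples, labels, batch_size):
    """Yield (inputs, targets) batches forever, cycling over the data."""
    num_samples = len(samples)
    while True:
        for start in range(0, num_samples, batch_size):
            end = start + batch_size
            yield samples[start:end], labels[start:end]


# Stand-in data; a real generator would load each batch lazily.
samples = list(range(10))
labels = [s % 2 for s in samples]
batch_size = 4

# One epoch covers every sample once, so the last batch may be smaller.
steps_per_epoch = math.ceil(len(samples) / batch_size)

generator = batch_generator(samples, labels, batch_size)
first_epoch = [next(generator) for _ in range(steps_per_epoch)]
print(steps_per_epoch)                   # 3
print([len(x) for x, _ in first_epoch])  # [4, 4, 2]
```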
How can I interrupt training when the validation loss isn't decreasing anymore?
You can use an EarlyStopping callback:
from keras.callbacks import EarlyStopping
early_stopping = EarlyStopping(monitor='val_loss', patience=2)
model.fit(x, y, validation_split=0.2, callbacks=[early_stopping])
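The patience logic can be sketched in plain Python. This is a simplified illustration, not the Keras implementation: it only tracks whether the monitored value improved on its best value so far, and the function name epoch_to_stop and the loss values are hypothetical.

```python
def epoch_to_stop(val_losses, patience):
    """Return the 0-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0  # number of epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Hypothetical validation losses, one per epoch.
losses = [1.0, 0.8, 0.9, 0.85, 0.7]
print(epoch_to_stop(losses, patience=2))  # 3
```

With patience=2, training stops after two consecutive epochs without improvement, so the later drop to 0.7 is never reached. A larger patience trades longer training for robustness to such temporary plateaus.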
How can I record the training / validation loss / accuracy at each epoch?
The model.fit method returns a History callback, which has a history attribute containing the
lists of successive losses and other metrics.
hist = model.fit(x, y, validation_split=0.2)
print(hist.history)
Additionally, you can set the trainable property of a layer to True or False after instantiation.
For this to take effect, you will need to call compile() on your model after modifying the
trainable property. Here's an example:
x = Input(shape=(32,))
layer = Dense(32)
layer.trainable = False
y = layer(x)
frozen_model = Model(x, y)
# in the model below, the weights of `layer` will not be updated during training
frozen_model.compile(optimizer='rmsprop', loss='mse')
layer.trainable = True
trainable_model = Model(x, y)
# with this model the weights of the layer will be updated during training
# (which will also affect the above model since it uses the same layer instance)
trainable_model.compile(optimizer='rmsprop', loss='mse')
Example:
# x is our input data, of shape (32, 21, 16)
# we will feed it to our model in sequences of length 10
model = Sequential()
model.add(LSTM(32, input_shape=(10, 16), batch_size=32, stateful=True))
model.add(Dense(16, activation='softmax'))
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
# we train the network to predict the 11th timestep given the first 10:
model.train_on_batch(x[:, :10, :], np.reshape(x[:, 10, :], (32, 16)))
# the state of the network has changed. We can feed the follow-up sequences:
model.train_on_batch(x[:, 10:20, :], np.reshape(x[:, 20, :], (32, 16)))
Note that the methods predict, fit, train_on_batch, predict_classes, etc. will all
update the states of the stateful layers in a model. This allows you to do not only stateful training, but
also stateful prediction.
print(len(model.layers)) # "2"
model.pop()
print(len(model.layers)) # "1"
For a few simple usage examples, see the documentation for the Applications module.
For a detailed example of how to use such a pre-trained model for feature extraction or for fine-tuning,
see this blog post.
The VGG16 model is also the basis for several Keras example scripts:
• Style transfer
• Feature visualization
• Deep dream
The Keras configuration file is a JSON file stored at $HOME/.keras/keras.json. The default
configuration file looks like this:
{
"image_data_format": "channels_last",
"epsilon": 1e-07,
"floatx": "float32",
"backend": "tensorflow"
}
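Since the file is plain JSON, it can be inspected with the standard library alone. The sketch below assumes the default location given above and falls back to the default values when the file is missing; read_keras_config is a hypothetical helper, not part of Keras.

```python
import json
import os

# Defaults copied from the configuration file shown above.
DEFAULT_CONFIG = {
    "image_data_format": "channels_last",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow",
}


def read_keras_config(path=None):
    """Return the defaults, overridden by any values found in the file."""
    if path is None:
        path = os.path.join(os.path.expanduser("~"), ".keras", "keras.json")
    config = dict(DEFAULT_CONFIG)
    if os.path.exists(path):
        with open(path) as f:
            config.update(json.load(f))
    return config


config = read_keras_config()
print(sorted(config))
```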
Moreover, when using the TensorFlow backend and running on a GPU, some operations have non-
deterministic outputs, in particular tf.reduce_sum(). This is due to the fact that GPUs run many
operations in parallel, so the order of execution is not always guaranteed. Due to the limited precision
of floats, even adding several numbers together may give slightly different results depending on the
order in which you add them. You can try to avoid the non-deterministic operations, but some may be
created automatically by TensorFlow to compute the gradients, so it is much simpler to just run the
code on the CPU. For this, you can set the CUDA_VISIBLE_DEVICES environment variable to an
empty string, for example:
$ CUDA_VISIBLE_DEVICES="" PYTHONHASHSEED=0 python your_program.py
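The order sensitivity of floating-point addition mentioned above is easy to see in plain Python, without any GPU involved. The three values are arbitrary; the point is only that regrouping the same additions changes the last bits of the result.

```python
a, b, c = 0.1, 0.2, 0.3

left_to_right = (a + b) + c
right_to_left = a + (b + c)

print(left_to_right == right_to_left)  # False
print(left_to_right, right_to_left)    # 0.6000000000000001 0.6
```

A parallel reduction that groups its operands differently from run to run can therefore return slightly different sums for the same inputs.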
The snippet below provides an example of how to obtain reproducible results. It is geared towards
a TensorFlow backend in a Python 3 environment:
import numpy as np
import tensorflow as tf
import random as rn
np.random.seed(42)
# The below is necessary for starting core Python generated random numbers
# in a well-defined state.
rn.seed(12345)
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1,
inter_op_parallelism_threads=1)
tf.set_random_seed(1234)
If you are unsure if h5py is installed you can open a Python shell and load the module via
import h5py
If it imports without error it is installed, otherwise you can find detailed installation instructions here:
http://docs.h5py.org/en/latest/build.html
• model.get_weights() returns a list of all weight tensors in the model, as Numpy arrays.
• model.set_weights(weights) sets the values of the weights of the model, from a list of
Numpy arrays. The arrays in the list should have the same shape as those returned by
get_weights().
• model.to_json() returns a representation of the model as a JSON string. Note that the
representation does not include the weights, only the architecture. You can reinstantiate the
same model (with reinitialized weights) from the JSON string via:
from keras.models import model_from_json
json_string = model.to_json()
model = model_from_json(json_string)
• model.to_yaml() returns a representation of the model as a YAML string. Note that the
representation does not include the weights, only the architecture. You can reinstantiate the
same model (with reinitialized weights) from the YAML string via:
from keras.models import model_from_yaml
yaml_string = model.to_yaml()
model = model_from_yaml(yaml_string)
Note: Please also see How can I install HDF5 or h5py to save my models in Keras? in the FAQ for
instructions on how to install h5py.
Model subclassing
In addition to these two types of models, you may create your own fully-customizable models by
subclassing the Model class and implementing your own forward pass in the call method (the
Model subclassing API was introduced in Keras 2.2.0).
class SimpleMLP(keras.Model):
model = SimpleMLP()
model.compile(...)
model.fit(...)
Layers are defined in __init__(self, ...), and the forward pass is specified in call(self,
inputs). In call, you may specify custom losses by calling self.add_loss(loss_tensor)
(like you would in a custom layer).
In subclassed models, the model's topology is defined as Python code (rather than as a static graph of
layers). That means the model's topology cannot be inspected or serialized. As a result, the following
methods and attributes are not available for subclassed models:
• model.inputs and model.outputs.
• model.to_yaml() and model.to_json()
• model.get_config() and model.save().
Key point: use the right API for the job. The Model subclassing API can provide you with greater
flexibility for implementing complex models, but it comes at a cost (in addition to these missing
features): it is more verbose, more complex, and has more opportunities for user errors. If possible,
prefer using the functional API, which is more user-friendly.
Raises
• ValueError: In case of invalid arguments for optimizer, loss, metrics or
sample_weight_mode.
fit
fit(x=None, y=None, batch_size=None, epochs=1, verbose=1, callbacks=None,
validation_split=0.0, validation_data=None, shuffle=True, class_weight=None,
sample_weight=None, initial_epoch=0, steps_per_epoch=None, validation_steps=None,
validation_freq=1, max_queue_size=10, workers=1, use_multiprocessing=False)
• shuffle: Boolean (whether to shuffle the training data before each epoch) or str (for 'batch').
'batch' is a special option for dealing with the limitations of HDF5 data; it shuffles in batch-
sized chunks. Has no effect when steps_per_epoch is not None.
• class_weight: Optional dictionary mapping class indices (integers) to a weight (float) value,
used for weighting the loss function (during training only). This can be useful to tell the model
to "pay more attention" to samples from an under-represented class.
• sample_weight: Optional Numpy array of weights for the training samples, used for weighting
the loss function (during training only). You can either pass a flat (1D) Numpy array with the
same length as the input samples (1:1 mapping between weights and samples), or in the case of
temporal data, you can pass a 2D array with shape (samples, sequence_length), to
apply a different weight to every timestep of every sample. In this case you should make sure to
specify sample_weight_mode="temporal" in compile(). This argument is not
supported when x is a generator or a Sequence instance; instead, provide the sample weights as
the third element of x.
• initial_epoch: Integer. Epoch at which to start training (useful for resuming a previous training
run).
• steps_per_epoch: Integer or None. Total number of steps (batches of samples) before declaring
one epoch finished and starting the next epoch. When training with input tensors such as
TensorFlow data tensors, the default None is equal to the number of samples in your dataset
divided by the batch size, or 1 if that cannot be determined.
• validation_steps: Only relevant if steps_per_epoch is specified, or if validation_data is
provided and is a generator. Total number of steps (batches of samples) to draw before stopping
when performing validation at the end of every epoch.
• validation_freq: Only relevant if validation data is provided. Integer or list/tuple/set. If an
integer, specifies how many training epochs to run before a new validation run is performed,
e.g. validation_freq=2 runs validation every 2 epochs. If a list, tuple, or set, specifies the
epochs on which to run validation, e.g. validation_freq=[1, 2, 10] runs validation at
the end of the 1st, 2nd, and 10th epochs.
• max_queue_size: Integer. Used for generator or keras.utils.Sequence input only.
Maximum size for the generator queue. If unspecified, max_queue_size will default to 10.
• workers: Integer. Used for generator or keras.utils.Sequence input only. Maximum
number of processes to spin up when using process-based threading. If unspecified, workers
will default to 1. If 0, will execute the generator on the main thread.
• use_multiprocessing: Boolean. Used for generator or keras.utils.Sequence input only.
If True, use process-based threading. If unspecified, use_multiprocessing will default
to False. Note that because this implementation relies on multiprocessing, you should not pass
non-picklable arguments to the generator as they can't be passed easily to children processes.
• **kwargs: Used for backwards compatibility.
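The two accepted forms of validation_freq described above can be sketched as a small predicate. This is an illustration of the documented behavior, not the Keras source; should_validate is a hypothetical name and epochs are counted from 1, as in the description.

```python
def should_validate(epoch, validation_freq):
    """Return True if validation runs at the end of this 1-based epoch."""
    if isinstance(validation_freq, int):
        return epoch % validation_freq == 0
    # Otherwise treat it as a list, tuple, or set of epoch numbers.
    return epoch in validation_freq


epochs = range(1, 11)
print([e for e in epochs if should_validate(e, 2)])           # [2, 4, 6, 8, 10]
print([e for e in epochs if should_validate(e, [1, 2, 10])])  # [1, 2, 10]
```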
Returns
A History object. Its History.history attribute is a record of training loss values and metrics
values at successive epochs, as well as validation loss values and validation metrics values (if
applicable).
Raises
• RuntimeError: If the model was never compiled.
• ValueError: In case of mismatch between the provided input data and what the model expects.
evaluate
evaluate(x=None, y=None, batch_size=None, verbose=1, sample_weight=None,
steps=None, callbacks=None, max_queue_size=10, workers=1,
use_multiprocessing=False)
Returns the loss value & metrics values for the model in test mode.
Computation is done in batches.
Arguments
• x: Input data. It could be:
• A Numpy array (or array-like), or a list of arrays (in case the model has multiple inputs).
• A dict mapping input names to the corresponding array/tensors, if the model has named
inputs.
• A generator or keras.utils.Sequence returning (inputs, targets) or
(inputs, targets, sample weights).
• None (default) if feeding from framework-native tensors (e.g. TensorFlow data tensors).
• y: Target data. Like the input data x, it could be either Numpy array(s), framework-native
tensor(s), list of Numpy arrays (if the model has multiple outputs) or None (default) if feeding
from framework-native tensors (e.g. TensorFlow data tensors). If output layers in the model are
named, you can also pass a dictionary mapping output names to Numpy arrays. If x is a
generator, or keras.utils.Sequence instance, y should not be specified (since targets
will be obtained from x).
• batch_size: Integer or None. Number of samples per batch of computation. If unspecified,
batch_size will default to 32. Do not specify the batch_size if your data is in the form
of symbolic tensors, generators, or keras.utils.Sequence instances (since they generate
batches).
• verbose: 0 or 1. Verbosity mode. 0 = silent, 1 = progress bar.
• sample_weight: Optional Numpy array of weights for the test samples, used for weighting the
loss function. You can either pass a flat (1D) Numpy array with the same length as the input
samples (1:1 mapping between weights and samples), or in the case of temporal data, you can
pass a 2D array with shape (samples, sequence_length), to apply a different weight
to every timestep of every sample. In this case you should make sure to specify
sample_weight_mode="temporal" in compile().
• steps: Integer or None. Total number of steps (batches of samples) before declaring the
evaluation round finished. Ignored with the default value of None.
• callbacks: List of keras.callbacks.Callback instances. List of callbacks to apply
during evaluation. See callbacks.
• max_queue_size: Integer. Used for generator or keras.utils.Sequence input only.
Maximum size for the generator queue. If unspecified, max_queue_size will default to 10.
• workers: Integer. Used for generator or keras.utils.Sequence input only. Maximum
number of processes to spin up when using process-based threading. If unspecified, workers
will default to 1. If 0, will execute the generator on the main thread.
• use_multiprocessing: Boolean. Used for generator or keras.utils.Sequence input only.
If True, use process-based threading. If unspecified, use_multiprocessing will default
to False. Note that because this implementation relies on multiprocessing, you should not pass
non-picklable arguments to the generator as they can't be passed easily to children processes.
Raises
• ValueError: in case of invalid arguments.
Returns
Scalar test loss (if the model has a single output and no metrics) or list of scalars (if the model has
multiple outputs and/or metrics). The attribute model.metrics_names will give you the display
labels for the scalar outputs.
predict
predict(x, batch_size=None, verbose=0, steps=None, callbacks=None,
max_queue_size=10, workers=1, use_multiprocessing=False)
train_on_batch
train_on_batch(x, y, sample_weight=None, class_weight=None, reset_metrics=True)
test_on_batch
test_on_batch(x, y, sample_weight=None, reset_metrics=True)
predict_on_batch
predict_on_batch(x)
fit_generator
fit_generator(generator, steps_per_epoch=None, epochs=1, verbose=1, callbacks=None,
validation_data=None, validation_steps=None, validation_freq=1, class_weight=None,
max_queue_size=10, workers=1, use_multiprocessing=False, shuffle=True,
initial_epoch=0)
Trains the model on data generated batch-by-batch by a Python generator (or an instance of
Sequence).
The generator is run in parallel to the model, for efficiency. For instance, this allows you to do real-time
data augmentation on images on CPU in parallel to training your model on GPU.
The use of keras.utils.Sequence guarantees the ordering and guarantees the single use of
every input per epoch when using use_multiprocessing=True.
Arguments
• generator: A generator or an instance of Sequence (keras.utils.Sequence) object in
order to avoid duplicate data when using multiprocessing. The output of the generator must be
either
• a tuple (inputs, targets)
• a tuple (inputs, targets, sample_weights).
This tuple (a single output of the generator) makes a single batch. Therefore, all arrays in this
tuple must have the same length (equal to the size of this batch). Different batches may have
different sizes. For example, the last batch of the epoch is commonly smaller than the others, if
the size of the dataset is not divisible by the batch size. The generator is expected to loop over
its data indefinitely. An epoch finishes when steps_per_epoch batches have been seen by
the model.
• steps_per_epoch: Integer. Total number of steps (batches of samples) to yield from
generator before declaring one epoch finished and starting the next epoch. It should
typically be equal to ceil(num_samples / batch_size) Optional for Sequence: if
unspecified, will use the len(generator) as a number of steps.
• epochs: Integer. Number of epochs to train the model. An epoch is an iteration over the entire
data provided, as defined by steps_per_epoch. Note that in conjunction with
initial_epoch, epochs is to be understood as "final epoch". The model is not trained for
a number of iterations given by epochs, but merely until the epoch of index epochs is
reached.
• verbose: Integer. 0, 1, or 2. Verbosity mode. 0 = silent, 1 = progress bar, 2 = one line per epoch.
• callbacks: List of keras.callbacks.Callback instances. List of callbacks to apply
during training. See callbacks.
• validation_data: This can be either
• a generator or a Sequence object for the validation data
• tuple (x_val, y_val)
• tuple (x_val, y_val, val_sample_weights)
on which to evaluate the loss and any model metrics at the end of each epoch. The model will
not be trained on this data.
• validation_steps: Only relevant if validation_data is a generator. Total number of steps
(batches of samples) to yield from validation_data generator before stopping at the end
of every epoch. It should typically be equal to the number of samples of your validation dataset
divided by the batch size. Optional for Sequence: if unspecified, will use the
len(validation_data) as a number of steps.
model.fit_generator(generate_arrays_from_file('/my_file.txt'),
steps_per_epoch=10000, epochs=10)
evaluate_generator
evaluate_generator(generator, steps=None, callbacks=None, max_queue_size=10,
workers=1, use_multiprocessing=False, verbose=0)
Arguments
• generator: Generator yielding tuples (inputs, targets) or (inputs, targets, sample_weights) or an
instance of Sequence (keras.utils.Sequence) object in order to avoid duplicate data when using
multiprocessing.
• steps: Total number of steps (batches of samples) to yield from generator before stopping.
Optional for Sequence: if unspecified, will use the len(generator) as a number of steps.
• callbacks: List of keras.callbacks.Callback instances. List of callbacks to apply
during training. See callbacks.
• max_queue_size: maximum size for the generator queue
• workers: Integer. Maximum number of processes to spin up when using process based
threading. If unspecified, workers will default to 1. If 0, will execute the generator on the
main thread.
• use_multiprocessing: if True, use process based threading. Note that because this
implementation relies on multiprocessing, you should not pass non picklable arguments to the
generator as they can't be passed easily to children processes.
• verbose: verbosity mode, 0 or 1.
Returns
Scalar test loss (if the model has a single output and no metrics) or list of scalars (if the model has
multiple outputs and/or metrics). The attribute model.metrics_names will give you the display
labels for the scalar outputs.
Raises
• ValueError: In case the generator yields data in an invalid format.
predict_generator
predict_generator(generator, steps=None, callbacks=None, max_queue_size=10,
workers=1, use_multiprocessing=False, verbose=0)
Arguments
• generator: Generator yielding batches of input samples or an instance of Sequence
(keras.utils.Sequence) object in order to avoid duplicate data when using multiprocessing.
• steps: Total number of steps (batches of samples) to yield from generator before stopping.
Optional for Sequence: if unspecified, will use the len(generator) as a number of steps.
• callbacks: List of keras.callbacks.Callback instances. List of callbacks to apply
during training. See callbacks.
• max_queue_size: Maximum size for the generator queue.
• workers: Integer. Maximum number of processes to spin up when using process based
threading. If unspecified, workers will default to 1. If 0, will execute the generator on the
main thread.
• use_multiprocessing: If True, use process based threading. Note that because this
implementation relies on multiprocessing, you should not pass non picklable arguments to the
generator as they can't be passed easily to children processes.
• verbose: verbosity mode, 0 or 1.
Returns
Numpy array(s) of predictions.
Raises
• ValueError: In case the generator yields data in an invalid format.
get_layer
get_layer(name=None, index=None)
a = Input(shape=(32,))
b = Dense(32)(a)
model = Model(inputs=a, outputs=b)
This model will include all layers required in the computation of b given a.
In the case of multi-input or multi-output models, you can use lists as well:
model = Model(inputs=[a1, a2], outputs=[b1, b2, b3])
For a detailed introduction of what Model can do, read this guide to the Keras functional API.
Methods
compile
compile(optimizer, loss=None, metrics=None, loss_weights=None,
sample_weight_mode=None, weighted_metrics=None, target_tensors=None)
Configures the model for training.
Arguments
• optimizer: String (name of optimizer) or optimizer instance. See optimizers.
• loss: String (name of objective function) or objective function or Loss instance. See losses. If
the model has multiple outputs, you can use a different loss on each output by passing a
dictionary or a list of losses. The loss value that will be minimized by the model will then be the
sum of all individual losses.
• metrics: List of metrics to be evaluated by the model during training and testing. Typically you
will use metrics=['accuracy']. To specify different metrics for different outputs of a
multi-output model, you could also pass a dictionary, such as metrics={'output_a':
'accuracy', 'output_b': ['accuracy', 'mse']}. You can also pass a list (len
= len(outputs)) of lists of metrics such as metrics=[['accuracy'], ['accuracy',
'mse']] or metrics=['accuracy', ['accuracy', 'mse']].
• loss_weights: Optional list or dictionary specifying scalar coefficients (Python floats) to weight
the loss contributions of different model outputs. The loss value that will be minimized by the
model will then be the weighted sum of all individual losses, weighted by the loss_weights
coefficients. If a list, it is expected to have a 1:1 mapping to the model's outputs. If a dict, it is
expected to map output names (strings) to scalar coefficients.
• sample_weight_mode: If you need to do timestep-wise sample weighting (2D weights), set this
to "temporal". None defaults to sample-wise weights (1D). If the model has multiple
outputs, you can use a different sample_weight_mode on each output by passing a
dictionary or a list of modes.
• weighted_metrics: List of metrics to be evaluated and weighted by sample_weight or
class_weight during training and testing.
• target_tensors: By default, Keras will create placeholders for the model's target, which will be
fed with the target data during training. If instead you would like to use your own target tensors
(in turn, Keras will not expect external Numpy data for these targets at training time), you can
specify them via the target_tensors argument. It can be a single tensor (for a single-
output model), a list of tensors, or a dict mapping output names to target tensors.
• **kwargs: When using the Theano/CNTK backends, these arguments are passed into
K.function. When using the TensorFlow backend, these arguments are passed into
tf.Session.run.
Raises
• ValueError: In case of invalid arguments for optimizer, loss, metrics or
sample_weight_mode.
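As a rough illustration of how loss_weights combine the per-output losses, the plain-Python sketch below computes the weighted sum described above. The function name total_loss, the output names, and the default weight of 1.0 for unlisted outputs are assumptions of this sketch, not Keras internals.

```python
def total_loss(losses, loss_weights=None):
    """Combine per-output loss values into one scalar.

    `losses` maps output names to scalar loss values. `loss_weights` is an
    optional dict mapping output names to coefficients; outputs that are
    not listed get a weight of 1.0 (an assumption of this sketch).
    """
    loss_weights = loss_weights or {}
    return sum(loss_weights.get(name, 1.0) * value
               for name, value in losses.items())


# Hypothetical two-output model.
per_output = {"output_a": 0.5, "output_b": 2.0}
unweighted = total_loss(per_output)                     # plain sum of losses
weighted = total_loss(per_output, {"output_b": 0.25})   # 0.5 + 0.25 * 2.0
```

Down-weighting output_b here reduces its contribution to the quantity being minimized without changing the loss function itself.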
fit
fit(x=None, y=None, batch_size=None, epochs=1, verbose=1, callbacks=None,
validation_split=0.0, validation_data=None, shuffle=True, class_weight=None,
sample_weight=None, initial_epoch=0, steps_per_epoch=None, validation_steps=None,
validation_freq=1, max_queue_size=10, workers=1, use_multiprocessing=False)
• shuffle: Boolean (whether to shuffle the training data before each epoch) or str (for 'batch').
'batch' is a special option for dealing with the limitations of HDF5 data; it shuffles in batch-
sized chunks. Has no effect when steps_per_epoch is not None.
• class_weight: Optional dictionary mapping class indices (integers) to a weight (float) value,
used for weighting the loss function (during training only). This can be useful to tell the model
to "pay more attention" to samples from an under-represented class.
• sample_weight: Optional Numpy array of weights for the training samples, used for weighting
the loss function (during training only). You can either pass a flat (1D) Numpy array with the
same length as the input samples (1:1 mapping between weights and samples), or in the case of
temporal data, you can pass a 2D array with shape (samples, sequence_length), to
apply a different weight to every timestep of every sample. In this case you should make sure to
specify sample_weight_mode="temporal" in compile(). This argument is not
supported when x is a generator or Sequence instance; instead, provide the sample weights as the
third element of x.
• initial_epoch: Integer. Epoch at which to start training (useful for resuming a previous training
run).
• steps_per_epoch: Integer or None. Total number of steps (batches of samples) before declaring
one epoch finished and starting the next epoch. When training with input tensors such as
TensorFlow data tensors, the default None is equal to the number of samples in your dataset
divided by the batch size, or 1 if that cannot be determined.
• validation_steps: Total number of steps (batches of samples) to draw before stopping when
performing validation at the end of every epoch. Only relevant if steps_per_epoch is
specified, or if validation_data is provided and is a generator.
• validation_freq: Only relevant if validation data is provided. Integer or list/tuple/set. If an
integer, specifies how many training epochs to run before a new validation run is performed,
e.g. validation_freq=2 runs validation every 2 epochs. If a list, tuple, or set, specifies the
epochs on which to run validation, e.g. validation_freq=[1, 2, 10] runs validation at
the end of the 1st, 2nd, and 10th epochs.
• max_queue_size: Integer. Used for generator or keras.utils.Sequence input only.
Maximum size for the generator queue. If unspecified, max_queue_size will default to 10.
• workers: Integer. Used for generator or keras.utils.Sequence input only. Maximum
number of processes to spin up when using process-based threading. If unspecified, workers
will default to 1. If 0, will execute the generator on the main thread.
• use_multiprocessing: Boolean. Used for generator or keras.utils.Sequence input only.
If True, use process-based threading. If unspecified, use_multiprocessing will default
to False. Note that because this implementation relies on multiprocessing, you should not pass
non-picklable arguments to the generator as they can't be passed easily to children processes.
• **kwargs: Used for backwards compatibility.
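The class_weight and sample_weight arguments both scale per-sample loss values. The sketch below shows that idea in plain Python for the 1D case and the 2D ("temporal") case. The helper names are hypothetical, and the final reduction to a single scalar (which Keras performs internally) is deliberately left out.

```python
def per_sample_weights(labels, class_weight):
    """Turn a class_weight dict into one weight per sample."""
    return [class_weight.get(label, 1.0) for label in labels]


def apply_sample_weights(losses, weights):
    """Multiply each loss value by its weight.

    1D case: `losses` and `weights` are flat lists, one entry per sample.
    2D ("temporal") case: both are lists of per-timestep lists with shape
    (samples, sequence_length).
    """
    if losses and isinstance(losses[0], list):
        return [[l * w for l, w in zip(row_l, row_w)]
                for row_l, row_w in zip(losses, weights)]
    return [l * w for l, w in zip(losses, weights)]


labels = [0, 1, 1, 0]
weights = per_sample_weights(labels, {0: 1.0, 1: 3.0})
flat = apply_sample_weights([0.5, 0.5, 1.0, 2.0], weights)
temporal = apply_sample_weights([[1.0, 1.0], [2.0, 2.0]],
                                [[1.0, 0.0], [0.5, 0.5]])
```

A temporal weight of 0.0 removes that timestep's contribution entirely, which is one common way to ignore padded positions.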
Returns
A History object. Its History.history attribute is a record of training loss values and metrics
values at successive epochs, as well as validation loss values and validation metrics values (if
applicable).
Raises
• RuntimeError: If the model was never compiled.
• ValueError: In case of mismatch between the provided input data and what the model expects.
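The validation_freq rule can be sketched as a small predicate. This is an illustration only: the function name is hypothetical, and it assumes epochs are counted from 1 when matched against validation_freq, as the "1st, 2nd, and 10th epochs" wording suggests.

```python
def should_run_validation(validation_freq, epoch):
    """Return True if validation runs at the end of 0-based `epoch`."""
    one_indexed_epoch = epoch + 1
    if isinstance(validation_freq, int):
        # Integer: validate every `validation_freq` epochs.
        return one_indexed_epoch % validation_freq == 0
    # List, tuple, or set: validate only on the listed epochs.
    return one_indexed_epoch in validation_freq


every_two = [e + 1 for e in range(10) if should_run_validation(2, e)]
listed = [e + 1 for e in range(10) if should_run_validation([1, 2, 10], e)]
```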
evaluate
evaluate(x=None, y=None, batch_size=None, verbose=1, sample_weight=None,
steps=None, callbacks=None, max_queue_size=10, workers=1,
use_multiprocessing=False)
Returns the loss value & metrics values for the model in test mode.
Computation is done in batches.
Arguments
• x: Input data. It could be:
• A Numpy array (or array-like), or a list of arrays (in case the model has multiple inputs).
• A dict mapping input names to the corresponding array/tensors, if the model has named
inputs.
• A generator or keras.utils.Sequence returning (inputs, targets) or
(inputs, targets, sample weights).
• None (default) if feeding from framework-native tensors (e.g. TensorFlow data tensors).
• y: Target data. Like the input data x, it could be either Numpy array(s), framework-native
tensor(s), list of Numpy arrays (if the model has multiple outputs) or None (default) if feeding
from framework-native tensors (e.g. TensorFlow data tensors). If output layers in the model are
named, you can also pass a dictionary mapping output names to Numpy arrays. If x is a
generator, or keras.utils.Sequence instance, y should not be specified (since targets
will be obtained from x).
• batch_size: Integer or None. Number of samples per evaluation batch. If unspecified,
batch_size will default to 32. Do not specify batch_size if your data is in the form
of symbolic tensors, generators, or keras.utils.Sequence instances (since they generate
batches).
• verbose: 0 or 1. Verbosity mode. 0 = silent, 1 = progress bar.
• sample_weight: Optional Numpy array of weights for the test samples, used for weighting the
loss function. You can either pass a flat (1D) Numpy array with the same length as the input
samples (1:1 mapping between weights and samples), or in the case of temporal data, you can
pass a 2D array with shape (samples, sequence_length), to apply a different weight
to every timestep of every sample. In this case you should make sure to specify
sample_weight_mode="temporal" in compile().
• steps: Integer or None. Total number of steps (batches of samples) before declaring the
evaluation round finished. Ignored with the default value of None.
• callbacks: List of keras.callbacks.Callback instances. List of callbacks to apply
during evaluation. See callbacks.
• max_queue_size: Integer. Used for generator or keras.utils.Sequence input only.
Maximum size for the generator queue. If unspecified, max_queue_size will default to 10.
• workers: Integer. Used for generator or keras.utils.Sequence input only. Maximum
number of processes to spin up when using process-based threading. If unspecified, workers
will default to 1. If 0, will execute the generator on the main thread.
• use_multiprocessing: Boolean. Used for generator or keras.utils.Sequence input only.
If True, use process-based threading. If unspecified, use_multiprocessing will default
to False. Note that because this implementation relies on multiprocessing, you should not pass
non-picklable arguments to the generator as they can't be passed easily to children processes.
Raises
• ValueError: in case of invalid arguments.
Returns
Scalar test loss (if the model has a single output and no metrics) or list of scalars (if the model has
multiple outputs and/or metrics). The attribute model.metrics_names will give you the display
labels for the scalar outputs.
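Because evaluate returns either a single scalar or a list of scalars, it can be convenient to pair the values with model.metrics_names. The helper below is a hypothetical sketch in plain Python. The names and scores are placeholder values standing in for a real model's output.

```python
def label_results(metrics_names, results):
    """Pair evaluate() output with its display labels.

    `results` is either a single scalar (single output, no metrics) or a
    list of scalars, as described above.
    """
    if not isinstance(results, (list, tuple)):
        results = [results]
    return dict(zip(metrics_names, results))


# Placeholder values standing in for model.metrics_names and model.evaluate(...).
names = ["loss", "accuracy"]
scores = [0.25, 0.9]
labelled = label_results(names, scores)
single = label_results(["loss"], 0.25)
```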
predict
predict(x, batch_size=None, verbose=0, steps=None, callbacks=None,
max_queue_size=10, workers=1, use_multiprocessing=False)
train_on_batch
train_on_batch(x, y, sample_weight=None, class_weight=None, reset_metrics=True)
test_on_batch
test_on_batch(x, y, sample_weight=None, reset_metrics=True)
predict_on_batch
predict_on_batch(x)
fit_generator
fit_generator(generator, steps_per_epoch=None, epochs=1, verbose=1, callbacks=None,
validation_data=None, validation_steps=None, validation_freq=1, class_weight=None,
max_queue_size=10, workers=1, use_multiprocessing=False, shuffle=True,
initial_epoch=0)
Trains the model on data generated batch-by-batch by a Python generator (or an instance of
Sequence).
The generator is run in parallel to the model, for efficiency. For instance, this allows you to do real-time
data augmentation on images on CPU in parallel to training your model on GPU.
The use of keras.utils.Sequence guarantees the ordering and guarantees the single use of
every input per epoch when using use_multiprocessing=True.
Arguments
• generator: A generator or an instance of keras.utils.Sequence. Prefer a Sequence to
avoid duplicate data when using multiprocessing. The output of the generator must be
either
• a tuple (inputs, targets)
• a tuple (inputs, targets, sample_weights).
This tuple (a single output of the generator) makes a single batch. Therefore, all arrays in this
tuple must have the same length (equal to the size of this batch). Different batches may have
different sizes. For example, the last batch of the epoch is commonly smaller than the others, if
the size of the dataset is not divisible by the batch size. The generator is expected to loop over
its data indefinitely. An epoch finishes when steps_per_epoch batches have been seen by
the model.
• steps_per_epoch: Integer. Total number of steps (batches of samples) to yield from
generator before declaring one epoch finished and starting the next epoch. It should
typically be equal to ceil(num_samples / batch_size). Optional for Sequence: if
unspecified, will use len(generator) as the number of steps.
• epochs: Integer. Number of epochs to train the model. An epoch is an iteration over the entire
data provided, as defined by steps_per_epoch. Note that in conjunction with
initial_epoch, epochs is to be understood as "final epoch". The model is not trained for
a number of iterations given by epochs, but merely until the epoch of index epochs is
reached.
• verbose: Integer. 0, 1, or 2. Verbosity mode. 0 = silent, 1 = progress bar, 2 = one line per epoch.
• callbacks: List of keras.callbacks.Callback instances. List of callbacks to apply
during training. See callbacks.
• validation_data: This can be either
• a generator or a Sequence object for the validation data
• tuple (x_val, y_val)
• tuple (x_val, y_val, val_sample_weights)
on which to evaluate the loss and any model metrics at the end of each epoch. The model will
not be trained on this data.
• validation_steps: Only relevant if validation_data is a generator. Total number of steps
(batches of samples) to yield from validation_data generator before stopping at the end
of every epoch. It should typically be equal to the number of samples of your validation dataset
divided by the batch size. Optional for Sequence: if unspecified, will use the
len(validation_data) as a number of steps.
model.fit_generator(generate_arrays_from_file('/my_file.txt'),
                    steps_per_epoch=10000, epochs=10)
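To make the generator contract concrete, here is a minimal plain-Python sketch of a generator that loops over its data indefinitely, together with the steps_per_epoch arithmetic. The name batch_generator and the toy data are hypothetical. The last lines assume that training runs over epoch indices range(initial_epoch, epochs), which is one reading of the "final epoch" remark above.

```python
import itertools
import math


def batch_generator(inputs, targets, batch_size):
    """Yield (inputs, targets) batches forever, restarting after each pass."""
    num_samples = len(inputs)
    while True:
        for start in range(0, num_samples, batch_size):
            end = start + batch_size
            yield inputs[start:end], targets[start:end]


inputs = list(range(10))
targets = [x * 2 for x in inputs]
batch_size = 4

# One epoch covers the whole dataset once; the last batch is smaller
# because 10 is not divisible by 4.
steps_per_epoch = math.ceil(len(inputs) / batch_size)

gen = batch_generator(inputs, targets, batch_size)
first_epoch = list(itertools.islice(gen, steps_per_epoch))
batch_sizes = [len(x) for x, _ in first_epoch]

# Assumption: with initial_epoch=2 and epochs=5, epoch indices 2, 3 and 4 run.
epochs_run = list(range(2, 5))
```

Every array in a yielded tuple has the same length, and the generator starts over from the first sample once the data is exhausted.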
evaluate_generator
evaluate_generator(generator, steps=None, callbacks=None, max_queue_size=10,
workers=1, use_multiprocessing=False, verbose=0)
Evaluates the model on a data generator.
The generator should return the same kind of data as accepted by test_on_batch.
Arguments
• generator: Generator yielding tuples (inputs, targets) or (inputs, targets, sample_weights), or an
instance of keras.utils.Sequence. Prefer a Sequence to avoid duplicate data when using
multiprocessing.
• steps: Total number of steps (batches of samples) to yield from generator before stopping.
Optional for Sequence: if unspecified, will use len(generator) as the number of steps.
• callbacks: List of keras.callbacks.Callback instances. List of callbacks to apply
during evaluation. See callbacks.
• max_queue_size: Maximum size for the generator queue.
• workers: Integer. Maximum number of processes to spin up when using process-based
threading. If unspecified, workers will default to 1. If 0, will execute the generator on the
main thread.
• use_multiprocessing: If True, use process-based threading. Note that because this
implementation relies on multiprocessing, you should not pass non-picklable arguments to the
generator as they can't be passed easily to child processes.
• verbose: Verbosity mode, 0 or 1.
Returns
Scalar test loss (if the model has a single output and no metrics) or list of scalars (if the model has
multiple outputs and/or metrics). The attribute model.metrics_names will give you the display
labels for the scalar outputs.
Raises
• ValueError: In case the generator yields data in an invalid format.
predict_generator
predict_generator(generator, steps=None, callbacks=None, max_queue_size=10,
workers=1, use_multiprocessing=False, verbose=0)
Arguments
• generator: Generator yielding batches of input samples, or an instance of
keras.utils.Sequence. Prefer a Sequence to avoid duplicate data when using multiprocessing.
• steps: Total number of steps (batches of samples) to yield from generator before stopping.
Optional for Sequence: if unspecified, will use len(generator) as the number of steps.
• callbacks: List of keras.callbacks.Callback instances. List of callbacks to apply
during prediction. See callbacks.
• max_queue_size: Maximum size for the generator queue.
• workers: Integer. Maximum number of processes to spin up when using process-based
threading. If unspecified, workers will default to 1. If 0, will execute the generator on the
main thread.
• use_multiprocessing: If True, use process-based threading. Note that because this
implementation relies on multiprocessing, you should not pass non-picklable arguments to the
generator as they can't be passed easily to child processes.
• verbose: Verbosity mode, 0 or 1.
Returns
Numpy array(s) of predictions.
Raises
• ValueError: In case the generator yields data in an invalid format.
get_layer
get_layer(name=None, index=None)
Or:
from keras import layers
config = layer.get_config()
layer = layers.deserialize({'class_name': layer.__class__.__name__,
                            'config': config})
If a layer has a single node (i.e. if it isn't a shared layer), you can get its input tensor, output tensor,
input shape and output shape via:
• layer.input
• layer.output
• layer.input_shape
• layer.output_shape
If the layer has multiple nodes (see: the concept of layer node and shared layers), you can use the
following methods:
• layer.get_input_at(node_index)
• layer.get_output_at(node_index)
• layer.get_input_shape_at(node_index)
• layer.get_output_shape_at(node_index)
Dense
keras.layers.Dense(units, activation=None, use_bias=True,
kernel_initializer='glorot_uniform', bias_initializer='zeros',
kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None,
kernel_constraint=None, bias_constraint=None)
Note: if the input to the layer has a rank greater than 2, the dot product with kernel is computed along
the last axis of the input, so the leading axes are preserved (see Output shape below).
Example
# as first layer in a sequential model:
model = Sequential()
model.add(Dense(32, input_shape=(16,)))
# now the model will take as input arrays of shape (*, 16)
# and output arrays of shape (*, 32)
Arguments
• units: Positive integer, dimensionality of the output space.
• activation: Activation function to use (see activations). If you don't specify anything, no
activation is applied (i.e. "linear" activation: a(x) = x).
• use_bias: Boolean, whether the layer uses a bias vector.
• kernel_initializer: Initializer for the kernel weights matrix (see initializers).
• bias_initializer: Initializer for the bias vector (see initializers).
• kernel_regularizer: Regularizer function applied to the kernel weights matrix (see
regularizer).
• bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
• activity_regularizer: Regularizer function applied to the output of the layer (its "activation")
(see regularizer).
• kernel_constraint: Constraint function applied to the kernel weights matrix (see
constraints).
• bias_constraint: Constraint function applied to the bias vector (see constraints).
Input shape
nD tensor with shape: (batch_size, ..., input_dim). The most common situation would be
a 2D input with shape (batch_size, input_dim).
Output shape
nD tensor with shape: (batch_size, ..., units). For instance, for a 2D input with shape
(batch_size, input_dim), the output would have shape (batch_size, units).
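The shape rule above can be checked with a small plain-Python sketch of a linear Dense layer (no activation) applied along the last axis of a nested list. The helper names dense_forward and shape, and the toy kernel and bias values, are hypothetical. This only illustrates the shapes and is not how Keras implements the layer.

```python
def dense_forward(x, kernel, bias):
    """Apply y = x . kernel + bias along the last axis of a nested list.

    `x` is a nested list whose innermost lists have length input_dim,
    `kernel` is an input_dim x units matrix and `bias` has length units.
    """
    if x and isinstance(x[0], list):
        # Not yet at the last axis: recurse into each sub-list.
        return [dense_forward(row, kernel, bias) for row in x]
    units = len(bias)
    return [sum(x[i] * kernel[i][j] for i in range(len(x))) + bias[j]
            for j in range(units)]


def shape(nested):
    """Shape of a rectangular nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


kernel = [[1.0, 0.0, 1.0],
          [0.0, 1.0, 1.0]]   # input_dim=2, units=3
bias = [0.0, 0.0, 0.5]

x_2d = [[1.0, 2.0], [3.0, 4.0]]               # (batch_size=2, input_dim=2)
x_3d = [[[1.0, 2.0]] * 5, [[3.0, 4.0]] * 5]   # (2, 5, 2)

y_2d = dense_forward(x_2d, kernel, bias)
y_3d = dense_forward(x_3d, kernel, bias)
```

Only the last axis changes size, from input_dim to units. All leading axes pass through unchanged.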
Activation
keras.layers.Activation(activation)
Dropout
keras.layers.Dropout(rate, noise_shape=None, seed=None)
Flatten
keras.layers.Flatten(data_format=None)
model.add(Flatten())
# now: model.output_shape == (None, 65536)
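The value 65536 is the product of the non-batch dimensions of the layer's input. The sketch below reproduces that arithmetic in plain Python. The input shape (None, 64, 32, 32) is an assumption chosen to match the number in the comment above, and flatten_output_shape is a hypothetical helper.

```python
from functools import reduce
from operator import mul


def flatten_output_shape(input_shape):
    """Collapse every non-batch dimension into a single dimension."""
    batch, *rest = input_shape
    return (batch, reduce(mul, rest, 1))


# Assumed shape of the layer feeding Flatten: 64 feature maps of 32 x 32.
flat_shape = flatten_output_shape((None, 64, 32, 32))
```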
Input
keras.engine.input_layer.Input()
A Keras tensor is a tensor object from the underlying backend (Theano, TensorFlow or CNTK), which
we augment with certain attributes that allow us to build a Keras model just by knowing the inputs and
outputs of the model.
For instance, if a, b and c are Keras tensors, it becomes possible to do: model =
Model(inputs=[a, b], outputs=c)
The added Keras attributes are:
• _keras_shape: Integer shape tuple propagated via Keras-side shape inference.
• _keras_history: Last layer applied to the tensor. The entire layer graph is retrievable
from that layer, recursively.
Arguments
• shape: A shape tuple (integer), not including the batch size. For instance, shape=(32,)
indicates that the expected input will be batches of 32-dimensional vectors.
• batch_shape: A shape tuple (integer), including the batch size. For instance,
batch_shape=(10, 32) indicates that the expected input will be batches of 10
32-dimensional vectors. batch_shape=(None, 32) indicates batches of an arbitrary number
of 32-dimensional vectors.
• name: An optional name string for the layer. Should be unique in a model (do not reuse the
same name twice). It will be autogenerated if it isn't provided.
• dtype: The data type expected by the input, as a string (float32, float64, int32...)
• sparse: A boolean specifying whether the placeholder to be created is sparse.
• tensor: Optional existing tensor to wrap into the Input layer. If set, the layer will not create a
placeholder tensor.
Returns
A tensor.
Example
# this is a logistic regression in Keras
x = Input(shape=(32,))
y = Dense(16, activation='softmax')(x)
model = Model(x, y)
Reshape
keras.layers.Reshape(target_shape)
Example
# as first layer in a Sequential model
model = Sequential()
model.add(Reshape((3, 4), input_shape=(12,)))
# now: model.output_shape == (None, 3, 4)
# note: `None` is the batch dimension
Permute
keras.layers.Permute(dims)
Arguments
• dims: Tuple of integers. Permutation pattern, does not include the samples dimension. Indexing
starts at 1. For instance, (2, 1) permutes the first and second dimensions of the input.
Input shape
Arbitrary. Use the keyword argument input_shape (tuple of integers, does not include the samples
axis) when using this layer as the first layer in a model.
Output shape
Same as the input shape, but with the dimensions re-ordered according to the specified pattern.
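As a quick illustration of the 1-based dims pattern, the following plain-Python sketch computes the output shape of a permutation. The helper name permute_output_shape and the example shape are hypothetical.

```python
def permute_output_shape(input_shape, dims):
    """Reorder the non-batch dimensions of `input_shape`.

    `dims` uses 1-based indices because index 0 is the samples axis,
    which always stays in place.
    """
    return (input_shape[0],) + tuple(input_shape[d] for d in dims)


# Swap the two non-batch axes of a (None, 10, 64) input.
swapped = permute_output_shape((None, 10, 64), (2, 1))
```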
RepeatVector
keras.layers.RepeatVector(n)
model.add(RepeatVector(3))
# now: model.output_shape == (None, 3, 32)
Arguments
• n: integer, repetition factor.
Input shape
2D tensor of shape (num_samples, features).
Output shape
3D tensor of shape (num_samples, n, features).
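The shape change from (num_samples, features) to (num_samples, n, features) can be mimicked on nested lists. This is a plain-Python sketch with a hypothetical helper name, not the Keras implementation.

```python
def repeat_vector(batch, n):
    """Repeat each feature vector in `batch` n times along a new axis."""
    return [[list(features) for _ in range(n)] for features in batch]


batch = [[1, 2], [3, 4]]            # shape (num_samples=2, features=2)
repeated = repeat_vector(batch, 3)  # shape (2, 3, 2)
```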
Lambda
keras.layers.Lambda(function, output_shape=None, mask=None, arguments=None)
Examples
# add a x -> x^2 layer
model.add(Lambda(lambda x: x ** 2))

# add a layer that returns the concatenation of the positive part of
# the input and the opposite of the negative part
from keras import backend as K

def antirectifier(x):
    x -= K.mean(x, axis=1, keepdims=True)
    x = K.l2_normalize(x, axis=1)
    pos = K.relu(x)
    neg = K.relu(-x)
    return K.concatenate([pos, neg], axis=1)

def antirectifier_output_shape(input_shape):
    shape = list(input_shape)
    assert len(shape) == 2  # only valid for 2D tensors
    shape[-1] *= 2
    return tuple(shape)

model.add(Lambda(antirectifier,
                 output_shape=antirectifier_output_shape))

# add a layer that returns the hadamard product
# and the sum of it from two input tensors
def hadamard_product_sum(tensors):
    out1 = tensors[0] * tensors[1]
    out2 = K.sum(out1, axis=-1)
    return [out1, out2]

def hadamard_product_sum_output_shape(input_shapes):
    shape1 = list(input_shapes[0])
    shape2 = list(input_shapes[1])
    assert shape1 == shape2  # else hadamard product isn't possible
    return [tuple(shape1), tuple(shape2[:-1])]

x1 = Dense(32)(input_1)
x2 = Dense(32)(input_2)
layer = Lambda(hadamard_product_sum, hadamard_product_sum_output_shape)
x_hadamard, x_sum = layer([x1, x2])
Arguments
• function: The function to be evaluated. Takes input tensor or list of tensors as first argument.
• output_shape: Expected output shape from function. Only relevant when using Theano. Can be
a tuple or function. If a tuple, it only specifies the first dimension onward; the sample dimension is
assumed to be either the same as the input's: output_shape = (input_shape[0], ) +
output_shape, or, when the input's sample dimension is None, also None:
output_shape = (None, ) + output_shape. If a function, it specifies the entire
shape as a function of the input shape: output_shape = f(input_shape).
• mask: Either None (indicating no masking) or a Tensor indicating the input mask for
Embedding.
• arguments: optional dictionary of keyword arguments to be passed to the function.
Input shape
Arbitrary. Use the keyword argument input_shape (tuple of integers, does not include the samples axis)
when using this layer as the first layer in a model.
Output shape
Specified by output_shape argument (or auto-inferred when using TensorFlow or CNTK).
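The two forms of output_shape described above can be sketched in plain Python. The helper name resolve_lambda_output_shape is hypothetical and not part of Keras; it only illustrates how a tuple is extended with the sample dimension, while a function receives the full input shape.

```python
def resolve_lambda_output_shape(input_shape, output_shape):
    """Illustrative only: mimic the documented output_shape rules for Lambda."""
    if callable(output_shape):
        # A function receives the full input shape, sample axis included.
        return tuple(output_shape(input_shape))
    # A tuple omits the sample axis, so that axis is copied from the input
    # (None stays None when the batch size is unknown).
    return (input_shape[0],) + tuple(output_shape)


def antirectifier_output_shape(input_shape):
    # Same rule as the antirectifier example: the last axis doubles.
    shape = list(input_shape)
    shape[-1] *= 2
    return tuple(shape)


print(resolve_lambda_output_shape((None, 16), (32,)))
print(resolve_lambda_output_shape((8, 16), antirectifier_output_shape))
```

The first call prepends the unknown sample dimension to the tuple; the second delegates the whole computation to the shape function.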
ActivityRegularization
keras.layers.ActivityRegularization(l1=0.0, l2=0.0)
Layer that applies an update to the cost function based on input activity.
Arguments
• l1: L1 regularization factor (positive float).
• l2: L2 regularization factor (positive float).
Input shape
Arbitrary. Use the keyword argument input_shape (tuple of integers, does not include the samples
axis) when using this layer as the first layer in a model.
Output shape
Same shape as input.
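One common reading of the l1 and l2 factors is a penalty of l1 * sum(|a|) + l2 * sum(a^2) over the input activations a, added to the cost. The sketch below assumes that form; activity_penalty is a hypothetical helper, and the exact scaling (for instance any division by the batch size) may differ between Keras versions.

```python
def activity_penalty(activations, l1=0.0, l2=0.0):
    """Illustrative L1/L2 penalty on a flat list of activation values."""
    l1_term = l1 * sum(abs(a) for a in activations)
    l2_term = l2 * sum(a * a for a in activations)
    return l1_term + l2_term


acts = [1.0, -2.0, 0.5]
penalty = activity_penalty(acts, l1=0.01, l2=0.1)
print(penalty)
```

With both factors at their default of 0.0 the penalty vanishes, which matches the layer passing its input through unchanged.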
Masking
keras.layers.Masking(mask_value=0.0)
Arguments
• mask_value: Either None or mask value to skip
SpatialDropout1D
keras.layers.SpatialDropout1D(rate)
Output shape
Same as input
References
• Efficient Object Localization Using Convolutional Networks
SpatialDropout2D
keras.layers.SpatialDropout2D(rate, data_format=None)
Output shape
Same as input
References
• Efficient Object Localization Using Convolutional Networks
SpatialDropout3D
keras.layers.SpatialDropout3D(rate, data_format=None)
Output shape
Same as input
References
• Efficient Object Localization Using Convolutional Networks
Conv1D
keras.layers.Conv1D(filters, kernel_size, strides=1, padding='valid',
data_format='channels_last', dilation_rate=1, activation=None, use_bias=True,
kernel_initializer='glorot_uniform', bias_initializer='zeros',
kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None,
kernel_constraint=None, bias_constraint=None)
When using this layer as the first layer in a model, provide an input_shape argument (tuple of
integers or None, does not include the batch axis), e.g. input_shape=(10, 128) for time series
sequences of 10 time steps with 128 features per step in data_format="channels_last", or
(None, 128) for variable-length sequences with 128 features per step.
Arguments
• filters: Integer, the dimensionality of the output space (i.e. the number of output filters in the
convolution).
• kernel_size: An integer or tuple/list of a single integer, specifying the length of the 1D
convolution window.
• strides: An integer or tuple/list of a single integer, specifying the stride length of the
convolution. Specifying any stride value != 1 is incompatible with specifying any
dilation_rate value != 1.
• padding: One of "valid", "causal" or "same" (case-insensitive). "valid" means "no
padding". "same" results in padding the input such that the output has the same length as the
original input. "causal" results in causal (dilated) convolutions, e.g. output[t] does not
depend on input[t + 1:]. A zero padding is used such that the output has the same length
as the original input. Useful when modeling temporal data where the model should not violate
the temporal order. See WaveNet: A Generative Model for Raw Audio, section 2.1.
• data_format: A string, one of "channels_last" (default) or "channels_first". The
ordering of the dimensions in the inputs. "channels_last" corresponds to inputs with
shape (batch, steps, channels) (default format for temporal data in Keras) while
"channels_first" corresponds to inputs with shape (batch, channels, steps).
• dilation_rate: an integer or tuple/list of a single integer, specifying the dilation rate to use for
dilated convolution. Currently, specifying any dilation_rate value != 1 is incompatible
with specifying any strides value != 1.
• activation: Activation function to use (see activations). If you don't specify anything, no
activation is applied (i.e. "linear" activation: a(x) = x).
• use_bias: Boolean, whether the layer uses a bias vector.
• kernel_initializer: Initializer for the kernel weights matrix (see initializers).
• bias_initializer: Initializer for the bias vector (see initializers).
• kernel_regularizer: Regularizer function applied to the kernel weights matrix (see
regularizer).
• bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
• activity_regularizer: Regularizer function applied to the output of the layer (its "activation").
(see regularizer).
• kernel_constraint: Constraint function applied to the kernel matrix (see constraints).
• bias_constraint: Constraint function applied to the bias vector (see constraints).
Input shape
3D tensor with shape: (batch, steps, channels)
Output shape
3D tensor with shape: (batch, new_steps, filters). The steps value might have changed due
to padding or strides.
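As a rough sketch of how new_steps follows from padding, strides and dilation_rate, the arithmetic can be written out in plain Python. The helper name conv1d_output_steps is hypothetical; the code assumes the usual convention that "same" and "causal" keep the input length before striding, while "valid" drops the positions where the (dilated) window would not fit.

```python
def conv1d_output_steps(steps, kernel_size, strides=1, padding="valid",
                        dilation_rate=1):
    """Illustrative length arithmetic for a 1D convolution (not a Keras API)."""
    if steps is None:
        return None  # variable-length input stays unknown
    padding = padding.lower()
    # A dilated kernel spans (kernel_size - 1) * dilation_rate + 1 input steps.
    span = (kernel_size - 1) * dilation_rate + 1
    if padding == "valid":
        length = steps - span + 1
    elif padding in ("same", "causal"):
        length = steps
    else:
        raise ValueError("unknown padding: " + padding)
    # Ceiling division by the stride.
    return (length + strides - 1) // strides


print(conv1d_output_steps(10, 3))                    # 8
print(conv1d_output_steps(10, 3, padding="causal"))  # 10
```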
Conv2D
keras.layers.Conv2D(filters, kernel_size, strides=(1, 1), padding='valid',
data_format=None, dilation_rate=(1, 1), activation=None, use_bias=True,
kernel_initializer='glorot_uniform', bias_initializer='zeros',
kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None,
kernel_constraint=None, bias_constraint=None)
When using this layer as the first layer in a model, provide the keyword argument input_shape
(tuple of integers, does not include the batch axis), e.g. input_shape=(128, 128, 3) for
128x128 RGB pictures in data_format="channels_last".
Arguments
• filters: Integer, the dimensionality of the output space (i.e. the number of output filters in the
convolution).
• kernel_size: An integer or tuple/list of 2 integers, specifying the height and width of the 2D
convolution window. Can be a single integer to specify the same value for all spatial
dimensions.
• strides: An integer or tuple/list of 2 integers, specifying the strides of the convolution along the
height and width. Can be a single integer to specify the same value for all spatial dimensions.
Specifying any stride value != 1 is incompatible with specifying any dilation_rate value
!= 1.
• padding: one of "valid" or "same" (case-insensitive). Note that "same" is slightly
inconsistent across backends with strides != 1.
• data_format: A string, one of "channels_last" or "channels_first". The ordering
of the dimensions in the inputs. "channels_last" corresponds to inputs with shape
(batch, height, width, channels) while "channels_first" corresponds to
inputs with shape (batch, channels, height, width). It defaults to the
image_data_format value found in your Keras config file at ~/.keras/keras.json.
If you never set it, then it will be "channels_last".
• dilation_rate: an integer or tuple/list of 2 integers, specifying the dilation rate to use for dilated
convolution. Can be a single integer to specify the same value for all spatial dimensions.
Currently, specifying any dilation_rate value != 1 is incompatible with specifying any
stride value != 1.
• activation: Activation function to use (see activations). If you don't specify anything, no
activation is applied (i.e. "linear" activation: a(x) = x).
• use_bias: Boolean, whether the layer uses a bias vector.
• kernel_initializer: Initializer for the kernel weights matrix (see initializers).
• bias_initializer: Initializer for the bias vector (see initializers).
• kernel_regularizer: Regularizer function applied to the kernel weights matrix (see
regularizer).
• bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
• activity_regularizer: Regularizer function applied to the output of the layer (its "activation").
(see regularizer).
• kernel_constraint: Constraint function applied to the kernel matrix (see constraints).
• bias_constraint: Constraint function applied to the bias vector (see constraints).
Input shape
4D tensor with shape: (batch, channels, rows, cols) if data_format is
"channels_first" or 4D tensor with shape: (batch, rows, cols, channels) if
data_format is "channels_last".
Output shape
4D tensor with shape: (batch, filters, new_rows, new_cols) if data_format is
"channels_first" or 4D tensor with shape: (batch, new_rows, new_cols, filters)
if data_format is "channels_last". rows and cols values might have changed due to
padding.
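The two output layouts can be illustrated with a small shape calculator. This is a sketch under stated assumptions: conv2d_output_shape is a hypothetical helper, dilation is ignored, and "same" is taken to mean ceil(size / stride) along each spatial axis.

```python
def conv2d_output_shape(input_shape, filters, kernel_size, strides=(1, 1),
                        padding="valid", data_format="channels_last"):
    """Illustrative shape arithmetic for Conv2D (hypothetical, no dilation)."""
    def as_pair(value):
        # A single integer applies to both spatial dimensions.
        return (value, value) if isinstance(value, int) else tuple(value)

    def out_len(size, kernel, stride):
        if size is None:
            return None
        length = size if padding.lower() == "same" else size - kernel + 1
        return (length + stride - 1) // stride

    kernel_size, strides = as_pair(kernel_size), as_pair(strides)
    if data_format == "channels_first":
        batch, _, rows, cols = input_shape
    else:
        batch, rows, cols, _ = input_shape
    new_rows = out_len(rows, kernel_size[0], strides[0])
    new_cols = out_len(cols, kernel_size[1], strides[1])
    if data_format == "channels_first":
        return (batch, filters, new_rows, new_cols)
    return (batch, new_rows, new_cols, filters)


print(conv2d_output_shape((None, 128, 128, 3), filters=64, kernel_size=3))
```

For the 128x128 RGB input mentioned earlier, a 3x3 "valid" convolution with 64 filters gives (None, 126, 126, 64).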
SeparableConv1D
keras.layers.SeparableConv1D(filters, kernel_size, strides=1, padding='valid',
data_format='channels_last', dilation_rate=1, depth_multiplier=1, activation=None,
use_bias=True, depthwise_initializer='glorot_uniform',
pointwise_initializer='glorot_uniform', bias_initializer='zeros',
depthwise_regularizer=None, pointwise_regularizer=None, bias_regularizer=None,
activity_regularizer=None, depthwise_constraint=None, pointwise_constraint=None,
bias_constraint=None)
SeparableConv2D
keras.layers.SeparableConv2D(filters, kernel_size, strides=(1, 1), padding='valid',
data_format=None, dilation_rate=(1, 1), depth_multiplier=1, activation=None,
use_bias=True, depthwise_initializer='glorot_uniform',
pointwise_initializer='glorot_uniform', bias_initializer='zeros',
depthwise_regularizer=None, pointwise_regularizer=None, bias_regularizer=None,
activity_regularizer=None, depthwise_constraint=None, pointwise_constraint=None,
bias_constraint=None)
Output shape
4D tensor with shape: (batch, filters, new_rows, new_cols) if data_format is
"channels_first" or 4D tensor with shape: (batch, new_rows, new_cols, filters)
if data_format is "channels_last". rows and cols values might have changed due to
padding.
DepthwiseConv2D
keras.layers.DepthwiseConv2D(kernel_size, strides=(1, 1), padding='valid',
depth_multiplier=1, data_format=None, dilation_rate=(1, 1), activation=None,
use_bias=True, depthwise_initializer='glorot_uniform', bias_initializer='zeros',
depthwise_regularizer=None, bias_regularizer=None, activity_regularizer=None,
depthwise_constraint=None, bias_constraint=None)
Depthwise 2D convolution.
Depthwise convolution performs just the first step of a depthwise spatial convolution (which acts on
each input channel separately). The depth_multiplier argument controls how many output
channels are generated per input channel in the depthwise step.
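To make the per-channel behaviour concrete, here is a toy depthwise step in plain Python for a single channels_last image with "valid" padding, stride 1 and no bias. The function depthwise_conv2d and the kernel layout (k_rows, k_cols, channels, depth_multiplier) are assumptions for illustration, as is the output channel ordering c * depth_multiplier + m, which follows TensorFlow's depthwise op.

```python
def depthwise_conv2d(image, kernels):
    """Toy depthwise step: 'valid' padding, stride 1, channels_last, no bias.

    image:   nested lists with shape (rows, cols, channels)
    kernels: nested lists with shape (k_rows, k_cols, channels, depth_multiplier)
    """
    rows, cols, channels = len(image), len(image[0]), len(image[0][0])
    k_rows, k_cols = len(kernels), len(kernels[0])
    depth_multiplier = len(kernels[0][0][0])
    new_rows, new_cols = rows - k_rows + 1, cols - k_cols + 1
    out = [[[0.0] * (channels * depth_multiplier) for _ in range(new_cols)]
           for _ in range(new_rows)]
    for i in range(new_rows):
        for j in range(new_cols):
            for c in range(channels):  # each input channel is handled on its own
                for m in range(depth_multiplier):
                    total = 0.0
                    for di in range(k_rows):
                        for dj in range(k_cols):
                            total += image[i + di][j + dj][c] * kernels[di][dj][c][m]
                    out[i][j][c * depth_multiplier + m] = total
    return out


# 3x3 image with 2 channels; 2x2 kernels with weight 1.0 for multiplier 0
# and weight 2.0 for multiplier 1, so depth_multiplier is 2.
image = [[[1.0, 10.0] for _ in range(3)] for _ in range(3)]
kernels = [[[[1.0, 2.0], [1.0, 2.0]] for _ in range(2)] for _ in range(2)]
result = depthwise_conv2d(image, kernels)
print(len(result), len(result[0]), len(result[0][0]))  # 2 2 4
```

The output has channels * depth_multiplier = 4 channels, and no value mixes information from the two input channels.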
Arguments
• kernel_size: An integer or tuple/list of 2 integers, specifying the height and width of the 2D
convolution window. Can be a single integer to specify the same value for all spatial
dimensions.
• strides: An integer or tuple/list of 2 integers, specifying the strides of the convolution along the
height and width. Can be a single integer to specify the same value for all spatial dimensions.
Specifying any stride value != 1 is incompatible with specifying any dilation_rate value
!= 1.
• padding: one of "valid" or "same" (case-insensitive).
• depth_multiplier: The number of depthwise convolution output channels for each input
channel. The total number of depthwise convolution output channels will be equal to
filters_in * depth_multiplier.
• data_format: A string, one of "channels_last" or "channels_first". The ordering
of the dimensions in the inputs. "channels_last" corresponds to inputs with shape
(batch, height, width, channels) while "channels_first" corresponds to
inputs with shape (batch, channels, height, width). It defaults to the
image_data_format value found in your Keras config file at ~/.keras/keras.json.
If you never set it, then it will be 'channels_last'.
• dilation_rate: an integer or tuple/list of 2 integers, specifying the dilation rate to use for dilated
convolution. Can be a single integer to specify the same value for all spatial dimensions.
Currently, specifying any dilation_rate value != 1 is incompatible with specifying any
stride value != 1.
• activation: Activation function to use (see activations). If you don't specify anything, no
activation is applied (i.e. 'linear' activation: a(x) = x).
• use_bias: Boolean, whether the layer uses a bias vector.
• depthwise_initializer: Initializer for the depthwise kernel matrix (see initializers).
• bias_initializer: Initializer for the bias vector (see initializers).
• depthwise_regularizer: Regularizer function applied to the depthwise kernel matrix (see
regularizer).
• bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
• activity_regularizer: Regularizer function applied to the output of the layer (its 'activation').
(see regularizer).
• depthwise_constraint: Constraint function applied to the depthwise kernel matrix (see
constraints).
• bias_constraint: Constraint function applied to the bias vector (see constraints).
Input shape
4D tensor with shape: (batch, channels, rows, cols) if data_format is
"channels_first" or 4D tensor with shape: (batch, rows, cols, channels) if
data_format is "channels_last".
Output shape
4D tensor with shape: (batch, channels * depth_multiplier, new_rows,
new_cols) if data_format is "channels_first" or 4D tensor with shape: (batch,
new_rows, new_cols, channels * depth_multiplier) if data_format is
"channels_last". rows and cols values might have changed due to padding.
Conv2DTranspose
keras.layers.Conv2DTranspose(filters, kernel_size, strides=(1, 1), padding='valid',
output_padding=None, data_format=None, dilation_rate=(1, 1), activation=None,
use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros',
kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None,
kernel_constraint=None, bias_constraint=None)
Arguments
• filters: Integer, the dimensionality of the output space (i.e. the number of output filters in the
convolution).
• kernel_size: An integer or tuple/list of 2 integers, specifying the height and width of the 2D
convolution window. Can be a single integer to specify the same value for all spatial
dimensions.
• strides: An integer or tuple/list of 2 integers, specifying the strides of the convolution along the
height and width. Can be a single integer to specify the same value for all spatial dimensions.
Specifying any stride value != 1 is incompatible with specifying any dilation_rate value
!= 1.
• padding: one of "valid" or "same" (case-insensitive).
• output_padding: An integer or tuple/list of 2 integers, specifying the amount of padding along
the height and width of the output tensor. Can be a single integer to specify the same value for
all spatial dimensions. The amount of output padding along a given dimension must be lower
than the stride along that same dimension. If set to None (default), the output shape is inferred.
• data_format: A string, one of "channels_last" or "channels_first". The ordering
of the dimensions in the inputs. "channels_last" corresponds to inputs with shape
(batch, height, width, channels) while "channels_first" corresponds to
inputs with shape (batch, channels, height, width). It defaults to the
image_data_format value found in your Keras config file at ~/.keras/keras.json.
If you never set it, then it will be "channels_last".
• dilation_rate: an integer or tuple/list of 2 integers, specifying the dilation rate to use for dilated
convolution. Can be a single integer to specify the same value for all spatial dimensions.
Currently, specifying any dilation_rate value != 1 is incompatible with specifying any
stride value != 1.
• activation: Activation function to use (see activations). If you don't specify anything, no
activation is applied (i.e. "linear" activation: a(x) = x).
• use_bias: Boolean, whether the layer uses a bias vector.
• kernel_initializer: Initializer for the kernel weights matrix (see initializers).
• bias_initializer: Initializer for the bias vector (see initializers).
• kernel_regularizer: Regularizer function applied to the kernel weights matrix (see
regularizer).
• bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
• activity_regularizer: Regularizer function applied to the output of the layer (its "activation").
(see regularizer).
• kernel_constraint: Constraint function applied to the kernel matrix (see constraints).
• bias_constraint: Constraint function applied to the bias vector (see constraints).
Input shape
4D tensor with shape: (batch, channels, rows, cols) if data_format is
"channels_first" or 4D tensor with shape: (batch, rows, cols, channels) if
data_format is "channels_last".
Output shape
4D tensor with shape: (batch, filters, new_rows, new_cols) if data_format is
"channels_first" or 4D tensor with shape: (batch, new_rows, new_cols, filters)
if data_format is "channels_last". rows and cols values might have changed due to
padding. If output_padding is specified:
new_rows = ((rows - 1) * strides[0] + kernel_size[0]
- 2 * padding[0] + output_padding[0])
new_cols = ((cols - 1) * strides[1] + kernel_size[1]
- 2 * padding[1] + output_padding[1])
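The formula above can be sketched for a single axis as follows. Two things here are assumptions rather than statements from the text: padding[i] is read as an integer pad amount (0 for "valid", kernel_size // 2 for "same"), and the branch for output_padding=None uses a plausible inference rule. The helper name deconv_output_length is hypothetical.

```python
def deconv_output_length(size, kernel_size, stride, padding="valid",
                         output_padding=None):
    """Illustrative length arithmetic for one transposed-convolution axis."""
    if size is None:
        return None
    padding = padding.lower()
    if output_padding is None:
        # Assumed inference rule when output_padding is left at None.
        if padding == "same":
            return size * stride
        return size * stride + max(kernel_size - stride, 0)
    # Formula from the text, with padding[i] read as an integer pad amount:
    # 0 for "valid" and kernel_size // 2 for "same" (an assumption).
    pad = kernel_size // 2 if padding == "same" else 0
    return (size - 1) * stride + kernel_size - 2 * pad + output_padding


new_rows = deconv_output_length(4, kernel_size=3, stride=2,
                                padding="valid", output_padding=0)
print(new_rows)  # (4 - 1) * 2 + 3 - 0 + 0 = 9
```

Under these assumptions, output_padding is what distinguishes between the several output sizes that a strided convolution would map back to the same input size.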
References
• A guide to convolution arithmetic for deep learning
• Deconvolutional Networks
Conv3D
keras.layers.Conv3D(filters, kernel_size, strides=(1, 1, 1), padding='valid',
data_format=None, dilation_rate=(1, 1, 1), activation=None, use_bias=True,
kernel_initializer='glorot_uniform', bias_initializer='zeros',
kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None,
kernel_constraint=None, bias_constraint=None)
When using this layer as the first layer in a model, provide the keyword argument input_shape
(tuple of integers, does not include the batch axis), e.g. input_shape=(128, 128, 128, 1)
for 128x128x128 volumes with a single channel, in data_format="channels_last".
Arguments
• filters: Integer, the dimensionality of the output space (i.e. the number of output filters in the
convolution).
• kernel_size: An integer or tuple/list of 3 integers, specifying the depth, height and width of the
3D convolution window. Can be a single integer to specify the same value for all spatial
dimensions.
• strides: An integer or tuple/list of 3 integers, specifying the strides of the convolution along
each spatial dimension. Can be a single integer to specify the same value for all spatial
dimensions. Specifying any stride value != 1 is incompatible with specifying any
dilation_rate value != 1.
• padding: one of "valid" or "same" (case-insensitive).
• data_format: A string, one of "channels_last" or "channels_first". The ordering
of the dimensions in the inputs. "channels_last" corresponds to inputs with shape
(batch, spatial_dim1, spatial_dim2, spatial_dim3, channels) while
"channels_first" corresponds to inputs with shape (batch, channels,
spatial_dim1, spatial_dim2, spatial_dim3). It defaults to the
image_data_format value found in your Keras config file at ~/.keras/keras.json.
If you never set it, then it will be "channels_last".
• dilation_rate: an integer or tuple/list of 3 integers, specifying the dilation rate to use for dilated
convolution. Can be a single integer to specify the same value for all spatial dimensions.
Currently, specifying any dilation_rate value != 1 is incompatible with specifying any
stride value != 1.
• activation: Activation function to use (see activations). If you don't specify anything, no
activation is applied (i.e. "linear" activation: a(x) = x).
• use_bias: Boolean, whether the layer uses a bias vector.
• kernel_initializer: Initializer for the kernel weights matrix (see initializers).
• bias_initializer: Initializer for the bias vector (see initializers).
• kernel_regularizer: Regularizer function applied to the kernel weights matrix (see
regularizer).
• bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
• activity_regularizer: Regularizer function applied to the output of the layer (its "activation").
(see regularizer).
• kernel_constraint: Constraint function applied to the kernel matrix (see constraints).
• bias_constraint: Constraint function applied to the bias vector (see constraints).
Input shape
5D tensor with shape: (batch, channels, conv_dim1, conv_dim2, conv_dim3) if
data_format is "channels_first" or 5D tensor with shape: (batch, conv_dim1,
conv_dim2, conv_dim3, channels) if data_format is "channels_last".
Output shape
5D tensor with shape: (batch, filters, new_conv_dim1, new_conv_dim2,
new_conv_dim3) if data_format is "channels_first" or 5D tensor with shape: (batch,
new_conv_dim1, new_conv_dim2, new_conv_dim3, filters) if data_format is
"channels_last". new_conv_dim1, new_conv_dim2 and new_conv_dim3 values might
have changed due to padding.
Conv3DTranspose
keras.layers.Conv3DTranspose(filters, kernel_size, strides=(1, 1, 1),
padding='valid', output_padding=None, data_format=None, activation=None,
use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros',
kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None,
kernel_constraint=None, bias_constraint=None)
Arguments
• filters: Integer, the dimensionality of the output space (i.e. the number of output filters in the
convolution).
• kernel_size: An integer or tuple/list of 3 integers, specifying the depth, height and width of the
3D convolution window. Can be a single integer to specify the same value for all spatial
dimensions.
• strides: An integer or tuple/list of 3 integers, specifying the strides of the convolution along the
depth, height and width. Can be a single integer to specify the same value for all spatial
dimensions. Specifying any stride value != 1 is incompatible with specifying any
dilation_rate value != 1.
• padding: one of "valid" or "same" (case-insensitive).
• output_padding: An integer or tuple/list of 3 integers, specifying the amount of padding along
the depth, height, and width. Can be a single integer to specify the same value for all spatial
dimensions. The amount of output padding along a given dimension must be lower than the
stride along that same dimension. If set to None (default), the output shape is inferred.
• data_format: A string, one of "channels_last" or "channels_first". The ordering
of the dimensions in the inputs. "channels_last" corresponds to inputs with shape
(batch, depth, height, width, channels) while "channels_first"
corresponds to inputs with shape (batch, channels, depth, height, width). It
defaults to the image_data_format value found in your Keras config file at
~/.keras/keras.json. If you never set it, then it will be "channels_last".
• dilation_rate: an integer or tuple/list of 3 integers, specifying the dilation rate to use for dilated
convolution. Can be a single integer to specify the same value for all spatial dimensions.
Currently, specifying any dilation_rate value != 1 is incompatible with specifying any
stride value != 1.
• activation: Activation function to use (see activations). If you don't specify anything, no
activation is applied (i.e. "linear" activation: a(x) = x).
• use_bias: Boolean, whether the layer uses a bias vector.
• kernel_initializer: Initializer for the kernel weights matrix (see initializers).
• bias_initializer: Initializer for the bias vector (see initializers).
• kernel_regularizer: Regularizer function applied to the kernel weights matrix (see
regularizer).
• bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
• activity_regularizer: Regularizer function applied to the output of the layer (its "activation").
(see regularizer).
• kernel_constraint: Constraint function applied to the kernel matrix (see constraints).
• bias_constraint: Constraint function applied to the bias vector (see constraints).
Input shape
5D tensor with shape: (batch, channels, depth, rows, cols) if data_format is
"channels_first" or 5D tensor with shape: (batch, depth, rows, cols, channels)
if data_format is "channels_last".
Output shape
5D tensor with shape: (batch, filters, new_depth, new_rows, new_cols) if
data_format is "channels_first" or 5D tensor with shape: (batch, new_depth,
new_rows, new_cols, filters) if data_format is "channels_last". The depth,
rows and cols values might have changed due to padding. If output_padding is specified:
new_depth = ((depth - 1) * strides[0] + kernel_size[0]
- 2 * padding[0] + output_padding[0])
new_rows = ((rows - 1) * strides[1] + kernel_size[1]
- 2 * padding[1] + output_padding[1])
new_cols = ((cols - 1) * strides[2] + kernel_size[2]
- 2 * padding[2] + output_padding[2])
References
• A guide to convolution arithmetic for deep learning
• Deconvolutional Networks
Cropping1D
keras.layers.Cropping1D(cropping=(1, 1))
Output shape
3D tensor with shape (batch, cropped_axis, features)
Cropping2D
keras.layers.Cropping2D(cropping=((0, 0), (0, 0)), data_format=None)
Output shape
4D tensor with shape:
• If data_format is "channels_last": (batch, cropped_rows,
cropped_cols, channels)
• If data_format is "channels_first": (batch,
channels, cropped_rows, cropped_cols)
Examples
# Crop the input 2D images or feature maps
model = Sequential()
model.add(Cropping2D(cropping=((2, 2), (4, 4)),
                     input_shape=(28, 28, 3)))
# now model.output_shape == (None, 24, 20, 3)
model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Cropping2D(cropping=((2, 2), (2, 2))))
# now model.output_shape == (None, 20, 16, 64)
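The shapes in the comments above can be checked with a few lines of arithmetic. The helper cropping2d_output_shape is hypothetical and assumes channels_last data with cropping given as ((top_crop, bottom_crop), (left_crop, right_crop)).

```python
def cropping2d_output_shape(input_shape, cropping):
    """Illustrative shape arithmetic for Cropping2D with channels_last data."""
    (top, bottom), (left, right) = cropping
    batch, rows, cols, channels = input_shape
    return (batch, rows - top - bottom, cols - left - right, channels)


first = cropping2d_output_shape((None, 28, 28, 3), ((2, 2), (4, 4)))
print(first)   # (None, 24, 20, 3)

# The Conv2D layer with padding='same' keeps rows and cols and sets channels to 64.
second = cropping2d_output_shape((None, 24, 20, 64), ((2, 2), (2, 2)))
print(second)  # (None, 20, 16, 64)
```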
Cropping3D
keras.layers.Cropping3D(cropping=((1, 1), (1, 1), (1, 1)), data_format=None)
Output shape
5D tensor with shape:
• If data_format is "channels_last": (batch,
first_cropped_axis, second_cropped_axis, third_cropped_axis, depth)
• If data_format is "channels_first": (batch, depth, first_cropped_axis,
second_cropped_axis, third_cropped_axis)
UpSampling1D
keras.layers.UpSampling1D(size=2)
Arguments
• size: integer. Upsampling factor.
Input shape
3D tensor with shape: (batch, steps, features).
Output shape
3D tensor with shape: (batch, upsampled_steps, features).
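Assuming the layer repeats each temporal step size times along the time axis, so that upsampled_steps = steps * size, the operation on one sample can be sketched in plain Python. The function upsample_1d is a hypothetical stand-in, not the Keras implementation.

```python
def upsample_1d(sequence, size=2):
    """Repeat every timestep `size` times (nearest-neighbour upsampling)."""
    return [step for step in sequence for _ in range(size)]


steps = [[1, 2], [3, 4], [5, 6]]  # one sample with shape (steps=3, features=2)
upsampled = upsample_1d(steps, size=2)
print(upsampled)       # [[1, 2], [1, 2], [3, 4], [3, 4], [5, 6], [5, 6]]
print(len(upsampled))  # 6
```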
UpSampling2D
keras.layers.UpSampling2D(size=(2, 2), data_format=None, interpolation='nearest')
Output shape
4D tensor with shape:
• If data_format is "channels_last": (batch,
upsampled_rows, upsampled_cols, channels)
• If data_format is "channels_first": (batch, channels,
upsampled_rows, upsampled_cols)
UpSampling3D
keras.layers.UpSampling3D(size=(2, 2, 2), data_format=None)
Output shape
5D tensor with shape:
• If data_format is "channels_last": (batch,
upsampled_dim1, upsampled_dim2, upsampled_dim3, channels)
• If data_format is "channels_first": (batch, channels, upsampled_dim1,
upsampled_dim2, upsampled_dim3)
ZeroPadding1D
keras.layers.ZeroPadding1D(padding=1)
Input shape
3D tensor with shape (batch, axis_to_pad, features)
Output shape
3D tensor with shape (batch, padded_axis, features)
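A minimal sketch of the relation between axis_to_pad and padded_axis, assuming padding is either a single integer applied to both ends or a pair (left_pad, right_pad). The function zero_pad_1d is hypothetical and works on one (steps, features) sample stored as nested lists.

```python
def zero_pad_1d(sequence, padding=1):
    """Pad a (steps, features) nested list with all-zero timesteps."""
    left, right = (padding, padding) if isinstance(padding, int) else padding
    features = len(sequence[0])

    def zeros(count):
        # Build `count` independent all-zero timesteps.
        return [[0] * features for _ in range(count)]

    return zeros(left) + [list(step) for step in sequence] + zeros(right)


padded = zero_pad_1d([[1, 2], [3, 4]], padding=(1, 2))
print(padded)  # [[0, 0], [1, 2], [3, 4], [0, 0], [0, 0]]
```

Here padded_axis = axis_to_pad + left_pad + right_pad = 2 + 1 + 2 = 5.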
ZeroPadding2D
keras.layers.ZeroPadding2D(padding=(1, 1), data_format=None)
Output shape
4D tensor with shape:
• If data_format is "channels_last": (batch, padded_rows,
padded_cols, channels)
• If data_format is "channels_first": (batch,
channels, padded_rows, padded_cols)
ZeroPadding3D
keras.layers.ZeroPadding3D(padding=(1, 1, 1), data_format=None)
Output shape
5D tensor with shape:
• If data_format is "channels_last": (batch,
first_padded_axis, second_padded_axis, third_padded_axis, depth)
• If data_format is "channels_first": (batch, depth, first_padded_axis,
second_padded_axis, third_padded_axis)
MaxPooling1D
keras.layers.MaxPooling1D(pool_size=2, strides=None, padding='valid',
data_format='channels_last')
Input shape
• If data_format='channels_last': 3D tensor with shape: (batch_size, steps,
features)
• If data_format='channels_first': 3D tensor with shape: (batch_size,
features, steps)
Output shape
• If data_format='channels_last': 3D tensor with shape: (batch_size,
downsampled_steps, features)
• If data_format='channels_first': 3D tensor with shape: (batch_size,
features, downsampled_steps)
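To show where downsampled_steps comes from, the following toy version pools one channels_last sample with "valid" padding. It assumes that strides=None falls back to pool_size; max_pool_1d is a hypothetical helper, not the Keras implementation.

```python
def max_pool_1d(sequence, pool_size=2, strides=None):
    """Toy 'valid' max pooling over a (steps, features) nested list."""
    strides = pool_size if strides is None else strides  # assumed default
    features = len(sequence[0])
    pooled = []
    # Slide the window only over positions where it fits entirely.
    for start in range(0, len(sequence) - pool_size + 1, strides):
        window = sequence[start:start + pool_size]
        pooled.append([max(step[f] for step in window) for f in range(features)])
    return pooled


sample = [[1, 9], [4, 2], [3, 3], [8, 1], [5, 5]]  # steps=5, features=2
print(max_pool_1d(sample))             # [[4, 9], [8, 3]]
print(max_pool_1d(sample, strides=1))  # [[4, 9], [4, 3], [8, 3], [8, 5]]
```

With five steps, a window of 2 and a stride of 2, the last step is dropped and downsampled_steps is 2.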
MaxPooling2D
keras.layers.MaxPooling2D(pool_size=(2, 2), strides=None, padding='valid',
data_format=None)
Output shape
• If data_format='channels_last': 4D tensor with shape: (batch_size,
pooled_rows, pooled_cols, channels)
• If data_format='channels_first': 4D tensor with shape: (batch_size,
channels, pooled_rows, pooled_cols)
MaxPooling3D
keras.layers.MaxPooling3D(pool_size=(2, 2, 2), strides=None, padding='valid',
data_format=None)
Output shape
• If data_format='channels_last': 5D tensor with shape: (batch_size,
pooled_dim1, pooled_dim2, pooled_dim3, channels)
• If data_format='channels_first': 5D tensor with shape: (batch_size,
channels, pooled_dim1, pooled_dim2, pooled_dim3)
AveragePooling1D
keras.layers.AveragePooling1D(pool_size=2, strides=None, padding='valid',
data_format='channels_last')
Output shape
• If data_format='channels_last': 3D tensor with shape: (batch_size,
downsampled_steps, features)
• If data_format='channels_first': 3D tensor with shape: (batch_size,
features, downsampled_steps)
AveragePooling2D
keras.layers.AveragePooling2D(pool_size=(2, 2), strides=None, padding='valid',
data_format=None)
Output shape
• If data_format='channels_last': 4D tensor with shape: (batch_size,
pooled_rows, pooled_cols, channels)
• If data_format='channels_first': 4D tensor with shape: (batch_size,
channels, pooled_rows, pooled_cols)
AveragePooling3D
keras.layers.AveragePooling3D(pool_size=(2, 2, 2), strides=None, padding='valid',
data_format=None)
Output shape
• If data_format='channels_last': 5D tensor with shape: (batch_size,
pooled_dim1, pooled_dim2, pooled_dim3, channels)
• If data_format='channels_first': 5D tensor with shape: (batch_size,
channels, pooled_dim1, pooled_dim2, pooled_dim3)
GlobalMaxPooling1D
keras.layers.GlobalMaxPooling1D(data_format='channels_last')
Global max pooling operation for temporal data.
Arguments
• data_format: A string, one of channels_last (default) or channels_first. The
ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape
(batch, steps, features) while channels_first corresponds to inputs with
shape (batch, features, steps).
Input shape
• If data_format='channels_last': 3D tensor with shape: (batch_size, steps,
features)
• If data_format='channels_first': 3D tensor with shape: (batch_size,
features, steps)
Output shape
2D tensor with shape: (batch_size, features)
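The reduction from a 3D input to a 2D output can be sketched in plain Python for both data formats. The function global_max_pool_1d is a hypothetical illustration operating on nested lists.

```python
def global_max_pool_1d(batch, data_format="channels_last"):
    """Collapse the steps axis by taking the maximum of each feature."""
    pooled = []
    for sample in batch:
        if data_format == "channels_first":
            # sample is (features, steps): one maximum per row.
            pooled.append([max(feature_row) for feature_row in sample])
        else:
            # sample is (steps, features): one maximum per column.
            pooled.append([max(column) for column in zip(*sample)])
    return pooled


last = global_max_pool_1d([[[1, 5], [3, 2], [0, 7]]])
first = global_max_pool_1d([[[1, 3, 0], [5, 2, 7]]], data_format="channels_first")
print(last, first)  # [[3, 7]] [[3, 7]]
```

Both layouts describe the same data, so the pooled output of shape (batch_size, features) is identical.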
GlobalAveragePooling1D
keras.layers.GlobalAveragePooling1D(data_format='channels_last')
Input shape
• If data_format='channels_last': 3D tensor with shape: (batch_size, steps,
features)
• If data_format='channels_first': 3D tensor with shape: (batch_size,
features, steps)
Output shape
2D tensor with shape: (batch_size, features)
GlobalMaxPooling2D
keras.layers.GlobalMaxPooling2D(data_format=None)
Output shape
2D tensor with shape: (batch_size, channels)
GlobalAveragePooling2D
keras.layers.GlobalAveragePooling2D(data_format=None)
Output shape
2D tensor with shape: (batch_size, channels)
GlobalMaxPooling3D
keras.layers.GlobalMaxPooling3D(data_format=None)
Output shape
2D tensor with shape: (batch_size, channels)
GlobalAveragePooling3D
keras.layers.GlobalAveragePooling3D(data_format=None)
Output shape
2D tensor with shape: (batch_size, channels)
LocallyConnected1D
keras.layers.LocallyConnected1D(filters, kernel_size, strides=1, padding='valid',
data_format=None, activation=None, use_bias=True,
kernel_initializer='glorot_uniform', bias_initializer='zeros',
kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None,
kernel_constraint=None, bias_constraint=None)
Arguments
• filters: Integer, the dimensionality of the output space (i.e. the number of output filters in the
convolution).
• kernel_size: An integer or tuple/list of a single integer, specifying the length of the 1D
convolution window.
• strides: An integer or tuple/list of a single integer, specifying the stride length of the
convolution. Specifying any stride value != 1 is incompatible with specifying any
dilation_rate value != 1.
• padding: Currently only supports "valid" (case-insensitive). "same" may be supported in
the future.
• data_format: String, one of channels_first, channels_last.
• activation: Activation function to use (see activations). If you don't specify anything, no
activation is applied (i.e. "linear" activation: a(x) = x).
• use_bias: Boolean, whether the layer uses a bias vector.
• kernel_initializer: Initializer for the kernel weights matrix (see initializers).
• bias_initializer: Initializer for the bias vector (see initializers).
• kernel_regularizer: Regularizer function applied to the kernel weights matrix (see
regularizer).
• bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
• activity_regularizer: Regularizer function applied to the output of the layer (its "activation").
(see regularizer).
• kernel_constraint: Constraint function applied to the kernel matrix (see constraints).
• bias_constraint: Constraint function applied to the bias vector (see constraints).
Input shape
3D tensor with shape: (batch_size, steps, input_dim)
Output shape
3D tensor with shape: (batch_size, new_steps, filters). The steps value might have
changed due to padding or strides.
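As a rough illustration of how new_steps relates to steps, kernel_size and strides when padding is "valid", the sketch below applies the usual no-padding window-count formula. The helper name valid_output_length is hypothetical and is not part of Keras.

```python
def valid_output_length(steps, kernel_size, strides=1):
    """Count the window positions along one axis when no padding is added."""
    if steps < kernel_size:
        raise ValueError("the window is longer than the input")
    return (steps - kernel_size) // strides + 1


# Hypothetical sizes: 10 input steps and a window of length 3.
print(valid_output_length(10, 3))             # 8
print(valid_output_length(10, 3, strides=2))  # 4
```

The same arithmetic applies independently to rows and cols in the 2D case.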
LocallyConnected2D
keras.layers.LocallyConnected2D(filters, kernel_size, strides=(1, 1),
padding='valid', data_format=None, activation=None, use_bias=True,
kernel_initializer='glorot_uniform', bias_initializer='zeros',
kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None,
kernel_constraint=None, bias_constraint=None)
Arguments
• filters: Integer, the dimensionality of the output space (i.e. the number of output filters in the
convolution).
• kernel_size: An integer or tuple/list of 2 integers, specifying the width and height of the 2D
convolution window. Can be a single integer to specify the same value for all spatial
dimensions.
• strides: An integer or tuple/list of 2 integers, specifying the strides of the convolution along the
width and height. Can be a single integer to specify the same value for all spatial dimensions.
• padding: Currently only supports "valid" (case-insensitive). "same" may be supported in
the future.
• data_format: A string, one of channels_last (default) or channels_first. The
ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape
(batch, height, width, channels) while channels_first corresponds to
inputs with shape (batch, channels, height, width). It defaults to the
image_data_format value found in your Keras config file at ~/.keras/keras.json.
If you never set it, then it will be "channels_last".
• activation: Activation function to use (see activations). If you don't specify anything, no
activation is applied (i.e. "linear" activation: a(x) = x).
• use_bias: Boolean, whether the layer uses a bias vector.
• kernel_initializer: Initializer for the kernel weights matrix (see initializers).
• bias_initializer: Initializer for the bias vector (see initializers).
• kernel_regularizer: Regularizer function applied to the kernel weights matrix (see
regularizer).
• bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
• activity_regularizer: Regularizer function applied to the output of the layer (its "activation").
(see regularizer).
• kernel_constraint: Constraint function applied to the kernel matrix (see constraints).
• bias_constraint: Constraint function applied to the bias vector (see constraints).
Input shape
4D tensor with shape: (samples, channels, rows, cols) if data_format='channels_first' or
4D tensor with shape: (samples, rows, cols, channels) if data_format='channels_last'.
Output shape
4D tensor with shape: (samples, filters, new_rows, new_cols) if
data_format='channels_first' or 4D tensor with shape: (samples, new_rows, new_cols,
filters) if data_format='channels_last'. The rows and cols values might have changed due to
padding.
RNN
keras.layers.RNN(cell, return_sequences=False, return_state=False,
go_backwards=False, stateful=False, unroll=False)
It is also possible for cell to be a list of RNN cell instances, in which case the cells get
stacked one after the other in the RNN, implementing an efficient stacked RNN.
• return_sequences: Boolean. Whether to return the last output in the output sequence, or the full
sequence.
• return_state: Boolean. Whether to return the last state in addition to the output.
• go_backwards: Boolean (default False). If True, process the input sequence backwards and
return the reversed sequence.
• stateful: Boolean (default False). If True, the last state for each sample at index i in a batch will
be used as initial state for the sample of index i in the following batch.
• unroll: Boolean (default False). If True, the network will be unrolled, else a symbolic loop will
be used. Unrolling can speed up an RNN, although it tends to be more memory-intensive.
Unrolling is only suitable for short sequences.
• input_dim: dimensionality of the input (integer). This argument (or alternatively, the keyword
argument input_shape) is required when using this layer as the first layer in a model.
• input_length: Length of input sequences, to be specified when it is constant. This argument is
required if you are going to connect Flatten then Dense layers downstream (without it, the
shape of the dense outputs cannot be computed). Note that if the recurrent layer is not the first
layer in your model, you would need to specify the input length at the level of the first layer
(e.g. via the input_shape argument).
Input shape
3D tensor with shape (batch_size, timesteps, input_dim).
Output shape
• if return_state: a list of tensors. The first tensor is the output. The remaining tensors are
the last states, each with shape (batch_size, units). For example, the number of state
tensors is 1 (for RNN and GRU) or 2 (for LSTM).
• if return_sequences: 3D tensor with shape (batch_size, timesteps, units).
• else, 2D tensor with shape (batch_size, units).
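The rules above can be summarized in a small shape helper. This is only a sketch of the documented behaviour: rnn_output_shapes and its num_states parameter are hypothetical names introduced for illustration.

```python
def rnn_output_shapes(batch_size, timesteps, units,
                      return_sequences=False, return_state=False,
                      num_states=1):
    """Return the output shape, or a list [output, state, ...] of shapes."""
    if return_sequences:
        output = (batch_size, timesteps, units)
    else:
        output = (batch_size, units)
    if return_state:
        # One (batch_size, units) tensor per state: 1 for GRU, 2 for LSTM.
        return [output] + [(batch_size, units)] * num_states
    return output


print(rnn_output_shapes(32, 10, 64))
# (32, 64)
print(rnn_output_shapes(32, 10, 64, return_sequences=True))
# (32, 10, 64)
print(rnn_output_shapes(32, 10, 64, return_state=True, num_states=2))
# [(32, 64), (32, 64), (32, 64)]
```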
Masking
This layer supports masking for input data with a variable number of timesteps. To introduce masks to
your data, use an Embedding layer with the mask_zero parameter set to True.
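To make the idea concrete, the following sketch shows the kind of boolean mask that treating index 0 as padding produces. It is plain Python rather than Keras code, and zero_mask is a hypothetical helper name.

```python
def zero_mask(batch):
    """True where a timestep holds real data, False where it is padding (0)."""
    return [[index != 0 for index in sequence] for sequence in batch]


# Two sequences padded at the front to a common length of 4.
padded = [[0, 0, 7, 3],
          [0, 5, 2, 9]]
print(zero_mask(padded))
# [[False, False, True, True], [False, True, True, True]]
```

Timesteps marked False are the ones a masking-aware recurrent layer skips.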
To reset the states of your model, call .reset_states() on either a specific layer, or on your entire
model.
Note on specifying the initial state of RNNs
You can specify the initial state of RNN layers symbolically by calling them with the keyword
argument initial_state. The value of initial_state should be a tensor or list of tensors
representing the initial state of the RNN layer.
You can specify the initial state of RNN layers numerically by calling reset_states with the
keyword argument states. The value of states should be a numpy array or list of numpy arrays
representing the initial state of the RNN layer.
Note on passing external constants to RNNs
You can pass "external" constants to the cell using the constants keyword argument of the
RNN.__call__ method (as well as RNN.call). This requires that the cell.call method accepts
the same keyword argument constants. Such constants can be used to condition the cell
transformation on additional static inputs (not changing over time), a.k.a. an attention mechanism.
Examples
# First, let's define a RNN Cell, as a layer subclass.
class MinimalRNNCell(keras.layers.Layer):
cell = MinimalRNNCell(32)
x = keras.Input((None, 5))
layer = RNN(cell)
y = layer(x)
SimpleRNN
keras.layers.SimpleRNN(units, activation='tanh', use_bias=True,
kernel_initializer='glorot_uniform', recurrent_initializer='orthogonal',
bias_initializer='zeros', kernel_regularizer=None, recurrent_regularizer=None,
bias_regularizer=None, activity_regularizer=None, kernel_constraint=None,
recurrent_constraint=None, bias_constraint=None, dropout=0.0,
recurrent_dropout=0.0, return_sequences=False, return_state=False,
go_backwards=False, stateful=False, unroll=False)
Fully-connected RNN where the output is to be fed back to input.
Arguments
• units: Positive integer, dimensionality of the output space.
• activation: Activation function to use (see activations). Default: hyperbolic tangent (tanh). If
you pass None, no activation is applied (i.e. "linear" activation: a(x) = x).
• use_bias: Boolean, whether the layer uses a bias vector.
• kernel_initializer: Initializer for the kernel weights matrix, used for the linear transformation
of the inputs (see initializers).
• recurrent_initializer: Initializer for the recurrent_kernel weights matrix, used for the
linear transformation of the recurrent state (see initializers).
• bias_initializer: Initializer for the bias vector (see initializers).
• kernel_regularizer: Regularizer function applied to the kernel weights matrix (see
regularizer).
• recurrent_regularizer: Regularizer function applied to the recurrent_kernel weights
matrix (see regularizer).
• bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
• activity_regularizer: Regularizer function applied to the output of the layer (its "activation").
(see regularizer).
• kernel_constraint: Constraint function applied to the kernel weights matrix (see
constraints).
• recurrent_constraint: Constraint function applied to the recurrent_kernel weights
matrix (see constraints).
• bias_constraint: Constraint function applied to the bias vector (see constraints).
• dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the
inputs.
• recurrent_dropout: Float between 0 and 1. Fraction of the units to drop for the linear
transformation of the recurrent state.
• return_sequences: Boolean. Whether to return the last output in the output sequence, or the full
sequence.
• return_state: Boolean. Whether to return the last state in addition to the output.
• go_backwards: Boolean (default False). If True, process the input sequence backwards and
return the reversed sequence.
• stateful: Boolean (default False). If True, the last state for each sample at index i in a batch will
be used as initial state for the sample of index i in the following batch.
• unroll: Boolean (default False). If True, the network will be unrolled, else a symbolic loop will
be used. Unrolling can speed up an RNN, although it tends to be more memory-intensive.
Unrolling is only suitable for short sequences.
GRU
keras.layers.GRU(units, activation='tanh', recurrent_activation='sigmoid',
use_bias=True, kernel_initializer='glorot_uniform',
recurrent_initializer='orthogonal', bias_initializer='zeros',
kernel_regularizer=None, recurrent_regularizer=None, bias_regularizer=None,
activity_regularizer=None, kernel_constraint=None, recurrent_constraint=None,
bias_constraint=None, dropout=0.0, recurrent_dropout=0.0, implementation=2,
return_sequences=False, return_state=False, go_backwards=False, stateful=False,
unroll=False, reset_after=False)
Arguments
• units: Positive integer, dimensionality of the output space.
• activation: Activation function to use (see activations). Default: hyperbolic tangent (tanh). If
you pass None, no activation is applied (i.e. "linear" activation: a(x) = x).
• recurrent_activation: Activation function to use for the recurrent step (see activations).
Default: sigmoid (sigmoid). If you pass None, no activation is applied (i.e.
"linear" activation: a(x) = x).
• use_bias: Boolean, whether the layer uses a bias vector.
• kernel_initializer: Initializer for the kernel weights matrix, used for the linear transformation
of the inputs (see initializers).
• recurrent_initializer: Initializer for the recurrent_kernel weights matrix, used for the
linear transformation of the recurrent state (see initializers).
• bias_initializer: Initializer for the bias vector (see initializers).
• kernel_regularizer: Regularizer function applied to the kernel weights matrix (see
regularizer).
• recurrent_regularizer: Regularizer function applied to the recurrent_kernel weights
matrix (see regularizer).
• bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
• activity_regularizer: Regularizer function applied to the output of the layer (its "activation").
(see regularizer).
• kernel_constraint: Constraint function applied to the kernel weights matrix (see
constraints).
• recurrent_constraint: Constraint function applied to the recurrent_kernel weights
matrix (see constraints).
• bias_constraint: Constraint function applied to the bias vector (see constraints).
• dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the
inputs.
• recurrent_dropout: Float between 0 and 1. Fraction of the units to drop for the linear
transformation of the recurrent state.
• implementation: Implementation mode, either 1 or 2. Mode 1 will structure its operations as a
larger number of smaller dot products and additions, whereas mode 2 will batch them into
fewer, larger operations. These modes will have different performance profiles on different
hardware and for different applications.
• return_sequences: Boolean. Whether to return the last output in the output sequence, or the full
sequence.
• return_state: Boolean. Whether to return the last state in addition to the output.
• go_backwards: Boolean (default False). If True, process the input sequence backwards and
return the reversed sequence.
• stateful: Boolean (default False). If True, the last state for each sample at index i in a batch will
be used as initial state for the sample of index i in the following batch.
• unroll: Boolean (default False). If True, the network will be unrolled, else a symbolic loop will
be used. Unrolling can speed up an RNN, although it tends to be more memory-intensive.
Unrolling is only suitable for short sequences.
• reset_after: GRU convention (whether to apply reset gate after or before matrix multiplication).
False = "before" (default), True = "after" (CuDNN compatible).
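The two reset_after conventions differ only in where the reset gate enters the candidate state. The sketch below illustrates this in plain Python under simplifying assumptions: biases are omitted, the input projection x_proj is taken as given, and the recurrent kernel is applied as a matrix-vector product. All function and variable names are hypothetical, not Keras internals.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def candidate_state(x_proj, recurrent_kernel, h_prev, reset_gate, reset_after):
    """Candidate hidden state of one GRU step, with biases omitted."""
    if reset_after:
        # "after": multiply by the recurrent kernel first, then apply the gate.
        product = matvec(recurrent_kernel, h_prev)
        recurrent = [r * p for r, p in zip(reset_gate, product)]
    else:
        # "before": apply the gate to the previous state, then multiply.
        gated = [r * h for r, h in zip(reset_gate, h_prev)]
        recurrent = matvec(recurrent_kernel, gated)
    return [math.tanh(a + b) for a, b in zip(x_proj, recurrent)]


# Hypothetical toy values.
kernel = [[1.0, 2.0], [3.0, 4.0]]
h_prev = [1.0, 1.0]
gate = [0.5, 0.0]
x_proj = [0.0, 0.0]

before = candidate_state(x_proj, kernel, h_prev, gate, reset_after=False)
after = candidate_state(x_proj, kernel, h_prev, gate, reset_after=True)
print(before != after)  # True
```

With a gate value of exactly 0 for a unit, the "after" convention removes the recurrent contribution to that unit entirely, whereas the "before" convention only removes that unit's previous state from the product.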
References
• Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine
Translation
• On the Properties of Neural Machine Translation: Encoder-Decoder Approaches
• Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
• A Theoretically Grounded Application of Dropout in Recurrent Neural Networks
LSTM
keras.layers.LSTM(units, activation='tanh', recurrent_activation='sigmoid',
use_bias=True, kernel_initializer='glorot_uniform',
recurrent_initializer='orthogonal', bias_initializer='zeros',
unit_forget_bias=True, kernel_regularizer=None, recurrent_regularizer=None,
bias_regularizer=None, activity_regularizer=None, kernel_constraint=None,
recurrent_constraint=None, bias_constraint=None, dropout=0.0,
recurrent_dropout=0.0, implementation=2, return_sequences=False,
return_state=False, go_backwards=False, stateful=False, unroll=False)
ConvLSTM2D
keras.layers.ConvLSTM2D(filters, kernel_size, strides=(1, 1), padding='valid',
data_format=None, dilation_rate=(1, 1), activation='tanh',
recurrent_activation='hard_sigmoid', use_bias=True,
kernel_initializer='glorot_uniform', recurrent_initializer='orthogonal',
bias_initializer='zeros', unit_forget_bias=True, kernel_regularizer=None,
recurrent_regularizer=None, bias_regularizer=None, activity_regularizer=None,
kernel_constraint=None, recurrent_constraint=None, bias_constraint=None,
return_sequences=False, go_backwards=False, stateful=False, dropout=0.0,
recurrent_dropout=0.0)
Convolutional LSTM.
It is similar to an LSTM layer, but the input transformations and recurrent transformations are both
convolutional.
Arguments
• filters: Integer, the dimensionality of the output space (i.e. the number of output filters in the
convolution).
• kernel_size: An integer or tuple/list of n integers, specifying the dimensions of the convolution
window.
• strides: An integer or tuple/list of n integers, specifying the strides of the convolution.
Specifying any stride value != 1 is incompatible with specifying any dilation_rate value
!= 1.
• padding: One of "valid" or "same" (case-insensitive).
• data_format: A string, one of "channels_last" (default) or "channels_first". The
ordering of the dimensions in the inputs. "channels_last" corresponds to inputs with
shape (batch, time, ..., channels) while "channels_first" corresponds to
inputs with shape (batch, time, channels, ...). It defaults to the
image_data_format value found in your Keras config file at ~/.keras/keras.json.
If you never set it, then it will be "channels_last".
• dilation_rate: An integer or tuple/list of n integers, specifying the dilation rate to use for dilated
convolution. Currently, specifying any dilation_rate value != 1 is incompatible with
specifying any strides value != 1.
• activation: Activation function to use (see activations).
• recurrent_activation: Activation function to use for the recurrent step (see activations).
• use_bias: Boolean, whether the layer uses a bias vector.
• kernel_initializer: Initializer for the kernel weights matrix, used for the linear transformation
of the inputs. (see initializers).
• recurrent_initializer: Initializer for the recurrent_kernel weights matrix, used for the
linear transformation of the recurrent state. (see initializers).
• bias_initializer: Initializer for the bias vector (see initializers).
• unit_forget_bias: Boolean. If True, add 1 to the bias of the forget gate at initialization. Use in
combination with bias_initializer="zeros". This is recommended in Jozefowicz et
al. (2015).
• kernel_regularizer: Regularizer function applied to the kernel weights matrix (see
regularizer).
• recurrent_regularizer: Regularizer function applied to the recurrent_kernel weights
matrix (see regularizer).
• bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
• activity_regularizer: Regularizer function applied to the output of the layer (its "activation").
(see regularizer).
• kernel_constraint: Constraint function applied to the kernel weights matrix (see
constraints).
• recurrent_constraint: Constraint function applied to the recurrent_kernel weights
matrix (see constraints).
• bias_constraint: Constraint function applied to the bias vector (see constraints).
• return_sequences: Boolean. Whether to return the last output in the output sequence, or the full
sequence.
• go_backwards: Boolean (default False). If True, process the input sequence backwards.
• stateful: Boolean (default False). If True, the last state for each sample at index i in a batch will
be used as initial state for the sample of index i in the following batch.
• dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the
inputs.
• recurrent_dropout: Float between 0 and 1. Fraction of the units to drop for the linear
transformation of the recurrent state.
Input shape
• if data_format='channels_first' 5D tensor with shape: (samples, time, channels,
rows, cols)
• if data_format='channels_last' 5D tensor with shape: (samples, time, rows, cols,
channels)
Output shape
• if return_sequences
• if data_format='channels_first' 5D tensor with shape: (samples, time,
filters, output_row, output_col)
• if data_format='channels_last' 5D tensor with shape: (samples, time,
output_row, output_col, filters)
• else
• if data_format='channels_first' 4D tensor with shape: (samples, filters,
output_row, output_col)
• if data_format='channels_last' 4D tensor with shape: (samples, output_row,
output_col, filters)
where output_row and output_col depend on the shape of the filter and the padding.
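The output shapes listed above can be worked out with standard convolution arithmetic. The sketch below assumes the usual rules ("same" keeps the length before striding, "valid" subtracts the kernel extent) and ignores dilation. Both helper names are hypothetical.

```python
def conv_output_length(input_length, kernel_size, padding, stride=1):
    """Output length of a convolution along one spatial axis."""
    if padding == "same":
        length = input_length
    elif padding == "valid":
        length = input_length - kernel_size + 1
    else:
        raise ValueError("padding must be 'valid' or 'same'")
    # Ceiling division by the stride.
    return (length + stride - 1) // stride


def conv_lstm2d_output_shape(input_shape, filters, kernel_size,
                             strides=(1, 1), padding="valid",
                             data_format="channels_last",
                             return_sequences=False):
    """Shape of the layer output for a 5D input shape."""
    if data_format == "channels_first":
        samples, time, _, rows, cols = input_shape
    else:
        samples, time, rows, cols, _ = input_shape
    padding = padding.lower()
    out_rows = conv_output_length(rows, kernel_size[0], padding, strides[0])
    out_cols = conv_output_length(cols, kernel_size[1], padding, strides[1])
    if data_format == "channels_first":
        spatial = (filters, out_rows, out_cols)
    else:
        spatial = (out_rows, out_cols, filters)
    leading = (samples, time) if return_sequences else (samples,)
    return leading + spatial


# Hypothetical input: 8 samples, 10 frames of 40x40 pixels, 1 channel.
print(conv_lstm2d_output_shape((8, 10, 40, 40, 1), 16, (3, 3)))
# (8, 38, 38, 16)
print(conv_lstm2d_output_shape((8, 10, 40, 40, 1), 16, (3, 3),
                               strides=(2, 2), padding="same",
                               return_sequences=True))
# (8, 10, 20, 20, 16)
```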
Raises
• ValueError: in case of invalid constructor arguments.
References
• Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting
The current implementation does not include the feedback loop on the cell's output.
ConvLSTM2DCell
keras.layers.ConvLSTM2DCell(filters, kernel_size, strides=(1, 1), padding='valid',
data_format=None, dilation_rate=(1, 1), activation='tanh',
recurrent_activation='hard_sigmoid', use_bias=True,
kernel_initializer='glorot_uniform', recurrent_initializer='orthogonal',
bias_initializer='zeros', unit_forget_bias=True, kernel_regularizer=None,
recurrent_regularizer=None, bias_regularizer=None, kernel_constraint=None,
recurrent_constraint=None, bias_constraint=None, dropout=0.0,
recurrent_dropout=0.0)
GRUCell
keras.layers.GRUCell(units, activation='tanh', recurrent_activation='sigmoid',
use_bias=True, kernel_initializer='glorot_uniform',
recurrent_initializer='orthogonal', bias_initializer='zeros',
kernel_regularizer=None, recurrent_regularizer=None, bias_regularizer=None,
kernel_constraint=None, recurrent_constraint=None, bias_constraint=None,
dropout=0.0, recurrent_dropout=0.0, implementation=2, reset_after=False)
Cell class for the GRU layer.
Arguments
• units: Positive integer, dimensionality of the output space.
• activation: Activation function to use (see activations). Default: hyperbolic tangent (tanh). If
you pass None, no activation is applied (i.e. "linear" activation: a(x) = x).
• recurrent_activation: Activation function to use for the recurrent step (see activations).
Default: sigmoid (sigmoid). If you pass None, no activation is applied (i.e.
"linear" activation: a(x) = x).
• use_bias: Boolean, whether the layer uses a bias vector.
• kernel_initializer: Initializer for the kernel weights matrix, used for the linear transformation
of the inputs (see initializers).
• recurrent_initializer: Initializer for the recurrent_kernel weights matrix, used for the
linear transformation of the recurrent state (see initializers).
• bias_initializer: Initializer for the bias vector (see initializers).
• kernel_regularizer: Regularizer function applied to the kernel weights matrix (see
regularizer).
• recurrent_regularizer: Regularizer function applied to the recurrent_kernel weights
matrix (see regularizer).
• bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
• kernel_constraint: Constraint function applied to the kernel weights matrix (see
constraints).
• recurrent_constraint: Constraint function applied to the recurrent_kernel weights
matrix (see constraints).
• bias_constraint: Constraint function applied to the bias vector (see constraints).
• dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the
inputs.
• recurrent_dropout: Float between 0 and 1. Fraction of the units to drop for the linear
transformation of the recurrent state.
• implementation: Implementation mode, either 1 or 2. Mode 1 will structure its operations as a
larger number of smaller dot products and additions, whereas mode 2 will batch them into
fewer, larger operations. These modes will have different performance profiles on different
hardware and for different applications.
• reset_after: GRU convention (whether to apply reset gate after or before matrix multiplication).
False = "before" (default), True = "after" (CuDNN compatible).
LSTMCell
keras.layers.LSTMCell(units, activation='tanh', recurrent_activation='sigmoid',
use_bias=True, kernel_initializer='glorot_uniform',
recurrent_initializer='orthogonal', bias_initializer='zeros',
unit_forget_bias=True, kernel_regularizer=None, recurrent_regularizer=None,
bias_regularizer=None, kernel_constraint=None, recurrent_constraint=None,
bias_constraint=None, dropout=0.0, recurrent_dropout=0.0, implementation=2)
CuDNNLSTM
keras.layers.CuDNNLSTM(units, kernel_initializer='glorot_uniform',
recurrent_initializer='orthogonal', bias_initializer='zeros',
unit_forget_bias=True, kernel_regularizer=None, recurrent_regularizer=None,
bias_regularizer=None, activity_regularizer=None, kernel_constraint=None,
recurrent_constraint=None, bias_constraint=None, return_sequences=False,
return_state=False, stateful=False)
Embedding
keras.layers.Embedding(input_dim, output_dim, embeddings_initializer='uniform',
embeddings_regularizer=None, activity_regularizer=None,
embeddings_constraint=None, mask_zero=False, input_length=None)
Turns positive integers (indexes) into dense vectors of fixed size, e.g. [[4], [20]] ->
[[0.25, 0.1], [0.6, -0.2]]
This layer can only be used as the first layer in a model.
Example
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding

model = Sequential()
model.add(Embedding(1000, 64, input_length=10))
# the model will take as input an integer matrix of size (batch, input_length).
# the largest integer (i.e. word index) in the input should be
# no larger than 999 (vocabulary size).
# now model.output_shape == (None, 10, 64), where None is the batch dimension.
# a random batch of 32 sequences, each holding 10 word indices in [0, 1000)
input_array = np.random.randint(1000, size=(32, 10))
model.compile('rmsprop', 'mse')
output_array = model.predict(input_array)
assert output_array.shape == (32, 10, 64)
Arguments
• input_dim: int > 0. Size of the vocabulary, i.e. maximum integer index + 1.
• output_dim: int >= 0. Dimension of the dense embedding.
• embeddings_initializer: Initializer for the embeddings matrix (see initializers).
• embeddings_regularizer: Regularizer function applied to the embeddings matrix (see
regularizer).
• activity_regularizer: Regularizer function applied to the output of the layer (its "activation").
(see regularizer).
• embeddings_constraint: Constraint function applied to the embeddings matrix (see
constraints).
• mask_zero: Whether or not the input value 0 is a special "padding" value that should be masked
out. This is useful when using recurrent layers which may take variable length input. If this is
True then all subsequent layers in the model need to support masking or an exception will be
raised. If mask_zero is set to True, as a consequence, index 0 cannot be used in the vocabulary
(input_dim should equal size of vocabulary + 1).
• input_length: Length of input sequences, when it is constant. This argument is required if you
are going to connect Flatten then Dense layers downstream (without it, the shape of the dense
outputs cannot be computed).
Input shape
2D tensor with shape: (batch_size, sequence_length).
Output shape
3D tensor with shape: (batch_size, sequence_length, output_dim).
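Conceptually, the mapping from the 2D input to the 3D output is a table lookup: each integer index selects one row of the embeddings matrix. A minimal plain-Python sketch follows. The embed helper and the toy table values are hypothetical and only illustrate the shapes.

```python
def embed(batch, table):
    """Replace every integer index with the matching row of the table."""
    return [[list(table[index]) for index in sequence] for sequence in batch]


# Hypothetical table: input_dim = 3 indices, output_dim = 2.
table = [[0.0, 0.0],
         [0.25, 0.1],
         [0.6, -0.2]]
batch = [[1, 2, 2],
         [2, 0, 1]]  # shape (batch_size=2, sequence_length=3)

out = embed(batch, table)
shape = (len(out), len(out[0]), len(out[0][0]))
print(shape)      # (2, 3, 2)
print(out[0][0])  # [0.25, 0.1]
```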
References
• A Theoretically Grounded Application of Dropout in Recurrent Neural Networks
Add
keras.layers.Add()
Layer that adds a list of inputs.
It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same
shape).
Examples
import keras
input1 = keras.layers.Input(shape=(16,))
x1 = keras.layers.Dense(8, activation='relu')(input1)
input2 = keras.layers.Input(shape=(32,))
x2 = keras.layers.Dense(8, activation='relu')(input2)
# equivalent to added = keras.layers.add([x1, x2])
added = keras.layers.Add()([x1, x2])
out = keras.layers.Dense(4)(added)
model = keras.models.Model(inputs=[input1, input2], outputs=out)
Subtract
keras.layers.Subtract()
import keras
input1 = keras.layers.Input(shape=(16,))
x1 = keras.layers.Dense(8, activation='relu')(input1)
input2 = keras.layers.Input(shape=(32,))
x2 = keras.layers.Dense(8, activation='relu')(input2)
# Equivalent to subtracted = keras.layers.subtract([x1, x2])
subtracted = keras.layers.Subtract()([x1, x2])
out = keras.layers.Dense(4)(subtracted)
model = keras.models.Model(inputs=[input1, input2], outputs=out)
Multiply
keras.layers.Multiply()
Maximum
keras.layers.Maximum()
Minimum
keras.layers.Minimum()
Concatenate
keras.layers.Concatenate(axis=-1)
Dot
keras.layers.Dot(axes, normalize=False)
Layer that computes a dot product between samples in two tensors.
E.g. if applied to a list of two tensors a and b of shape (batch_size, n), the output will be a
tensor of shape (batch_size, 1) where each entry i will be the dot product between a[i] and
b[i].
Arguments
• axes: Integer or tuple of integers, axis or axes along which to take the dot product.
• normalize: Whether to L2-normalize samples along the dot product axis before taking the dot
product. If set to True, then the output of the dot product is the cosine proximity between the
two samples.
• **kwargs: Standard layer keyword arguments.
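The effect of normalize can be seen in a small plain-Python sketch of a per-sample dot product over two (batch_size, n) inputs. The batch_dot name is hypothetical, and the sketch assumes no row is all zeros, so no epsilon guard is included.

```python
import math


def batch_dot(a, b, normalize=False):
    """Per-sample dot product of two batches; output shape (batch_size, 1)."""
    result = []
    for row_a, row_b in zip(a, b):
        if normalize:
            # L2-normalize each sample, so the result is a cosine similarity.
            norm_a = math.sqrt(sum(v * v for v in row_a))
            norm_b = math.sqrt(sum(v * v for v in row_b))
            row_a = [v / norm_a for v in row_a]
            row_b = [v / norm_b for v in row_b]
        result.append([sum(x * y for x, y in zip(row_a, row_b))])
    return result


a = [[3, 4], [1, 0]]
b = [[3, 4], [0, 2]]
print(batch_dot(a, b))  # [[25], [0]]
cosine = batch_dot(a, b, normalize=True)
```

In this toy batch the first pair of samples is identical, so its normalized value is 1 up to rounding. The second pair is orthogonal and gives 0.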
add
keras.layers.add(inputs)
Arguments
• inputs: A list of input tensors (at least 2).
• **kwargs: Standard layer keyword arguments.
Returns
A tensor, the sum of the inputs.
Examples
import keras
input1 = keras.layers.Input(shape=(16,))
x1 = keras.layers.Dense(8, activation='relu')(input1)
input2 = keras.layers.Input(shape=(32,))
x2 = keras.layers.Dense(8, activation='relu')(input2)
added = keras.layers.add([x1, x2])
out = keras.layers.Dense(4)(added)
model = keras.models.Model(inputs=[input1, input2], outputs=out)
subtract
keras.layers.subtract(inputs)
Arguments
• inputs: A list of input tensors (exactly 2).
• **kwargs: Standard layer keyword arguments.
Returns
A tensor, the difference of the inputs.
Examples
import keras
input1 = keras.layers.Input(shape=(16,))
x1 = keras.layers.Dense(8, activation='relu')(input1)
input2 = keras.layers.Input(shape=(32,))
x2 = keras.layers.Dense(8, activation='relu')(input2)
subtracted = keras.layers.subtract([x1, x2])
out = keras.layers.Dense(4)(subtracted)
model = keras.models.Model(inputs=[input1, input2], outputs=out)
multiply
keras.layers.multiply(inputs)
Arguments
• inputs: A list of input tensors (at least 2).
• **kwargs: Standard layer keyword arguments.
Returns
A tensor, the element-wise product of the inputs.
average
keras.layers.average(inputs)
Arguments
• inputs: A list of input tensors (at least 2).
• **kwargs: Standard layer keyword arguments.
Returns
A tensor, the average of the inputs.
maximum
keras.layers.maximum(inputs)
Arguments
• inputs: A list of input tensors (at least 2).
• **kwargs: Standard layer keyword arguments.
Returns
A tensor, the element-wise maximum of the inputs.
minimum
keras.layers.minimum(inputs)
Arguments
• inputs: A list of input tensors (at least 2).
• **kwargs: Standard layer keyword arguments.
Returns
A tensor, the element-wise minimum of the inputs.
concatenate
keras.layers.concatenate(inputs, axis=-1)
Arguments
• inputs: A list of input tensors (at least 2).
• axis: Concatenation axis.
• **kwargs: Standard layer keyword arguments.
Returns
A tensor, the concatenation of the inputs along the axis given by axis.
dot
keras.layers.dot(inputs, axes, normalize=False)
Arguments
• inputs: A list of input tensors (exactly 2).
• axes: Integer or tuple of integers, axis or axes along which to take the dot product.
• normalize: Whether to L2-normalize samples along the dot product axis before taking the dot
product. If set to True, then the output of the dot product is the cosine proximity between the
two samples.
• **kwargs: Standard layer keyword arguments.
Returns
A tensor, the dot product of the samples from the inputs.
LeakyReLU
keras.layers.LeakyReLU(alpha=0.3)
Input shape
Arbitrary. Use the keyword argument input_shape (tuple of integers, does not include the samples
axis) when using this layer as the first layer in a model.
Output shape
Same shape as the input.
Arguments
• alpha: float >= 0. Negative slope coefficient.
References
• Rectifier Nonlinearities Improve Neural Network Acoustic Models
PReLU
keras.layers.PReLU(alpha_initializer='zeros', alpha_regularizer=None,
alpha_constraint=None, shared_axes=None)
References
• Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet
Classification
ELU
keras.layers.ELU(alpha=1.0)
Input shape
Arbitrary. Use the keyword argument input_shape (tuple of integers, does not include the samples
axis) when using this layer as the first layer in a model.
Output shape
Same shape as the input.
Arguments
• alpha: scale for the negative factor.
References
• Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)
ThresholdedReLU
keras.layers.ThresholdedReLU(theta=1.0)
Input shape
Arbitrary. Use the keyword argument input_shape (tuple of integers, does not include the samples
axis) when using this layer as the first layer in a model.
Output shape
Same shape as the input.
Arguments
• theta: float >= 0. Threshold location of activation.
References
• Zero-Bias Autoencoders and the Benefits of Co-Adapting Features
Softmax
keras.layers.Softmax(axis=-1)
ReLU
keras.layers.ReLU(max_value=None, negative_slope=0.0, threshold=0.0)
Rectified Linear Unit activation function.
With default values, it returns element-wise max(x, 0).
Otherwise, it follows: f(x) = max_value for x >= max_value, f(x) = x for threshold
<= x < max_value, f(x) = negative_slope * (x - threshold) otherwise.
Input shape
Arbitrary. Use the keyword argument input_shape (tuple of integers, does not include the samples
axis) when using this layer as the first layer in a model.
Output shape
Same shape as the input.
Arguments
• max_value: float >= 0. Maximum activation value.
• negative_slope: float >= 0. Negative slope coefficient.
• threshold: float. Threshold value for thresholded activation.
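The piecewise definition above translates directly into a scalar function. This is a plain-Python sketch of the stated formula for a single value, not the Keras implementation. The function name relu is used here only for illustration.

```python
def relu(x, max_value=None, negative_slope=0.0, threshold=0.0):
    """Scalar version of the piecewise ReLU definition."""
    if max_value is not None and x >= max_value:
        return max_value
    if x >= threshold:
        return x
    return negative_slope * (x - threshold)


print(relu(3.0))                                       # 3.0
print(relu(7.0, max_value=6.0))                        # 6.0
print(relu(-2.0, negative_slope=0.1))                  # -0.2
print(relu(0.5, negative_slope=0.1, threshold=1.0))    # -0.05
```

With the default arguments, every negative input maps to zero, which matches the element-wise max(x, 0) behaviour.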
BatchNormalization
keras.layers.BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001, center=True,
scale=True, beta_initializer='zeros', gamma_initializer='ones',
moving_mean_initializer='zeros', moving_variance_initializer='ones',
beta_regularizer=None, gamma_regularizer=None, beta_constraint=None,
gamma_constraint=None)
GaussianNoise
keras.layers.GaussianNoise(stddev)
GaussianDropout
keras.layers.GaussianDropout(rate)
Input shape
Arbitrary. Use the keyword argument input_shape (tuple of integers, does not include the samples
axis) when using this layer as the first layer in a model.
Output shape
Same shape as input.
References
• Dropout: A Simple Way to Prevent Neural Networks from Overfitting
AlphaDropout
keras.layers.AlphaDropout(rate, noise_shape=None, seed=None)
TimeDistributed
You can then use TimeDistributed to apply a Dense layer to each of the 10 timesteps,
independently:
# as the first layer in a model
model = Sequential()
model.add(TimeDistributed(Dense(8), input_shape=(10, 16)))
# now model.output_shape == (None, 10, 8)
TimeDistributed can be used with arbitrary layers, not just Dense, for instance with a Conv2D
layer:
model = Sequential()
model.add(TimeDistributed(Conv2D(64, (3, 3)),
                          input_shape=(10, 299, 299, 3)))
Arguments
• layer: a layer instance.
Bidirectional
keras.layers.Bidirectional(layer, merge_mode='concat', weights=None)
Examples
model = Sequential()
model.add(Bidirectional(LSTM(10, return_sequences=True),
                        input_shape=(5, 10)))
model.add(Bidirectional(LSTM(10)))
model.add(Dense(5))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
class MyLayer(Layer):
It is also possible to define Keras layers which have multiple input tensors and multiple output tensors.
To do this, you should assume that the inputs and outputs of the methods build(input_shape),
call(x) and compute_output_shape(input_shape) are lists. Here is an example, similar
to the one above:
from keras import backend as K
from keras.layers import Layer
class MyLayer(Layer):
The existing Keras layers provide examples of how to implement almost anything. Never hesitate to
read the source code!
TimeseriesGenerator
keras.preprocessing.sequence.TimeseriesGenerator(data, targets, length,
sampling_rate=1, stride=1, start_index=0, end_index=None, shuffle=False,
reverse=False, batch_size=128)
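from keras.preprocessing.sequence import TimeseriesGenerator
import numpy as np
# illustrative data: 50 consecutive integers used as both inputs and targets
data = np.array([[i] for i in range(50)])
targets = np.array([[i] for i in range(50)])
data_gen = TimeseriesGenerator(data, targets,
                               length=10, sampling_rate=2,
                               batch_size=2)
batch_0 = data_gen[0]
x, y = batch_0
assert np.array_equal(x,
                      np.array([[[0], [2], [4], [6], [8]],
                                [[1], [3], [5], [7], [9]]]))
assert np.array_equal(y,
                      np.array([[10], [11]]))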
pad_sequences
keras.preprocessing.sequence.pad_sequences(sequences, maxlen=None, dtype='int32',
padding='pre', truncating='pre', value=0.0)
Sequences longer than num_timesteps are truncated so that they fit the desired length. The position
where padding or truncation happens is determined by the arguments padding and truncating,
respectively.
Pre-padding is the default.
Arguments
• sequences: List of lists, where each element is a sequence.
• maxlen: Int, maximum length of all sequences.
• dtype: Type of the output sequences. To pad sequences with variable length strings, you can use
object.
• padding: String, 'pre' or 'post': pad either before or after each sequence.
• truncating: String, 'pre' or 'post': remove values from sequences larger than maxlen, either at
the beginning or at the end of the sequences.
• value: Float or String, padding value.
Returns
• x: Numpy array with shape (len(sequences), maxlen)
Raises
• ValueError: In case of invalid values for truncating or padding, or in case of invalid
shape for a sequences entry.
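The interaction of padding, truncating and maxlen described above can be sketched in plain Python. This is a simplified illustration that returns nested lists rather than a Numpy array and ignores dtype; the name pad_sequences_sketch is hypothetical and the code is not the Keras implementation.

```python
def pad_sequences_sketch(sequences, maxlen=None, padding='pre',
                         truncating='pre', value=0):
    """Pad or truncate every sequence to the same length (illustrative)."""
    if padding not in ('pre', 'post') or truncating not in ('pre', 'post'):
        raise ValueError('padding and truncating must be "pre" or "post"')
    if maxlen is None:
        # default: length of the longest sequence
        maxlen = max((len(s) for s in sequences), default=0)
    padded = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > maxlen:
            # 'pre' removes values at the beginning, 'post' at the end
            if truncating == 'pre':
                seq = seq[len(seq) - maxlen:]
            else:
                seq = seq[:maxlen]
        fill = [value] * (maxlen - len(seq))
        # 'pre' pads before the sequence, 'post' pads after it
        padded.append(fill + seq if padding == 'pre' else seq + fill)
    return padded
```

For instance, with maxlen=2 and the defaults, [1, 2, 3] keeps its last two values and [4] receives one leading padding value.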
skipgrams
keras.preprocessing.sequence.skipgrams(sequence, vocabulary_size, window_size=4,
negative_samples=1.0, shuffle=True, categorical=False, sampling_table=None,
seed=None)
Note
By convention, index 0 in the vocabulary is a non-word and will be skipped.
make_sampling_table
keras.preprocessing.sequence.make_sampling_table(size, sampling_factor=1e-05)
Arguments
• size: Int, number of possible words to sample.
• sampling_factor: The sampling factor in the word2vec formula.
Returns
A 1D Numpy array of length size where the ith entry is the probability that a word of rank i should be
sampled.
Text Preprocessing
Tokenizer
keras.preprocessing.text.Tokenizer(num_words=None, filters='!"#$%&()*+,-./:;<=>?
@[\\]^_`{|}~\t\n', lower=True, split=' ', char_level=False, oov_token=None,
document_count=0)
Two or more words may be assigned to the same index, due to possible collisions by the hashing
function. The probability of a collision depends on the dimension of the hashing space and the
number of distinct objects.
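To get a feel for how the collision probability depends on these two quantities, here is a rough sketch. It assumes an idealized hash that assigns each distinct word an index uniformly and independently, which real hash functions only approximate; the function name collision_probability is hypothetical.

```python
def collision_probability(num_words, hash_space):
    """P(at least two of num_words distinct words share an index).

    Assumes uniform, independent hashing into hash_space indices.
    """
    if num_words > hash_space:
        return 1.0  # more words than indices: a collision is certain
    p_no_collision = 1.0
    for i in range(num_words):
        # the (i+1)-th word must avoid the i indices already taken
        p_no_collision *= (hash_space - i) / hash_space
    return 1.0 - p_no_collision
```

Under this assumption, growing the hashing space lowers the probability, while adding more distinct words raises it.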
one_hot
keras.preprocessing.text.one_hot(text, n, filters='!"#$%&()*+,-./:;<=>?
@[\\]^_`{|}~\t\n', lower=True, split=' ')
text_to_word_sequence
keras.preprocessing.text.text_to_word_sequence(text, filters='!"#$%&()*+,-./:;<=>?
@[\\]^_`{|}~\t\n', lower=True, split=' ')
Image Preprocessing
ImageDataGenerator class
keras.preprocessing.image.ImageDataGenerator(featurewise_center=False,
samplewise_center=False, featurewise_std_normalization=False,
samplewise_std_normalization=False, zca_whitening=False, zca_epsilon=1e-06,
rotation_range=0, width_shift_range=0.0, height_shift_range=0.0,
brightness_range=None, shear_range=0.0, zoom_range=0.0, channel_shift_range=0.0,
fill_mode='nearest', cval=0.0, horizontal_flip=False, vertical_flip=False,
rescale=None, preprocessing_function=None, data_format='channels_last',
validation_split=0.0, interpolation_order=1, dtype='float32')
Generate batches of tensor image data with real-time data augmentation. The data will be looped over
(in batches).
Arguments
• featurewise_center: Boolean. Set input mean to 0 over the dataset, feature-wise.
• samplewise_center: Boolean. Set each sample mean to 0.
• featurewise_std_normalization: Boolean. Divide inputs by std of the dataset, feature-wise.
• samplewise_std_normalization: Boolean. Divide each input by its std.
• zca_epsilon: epsilon for ZCA whitening. Default is 1e-6.
• zca_whitening: Boolean. Apply ZCA whitening.
• rotation_range: Int. Degree range for random rotations.
• width_shift_range: Float, 1-D array-like or int
  • float: fraction of total width, if < 1, or pixels if >= 1.
  • 1-D array-like: random elements from the array.
  • int: integer number of pixels from interval (-width_shift_range,
    +width_shift_range)
  • With width_shift_range=2 possible values are integers [-1, 0, +1], same as
    with width_shift_range=[-1, 0, +1], while with
    width_shift_range=1.0 possible values are floats in the interval [-1.0, +1.0).
• height_shift_range: Float, 1-D array-like or int
  • float: fraction of total height, if < 1, or pixels if >= 1.
  • 1-D array-like: random elements from the array.
  • int: integer number of pixels from interval (-height_shift_range,
    +height_shift_range)
  • With height_shift_range=2 possible values are integers [-1, 0, +1], same
    as with height_shift_range=[-1, 0, +1], while with
    height_shift_range=1.0 possible values are floats in the interval [-1.0, +1.0).
• brightness_range: Tuple or list of two floats. Range for picking a brightness shift value from.
• shear_range: Float. Shear Intensity (Shear angle in counter-clockwise direction in degrees)
• zoom_range: Float or [lower, upper]. Range for random zoom. If a float, [lower, upper]
= [1-zoom_range, 1+zoom_range].
• channel_shift_range: Float. Range for random channel shifts.
• fill_mode: One of {"constant", "nearest", "reflect", "wrap"}. Default is 'nearest'. Points
outside the boundaries of the input are filled according to the given mode:
  • 'constant': kkkkkkkk|abcd|kkkkkkkk (cval=k)
  • 'nearest': aaaaaaaa|abcd|dddddddd
  • 'reflect': abcddcba|abcd|dcbaabcd
  • 'wrap': abcdabcd|abcd|abcdabcd
• cval: Float or Int. Value used for points outside the boundaries when fill_mode =
"constant".
• horizontal_flip: Boolean. Randomly flip inputs horizontally.
• vertical_flip: Boolean. Randomly flip inputs vertically.
• rescale: rescaling factor. Defaults to None. If None or 0, no rescaling is applied, otherwise we
multiply the data by the value provided (after applying all other transformations).
• preprocessing_function: function that will be applied on each input. The function will run after
the image is resized and augmented. The function should take one argument: one image
(Numpy tensor with rank 3), and should output a Numpy tensor with the same shape.
• data_format: Image data format, either "channels_first" or "channels_last". "channels_last"
mode means that the images should have shape (samples, height, width,
channels), "channels_first" mode means that the images should have shape (samples,
channels, height, width). It defaults to the image_data_format value found in
your Keras config file at ~/.keras/keras.json. If you never set it, then it will be
"channels_last".
• validation_split: Float. Fraction of images reserved for validation (strictly between 0 and 1).
• dtype: Dtype to use for the generated arrays.
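The fill_mode diagrams in the argument list above can be reproduced with a one-dimensional sketch. It maps an out-of-bounds position to a value of the row "abcd" for each mode; the helper names fill_value and extend are hypothetical, and the real transform works on image arrays rather than strings.

```python
def fill_value(row, i, fill_mode='nearest', cval='k'):
    """Value seen at position i of row, where i may lie outside the row."""
    n = len(row)
    if 0 <= i < n:
        return row[i]
    if fill_mode == 'constant':
        return cval
    if fill_mode == 'nearest':
        return row[0] if i < 0 else row[-1]
    if fill_mode == 'reflect':
        # mirror at the edges, repeating the edge value (period 2n)
        j = i % (2 * n)
        return row[j] if j < n else row[2 * n - 1 - j]
    if fill_mode == 'wrap':
        return row[i % n]
    raise ValueError('unknown fill_mode: %r' % fill_mode)


def extend(row, fill_mode, pad=8):
    """Render pad positions on each side of row, as in the diagrams."""
    return ''.join(fill_value(row, i, fill_mode)
                   for i in range(-pad, len(row) + pad))
```

Calling extend('abcd', mode) for each mode yields the same strings as the diagrams, without the "|" separators.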
Examples
Example of using .flow(x, y):
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
y_train = np_utils.to_categorical(y_train, num_classes)
y_test = np_utils.to_categorical(y_test, num_classes)
datagen = ImageDataGenerator(
    featurewise_center=True,
    featurewise_std_normalization=True,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True)
Example of using .flow_from_directory(directory):
train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(
    'data/train',
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')
validation_generator = test_datagen.flow_from_directory(
    'data/validation',
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')
model.fit_generator(
    train_generator,
    steps_per_epoch=2000,
    epochs=50,
    validation_data=validation_generator,
    validation_steps=800)
Example of transforming images and masks together:
# Provide the same seed and keyword arguments to the fit and flow methods
seed = 1
image_datagen.fit(images, augment=True, seed=seed)
mask_datagen.fit(masks, augment=True, seed=seed)
image_generator = image_datagen.flow_from_directory(
    'data/images',
    class_mode=None,
    seed=seed)
mask_generator = mask_datagen.flow_from_directory(
    'data/masks',
    class_mode=None,
    seed=seed)
# combine the generators into one which yields (image, mask) pairs
train_generator = zip(image_generator, mask_generator)
model.fit_generator(
    train_generator,
    steps_per_epoch=2000,
    epochs=50)
Example of using .flow_from_dataframe(dataframe, directory, x_col, y_col):
train_df = pandas.read_csv("./train.csv")
valid_df = pandas.read_csv("./valid.csv")
train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_dataframe(
    dataframe=train_df,
    directory='data/train',
    x_col="filename",
    y_col="class",
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')
validation_generator = test_datagen.flow_from_dataframe(
    dataframe=valid_df,
    directory='data/validation',
    x_col="filename",
    y_col="class",
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')
model.fit_generator(
    train_generator,
    steps_per_epoch=2000,
    epochs=50,
    validation_data=validation_generator,
    validation_steps=800)
ImageDataGenerator methods
apply_transform
apply_transform(x, transform_parameters)
Returns
A transformed version of the input (same shape).
fit
fit(x, augment=False, rounds=1, seed=None)
Arguments
• x: Sample data. Should have rank 4. In case of grayscale data, the channels axis should have
value 1, in case of RGB data, it should have value 3, and in case of RGBA data, it should have
value 4.
• augment: Boolean (default: False). Whether to fit on randomly augmented samples.
• rounds: Int (default: 1). If using data augmentation (augment=True), this is how many
augmentation passes over the data to use.
• seed: Int (default: None). Random seed.
flow
flow(x, y=None, batch_size=32, shuffle=True, sample_weight=None, seed=None,
save_to_dir=None, save_prefix='', save_format='png', subset=None)
Returns
An Iterator yielding tuples of (x, y) where x is a numpy array of image data (in the case of a
single image input) or a list of numpy arrays (in the case with additional inputs) and y is a numpy array
of corresponding labels. If 'sample_weight' is not None, the yielded tuples are of the form (x, y,
sample_weight). If y is None, only the numpy array x is returned.
flow_from_dataframe
flow_from_dataframe(dataframe, directory=None, x_col='filename', y_col='class',
weight_col=None, target_size=(256, 256), color_mode='rgb', classes=None,
class_mode='categorical', batch_size=32, shuffle=True, seed=None, save_to_dir=None,
save_prefix='', save_format='png', subset=None, interpolation='nearest',
validate_filenames=True)
Takes the dataframe and the path to a directory and generates batches of augmented/normalized data.
Arguments
• dataframe: Pandas dataframe containing the filepaths relative to directory (or absolute
paths if directory is None) of the images in a string column. It should include other column/
s depending on the class_mode:
• if class_mode is "categorical" (default value) it must include the y_col
column with the class/es of each image. Values in column can be string/list/tuple if a
single class or list/tuple if multiple classes.
• if class_mode is "binary" or "sparse" it must include the given y_col column
with class values as strings.
• if class_mode is "raw" or "multi_output" it should contain
Returns
A DataFrameIterator yielding tuples of (x, y) where x is a numpy array containing a batch of
images with shape (batch_size, *target_size, channels) and y is a numpy array of
corresponding labels.
flow_from_directory
flow_from_directory(directory, target_size=(256, 256), color_mode='rgb',
classes=None, class_mode='categorical', batch_size=32, shuffle=True, seed=None,
save_to_dir=None, save_prefix='', save_format='png', follow_links=False,
subset=None, interpolation='nearest')
Returns
A DirectoryIterator yielding tuples of (x, y) where x is a numpy array containing a batch of
images with shape (batch_size, *target_size, channels) and y is a numpy array of
corresponding labels.
get_random_transform
get_random_transform(img_shape, seed=None)
random_transform
random_transform(x, seed=None)
standardize
standardize(x)
from keras import losses
model.compile(loss=losses.mean_squared_error, optimizer='sgd')
You can either pass the name of an existing loss function, or pass a TensorFlow/Theano symbolic
function that returns a scalar for each data-point and takes the following two arguments:
• y_true: True labels. TensorFlow/Theano tensor.
• y_pred: Predictions. TensorFlow/Theano tensor of the same shape as y_true.
The actual optimized objective is the mean of the output array across all datapoints.
For a few examples of such functions, check out the losses source.
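The contract described above (one scalar per data point, with the mean over all data points as the optimized objective) can be illustrated without any backend. This is a plain-Python sketch operating on nested lists; real loss functions operate on backend tensors, and the names mean_squared_error_sketch and objective are hypothetical.

```python
def mean_squared_error_sketch(y_true, y_pred):
    """Return one scalar per data point: its mean squared difference."""
    return [sum((t - p) ** 2 for t, p in zip(ts, ps)) / len(ts)
            for ts, ps in zip(y_true, y_pred)]


def objective(loss_fn, y_true, y_pred):
    """Mean of the per-data-point losses, i.e. the value being minimized."""
    per_sample = loss_fn(y_true, y_pred)
    return sum(per_sample) / len(per_sample)


y_true = [[0.0, 1.0], [1.0, 1.0]]
y_pred = [[0.0, 0.0], [1.0, 1.0]]
per_sample = mean_squared_error_sketch(y_true, y_pred)
total = objective(mean_squared_error_sketch, y_true, y_pred)
```

Here the first data point contributes a loss of 0.5 and the second 0.0, so the objective is their mean.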
mean_absolute_error
keras.losses.mean_absolute_error(y_true, y_pred)
mean_absolute_percentage_error
keras.losses.mean_absolute_percentage_error(y_true, y_pred)
mean_squared_logarithmic_error
keras.losses.mean_squared_logarithmic_error(y_true, y_pred)
squared_hinge
keras.losses.squared_hinge(y_true, y_pred)
hinge
keras.losses.hinge(y_true, y_pred)
categorical_hinge
keras.losses.categorical_hinge(y_true, y_pred)
logcosh
keras.losses.logcosh(y_true, y_pred)
huber_loss
keras.losses.huber_loss(y_true, y_pred, delta=1.0)
categorical_crossentropy
keras.losses.categorical_crossentropy(y_true, y_pred, from_logits=False,
label_smoothing=0)
sparse_categorical_crossentropy
keras.losses.sparse_categorical_crossentropy(y_true, y_pred, from_logits=False,
axis=-1)
binary_crossentropy
keras.losses.binary_crossentropy(y_true, y_pred, from_logits=False,
label_smoothing=0)
kullback_leibler_divergence
keras.losses.kullback_leibler_divergence(y_true, y_pred)
poisson
keras.losses.poisson(y_true, y_pred)
cosine_proximity
keras.losses.cosine_proximity(y_true, y_pred, axis=-1)
is_categorical_crossentropy
keras.losses.is_categorical_crossentropy(loss)
Note: when using the categorical_crossentropy loss, your targets should be in categorical
format (e.g. if you have 10 classes, the target for each sample should be a 10-dimensional vector that is
all-zeros except for a 1 at the index corresponding to the class of the sample). In order to convert
integer targets into categorical targets, you can use the Keras utility to_categorical:
from keras.utils import to_categorical
categorical_labels = to_categorical(int_labels, num_classes=None)
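To make the "categorical format" concrete, the conversion can be sketched in plain Python. This illustrates the encoding only and is not the Keras utility itself, which returns a Numpy array; the name to_categorical_sketch is hypothetical.

```python
def to_categorical_sketch(labels, num_classes=None):
    """Turn integer class labels into one-hot rows (illustrative)."""
    if num_classes is None:
        # infer the number of classes from the largest label
        num_classes = max(labels) + 1
    return [[1.0 if i == label else 0.0 for i in range(num_classes)]
            for label in labels]
```

Each row is all zeros except for a 1 at the index of the sample's class, as the note describes.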
Usage of metrics
A metric is a function that is used to judge the performance of your model. Metric functions are to be
supplied in the metrics parameter when a model is compiled.
model.compile(loss='mean_squared_error',
              optimizer='sgd',
              metrics=['mae', 'acc'])
from keras import metrics
model.compile(loss='mean_squared_error',
              optimizer='sgd',
              metrics=[metrics.mae, metrics.categorical_accuracy])
A metric function is similar to a loss function, except that the results from evaluating a metric are not
used when training the model. You may use any of the loss functions as a metric function.
You can either pass the name of an existing metric, or pass a Theano/TensorFlow symbolic function
(see Custom metrics).
Arguments
• y_true: True labels. Theano/TensorFlow tensor.
• y_pred: Predictions. Theano/TensorFlow tensor of the same shape as y_true.
Returns
Single tensor value representing the mean of the output array across all datapoints.
Available metrics
accuracy
keras.metrics.accuracy(y_true, y_pred)
binary_accuracy
keras.metrics.binary_accuracy(y_true, y_pred, threshold=0.5)
categorical_accuracy
keras.metrics.categorical_accuracy(y_true, y_pred)
sparse_categorical_accuracy
keras.metrics.sparse_categorical_accuracy(y_true, y_pred)
top_k_categorical_accuracy
keras.metrics.top_k_categorical_accuracy(y_true, y_pred, k=5)
sparse_top_k_categorical_accuracy
keras.metrics.sparse_top_k_categorical_accuracy(y_true, y_pred, k=5)
cosine_proximity
keras.metrics.cosine_proximity(y_true, y_pred, axis=-1)
clone_metric
keras.metrics.clone_metric(metric)
Custom metrics
Custom metrics can be passed at the compilation step. The function would need to take (y_true,
y_pred) as arguments and return a single tensor value.
import keras.backend as K

def mean_pred(y_true, y_pred):
    return K.mean(y_pred)

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy', mean_pred])
Usage of optimizers
An optimizer is one of the two arguments required for compiling a Keras model:
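from keras import optimizers
model = Sequential()
model.add(Dense(64, kernel_initializer='uniform', input_shape=(10,)))
model.add(Activation('softmax'))
sgd = optimizers.SGD(learning_rate=0.01, momentum=0.9, nesterov=True)
model.compile(loss='mean_squared_error', optimizer=sgd)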
You can either instantiate an optimizer before passing it to model.compile() , as in the above
example, or you can call it by its name. In the latter case, the default parameters for the optimizer will
be used.
# pass optimizer by name: default parameters will be used
model.compile(loss='mean_squared_error', optimizer='sgd')
SGD
keras.optimizers.SGD(learning_rate=0.01, momentum=0.0, nesterov=False)
RMSprop
keras.optimizers.RMSprop(learning_rate=0.001, rho=0.9)
RMSProp optimizer.
It is recommended to leave the parameters of this optimizer at their default values (except the learning
rate, which can be freely tuned).
Arguments
• learning_rate: float >= 0. Learning rate.
• rho: float >= 0.
References
• rmsprop: Divide the gradient by a running average of its recent magnitude
Adagrad
keras.optimizers.Adagrad(learning_rate=0.01)
Adagrad optimizer.
Adagrad is an optimizer with parameter-specific learning rates, which are adapted relative to how
frequently a parameter gets updated during training. The more updates a parameter receives, the
smaller the learning rate.
It is recommended to leave the parameters of this optimizer at their default values.
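The shrinking effective learning rate can be seen in a scalar sketch of the Adagrad update. It follows the textbook rule (accumulate squared gradients, divide the step by the square root of the accumulator) and is not the Keras source; the epsilon value and the name adagrad_steps are assumptions made for illustration.

```python
import math


def adagrad_steps(gradients, learning_rate=0.01, epsilon=1e-7):
    """Return the update applied to a single parameter at each step."""
    accumulator = 0.0  # running sum of squared gradients
    updates = []
    for g in gradients:
        accumulator += g * g
        updates.append(-learning_rate * g / (math.sqrt(accumulator) + epsilon))
    return updates


# the same gradient at every step still produces ever smaller updates
updates = adagrad_steps([1.0, 1.0, 1.0, 1.0])
```

Because the accumulator only grows, a parameter that keeps receiving gradients takes smaller and smaller steps.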
Arguments
• learning_rate: float >= 0. Initial learning rate.
References
• Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
Adadelta
keras.optimizers.Adadelta(learning_rate=1.0, rho=0.95)
Adadelta optimizer.
Adadelta is a more robust extension of Adagrad that adapts learning rates based on a moving window
of gradient updates, instead of accumulating all past gradients. This way, Adadelta continues learning
even when many updates have been done. Compared to Adagrad, in the original version of Adadelta
you don't have to set an initial learning rate. In this version, initial learning rate and decay factor can be
set, as in most other Keras optimizers.
It is recommended to leave the parameters of this optimizer at their default values.
Arguments
• learning_rate: float >= 0. Initial learning rate, defaults to 1. It is recommended to leave it at the
default value.
• rho: float >= 0. Adadelta decay factor, corresponding to fraction of gradient to keep at each
time step.
References
• Adadelta - an adaptive learning rate method
Adam
keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, amsgrad=False)
Adam optimizer.
Default parameters follow those provided in the original paper.
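For orientation, the update rule from the Adam paper can be written for a single scalar parameter as below. This is a sketch of the bias-corrected form in the paper, without the AMSGrad variant, and is not the Keras source; the epsilon value and the name adam_steps are assumptions made for illustration.

```python
import math


def adam_steps(gradients, learning_rate=0.001, beta_1=0.9, beta_2=0.999,
               epsilon=1e-7):
    """Return the update applied to a single parameter at each step."""
    m = 0.0  # moving average of the gradients
    v = 0.0  # moving average of the squared gradients
    updates = []
    for t, g in enumerate(gradients, start=1):
        m = beta_1 * m + (1.0 - beta_1) * g
        v = beta_2 * v + (1.0 - beta_2) * g * g
        # correct the bias introduced by initializing m and v at zero
        m_hat = m / (1.0 - beta_1 ** t)
        v_hat = v / (1.0 - beta_2 ** t)
        updates.append(-learning_rate * m_hat / (math.sqrt(v_hat) + epsilon))
    return updates
```

With a constant gradient the step size stays close to learning_rate regardless of the gradient's scale, which is one reason the learning rate is the main parameter worth tuning.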
Arguments
• learning_rate: float >= 0. Learning rate.
• beta_1: float, 0 < beta < 1. Generally close to 1.
• beta_2: float, 0 < beta < 1. Generally close to 1.
• amsgrad: boolean. Whether to apply the AMSGrad variant of this algorithm from the paper
"On the Convergence of Adam and Beyond".
References
• Adam - A Method for Stochastic Optimization
• On the Convergence of Adam and Beyond
Adamax
keras.optimizers.Adamax(learning_rate=0.002, beta_1=0.9, beta_2=0.999)
Nadam
keras.optimizers.Nadam(learning_rate=0.002, beta_1=0.9, beta_2=0.999)
Usage of activations
Activations can either be used through an Activation layer, or through the activation argument
supported by all forward layers:
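from keras.layers import Activation, Dense
model.add(Dense(64))
model.add(Activation('tanh'))
# an element-wise backend function can also be passed as the activation
from keras import backend as K
model.add(Dense(64, activation=K.tanh))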
Available activations
elu
keras.activations.elu(x, alpha=1.0)
References
• Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)
softmax
keras.activations.softmax(x, axis=-1)
selu
keras.activations.selu(x)
Note
• To be used together with the initialization "lecun_normal".
• To be used together with the dropout variant "AlphaDropout".
References
• Self-Normalizing Neural Networks
softplus
keras.activations.softplus(x)
softsign
keras.activations.softsign(x)
relu
keras.activations.relu(x, alpha=0.0, max_value=None, threshold=0.0)
With default values, it returns element-wise max(x, 0).
Otherwise, it follows: f(x) = max_value for x >= max_value, f(x) = x for threshold
<= x < max_value, f(x) = alpha * (x - threshold) otherwise.
Arguments
• x: Input tensor.
• alpha: float. Slope of the negative part. Defaults to zero.
• max_value: float. Saturation threshold.
• threshold: float. Threshold value for thresholded activation.
Returns
A tensor.
tanh
keras.activations.tanh(x)
sigmoid
keras.activations.sigmoid(x)
hard_sigmoid
keras.activations.hard_sigmoid(x)
exponential
keras.activations.exponential(x)
linear
keras.activations.linear(x)
On "Advanced Activations"
Activations that are more complex than a simple TensorFlow/Theano/CNTK function (e.g. learnable
activations, which maintain a state) are available as Advanced Activation layers, and can be found in
the module keras.layers.advanced_activations. These include PReLU and LeakyReLU.
Usage of callbacks
A callback is a set of functions to be applied at given stages of the training procedure. You can use
callbacks to get a view on internal states and statistics of the model during training. You can pass a list
of callbacks (as the keyword argument callbacks) to the .fit() method of the Sequential or
Model classes. The relevant methods of the callbacks will then be called at each stage of the training.
Callback
keras.callbacks.callbacks.Callback()
The logs dictionary that callback methods take as argument will contain keys for quantities relevant
to the current batch or epoch.
Currently, the .fit() method of the Sequential model class will include the following quantities
in the logs that it passes to its callbacks:
• on_epoch_end: logs include acc and loss, and optionally include val_loss (if validation is
enabled in fit), and val_acc (if validation and accuracy monitoring are enabled).
• on_batch_begin: logs include size, the number of samples in the current batch.
• on_batch_end: logs include loss, and optionally acc (if accuracy monitoring is enabled).
BaseLogger
keras.callbacks.callbacks.BaseLogger(stateful_metrics=None)
TerminateOnNaN
keras.callbacks.callbacks.TerminateOnNaN()
ProgbarLogger
keras.callbacks.callbacks.ProgbarLogger(count_mode='samples',
stateful_metrics=None)
History
keras.callbacks.callbacks.History()
This callback is automatically applied to every Keras model. The History object gets returned by the
fit method of models.
ModelCheckpoint
keras.callbacks.callbacks.ModelCheckpoint(filepath, monitor='val_loss', verbose=0,
save_best_only=False, save_weights_only=False, mode='auto', period=1)
EarlyStopping
keras.callbacks.callbacks.EarlyStopping(monitor='val_loss', min_delta=0,
patience=0, verbose=0, mode='auto', baseline=None, restore_best_weights=False)
RemoteMonitor
keras.callbacks.callbacks.RemoteMonitor(root='http://localhost:9000',
path='/publish/epoch/end/', field='data', headers=None, send_as_json=False)
LearningRateScheduler
keras.callbacks.callbacks.LearningRateScheduler(schedule, verbose=0)
ReduceLROnPlateau
keras.callbacks.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=10,
verbose=0, mode='auto', min_delta=0.0001, cooldown=0, min_lr=0)
Arguments
• monitor: quantity to be monitored.
• factor: factor by which the learning rate will be reduced. new_lr = lr * factor
• patience: number of epochs that produced the monitored quantity with no improvement after
which the learning rate will be reduced. Validation quantities may not be produced for every
epoch, if the validation frequency (model.fit(validation_freq=5)) is greater than one.
• verbose: int. 0: quiet, 1: update messages.
• mode: one of {auto, min, max}. In min mode, lr will be reduced when the quantity monitored
has stopped decreasing; in max mode it will be reduced when the quantity monitored has
stopped increasing; in auto mode, the direction is automatically inferred from the name of the
monitored quantity.
• min_delta: threshold for measuring the new optimum, to only focus on significant changes.
• cooldown: number of epochs to wait before resuming normal operation after lr has been
reduced.
• min_lr: lower bound on the learning rate.
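How factor, patience, min_delta, cooldown and min_lr interact can be sketched as a small loop over per-epoch losses. This is a simplified illustration of a reduce-on-plateau schedule in min mode, not the callback's source code; the function name reduce_on_plateau and its default values are hypothetical.

```python
def reduce_on_plateau(losses, lr=0.1, factor=0.5, patience=2,
                      min_delta=0.0, cooldown=0, min_lr=0.0):
    """Return the learning rate in effect after each epoch ('min' mode)."""
    best = float('inf')
    wait = 0            # epochs since the last significant improvement
    cooldown_left = 0   # epochs left before plateau detection resumes
    history = []
    for loss in losses:
        if cooldown_left > 0:
            cooldown_left -= 1
            wait = 0
        if loss < best - min_delta:
            # significant improvement: remember it and reset the counter
            best = loss
            wait = 0
        elif cooldown_left == 0:
            wait += 1
            if wait >= patience and lr > min_lr:
                lr = max(lr * factor, min_lr)  # new_lr = lr * factor
                cooldown_left = cooldown
                wait = 0
        history.append(lr)
    return history


schedule = reduce_on_plateau([1.0, 0.9, 0.9, 0.9, 0.9, 0.9])
```

In this run the loss stops improving after the second epoch, so the learning rate is halved once every two stalled epochs.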
CSVLogger
keras.callbacks.callbacks.CSVLogger(filename, separator=',', append=False)
Arguments
• filename: filename of the csv file, e.g. 'run/log.csv'.
• separator: string used to separate elements in the csv file.
• append: True: append if file exists (useful for continuing training). False: overwrite existing
file.
LambdaCallback
keras.callbacks.callbacks.LambdaCallback(on_epoch_begin=None, on_epoch_end=None,
on_batch_begin=None, on_batch_end=None, on_train_begin=None, on_train_end=None)
Arguments
• on_epoch_begin: called at the beginning of every epoch.
• on_epoch_end: called at the end of every epoch.
• on_batch_begin: called at the beginning of every batch.
• on_batch_end: called at the end of every batch.
• on_train_begin: called at the beginning of model training.
• on_train_end: called at the end of model training.
Example
from keras.callbacks import LambdaCallback
# Print the batch number at the beginning of every batch.
batch_print_callback = LambdaCallback(
    on_batch_begin=lambda batch, logs: print(batch))
# Stream the epoch loss to a file in JSON format. The file content
# is not well-formed JSON but rather has a JSON object per line.
import json
json_log = open('loss_log.json', mode='wt', buffering=1)
json_logging_callback = LambdaCallback(
    on_epoch_end=lambda epoch, logs: json_log.write(
        json.dumps({'epoch': epoch, 'loss': logs['loss']}) + '\n'),
    on_train_end=lambda logs: json_log.close()
)
model.fit(...,
          callbacks=[batch_print_callback,
                     json_logging_callback])
TensorBoard
keras.callbacks.tensorboard_v1.TensorBoard(log_dir='./logs', histogram_freq=0,
batch_size=32, write_graph=True, write_grads=False, write_images=False,
embeddings_freq=0, embeddings_layer_names=None, embeddings_metadata=None,
embeddings_data=None, update_freq='epoch')
When using a backend other than TensorFlow, TensorBoard will still work (if you have TensorFlow
installed), but the only feature available will be the display of the losses and metrics plots.
Arguments
• log_dir: the path of the directory where to save the log files to be parsed by TensorBoard.
• histogram_freq: frequency (in epochs) at which to compute activation and weight histograms
for the layers of the model. If set to 0, histograms won't be computed. Validation data (or split)
must be specified for histogram visualizations.
• batch_size: size of batch of inputs to feed to the network for histograms computation.
• write_graph: whether to visualize the graph in TensorBoard. The log file can become quite
large when write_graph is set to True.
• write_grads: whether to visualize gradient histograms in TensorBoard. histogram_freq
must be greater than 0.
• write_images: whether to write model weights to visualize as image in TensorBoard.
• embeddings_freq: frequency (in epochs) at which selected embedding layers will be saved. If
set to 0, embeddings won't be computed. Data to be visualized in TensorBoard's Embedding tab
must be passed as embeddings_data.
• embeddings_layer_names: a list of names of layers to keep an eye on. If None or an empty
list, all embedding layers will be watched.
• embeddings_metadata: a dictionary which maps layer names to the file names in which metadata
for each embedding layer is saved. If the same metadata file is used for all embedding layers,
a single string can be passed.
• embeddings_data: data to be embedded at layers specified in embeddings_layer_names.
Numpy array (if the model has a single input) or list of Numpy arrays (if the model has multiple
inputs).
• update_freq: 'batch' or 'epoch' or integer. When using 'batch', writes the losses and
metrics to TensorBoard after each batch. When using 'epoch', writes them after each epoch. If
using an integer, say 10000, the callback will write the metrics and losses to TensorBoard every
10000 samples. Note that writing too frequently to TensorBoard can slow down your training.
Create a callback
You can create a custom callback by extending the base class keras.callbacks.Callback. A
callback has access to its associated model through the class property self.model.
Here's a simple example saving a list of losses over each batch during training:
class LossHistory(keras.callbacks.Callback):
    def on_train_begin(self, logs={}):
        self.losses = []

    def on_batch_end(self, batch, logs={}):
        self.losses.append(logs.get('loss'))
model = Sequential()
model.add(Dense(10, input_dim=784, kernel_initializer='uniform'))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
history = LossHistory()
model.fit(x_train, y_train, batch_size=128, epochs=20, verbose=0,
          callbacks=[history])
print(history.losses)
# outputs
'''
[0.66047596406559383, 0.3547245744908703, ..., 0.25953155204159617,
0.25901699725311789]
'''
Example: model checkpoints
from keras.callbacks import ModelCheckpoint
model = Sequential()
model.add(Dense(10, input_dim=784, kernel_initializer='uniform'))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
'''
saves the model weights after each epoch if the validation loss decreased
'''
checkpointer = ModelCheckpoint(filepath='/tmp/weights.hdf5', verbose=1,
                               save_best_only=True)
model.fit(x_train, y_train, batch_size=128, epochs=20, verbose=0,
          validation_data=(x_test, y_test), callbacks=[checkpointer])
Datasets
CIFAR10 small image classification
Dataset of 50,000 32x32 color training images, labeled over 10 categories, and 10,000 test images.
Usage:
from keras.datasets import cifar10
• Returns:
• 2 tuples:
• x_train, x_test: uint8 array of RGB image data with shape (num_samples, 3, 32,
32) or (num_samples, 32, 32, 3) based on the image_data_format backend
setting of either channels_first or channels_last respectively.
• y_train, y_test: uint8 array of category labels (integers in range 0-9) with shape
(num_samples, 1).
Usage:
from keras.datasets import cifar100
• Returns:
• 2 tuples:
• x_train, x_test: uint8 array of RGB image data with shape (num_samples, 3, 32,
32) or (num_samples, 32, 32, 3) based on the image_data_format backend
setting of either channels_first or channels_last respectively.
• y_train, y_test: uint8 array of category labels with shape (num_samples, 1).
• Arguments:
• label_mode: "fine" or "coarse".
Usage:
from keras.datasets import imdb
• Returns:
• 2 tuples:
• x_train, x_test: list of sequences, which are lists of indexes (integers). If the
num_words argument was specified, the maximum possible index value is
num_words-1. If the maxlen argument was specified, the largest possible
sequence length is maxlen.
• y_train, y_test: list of integer labels (1 or 0).
• Arguments:
• path: if you do not have the data locally (at '~/.keras/datasets/' + path), it
will be downloaded to this location.
• num_words: integer or None. Top most frequent words to consider. Any less frequent
word will appear as oov_char value in the sequence data.
• skip_top: integer. Top most frequent words to ignore (they will appear as oov_char
value in the sequence data).
• maxlen: int. Maximum sequence length. Any longer sequence will be truncated.
• seed: int. Seed for reproducible data shuffling.
• start_char: int. The start of a sequence will be marked with this character. Set to 1
because 0 is usually the padding character.
• oov_char: int. words that were cut out because of the num_words or skip_top limit
will be replaced with this character.
• index_from: int. Index actual words with this index and higher.
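The interaction between num_words, skip_top, start_char, oov_char and index_from is easier to see on a toy input. The following is only a sketch of the behavior described above, not the library's loading code: encode_sequence is a hypothetical helper, and the default values (start_char=1, oov_char=2, index_from=3) are assumptions chosen for illustration.

```python
def encode_sequence(ranks, num_words=None, skip_top=0,
                    start_char=1, oov_char=2, index_from=3):
    """Hypothetical helper: encode one review given as word frequency ranks."""
    # Shift every word index so that low values stay reserved for special tokens.
    seq = [r + index_from for r in ranks]
    # Mark the start of the sequence.
    if start_char is not None:
        seq = [start_char] + seq
    # Replace indices outside [skip_top, num_words) with oov_char.
    upper = num_words if num_words is not None else max(seq) + 1
    return [w if skip_top <= w < upper else oov_char for w in seq]

encoded = encode_sequence([1, 5, 40, 2], num_words=10)
print(encoded)  # [1, 4, 8, 2, 5]
```

In this toy run the word with rank 40 becomes index 43 after the shift, which is not below num_words=10, so it is replaced by oov_char.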
Reuters newswire topics classification
Usage:
from keras.datasets import reuters
(x_train, y_train), (x_test, y_test) = reuters.load_data()
The specifications are the same as that of the IMDB dataset, with the addition of:
• test_split: float. Fraction of the dataset to be used as test data.
This dataset also makes available the word index used for encoding the sequences:
word_index = reuters.get_word_index(path="reuters_word_index.json")
• Returns: A dictionary where keys are words (str) and values are indexes (integer). E.g.
word_index["giraffe"] might return 1234.
• Arguments:
• path: if you do not have the index file locally (at '~/.keras/datasets/' +
path), it will be downloaded to this location.
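A common use of the word index is to turn an encoded sequence back into text. The sketch below uses a hypothetical three-word dictionary in place of the real index, and assumes that encoded sequences are offset by index_from=3 relative to the word index; both are assumptions for illustration.

```python
# Hypothetical miniature word index; the real one comes from get_word_index().
word_index = {"the": 1, "giraffe": 2, "eats": 3}

# Invert the mapping so that indexes point back to words.
index_word = {index: word for word, index in word_index.items()}

def decode(sequence, index_from=3):
    """Turn encoded indexes back into words, assuming an offset of index_from."""
    # Indexes that do not map to a word (special tokens here) become "?".
    return " ".join(index_word.get(i - index_from, "?") for i in sequence)

print(decode([1, 4, 5, 6]))  # ? the giraffe eats
```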
MNIST database of handwritten digits
Dataset of 60,000 28x28 grayscale images of the 10 digits, along with a test set of 10,000 images.
Usage:
from keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
• Returns:
• 2 tuples:
• x_train, x_test: uint8 array of grayscale image data with shape (num_samples,
28, 28).
• y_train, y_test: uint8 array of digit labels (integers in range 0-9) with shape
(num_samples,).
• Arguments:
• path: if you do not have the data locally (at '~/.keras/datasets/' +
path), it will be downloaded to this location.
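Arrays with the shape and dtype described above usually need a little preparation before they are fed to a network. A minimal sketch, using a synthetic array in place of the downloaded data (the sample count of 5 is arbitrary):

```python
import numpy as np

# Synthetic stand-in with the documented shape and dtype: (num_samples, 28, 28), uint8.
x = np.random.randint(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Scale pixel values from [0, 255] to [0.0, 1.0].
x_scaled = x.astype("float32") / 255.0

# Flatten each image into a 784-element vector for a fully-connected network.
x_flat = x_scaled.reshape(len(x_scaled), 28 * 28)

# Alternatively, add a trailing channel axis for a channels_last convolutional network.
x_conv = x_scaled[..., np.newaxis]

print(x_flat.shape, x_conv.shape)  # (5, 784) (5, 28, 28, 1)
```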
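Fashion-MNIST database of fashion articles
Dataset of 60,000 28x28 grayscale images of 10 fashion categories, along with a test set of 10,000 images.
The class labels are:
Label Description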
0 T-shirt/top
1 Trouser
2 Pullover
3 Dress
4 Coat
5 Sandal
6 Shirt
7 Sneaker
8 Bag
9 Ankle boot
Usage:
from keras.datasets import fashion_mnist
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
• Returns:
• 2 tuples:
• x_train, x_test: uint8 array of grayscale image data with shape (num_samples,
28, 28).
• y_train, y_test: uint8 array of labels (integers in range 0-9) with shape
(num_samples,).
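Since the labels are plain integers, a small lookup is handy when inspecting predictions. A minimal sketch built from the label table above (label_names is a hypothetical helper):

```python
# Class names in label order, copied from the label table.
CLASS_NAMES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]

def label_names(labels):
    """Map integer labels (0-9) to their descriptions."""
    return [CLASS_NAMES[int(label)] for label in labels]

print(label_names([9, 0, 3]))  # ['Ankle boot', 'T-shirt/top', 'Dress']
```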
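Boston housing price regression dataset
Usage:
from keras.datasets import boston_housing
(x_train, y_train), (x_test, y_test) = boston_housing.load_data()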
• Arguments:
• path: path where to cache the dataset locally (relative to ~/.keras/datasets).
• seed: Random seed for shuffling the data before computing the test split.
• test_split: fraction of the data to reserve as test set.
• Returns: Tuple of Numpy arrays: (x_train, y_train), (x_test, y_test).
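The roles of seed and test_split can be illustrated with a small stand-alone sketch. This is not the library's implementation: shuffle_and_split is a hypothetical helper, the toy arrays replace the real data, and the default values shown are assumptions.

```python
import numpy as np

def shuffle_and_split(x, y, test_split=0.2, seed=113):
    """Hypothetical helper: shuffle (x, y) together, then hold out a test fraction."""
    rng = np.random.RandomState(seed)
    indices = rng.permutation(len(x))
    x, y = x[indices], y[indices]
    # The first (1 - test_split) share of the shuffled rows is the training set.
    n_train = int(len(x) * (1 - test_split))
    return (x[:n_train], y[:n_train]), (x[n_train:], y[n_train:])

x = np.arange(20, dtype="float32").reshape(10, 2)
y = np.arange(10, dtype="float32")
(x_train, y_train), (x_test, y_test) = shuffle_and_split(x, y)
print(x_train.shape, x_test.shape)  # (8, 2) (2, 2)
```

Using the same seed reproduces the same split, which keeps experiments comparable.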
Applications
Keras Applications are deep learning models that are made available alongside pre-trained weights.
These models can be used for prediction, feature extraction, and fine-tuning.
Weights are downloaded automatically when instantiating a model. They are stored at
~/.keras/models/.
Available models
Models for image classification with weights trained on ImageNet:
• Xception
• VGG16
• VGG19
• ResNet, ResNetV2
• InceptionV3
• InceptionResNetV2
• MobileNet
• MobileNetV2
• DenseNet
• NASNet
All of these architectures are compatible with all the backends (TensorFlow, Theano, and CNTK), and
upon instantiation the models will be built according to the image data format set in your Keras
configuration file at ~/.keras/keras.json. For instance, if you have set
image_data_format=channels_last, then any model loaded from this repository will get
built according to the TensorFlow data format convention, "Height-Width-Depth".
Note that:
• For Keras < 2.2.0, the Xception model is only available for TensorFlow, due to its
reliance on SeparableConvolution layers.
• For Keras < 2.1.5, the MobileNet model is only available for TensorFlow, due to its
reliance on DepthwiseConvolution layers.
from keras.applications.resnet50 import ResNet50
from keras.preprocessing import image
from keras.applications.resnet50 import preprocess_input, decode_predictions
import numpy as np

model = ResNet50(weights='imagenet')
img_path = 'elephant.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
preds = model.predict(x)
# decode the results into a list of tuples (class, description, probability)
# (one such list for each sample in the batch)
print('Predicted:', decode_predictions(preds, top=3)[0])
# Predicted: [(u'n02504013', u'Indian_elephant', 0.82658225), (u'n01871265',
# u'tusker', 0.1122357), (u'n02504458', u'African_elephant', 0.061040461)]
features = model.predict(x)
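from keras.applications.vgg19 import VGG19, preprocess_input
from keras.preprocessing import image
from keras.models import Model
import numpy as np

base_model = VGG19(weights='imagenet')
# build a model that outputs the activations of an intermediate layer
model = Model(inputs=base_model.input,
              outputs=base_model.get_layer('block4_pool').output)
img_path = 'elephant.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
block4_pool_features = model.predict(x)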
# first: train only the top layers (which were randomly initialized)
# i.e. freeze all convolutional InceptionV3 layers
for layer in base_model.layers:
    layer.trainable = False
# at this point, the top layers are well trained and we can start fine-tuning
# convolutional layers from inception V3. We will freeze the bottom N layers
# and train the remaining top layers.
# let's visualize layer names and layer indices to see how many layers
# we should freeze:
for i, layer in enumerate(base_model.layers):
    print(i, layer.name)
# we train our model again (this time fine-tuning the top 2 inception blocks
# alongside the top Dense layers)
model.fit_generator(...)
Xception
keras.applications.xception.Xception(include_top=True, weights='imagenet',
input_tensor=None, input_shape=None, pooling=None, classes=1000)
Arguments
• include_top: whether to include the fully-connected layer at the top of the network.
• weights: one of None (random initialization) or 'imagenet' (pre-training on ImageNet).
• input_tensor: optional Keras tensor (i.e. output of layers.Input()) to use as image input
for the model.
• input_shape: optional shape tuple, only to be specified if include_top is False (otherwise
the input shape has to be (299, 299, 3)). It should have exactly 3 input channels, and
width and height should be no smaller than 71. E.g. (150, 150, 3) would be one valid
value.
• pooling: Optional pooling mode for feature extraction when include_top is False.
• None means that the output of the model will be the 4D tensor output of the last
convolutional block.
• 'avg' means that global average pooling will be applied to the output of the last
convolutional block, and thus the output of the model will be a 2D tensor.
• 'max' means that global max pooling will be applied.
• classes: optional number of classes to classify images into, only to be specified if
include_top is True, and if no weights argument is specified.
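The effect of the pooling argument on the output shape can be sketched with NumPy. The array below is a synthetic stand-in for the 4D output of a last convolutional block in channels_last layout; its sizes are arbitrary values chosen for illustration.

```python
import numpy as np

# Synthetic 4D feature map: (batch, height, width, channels).
features = np.random.rand(2, 10, 10, 2048).astype("float32")

# pooling='avg': average over the spatial axes -> 2D tensor (batch, channels).
avg_pooled = features.mean(axis=(1, 2))

# pooling='max': maximum over the spatial axes -> 2D tensor (batch, channels).
max_pooled = features.max(axis=(1, 2))

print(features.shape)    # (2, 10, 10, 2048)
print(avg_pooled.shape)  # (2, 2048)
print(max_pooled.shape)  # (2, 2048)
```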
Returns
A Keras Model instance.
References
• Xception: Deep Learning with Depthwise Separable Convolutions
License
These weights are trained by ourselves and are released under the MIT license.
VGG16
keras.applications.vgg16.VGG16(include_top=True, weights='imagenet',
input_tensor=None, input_shape=None, pooling=None, classes=1000)
Arguments
• include_top: whether to include the 3 fully-connected layers at the top of the network.
• weights: one of None (random initialization) or 'imagenet' (pre-training on ImageNet).
• input_tensor: optional Keras tensor (i.e. output of layers.Input()) to use as image input
for the model.
• input_shape: optional shape tuple, only to be specified if include_top is False (otherwise
the input shape has to be (224, 224, 3) (with 'channels_last' data format) or (3,
224, 224) (with 'channels_first' data format)). It should have exactly 3 input
channels, and width and height should be no smaller than 32. E.g. (200, 200, 3) would be
one valid value.
• pooling: Optional pooling mode for feature extraction when include_top is False.
• None means that the output of the model will be the 4D tensor output of the last
convolutional block.
• 'avg' means that global average pooling will be applied to the output of the last
convolutional block, and thus the output of the model will be a 2D tensor.
• 'max' means that global max pooling will be applied.
• classes: optional number of classes to classify images into, only to be specified if
include_top is True, and if no weights argument is specified.
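The input_shape constraints listed above (three entries, exactly 3 channels, a minimum width and height) can be expressed as a short check. check_input_shape is a hypothetical helper written for illustration, not a function of the library.

```python
def check_input_shape(input_shape, min_size=32, data_format="channels_last"):
    """Hypothetical validator mirroring the input_shape constraints."""
    if len(input_shape) != 3:
        raise ValueError("input_shape must have exactly 3 entries")
    if data_format == "channels_last":
        height, width, channels = input_shape
    else:  # channels_first
        channels, height, width = input_shape
    if channels != 3:
        raise ValueError("input must have exactly 3 channels")
    if height < min_size or width < min_size:
        raise ValueError("width and height must be at least %d" % min_size)
    return True

print(check_input_shape((200, 200, 3)))                                # True
print(check_input_shape((3, 224, 224), data_format="channels_first"))  # True
```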
Returns
A Keras Model instance.
References
• Very Deep Convolutional Networks for Large-Scale Image Recognition: please cite this paper if
you use the VGG models in your work.
License
These weights are ported from the ones released by VGG at Oxford under the Creative Commons
Attribution License.
VGG19
keras.applications.vgg19.VGG19(include_top=True, weights='imagenet',
input_tensor=None, input_shape=None, pooling=None, classes=1000)
Arguments
• include_top: whether to include the 3 fully-connected layers at the top of the network.
• weights: one of None (random initialization) or 'imagenet' (pre-training on ImageNet).
• input_tensor: optional Keras tensor (i.e. output of layers.Input()) to use as image input
for the model.
• input_shape: optional shape tuple, only to be specified if include_top is False (otherwise
the input shape has to be (224, 224, 3) (with 'channels_last' data format) or (3,
224, 224) (with 'channels_first' data format)). It should have exactly 3 input
channels, and width and height should be no smaller than 32. E.g. (200, 200, 3) would be
one valid value.
• pooling: Optional pooling mode for feature extraction when include_top is False.
• None means that the output of the model will be the 4D tensor output of the last
convolutional block.
• 'avg' means that global average pooling will be applied to the output of the last
convolutional block, and thus the output of the model will be a 2D tensor.
• 'max' means that global max pooling will be applied.
• classes: optional number of classes to classify images into, only to be specified if
include_top is True, and if no weights argument is specified.
Returns
A Keras Model instance.
References
• Very Deep Convolutional Networks for Large-Scale Image Recognition
License
These weights are ported from the ones released by VGG at Oxford under the Creative Commons
Attribution License.
ResNet
keras.applications.resnet.ResNet50(include_top=True, weights='imagenet',
input_tensor=None, input_shape=None, pooling=None, classes=1000)
keras.applications.resnet.ResNet101(include_top=True, weights='imagenet',
input_tensor=None, input_shape=None, pooling=None, classes=1000)
keras.applications.resnet.ResNet152(include_top=True, weights='imagenet',
input_tensor=None, input_shape=None, pooling=None, classes=1000)
keras.applications.resnet_v2.ResNet50V2(include_top=True, weights='imagenet',
input_tensor=None, input_shape=None, pooling=None, classes=1000)
keras.applications.resnet_v2.ResNet101V2(include_top=True, weights='imagenet',
input_tensor=None, input_shape=None, pooling=None, classes=1000)
keras.applications.resnet_v2.ResNet152V2(include_top=True, weights='imagenet',
input_tensor=None, input_shape=None, pooling=None, classes=1000)
Returns
A Keras Model instance.
References
• ResNet: Deep Residual Learning for Image Recognition
• ResNetV2: Identity Mappings in Deep Residual Networks
License
These weights are ported from the following:
• ResNet: The original repository of Kaiming He under the MIT license.
• ResNetV2: Facebook under the BSD license.
InceptionV3
keras.applications.inception_v3.InceptionV3(include_top=True, weights='imagenet',
input_tensor=None, input_shape=None, pooling=None, classes=1000)
Arguments
• include_top: whether to include the fully-connected layer at the top of the network.
• weights: one of None (random initialization) or 'imagenet' (pre-training on ImageNet).
• input_tensor: optional Keras tensor (i.e. output of layers.Input()) to use as image input
for the model.
• input_shape: optional shape tuple, only to be specified if include_top is False (otherwise
the input shape has to be (299, 299, 3) (with 'channels_last' data format) or (3,
299, 299) (with 'channels_first' data format)). It should have exactly 3 input
channels, and width and height should be no smaller than 75. E.g. (150, 150, 3) would be
one valid value.
• pooling: Optional pooling mode for feature extraction when include_top is False.
• None means that the output of the model will be the 4D tensor output of the last
convolutional block.
• 'avg' means that global average pooling will be applied to the output of the last
convolutional block, and thus the output of the model will be a 2D tensor.
• 'max' means that global max pooling will be applied.
• classes: optional number of classes to classify images into, only to be specified if
include_top is True, and if no weights argument is specified.
Returns
A Keras Model instance.
References
• Rethinking the Inception Architecture for Computer Vision
License
These weights are released under the Apache License.
InceptionResNetV2
keras.applications.inception_resnet_v2.InceptionResNetV2(include_top=True,
weights='imagenet', input_tensor=None, input_shape=None, pooling=None,
classes=1000)
Arguments
• include_top: whether to include the fully-connected layer at the top of the network.
• weights: one of None (random initialization) or 'imagenet' (pre-training on ImageNet).
• input_tensor: optional Keras tensor (i.e. output of layers.Input()) to use as image input
for the model.
• input_shape: optional shape tuple, only to be specified if include_top is False (otherwise
the input shape has to be (299, 299, 3) (with 'channels_last' data format) or (3,
299, 299) (with 'channels_first' data format)). It should have exactly 3 input
channels, and width and height should be no smaller than 75. E.g. (150, 150, 3) would be
one valid value.
• pooling: Optional pooling mode for feature extraction when include_top is False.
• None means that the output of the model will be the 4D tensor output of the last
convolutional block.
• 'avg' means that global average pooling will be applied to the output of the last
convolutional block, and thus the output of the model will be a 2D tensor.
• 'max' means that global max pooling will be applied.
• classes: optional number of classes to classify images into, only to be specified if
include_top is True, and if no weights argument is specified.
Returns
A Keras Model instance.
References
• Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
License
These weights are released under the Apache License.
MobileNet
keras.applications.mobilenet.MobileNet(input_shape=None, alpha=1.0,
depth_multiplier=1, dropout=1e-3, include_top=True, weights='imagenet',
input_tensor=None, pooling=None, classes=1000)
Arguments
• input_shape: optional shape tuple, only to be specified if include_top is False (otherwise
the input shape has to be (224, 224, 3) (with 'channels_last' data format) or (3,
224, 224) (with 'channels_first' data format)). It should have exactly 3 input
channels, and width and height should be no smaller than 32. E.g. (200, 200, 3) would be
one valid value.
• alpha: controls the width of the network.
• If alpha < 1.0, proportionally decreases the number of filters in each layer.
• If alpha > 1.0, proportionally increases the number of filters in each layer.
• If alpha = 1, default number of filters from the paper are used at each layer.
• depth_multiplier: depth multiplier for depthwise convolution (also called the resolution
multiplier)
• dropout: dropout rate
• include_top: whether to include the fully-connected layer at the top of the network.
• weights: None (random initialization) or 'imagenet' (ImageNet weights)
• input_tensor: optional Keras tensor (i.e. output of layers.Input()) to use as image input
for the model.
• pooling: Optional pooling mode for feature extraction when include_top is False.
• None means that the output of the model will be the 4D tensor output of the last
convolutional block.
• 'avg' means that global average pooling will be applied to the output of the last
convolutional block, and thus the output of the model will be a 2D tensor.
• 'max' means that global max pooling will be applied.
• classes: optional number of classes to classify images into, only to be specified if
include_top is True, and if no weights argument is specified.
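The alpha argument scales the number of filters in every layer. The sketch below illustrates the idea on a few hypothetical base filter counts; scaled_filters is an illustrative helper, and truncating to an integer is an assumption (an implementation may round differently).

```python
def scaled_filters(base_filters, alpha=1.0):
    """Hypothetical helper: scale a layer's filter count by the width multiplier."""
    return int(base_filters * alpha)

base = [32, 64, 128, 256]
print([scaled_filters(f, alpha=0.5) for f in base])   # [16, 32, 64, 128]
print([scaled_filters(f, alpha=1.0) for f in base])   # [32, 64, 128, 256]
print([scaled_filters(f, alpha=1.25) for f in base])  # [40, 80, 160, 320]
```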
Returns
A Keras Model instance.
References
• MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
License
These weights are released under the Apache License.
DenseNet
keras.applications.densenet.DenseNet121(include_top=True, weights='imagenet',
input_tensor=None, input_shape=None, pooling=None, classes=1000)
keras.applications.densenet.DenseNet169(include_top=True, weights='imagenet',
input_tensor=None, input_shape=None, pooling=None, classes=1000)
keras.applications.densenet.DenseNet201(include_top=True, weights='imagenet',
input_tensor=None, input_shape=None, pooling=None, classes=1000)
Arguments
• blocks: numbers of building blocks for the four dense layers.
• include_top: whether to include the fully-connected layer at the top of the network.
• weights: one of None (random initialization), 'imagenet' (pre-training on ImageNet), or the path
to the weights file to be loaded.
• input_tensor: optional Keras tensor (i.e. output of layers.Input()) to use as image input
for the model.
• input_shape: optional shape tuple, only to be specified if include_top is False (otherwise
the input shape has to be (224, 224, 3) (with 'channels_last' data format) or (3,
224, 224) (with 'channels_first' data format)). It should have exactly 3 input
channels, and width and height should be no smaller than 32. E.g. (200, 200, 3) would be
one valid value.
• pooling: optional pooling mode for feature extraction when include_top is False.
• None means that the output of the model will be the 4D tensor output of the last
convolutional block.
• 'avg' means that global average pooling will be applied to the output of the last
convolutional block, and thus the output of the model will be a 2D tensor.
• 'max' means that global max pooling will be applied.
• classes: optional number of classes to classify images into, only to be specified if
include_top is True, and if no weights argument is specified.
Returns
A Keras model instance.
References
• Densely Connected Convolutional Networks (CVPR 2017 Best Paper Award)
License
These weights are released under the BSD 3-clause License.
NASNet
keras.applications.nasnet.NASNetLarge(input_shape=None, include_top=True,
weights='imagenet', input_tensor=None, pooling=None, classes=1000)
keras.applications.nasnet.NASNetMobile(input_shape=None, include_top=True,
weights='imagenet', input_tensor=None, pooling=None, classes=1000)
Neural Architecture Search Network (NASNet) models, with weights pre-trained on ImageNet.
The default input size for the NASNetLarge model is 331x331 and for the NASNetMobile model is
224x224.
Arguments
• input_shape: optional shape tuple, only to be specified if include_top is False (otherwise
the input shape has to be (224, 224, 3) (with 'channels_last' data format) or (3,
224, 224) (with 'channels_first' data format) for NASNetMobile or (331, 331,
3) (with 'channels_last' data format) or (3, 331, 331) (with
'channels_first' data format) for NASNetLarge). It should have exactly 3 input
channels, and width and height should be no smaller than 32. E.g. (200, 200, 3) would be
one valid value.
• include_top: whether to include the fully-connected layer at the top of the network.
• weights: None (random initialization) or 'imagenet' (ImageNet weights)
• input_tensor: optional Keras tensor (i.e. output of layers.Input()) to use as image input
for the model.
• pooling: Optional pooling mode for feature extraction when include_top is False.
• None means that the output of the model will be the 4D tensor output of the last
convolutional block.
• 'avg' means that global average pooling will be applied to the output of the last
convolutional block, and thus the output of the model will be a 2D tensor.
• 'max' means that global max pooling will be applied.
• classes: optional number of classes to classify images into, only to be specified if
include_top is True, and if no weights argument is specified.
Returns
A Keras Model instance.
References
• Learning Transferable Architectures for Scalable Image Recognition
License
These weights are released under the Apache License.
MobileNetV2
keras.applications.mobilenet_v2.MobileNetV2(input_shape=None, alpha=1.0,
include_top=True, weights='imagenet', input_tensor=None, pooling=None,
classes=1000)
Arguments
• input_shape: optional shape tuple, to be specified if you would like to use a model with an input
image resolution that is not (224, 224, 3). It should have exactly 3 input channels.
You can also omit this option if you would like to infer input_shape from an input_tensor. If you
provide both input_tensor and input_shape, input_shape is used when the two shapes
match, and an error is raised when they do not. E.g. (160, 160, 3) would
be one valid value.
• alpha: controls the width of the network. This is known as the width multiplier in the
MobileNetV2 paper.
• If alpha < 1.0, proportionally decreases the number of filters in each layer.
• If alpha > 1.0, proportionally increases the number of filters in each layer.
• If alpha = 1, default number of filters from the paper are used at each layer.
• include_top: whether to include the fully-connected layer at the top of the network.
• weights: one of None (random initialization), 'imagenet' (pre-training on ImageNet), or the path
to the weights file to be loaded.
• input_tensor: optional Keras tensor (i.e. output of layers.Input()) to use as image input
for the model.
• pooling: Optional pooling mode for feature extraction when include_top is False.
• None means that the output of the model will be the 4D tensor output of the last
convolutional block.
• 'avg' means that global average pooling will be applied to the output of the last
convolutional block, and thus the output of the model will be a 2D tensor.
• 'max' means that global max pooling will be applied.
• classes: optional number of classes to classify images into, only to be specified if
include_top is True, and if no weights argument is specified.
Returns
A Keras model instance.
Raises
ValueError: in case of invalid argument for weights, or invalid input shape, alpha, rows when
weights='imagenet'
References
• MobileNetV2: Inverted Residuals and Linear Bottlenecks
License
These weights are released under the Apache License.
Keras backends
What is a "backend"?
Keras is a model-level library, providing high-level building blocks for developing deep learning
models. It does not handle low-level operations such as tensor products, convolutions and so on itself.
Instead, it relies on a specialized, well optimized tensor manipulation library to do so, serving as the
"backend engine" of Keras. Rather than picking one single tensor library and making the
implementation of Keras tied to that library, Keras handles the problem in a modular way, and several
different backend engines can be plugged seamlessly into Keras.
At this time, Keras has three backend implementations available: the TensorFlow backend, the Theano
backend, and the CNTK backend.
• TensorFlow is an open-source symbolic tensor manipulation framework developed by Google.
• Theano is an open-source symbolic tensor manipulation framework developed by LISA Lab at
Université de Montréal.
• CNTK is an open-source toolkit for deep learning developed by Microsoft.
In the future, we are likely to add more backend options.
To switch backends, change the backend field in your Keras configuration file
(~/.keras/keras.json) to "theano", "tensorflow", or "cntk". Keras will use
the new configuration the next time you run any Keras code.
You can also define the environment variable KERAS_BACKEND, which overrides what is defined
in your config file:
KERAS_BACKEND=tensorflow python -c "from keras import backend"
Using TensorFlow backend.
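The precedence between the config file and the environment variable can be sketched in a few lines. This is an illustration of the rule stated above, not the library's loading code: resolve_backend is a hypothetical helper, and falling back to "tensorflow" when the field is absent is an assumption.

```python
import json
import os

def resolve_backend(config_text, environ=None):
    """Hypothetical helper: pick the backend name from config text and environment."""
    environ = os.environ if environ is None else environ
    config = json.loads(config_text)
    backend = config.get("backend", "tensorflow")
    # The environment variable, when set, takes precedence over the file.
    return environ.get("KERAS_BACKEND", backend)

config_text = '{"backend": "theano", "floatx": "float32"}'
print(resolve_backend(config_text, environ={}))                         # theano
print(resolve_backend(config_text, environ={"KERAS_BACKEND": "cntk"}))  # cntk
```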
Keras is not limited to the "tensorflow", "theano", and "cntk" backends. It can also
load an external backend, selected through the "backend" setting in the keras.json
configuration file. Suppose you have a Python module called my_module, inside a package
called my_package, that you want to use as your external backend. The keras.json
configuration file would be changed as follows:
{
"image_data_format": "channels_last",
"epsilon": 1e-07,
"floatx": "float32",
"backend": "my_package.my_module"
}
An external backend must be validated before it can be used. A valid backend must provide the
following functions: placeholder, variable and function.
If an external backend is not valid because a required entry is missing, an error is logged
indicating which entries are missing.
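The validation rule can be pictured as a simple attribute check. The sketch below is only an illustration under that assumption: missing_entries is a hypothetical helper, and a SimpleNamespace stands in for an imported backend module.

```python
from types import SimpleNamespace

REQUIRED_ENTRIES = ("placeholder", "variable", "function")

def missing_entries(backend_module):
    """Return the required entries that the candidate backend does not define."""
    return [name for name in REQUIRED_ENTRIES if not hasattr(backend_module, name)]

# Hypothetical stand-in for an imported external backend module.
candidate = SimpleNamespace(placeholder=lambda *a, **k: None,
                            variable=lambda *a, **k: None)
print(missing_entries(candidate))  # ['function']
```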
keras.json details
The keras.json configuration file contains the following settings:
{
"image_data_format": "channels_last",
"epsilon": 1e-07,
"floatx": "float32",
"backend": "tensorflow"
}
from keras import backend as K

# all-zeros variable:
var = K.zeros(shape=(3, 4, 5))
# all-ones:
var = K.ones(shape=(3, 4, 5))
Most tensor operations you will need can be done as you would in TensorFlow or Theano:
# Initializing Tensors with Random Numbers
b = K.random_uniform_variable(shape=(3, 4), low=0, high=1) # Uniform distribution
c = K.random_normal_variable(shape=(3, 4), mean=0, scale=1) # Gaussian distribution
d = K.random_normal_variable(shape=(3, 4), mean=0, scale=1)
# Tensor Arithmetic
a = b + c * K.abs(d)
c = K.dot(a, K.transpose(b))
a = K.sum(b, axis=1)
a = K.softmax(b)
a = K.concatenate([b, c], axis=-1)
# etc...
Backend functions
backend
keras.backend.backend()
symbolic
keras.backend.symbolic(func)
eager
keras.backend.eager(func)
get_uid
keras.backend.get_uid(prefix='')
manual_variable_initialization
keras.backend.manual_variable_initialization(value)
reset_uids
keras.backend.reset_uids()
floatx
keras.backend.floatx()
Returns the default float type, as a string. (e.g. 'float16', 'float32', 'float64').
Returns
String, the current default float type.
Example
>>> keras.backend.floatx()
'float32'
set_floatx
keras.backend.set_floatx(floatx)
cast_to_floatx
keras.backend.cast_to_floatx(x)
image_data_format
keras.backend.image_data_format()
Example
>>> keras.backend.image_data_format()
'channels_first'
set_image_data_format
keras.backend.set_image_data_format(data_format)
Example
>>> from keras import backend as K
>>> K.image_data_format()
'channels_first'
>>> K.set_image_data_format('channels_last')
>>> K.image_data_format()
'channels_last'
learning_phase
keras.backend.learning_phase()
clear_session
keras.backend.clear_session()
is_sparse
keras.backend.is_sparse(tensor)
to_dense
keras.backend.to_dense(tensor)
Converts a sparse tensor into a dense tensor and returns it.
Arguments
• tensor: A tensor instance (potentially sparse).
Returns
A dense tensor.
Examples
>>> from keras import backend as K
>>> b = K.placeholder((2, 2), sparse=True)
>>> print(K.is_sparse(b))
True
>>> c = K.to_dense(b)
>>> print(K.is_sparse(c))
False
variable
keras.backend.variable(value, dtype=None, name=None, constraint=None)
constant
keras.backend.constant(value, dtype=None, shape=None, name=None)
is_keras_tensor
keras.backend.is_keras_tensor(x)
A "Keras tensor" is a tensor that was returned by a Keras layer (Layer class) or by Input.
Arguments
• x: A candidate tensor.
Returns
A boolean: Whether the argument is a Keras tensor.
Raises
• ValueError: In case x is not a symbolic tensor.
Examples
>>> import numpy
>>> import tensorflow as tf
>>> from keras import backend as K
>>> from keras.layers import Input, Dense
>>> np_var = numpy.array([1, 2])
>>> K.is_keras_tensor(np_var) # A numpy array is not a symbolic tensor.
ValueError
>>> k_var = tf.placeholder('float32', shape=(1,1))
>>> # A variable indirectly created outside of keras is not a Keras tensor.
>>> K.is_keras_tensor(k_var)
False
>>> keras_var = K.variable(np_var)
>>> # A variable created with the keras backend is not a Keras tensor.
>>> K.is_keras_tensor(keras_var)
False
>>> keras_placeholder = K.placeholder(shape=(2, 4, 5))
>>> # A placeholder is not a Keras tensor.
>>> K.is_keras_tensor(keras_placeholder)
False
>>> keras_input = Input([10])
>>> K.is_keras_tensor(keras_input) # An Input is a Keras tensor.
True
>>> keras_layer_output = Dense(10)(keras_input)
>>> # Any Keras layer output is a Keras tensor.
>>> K.is_keras_tensor(keras_layer_output)
True
is_tensor
keras.backend.is_tensor(x)
placeholder
keras.backend.placeholder(shape=None, ndim=None, dtype=None, sparse=False,
name=None)
is_placeholder
keras.backend.is_placeholder(x)
Arguments
• x: A candidate placeholder.
Returns
Boolean.
shape
keras.backend.shape(x)
int_shape
keras.backend.int_shape(x)
Numpy implementation
def int_shape(x):
    return x.shape
ndim
keras.backend.ndim(x)
Numpy implementation
def ndim(x):
    return x.ndim
size
keras.backend.size(x, name=None)
dtype
keras.backend.dtype(x)
Examples
>>> from keras import backend as K
>>> K.dtype(K.placeholder(shape=(2,4,5)))
'float32'
>>> K.dtype(K.placeholder(shape=(2,4,5), dtype='float32'))
'float32'
>>> K.dtype(K.placeholder(shape=(2,4,5), dtype='float64'))
'float64'
# Keras variable
>>> kvar = K.variable(np.array([[1, 2], [3, 4]]))
>>> K.dtype(kvar)
'float32_ref'
>>> kvar = K.variable(np.array([[1, 2], [3, 4]]), dtype='float32')
>>> K.dtype(kvar)
'float32_ref'
Numpy implementation
def dtype(x):
    return x.dtype.name
eval
keras.backend.eval(x)
Numpy implementation
def eval(x):
    return x
zeros
keras.backend.zeros(shape, dtype=None, name=None)
Numpy implementation
def zeros(shape, dtype=floatx(), name=None):
    return np.zeros(shape, dtype=dtype)
ones
keras.backend.ones(shape, dtype=None, name=None)
Numpy implementation
def ones(shape, dtype=floatx(), name=None):
    return np.ones(shape, dtype=dtype)
eye
keras.backend.eye(size, dtype=None, name=None)
Numpy implementation
def eye(size, dtype=None, name=None):
    if isinstance(size, (list, tuple)):
        n, m = size
    else:
        n, m = size, size
    return np.eye(n, m, dtype=dtype)
zeros_like
keras.backend.zeros_like(x, dtype=None, name=None)
ones_like
keras.backend.ones_like(x, dtype=None, name=None)
Numpy implementation
def ones_like(x, dtype=floatx(), name=None):
    return np.ones_like(x, dtype=dtype)
identity
keras.backend.identity(x, name=None)
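random_uniform_variable
keras.backend.random_uniform_variable(shape, low, high, dtype=None, name=None,
seed=None)
Numpy implementation
def random_uniform_variable(shape, low, high, dtype=None, name=None, seed=None):
    return (high - low) * np.random.random(shape).astype(dtype) + low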
random_normal_variable
keras.backend.random_normal_variable(shape, mean, scale, dtype=None, name=None,
seed=None)
Numpy implementation
def random_normal_variable(shape, mean, scale, dtype=None, name=None, seed=None):
    return scale * np.random.randn(*shape).astype(dtype) + mean
count_params
keras.backend.count_params(x)
Example
>>> kvar = K.zeros((2,3))
>>> K.count_params(kvar)
6
>>> K.eval(kvar)
array([[ 0., 0., 0.],
[ 0., 0., 0.]], dtype=float32)
Numpy implementation
def count_params(x):
    return x.size
cast
keras.backend.cast(x, dtype)
Returns
Keras tensor with dtype dtype.
Example
>>> from keras import backend as K
>>> input = K.placeholder((2, 3), dtype='float32')
>>> input
<tf.Tensor 'Placeholder_2:0' shape=(2, 3) dtype=float32>
# It doesn't work in-place as below.
>>> K.cast(input, dtype='float16')
<tf.Tensor 'Cast_1:0' shape=(2, 3) dtype=float16>
>>> input
<tf.Tensor 'Placeholder_2:0' shape=(2, 3) dtype=float32>
# you need to assign it.
>>> input = K.cast(input, dtype='float16')
>>> input
<tf.Tensor 'Cast_2:0' shape=(2, 3) dtype=float16>
update
keras.backend.update(x, new_x)
Arguments
• x: A Variable.
• new_x: A tensor of same shape as x.
Returns
The variable x updated.
update_add
keras.backend.update_add(x, increment)
Arguments
• x: A Variable.
• increment: A tensor of same shape as x.
Returns
The variable x updated.
update_sub
keras.backend.update_sub(x, decrement)
Arguments
• x: A Variable.
• decrement: A tensor of same shape as x.
Returns
The variable x updated.
moving_average_update
keras.backend.moving_average_update(x, value, momentum)
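The update functions above all modify a variable in place and return it. A NumPy sketch of that behavior follows; it is an analogy on plain arrays rather than backend variables, and the formula used for moving_average_update (a plain exponential moving average) is an assumption, since a backend may apply additional corrections.

```python
import numpy as np

def update(x, new_x):
    x[...] = new_x          # overwrite the contents in place
    return x

def update_add(x, increment):
    x += increment          # in-place addition
    return x

def update_sub(x, decrement):
    x -= decrement          # in-place subtraction
    return x

def moving_average_update(x, value, momentum):
    # Assumed formula: a plain exponential moving average.
    x[...] = x * momentum + value * (1.0 - momentum)
    return x

v = np.array([1.0, 2.0, 3.0])
update_add(v, 1.0)
print(v.tolist())  # [2.0, 3.0, 4.0]
update_sub(v, 2.0)
print(v.tolist())  # [0.0, 1.0, 2.0]
moving_average_update(v, np.array([10.0, 10.0, 10.0]), momentum=0.5)
print(v.tolist())  # [5.0, 5.5, 6.0]
update(v, np.zeros(3))
print(v.tolist())  # [0.0, 0.0, 0.0]
```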
dot
keras.backend.dot(x, y)
Arguments
• x: Tensor or variable.
• y: Tensor or variable.
Returns
A tensor, dot product of x and y.
Examples
# dot product between tensors
>>> x = K.placeholder(shape=(2, 3))
>>> y = K.placeholder(shape=(3, 4))
>>> xy = K.dot(x, y)
>>> xy
<tf.Tensor 'MatMul_9:0' shape=(2, 4) dtype=float32>
Numpy implementation
def dot(x, y):
    return np.dot(x, y)
batch_dot
keras.backend.batch_dot(x, y, axes=None)
Examples
Assume x = [[1, 2], [3, 4]] and y = [[5, 6], [7, 8]]. Then batch_dot(x, y,
axes=1) = [[17], [53]], which is the main diagonal of x.dot(y.T), although we never
have to calculate the off-diagonal elements.
Pseudocode:
inner_products = []
for xi, yi in zip(x, y):
    inner_products.append(xi.dot(yi))
result = stack(inner_products)
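Shape inference: Let x's shape be (100, 20) and y's shape be (100, 30, 20). If axes is (1, 2),
the output shape is found by looping through each dimension in x's shape and y's shape: x's batch
dimension (100) is kept, the contracted dimensions (axis 1 of x and axis 2 of y, both 20) are dropped,
y's batch dimension is ignored, and y's remaining dimension (30) is kept, giving an output shape of
(100, 30).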
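The shape inference for batch_dot can be sketched as a small helper. This is an illustration only: the function name batch_dot_output_shape is hypothetical, and it assumes axes is a pair (x_axis, y_axis) of non-negative integers.

```python
def batch_dot_output_shape(x_shape, y_shape, axes):
    """Infer the batch_dot output shape from the two input shapes."""
    x_axis, y_axis = axes
    if x_shape[x_axis] != y_shape[y_axis]:
        raise ValueError("contracted dimensions must match")
    # Keep every dimension of x except the one that is summed over.
    out = [d for i, d in enumerate(x_shape) if i != x_axis]
    # Ignore y's batch dimension (index 0) and its summed-over dimension.
    out += [d for i, d in enumerate(y_shape) if i not in (0, y_axis)]
    # A rank-1 result is expanded so that the batch axis is kept.
    if len(out) == 1:
        out.append(1)
    return tuple(out)


shape = batch_dot_output_shape((100, 20), (100, 30, 20), axes=(1, 2))  # (100, 30)
```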
transpose
keras.backend.transpose(x)
Numpy implementation
def transpose(x):
    return np.transpose(x)
gather
keras.backend.gather(reference, indices)
Arguments
• reference: A tensor.
• indices: An integer tensor of indices.
Returns
A tensor of same type as reference.
Numpy implementation
def gather(reference, indices):
    return reference[indices]
max
keras.backend.max(x, axis=None, keepdims=False)
Maximum value in a tensor.
Arguments
• x: A tensor or variable.
• axis: An integer or list of integers in [-rank(x), rank(x)), the axes to find maximum values. If
None (default), finds the maximum over all dimensions.
• keepdims: A boolean, whether to keep the dimensions or not. If keepdims is False, the rank
of the tensor is reduced by 1. If keepdims is True, the reduced dimension is retained with
length 1.
Returns
A tensor with maximum values of x.
Numpy implementation
def max(x, axis=None, keepdims=False):
    if isinstance(axis, list):
        axis = tuple(axis)
    return np.max(x, axis=axis, keepdims=keepdims)
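To make the effect of axis and keepdims concrete, here is a backend-free sketch restricted to 2D nested lists. The name reduce_max_2d is hypothetical. It mirrors the argument semantics described above, not the actual backend code.

```python
def reduce_max_2d(x, axis=None, keepdims=False):
    """Maximum over a 2D nested list, mimicking the axis/keepdims arguments."""
    if axis is None:
        m = max(max(row) for row in x)
        return [[m]] if keepdims else m
    if axis in (0, -2):
        cols = [max(col) for col in zip(*x)]
        return [cols] if keepdims else cols
    if axis in (1, -1):
        rows = [max(row) for row in x]
        return [[r] for r in rows] if keepdims else rows
    raise ValueError("axis must be in [-2, 2) for a 2D input")


x = [[1, 5, 3],
     [4, 2, 6]]
over_all = reduce_max_2d(x)                        # 6
over_rows = reduce_max_2d(x, axis=0)               # [4, 5, 6]
over_cols = reduce_max_2d(x, axis=1)               # [5, 6]
kept = reduce_max_2d(x, axis=1, keepdims=True)     # [[5], [6]]
```

With keepdims=True the reduced axis stays in place with length 1, so the result still has two dimensions.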
min
keras.backend.min(x, axis=None, keepdims=False)
Numpy implementation
def min(x, axis=None, keepdims=False):
    if isinstance(axis, list):
        axis = tuple(axis)
    return np.min(x, axis=axis, keepdims=keepdims)
sum
keras.backend.sum(x, axis=None, keepdims=False)
Numpy implementation
def sum(x, axis=None, keepdims=False):
    if isinstance(axis, list):
        axis = tuple(axis)
    return np.sum(x, axis=axis, keepdims=keepdims)
prod
keras.backend.prod(x, axis=None, keepdims=False)
Numpy implementation
def prod(x, axis=None, keepdims=False):
    if isinstance(axis, list):
        axis = tuple(axis)
    return np.prod(x, axis=axis, keepdims=keepdims)
cumsum
keras.backend.cumsum(x, axis=0)
cumprod
keras.backend.cumprod(x, axis=0)
var
keras.backend.var(x, axis=None, keepdims=False)
std
keras.backend.std(x, axis=None, keepdims=False)
mean
keras.backend.mean(x, axis=None, keepdims=False)
any
keras.backend.any(x, axis=None, keepdims=False)
all
keras.backend.all(x, axis=None, keepdims=False)
argmax
keras.backend.argmax(x, axis=-1)
argmin
keras.backend.argmin(x, axis=-1)
square
keras.backend.square(x)
Element-wise square.
Arguments
• x: Tensor or variable.
Returns
A tensor.
abs
keras.backend.abs(x)
sqrt
keras.backend.sqrt(x)
exp
keras.backend.exp(x)
Element-wise exponential.
Arguments
• x: Tensor or variable.
Returns
A tensor.
log
keras.backend.log(x)
Element-wise log.
Arguments
• x: Tensor or variable.
Returns
A tensor.
logsumexp
keras.backend.logsumexp(x, axis=None, keepdims=False)
sign
keras.backend.sign(x)
Element-wise sign.
Arguments
• x: Tensor or variable.
Returns
A tensor.
pow
keras.backend.pow(x, a)
Element-wise exponentiation.
Arguments
• x: Tensor or variable.
• a: Python integer.
Returns
A tensor.
Numpy implementation
def pow(x, a=1.):
    return np.power(x, a)
clip
keras.backend.clip(x, min_value, max_value)
equal
keras.backend.equal(x, y)
not_equal
keras.backend.not_equal(x, y)
greater
keras.backend.greater(x, y)
greater_equal
keras.backend.greater_equal(x, y)
less_equal
keras.backend.less_equal(x, y)
maximum
keras.backend.maximum(x, y)
minimum
keras.backend.minimum(x, y)
sin
keras.backend.sin(x)
cos
keras.backend.cos(x)
normalize_batch_in_training
keras.backend.normalize_batch_in_training(x, gamma, beta, reduction_axes,
epsilon=0.001)
Computes the mean and std of a batch, then applies batch_normalization to the batch.
Arguments
• x: Input tensor or variable.
• gamma: Tensor by which to scale the input.
• beta: Tensor with which to center the input.
• reduction_axes: iterable of integers, axes over which to normalize.
• epsilon: Fuzz factor.
Returns
A tuple of length 3, (normalized_tensor, mean, variance).
batch_normalization
keras.backend.batch_normalization(x, mean, var, beta, gamma, axis=-1,
epsilon=0.001)
Arguments
• x: Input tensor or variable.
• mean: Mean of batch.
• var: Variance of batch.
• beta: Tensor with which to center the input.
• gamma: Tensor by which to scale the input.
• axis: Integer, the axis that should be normalized. (typically the features axis).
• epsilon: Fuzz factor.
Returns
A tensor.
Numpy implementation
def batch_normalization(x, mean, var, beta, gamma, axis=-1, epsilon=0.001):
    return ((x - mean) / np.sqrt(var + epsilon)) * gamma + beta
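As a worked example of the formula above, the following plain-Python sketch normalizes a small 1D batch. The function name batch_normalization_1d and the sample values are illustrative assumptions. The mean and variance are computed by hand here, as normalize_batch_in_training would do for a batch.

```python
import math


def batch_normalization_1d(x, mean, var, beta, gamma, epsilon=0.001):
    """Apply ((x - mean) / sqrt(var + epsilon)) * gamma + beta element-wise."""
    return [((v - mean) / math.sqrt(var + epsilon)) * gamma + beta for v in x]


batch = [1.0, 2.0, 3.0, 4.0]
mean = sum(batch) / len(batch)                          # 2.5
var = sum((v - mean) ** 2 for v in batch) / len(batch)  # 1.25

# gamma=1 and beta=0 give a plain standardization of the batch.
normalized = batch_normalization_1d(batch, mean, var, beta=0.0, gamma=1.0)
# gamma rescales and beta recenters the standardized values.
shifted = batch_normalization_1d(batch, mean, var, beta=10.0, gamma=2.0)
```

The epsilon term only guards against division by zero when the variance is tiny, which is why the normalized values have a standard deviation slightly below 1.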
concatenate
keras.backend.concatenate(tensors, axis=-1)
reshape
keras.backend.reshape(x, shape)
permute_dimensions
keras.backend.permute_dimensions(x, pattern)
resize_images
keras.backend.resize_images(x, height_factor, width_factor, data_format,
interpolation='nearest')
Returns
A tensor.
Raises
• ValueError: if data_format is neither "channels_last" nor "channels_first".
resize_volumes
keras.backend.resize_volumes(x, depth_factor, height_factor, width_factor,
data_format)
Returns
A tensor.
Raises
• ValueError: if data_format is neither "channels_last" nor "channels_first".
repeat_elements
keras.backend.repeat_elements(x, rep, axis)
If x has shape (s1, s2, s3) and axis is 1, the output will have shape (s1, s2 * rep, s3).
Arguments
• x: Tensor or variable.
• rep: Python integer, number of times to repeat.
• axis: Axis along which to repeat.
Returns
A tensor.
repeat
keras.backend.repeat(x, n)
Repeats a 2D tensor.
If x has shape (samples, dim) and n is 2, the output will have shape (samples, 2, dim).
Arguments
• x: Tensor or variable.
• n: Python integer, number of times to repeat.
Returns
A tensor.
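The difference between repeat_elements and repeat is easiest to see on a small input. The sketch below works on 2D nested lists only. Both helper names are hypothetical stand-ins that reproduce the shape behaviour described above.

```python
def repeat_elements_axis1(x, rep):
    """x has shape (s1, s2); the result has shape (s1, s2 * rep)."""
    return [[v for v in row for _ in range(rep)] for row in x]


def repeat(x, n):
    """x has shape (samples, dim); the result has shape (samples, n, dim)."""
    return [[list(row) for _ in range(n)] for row in x]


x = [[1, 2], [3, 4]]
stretched = repeat_elements_axis1(x, 2)  # [[1, 1, 2, 2], [3, 3, 4, 4]]
stacked = repeat(x, 2)                   # [[[1, 2], [1, 2]], [[3, 4], [3, 4]]]
```

repeat_elements lengthens an existing axis, while repeat inserts a new axis of length n.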
arange
keras.backend.arange(start, stop=None, step=1, dtype='int32')
Arguments
• start: Start value.
• stop: Stop value.
• step: Difference between two successive values.
• dtype: Integer dtype to use.
Returns
An integer tensor.
tile
keras.backend.tile(x, n)
Arguments
• x: A tensor or variable
• n: A list or tuple of integers. The length must be the same as the number of dimensions in x.
Returns
A tiled tensor.
Example
>>> from keras import backend as K
>>> kvar_tile = K.tile(K.eye(2), (2, 3))
>>> K.eval(kvar_tile)
array([[1., 0., 1., 0., 1., 0.],
[0., 1., 0., 1., 0., 1.],
[1., 0., 1., 0., 1., 0.],
[0., 1., 0., 1., 0., 1.]], dtype=float32)
Numpy implementation
def tile(x, n):
    return np.tile(x, n)
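The tiling in the example can be reproduced without a backend. This is a minimal sketch for 2D nested lists. The name tile_2d is hypothetical.

```python
def tile_2d(x, n):
    """Tile a 2D nested list n[0] times along rows and n[1] times along columns."""
    rows_rep, cols_rep = n
    # Repeat each row's contents horizontally, then repeat the block of rows.
    return [list(row) * cols_rep for _ in range(rows_rep) for row in x]


eye2 = [[1.0, 0.0], [0.0, 1.0]]
tiled = tile_2d(eye2, (2, 3))
# [[1.0, 0.0, 1.0, 0.0, 1.0, 0.0],
#  [0.0, 1.0, 0.0, 1.0, 0.0, 1.0],
#  [1.0, 0.0, 1.0, 0.0, 1.0, 0.0],
#  [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]]
```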
flatten
keras.backend.flatten(x)
Flatten a tensor.
Arguments
• x: A tensor or variable.
Returns
A tensor, reshaped into 1-D.
batch_flatten
keras.backend.batch_flatten(x)
expand_dims
keras.backend.expand_dims(x, axis=-1)
squeeze
keras.backend.squeeze(x, axis)
temporal_padding
keras.backend.temporal_padding(x, padding=(1, 1))
spatial_2d_padding
keras.backend.spatial_2d_padding(x, padding=((1, 1), (1, 1)), data_format=None)
Returns
A padded 4D tensor.
Raises
• ValueError: if data_format is neither "channels_last" nor "channels_first".
spatial_3d_padding
keras.backend.spatial_3d_padding(x, padding=((1, 1), (1, 1), (1, 1)),
data_format=None)
Pads 5D tensor with zeros along the depth, height, width dimensions.
Pads these dimensions with respectively "padding[0]", "padding[1]" and "padding[2]" zeros left and
right.
For 'channels_last' data_format, the 2nd, 3rd and 4th dimension will be padded. For 'channels_first'
data_format, the 3rd, 4th and 5th dimension will be padded.
Arguments
• x: Tensor or variable.
• padding: Tuple of 3 tuples, padding pattern.
• data_format: string, "channels_last" or "channels_first".
Returns
A padded 5D tensor.
Raises
• ValueError: if data_format is neither "channels_last" nor "channels_first".
stack
keras.backend.stack(x, axis=0)
Arguments
• x: List of tensors.
• axis: Axis along which to perform stacking.
Returns
A tensor.
Numpy implementation
def stack(x, axis=0):
    return np.stack(x, axis=axis)
one_hot
keras.backend.one_hot(indices, num_classes)
reverse
keras.backend.reverse(x, axes)
slice
keras.backend.slice(x, start, size)
get_value
keras.backend.get_value(x)
batch_get_value
keras.backend.batch_get_value(ops)
set_value
keras.backend.set_value(x, value)
print_tensor
keras.backend.print_tensor(x, message='')
Note that print_tensor returns a new tensor identical to x which should be used in the following
code. Otherwise the print operation is not taken into account during evaluation.
Example
>>> x = K.print_tensor(x, message="x is: ")
Arguments
• x: Tensor to print.
• message: Message to print jointly with the tensor.
Returns
The same tensor x, unchanged.
function
keras.backend.function(inputs, outputs, updates=None)
gradients
keras.backend.gradients(loss, variables)
Arguments
• loss: Scalar tensor to minimize.
• variables: List of variables.
Returns
A gradients tensor.
stop_gradient
keras.backend.stop_gradient(variables)
Returns variables but with zero gradient w.r.t. every other variable.
Arguments
• variables: tensor or list of tensors to consider constant with respect to any other variable.
Returns
A single tensor or a list of tensors (depending on the passed argument) that has constant gradient with
respect to any other variable.
rnn
keras.backend.rnn(step_function, inputs, initial_states, go_backwards=False,
mask=None, constants=None, unroll=False, input_length=None)
Raises
• ValueError: If input dimension is less than 3.
• ValueError: If unroll is True but input timestep is not a fixed number.
• ValueError: If mask is provided (not None) but states is not provided (len(states) == 0).
switch
keras.backend.switch(condition, then_expression, else_expression)
Numpy implementation
def switch(condition, then_expression, else_expression):
    cond_float = condition.astype(floatx())
    while cond_float.ndim < then_expression.ndim:
        cond_float = cond_float[..., np.newaxis]
    return cond_float * then_expression + (1 - cond_float) * else_expression
in_train_phase
keras.backend.in_train_phase(x, alt, training=None)
Selects x in train phase, and alt otherwise.
Arguments
• x: What to return in train phase (tensor or callable that returns a tensor).
• alt: What to return otherwise (tensor or callable that returns a tensor).
• training: Optional scalar tensor (or Python boolean, or Python integer) specifying the learning
phase.
Returns
Either x or alt based on the training flag. The training flag defaults to
K.learning_phase().
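The selection logic can be sketched in plain Python. This is a simplification: the real backend also handles symbolic tensors and falls back to K.learning_phase() when training is None, whereas this sketch assumes an explicit Python boolean or integer flag.

```python
def in_train_phase(x, alt, training):
    """Return x when training is truthy, alt otherwise; call it if callable."""
    chosen = x if training else alt
    return chosen() if callable(chosen) else chosen


def in_test_phase(x, alt, training):
    """Mirror image: x is selected in the test phase, alt in the train phase."""
    return in_train_phase(alt, x, training)


train_value = in_train_phase("noisy", "clean", training=True)      # "noisy"
test_value = in_train_phase(lambda: 1, lambda: 2, training=0)      # 2
```

Passing callables defers the work, so only the selected branch is evaluated.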
in_test_phase
keras.backend.in_test_phase(x, alt, training=None)
Arguments
• x: What to return in test phase (tensor or callable that returns a tensor).
• alt: What to return otherwise (tensor or callable that returns a tensor).
• training: Optional scalar tensor (or Python boolean, or Python integer) specifying the learning
phase.
Returns
Either x or alt based on K.learning_phase.
relu
keras.backend.relu(x, alpha=0.0, max_value=None, threshold=0.0)
With default values, it returns element-wise max(x, 0). Otherwise, it follows: f(x) = max_value
for x >= max_value, f(x) = x for threshold <= x < max_value, f(x) = alpha * (x - threshold) otherwise.
Arguments
• x: A tensor or variable.
• alpha: A scalar, slope of negative section (default=0.).
• max_value: float. Saturation threshold.
• threshold: float. Threshold value for thresholded activation.
Returns
A tensor.
Numpy implementation
def relu(x, alpha=0., max_value=None, threshold=0.):
    if max_value is None:
        max_value = np.inf
    above_threshold = x * (x >= threshold)
    above_threshold = np.clip(above_threshold, 0.0, max_value)
    below_threshold = alpha * (x - threshold) * (x < threshold)
    return below_threshold + above_threshold
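The three branches of the definition are easier to check on scalars. The sketch below follows the piecewise formula in the text and assumes threshold >= 0 and max_value >= threshold. The name relu_scalar is hypothetical.

```python
def relu_scalar(x, alpha=0.0, max_value=None, threshold=0.0):
    """Scalar version of the piecewise relu definition."""
    if max_value is not None and x >= max_value:
        return float(max_value)          # saturated region
    if x >= threshold:
        return float(x)                  # identity region
    return alpha * (x - threshold)       # leaky region below the threshold


plain = relu_scalar(3.0)                               # 3.0
leaky = relu_scalar(-2.0, alpha=0.1)                   # about -0.2
capped = relu_scalar(5.0, max_value=2.0)               # 2.0
shifted = relu_scalar(0.5, alpha=0.1, threshold=1.0)   # about -0.05
```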
elu
keras.backend.elu(x, alpha=1.0)
softmax
keras.backend.softmax(x, axis=-1)
Softmax of a tensor.
Arguments
• x: A tensor or variable.
• axis: The dimension softmax would be performed on. The default is -1 which indicates the last
dimension.
Returns
A tensor.
Numpy implementation
def softmax(x, axis=-1):
    y = np.exp(x - np.max(x, axis, keepdims=True))
    return y / np.sum(y, axis, keepdims=True)
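The Numpy implementation subtracts the maximum before exponentiating. The plain-Python sketch below, for a single 1D list, shows why: the shift leaves the result unchanged but keeps exp from overflowing on large inputs. The name softmax_1d is hypothetical.

```python
import math


def softmax_1d(x):
    """Softmax of a 1D list, shifted by the maximum for numerical stability."""
    shift = max(x)
    exps = [math.exp(v - shift) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax_1d([1.0, 2.0, 3.0])
# math.exp(1000.0) alone would raise OverflowError; the shifted version is fine.
large = softmax_1d([1000.0, 1001.0])
```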
softplus
keras.backend.softplus(x)
Softplus of a tensor.
Arguments
• x: A tensor or variable.
Returns
A tensor.
Numpy implementation
def softplus(x):
    return np.log(1. + np.exp(x))
softsign
keras.backend.softsign(x)
Softsign of a tensor.
Arguments
• x: A tensor or variable.
Returns
A tensor.
Numpy implementation
def softsign(x):
    return x / (1 + np.abs(x))
categorical_crossentropy
keras.backend.categorical_crossentropy(target, output, from_logits=False, axis=-1)
Returns
Output tensor.
Raises
• ValueError: if axis is neither -1 nor one of the axes of output.
sparse_categorical_crossentropy
keras.backend.sparse_categorical_crossentropy(target, output, from_logits=False,
axis=-1)
Returns
Output tensor.
Raises
• ValueError: if axis is neither -1 nor one of the axes of output.
binary_crossentropy
keras.backend.binary_crossentropy(target, output, from_logits=False)
Returns
A tensor.
sigmoid
keras.backend.sigmoid(x)
Element-wise sigmoid.
Arguments
• x: A tensor or variable.
Returns
A tensor.
Numpy implementation
def sigmoid(x):
    return 1. / (1. + np.exp(-x))
hard_sigmoid
keras.backend.hard_sigmoid(x)
Arguments
• x: A tensor or variable.
Returns
A tensor.
Numpy implementation
def hard_sigmoid(x):
    y = 0.2 * x + 0.5
    return np.clip(y, 0, 1)
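Read as a scalar function, the implementation above is a piecewise-linear approximation of the sigmoid: 0 for x < -2.5, 1 for x > 2.5, and 0.2 * x + 0.5 in between. A small sketch with hypothetical scalar helpers:

```python
import math


def hard_sigmoid_scalar(x):
    """Clip the line 0.2 * x + 0.5 to the interval [0, 1]."""
    return min(1.0, max(0.0, 0.2 * x + 0.5))


def sigmoid_scalar(x):
    return 1.0 / (1.0 + math.exp(-x))


# The two functions agree at 0 and stay close near the origin.
gap_at_one = abs(hard_sigmoid_scalar(1.0) - sigmoid_scalar(1.0))
```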
tanh
keras.backend.tanh(x)
Element-wise tanh.
Arguments
• x: A tensor or variable.
Returns
A tensor.
Numpy implementation
def tanh(x):
    return np.tanh(x)
dropout
keras.backend.dropout(x, level, noise_shape=None, seed=None)
Arguments
• x: tensor
• level: fraction of the entries in the tensor that will be set to 0.
• noise_shape: shape for randomly generated keep/drop flags, must be broadcastable to the shape
of x
• seed: random seed to ensure determinism.
Returns
A tensor.
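A backend-free sketch of the level and seed arguments follows. It assumes "inverted" dropout, where the kept entries are scaled by 1 / (1 - level) so the expected value is unchanged. That scaling is an assumption of this sketch rather than something stated above, and noise_shape is not modelled. The name dropout_1d is hypothetical.

```python
import random


def dropout_1d(x, level, seed=None):
    """Zero each entry with probability `level`; rescale the survivors."""
    if not 0.0 <= level < 1.0:
        raise ValueError("level must be in [0, 1)")
    rng = random.Random(seed)            # a fixed seed makes the mask repeatable
    keep_scale = 1.0 / (1.0 - level)     # assumption: inverted dropout scaling
    return [v * keep_scale if rng.random() >= level else 0.0 for v in x]


first = dropout_1d([1.0] * 1000, level=0.25, seed=0)
second = dropout_1d([1.0] * 1000, level=0.25, seed=0)
zero_fraction = first.count(0.0) / len(first)
```

The fraction of zeroed entries matches level only in expectation. With a fixed seed, the same mask is produced on every call.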
l2_normalize
keras.backend.l2_normalize(x, axis=None)
in_top_k
keras.backend.in_top_k(predictions, targets, k)
Arguments
• predictions: A tensor of shape (batch_size, classes) and type float32.
• targets: A 1D tensor of length batch_size and type int32 or int64.
• k: An int, number of top elements to consider.
Returns
A 1D tensor of length batch_size and type bool. output[i] is True if predictions[i,
targets[i]] is within top-k values of predictions[i].
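The rule for output[i] can be written out in plain Python. This sketch is not the backend implementation. It assumes that a target tied with other classes at the top-k boundary still counts as being within the top k.

```python
def in_top_k(predictions, targets, k):
    """output[i] is True if predictions[i][targets[i]] is among the k largest."""
    output = []
    for row, target in zip(predictions, targets):
        # Count the classes that score strictly higher than the target class.
        higher = sum(1 for score in row if score > row[target])
        output.append(higher < k)
    return output


predictions = [[0.1, 0.7, 0.2],
               [0.5, 0.3, 0.2]]
targets = [1, 2]
top1 = in_top_k(predictions, targets, k=1)  # [True, False]
top3 = in_top_k(predictions, targets, k=3)  # [True, True]
```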
conv1d
keras.backend.conv1d(x, kernel, strides=1, padding='valid', data_format=None,
dilation_rate=1)
1D convolution.
Arguments
• x: Tensor or variable.
• kernel: kernel tensor.
• strides: stride integer.
• padding: string, "same", "causal" or "valid".
• data_format: string, "channels_last" or "channels_first".
• dilation_rate: integer dilate rate.
Returns
A tensor, result of 1D convolution.
Raises
• ValueError: If data_format is neither "channels_last" nor "channels_first".
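To illustrate the strides, padding and dilation_rate arguments, here is a minimal single-channel sketch without batch or channel axes. It implements only "valid" and "causal" padding and assumes the cross-correlation convention (the kernel is not flipped). The name conv1d_single is hypothetical.

```python
def conv1d_single(x, kernel, stride=1, padding="valid", dilation_rate=1):
    """1D convolution of a flat list with a flat kernel."""
    # Number of extra input positions a kernel window covers beyond its first tap.
    span = (len(kernel) - 1) * dilation_rate
    if padding == "causal":
        # Left-pad so that output[t] depends only on x[t] and earlier values.
        x = [0] * span + list(x)
    elif padding != "valid":
        raise ValueError("this sketch supports only 'valid' and 'causal'")
    out = []
    for start in range(0, len(x) - span, stride):
        out.append(sum(kernel[j] * x[start + j * dilation_rate]
                       for j in range(len(kernel))))
    return out


signal = [1, 2, 3, 4, 5]
valid = conv1d_single(signal, [1, 0, -1])                     # [-2, -2, -2]
causal = conv1d_single(signal, [1, 0, -1], padding="causal")  # [-1, -2, -2, -2, -2]
```

"valid" shortens the output by the kernel span, while "causal" keeps the input length for stride 1.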
conv2d
keras.backend.conv2d(x, kernel, strides=(1, 1), padding='valid', data_format=None,
dilation_rate=(1, 1))
2D convolution.
Arguments
• x: Tensor or variable.
• kernel: kernel tensor.
• strides: strides tuple.
• padding: string, "same" or "valid".
• data_format: string, "channels_last" or "channels_first". Whether to use Theano
or TensorFlow/CNTK data format for inputs/kernels/outputs.
• dilation_rate: tuple of 2 integers.
Returns
A tensor, result of 2D convolution.
Raises
• ValueError: If data_format is neither "channels_last" nor "channels_first".
conv2d_transpose
keras.backend.conv2d_transpose(x, kernel, output_shape, strides=(1, 1),
padding='valid', data_format=None, dilation_rate=(1, 1))
separable_conv1d
keras.backend.separable_conv1d(x, depthwise_kernel, pointwise_kernel, strides=1,
padding='valid', data_format=None, dilation_rate=1)
separable_conv2d
keras.backend.separable_conv2d(x, depthwise_kernel, pointwise_kernel, strides=(1,
1), padding='valid', data_format=None, dilation_rate=(1, 1))
depthwise_conv2d
keras.backend.depthwise_conv2d(x, depthwise_kernel, strides=(1, 1),
padding='valid', data_format=None, dilation_rate=(1, 1))
conv3d
keras.backend.conv3d(x, kernel, strides=(1, 1, 1), padding='valid',
data_format=None, dilation_rate=(1, 1, 1))
3D convolution.
Arguments
• x: Tensor or variable.
• kernel: kernel tensor.
• strides: strides tuple.
• padding: string, "same" or "valid".
• data_format: string, "channels_last" or "channels_first". Whether to use Theano
or TensorFlow/CNTK data format for inputs/kernels/outputs.
• dilation_rate: tuple of 3 integers.
Returns
A tensor, result of 3D convolution.
Raises
• ValueError: If data_format is neither "channels_last" nor "channels_first".
conv3d_transpose
keras.backend.conv3d_transpose(x, kernel, output_shape, strides=(1, 1, 1),
padding='valid', data_format=None)
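pool2d
keras.backend.pool2d(x, pool_size, strides=(1, 1), padding='valid', data_format=None,
pool_mode='max')
2D Pooling.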
Arguments
• x: Tensor or variable.
• pool_size: tuple of 2 integers.
• strides: tuple of 2 integers.
• padding: string, "same" or "valid".
• data_format: string, "channels_last" or "channels_first".
• pool_mode: string, "max" or "avg".
Returns
A tensor, result of 2D pooling.
Raises
• ValueError: if data_format is neither "channels_last" nor "channels_first".
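The pool_size, strides and pool_mode arguments can be illustrated on a single-channel image stored as a nested list. This sketch covers "valid" padding only and omits the batch and channel axes. The name pool2d_single is hypothetical.

```python
def pool2d_single(x, pool_size, strides, pool_mode="max"):
    """2D pooling of a nested list with 'valid' padding."""
    ph, pw = pool_size
    sh, sw = strides
    rows, cols = len(x), len(x[0])
    out = []
    for r in range(0, rows - ph + 1, sh):
        out_row = []
        for c in range(0, cols - pw + 1, sw):
            window = [x[r + i][c + j] for i in range(ph) for j in range(pw)]
            if pool_mode == "max":
                out_row.append(max(window))
            elif pool_mode == "avg":
                out_row.append(sum(window) / len(window))
            else:
                raise ValueError("pool_mode must be 'max' or 'avg'")
        out.append(out_row)
    return out


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
pooled_max = pool2d_single(image, (2, 2), (2, 2), "max")  # [[6, 8], [14, 16]]
pooled_avg = pool2d_single(image, (2, 2), (2, 2), "avg")  # [[3.5, 5.5], [11.5, 13.5]]
```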
pool3d
keras.backend.pool3d(x, pool_size, strides=(1, 1, 1), padding='valid',
data_format=None, pool_mode='max')
3D Pooling.
Arguments
• x: Tensor or variable.
• pool_size: tuple of 3 integers.
• strides: tuple of 3 integers.
• padding: string, "same" or "valid".
• data_format: string, "channels_last" or "channels_first".
• pool_mode: string, "max" or "avg".
Returns
A tensor, result of 3D pooling.
Raises
• ValueError: if data_format is neither "channels_last" nor "channels_first".
local_conv1d
keras.backend.local_conv1d(inputs, kernel, kernel_size, strides, data_format=None)
local_conv2d
keras.backend.local_conv2d(inputs, kernel, kernel_size, strides, output_shape,
data_format=None)
bias_add
keras.backend.bias_add(x, bias, data_format=None)
Returns
Output tensor.
Raises
• ValueError: In one of the two cases below: 1. invalid data_format argument. 2. invalid bias shape.
The bias should be either a vector or a tensor with ndim(x) - 1 dimensions.
random_normal
keras.backend.random_normal(shape, mean=0.0, stddev=1.0, dtype=None, seed=None)
random_uniform
keras.backend.random_uniform(shape, minval=0.0, maxval=1.0, dtype=None, seed=None)
random_binomial
keras.backend.random_binomial(shape, p=0.0, dtype=None, seed=None)
truncated_normal
keras.backend.truncated_normal(shape, mean=0.0, stddev=1.0, dtype=None, seed=None)
ctc_label_dense_to_sparse
keras.backend.ctc_label_dense_to_sparse(labels, label_lengths)
ctc_batch_cost
keras.backend.ctc_batch_cost(y_true, y_pred, input_length, label_length)
Returns
Tensor with shape (samples,1) containing the CTC loss of each element.
ctc_decode
keras.backend.ctc_decode(y_pred, input_length, greedy=True, beam_width=100,
top_paths=1, merge_repeated=False)
Returns
A tuple of two elements:
• List: if greedy is True, returns a list of one element that contains the decoded
sequence. If False, returns the top_paths most probable decoded sequences. Important:
blank labels are returned as -1.
• Tensor (top_paths, ) that contains the log probability of each decoded sequence.
control_dependencies
keras.backend.control_dependencies(control_inputs)
map_fn
keras.backend.map_fn(fn, elems, name=None, dtype=None)
Map the function fn over the elements elems and return the outputs.
Arguments
• fn: Callable that will be called upon each element in elems
• elems: tensor
• name: A string name for the map node in the graph
• dtype: Output data type.
Returns
Tensor with dtype dtype.
foldl
keras.backend.foldl(fn, elems, initializer=None, name=None)
foldr
keras.backend.foldr(fn, elems, initializer=None, name=None)
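A plain-Python analogy may help with map_fn, foldl and foldr. The sketch below works on lists instead of tensors and assumes that fn for the folds takes (accumulator, element), and that the first (foldl) or last (foldr) element seeds the accumulator when initializer is None. These helpers only mimic the semantics. They are not the backend functions.

```python
from functools import reduce


def map_fn(fn, elems):
    """Apply fn to each element and collect the outputs."""
    return [fn(e) for e in elems]


def foldl(fn, elems, initializer=None):
    """Combine elements from left to right."""
    if initializer is None:
        return reduce(fn, elems)
    return reduce(fn, elems, initializer)


def foldr(fn, elems, initializer=None):
    """Combine elements from right to left."""
    return foldl(fn, list(reversed(elems)), initializer)


squares = map_fn(lambda v: v * v, [1, 2, 3])        # [1, 4, 9]
left = foldl(lambda acc, v: acc - v, [1, 2, 3])     # (1 - 2) - 3 = -4
right = foldr(lambda acc, v: acc - v, [1, 2, 3])    # (3 - 2) - 1 = 0
```

Subtraction is used here because it is not commutative, so the two fold directions give different results.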
Usage of initializers
Initializations define the way to set the initial random weights of Keras layers.
The keyword arguments used for passing initializers to layers will depend on the layer. Usually it is
simply kernel_initializer and bias_initializer:
model.add(Dense(64,
kernel_initializer='random_uniform',
bias_initializer='zeros'))
Available initializers
The following built-in initializers are available as part of the keras.initializers module:
Initializer
keras.initializers.Initializer()
Zeros
keras.initializers.Zeros()
Ones
keras.initializers.Ones()
Constant
keras.initializers.Constant(value=0)
RandomNormal
keras.initializers.RandomNormal(mean=0.0, stddev=0.05, seed=None)
RandomUniform
keras.initializers.RandomUniform(minval=-0.05, maxval=0.05, seed=None)
TruncatedNormal
keras.initializers.TruncatedNormal(mean=0.0, stddev=0.05, seed=None)
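VarianceScaling
keras.initializers.VarianceScaling(scale=1.0, mode='fan_in', distribution='normal', seed=None)
Arguments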
• scale: Scaling factor (positive float).
• mode: One of "fan_in", "fan_out", "fan_avg".
• distribution: Random distribution to use. One of "normal", "uniform".
• seed: A Python integer. Used to seed the random generator.
Raises
• ValueError: In case of an invalid value for the "scale", "mode" or "distribution" arguments.
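The argument list above does not show how scale, mode and distribution combine. The sketch below assumes the usual variance-scaling rule: n is fan_in, fan_out or their average depending on mode. A "normal" distribution then uses a nominal stddev of sqrt(scale / n), and a "uniform" one samples from [-limit, limit] with limit = sqrt(3 * scale / n). The function name is hypothetical and any truncation correction is ignored.

```python
import math


def variance_scaling_spread(scale, mode, distribution, fan_in, fan_out):
    """Return the stddev ('normal') or the limit ('uniform') for given fans."""
    if scale <= 0.0:
        raise ValueError("scale must be a positive float")
    fans = {"fan_in": fan_in, "fan_out": fan_out,
            "fan_avg": (fan_in + fan_out) / 2.0}
    if mode not in fans:
        raise ValueError("mode must be 'fan_in', 'fan_out' or 'fan_avg'")
    n = max(1.0, fans[mode])
    if distribution == "normal":
        return math.sqrt(scale / n)
    if distribution == "uniform":
        return math.sqrt(3.0 * scale / n)
    raise ValueError("distribution must be 'normal' or 'uniform'")


# scale=2, fan_in, normal: stddev = sqrt(2 / fan_in), as for he_normal.
he_stddev = variance_scaling_spread(2.0, "fan_in", "normal", fan_in=8, fan_out=99)
# scale=1, fan_avg, uniform: limit = sqrt(6 / (fan_in + fan_out)).
glorot_limit = variance_scaling_spread(1.0, "fan_avg", "uniform", fan_in=2, fan_out=4)
```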
Orthogonal
keras.initializers.Orthogonal(gain=1.0, seed=None)
Identity
keras.initializers.Identity(gain=1.0)
Initializer that generates the identity matrix.
Only use for 2D matrices. If the desired matrix is not square, it gets padded with zeros for the
additional rows/columns.
Arguments
• gain: Multiplicative factor to apply to the identity matrix.
lecun_uniform
keras.initializers.lecun_uniform(seed=None)
Arguments
• seed: A Python integer. Used to seed the random generator.
Returns
An initializer.
References
• Efficient BackProp
glorot_normal
keras.initializers.glorot_normal(seed=None)
Arguments
• seed: A Python integer. Used to seed the random generator.
Returns
An initializer.
References
• Understanding the difficulty of training deep feedforward neural networks
glorot_uniform
keras.initializers.glorot_uniform(seed=None)
Arguments
• seed: A Python integer. Used to seed the random generator.
Returns
An initializer.
References
• Understanding the difficulty of training deep feedforward neural networks
he_normal
keras.initializers.he_normal(seed=None)
He normal initializer.
It draws samples from a truncated normal distribution centered on 0 with stddev = sqrt(2 /
fan_in) where fan_in is the number of input units in the weight tensor.
Arguments
• seed: A Python integer. Used to seed the random generator.
Returns
An initializer.
References
• Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet
Classification
lecun_normal
keras.initializers.lecun_normal(seed=None)
Arguments
• seed: A Python integer. Used to seed the random generator.
Returns
An initializer.
References
• Self-Normalizing Neural Networks
• Efficient Backprop
he_uniform
keras.initializers.he_uniform(seed=None)
Arguments
• seed: A Python integer. Used to seed the random generator.
Returns
An initializer.
References
• Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet
Classification
An initializer may be passed as a string (must match one of the available initializers above), or as a
callable:
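from keras import backend as K
from keras import initializers
model.add(Dense(64, kernel_initializer=initializers.random_normal(stddev=0.01)))
def my_init(shape, dtype=None):  # a custom initializer is a callable taking shape and dtype
    return K.random_normal(shape, dtype=dtype)
model.add(Dense(64, kernel_initializer=my_init))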
Usage of regularizers
Regularizers allow you to apply penalties on layer parameters or layer activity during optimization. These
penalties are incorporated in the loss function that the network optimizes.
The penalties are applied on a per-layer basis. The exact API will depend on the layer, but the layers
Dense, Conv1D, Conv2D and Conv3D have a unified API.
Example
from keras import regularizers
model.add(Dense(64, input_dim=64,
                kernel_regularizer=regularizers.l2(0.01),
                activity_regularizer=regularizers.l1(0.01)))
Available penalties
keras.regularizers.l1(0.)
keras.regularizers.l2(0.)
keras.regularizers.l1_l2(l1=0.01, l2=0.01)
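What the penalties add to the loss can be worked out by hand. The sketch below assumes the standard definitions: l1 multiplies the sum of absolute weights, and l2 multiplies the sum of squared weights. The helper name l1_l2_penalty is hypothetical.

```python
def l1_l2_penalty(weights, l1=0.0, l2=0.0):
    """Loss contribution of an L1/L2 penalty for a 2D weight matrix."""
    flat = [w for row in weights for w in row]
    return l1 * sum(abs(w) for w in flat) + l2 * sum(w * w for w in flat)


weights = [[1.0, -2.0], [0.5, 0.0]]
l1_term = l1_l2_penalty(weights, l1=0.01)              # 0.01 * 3.5  = 0.035
l2_term = l1_l2_penalty(weights, l2=0.01)              # 0.01 * 5.25 = 0.0525
both = l1_l2_penalty(weights, l1=0.01, l2=0.01)        # 0.0875
```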
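Any function that takes in a weight matrix and returns a loss contribution tensor can be used as a
regularizer:
from keras import backend as K
def l1_reg(weight_matrix):
    return 0.01 * K.sum(K.abs(weight_matrix))
model.add(Dense(64, input_dim=64,
                kernel_regularizer=l1_reg))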
Alternatively, you can write your regularizers in an object-oriented way; see the keras/regularizers.py
module for examples.
Usage of constraints
Functions from the constraints module allow setting constraints (e.g. non-negativity) on network
parameters during optimization.
The constraints are applied on a per-layer basis. The exact API will depend on the layer, but the layers
Dense, Conv1D, Conv2D and Conv3D have a unified API.
Available constraints
MaxNorm
keras.constraints.MaxNorm(max_value=2, axis=0)
References
• Dropout: A Simple Way to Prevent Neural Networks from Overfitting
NonNeg
keras.constraints.NonNeg()
UnitNorm
keras.constraints.UnitNorm(axis=0)
Constrains the weights incident to each hidden unit to have unit norm.
Arguments
• axis: integer, axis along which to calculate weight norms. For instance, in a Dense layer the
weight matrix has shape (input_dim, output_dim), set axis to 0 to constrain each
weight vector of length (input_dim,). In a Conv2D layer with
data_format="channels_last", the weight tensor has shape (rows, cols,
input_depth, output_depth), set axis to [0, 1, 2] to constrain the weights of
each filter tensor of size (rows, cols, input_depth).
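For a Dense-style weight matrix of shape (input_dim, output_dim), constraining along axis 0 rescales each column to unit length. A plain-Python sketch follows. The small epsilon guarding against division by zero is an assumption of this sketch, and the name unit_norm_columns is hypothetical.

```python
import math


def unit_norm_columns(w, epsilon=1e-7):
    """Rescale each column of a 2D nested list to (approximately) unit L2 norm."""
    n_cols = len(w[0])
    norms = [math.sqrt(sum(row[j] ** 2 for row in w)) for j in range(n_cols)]
    return [[row[j] / (epsilon + norms[j]) for j in range(n_cols)] for row in w]


w = [[3.0, 0.0],
     [4.0, 2.0]]
constrained = unit_norm_columns(w)  # approximately [[0.6, 0.0], [0.8, 1.0]]
```

Each column holds the weights incident to one output unit, so after the constraint every unit sees a weight vector of norm 1.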
MinMaxNorm
keras.constraints.MinMaxNorm(min_value=0.0, max_value=1.0, rate=1.0, axis=0)
Model visualization
Keras provides utility functions to plot a Keras model (using graphviz). The plot_model function
takes the following optional arguments:
• show_shapes (defaults to False) controls whether output shapes are shown in the graph.
• show_layer_names (defaults to True) controls whether layer names are shown in the graph.
• expand_nested (defaults to False) controls whether to expand nested models into clusters in
the graph.
• dpi (defaults to 96) controls image dpi.
You can also directly obtain the pydot.Graph object and render it yourself, for example to show it in
an IPython notebook:
from IPython.display import SVG
from keras.utils import model_to_dot
SVG(model_to_dot(model).create(prog='dot', format='svg'))
Wrappers for the Scikit-Learn API
Arguments
• build_fn: callable function or class instance
• sk_params: model parameters & fitting parameters
build_fn should construct, compile and return a Keras model, which will then be used to fit/predict.
One of the following three values could be passed to build_fn:
1. A function
2. An instance of a class that implements the __call__ method
3. None. This means you implement a class that inherits from either KerasClassifier or
KerasRegressor. The __call__ method of the present class will then be treated as the
default build_fn.
sk_params takes both model parameters and fitting parameters. Legal model parameters are the
arguments of build_fn. Note that like all other estimators in scikit-learn, build_fn should provide
default values for its arguments, so that you could create the estimator without passing any values to
sk_params.
sk_params could also accept parameters for calling fit, predict, predict_proba, and
score methods (e.g., epochs, batch_size). Fitting (predicting) parameters are selected in the
following order:
1. Values passed to the dictionary arguments of fit, predict, predict_proba, and score
methods
2. Values passed to sk_params
3. The default values of the keras.models.Sequential fit, predict,
predict_proba and score methods
When using scikit-learn's grid_search API, legal tunable parameters are those you could pass to
sk_params, including fitting parameters. In other words, you could use grid_search to search for
the best batch_size or epochs as well as the model parameters.
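The precedence order for fitting parameters can be sketched with plain dictionaries. This is an illustration of the rule only, not the wrapper's code. The function name and the "units" model parameter are hypothetical.

```python
def resolve_fit_params(defaults, sk_params, call_kwargs):
    """Merge parameters: call arguments > sk_params > method defaults."""
    resolved = dict(defaults)
    # Only the sk_params entries that fit() understands apply here; the
    # rest (model parameters) are meant for build_fn.
    resolved.update({k: v for k, v in sk_params.items() if k in defaults})
    # Values passed directly to the method call win over everything else.
    resolved.update(call_kwargs)
    return resolved


defaults = {"epochs": 1, "batch_size": 32}
sk_params = {"epochs": 10, "units": 64}  # "units" is a hypothetical build_fn argument
resolved = resolve_fit_params(defaults, sk_params, {"batch_size": 128})
# {"epochs": 10, "batch_size": 128}
```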
CustomObjectScope
keras.utils.CustomObjectScope()
Code within a with statement will be able to access custom objects by name. Changes to global
custom objects persist within the enclosing with statement. At end of the with statement, global
custom objects are reverted to state at beginning of the with statement.
Example
Consider a custom object MyObject (e.g. a class):
with CustomObjectScope({'MyObject': MyObject}):
    layer = Dense(..., kernel_regularizer='MyObject')
    # save, load, etc. will recognize custom object by name
HDF5Matrix
keras.utils.HDF5Matrix(datapath, dataset, start=0, end=None, normalizer=None)
Optionally, a normalizer function (or lambda) can be given. This will be called on every slice of data
retrieved.
Arguments
• datapath: string, path to a HDF5 file
• dataset: string, name of the HDF5 dataset in the file specified in datapath
• start: int, start of desired slice of the specified dataset
• end: int, end of desired slice of the specified dataset
• normalizer: function to be called on data when retrieved
Returns
An array-like HDF5 dataset.
Sequence
keras.utils.Sequence()
Notes
Sequence objects are a safer way to do multiprocessing. This structure guarantees that the network will only
train once on each sample per epoch which is not the case with generators.
Examples
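from skimage.io import imread
from skimage.transform import resize
import numpy as np
from keras.utils import Sequence
class CIFAR10Sequence(Sequence):
    # x_set is a list of paths to the images; y_set are the associated classes.
    def __init__(self, x_set, y_set, batch_size):
        self.x, self.y = x_set, y_set
        self.batch_size = batch_size
    def __len__(self):
        return int(np.ceil(len(self.x) / float(self.batch_size)))
    def __getitem__(self, idx):
        batch_x = self.x[idx * self.batch_size:(idx + 1) * self.batch_size]
        batch_y = self.y[idx * self.batch_size:(idx + 1) * self.batch_size]
        return np.array([
            resize(imread(file_name), (200, 200))
            for file_name in batch_x]), np.array(batch_y)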
to_categorical
keras.utils.to_categorical(y, num_classes=None, dtype='float32')
Returns
A binary matrix representation of the input. The classes axis is placed last.
Example
# Consider an array of 5 labels out of a set of 3 classes {0, 1, 2}:
> labels
array([0, 2, 1, 2, 0])
# `to_categorical` converts this into a matrix with as many
# columns as there are classes. The number of rows
# stays the same.
> to_categorical(labels)
array([[ 1., 0., 0.],
[ 0., 0., 1.],
[ 0., 1., 0.],
[ 0., 0., 1.],
[ 1., 0., 0.]], dtype=float32)
normalize
keras.utils.normalize(x, axis=-1, order=2)
get_file
keras.utils.get_file(fname, origin, untar=False, md5_hash=None, file_hash=None,
cache_subdir='datasets', hash_algorithm='auto', extract=False,
archive_format='auto', cache_dir=None)
Files in tar, tar.gz, tar.bz, and zip formats can also be extracted. Passing a hash will verify the file after
download. The command line programs shasum and sha256sum can compute the hash.
Arguments
• fname: Name of the file. If an absolute path /path/to/file.txt is specified the file will
be saved at that location.
• origin: Original URL of the file.
• untar: Deprecated in favor of 'extract'. boolean, whether the file should be decompressed
• md5_hash: Deprecated in favor of 'file_hash'. md5 hash of the file for verification
• file_hash: The expected hash string of the file after download. The sha256 and md5 hash
algorithms are both supported.
• cache_subdir: Subdirectory under the Keras cache dir where the file is saved. If an absolute
path /path/to/folder is specified the file will be saved at that location.
• hash_algorithm: Select the hash algorithm to verify the file. options are 'md5', 'sha256', and
'auto'. The default 'auto' detects the hash algorithm in use.
• extract: True tries extracting the file as an Archive, like tar or zip.
• archive_format: Archive format to try for extracting the file. Options are 'auto', 'tar', 'zip', and
None. 'tar' includes tar, tar.gz, and tar.bz files. The default 'auto' is ['tar', 'zip']. None or an empty
list will return no matches found.
• cache_dir: Location to store cached files, when None it defaults to the Keras Directory.
Returns
Path to the downloaded file
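Hash verification of a downloaded file can be reproduced with the standard library. The sketch below is not the Keras implementation. It assumes that 'auto' selects sha256 for 64-character hex digests and md5 otherwise, and the helper names are hypothetical.

```python
import hashlib


def hash_file(path, algorithm="sha256", chunk_size=65535):
    """Hash a file in chunks so that large downloads fit in memory."""
    hasher = hashlib.sha256() if algorithm == "sha256" else hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def validate_file(path, file_hash, algorithm="auto"):
    """Return True if the file's digest equals the expected hash string."""
    # Assumption: 'auto' infers the algorithm from the digest length.
    if algorithm == "sha256" or (algorithm == "auto" and len(file_hash) == 64):
        chosen = "sha256"
    else:
        chosen = "md5"
    return hash_file(path, chosen) == file_hash
```

The expected hash string can be produced ahead of time with the shasum or sha256sum command line programs mentioned above.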
print_summary
keras.utils.print_summary(model, line_length=None, positions=None, print_fn=None)
plot_model
keras.utils.plot_model(model, to_file='model.png', show_shapes=False,
show_layer_names=True, rankdir='TB', expand_nested=False, dpi=96)
multi_gpu_model
keras.utils.multi_gpu_model(model, gpus=None, cpu_merge=True, cpu_relocation=False)
num_samples = 1000
height = 224
width = 224
num_classes = 1000
# Save model via the template model (which shares the same weights):
model.save('my_model.h5')
try:
    parallel_model = multi_gpu_model(model, cpu_relocation=True)
    print("Training using multiple GPUs..")
except ValueError:
    parallel_model = model
    print("Training using single GPU or CPU..")
parallel_model.compile(..)
..
Example 3 - Training models with weights merge on GPU (recommended for NV-link)
..
# Not needed to change the device scope for model definition:
model = Xception(weights=None, ..)
try:
    parallel_model = multi_gpu_model(model, cpu_merge=False)
    print("Training using multiple GPUs..")
except ValueError:
    parallel_model = model
    print("Training using single GPU or CPU..")
parallel_model.compile(..)
..
On model saving
To save the multi-gpu model, use .save(fname) or .save_weights(fname) with the template
model (the argument you passed to multi_gpu_model), rather than the model returned by
multi_gpu_model.
On GitHub Issues and Pull Requests
Found a bug? Have a new feature to suggest? Want to contribute changes to the codebase? Make sure
to read this first.
Bug reporting
Your code doesn't work, and you have determined that the issue lies with Keras? Follow these steps to
report a bug.
1. Your bug may already be fixed. Make sure to update to the current Keras master branch, as well
as the latest Theano/TensorFlow/CNTK master branch. To easily update Theano: pip
install git+git://github.com/Theano/Theano.git --upgrade
2. Search for similar issues. Make sure to delete is:open on the issue search to find solved
tickets as well. It's possible somebody has encountered this bug already. Also remember to
check out Keras' FAQ. Still having a problem? Open an issue on Github to let us know.
3. Make sure you provide us with useful information about your configuration: what OS are you
using? What Keras backend are you using? Are you running on GPU? If so, what is your
version of Cuda, of cuDNN? What is your GPU?
4. Provide us with a script to reproduce the issue. This script should be runnable as-is and should
not require external data download (use randomly generated data if you need to run a model on
some test data). We recommend that you use Github Gists to post your code. Any issue that
cannot be reproduced is likely to be closed.
5. If possible, take a stab at fixing the bug yourself!
The more information you provide, the easier it is for us to validate that there is a bug and the faster
we'll be able to take action. If you want your issue to be resolved quickly, following the steps above is
crucial.
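As a starting point for steps 3 and 4, a reproduction script can print the configuration and generate its own data. The sketch below assumes only NumPy is installed; the array sizes are arbitrary, and the model that triggers the bug still has to be added where indicated.

```python
import platform
import sys

import numpy as np

# Fix the seed so that every run produces the same "dataset".
np.random.seed(0)

# Randomly generated data: no external download is needed.
num_samples, num_features, num_classes = 64, 20, 3
x = np.random.random((num_samples, num_features)).astype('float32')
y = np.random.randint(0, num_classes, size=(num_samples,))

# Configuration details that are useful in a bug report.
print('OS:', platform.platform())
print('Python:', sys.version.split()[0])
print('NumPy:', np.__version__)

# Build the smallest model that still triggers the bug here and
# run it on (x, y).
print('x:', x.shape, x.dtype)
print('y:', y.shape, y.dtype)
```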
Requesting a Feature
You can also use TensorFlow Github issues to request features you would like to see in Keras, or
changes in the Keras API.
1. Provide a clear and detailed explanation of the feature you want and why it's important to add.
Keep in mind that we want features that will be useful to the majority of our users and not just a
small subset. If you're just targeting a minority of users, consider writing an add-on library for
Keras. It is crucial for Keras to avoid bloating the API and codebase.
2. Provide code snippets demonstrating the API you have in mind and illustrating the use cases of
your feature. Of course, you don't need to write any real code at this point!
3. After discussing the feature you may choose to attempt a Pull Request on tf.keras. If you're at
all able, start writing some code. We always have more work to do than time to do it. If you can
write some code then that will speed the process along.
Pull Requests
Where should I submit my pull request?
Note:
We are no longer adding new features to multi-backend Keras (we only fix bugs), as we are refocusing
development efforts on tf.keras. If you are still interested in submitting a feature pull request, please
direct it to tf.keras in the TensorFlow repository instead.
1. Keras improvements and bugfixes go to the Keras master branch.
2. Experimental new features such as layers and datasets go to keras-contrib. Unless it is a new
feature listed in Requests for Contributions, in which case it belongs in core Keras. If you think
your feature belongs in core Keras, you can submit a design doc to explain your feature and
argue for it (see explanations below).
Please note that PRs that are primarily about code style (as opposed to fixing bugs, improving docs, or
adding new functionality) will likely be rejected.
Here's a quick guide to submitting your improvements:
1. If your PR introduces a change in functionality, make sure you start by writing a design doc and
sending it to the Keras mailing list to discuss whether the change should be made, and how to
handle it. This will save you from having your PR closed down the road! Of course, if your PR
is a simple bug fix, you don't need to do that. The process for writing and submitting design
docs is as follows:
• Start from this Google Doc template, and copy it to a new Google Doc.
• Fill in the content. Note that you will need to insert code examples. To insert code, use a
Google Doc extension such as CodePretty (there are several such extensions available).
• Set sharing settings to "everyone with the link is allowed to comment"
• Send the document to keras-users@googlegroups.com with a subject that starts
with [API DESIGN REVIEW] (all caps) so that we notice it.
• Wait for comments, and answer them as they come. Edit the proposal as necessary.
• The proposal will finally be approved or rejected. Once approved, you can send out Pull
Requests or ask others to write Pull Requests.
2. Write the code (or get others to write it). This is the hard part!
3. Make sure any new function or class you introduce has proper docstrings. Make sure any code
you touch still has up-to-date docstrings and documentation. Docstring style should be
respected. In particular, they should be formatted in Markdown, and there should be sections
for Arguments, Returns, Raises (if applicable). Look at other docstrings in the codebase
for examples.
4. Write tests. Your code should have full unit test coverage. If you want to see your PR merged
promptly, this is crucial.
5. Run our test suite locally. It's easy: from the Keras folder, simply run: py.test tests/.
• You will need to install the test requirements as well: pip install -e .[tests].
6. Make sure all tests are passing:
• with the Theano backend, on Python 2.7 and Python 3.6. Make sure you have the
development version of Theano.
• with the TensorFlow backend, on Python 2.7 and Python 3.6. Make sure you have the
development version of TensorFlow.
• with the CNTK backend, on Python 2.7 and Python 3.6. Make sure you have the
development version of CNTK.
7. We use PEP8 syntax conventions, but we aren't dogmatic when it comes to line length. Make
sure your lines stay reasonably sized, though. To make your life easier, we recommend running
a PEP8 linter:
• Install PEP8 packages: pip install pep8 pytest-pep8 autopep8
• Run a standalone PEP8 check: py.test --pep8 -m pep8
• You can automatically fix some PEP8 errors by running: autopep8 -i --select
<errors> <FILENAME>, for example: autopep8 -i --select E128
tests/keras/backend/test_backends.py
8. When committing, use appropriate, descriptive commit messages.
9. Update the documentation. If introducing new functionality, make sure you include code
snippets demonstrating the usage of your new feature.
10. Submit your PR. If your changes have been approved in a previous discussion, and if you have
complete (and passing) unit tests as well as proper docstrings/documentation, your PR is likely
to be merged promptly.
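The docstring convention from step 3 (Markdown formatting with Arguments, Returns and Raises sections) might look like the following. The function clip_value is hypothetical and exists only to carry the docstring.

```python
def clip_value(x, low, high):
    """Clips a scalar to the closed interval `[low, high]`.

    # Arguments
        x: Number to clip.
        low: Lower bound of the interval.
        high: Upper bound of the interval.

    # Returns
        `x` limited to the interval `[low, high]`.

    # Raises
        ValueError: If `low` is greater than `high`.
    """
    if low > high:
        raise ValueError('`low` must not be greater than `high`.')
    return max(low, min(x, high))


print(clip_value(5, 0, 3))
print(clip_value.__doc__.splitlines()[0])
```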
Adding new examples
Even if you don't contribute to the Keras source code, if you have an application of Keras that is
concise and powerful, please consider adding it to our collection of examples. Existing examples show
idiomatic Keras code: make sure to keep your own script in the same spirit.
class CharacterTable(object):
    """Given a set of characters:
    + Encode them to a one-hot integer representation
    + Decode the one-hot or integer representation to their character output
    + Decode a vector of probabilities to their character output
    """
    def __init__(self, chars):
        """Initialize character table.
        # Arguments
            chars: Characters that can appear in the input.
        """
        self.chars = sorted(set(chars))
        self.char_indices = dict((c, i) for i, c in enumerate(self.chars))
        self.indices_char = dict((i, c) for i, c in enumerate(self.chars))

    def encode(self, C, num_rows):
        """One-hot encode the given string C.
        # Arguments
            C: string, to be encoded.
            num_rows: Number of rows in the returned one-hot encoding. This is
                used to keep the # of rows for each data the same.
        """
        x = np.zeros((num_rows, len(self.chars)))
        for i, c in enumerate(C):
            x[i, self.char_indices[c]] = 1
        return x

    def decode(self, x, calc_argmax=True):
        """Decode the given vector or 2D array to its character output.
        # Arguments
            x: A vector or a 2D array of probabilities or one-hot representations;
                or a vector of character indices (used with `calc_argmax=False`).
            calc_argmax: Whether to find the character index with maximum
                probability, defaults to `True`.
        """
        if calc_argmax:
            x = x.argmax(axis=-1)
        return ''.join(self.indices_char[i] for i in x)
class colors:
ok = '\033[92m'
fail = '\033[91m'
close = '\033[0m'
questions = []
expected = []
seen = set()
print('Generating data...')
while len(questions) < TRAINING_SIZE:
f = lambda: int(''.join(np.random.choice(list('0123456789'))
for i in range(np.random.randint(1, DIGITS + 1))))
a, b = f(), f()
# Skip any addition questions we've already seen
# Also skip any such that x+Y == Y+x (hence the sorting).
key = tuple(sorted((a, b)))
if key in seen:
continue
seen.add(key)
# Pad the data with spaces such that it is always MAXLEN.
q = '{}+{}'.format(a, b)
query = q + ' ' * (MAXLEN - len(q))
ans = str(a + b)
# Answers can be of maximum size DIGITS + 1.
ans += ' ' * (DIGITS + 1 - len(ans))
if REVERSE:
# Reverse the query, e.g., '12+345 ' becomes ' 543+21'. (Note the
# space used for padding.)
query = query[::-1]
questions.append(query)
expected.append(ans)
print('Total addition questions:', len(questions))
print('Vectorization...')
x = np.zeros((len(questions), MAXLEN, len(chars)), dtype=bool)
y = np.zeros((len(questions), DIGITS + 1, len(chars)), dtype=bool)
for i, sentence in enumerate(questions):
x[i] = ctable.encode(sentence, MAXLEN)
for i, sentence in enumerate(expected):
y[i] = ctable.encode(sentence, DIGITS + 1)
# Shuffle (x, y) in unison as the later parts of x will almost all be larger
# digits.
indices = np.arange(len(y))
np.random.shuffle(indices)
x = x[indices]
y = y[indices]
# Explicitly set apart 10% for validation data that we never train over.
split_at = len(x) - len(x) // 10
(x_train, x_val) = x[:split_at], x[split_at:]
(y_train, y_val) = y[:split_at], y[split_at:]
print('Training Data:')
print(x_train.shape)
print(y_train.shape)
print('Validation Data:')
print(x_val.shape)
print(y_val.shape)
print('Build model...')
model = Sequential()
# "Encode" the input sequence using an RNN, producing an output of HIDDEN_SIZE.
# Note: In a situation where your input sequences have a variable length,
# use input_shape=(None, num_feature).
model.add(RNN(HIDDEN_SIZE, input_shape=(MAXLEN, len(chars))))
# As the decoder RNN's input, repeatedly provide the last output of the
# encoder RNN for each time step. Repeat 'DIGITS + 1' times as that's the maximum
# length of output, e.g., when DIGITS=3, max output is 999+999=1998.
model.add(layers.RepeatVector(DIGITS + 1))
# The decoder RNN could be multiple layers stacked or a single layer.
for _ in range(LAYERS):
# By setting return_sequences to True, return not only the last output but
# all the outputs so far in the form of (num_samples, timesteps,
# output_dim). This is necessary as TimeDistributed in the below expects
# the first dimension to be the timesteps.
model.add(RNN(HIDDEN_SIZE, return_sequences=True))
# Apply a dense layer to every temporal slice of the input. For each step
# of the output sequence, decide which character should be chosen.
model.add(layers.TimeDistributed(layers.Dense(len(chars), activation='softmax')))
model.compile(loss='categorical_crossentropy',
optimizer='adam',
metrics=['accuracy'])
model.summary()
# Train the model each generation and show predictions against the validation
# dataset.
for iteration in range(1, 200):
print()
print('-' * 50)
print('Iteration', iteration)
model.fit(x_train, y_train,
batch_size=BATCH_SIZE,
epochs=1,
validation_data=(x_val, y_val))
# Select 10 samples from the validation set at random so we can visualize
# errors.
for i in range(10):
ind = np.random.randint(0, len(x_val))
rowx, rowy = x_val[np.array([ind])], y_val[np.array([ind])]
preds = model.predict_classes(rowx, verbose=0)
q = ctable.decode(rowx[0])
correct = ctable.decode(rowy[0])
guess = ctable.decode(preds[0], calc_argmax=False)
print('Q', q[::-1] if REVERSE else q, end=' ')
print('T', correct, end=' ')
if correct == guess:
print(colors.ok + '☑' + colors.close, end=' ')
else:
print(colors.fail + '☒' + colors.close, end=' ')
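The padding, reversal and one-hot steps of the script above can be checked in isolation with NumPy. This is a minimal sketch: the character set chars and the value MAXLEN = 7 (DIGITS + 1 + DIGITS with DIGITS = 3) are assumptions, since the script's own definitions are not shown here, and the free functions stand in for the CharacterTable methods.

```python
import numpy as np

# Assumed alphabet: ten digits, the plus sign and the padding space.
chars = '0123456789+ '
ordered = sorted(set(chars))
char_indices = {c: i for i, c in enumerate(ordered)}
indices_char = {i: c for i, c in enumerate(ordered)}


def encode(text, num_rows):
    """One-hot encode text into an array of shape (num_rows, len(ordered))."""
    x = np.zeros((num_rows, len(ordered)))
    for i, c in enumerate(text):
        x[i, char_indices[c]] = 1
    return x


def decode(x):
    """Map each row back to the character with the largest value."""
    return ''.join(indices_char[i] for i in x.argmax(axis=-1))


MAXLEN = 7  # assumed: DIGITS + 1 + DIGITS with DIGITS = 3
q = '12+345'
query = q + ' ' * (MAXLEN - len(q))  # pad with spaces up to MAXLEN
reversed_query = query[::-1]         # what the model sees when REVERSE is set

encoded = encode(reversed_query, MAXLEN)
decoded = decode(encoded)
print(repr(reversed_query), encoded.shape, repr(decoded[::-1]))
```

Decoding the encoded array returns the reversed query unchanged, and reversing it again gives back the padded question, which is what the script prints when REVERSE is true.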
Note that the same result can also be achieved via a Lambda layer.
Because our custom layer is written with primitives from the Keras backend (K), our code can run both
on TensorFlow and Theano.
from __future__ import print_function
import keras
from keras.models import Sequential
from keras import layers
from keras.datasets import mnist
from keras import backend as K
class Antirectifier(layers.Layer):
'''This is the combination of a sample-wise
L2 normalization with the concatenation of the
positive part of the input with the negative part
of the input. The result is a tensor of samples that are
twice as large as the input samples.
# Input shape
2D tensor of shape (samples, n)
# Output shape
2D tensor of shape (samples, 2*n)
# Theoretical justification
When applying ReLU, assuming that the distribution
of the previous output is approximately centered around 0.,
you are discarding half of your input. This is inefficient.
# global parameters
batch_size = 128
num_classes = 10
epochs = 40
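The transformation described in the Antirectifier docstring can be sketched in plain NumPy. This is an illustration of the docstring only (sample-wise L2 normalization followed by concatenating the positive and negative parts), not the layer's actual code; the eps guard against division by zero is an assumption.

```python
import numpy as np


def antirectifier(x, eps=1e-12):
    """Map an array of shape (samples, n) to shape (samples, 2 * n)."""
    # Sample-wise L2 normalization; eps is an assumed safeguard.
    norm = np.sqrt(np.maximum((x ** 2).sum(axis=1, keepdims=True), eps))
    x = x / norm
    pos = np.maximum(x, 0.)   # positive part, as ReLU would keep it
    neg = np.maximum(-x, 0.)  # negative part, which ReLU would discard
    return np.concatenate([pos, neg], axis=1)


samples = np.array([[1., -2., 3.],
                    [0.5, 0.5, -1.]])
out = antirectifier(samples)
print(out.shape)
```

Each input value ends up in exactly one of the two halves, so no part of the normalized input is thrown away.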
Notes
• With default word, sentence, and query vector sizes, the GRU model achieves:
• 52.1% test accuracy on QA1 in 20 epochs (2 seconds per epoch on CPU)
• 37.0% test accuracy on QA2 in 20 epochs (16 seconds per epoch on CPU)
In comparison, the Facebook paper achieves 50% and 20% for the LSTM baseline.
• The task does not traditionally parse the question separately. This likely improves accuracy and
is a good example of merging two RNNs.
• The word vector embeddings are not shared between the story and question RNNs.
• See how the accuracy changes given 10,000 training samples (en-10k) instead of only 1000.
1000 was used in order to be comparable to the original paper.
• Experiment with GRU, LSTM, and JZS1-3 as they give subtly different results.
• The length and noise (i.e. 'useless' story components) impact the ability of LSTMs / GRUs to
provide the correct answer. Given only the supporting facts, these RNNs can achieve 100%
accuracy on many tasks. Memory networks and neural networks that use attentional processes
can efficiently search through this noise to find the relevant statements, improving performance
substantially. This becomes especially obvious on QA2 and QA3, both far longer than QA1.
from __future__ import print_function
from functools import reduce
import re
import tarfile
import numpy as np
def tokenize(sent):
'''Return the tokens of a sentence including punctuation.
If only_supporting is true,
only the sentences that support the answer are kept.
'''
data = []
story = []
for line in lines:
line = line.decode('utf-8').strip()
nid, line = line.split(' ', 1)
nid = int(nid)
if nid == 1:
story = []
if '\t' in line:
q, a, supporting = line.split('\t')
q = tokenize(q)
if only_supporting:
# Only select the related substory
supporting = map(int, supporting.split())
substory = [story[i - 1] for i in supporting]
else:
# Provide all the substories
substory = [x for x in story if x]
data.append((substory, q, a))
story.append('')
else:
sent = tokenize(line)
story.append(sent)
return data
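To see what the parsing loop produces, here is a self-contained sketch run on three hand-written lines in the bAbI format. The tokenize body is a hypothetical stand-in (its original body is not shown above), and the default only_supporting=False is an assumption.

```python
import re


def tokenize(sent):
    # Hypothetical stand-in: words and punctuation marks become tokens.
    return re.findall(r"\w+|[^\w\s]", sent)


def parse_stories(lines, only_supporting=False):
    data = []
    story = []
    for line in lines:
        line = line.decode('utf-8').strip()
        nid, line = line.split(' ', 1)
        if int(nid) == 1:
            story = []  # a line numbered 1 starts a new story
        if '\t' in line:
            # Question line: question, answer, supporting fact ids.
            q, a, supporting = line.split('\t')
            q = tokenize(q)
            if only_supporting:
                substory = [story[int(i) - 1] for i in supporting.split()]
            else:
                substory = [s for s in story if s]
            data.append((substory, q, a))
            story.append('')  # keep line numbers aligned with indices
        else:
            story.append(tokenize(line))
    return data


sample = [
    b'1 Mary moved to the bathroom.\n',
    b'2 John went to the hallway.\n',
    b'3 Where is Mary? \tbathroom\t1\n',
]
all_facts = parse_stories(sample)
supporting_only = parse_stories(sample, only_supporting=True)
print(all_facts[0])
print(supporting_only[0])
```

With only_supporting=True the story shrinks to the single sentence referenced by the question line, which is why the notes say the RNNs do much better when given only the supporting facts.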
RNN = recurrent.LSTM
EMBED_HIDDEN_SIZE = 50
SENT_HIDDEN_SIZE = 100
QUERY_HIDDEN_SIZE = 100
BATCH_SIZE = 32
EPOCHS = 20
print('RNN / Embed / Sent / Query = {}, {}, {}, {}'.format(RNN,
EMBED_HIDDEN_SIZE,
SENT_HIDDEN_SIZE,
QUERY_HIDDEN_SIZE))
try:
path = get_file('babi-tasks-v1-2.tar.gz',
origin='https://s3.amazonaws.com/text-datasets/'
'babi_tasks_1-20_v1-2.tar.gz')
except:
print('Error downloading dataset, please download it manually:\n'
'$ wget http://www.thespermwhale.com/jaseweston/babi/tasks_1-20_v1-2'
'.tar.gz\n'
'$ mv tasks_1-20_v1-2.tar.gz ~/.keras/datasets/babi-tasks-v1-2.tar.gz')
raise
print('vocab = {}'.format(vocab))
print('x.shape = {}'.format(x.shape))
print('xq.shape = {}'.format(xq.shape))
print('y.shape = {}'.format(y.shape))
print('story_maxlen, query_maxlen = {}, {}'.format(story_maxlen, query_maxlen))
print('Build model...')
print('Training')
model.fit([x, xq], y,
batch_size=BATCH_SIZE,
epochs=EPOCHS,
validation_split=0.05)
print('Evaluation')
loss, acc = model.evaluate([tx, txq], ty,
batch_size=BATCH_SIZE)
print('Test loss / test accuracy = {:.4f} / {:.4f}'.format(loss, acc))
def tokenize(sent):
'''Return the tokens of a sentence including punctuation.
If max_length is supplied,
any stories longer than max_length tokens will be discarded.
'''
data = parse_stories(f.readlines(), only_supporting=only_supporting)
flatten = lambda data: reduce(lambda x, y: x + y, data)
data = [(flatten(story), q, answer) for story, q, answer in data
if not max_length or len(flatten(story)) < max_length]
return data
def vectorize_stories(data):
inputs, queries, answers = [], [], []
for story, query, answer in data:
inputs.append([word_idx[w] for w in story])
queries.append([word_idx[w] for w in query])
answers.append(word_idx[answer])
return (pad_sequences(inputs, maxlen=story_maxlen),
pad_sequences(queries, maxlen=query_maxlen),
np.array(answers))
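The vectorization step can be tried without Keras by replacing pad_sequences with a small NumPy helper. The helper pad_pre is hypothetical and mimics only the default behaviour of pad_sequences (zeros added in front, overlong sequences truncated from the front); the vocabulary is built from a single made-up story, with index 0 reserved for padding.

```python
import numpy as np


def pad_pre(seqs, maxlen):
    """Hypothetical stand-in for pad_sequences with default settings."""
    out = np.zeros((len(seqs), maxlen), dtype='int32')
    for row, seq in enumerate(seqs):
        trimmed = seq[-maxlen:]  # keep the last maxlen items
        if trimmed:
            out[row, -len(trimmed):] = trimmed
    return out


story = ['Mary', 'went', 'to', 'the', 'bathroom', '.']
query = ['Where', 'is', 'Mary', '?']
answer = 'bathroom'
data = [(story, query, answer)]

# Index 0 is reserved for padding, so word indices start at 1.
vocab = sorted(set(story + query + [answer]))
word_idx = {w: i + 1 for i, w in enumerate(vocab)}
story_maxlen, query_maxlen = 8, 4

inputs = pad_pre([[word_idx[w] for w in s] for s, _, _ in data], story_maxlen)
queries = pad_pre([[word_idx[w] for w in q] for _, q, _ in data], query_maxlen)
answers = np.array([word_idx[a] for _, _, a in data])
print(inputs, queries, answers)
```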
try:
path = get_file('babi-tasks-v1-2.tar.gz',
origin='https://s3.amazonaws.com/text-datasets/'
'babi_tasks_1-20_v1-2.tar.gz')
except:
print('Error downloading dataset, please download it manually:\n'
'$ wget http://www.thespermwhale.com/jaseweston/babi/tasks_1-20_v1-2'
'.tar.gz\n'
'$ mv tasks_1-20_v1-2.tar.gz ~/.keras/datasets/babi-tasks-v1-2.tar.gz')
raise
challenges = {
# QA1 with 10,000 samples
'single_supporting_fact_10k': 'tasks_1-20_v1-2/en-10k/qa1_'
'single-supporting-fact_{}.txt',
# QA2 with 10,000 samples
'two_supporting_facts_10k': 'tasks_1-20_v1-2/en-10k/qa2_'
'two-supporting-facts_{}.txt',
}
challenge_type = 'single_supporting_fact_10k'
challenge = challenges[challenge_type]
print('-')
print('Vocab size:', vocab_size, 'unique words')
print('Story max length:', story_maxlen, 'words')
print('Query max length:', query_maxlen, 'words')
print('Number of training stories:', len(train_stories))
print('Number of test stories:', len(test_stories))
print('-')
print('Here\'s what a "story" tuple looks like (input, query, answer):')
print(train_stories[0])
print('-')
print('Vectorizing the word sequences...')
print('-')
print('inputs: integer tensor of shape (samples, max_length)')
print('inputs_train shape:', inputs_train.shape)
print('inputs_test shape:', inputs_test.shape)
print('-')
print('queries: integer tensor of shape (samples, max_length)')
print('queries_train shape:', queries_train.shape)
print('queries_test shape:', queries_test.shape)
print('-')
print('answers: binary (1 or 0) tensor of shape (samples, vocab_size)')
print('answers_train shape:', answers_train.shape)
print('answers_test shape:', answers_test.shape)
print('-')
print('Compiling...')
# placeholders
input_sequence = Input((story_maxlen,))
question = Input((query_maxlen,))
# encoders
# embed the input sequence into a sequence of vectors
input_encoder_m = Sequential()
input_encoder_m.add(Embedding(input_dim=vocab_size,
output_dim=64))
input_encoder_m.add(Dropout(0.3))
# output: (samples, story_maxlen, embedding_dim)
# add the match matrix with the second input vector sequence
response = add([match, input_encoded_c]) # (samples, story_maxlen, query_maxlen)
response = Permute((2, 1))(response) # (samples, query_maxlen, story_maxlen)
# the original paper uses a matrix multiplication for this reduction step.
# we choose to use a RNN instead.
answer = LSTM(32)(answer) # (samples, 32)
# train
model.fit([inputs_train, queries_train], answers_train,
batch_size=32,
epochs=120,
validation_data=([inputs_test, queries_test], answers_test))
Train a simple deep CNN on the CIFAR10 small
images dataset.
It gets to 75% validation accuracy in 25 epochs, and 79% after 50 epochs. (It's still underfitting at that
point, though.)
from __future__ import print_function
import keras
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D
import os
batch_size = 32
num_classes = 10
epochs = 100
data_augmentation = True
num_predictions = 20
save_dir = os.path.join(os.getcwd(), 'saved_models')
model_name = 'keras_cifar10_trained_model.h5'
model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same',
input_shape=x_train.shape[1:]))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))
# initiate RMSprop optimizer
opt = keras.optimizers.RMSprop(learning_rate=0.0001, decay=1e-6)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
if not data_augmentation:
print('Not using data augmentation.')
model.fit(x_train, y_train,
batch_size=batch_size,
epochs=epochs,
validation_data=(x_test, y_test),
shuffle=True)
else:
print('Using real-time data augmentation.')
# This will do preprocessing and realtime data augmentation:
datagen = ImageDataGenerator(
featurewise_center=False, # set input mean to 0 over the dataset
samplewise_center=False, # set each sample mean to 0
featurewise_std_normalization=False, # divide inputs by std of the dataset
samplewise_std_normalization=False, # divide each input by its std
zca_whitening=False, # apply ZCA whitening
zca_epsilon=1e-06, # epsilon for ZCA whitening
rotation_range=0, # randomly rotate images in the range (degrees, 0 to 180)
# randomly shift images horizontally (fraction of total width)
width_shift_range=0.1,
# randomly shift images vertically (fraction of total height)
height_shift_range=0.1,
shear_range=0., # set range for random shear
zoom_range=0., # set range for random zoom
channel_shift_range=0., # set range for random channel shifts
# set mode for filling points outside the input boundaries
fill_mode='nearest',
cval=0., # value used for fill_mode = "constant"
horizontal_flip=True, # randomly flip images horizontally
vertical_flip=False, # randomly flip images vertically
# set rescaling factor (applied before any other transformation)
rescale=None,
# set function that will be applied on each input
preprocessing_function=None,
# image data format, either "channels_first" or "channels_last"
data_format=None,
# fraction of images reserved for validation (strictly between 0 and 1)
validation_split=0.0)
# Training parameters
batch_size = 32 # orig paper trained all networks with batch_size=128
epochs = 200
data_augmentation = True
num_classes = 10
# Model parameter
# ----------------------------------------------------------------------------
# | | 200-epoch | Orig Paper| 200-epoch | Orig Paper| sec/epoch
# Model | n | ResNet v1 | ResNet v1 | ResNet v2 | ResNet v2 | GTX1080Ti
# |v1(v2)| %Accuracy | %Accuracy | %Accuracy | %Accuracy | v1 (v2)
# ----------------------------------------------------------------------------
# ResNet20 | 3 (2)| 92.16 | 91.25 | ----- | ----- | 35 (---)
# ResNet32 | 5(NA)| 92.46 | 92.49 | NA | NA | 50 ( NA)
# ResNet44 | 7(NA)| 92.50 | 92.83 | NA | NA | 70 ( NA)
# ResNet56 | 9 (6)| 92.71 | 93.03 | 93.01 | NA | 90 (100)
# ResNet110 |18(12)| 92.65 | 93.39+-.16| 93.15 | 93.63 | 165(180)
# ResNet164 |27(18)| ----- | 94.07 | ----- | 94.54 | ---(---)
# ResNet1001| (111)| ----- | 92.39 | ----- | 95.08+-.14| ---(---)
# ---------------------------------------------------------------------------
n = 3
# Model version
# Orig paper: version = 1 (ResNet v1), Improved ResNet: version = 2 (ResNet v2)
version = 1
# Normalize data.
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255
def lr_schedule(epoch):
    """Learning Rate Schedule
    Learning rate is scheduled to be reduced after 80, 120, 160, 180 epochs.
    Called automatically every epoch as part of callbacks during training.
    # Arguments
        epoch (int): The number of epochs
    # Returns
        lr (float32): learning rate
    """
    lr = 1e-3
    if epoch > 180:
        lr *= 0.5e-3
    elif epoch > 160:
        lr *= 1e-3
    elif epoch > 120:
        lr *= 1e-2
    elif epoch > 80:
        lr *= 1e-1
    print('Learning rate: ', lr)
    return lr
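A quick way to check the schedule is to evaluate it at the boundaries. The sketch below repeats the function without the print call so that it runs standalone; the chosen epochs are arbitrary probes.

```python
import math


def lr_schedule(epoch):
    """Same step schedule as above, without the logging."""
    lr = 1e-3
    if epoch > 180:
        lr *= 0.5e-3
    elif epoch > 160:
        lr *= 1e-3
    elif epoch > 120:
        lr *= 1e-2
    elif epoch > 80:
        lr *= 1e-1
    return lr


# The comparisons are strict, so epoch 80 still uses the base rate.
for epoch in (0, 80, 81, 121, 161, 181):
    print(epoch, lr_schedule(epoch))
```

The rate stays at roughly 1e-3 through epoch 80, then drops to about 1e-4, 1e-5, 1e-6 and finally 5e-7 after epoch 180.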
def resnet_layer(inputs,
                 num_filters=16,
                 kernel_size=3,
                 strides=1,
                 activation='relu',
                 batch_normalization=True,
                 conv_first=True):
    """2D Convolution-Batch Normalization-Activation stack builder
    # Arguments
        inputs (tensor): input tensor from input image or previous layer
        num_filters (int): Conv2D number of filters
        kernel_size (int): Conv2D square kernel dimensions
        strides (int): Conv2D square stride dimensions
        activation (string): activation name
        batch_normalization (bool): whether to include batch normalization
        conv_first (bool): conv-bn-activation (True) or
            bn-activation-conv (False)
    # Returns
        x (tensor): tensor as input to the next layer
    """
    conv = Conv2D(num_filters,
                  kernel_size=kernel_size,
                  strides=strides,
                  padding='same',
                  kernel_initializer='he_normal',
                  kernel_regularizer=l2(1e-4))
    x = inputs
    if conv_first:
        x = conv(x)
        if batch_normalization:
            x = BatchNormalization()(x)
        if activation is not None:
            x = Activation(activation)(x)
    else:
        if batch_normalization:
            x = BatchNormalization()(x)
        if activation is not None:
            x = Activation(activation)(x)
        x = conv(x)
    return x
Stacks of 2 x (3 x 3) Conv2D-BN-ReLU
Last ReLU is after the shortcut connection.
At the beginning of each stage, the feature map size is halved (downsampled)
by a convolutional layer with strides=2, while the number of filters is
doubled. Within each stage, the layers have the same number of filters and the
same feature map sizes.
Feature map sizes:
stage 0: 32x32, 16
stage 1: 16x16, 32
stage 2: 8x8, 64
The Number of parameters is approx the same as Table 6 of [a]:
ResNet20 0.27M
ResNet32 0.46M
ResNet44 0.66M
ResNet56 0.85M
ResNet110 1.7M
# Arguments
input_shape (tensor): shape of input image tensor
depth (int): number of core convolutional layers
num_classes (int): number of classes (CIFAR10 has 10)
# Returns
model (Model): Keras model instance
"""
if (depth - 2) % 6 != 0:
raise ValueError('depth should be 6n+2 (eg 20, 32, 44 in [a])')
# Start model definition.
num_filters = 16
num_res_blocks = int((depth - 2) / 6)
inputs = Input(shape=input_shape)
x = resnet_layer(inputs=inputs)
# Instantiate the stack of residual units
for stack in range(3):
for res_block in range(num_res_blocks):
strides = 1
if stack > 0 and res_block == 0: # first layer but not first stack
strides = 2 # downsample
y = resnet_layer(inputs=x,
num_filters=num_filters,
strides=strides)
y = resnet_layer(inputs=y,
num_filters=num_filters,
activation=None)
if stack > 0 and res_block == 0: # first layer but not first stack
# linear projection residual shortcut connection to match
# changed dims
x = resnet_layer(inputs=x,
num_filters=num_filters,
kernel_size=1,
strides=strides,
activation=None,
batch_normalization=False)
x = keras.layers.add([x, y])
x = Activation('relu')(x)
num_filters *= 2
# Instantiate model.
model = Model(inputs=inputs, outputs=outputs)
return model
# Arguments
input_shape (tensor): shape of input image tensor
depth (int): number of core convolutional layers
num_classes (int): number of classes (CIFAR10 has 10)
# Returns
model (Model): Keras model instance
"""
if (depth - 2) % 9 != 0:
raise ValueError('depth should be 9n+2 (eg 56 or 110 in [b])')
# Start model definition.
num_filters_in = 16
num_res_blocks = int((depth - 2) / 9)
inputs = Input(shape=input_shape)
# v2 performs Conv2D with BN-ReLU on input before splitting into 2 paths
x = resnet_layer(inputs=inputs,
num_filters=num_filters_in,
conv_first=True)
# Instantiate model.
model = Model(inputs=inputs, outputs=outputs)
return model
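The two depth checks (6n+2 for v1, 9n+2 for v2) and the n column of the results table can be cross-checked with a few lines of plain Python. The helper name num_res_blocks is hypothetical; it mirrors the arithmetic in the two builder functions and nothing else.

```python
def num_res_blocks(depth, version=1):
    """Return n such that depth == 6n+2 (v1) or depth == 9n+2 (v2)."""
    layers_per_block = 6 if version == 1 else 9
    if (depth - 2) % layers_per_block != 0:
        raise ValueError('depth should be %dn+2' % layers_per_block)
    return (depth - 2) // layers_per_block


for depth in (20, 56, 110):
    print(depth, num_res_blocks(depth, 1), num_res_blocks(depth, 2))
```

A depth of 32 or 44 is valid for v1 only, which matches the NA entries in the v2 columns of the table.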
if version == 2:
model = resnet_v2(input_shape=input_shape, depth=depth)
else:
model = resnet_v1(input_shape=input_shape, depth=depth)
model.compile(loss='categorical_crossentropy',
optimizer=Adam(learning_rate=lr_schedule(0)),
metrics=['accuracy'])
model.summary()
print(model_type)
# Prepare callbacks for model saving and for learning rate adjustment.
checkpoint = ModelCheckpoint(filepath=filepath,
monitor='val_acc',
verbose=1,
save_best_only=True)
lr_scheduler = LearningRateScheduler(lr_schedule)
lr_reducer = ReduceLROnPlateau(factor=np.sqrt(0.1),
cooldown=0,
patience=5,
min_lr=0.5e-6)
Results example:
from __future__ import print_function
import time
import numpy as np
from PIL import Image as pil_image
from keras.preprocessing.image import save_img
from keras import layers
from keras.applications import vgg16
from keras import backend as K
def normalize(x):
"""utility function to normalize a tensor.
# Arguments
x: An input tensor.
# Returns
The normalized input tensor.
"""
return x / (K.sqrt(K.mean(K.square(x))) + K.epsilon())
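The normalize utility divides a tensor by its root mean square. A NumPy version makes this easy to verify; the eps default of 1e-7 is an assumption standing in for K.epsilon().

```python
import numpy as np


def normalize(x, eps=1e-7):
    """Divide x by its root mean square (eps is an assumed fuzz factor)."""
    return x / (np.sqrt(np.mean(np.square(x))) + eps)


grads = np.array([[3., -4.],
                  [0., 12.]])
unit = normalize(grads)
rms_after = np.sqrt(np.mean(np.square(unit)))
print(rms_after)
```

After normalization the root mean square is very close to 1, which keeps the gradient ascent step size comparable across filters.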
def deprocess_image(x):
"""utility function to convert a float array into a valid uint8 image.
# Arguments
x: A numpy-array representing the generated image.
# Returns
A processed numpy-array, which could be used in e.g. imshow.
"""
# normalize tensor: center on 0., ensure std is 0.25
x -= x.mean()
x /= (x.std() + K.epsilon())
x *= 0.25
# clip to [0, 1]
x += 0.5
x = np.clip(x, 0, 1)
# Arguments
x: A numpy-array, which could be used in e.g. imshow.
former: The former numpy-array.
Need to determine the former mean and variance.
# Returns
A processed numpy-array representing the generated image.
"""
if K.image_data_format() == 'channels_first':
x = x.transpose((2, 0, 1))
return (x / 255 - 0.5) * 4 * former.std() + former.mean()
def visualize_layer(model,
layer_name,
step=1.,
epochs=15,
upscaling_steps=9,
upscaling_factor=1.2,
output_dim=(412, 412),
filter_range=(0, None)):
"""Visualizes the most relevant filters of one conv-layer in a certain model.
# Arguments
model: The model containing layer_name.
layer_name: The name of the layer to be visualized.
Has to be a part of model.
step: step size for gradient ascent.
epochs: Number of iterations for gradient ascent.
upscaling_steps: Number of upscaling steps.
Starting image is in this case (80, 80).
upscaling_factor: Factor by which the image is gradually upscaled
towards output_dim.
output_dim: [img_width, img_height] The output image dimensions.
filter_range: Tuple[lower, upper]
Determines which filter indices are computed.
If the second value is `None`,
the last filter will be inferred as the upper boundary.
"""
def _generate_filter_image(input_img,
layer_output,
filter_index):
"""Generates image for one particular filter.
# Arguments
input_img: The input-image Tensor.
layer_output: The output-image Tensor.
filter_index: The index of the filter to process.
Assumed to be valid.
# Returns
Either None if no image could be generated,
or a tuple of the image (array) itself and the last loss.
"""
s_time = time.time()
# this function returns the loss and grads given the input picture
iterate = K.function([input_img], [loss, grads])
# Arguments
filters: A List of generated images and their corresponding losses
for each processed filter.
n: dimension of the grid.
If none, the largest possible square will be used
"""
if n is None:
n = int(np.floor(np.sqrt(len(filters))))
# the filters that have the highest loss are assumed to be better-looking.
# we will only keep the top n*n filters.
filters.sort(key=lambda x: x[1], reverse=True)
filters = filters[:n * n]
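The selection of the top n*n filters can be tried on plain tuples. In this sketch the image arrays are replaced by placeholder strings and the loss values are made up.

```python
import math

# (image, loss) pairs; strings stand in for the generated image arrays.
filters = [('img_a', 0.7), ('img_b', 2.5), ('img_c', 1.1),
           ('img_d', 0.2), ('img_e', 3.0)]

# Largest square grid that can be filled completely.
n = int(math.floor(math.sqrt(len(filters))))

# Highest loss first, then keep only as many as fit in the grid.
filters.sort(key=lambda x: x[1], reverse=True)
kept = filters[:n * n]
print(n, [name for name, _ in kept])
```

With five candidates the grid is 2 x 2, so the filter with the lowest loss is dropped.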
# get the symbolic outputs of each "key" layer (we gave them unique names).
layer_dict = dict([(layer.name, layer) for layer in model.layers[1:]])
output_layer = layer_dict[layer_name]
assert isinstance(output_layer, layers.Conv2D)
if __name__ == '__main__':
# the name of the layer we want to visualize
# (see model definition at keras/applications/vgg16.py)
LAYER_NAME = 'block5_conv1'
seq = Sequential()
seq.add(ConvLSTM2D(filters=40, kernel_size=(3, 3),
input_shape=(None, 40, 40, 1),
padding='same', return_sequences=True))
seq.add(BatchNormalization())
for i in range(n_samples):
# Add 3 to 7 moving squares
n = np.random.randint(3, 8)
for j in range(n):
# Initial position
xstart = np.random.randint(20, 60)
ystart = np.random.randint(20, 60)
# Direction of motion
directionx = np.random.randint(0, 3) - 1
directiony = np.random.randint(0, 3) - 1
for t in range(n_frames):
x_shift = xstart + directionx * t
y_shift = ystart + directiony * t
noisy_movies[i, t, x_shift - w: x_shift + w,
y_shift - w: y_shift + w, 0] += 1
for j in range(16):
new_pos = seq.predict(track[np.newaxis, ::, ::, ::, ::])
new = new_pos[::, -1, ::, ::, ::]
track = np.concatenate((track, new), axis=0)
ax = fig.add_subplot(121)
if i >= 7:
ax.text(1, 3, 'Predictions !', fontsize=20, color='w')
else:
ax.text(1, 3, 'Initial trajectory', fontsize=20)
plt.imshow(toplot)
ax = fig.add_subplot(122)
plt.text(1, 3, 'Ground truth', fontsize=20)
plt.imshow(toplot)
plt.savefig('%i_animate.png' % (i + 1))
e.g.:
python deep_dream.py img/mypic.jpg results/dream
def preprocess_image(image_path):
# Util function to open, resize and format pictures
# into appropriate tensors.
img = load_img(image_path)
img = img_to_array(img)
img = np.expand_dims(img, axis=0)
img = inception_v3.preprocess_input(img)
return img
def deprocess_image(x):
# Util function to convert a tensor into a valid image.
if K.image_data_format() == 'channels_first':
x = x.reshape((3, x.shape[2], x.shape[3]))
x = x.transpose((1, 2, 0))
else:
x = x.reshape((x.shape[1], x.shape[2], 3))
x /= 2.
x += 0.5
x *= 255.
x = np.clip(x, 0, 255).astype('uint8')
return x
K.set_learning_phase(0)
# Get the symbolic outputs of each "key" layer (we gave them unique names).
layer_dict = dict([(layer.name, layer) for layer in model.layers])
def eval_loss_and_grads(x):
outs = fetch_loss_and_grads([x])
loss_value = outs[0]
grad_values = outs[1]
return loss_value, grad_values
"""Process:
# Playing with these hyperparameters will also allow you to achieve new effects
step = 0.01 # Gradient ascent step size
num_octave = 3 # Number of scales at which to run gradient ascent
octave_scale = 1.4 # Size ratio between scales
iterations = 20 # Number of ascent steps per scale
max_loss = 10.
img = preprocess_image(base_image_path)
if K.image_data_format() == 'channels_first':
original_shape = img.shape[2:]
else:
original_shape = img.shape[1:3]
successive_shapes = [original_shape]
for i in range(1, num_octave):
shape = tuple([int(dim / (octave_scale ** i)) for dim in original_shape])
successive_shapes.append(shape)
successive_shapes = successive_shapes[::-1]
original_img = np.copy(img)
shrunk_original_img = resize_img(img, successive_shapes[0])
img += lost_detail
shrunk_original_img = resize_img(original_img, shape)
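To make the scale schedule concrete, the following sketch runs the shape computation above on a hypothetical 400 x 600 image. The image size is an assumption chosen only for illustration; `num_octave` and `octave_scale` are the values listed above.

```python
# Hypothetical input size (height, width), chosen only for illustration.
original_shape = (400, 600)
num_octave = 3        # number of scales, as in the hyperparameters above
octave_scale = 1.4    # size ratio between successive scales

successive_shapes = [original_shape]
for i in range(1, num_octave):
    # Each additional octave shrinks both dimensions by octave_scale ** i.
    shape = tuple(int(dim / (octave_scale ** i)) for dim in original_shape)
    successive_shapes.append(shape)

# Reverse so that processing starts at the smallest scale.
successive_shapes = successive_shapes[::-1]
print(successive_shapes)
```

Gradient ascent would then start on the smallest shape and work up to the original resolution, which is why the list is reversed.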
Epoch   TF      TH
10      0.027   0.064
15      0.038   0.035
20      0.043   0.045
25      0.014   0.019
Additional dependencies
This requires cairo and editdistance packages:
OUTPUT_DIR = 'image_ocr'
np.random.seed(55)
def speckle(img):
severity = np.random.uniform(0, 0.6)
blur = ndimage.gaussian_filter(np.random.randn(*img.shape) * severity, 1)
img_speck = (img + blur)
img_speck[img_speck > 1] = 1
img_speck[img_speck <= 0] = 0
return img_speck
buf = surface.get_data()
a = np.frombuffer(buf, np.uint8)
a.shape = (h, w, 4)
a = a[:, :, 0] # grab single channel
a = a.astype(np.float32) / 255
a = np.expand_dims(a, 0)
if rotate:
a = image.random_rotation(a, 3 * (w - top_left_x) / w + 1)
a = speckle(a)
return a
a = list(range(stop_ind))
np.random.shuffle(a)
a += list(range(stop_ind, len_val))
for mat in matrix_list:
if isinstance(mat, np.ndarray):
ret.append(mat[a])
elif isinstance(mat, list):
ret.append([mat[i] for i in a])
else:
raise TypeError('`shuffle_mats_or_lists` only supports '
'numpy.array and list objects.')
return ret
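The fragment above is missing its header and setup lines. The sketch below shows one way the complete helper might look. The signature `(matrix_list, stop_ind=None)` is an assumption inferred from the call `shuffle_mats_or_lists([...], self.val_split)` later in the script, and the toy `words` and `labels` data are hypothetical.

```python
import numpy as np


def shuffle_mats_or_lists(matrix_list, stop_ind=None):
    # Assumed signature: shuffle the first `stop_ind` entries of every
    # container with one shared permutation and leave the tail in place.
    len_val = len(matrix_list[0])
    if stop_ind is None:
        stop_ind = len_val
    assert stop_ind <= len_val
    a = list(range(stop_ind))
    np.random.shuffle(a)
    a += list(range(stop_ind, len_val))
    ret = []
    for mat in matrix_list:
        if isinstance(mat, np.ndarray):
            ret.append(mat[a])
        elif isinstance(mat, list):
            ret.append([mat[i] for i in a])
        else:
            raise TypeError('`shuffle_mats_or_lists` only supports '
                            'numpy.array and list objects.')
    return ret


# Toy data: only the first three entries are shuffled.
words = ['a', 'b', 'c', 'd', 'e']
labels = np.arange(5)
shuffled_words, shuffled_labels = shuffle_mats_or_lists([words, labels], 3)
```

Because one index list is applied to every container, the words and labels stay aligned, and the entries from `stop_ind` onward (the validation portion in the generator) keep their order.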
class TextImageGenerator(keras.callbacks.Callback):
self.minibatch_size = minibatch_size
self.img_w = img_w
self.img_h = img_h
self.monogram_file = monogram_file
self.bigram_file = bigram_file
self.downsample_factor = downsample_factor
self.val_split = val_split
self.blank_label = self.get_output_size() - 1
self.absolute_max_string_len = absolute_max_string_len
def get_output_size(self):
return len(alphabet) + 1
# num_words can be independent of the epoch size due to the use of generators
# as max_string_len grows, num_words can grow
def build_word_list(self, num_words, max_string_len=None, mono_fraction=0.5):
assert max_string_len <= self.absolute_max_string_len
assert num_words % self.minibatch_size == 0
assert (self.val_split * num_words) % self.minibatch_size == 0
self.num_words = num_words
self.string_list = [''] * self.num_words
tmp_string_list = []
self.max_string_len = max_string_len
self.Y_data = np.ones([self.num_words, self.absolute_max_string_len]) * -1
self.X_text = []
self.Y_len = [0] * self.num_words
def _is_length_of_word_valid(word):
return (max_string_len == -1 or
max_string_len is None or
len(word) <= max_string_len)
self.cur_val_index = self.val_split
self.cur_train_index = 0
def next_train(self):
while 1:
ret = self.get_batch(self.cur_train_index,
self.minibatch_size, train=True)
self.cur_train_index += self.minibatch_size
if self.cur_train_index >= self.val_split:
self.cur_train_index = self.cur_train_index % self.minibatch_size
(self.X_text, self.Y_data, self.Y_len) = shuffle_mats_or_lists(
[self.X_text, self.Y_data, self.Y_len], self.val_split)
yield ret
def next_val(self):
while 1:
ret = self.get_batch(self.cur_val_index,
self.minibatch_size, train=False)
self.cur_val_index += self.minibatch_size
if self.cur_val_index >= self.num_words:
self.cur_val_index = self.val_split + self.cur_val_index % self.minibatch_size
yield ret
def ctc_lambda_func(args):
y_pred, labels, input_length, label_length = args
# the 2 is critical here since the first couple outputs of the RNN
# tend to be garbage:
y_pred = y_pred[:, 2:, :]
return K.ctc_batch_cost(labels, y_pred, input_length, label_length)
# For a real OCR application, this should be beam search with a dictionary
# and language model. For this example, best path is sufficient.
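Best-path (greedy) CTC decoding, as mentioned in the comment above, can be sketched in a few lines: take the most likely class at each time step, collapse consecutive repeats, then drop the blank symbol. The function name `best_path_decode` and the toy alphabet are hypothetical. The blank is assumed to be the last class index, which is consistent with `blank_label = self.get_output_size() - 1` in the generator.

```python
import numpy as np


def best_path_decode(probs, alphabet, blank_index):
    # probs: array of shape (time_steps, num_classes) with per-step scores.
    best = np.argmax(probs, axis=1)
    chars = []
    prev = None
    for idx in best:
        # Keep a symbol only when it differs from the previous step
        # and is not the CTC blank.
        if idx != prev and idx != blank_index:
            chars.append(alphabet[idx])
        prev = idx
    return ''.join(chars)


# Toy example: alphabet 'ab' plus a blank at index 2.
toy_alphabet = 'ab'
toy_path = [0, 0, 2, 0, 1, 1, 2]       # a a _ a b b _
toy_probs = np.eye(3)[toy_path]        # one-hot "softmax" outputs
decoded = best_path_decode(toy_probs, toy_alphabet, blank_index=2)
print(decoded)
```

The blank between the second and third step is what allows the repeated letter to survive the collapse. A production system would replace this with beam search, as the comment notes.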
class VizCallback(keras.callbacks.Callback):
# Network parameters
conv_filters = 16
kernel_size = (3, 3)
pool_size = 2
time_dense_size = 32
rnn_size = 512
minibatch_size = 32
if K.image_data_format() == 'channels_first':
input_shape = (1, img_w, img_h)
else:
input_shape = (img_w, img_h, 1)
fdir = os.path.dirname(
get_file('wordlists.tgz',
origin='http://www.mythic-ai.com/datasets/wordlists.tgz',
untar=True))
img_gen = TextImageGenerator(
monogram_file=os.path.join(fdir, 'wordlist_mono_clean.txt'),
bigram_file=os.path.join(fdir, 'wordlist_bi_clean.txt'),
minibatch_size=minibatch_size,
img_w=img_w,
img_h=img_h,
downsample_factor=(pool_size ** 2),
val_split=words_per_epoch - val_words)
act = 'relu'
input_data = Input(name='the_input', shape=input_shape, dtype='float32')
inner = Conv2D(conv_filters, kernel_size, padding='same',
activation=act, kernel_initializer='he_normal',
name='conv1')(input_data)
inner = MaxPooling2D(pool_size=(pool_size, pool_size), name='max1')(inner)
inner = Conv2D(conv_filters, kernel_size, padding='same',
activation=act, kernel_initializer='he_normal',
name='conv2')(inner)
inner = MaxPooling2D(pool_size=(pool_size, pool_size), name='max2')(inner)
labels = Input(name='the_labels',
shape=[img_gen.absolute_max_string_len], dtype='float32')
input_length = Input(name='input_length', shape=[1], dtype='int64')
label_length = Input(name='label_length', shape=[1], dtype='int64')
# Keras doesn't currently support loss funcs with extra parameters
# so CTC loss is implemented in a lambda layer
loss_out = Lambda(
ctc_lambda_func, output_shape=(1,),
name='ctc')([y_pred, labels, input_length, label_length])
# the loss calc occurs elsewhere, so use a dummy lambda func for the loss
model.compile(loss={'ctc': lambda y_true, y_pred: y_pred}, optimizer=sgd)
if start_epoch > 0:
weight_file = os.path.join(
OUTPUT_DIR,
os.path.join(run_name, 'weights%02d.h5' % (start_epoch - 1)))
model.load_weights(weight_file)
# captures output of softmax so we can decode the output during visualization
test_func = K.function([input_data], [y_pred])
model.fit_generator(
generator=img_gen.next_train(),
steps_per_epoch=(words_per_epoch - val_words) // minibatch_size,
epochs=stop_epoch,
validation_data=img_gen.next_val(),
validation_steps=val_words // minibatch_size,
callbacks=[viz_cb, img_gen],
initial_epoch=start_epoch)
if __name__ == '__main__':
run_name = datetime.datetime.now().strftime('%Y:%m:%d:%H:%M:%S')
train(run_name, 0, 20, 128)
# increase to wider images and start at epoch 20.
# The learned weights are reloaded
train(run_name, 20, 25, 512)
max_features = 20000
# cut texts after this number of words
# (among top max_features most common words)
maxlen = 100
batch_size = 32
print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')
print('Train...')
model.fit(x_train, y_train,
batch_size=batch_size,
epochs=4,
validation_data=(x_test, y_test))
# set parameters:
max_features = 5000
maxlen = 400
batch_size = 32
embedding_dims = 50
filters = 250
kernel_size = 3
hidden_dims = 250
epochs = 2
print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')
# We project onto a single unit output layer, and squash it with a sigmoid:
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy',
optimizer='adam',
metrics=['accuracy'])
model.fit(x_train, y_train,
batch_size=batch_size,
epochs=epochs,
validation_data=(x_test, y_test))
# Embedding
max_features = 20000
maxlen = 100
embedding_size = 128
# Convolution
kernel_size = 5
filters = 64
pool_size = 4
# LSTM
lstm_output_size = 70
# Training
batch_size = 30
epochs = 2
'''
Note:
batch_size is highly sensitive.
Only 2 epochs are needed as the dataset is very small.
'''
print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')
print('Build model...')
model = Sequential()
model.add(Embedding(max_features, embedding_size, input_length=maxlen))
model.add(Dropout(0.25))
model.add(Conv1D(filters,
kernel_size,
padding='valid',
activation='relu',
strides=1))
model.add(MaxPooling1D(pool_size=pool_size))
model.add(LSTM(lstm_output_size))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy',
optimizer='adam',
metrics=['accuracy'])
print('Train...')
model.fit(x_train, y_train,
batch_size=batch_size,
epochs=epochs,
validation_data=(x_test, y_test))
score, acc = model.evaluate(x_test, y_test, batch_size=batch_size)
print('Test score:', score)
print('Test accuracy:', acc)
return new_sequences
# Set parameters:
# ngram_range = 2 will add bi-gram features
ngram_range = 1
max_features = 20000
maxlen = 400
batch_size = 32
embedding_dims = 50
epochs = 5
print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')
print('Average train sequence length: {}'.format(
np.mean(list(map(len, x_train)), dtype=int)))
print('Average test sequence length: {}'.format(
np.mean(list(map(len, x_test)), dtype=int)))
if ngram_range > 1:
print('Adding {}-gram features'.format(ngram_range))
# Create a set of unique n-grams from the training set.
ngram_set = set()
for input_list in x_train:
for i in range(2, ngram_range + 1):
set_of_ngram = create_ngram_set(input_list, ngram_value=i)
ngram_set.update(set_of_ngram)
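The helper `create_ngram_set` is called above but its definition does not appear in this excerpt. A plausible sketch, assuming it returns the set of all contiguous n-grams of a sequence as tuples:

```python
def create_ngram_set(input_list, ngram_value=2):
    # Zip the sequence with shifted copies of itself so that each
    # resulting tuple is one contiguous window of length `ngram_value`.
    return set(zip(*[input_list[i:] for i in range(ngram_value)]))


# Toy sequence of word indices.
bigrams = create_ngram_set([1, 4, 9, 4, 1, 4], ngram_value=2)
trigrams = create_ngram_set([1, 4, 9, 4, 1, 4], ngram_value=3)
print(bigrams)
print(trigrams)
```

Since the result is a set, the repeated bigram `(1, 4)` is stored only once, which is what the vocabulary-building loop above relies on.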
print('Build model...')
model = Sequential()
# We project onto a single unit output layer, and squash it with a sigmoid:
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
optimizer='adam',
metrics=['accuracy'])
model.fit(x_train, y_train,
batch_size=batch_size,
epochs=epochs,
validation_data=(x_test, y_test))
max_features = 20000
# cut texts after this number of words (among top max_features most common words)
maxlen = 80
batch_size = 32
print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')
print('Build model...')
model = Sequential()
model.add(Embedding(max_features, 128))
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))
print('Train...')
model.fit(x_train, y_train,
batch_size=batch_size,
epochs=15,
validation_data=(x_test, y_test))
score, acc = model.evaluate(x_test, y_test,
batch_size=batch_size)
print('Test score:', score)
print('Test accuracy:', acc)
input_characters = sorted(list(input_characters))
target_characters = sorted(list(target_characters))
num_encoder_tokens = len(input_characters)
num_decoder_tokens = len(target_characters)
max_encoder_seq_length = max([len(txt) for txt in input_texts])
max_decoder_seq_length = max([len(txt) for txt in target_texts])
input_token_index = dict(
[(char, i) for i, char in enumerate(input_characters)])
target_token_index = dict(
[(char, i) for i, char in enumerate(target_characters)])
encoder_input_data = np.zeros(
(len(input_texts), max_encoder_seq_length, num_encoder_tokens),
dtype='float32')
decoder_input_data = np.zeros(
(len(input_texts), max_decoder_seq_length, num_decoder_tokens),
dtype='float32')
decoder_target_data = np.zeros(
(len(input_texts), max_decoder_seq_length, num_decoder_tokens),
dtype='float32')
# Run training
model.compile(optimizer='rmsprop', loss='categorical_crossentropy',
metrics=['accuracy'])
model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
batch_size=batch_size,
epochs=epochs,
validation_split=0.2)
# Save model
model.save('s2s.h5')
decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs, state_h, state_c = decoder_lstm(
decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model(
[decoder_inputs] + decoder_states_inputs,
[decoder_outputs] + decoder_states)
def decode_sequence(input_seq):
# Encode the input as state vectors.
states_value = encoder_model.predict(input_seq)
# Sample a token
sampled_token_index = np.argmax(output_tokens[0, -1, :])
sampled_char = reverse_target_char_index[sampled_token_index]
decoded_sentence += sampled_char
# Update states
states_value = [h, c]
return decoded_sentence
# Vectorize the data. We use the same approach as the training script.
# NOTE: the data must be identical, in order for the character -> integer
# mappings to be consistent.
# We omit encoding target_texts since they are not needed.
input_texts = []
target_texts = []
input_characters = set()
target_characters = set()
with open(data_path, 'r', encoding='utf-8') as f:
lines = f.read().split('\n')
for line in lines[: min(num_samples, len(lines) - 1)]:
input_text, target_text = line.split('\t')
# We use "tab" as the "start sequence" character
# for the targets, and "\n" as "end sequence" character.
target_text = '\t' + target_text + '\n'
input_texts.append(input_text)
target_texts.append(target_text)
for char in input_text:
if char not in input_characters:
input_characters.add(char)
for char in target_text:
if char not in target_characters:
target_characters.add(char)
input_characters = sorted(list(input_characters))
target_characters = sorted(list(target_characters))
num_encoder_tokens = len(input_characters)
num_decoder_tokens = len(target_characters)
max_encoder_seq_length = max([len(txt) for txt in input_texts])
max_decoder_seq_length = max([len(txt) for txt in target_texts])
input_token_index = dict(
[(char, i) for i, char in enumerate(input_characters)])
target_token_index = dict(
[(char, i) for i, char in enumerate(target_characters)])
encoder_input_data = np.zeros(
(len(input_texts), max_encoder_seq_length, num_encoder_tokens),
dtype='float32')
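The zero-initialized array is then filled with a one-hot encoding of each character. The sketch below illustrates that step on two hypothetical toy strings. The fill loop is an assumption about how the elided part of the script proceeds, not a quotation of it.

```python
import numpy as np

# Hypothetical toy inputs, standing in for the lines read from the data file.
input_texts = ['hi', 'ok!']

input_characters = sorted(set(''.join(input_texts)))
num_encoder_tokens = len(input_characters)
max_encoder_seq_length = max(len(txt) for txt in input_texts)
input_token_index = {char: i for i, char in enumerate(input_characters)}

encoder_input_data = np.zeros(
    (len(input_texts), max_encoder_seq_length, num_encoder_tokens),
    dtype='float32')

# Set a single 1 per (sample, time step) at the character's index.
for i, input_text in enumerate(input_texts):
    for t, char in enumerate(input_text):
        encoder_input_data[i, t, input_token_index[char]] = 1.

print(encoder_input_data.shape)
```

Positions past the end of a shorter string stay all-zero, which acts as padding.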
# Sample a token
sampled_token_index = np.argmax(output_tokens[0, -1, :])
sampled_char = reverse_target_char_index[sampled_token_index]
decoded_sentence += sampled_char
# Exit condition: either hit max length
# or find stop character.
if (sampled_char == '\n' or
len(decoded_sentence) > max_decoder_seq_length):
stop_condition = True
# Update states
states_value = [h, c]
return decoded_sentence
When lahead > 1, the model input is preprocessed to a "rolling window view" of the data, with the
window length = lahead. This is similar to scikit-image's view_as_windows with window_shape
being a single number.
When lahead < tsteps, only the stateful LSTM converges because its statefulness allows it to see
beyond the capability that lahead gave it to fit the n-point average. The stateless LSTM does not have
this capability, and hence is limited by its lahead parameter, which is not sufficient to see the n-point
average.
When lahead >= tsteps, both the stateful and stateless LSTM converge.
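As an illustration of the "rolling window view", the sketch below applies the repeat-and-shift approach to a toy series. The series values and `lahead = 3` are assumptions chosen for readability, and the variable name `windows` is hypothetical.

```python
import numpy as np
import pandas as pd

lahead = 3  # hypothetical window length
data_input = pd.DataFrame(np.arange(1.0, 6.0))  # toy series 1, 2, 3, 4, 5

# Duplicate the single column lahead times, then shift column i down by
# i rows so that each row holds the current value and its predecessors.
windows = np.repeat(data_input.values, repeats=lahead, axis=1)
windows = pd.DataFrame(windows)
for i, c in enumerate(windows.columns):
    windows[c] = windows[c].shift(i)

print(windows)
```

The first `lahead - 1` rows contain NaN because no earlier values exist for them, so they have to be dropped before training. Every remaining row is one window of length `lahead`, with the most recent value in the first column.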
from __future__ import print_function
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense, LSTM
# ----------------------------------------------------------
# EDITABLE PARAMETERS
# Read the documentation in the script head for more details
# ----------------------------------------------------------
# length of input
input_len = 1000
# The input sequence length that the LSTM is trained on for each output point
lahead = 1
# ------------
# MAIN PROGRAM
# ------------
print("*" * 33)
if lahead >= tsteps:
print("STATELESS LSTM WILL ALSO CONVERGE")
else:
print("STATELESS LSTM WILL NOT CONVERGE")
print("*" * 33)
np.random.seed(1986)
print('Generating Data...')
# when lahead > 1, need to convert the input to "rolling window view"
# https://docs.scipy.org/doc/numpy/reference/generated/numpy.repeat.html
if lahead > 1:
data_input = np.repeat(data_input.values, repeats=lahead, axis=1)
data_input = pd.DataFrame(data_input)
for i, c in enumerate(data_input.columns):
data_input[c] = data_input[c].shift(i)
def create_model(stateful):
model = Sequential()
model.add(LSTM(20,
input_shape=(lahead, 1),
batch_size=batch_size,
stateful=stateful))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')
return model
x_train = x[:to_train]
y_train = y[:to_train]
x_test = x[to_train:]
y_test = y[to_train:]
# some reshaping
reshape_3 = lambda x: x.values.reshape((x.shape[0], x.shape[1], 1))
x_train = reshape_3(x_train)
x_test = reshape_3(x_test)
print('Training')
for i in range(epochs):
print('Epoch', i + 1, '/', epochs)
# Note that the last state for sample i in a batch will
# be used as initial state for sample i in the next batch.
# Thus we are simultaneously training on batch_size series with
# lower resolution than the original series contained in data_input.
# Each of these series is offset by one step and can be
# extracted with data_input[i::batch_size].
model_stateful.fit(x_train,
y_train,
batch_size=batch_size,
epochs=1,
verbose=1,
validation_data=(x_test, y_test),
shuffle=False)
model_stateful.reset_states()
print('Predicting')
predicted_stateful = model_stateful.predict(x_test, batch_size=batch_size)
print('Training')
model_stateless.fit(x_train,
y_train,
batch_size=batch_size,
epochs=epochs,
verbose=1,
validation_data=(x_test, y_test),
shuffle=False)
print('Predicting')
predicted_stateless = model_stateless.predict(x_test, batch_size=batch_size)
# ----------------------------
print('Plotting Results')
plt.subplot(3, 1, 1)
plt.plot(y_test)
plt.title('Expected')
plt.subplot(3, 1, 2)
# drop the first "tsteps-1" because it is not possible to predict them
# since the "previous" timesteps to use do not exist
plt.plot((y_test - predicted_stateful).flatten()[tsteps - 1:])
plt.title('Stateful: Expected - Predicted')
plt.subplot(3, 1, 3)
plt.plot((y_test - predicted_stateless).flatten())
plt.title('Stateless: Expected - Predicted')
plt.show()
path = get_file(
'nietzsche.txt',
origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')
with io.open(path, encoding='utf-8') as f:
text = f.read().lower()
print('corpus length:', len(text))
chars = sorted(list(set(text)))
print('total chars:', len(chars))
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))
print('Vectorization...')
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
for t, char in enumerate(sentence):
x[i, t, char_indices[char]] = 1
y[i, char_indices[next_chars[i]]] = 1
optimizer = RMSprop(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)
generated = ''
sentence = text[start_index: start_index + maxlen]
generated += sentence
print('----- Generating with seed: "' + sentence + '"')
sys.stdout.write(generated)
for i in range(400):
x_pred = np.zeros((1, maxlen, len(chars)))
for t, char in enumerate(sentence):
x_pred[0, t, char_indices[char]] = 1.
sys.stdout.write(next_char)
sys.stdout.flush()
print()
print_callback = LambdaCallback(on_epoch_end=on_epoch_end)
model.fit(x, y,
batch_size=128,
epochs=60,
callbacks=[print_callback])
np.random.seed(1337)
num_classes = 10
def build_generator(latent_size):
# we will map a pair of (z, L), where z is a latent vector and L is a
# label drawn from P_c, to image space (..., 28, 28, 1)
cnn = Sequential()
fake_image = cnn(h)
def build_discriminator():
# build a relatively standard conv net, with LeakyReLUs as suggested in
# the reference paper
cnn = Sequential()
cnn.add(Flatten())
features = cnn(image)
if __name__ == '__main__':
# batch and latent size taken from the paper
epochs = 100
batch_size = 100
latent_size = 100
latent = Input(shape=(latent_size, ))
image_class = Input(shape=(1,), dtype='int32')
print('Combined model:')
combined.compile(
optimizer=Adam(learning_rate=adam_lr, beta_1=adam_beta_1),
loss=['binary_crossentropy', 'sparse_categorical_crossentropy']
)
combined.summary()
# get our mnist data, and force it to be of shape (..., 28, 28, 1) with
# range [-1, 1]
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = (x_train.astype(np.float32) - 127.5) / 127.5
x_train = np.expand_dims(x_train, axis=-1)
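The scaling above maps 8-bit pixel values onto [-1, 1]. A small check on a few hypothetical pixel values, together with the inverse mapping that would be needed before saving generated images, might look like this:

```python
import numpy as np

# Toy pixel values covering the ends and the middle of the uint8 range.
pixels = np.array([0, 127, 128, 255], dtype=np.uint8)

# Forward mapping used for the training images: [0, 255] -> [-1, 1].
scaled = (pixels.astype(np.float32) - 127.5) / 127.5

# Inverse mapping back to uint8. Rounding before the cast avoids
# truncation errors from float arithmetic.
restored = np.round(scaled * 127.5 + 127.5).astype(np.uint8)

print(scaled)
print(restored)
```

The [-1, 1] range is a common choice when the generator's last activation is `tanh`. Whether that is the case here is an assumption, since the generator's output layer is not shown in this excerpt.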
train_history = defaultdict(list)
test_history = defaultdict(list)
epoch_gen_loss = []
epoch_disc_loss = []
x = np.concatenate((image_batch, generated_images))
# make new noise. we generate 2 * batch size here such that we have
# the generator optimize over an identical number of images as the
# discriminator
noise = np.random.uniform(-1, 1, (2 * len(image_batch), latent_size))
sampled_labels = np.random.randint(0, num_classes, 2 *
len(image_batch))
epoch_gen_loss.append(combined.train_on_batch(
[noise, sampled_labels.reshape((-1, 1))],
[trick, sampled_labels]))
progress_bar.update(index + 1)
# sample some labels from p_c and generate images from them
sampled_labels = np.random.randint(0, num_classes, num_test)
generated_images = generator.predict(
[noise, sampled_labels.reshape((-1, 1))], verbose=False)
x = np.concatenate((x_test, generated_images))
y = np.array([1] * num_test + [0] * num_test)
aux_y = np.concatenate((y_test, sampled_labels), axis=0)
generator_test_loss = combined.evaluate(
[noise, sampled_labels.reshape((-1, 1))],
[trick, sampled_labels], verbose=False)
test_history['generator'].append(generator_test_loss)
test_history['discriminator'].append(discriminator_test_loss)
sampled_labels = np.array([
[i] * num_rows for i in range(num_classes)
]).reshape(-1, 1)
Image.fromarray(img).save(
'plot_epoch_{0:03d}_generated.png'.format(epoch))