## How should I cite Keras?

Please cite Keras in your publications if it helps your research. Here is an example BibTeX entry:

```
@misc{chollet2017kerasR,
  title = {R Interface to Keras},
  author = {Chollet, Fran\c{c}ois and Allaire, JJ and others},
  year = {2017},
  publisher = {GitHub},
  howpublished = {\url{https://github.com/rstudio/keras}},
}
```

## How can I run Keras on a GPU?

Note that installation and configuration of the GPU-based backends can take considerably more time and effort. So if you are just getting started with Keras you may want to stick with the CPU version initially, then install the appropriate GPU version once your training becomes more computationally demanding.

Below are instructions for installing and enabling GPU support for the various supported backends.

### TensorFlow

If your system has an NVIDIA® GPU and you have the GPU version of TensorFlow installed, then your Keras code will automatically run on the GPU.

Additional details on GPU installation can be found here: https://tensorflow.rstudio.com/installation_gpu.html.
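
If you want to confirm that TensorFlow can actually see your GPU from R, a quick check is the following sketch (it assumes the tensorflow R package is installed; tf$test$is_gpu_available() is the check available on TensorFlow 1.x builds):

```r
library(tensorflow)

# returns TRUE if TensorFlow was built with GPU support
# and a GPU device is currently visible
tf$test$is_gpu_available()
```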

### Theano

If you are running on the Theano backend, you can set the THEANO_FLAGS environment variable to indicate you’d like to execute tensor operations on the GPU. For example:

```r
Sys.setenv(KERAS_BACKEND = "theano")
Sys.setenv(THEANO_FLAGS = "device=gpu,floatX=float32")
library(keras)
```

The name ‘gpu’ might have to be changed depending on your device’s identifier (e.g. gpu0, gpu1, etc.).

### CNTK

If you have the GPU version of CNTK installed, then your Keras code will automatically run on the GPU.

Additional information on installing the GPU version of CNTK can be found here: https://docs.microsoft.com/en-us/cognitive-toolkit/setup-linux-python

## How can I run a Keras model on multiple GPUs?

We recommend doing so using the TensorFlow backend. There are two ways to run a single model on multiple GPUs: data parallelism and device parallelism.

In most cases, what you need is data parallelism.

### Data parallelism

Data parallelism consists of replicating the target model once on each device, and using each replica to process a different fraction of the input data. Keras has a built-in utility, multi_gpu_model(), which can produce a data-parallel version of any model, and achieves quasi-linear speedup on up to 8 GPUs.

For more information, see the documentation for multi_gpu_model. Here is a quick example:

```r
# Replicates model on 8 GPUs.
# This assumes that your machine has 8 available GPUs.
parallel_model <- multi_gpu_model(model, gpus = 8)
parallel_model %>% compile(
  loss = "categorical_crossentropy",
  optimizer = "rmsprop"
)

# This fit() call will be distributed on 8 GPUs.
# Since the batch size is 256, each GPU will process 32 samples.
parallel_model %>% fit(x, y, epochs = 20, batch_size = 256)
```

### Device parallelism

Device parallelism consists of running different parts of the same model on different devices. It works best for models that have a parallel architecture, e.g. a model with two branches.

This can be achieved by using TensorFlow device scopes. Here is a quick example:

```r
library(tensorflow)

# Model where a shared LSTM is used to encode two different sequences in parallel
input_a <- layer_input(shape = c(140, 256))
input_b <- layer_input(shape = c(140, 256))

shared_lstm <- layer_lstm(units = 64)

# Process the first sequence on one GPU
with(tf$device("/gpu:0"), {
  encoded_a <- shared_lstm(input_a)
})

# Process the next sequence on another GPU
with(tf$device("/gpu:1"), {
  encoded_b <- shared_lstm(input_b)
})

# Concatenate results on CPU
with(tf$device("/cpu:0"), {
  merged_vector <- layer_concatenate(list(encoded_a, encoded_b))
})
```

## How can I “freeze” Keras layers?

To “freeze” a layer means to exclude it from training, i.e. its weights will never be updated. This is useful in the context of fine-tuning a model, or using fixed embeddings for a text input.

You can pass a trainable argument (boolean) to a layer constructor to set a layer to be non-trainable:

```r
frozen_layer <- layer_dense(units = 32, trainable = FALSE)
```

Additionally, you can set the trainable property of a layer to TRUE or FALSE after instantiation. For this to take effect, you will need to call compile() on your model after modifying the trainable property. Here’s an example:

```r
x <- layer_input(shape = c(32))
layer <- layer_dense(units = 32)
layer$trainable <- FALSE
y <- x %>% layer

frozen_model <- keras_model(x, y)
# in the model below, the weights of layer will not be updated during training
frozen_model %>% compile(optimizer = 'rmsprop', loss = 'mse')

layer$trainable <- TRUE
trainable_model <- keras_model(x, y)
# with this model the weights of the layer will be updated during training
# (which will also affect the above model since it uses the same layer instance)
trainable_model %>% compile(optimizer = 'rmsprop', loss = 'mse')

frozen_model %>% fit(data, labels)      # this does NOT update the weights of layer
trainable_model %>% fit(data, labels)   # this updates the weights of layer
```

Finally, you can freeze or unfreeze the weights for an entire model (or a range of layers within the model) using the freeze_weights() and unfreeze_weights() functions. For example:

```r
# instantiate a VGG16 model
conv_base <- application_vgg16(
  weights = "imagenet",
  include_top = FALSE,
  input_shape = c(150, 150, 3)
)

# freeze its weights
freeze_weights(conv_base)
```

```r
# create a composite model that includes the base + more layers
model <- keras_model_sequential() %>%
  conv_base %>%
  layer_flatten() %>%
  layer_dense(units = 256, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid")

# compile
model %>% compile(
  loss = "binary_crossentropy",
  optimizer = optimizer_rmsprop(lr = 2e-5),
  metrics = c("accuracy")
)

# unfreeze weights from "block5_conv1" on
unfreeze_weights(conv_base, from = "block5_conv1")

# compile again since we froze or unfroze layers
model %>% compile(
  loss = "binary_crossentropy",
  optimizer = optimizer_rmsprop(lr = 2e-5),
  metrics = c("accuracy")
)
```

## How can I use stateful RNNs?

Making an RNN stateful means that the states for the samples of each batch will be reused as initial states for the samples in the next batch.

When using stateful RNNs, it is therefore assumed that:

- all batches have the same number of samples
- if X1 and X2 are successive batches of samples, then X2[[i]] is the follow-up sequence to X1[[i]], for every i

To use statefulness in RNNs, you need to:

- explicitly specify the batch size you are using, by passing a batch_size argument to the first layer in your model (e.g. batch_size = 32 for batches of 32 samples, each a sequence of 10 timesteps with 16 features per timestep)
- set stateful = TRUE in your RNN layer(s)
- specify shuffle = FALSE when calling fit()

To reset the states accumulated in either a single layer or an entire model, use the reset_states() function.

Note that the methods predict(), fit(), train_on_batch(), predict_classes(), etc. will all update the states of the stateful layers in a model. This allows you to do not only stateful training, but also stateful prediction.
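
Putting these requirements together, here is a minimal sketch of a stateful LSTM; the layer sizes and the dummy data are illustrative assumptions:

```r
library(keras)

# dummy data: 320 samples (10 batches of 32), 10 timesteps, 16 features
x_train <- array(rnorm(320 * 10 * 16), dim = c(320, 10, 16))
y_train <- rnorm(320)

model <- keras_model_sequential() %>%
  layer_lstm(units = 32, batch_size = 32, input_shape = c(10, 16),
             stateful = TRUE) %>%
  layer_dense(units = 1)

model %>% compile(optimizer = "rmsprop", loss = "mse")

# shuffle = FALSE preserves the sample-to-sample correspondence across batches
model %>% fit(x_train, y_train, batch_size = 32, epochs = 5, shuffle = FALSE)

# clear the accumulated states before training on a new set of sequences
model %>% reset_states()
```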

## How can I remove a layer from a Sequential model?

You can remove the last added layer in a Sequential model by calling pop_layer():

```r
model <- keras_model_sequential()
model %>%
  layer_dense(units = 32, activation = 'relu', input_shape = c(784)) %>%
  layer_dense(units = 32, activation = 'relu') %>%
  layer_dense(units = 32, activation = 'relu')

length(model$layers)     # "3"

model %>% pop_layer()
length(model$layers)     # "2"
```

## How can I use pre-trained models in Keras?

Code and pre-trained weights are available for a variety of image classification models via the application family of functions (for example, application_vgg16(), application_vgg19(), application_resnet50(), application_inception_v3(), application_xception(), and application_mobilenet()).

For example:

```r
model <- application_vgg16(weights = 'imagenet', include_top = TRUE)
```

For a few simple usage examples, see the documentation for the Applications module.

The VGG16 model is also the basis for the Deep Dream Keras example script.
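
As a quick illustration, classifying a single image with the VGG16 model created above might look like the following sketch (the image path is a placeholder):

```r
# load and preprocess an image to the 224x224 input size VGG16 expects
img <- image_load("elephant.jpg", target_size = c(224, 224))
x <- image_to_array(img)
x <- array_reshape(x, c(1, dim(x)))
x <- imagenet_preprocess_input(x)

# predict and decode the top 3 ImageNet classes
preds <- model %>% predict(x)
imagenet_decode_predictions(preds, top = 3)
```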

## How can I use other Keras backends?

By default the Keras Python and R packages use the TensorFlow backend. Other available backends include Theano and CNTK. To learn more about using alternate backends, see the article on Keras backends.

## How can I use the PlaidML backend?

PlaidML is an open source portable deep learning engine that runs on most existing PC hardware with OpenCL-capable GPUs from NVIDIA, AMD, or Intel. PlaidML includes a Keras backend which you can use as described below.

First, build and install PlaidML as described on the project website. You must be sure that PlaidML is correctly installed, set up, and working before proceeding further!

Then, to use Keras with the PlaidML backend you do the following:

```r
library(keras)
use_backend("plaidml")
```

This should automatically discover and use the Python environment where plaidml and plaidml-keras were installed. If this doesn’t work as expected, you can also force the selection of a particular Python environment. For example, if you installed PlaidML in a conda environment named “plaidml” you would do this:

```r
library(keras)
use_condaenv("plaidml")
use_backend("plaidml")
```

## How can I use Keras in another R package?

### Testing on CRAN

The main consideration in using Keras within another R package is to ensure that your package can be tested in an environment where Keras is not available (e.g. the CRAN test servers). To do this, arrange for your tests to be skipped when Keras isn’t available using the is_keras_available() function.

For example, here’s a testthat utility function that can be used to skip a test when Keras isn’t available:

```r
# testthat utility for skipping tests when Keras isn't available
skip_if_no_keras <- function(version = NULL) {
  if (!is_keras_available(version))
    skip("Required keras version not available for testing")
}

# use the function within a test
test_that("keras function works correctly", {
  skip_if_no_keras()
  # test code here
})
```

You can pass the version argument to check for a specific version of Keras.

### Keras Module

Another consideration is gaining access to the underlying Keras Python module. You might need to do this if you require lower-level access to Keras than is provided by the Keras R package.

Since the Keras R package can bind to multiple different implementations of Keras (either the original Keras or the TensorFlow implementation of Keras), you should use the keras::implementation() function to obtain access to the correct Python module. You can use this function within the .onLoad function of a package to provide global access to the module within your package. For example:

```r
# global reference to the Keras Python module (initialized in .onLoad)
keras <- NULL

.onLoad <- function(libname, pkgname) {
  # obtain a reference to the module from the keras R package
  keras <<- keras::implementation()
}
```

### Custom Layers

If you create custom layers in R or import other Python packages which include custom Keras layers, be sure to wrap them using the create_layer() function so that they are composable using the magrittr pipe operator. See the documentation on layer wrapper functions for additional details.
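
For example, a minimal sketch of the wrapping pattern might look like the following; the layer itself (a simple learned matrix multiplication) and its argument names are illustrative assumptions, and the full KerasLayer interface is described in the layer wrapper documentation:

```r
library(keras)

# a hypothetical custom layer that multiplies its input by a learned kernel
CustomLayer <- R6::R6Class(
  "CustomLayer",
  inherit = KerasLayer,
  public = list(
    output_dim = NULL,
    kernel = NULL,
    initialize = function(output_dim) {
      self$output_dim <- output_dim
    },
    build = function(input_shape) {
      self$kernel <- self$add_weight(
        name = "kernel",
        shape = list(input_shape[[2]], self$output_dim),
        initializer = initializer_random_normal(),
        trainable = TRUE
      )
    },
    call = function(x, mask = NULL) {
      k_dot(x, self$kernel)
    },
    compute_output_shape = function(input_shape) {
      list(input_shape[[1]], self$output_dim)
    }
  )
)

# wrap with create_layer() so the layer composes with the %>% operator
layer_custom <- function(object, output_dim, name = NULL, trainable = TRUE) {
  create_layer(CustomLayer, object, list(
    output_dim = as.integer(output_dim),
    name = name,
    trainable = trainable
  ))
}
```

The wrapper can then be used like any built-in layer, e.g. model %>% layer_custom(output_dim = 32).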

## How can I obtain reproducible results using Keras during development?

During development of a model, sometimes it is useful to be able to obtain reproducible results from run to run in order to determine if a change in performance is due to an actual model or data modification, or merely a result of a new random sample.

The use_session_with_seed() function establishes a common random seed for R, Python, NumPy, and TensorFlow. It furthermore disables hash randomization, GPU computations, and CPU parallelization, which can be additional sources of non-reproducibility.

To use the function, call it immediately after you load the keras package:

```r
library(keras)
use_session_with_seed(42)

# ...rest of code follows...
```

This function takes all measures known to promote reproducible results from Keras sessions; however, it’s possible that various individual features or libraries used by the backend escape its effects. If you encounter non-reproducible results, please investigate the possible sources of the problem. The source code for use_session_with_seed() is here: https://github.com/rstudio/tensorflow/blob/master/R/seed.R. Contributions via pull request are very welcome!

Please note again that use_session_with_seed() disables GPU computations and CPU parallelization by default (as both can lead to non-deterministic computations), so it should generally not be used when model training time is paramount. You can re-enable GPU computations and/or CPU parallelism using the disable_gpu and disable_parallel_cpu arguments. For example:

```r
library(keras)
use_session_with_seed(42, disable_gpu = FALSE, disable_parallel_cpu = FALSE)
```

## Where is the Keras configuration file stored?

The default directory where all Keras data is stored is:

```
~/.keras/
```

In case Keras cannot create the above directory (e.g. due to permission issues), /tmp/.keras/ is used as a backup.

The Keras configuration file is a JSON file stored at $HOME/.keras/keras.json. The default configuration file looks like this:

```json
{
    "image_data_format": "channels_last",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow"
}
```

It contains the following fields:

- The image data format to be used as default by image processing layers and utilities (either channels_last or channels_first).
- The epsilon numerical fuzz factor to be used to prevent division by zero in some operations.
- The default float data type.
- The default backend (this will always be “tensorflow” in the R interface to Keras).

Likewise, cached dataset files, such as those downloaded with get_file(), are stored by default in $HOME/.keras/datasets/.