Dataset of 11,228 newswires from Reuters, labeled over 46 topics. As with dataset_imdb() , each wire is encoded as a sequence of word indexes (same conventions).

dataset_reuters(
path = "reuters.npz",
num_words = NULL,
skip_top = 0L,
maxlen = NULL,
test_split = 0.2,
seed = 113L,
start_char = 1L,
oov_char = 2L,
index_from = 3L
)

dataset_reuters_word_index(path = "reuters_word_index.pkl")

## Arguments

path Where to cache the data (relative to ~/.keras/dataset). Max number of words to include. Words are ranked by how often they occur (in the training set) and only the most frequent words are kept Skip the top N most frequently occuring words (which may not be informative). Truncate sequences after this length. Fraction of the dataset to be used as test data. Random seed for sample shuffling. The start of a sequence will be marked with this character. Set to 1 because 0 is usually the padding character. words that were cut out because of the num_words or skip_top limit will be replaced with this character. index actual words with this index and higher.

## Value

Lists of training and test data: train$x, train$y, test$x, test$y with same format as dataset_imdb(). The dataset_reuters_word_index() function returns a list where the names are words and the values are integer. e.g. word_index[["giraffe"]] might return 1234.

Other datasets: dataset_boston_housing(), dataset_cifar100(), dataset_cifar10(), dataset_fashion_mnist(), dataset_imdb(), dataset_mnist()