Dataset of 11,228 newswires from Reuters, labeled over 46 topics. As with
dataset_imdb(), each wire is encoded as a sequence of word indexes (same
conventions).

Usage

dataset_reuters(
  path = "reuters.npz",
  num_words = NULL,
  skip_top = 0L,
  maxlen = NULL,
  test_split = 0.2,
  seed = 113L,
  start_char = 1L,
  oov_char = 2L,
  index_from = 3L
)

dataset_reuters_word_index(path = "reuters_word_index.pkl")
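A minimal sketch of loading the data (assumes the keras R package is installed and that the dataset can be downloaded to the cache on first use; the `num_words` value is just an illustrative choice):

```r
library(keras)

# Keep only the 10,000 most frequent words; rarer words become oov_char.
reuters <- dataset_reuters(num_words = 10000)

# Each element of train$x is an integer vector of word indexes;
# train$y holds the corresponding topic label (one of 46 topics).
length(reuters$train$x)   # number of training wires
reuters$train$y[[1]]      # topic label of the first wire
```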
Arguments

path
  Where to cache the data (relative to the Keras cache directory).

num_words
  Max number of words to include. Words are ranked by how often they occur
  (in the training set) and only the most frequent words are kept.

skip_top
  Skip the top N most frequently occurring words (which may not be
  informative).

maxlen
  Truncate sequences after this length.

test_split
  Fraction of the dataset to be used as test data.

seed
  Random seed for sample shuffling.

start_char
  The start of a sequence will be marked with this character. Set to 1
  because 0 is usually the padding character.

oov_char
  Words that were cut out because of the num_words or skip_top limit will
  be replaced with this character.

index_from
  Index actual words with this index and higher.
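The special indexes interact: with the defaults, 0 is reserved for padding, 1 is start_char, 2 is oov_char, and real words begin at index_from + 1 = 4, so an encoded index i maps back to raw word-index i - index_from. A small sketch of this convention in plain R (the sample wire is invented for illustration, no download needed):

```r
# Hypothetical encoded wire under the default conventions:
# 1 = start_char, 2 = oov_char, real words start at 4.
wire <- c(1L, 27L, 2L, 5L, 112L)

start_char <- 1L
oov_char   <- 2L
index_from <- 3L

# Drop the start marker, map out-of-vocabulary tokens to NA,
# and shift the remaining indexes back to raw word indexes.
raw <- wire[wire != start_char]
raw <- ifelse(raw == oov_char, NA_integer_, raw - index_from)
raw
# 24 NA 2 109 (NA marks a word removed by num_words / skip_top)
```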
Value

Lists of training and test data:
train$x, train$y, test$x, test$y,
with the same format as dataset_imdb().

The dataset_reuters_word_index() function returns a list where the names
are words and the values are integer indexes, e.g. word_index[["giraffe"]]
might return 1234.
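The word index can be inverted to turn an encoded wire back into words. A sketch, assuming the word-index values are the consecutive ranks 1, 2, …; `decode_wire` is a hypothetical helper, and the offset follows the default index_from = 3L (requires the data to be downloadable):

```r
library(keras)

word_index <- dataset_reuters_word_index()

# Invert the list: position i of rev_index is the word whose index is i.
rev_index <- names(word_index)[order(unlist(word_index))]

# Hypothetical helper: indexes 0-3 are padding / start_char / oov_char,
# real words are shifted up by index_from.
decode_wire <- function(wire, index_from = 3L) {
  sapply(wire, function(i) {
    if (i < index_from + 1L) "?" else rev_index[[i - index_from]]
  })
}

reuters <- dataset_reuters(num_words = 10000)
cat(decode_wire(reuters$train$x[[1]]), sep = " ")
```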