19.7. d2l API Documentation

The implementations of the following members of the d2l package, along with the sections where they are defined and explained, can be found in the source files.

class d2l.mxnet.Accumulator(n)[source]

Bases: object

For accumulating sums over n variables.

add(*args)[source]
reset()[source]
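
A minimal usage sketch (assuming the d2l package with its MXNet backend is installed; indexing into the accumulator via metric[i] is how the d2l source exposes the running sums, and is assumed here):

>>> from d2l import mxnet as d2l
>>> metric = d2l.Accumulator(2)   # track two running sums
>>> metric.add(0.5, 1)            # e.g. (summed loss, number of examples)
>>> metric.add(0.7, 1)
>>> metric[0] / metric[1]         # average loss so far
0.6
>>> metric.reset()                # zero both sums
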
class d2l.mxnet.AddNorm(dropout, **kwargs)[source]

Bases: mxnet.gluon.block.Block

Residual connection followed by layer normalization.

Defined in Section 10.7

forward(X, Y)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

*args (list of NDArray): Input tensors.

class d2l.mxnet.AdditiveAttention(num_hiddens, dropout, **kwargs)[source]

Bases: mxnet.gluon.block.Block

Additive attention.

Defined in Section 10.3

forward(queries, keys, values, valid_lens)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

*args (list of NDArray): Input tensors.

class d2l.mxnet.Animator(xlabel=None, ylabel=None, legend=None, xlim=None, ylim=None, xscale='linear', yscale='linear', fmts=('-', 'm--', 'g-.', 'r:'), nrows=1, ncols=1, figsize=(3.5, 2.5))[source]

Bases: object

For plotting data in animation.

add(x, y)[source]
class d2l.mxnet.AttentionDecoder(**kwargs)[source]

Bases: d2l.mxnet.Decoder

The base attention-based decoder interface.

Defined in Section 10.4

property attention_weights
class d2l.mxnet.BERTEncoder(vocab_size, num_hiddens, ffn_num_hiddens, num_heads, num_layers, dropout, max_len=1000, **kwargs)[source]

Bases: mxnet.gluon.block.Block

BERT encoder.

Defined in Section 14.8.4

forward(tokens, segments, valid_lens)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

*args (list of NDArray): Input tensors.

class d2l.mxnet.BERTModel(vocab_size, num_hiddens, ffn_num_hiddens, num_heads, num_layers, dropout, max_len=1000)[source]

Bases: mxnet.gluon.block.Block

The BERT model.

Defined in Section 14.8.5.2

forward(tokens, segments, valid_lens=None, pred_positions=None)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

*args (list of NDArray): Input tensors.

class d2l.mxnet.BPRLoss(weight=None, batch_axis=0, **kwargs)[source]

Bases: mxnet.gluon.loss.Loss

forward(positive, negative)[source]

Defines the forward computation. Arguments can be either NDArray or Symbol.

class d2l.mxnet.BananasDataset(is_train)[source]

Bases: mxnet.gluon.data.dataset.Dataset

A customized dataset to load the banana detection dataset.

Defined in Section 13.6

class d2l.mxnet.Benchmark(description='Done')[source]

Bases: object

For measuring running time.

class d2l.mxnet.CTRDataset(data_path, feat_mapper=None, defaults=None, min_threshold=4, num_feat=34)[source]

Bases: mxnet.gluon.data.dataset.Dataset

class d2l.mxnet.Decoder(**kwargs)[source]

Bases: mxnet.gluon.block.Block

The base decoder interface for the encoder-decoder architecture.

Defined in Section 9.6

forward(X, state)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

*args (list of NDArray): Input tensors.

init_state(enc_outputs, *args)[source]
class d2l.mxnet.DotProductAttention(dropout, **kwargs)[source]

Bases: mxnet.gluon.block.Block

Scaled dot product attention.

Defined in Section 10.3.2

forward(queries, keys, values, valid_lens=None)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

*args (list of NDArray): Input tensors.

class d2l.mxnet.Encoder(**kwargs)[source]

Bases: mxnet.gluon.block.Block

The base encoder interface for the encoder-decoder architecture.

forward(X, *args)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

*args (list of NDArray): Input tensors.

class d2l.mxnet.EncoderBlock(num_hiddens, ffn_num_hiddens, num_heads, dropout, use_bias=False, **kwargs)[source]

Bases: mxnet.gluon.block.Block

Transformer encoder block.

Defined in Section 10.7

forward(X, valid_lens)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

*args (list of NDArray): Input tensors.

class d2l.mxnet.EncoderDecoder(encoder, decoder, **kwargs)[source]

Bases: mxnet.gluon.block.Block

The base class for the encoder-decoder architecture.

Defined in Section 9.6

forward(enc_X, dec_X, *args)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

*args (list of NDArray): Input tensors.

class d2l.mxnet.HingeLossbRec(weight=None, batch_axis=0, **kwargs)[source]

Bases: mxnet.gluon.loss.Loss

forward(positive, negative, margin=1)[source]

Defines the forward computation. Arguments can be either NDArray or Symbol.

class d2l.mxnet.MaskLM(vocab_size, num_hiddens, **kwargs)[source]

Bases: mxnet.gluon.block.Block

The masked language model task of BERT.

Defined in Section 14.8.4

forward(X, pred_positions)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

*args (list of NDArray): Input tensors.

class d2l.mxnet.MaskedSoftmaxCELoss(axis=-1, sparse_label=True, from_logits=False, weight=None, batch_axis=0, **kwargs)[source]

Bases: mxnet.gluon.loss.SoftmaxCrossEntropyLoss

The softmax cross-entropy loss with masks.

Defined in Section 9.7.2

forward(pred, label, valid_len)[source]

Defines the forward computation. Arguments can be either NDArray or Symbol.

class d2l.mxnet.MultiHeadAttention(num_hiddens, num_heads, dropout, use_bias=False, **kwargs)[source]

Bases: mxnet.gluon.block.Block

Multi-head attention.

Defined in Section 10.5

forward(queries, keys, values, valid_lens)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

*args (list of NDArray): Input tensors.

class d2l.mxnet.NextSentencePred(**kwargs)[source]

Bases: mxnet.gluon.block.Block

The next sentence prediction task of BERT.

Defined in Section 14.8.5.1

forward(X)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

*args (list of NDArray): Input tensors.

class d2l.mxnet.PositionWiseFFN(ffn_num_hiddens, ffn_num_outputs, **kwargs)[source]

Bases: mxnet.gluon.block.Block

Positionwise feed-forward network.

Defined in Section 10.7

forward(X)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

*args (list of NDArray): Input tensors.

class d2l.mxnet.PositionalEncoding(num_hiddens, dropout, max_len=1000)[source]

Bases: mxnet.gluon.block.Block

Positional encoding.

Defined in Section 10.6

forward(X)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

*args (list of NDArray): Input tensors.

class d2l.mxnet.RNNModel(rnn_layer, vocab_size, **kwargs)[source]

Bases: mxnet.gluon.block.Block

The RNN model.

Defined in Section 8.6

begin_state(*args, **kwargs)[source]
forward(inputs, state)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

*args (list of NDArray): Input tensors.

class d2l.mxnet.RNNModelScratch(vocab_size, num_hiddens, device, get_params, init_state, forward_fn)[source]

Bases: object

An RNN Model implemented from scratch.

begin_state(batch_size, ctx)[source]
class d2l.mxnet.RandomGenerator(sampling_weights)[source]

Bases: object

Randomly draw among {1, …, n} according to n sampling weights.

draw()[source]
class d2l.mxnet.Residual(num_channels, use_1x1conv=False, strides=1, **kwargs)[source]

Bases: mxnet.gluon.block.Block

The Residual block of ResNet.

forward(X)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

*args (list of NDArray): Input tensors.

class d2l.mxnet.SNLIDataset(dataset, num_steps, vocab=None)[source]

Bases: mxnet.gluon.data.dataset.Dataset

A customized dataset to load the SNLI dataset.

Defined in Section 15.4

class d2l.mxnet.Seq2SeqEncoder(vocab_size, embed_size, num_hiddens, num_layers, dropout=0, **kwargs)[source]

Bases: d2l.mxnet.Encoder

The RNN encoder for sequence to sequence learning.

Defined in Section 9.7

forward(X, *args)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

*args (list of NDArray): Input tensors.

class d2l.mxnet.SeqDataLoader(batch_size, num_steps, use_random_iter, max_tokens)[source]

Bases: object

An iterator to load sequence data.

class d2l.mxnet.Timer[source]

Bases: object

Record multiple running times.

avg()[source]

Return the average time.

cumsum()[source]

Return the accumulated time.

start()[source]

Start the timer.

stop()[source]

Stop the timer and record the time in a list.

sum()[source]

Return the sum of time.
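
A minimal usage sketch (assuming d2l is installed; in the d2l source the constructor already starts the timer, so the explicit start() below is merely for clarity):

>>> import time
>>> from d2l import mxnet as d2l
>>> timer = d2l.Timer()
>>> timer.start()
>>> time.sleep(0.1)
>>> elapsed = timer.stop()                    # returns and records the elapsed time
>>> timer.sum(), timer.avg(), timer.cumsum()  # aggregates over all recorded times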

class d2l.mxnet.TokenEmbedding(embedding_name)[source]

Bases: object

Token Embedding.

class d2l.mxnet.TransformerEncoder(vocab_size, num_hiddens, ffn_num_hiddens, num_heads, num_layers, dropout, use_bias=False, **kwargs)[source]

Bases: d2l.mxnet.Encoder

Transformer encoder.

Defined in Section 10.7

forward(X, valid_lens, *args)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

*args (list of NDArray): Input tensors.

class d2l.mxnet.VOCSegDataset(is_train, crop_size, voc_dir)[source]

Bases: mxnet.gluon.data.dataset.Dataset

A customized dataset to load the VOC dataset.

Defined in Section 13.9

filter(imgs)[source]

Returns a new dataset with samples filtered by the filter function fn.

Note that if the Dataset is the result of a lazily transformed one with transform(lazy=False), the filter is eagerly applied to the transformed samples without materializing the transformed result. That is, the transformation will be applied again whenever a sample is retrieved after filter().

fn (callable): A filter function that takes a sample as input and returns a boolean. Samples that return False are discarded.

Returns: the filtered Dataset.

normalize_image(img)[source]
class d2l.mxnet.Vocab(tokens=None, min_freq=0, reserved_tokens=None)[source]

Bases: object

Vocabulary for text.

to_tokens(indices)[source]
property token_freqs
property unk
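
A minimal usage sketch (assuming d2l is installed; in the d2l source index 0 is reserved for the unknown token, and the remaining index assignments depend on token frequencies, so outputs are omitted):

>>> from d2l import mxnet as d2l
>>> tokens = [['the', 'time', 'machine'], ['the', 'time', 'traveller']]
>>> vocab = d2l.Vocab(tokens, min_freq=1, reserved_tokens=['<pad>'])
>>> vocab.token_freqs[:2]    # most frequent tokens first
>>> vocab.unk                # index of the unknown token
>>> vocab.to_tokens([1, 2])  # map indices back to tokens
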
d2l.mxnet.accuracy(y_hat, y)[source]

Compute the number of correct predictions.

Defined in Section 3.6

d2l.mxnet.annotate(text, xy, xytext)[source]
d2l.mxnet.argmax(x, *args, **kwargs)
d2l.mxnet.assign_anchor_to_bbox(ground_truth, anchors, device, iou_threshold=0.5)[source]

Assign closest ground-truth bounding boxes to anchor boxes.

Defined in Section 13.4

d2l.mxnet.astype(x, *args, **kwargs)
d2l.mxnet.batchify(data)[source]

Return a minibatch of examples for skip-gram with negative sampling.

Defined in Section 14.3

d2l.mxnet.bbox_to_rect(bbox, color)[source]

Convert bounding box to matplotlib format.

Defined in Section 13.3

d2l.mxnet.bleu(pred_seq, label_seq, k)[source]

Compute the BLEU.

Defined in Section 9.7.4
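
A minimal usage sketch (assuming d2l is installed): pred_seq and label_seq are space-separated token strings, and k is the longest n-gram order scored. A perfect match yields 1.0; shorter predictions are penalized by the brevity penalty:

>>> from d2l import mxnet as d2l
>>> d2l.bleu('he is calm', 'he is calm', k=2)
1.0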

d2l.mxnet.box_center_to_corner(boxes)[source]

Convert from (center, width, height) to (upper-left, lower-right).

Defined in Section 13.3

d2l.mxnet.box_corner_to_center(boxes)[source]

Convert from (upper-left, lower-right) to (center, width, height).

Defined in Section 13.3
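
A minimal round-trip sketch for the two conversions above (assuming d2l with MXNet is installed; the coordinates are illustrative):

>>> from mxnet import np, npx
>>> npx.set_np()
>>> from d2l import mxnet as d2l
>>> boxes = np.array([[100.0, 50.0, 200.0, 150.0]])  # (x1, y1, x2, y2)
>>> centers = d2l.box_corner_to_center(boxes)        # (cx, cy, w, h)
>>> d2l.box_center_to_corner(centers)                # recovers the input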

d2l.mxnet.box_iou(boxes1, boxes2)[source]

Compute pairwise IoU across two lists of anchor or bounding boxes.

Defined in Section 13.4

d2l.mxnet.build_array_nmt(lines, vocab, num_steps)[source]

Transform text sequences of machine translation into minibatches.

Defined in Section 9.5.4

d2l.mxnet.copyfile(filename, target_dir)[source]

Copy a file into a target directory.

Defined in Section 13.13

d2l.mxnet.corr2d(X, K)[source]

Compute 2D cross-correlation.

Defined in Section 6.2
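
A minimal usage sketch (assuming d2l with MXNet is installed; the result follows from sliding K over X and summing the elementwise products):

>>> from mxnet import np, npx
>>> npx.set_np()
>>> from d2l import mxnet as d2l
>>> X = np.array([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]])
>>> K = np.array([[0.0, 1.0], [2.0, 3.0]])
>>> d2l.corr2d(X, K)
array([[19., 25.],
       [37., 43.]])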

d2l.mxnet.count_corpus(tokens)[source]

Count token frequencies.

Defined in Section 8.2

d2l.mxnet.download(name, cache_dir='../data')[source]

Download a file inserted into DATA_HUB, return the local filename.

Defined in Section 4.10

d2l.mxnet.download_all()[source]

Download all files in the DATA_HUB.

Defined in Section 4.10

d2l.mxnet.download_extract(name, folder=None)[source]

Download and extract a zip/tar file.

Defined in Section 4.10

d2l.mxnet.evaluate_accuracy(net, data_iter)[source]

Compute the accuracy for a model on a dataset.

Defined in Section 3.6

d2l.mxnet.evaluate_accuracy_gpu(net, data_iter, device=None)[source]

Compute the accuracy for a model on a dataset using a GPU.

Defined in Section 6.6

d2l.mxnet.evaluate_accuracy_gpus(net, data_iter, split_f=<function split_batch>)[source]

Compute the accuracy for a model on a dataset using multiple GPUs.

Defined in Section 12.6

d2l.mxnet.evaluate_loss(net, data_iter, loss)[source]

Evaluate the loss of a model on the given dataset.

Defined in Section 4.4

d2l.mxnet.evaluate_ranking(net, test_input, seq, candidates, num_users, num_items, devices)[source]
d2l.mxnet.get_centers_and_contexts(corpus, max_window_size)[source]

Return center words and context words in skip-gram.

Defined in Section 14.3

d2l.mxnet.get_data_ch11(batch_size=10, n=1500)[source]

Defined in Section 11.5.2

d2l.mxnet.get_dataloader_workers()[source]

Use 4 processes to read the data, except on Windows.

Defined in Section 3.5

d2l.mxnet.get_fashion_mnist_labels(labels)[source]

Return text labels for the Fashion-MNIST dataset.

Defined in Section 3.5

d2l.mxnet.get_negatives(all_contexts, vocab, counter, K)[source]

Return noise words in negative sampling.

Defined in Section 14.3

d2l.mxnet.get_tokens_and_segments(tokens_a, tokens_b=None)[source]

Get tokens of the BERT input sequence and their segment IDs.

Defined in Section 14.8

d2l.mxnet.grad_clipping(net, theta)[source]

Clip the gradient.

Defined in Section 8.5

d2l.mxnet.hit_and_auc(rankedlist, test_matrix, k)[source]
d2l.mxnet.linreg(X, w, b)[source]

The linear regression model.

Defined in Section 3.2

d2l.mxnet.load_array(data_arrays, batch_size, is_train=True)[source]

Construct a Gluon data iterator.

Defined in Section 3.3

d2l.mxnet.load_corpus_time_machine(max_tokens=-1)[source]

Return token indices and the vocabulary of the time machine dataset.

Defined in Section 8.2

d2l.mxnet.load_data_bananas(batch_size)[source]

Load the banana detection dataset.

Defined in Section 13.6

d2l.mxnet.load_data_fashion_mnist(batch_size, resize=None)[source]

Download the Fashion-MNIST dataset and then load it into memory.

Defined in Section 3.5

d2l.mxnet.load_data_imdb(batch_size, num_steps=500)[source]

Return data iterators and the vocabulary of the IMDb review dataset.

Defined in Section 15.1

d2l.mxnet.load_data_ml100k(data, num_users, num_items, feedback='explicit')[source]
d2l.mxnet.load_data_nmt(batch_size, num_steps, num_examples=600)[source]

Return the iterator and the vocabularies of the translation dataset.

Defined in Section 9.5.4

d2l.mxnet.load_data_ptb(batch_size, max_window_size, num_noise_words)[source]

Download the PTB dataset and then load it into memory.

Defined in Section 14.3.5

d2l.mxnet.load_data_snli(batch_size, num_steps=50)[source]

Download the SNLI dataset and return data iterators and vocabulary.

Defined in Section 15.4

d2l.mxnet.load_data_time_machine(batch_size, num_steps, use_random_iter=False, max_tokens=10000)[source]

Return the iterator and the vocabulary of the time machine dataset.

Defined in Section 8.3

d2l.mxnet.load_data_voc(batch_size, crop_size)[source]

Load the VOC semantic segmentation dataset.

Defined in Section 13.9

d2l.mxnet.load_data_wiki(batch_size, max_len)[source]

Load the WikiText-2 dataset.

Defined in Section 14.9.1.2

d2l.mxnet.masked_softmax(X, valid_lens)[source]

Perform softmax operation by masking elements on the last axis.

Defined in Section 10.3
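
A minimal usage sketch (assuming d2l with MXNet is installed; outputs are random, but the masked trailing positions of each row receive zero probability):

>>> from mxnet import np, npx
>>> npx.set_np()
>>> from d2l import mxnet as d2l
>>> X = np.random.uniform(size=(2, 2, 4))    # (batch, queries, keys)
>>> d2l.masked_softmax(X, np.array([2, 3]))  # keep 2 keys in example 1, 3 in example 2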

d2l.mxnet.multibox_detection(cls_probs, offset_preds, anchors, nms_threshold=0.5, pos_threshold=0.009999999)[source]

Predict bounding boxes using non-maximum suppression.

Defined in Section 13.4.4

d2l.mxnet.multibox_prior(data, sizes, ratios)[source]

Generate anchor boxes with different shapes centered on each pixel.

Defined in Section 13.4

d2l.mxnet.multibox_target(anchors, labels)[source]

Label anchor boxes using ground-truth bounding boxes.

Defined in Section 13.4.3

d2l.mxnet.nms(boxes, scores, iou_threshold)[source]

Sort confidence scores of predicted bounding boxes.

Defined in Section 13.4.4

d2l.mxnet.numpy(x, *args, **kwargs)
d2l.mxnet.offset_boxes(anchors, assigned_bb, eps=1e-06)[source]

Transform for anchor box offsets.

Defined in Section 13.4.3

d2l.mxnet.offset_inverse(anchors, offset_preds)[source]

Predict bounding boxes based on anchor boxes with predicted offsets.

Defined in Section 13.4.3

d2l.mxnet.plot(X, Y=None, xlabel=None, ylabel=None, legend=None, xlim=None, ylim=None, xscale='linear', yscale='linear', fmts=('-', 'm--', 'g-.', 'r:'), figsize=(3.5, 2.5), axes=None)[source]

Plot data points.

Defined in Section 2.4

d2l.mxnet.predict_ch3(net, test_iter, n=6)[source]

Predict labels (defined in Chapter 3).

Defined in Section 3.6

d2l.mxnet.predict_ch8(prefix, num_preds, net, vocab, device)[source]

Generate new characters following the prefix.

Defined in Section 8.5

d2l.mxnet.predict_sentiment(net, vocab, sequence)[source]

Predict the sentiment of a text sequence.

Defined in Section 15.2

d2l.mxnet.predict_seq2seq(net, src_sentence, src_vocab, tgt_vocab, num_steps, device, save_attention_weights=False)[source]

Predict for sequence to sequence.

Defined in Section 9.7.4

d2l.mxnet.predict_snli(net, vocab, premise, hypothesis)[source]

Predict the logical relationship between the premise and hypothesis.

Defined in Section 15.5

d2l.mxnet.preprocess_nmt(text)[source]

Preprocess the English-French dataset.

Defined in Section 9.5

d2l.mxnet.read_csv_labels(fname)[source]

Read fname to return a filename to label dictionary.

Defined in Section 13.13

d2l.mxnet.read_data_bananas(is_train=True)[source]

Read the banana detection dataset images and labels.

Defined in Section 13.6

d2l.mxnet.read_data_ml100k()[source]
d2l.mxnet.read_data_nmt()[source]

Load the English-French dataset.

Defined in Section 9.5

d2l.mxnet.read_imdb(data_dir, is_train)[source]

Read the IMDb review dataset text sequences and labels.

Defined in Section 15.1

d2l.mxnet.read_ptb()[source]

Load the PTB dataset into a list of text lines.

Defined in Section 14.3

d2l.mxnet.read_snli(data_dir, is_train)[source]

Read the SNLI dataset into premises, hypotheses, and labels.

Defined in Section 15.4

d2l.mxnet.read_time_machine()[source]

Load the time machine dataset into a list of text lines.

Defined in Section 8.2

d2l.mxnet.read_voc_images(voc_dir, is_train=True)[source]

Read all VOC feature and label images.

Defined in Section 13.9

d2l.mxnet.reduce_sum(x, *args, **kwargs)
d2l.mxnet.reorg_test(data_dir)[source]

Organize the testing set for data loading during prediction.

Defined in Section 13.13

d2l.mxnet.reorg_train_valid(data_dir, labels, valid_ratio)[source]

Split the validation set out of the original training set.

Defined in Section 13.13

d2l.mxnet.reshape(x, *args, **kwargs)
d2l.mxnet.resnet18(num_classes)[source]

A slightly modified ResNet-18 model.

Defined in Section 12.6

d2l.mxnet.seq_data_iter_random(corpus, batch_size, num_steps)[source]

Generate a minibatch of subsequences using random sampling.

Defined in Section 8.3

d2l.mxnet.seq_data_iter_sequential(corpus, batch_size, num_steps)[source]

Generate a minibatch of subsequences using sequential partitioning.

Defined in Section 8.3

d2l.mxnet.set_axes(axes, xlabel, ylabel, xlim, ylim, xscale, yscale, legend)[source]

Set the axes for matplotlib.

Defined in Section 2.4

d2l.mxnet.set_figsize(figsize=(3.5, 2.5))[source]

Set the figure size for matplotlib.

Defined in Section 2.4

d2l.mxnet.sgd(params, lr, batch_size)[source]

Minibatch stochastic gradient descent.

Defined in Section 3.2

d2l.mxnet.show_bboxes(axes, bboxes, labels=None, colors=None)[source]

Show bounding boxes.

Defined in Section 13.4

d2l.mxnet.show_heatmaps(matrices, xlabel, ylabel, titles=None, figsize=(2.5, 2.5), cmap='Reds')[source]

Show heatmaps of matrices.

Defined in Section 10.1

d2l.mxnet.show_images(imgs, num_rows, num_cols, titles=None, scale=1.5)[source]

Plot a list of images.

Defined in Section 3.5

d2l.mxnet.show_list_len_pair_hist(legend, xlabel, ylabel, xlist, ylist)[source]

Plot the histogram for list length pairs.

Defined in Section 9.5

d2l.mxnet.show_trace_2d(f, results)[source]

Show the trace of 2D variables during optimization.

Defined in Section 11.3.1.1

d2l.mxnet.size(a)
d2l.mxnet.split_and_load_ml100k(split_mode='seq-aware', feedback='explicit', test_ratio=0.1, batch_size=256)[source]
d2l.mxnet.split_batch(X, y, devices)[source]

Split X and y into multiple devices.

Defined in Section 12.5

d2l.mxnet.split_batch_multi_inputs(X, y, devices)[source]

Split multi-input X and y into multiple devices.

Defined in Section 15.5

d2l.mxnet.split_data_ml100k(data, num_users, num_items, split_mode='random', test_ratio=0.1)[source]

Split the dataset in random mode or seq-aware mode.

d2l.mxnet.squared_loss(y_hat, y)[source]

Squared loss.

Defined in Section 3.2

d2l.mxnet.subsample(sentences, vocab)[source]

Subsample high-frequency words.

Defined in Section 14.3

d2l.mxnet.synthetic_data(w, b, num_examples)[source]

Generate y = Xw + b + noise.

Defined in Section 3.2
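
A minimal end-to-end sketch tying together synthetic_data, load_array, linreg, squared_loss, and sgd from this page (assuming d2l with MXNet is installed; the hyperparameters are illustrative):

>>> from mxnet import autograd, np, npx
>>> npx.set_np()
>>> from d2l import mxnet as d2l
>>> true_w, true_b = np.array([2.0, -3.4]), 4.2
>>> features, labels = d2l.synthetic_data(true_w, true_b, 1000)
>>> data_iter = d2l.load_array((features, labels), batch_size=10)
>>> w = np.random.normal(0, 0.01, (2, 1)); w.attach_grad()
>>> b = np.zeros(1); b.attach_grad()
>>> for X, y in data_iter:          # one epoch of minibatch SGD
...     with autograd.record():
...         l = d2l.squared_loss(d2l.linreg(X, w, b), y)
...     l.backward()
...     d2l.sgd([w, b], lr=0.03, batch_size=10)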

d2l.mxnet.to(x, *args, **kwargs)
d2l.mxnet.tokenize(lines, token='word')[source]

Split text lines into word or character tokens.

Defined in Section 8.2

d2l.mxnet.tokenize_nmt(text, num_examples=None)[source]

Tokenize the English-French dataset.

Defined in Section 9.5

d2l.mxnet.train_2d(trainer, steps=20, f_grad=None)[source]

Optimize a 2D objective function with a customized trainer.

Defined in Section 11.3.1.1

d2l.mxnet.train_batch_ch13(net, features, labels, loss, trainer, devices, split_f=<function split_batch>)[source]

Train for a minibatch with multiple GPUs (defined in Chapter 13).

Defined in Section 13.1

d2l.mxnet.train_ch11(trainer_fn, states, hyperparams, data_iter, feature_dim, num_epochs=2)[source]

Defined in Section 11.5.2

d2l.mxnet.train_ch13(net, train_iter, test_iter, loss, trainer, num_epochs, devices=[gpu(0), gpu(1), gpu(2), gpu(3)], split_f=<function split_batch>)[source]

Train a model with multiple GPUs (defined in Chapter 13).

Defined in Section 13.1

d2l.mxnet.train_ch3(net, train_iter, test_iter, loss, num_epochs, updater)[source]

Train a model (defined in Chapter 3).

Defined in Section 3.6

d2l.mxnet.train_ch6(net, train_iter, test_iter, num_epochs, lr, device)[source]

Train a model with a GPU (defined in Chapter 6).

Defined in Section 6.6

d2l.mxnet.train_ch8(net, train_iter, vocab, lr, num_epochs, device, use_random_iter=False)[source]

Train a model (defined in Chapter 8).

Defined in Section 8.5

d2l.mxnet.train_concise_ch11(tr_name, hyperparams, data_iter, num_epochs=2)[source]

Defined in Section 11.5.2

d2l.mxnet.train_epoch_ch3(net, train_iter, loss, updater)[source]

Train a model within one epoch (defined in Chapter 3).

Defined in Section 3.6

d2l.mxnet.train_epoch_ch8(net, train_iter, loss, updater, device, use_random_iter)[source]

Train a model within one epoch (defined in Chapter 8).

Defined in Section 8.5

d2l.mxnet.train_ranking(net, train_iter, test_iter, loss, trainer, test_seq_iter, num_users, num_items, num_epochs, devices, evaluator, candidates, eval_step=1)[source]
d2l.mxnet.train_recsys_rating(net, train_iter, test_iter, loss, trainer, num_epochs, devices=[gpu(0), gpu(1), gpu(2), gpu(3)], evaluator=None, **kwargs)[source]
d2l.mxnet.train_seq2seq(net, data_iter, lr, num_epochs, tgt_vocab, device)[source]

Train a model for sequence to sequence.

Defined in Section 9.7.2

d2l.mxnet.transpose(a)
d2l.mxnet.transpose_output(X, num_heads)[source]

Reverse the operation of transpose_qkv.

Defined in Section 10.5

d2l.mxnet.transpose_qkv(X, num_heads)[source]

Transposition for parallel computation of multiple attention heads.

Defined in Section 10.5

d2l.mxnet.truncate_pad(line, num_steps, padding_token)[source]

Truncate or pad sequences.

Defined in Section 9.5
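
A minimal usage sketch (assuming d2l is installed; the function operates on plain lists of token indices):

>>> from d2l import mxnet as d2l
>>> d2l.truncate_pad([1, 2, 3], num_steps=5, padding_token=0)           # pad up to length 5
[1, 2, 3, 0, 0]
>>> d2l.truncate_pad([1, 2, 3, 4, 5, 6], num_steps=5, padding_token=0)  # truncate
[1, 2, 3, 4, 5]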

d2l.mxnet.try_all_gpus()[source]

Return all available GPUs, or [cpu()] if no GPU exists.

Defined in Section 5.6

d2l.mxnet.try_gpu(i=0)[source]

Return gpu(i) if exists, otherwise return cpu().

Defined in Section 5.6
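
A minimal usage sketch (assuming d2l with MXNet is installed; the returned contexts depend on the machine, so outputs are omitted):

>>> from d2l import mxnet as d2l
>>> d2l.try_gpu()       # gpu(0) if available, else cpu()
>>> d2l.try_gpu(10)     # an out-of-range index falls back to cpu()
>>> d2l.try_all_gpus()  # all GPUs, or [cpu()] if none exists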

d2l.mxnet.update_D(X, Z, net_D, net_G, loss, trainer_D)[source]

Update discriminator.

Defined in Section 17.1

d2l.mxnet.update_G(Z, net_D, net_G, loss, trainer_G)[source]

Update generator.

Defined in Section 17.1

d2l.mxnet.use_svg_display()[source]

Use the svg format to display a plot in Jupyter.

Defined in Section 2.4

d2l.mxnet.voc_colormap2label()[source]

Build the mapping from RGB to class indices for VOC labels.

Defined in Section 13.9

d2l.mxnet.voc_label_indices(colormap, colormap2label)[source]

Map any RGB values in VOC labels to their class indices.

Defined in Section 13.9

d2l.mxnet.voc_rand_crop(feature, label, height, width)[source]

Randomly crop both feature and label images.

Defined in Section 13.9

class d2l.torch.Accumulator(n)[source]

Bases: object

For accumulating sums over n variables.

add(*args)[source]
reset()[source]
class d2l.torch.AddNorm(normalized_shape, dropout, **kwargs)[source]

Bases: torch.nn.modules.module.Module

Residual connection followed by layer normalization.

Defined in Section 10.7

forward(X, Y)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class d2l.torch.AdditiveAttention(key_size, query_size, num_hiddens, dropout, **kwargs)[source]

Bases: torch.nn.modules.module.Module

Additive attention.

Defined in Section 10.3

forward(queries, keys, values, valid_lens)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class d2l.torch.Animator(xlabel=None, ylabel=None, legend=None, xlim=None, ylim=None, xscale='linear', yscale='linear', fmts=('-', 'm--', 'g-.', 'r:'), nrows=1, ncols=1, figsize=(3.5, 2.5))[source]

Bases: object

For plotting data in animation.

add(x, y)[source]
class d2l.torch.AttentionDecoder(**kwargs)[source]

Bases: d2l.torch.Decoder

The base attention-based decoder interface.

Defined in Section 10.4

property attention_weights
training: bool
class d2l.torch.BERTEncoder(vocab_size, num_hiddens, norm_shape, ffn_num_input, ffn_num_hiddens, num_heads, num_layers, dropout, max_len=1000, key_size=768, query_size=768, value_size=768, **kwargs)[source]

Bases: torch.nn.modules.module.Module

BERT encoder.

Defined in Section 14.8.4

forward(tokens, segments, valid_lens)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class d2l.torch.BERTModel(vocab_size, num_hiddens, norm_shape, ffn_num_input, ffn_num_hiddens, num_heads, num_layers, dropout, max_len=1000, key_size=768, query_size=768, value_size=768, hid_in_features=768, mlm_in_features=768, nsp_in_features=768)[source]

Bases: torch.nn.modules.module.Module

The BERT model.

Defined in Section 14.8.5.2

forward(tokens, segments, valid_lens=None, pred_positions=None)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class d2l.torch.BananasDataset(is_train)[source]

Bases: torch.utils.data.dataset.Dataset

A customized dataset to load the banana detection dataset.

Defined in Section 13.6

class d2l.torch.Benchmark(description='Done')[source]

Bases: object

For measuring running time.

class d2l.torch.Decoder(**kwargs)[source]

Bases: torch.nn.modules.module.Module

The base decoder interface for the encoder-decoder architecture.

Defined in Section 9.6

forward(X, state)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

init_state(enc_outputs, *args)[source]
training: bool
class d2l.torch.DotProductAttention(dropout, **kwargs)[source]

Bases: torch.nn.modules.module.Module

Scaled dot product attention.

Defined in Section 10.3.2

forward(queries, keys, values, valid_lens=None)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class d2l.torch.Encoder(**kwargs)[source]

Bases: torch.nn.modules.module.Module

The base encoder interface for the encoder-decoder architecture.

forward(X, *args)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class d2l.torch.EncoderBlock(key_size, query_size, value_size, num_hiddens, norm_shape, ffn_num_input, ffn_num_hiddens, num_heads, dropout, use_bias=False, **kwargs)[source]

Bases: torch.nn.modules.module.Module

Transformer encoder block.

Defined in Section 10.7

forward(X, valid_lens)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class d2l.torch.EncoderDecoder(encoder, decoder, **kwargs)[source]

Bases: torch.nn.modules.module.Module

The base class for the encoder-decoder architecture.

Defined in Section 9.6

forward(enc_X, dec_X, *args)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class d2l.torch.MaskLM(vocab_size, num_hiddens, num_inputs=768, **kwargs)[source]

Bases: torch.nn.modules.module.Module

The masked language model task of BERT.

Defined in Section 14.8.4

forward(X, pred_positions)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class d2l.torch.MaskedSoftmaxCELoss(weight: Optional[torch.Tensor] = None, size_average=None, ignore_index: int = -100, reduce=None, reduction: str = 'mean', label_smoothing: float = 0.0)[source]

Bases: torch.nn.modules.loss.CrossEntropyLoss

The softmax cross-entropy loss with masks.

Defined in Section 9.7.2

forward(pred, label, valid_len)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

ignore_index: int
label_smoothing: float
class d2l.torch.MultiHeadAttention(key_size, query_size, value_size, num_hiddens, num_heads, dropout, bias=False, **kwargs)[source]

Bases: torch.nn.modules.module.Module

Multi-head attention.

Defined in Section 10.5

forward(queries, keys, values, valid_lens)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class d2l.torch.NextSentencePred(num_inputs, **kwargs)[source]

Bases: torch.nn.modules.module.Module

The next sentence prediction task of BERT.

Defined in Section 14.8.5.1

forward(X)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class d2l.torch.PositionWiseFFN(ffn_num_input, ffn_num_hiddens, ffn_num_outputs, **kwargs)[source]

Bases: torch.nn.modules.module.Module

Positionwise feed-forward network.

Defined in Section 10.7

forward(X)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class d2l.torch.PositionalEncoding(num_hiddens, dropout, max_len=1000)[source]

Bases: torch.nn.modules.module.Module

Positional encoding.

Defined in Section 10.6

forward(X)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class d2l.torch.RNNModel(rnn_layer, vocab_size, **kwargs)[source]

Bases: torch.nn.modules.module.Module

The RNN model.

Defined in Section 8.6

begin_state(device, batch_size=1)[source]
forward(inputs, state)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class d2l.torch.RNNModelScratch(vocab_size, num_hiddens, device, get_params, init_state, forward_fn)[source]

Bases: object

An RNN Model implemented from scratch.

begin_state(batch_size, device)[source]
class d2l.torch.RandomGenerator(sampling_weights)[source]

Bases: object

Randomly draw among {1, …, n} according to n sampling weights.

draw()[source]
class d2l.torch.Residual(input_channels, num_channels, use_1x1conv=False, strides=1)[source]

Bases: torch.nn.modules.module.Module

The Residual block of ResNet.

forward(X)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class d2l.torch.SNLIDataset(dataset, num_steps, vocab=None)[source]

Bases: torch.utils.data.dataset.Dataset

A customized dataset to load the SNLI dataset.

Defined in Section 15.4

class d2l.torch.Seq2SeqEncoder(vocab_size, embed_size, num_hiddens, num_layers, dropout=0, **kwargs)[source]

Bases: d2l.torch.Encoder

The RNN encoder for sequence to sequence learning.

Defined in Section 9.7

forward(X, *args)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class d2l.torch.SeqDataLoader(batch_size, num_steps, use_random_iter, max_tokens)[source]

Bases: object

An iterator to load sequence data.

class d2l.torch.Timer[source]

Bases: object

Record multiple running times.

avg()[source]

Return the average time.

cumsum()[source]

Return the accumulated time.

start()[source]

Start the timer.

stop()[source]

Stop the timer and record the time in a list.

sum()[source]

Return the sum of time.

class d2l.torch.TokenEmbedding(embedding_name)[source]

Bases: object

Token Embedding.

class d2l.torch.TransformerEncoder(vocab_size, key_size, query_size, value_size, num_hiddens, norm_shape, ffn_num_input, ffn_num_hiddens, num_heads, num_layers, dropout, use_bias=False, **kwargs)[source]

Bases: d2l.torch.Encoder

Transformer encoder.

Defined in Section 10.7

forward(X, valid_lens, *args)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class d2l.torch.VOCSegDataset(is_train, crop_size, voc_dir)[source]

Bases: torch.utils.data.dataset.Dataset

A customized dataset to load the VOC dataset.

Defined in Section 13.9

filter(imgs)[source]
normalize_image(img)[source]
class d2l.torch.Vocab(tokens=None, min_freq=0, reserved_tokens=None)[source]

Bases: object

Vocabulary for text.

to_tokens(indices)[source]
property token_freqs
property unk
d2l.torch.abs(input, *, out=None) → Tensor

Computes the absolute value of each element in input.

\[\text{out}_{i} = |\text{input}_{i}| \tag{19.7.1}\]
Args:

input (Tensor): the input tensor.

Keyword args:

out (Tensor, optional): the output tensor.

Example:

>>> torch.abs(torch.tensor([-1, -2, 3]))
tensor([ 1,  2,  3])
d2l.torch.accuracy(y_hat, y)[source]

Compute the number of correct predictions.

Defined in Section 3.6
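
A minimal usage sketch (assuming d2l with PyTorch is installed; for two-dimensional y_hat the class with the largest score in each row is taken as the prediction):

>>> import torch
>>> from d2l import torch as d2l
>>> y_hat = torch.tensor([[0.1, 0.9], [0.8, 0.2]])
>>> y = torch.tensor([1, 1])
>>> d2l.accuracy(y_hat, y)   # the first prediction is correct, the second is not
1.0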

d2l.torch.annotate(text, xy, xytext)[source]
d2l.torch.arange(start=0, end, step=1, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor

Returns a 1-D tensor of size \(\left\lceil \frac{\text{end} - \text{start}}{\text{step}} \right\rceil\) with values from the interval [start, end) taken with common difference step beginning from start.

Note that non-integer step is subject to floating point rounding errors when comparing against end; to avoid inconsistency, we advise adding a small epsilon to end in such cases.

\[\text{out}_{i+1} = \text{out}_{i} + \text{step} \tag{19.7.2}\]
Args:

start (Number): the starting value for the set of points. Default: 0.

end (Number): the ending value for the set of points.

step (Number): the gap between each pair of adjacent points. Default: 1.

Keyword args:

out (Tensor, optional): the output tensor.

dtype (torch.dtype, optional): the desired data type of returned tensor. Default: if None, uses a global default (see torch.set_default_tensor_type()). If dtype is not given, infer the data type from the other input arguments. If any of start, end, or step are floating-point, the dtype is inferred to be the default dtype, see torch.get_default_dtype(). Otherwise, the dtype is inferred to be torch.int64.

layout (torch.layout, optional): the desired layout of returned Tensor. Default: torch.strided.

device (torch.device, optional): the desired device of returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.

requires_grad (bool, optional): If autograd should record operations on the returned tensor. Default: False.

Example:

>>> torch.arange(5)
tensor([ 0,  1,  2,  3,  4])
>>> torch.arange(1, 4)
tensor([ 1,  2,  3])
>>> torch.arange(1, 2.5, 0.5)
tensor([ 1.0000,  1.5000,  2.0000])
d2l.torch.argmax(x, *args, **kwargs)
d2l.torch.assign_anchor_to_bbox(ground_truth, anchors, device, iou_threshold=0.5)[source]

Assign closest ground-truth bounding boxes to anchor boxes.

Defined in Section 13.4

d2l.torch.astype(x, *args, **kwargs)
d2l.torch.batchify(data)[source]

Return a minibatch of examples for skip-gram with negative sampling.

Defined in Section 14.3

d2l.torch.bbox_to_rect(bbox, color)[source]

Convert bounding box to matplotlib format.

Defined in Section 13.3

d2l.torch.bleu(pred_seq, label_seq, k)[source]

Compute the BLEU.

Defined in Section 9.7.4

d2l.torch.box_center_to_corner(boxes)[source]

Convert from (center, width, height) to (upper-left, lower-right).

Defined in Section 13.3

d2l.torch.box_corner_to_center(boxes)[source]

Convert from (upper-left, lower-right) to (center, width, height).

Defined in Section 13.3

d2l.torch.box_iou(boxes1, boxes2)[source]

Compute pairwise IoU across two lists of anchor or bounding boxes.

Defined in Section 13.4

d2l.torch.build_array_nmt(lines, vocab, num_steps)[source]

Transform text sequences of machine translation into minibatches.

Defined in Section 9.5.4

d2l.torch.concat()

cat(tensors, dim=0, *, out=None) -> Tensor

Concatenates the given sequence of seq tensors in the given dimension. All tensors must either have the same shape (except in the concatenating dimension) or be empty.

torch.cat() can be seen as an inverse operation for torch.split() and torch.chunk().

torch.cat() can be best understood via examples.

Args:

tensors (sequence of Tensors): any python sequence of tensors of the same type. Non-empty tensors provided must have the same shape, except in the cat dimension.

dim (int, optional): the dimension over which the tensors are concatenated.

Keyword args:

out (Tensor, optional): the output tensor.

Example:

>>> x = torch.randn(2, 3)
>>> x
tensor([[ 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497]])
>>> torch.cat((x, x, x), 0)
tensor([[ 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497],
        [ 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497],
        [ 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497]])
>>> torch.cat((x, x, x), 1)
tensor([[ 0.6580, -1.0969, -0.4614,  0.6580, -1.0969, -0.4614,  0.6580,
         -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497, -0.1034, -0.5790,  0.1497, -0.1034,
         -0.5790,  0.1497]])
d2l.torch.copyfile(filename, target_dir)[source]

Copy a file into a target directory.

Defined in Section 13.13

d2l.torch.corr2d(X, K)[source]

Compute 2D cross-correlation.

Defined in Section 6.2

d2l.torch.cos(input, *, out=None) → Tensor

Returns a new tensor with the cosine of the elements of input.

\[\text{out}_{i} = \cos(\text{input}_{i}) \tag{19.7.3}\]
Args:

input (Tensor): the input tensor.

Keyword args:

out (Tensor, optional): the output tensor.

Example:

>>> a = torch.randn(4)
>>> a
tensor([ 1.4309,  1.2706, -0.8562,  0.9796])
>>> torch.cos(a)
tensor([ 0.1395,  0.2957,  0.6553,  0.5574])
d2l.torch.cosh(input, *, out=None) → Tensor

Returns a new tensor with the hyperbolic cosine of the elements of input.

\[\text{out}_{i} = \cosh(\text{input}_{i}) \tag{19.7.4}\]
Args:

input (Tensor): the input tensor.

Keyword args:

out (Tensor, optional): the output tensor.

Example:

>>> a = torch.randn(4)
>>> a
tensor([ 0.1632,  1.1835, -0.6979, -0.7325])
>>> torch.cosh(a)
tensor([ 1.0133,  1.7860,  1.2536,  1.2805])

Note

When input is on the CPU, the implementation of torch.cosh may use the Sleef library, which rounds very large results to infinity or negative infinity. See the Sleef documentation for details.

d2l.torch.count_corpus(tokens)[source]

Count token frequencies.

Defined in Section 8.2

d2l.torch.download(name, cache_dir='../data')[source]

Download a file inserted into DATA_HUB, return the local filename.

Defined in Section 4.10

d2l.torch.download_all()[source]

Download all files in the DATA_HUB.

Defined in Section 4.10

d2l.torch.download_extract(name, folder=None)[source]

Download and extract a zip/tar file.

Defined in Section 4.10

d2l.torch.evaluate_accuracy(net, data_iter)[source]

Compute the accuracy for a model on a dataset.

Defined in Section 3.6

d2l.torch.evaluate_accuracy_gpu(net, data_iter, device=None)[source]

Compute the accuracy for a model on a dataset using a GPU.

Defined in Section 6.6

d2l.torch.evaluate_loss(net, data_iter, loss)[source]

Evaluate the loss of a model on the given dataset.

Defined in Section 4.4

d2l.torch.exp(input, *, out=None) → Tensor

Returns a new tensor with the exponential of the elements of the input tensor input.

\[y_{i} = e^{x_{i}} \tag{19.7.5}\]
Args:

input (Tensor): the input tensor.

Keyword args:

out (Tensor, optional): the output tensor.

Example:

>>> torch.exp(torch.tensor([0, math.log(2.)]))
tensor([ 1.,  2.])
d2l.torch.eye(n, m=None, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor

Returns a 2-D tensor with ones on the diagonal and zeros elsewhere.

Args:

n (int): the number of rows.

m (int, optional): the number of columns, with default being n.

Keyword arguments:

out (Tensor, optional): the output tensor.

dtype (torch.dtype, optional): the desired data type of returned tensor. Default: if None, uses a global default (see torch.set_default_tensor_type()).

layout (torch.layout, optional): the desired layout of returned Tensor. Default: torch.strided.

device (torch.device, optional): the desired device of returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.

requires_grad (bool, optional): If autograd should record operations on the returned tensor. Default: False.

Returns:

Tensor: A 2-D tensor with ones on the diagonal and zeros elsewhere

Example:

>>> torch.eye(3)
tensor([[ 1.,  0.,  0.],
        [ 0.,  1.,  0.],
        [ 0.,  0.,  1.]])
d2l.torch.get_centers_and_contexts(corpus, max_window_size)[source]

Return center words and context words in skip-gram.

Defined in Section 14.3

d2l.torch.get_data_ch11(batch_size=10, n=1500)[source]

Defined in Section 11.5.2

d2l.torch.get_dataloader_workers()[source]

Use 4 processes to read the data.

Defined in Section 3.5

d2l.torch.get_fashion_mnist_labels(labels)[source]

Return text labels for the Fashion-MNIST dataset.

Defined in Section 3.5

d2l.torch.get_negatives(all_contexts, vocab, counter, K)[source]

Return noise words in negative sampling.

Defined in Section 14.3

d2l.torch.get_tokens_and_segments(tokens_a, tokens_b=None)[source]

Get tokens of the BERT input sequence and their segment IDs.

Defined in Section 14.8
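
A minimal usage sketch (assuming d2l is installed; the '<cls>' and '<sep>' markers and the 0/1 segment IDs follow the d2l convention for BERT inputs):

>>> from d2l import torch as d2l
>>> tokens, segments = d2l.get_tokens_and_segments(['a', 'b'], ['c'])
>>> tokens
['<cls>', 'a', 'b', '<sep>', 'c', '<sep>']
>>> segments
[0, 0, 0, 0, 1, 1]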

d2l.torch.grad_clipping(net, theta)[source]

Clip the gradient.

Defined in Section 8.5

d2l.torch.linreg(X, w, b)[source]

The linear regression model.

Defined in Section 3.2

d2l.torch.linspace(start, end, steps, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor

Creates a one-dimensional tensor of size steps whose values are evenly spaced from start to end, inclusive. That is, the values are:

\[(\text{start}, \text{start} + \frac{\text{end} - \text{start}}{\text{steps} - 1}, \ldots, \text{start} + (\text{steps} - 2) \cdot \frac{\text{end} - \text{start}}{\text{steps} - 1}, \text{end}) \tag{19.7.6}\]

From PyTorch 1.11 linspace requires the steps argument. Use steps=100 to restore the previous behavior.

Args:

start (float): the starting value for the set of points.

end (float): the ending value for the set of points.

steps (int): size of the constructed tensor.

Keyword arguments:

out (Tensor, optional): the output tensor.

dtype (torch.dtype, optional): the data type to perform the computation in. Default: if None, uses the global default dtype (see torch.get_default_dtype()) when both start and end are real, and corresponding complex dtype when either is complex.

layout (torch.layout, optional): the desired layout of returned Tensor. Default: torch.strided.

device (torch.device, optional): the desired device of returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.

requires_grad (bool, optional): If autograd should record operations on the returned tensor. Default: False.

Example:

>>> torch.linspace(3, 10, steps=5)
tensor([  3.0000,   4.7500,   6.5000,   8.2500,  10.0000])
>>> torch.linspace(-10, 10, steps=5)
tensor([-10.,  -5.,   0.,   5.,  10.])
>>> torch.linspace(start=-10, end=10, steps=5)
tensor([-10.,  -5.,   0.,   5.,  10.])
>>> torch.linspace(start=-10, end=10, steps=1)
tensor([-10.])
d2l.torch.load_array(data_arrays, batch_size, is_train=True)[source]

Construct a PyTorch data iterator.

Defined in Section 3.3
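
A minimal usage sketch (assuming d2l with PyTorch is installed; load_array wraps the arrays in a shuffling DataLoader when is_train=True):

>>> import torch
>>> from d2l import torch as d2l
>>> features, labels = torch.randn(100, 2), torch.randn(100, 1)
>>> data_iter = d2l.load_array((features, labels), batch_size=10)
>>> X, y = next(iter(data_iter))
>>> X.shape, y.shape
(torch.Size([10, 2]), torch.Size([10, 1]))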

d2l.torch.load_corpus_time_machine(max_tokens=-1)[source]

Return token indices and the vocabulary of the time machine dataset.

Defined in Section 8.2

d2l.torch.load_data_bananas(batch_size)[source]

Load the banana detection dataset.

Defined in Section 13.6

d2l.torch.load_data_fashion_mnist(batch_size, resize=None)[source]

Download the Fashion-MNIST dataset and then load it into memory.

Defined in Section 3.5

d2l.torch.load_data_imdb(batch_size, num_steps=500)[source]

Return data iterators and the vocabulary of the IMDb review dataset.

Defined in Section 15.1

d2l.torch.load_data_nmt(batch_size, num_steps, num_examples=600)[source]

Return the iterator and the vocabularies of the translation dataset.

Defined in Section 9.5.4

d2l.torch.load_data_ptb(batch_size, max_window_size, num_noise_words)[source]

Download the PTB dataset and then load it into memory.

Defined in Section 14.3.5

d2l.torch.load_data_snli(batch_size, num_steps=50)[source]

Download the SNLI dataset and return data iterators and vocabulary.

Defined in Section 15.4

d2l.torch.load_data_time_machine(batch_size, num_steps, use_random_iter=False, max_tokens=10000)[source]

Return the iterator and the vocabulary of the time machine dataset.

Defined in Section 8.3

d2l.torch.load_data_voc(batch_size, crop_size)[source]

Load the VOC semantic segmentation dataset.

Defined in Section 13.9

d2l.torch.load_data_wiki(batch_size, max_len)[source]

Load the WikiText-2 dataset.

Defined in Section 14.9.1.2

d2l.torch.log(input, *, out=None) → Tensor

Returns a new tensor with the natural logarithm of the elements of input.

\[y_{i} = \log_{e} (x_{i}) \tag{19.7.7}\]
Args:

input (Tensor): the input tensor.

Keyword args:

out (Tensor, optional): the output tensor.

Example:

>>> a = torch.randn(5)
>>> a
tensor([-0.7168, -0.5471, -0.8933, -1.4428, -0.1190])
>>> torch.log(a)
tensor([ nan,  nan,  nan,  nan,  nan])
d2l.torch.masked_softmax(X, valid_lens)[source]

Perform softmax operation by masking elements on the last axis.

Defined in Section 10.3
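
A minimal usage sketch (assuming d2l with PyTorch is installed; outputs are random, but each row still sums to 1 with the masked trailing keys at zero probability):

>>> import torch
>>> from d2l import torch as d2l
>>> X = torch.rand(2, 2, 4)                      # (batch, queries, keys)
>>> d2l.masked_softmax(X, torch.tensor([2, 3]))  # keep 2 keys in example 1, 3 in example 2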

d2l.torch.matmul(input, other, *, out=None) → Tensor

Matrix product of two tensors.

The behavior depends on the dimensionality of the tensors as follows:

  • If both tensors are 1-dimensional, the dot product (scalar) is returned.

  • If both arguments are 2-dimensional, the matrix-matrix product is returned.

  • If the first argument is 1-dimensional and the second argument is 2-dimensional, a 1 is prepended to its dimension for the purpose of the matrix multiply. After the matrix multiply, the prepended dimension is removed.

  • If the first argument is 2-dimensional and the second argument is 1-dimensional, the matrix-vector product is returned.

  • If both arguments are at least 1-dimensional and at least one argument is N-dimensional (where N > 2), then a batched matrix multiply is returned. If the first argument is 1-dimensional, a 1 is prepended to its dimension for the purpose of the batched matrix multiply and removed after. If the second argument is 1-dimensional, a 1 is appended to its dimension for the purpose of the batched matrix multiply and removed after. The non-matrix (i.e. batch) dimensions are broadcasted (and thus must be broadcastable). For example, if input is a \((j \times 1 \times n \times n)\) tensor and other is a \((k \times n \times n)\) tensor, out will be a \((j \times k \times n \times n)\) tensor.

    Note that the broadcasting logic only looks at the batch dimensions when determining if the inputs are broadcastable, and not the matrix dimensions. For example, if input is a \((j \times 1 \times n \times m)\) tensor and other is a \((k \times m \times p)\) tensor, these inputs are valid for broadcasting even though the final two dimensions (i.e. the matrix dimensions) are different. out will be a \((j \times k \times n \times p)\) tensor.

This operator supports TensorFloat32.

Note

The 1-dimensional dot product version of this function does not support an out parameter.

Arguments:

input (Tensor): the first tensor to be multiplied.

other (Tensor): the second tensor to be multiplied.

Keyword args:

out (Tensor, optional): the output tensor.

Example:

>>> # vector x vector
>>> tensor1 = torch.randn(3)
>>> tensor2 = torch.randn(3)
>>> torch.matmul(tensor1, tensor2).size()
torch.Size([])
>>> # matrix x vector
>>> tensor1 = torch.randn(3, 4)
>>> tensor2 = torch.randn(4)
>>> torch.matmul(tensor1, tensor2).size()
torch.Size([3])
>>> # batched matrix x broadcasted vector
>>> tensor1 = torch.randn(10, 3, 4)
>>> tensor2 = torch.randn(4)
>>> torch.matmul(tensor1, tensor2).size()
torch.Size([10, 3])
>>> # batched matrix x batched matrix
>>> tensor1 = torch.randn(10, 3, 4)
>>> tensor2 = torch.randn(10, 4, 5)
>>> torch.matmul(tensor1, tensor2).size()
torch.Size([10, 3, 5])
>>> # batched matrix x broadcasted matrix
>>> tensor1 = torch.randn(10, 3, 4)
>>> tensor2 = torch.randn(4, 5)
>>> torch.matmul(tensor1, tensor2).size()
torch.Size([10, 3, 5])
d2l.torch.multibox_detection(cls_probs, offset_preds, anchors, nms_threshold=0.5, pos_threshold=0.009999999)[source]

Predict bounding boxes using non-maximum suppression.

Defined in Section 13.4.4

d2l.torch.multibox_prior(data, sizes, ratios)[source]

Generate anchor boxes with different shapes centered on each pixel.

Defined in Section 13.4

d2l.torch.multibox_target(anchors, labels)[source]

Label anchor boxes using ground-truth bounding boxes.

Defined in Section 13.4.3

d2l.torch.nms(boxes, scores, iou_threshold)[source]

Sort confidence scores of predicted bounding boxes.

Defined in Section 13.4.4

d2l.torch.normal(mean, std, *, generator=None, out=None) → Tensor

Returns a tensor of random numbers drawn from separate normal distributions whose mean and standard deviation are given.

The mean is a tensor with the mean of each output element’s normal distribution

The std is a tensor with the standard deviation of each output element’s normal distribution

The shapes of mean and std don’t need to match, but the total number of elements in each tensor needs to be the same.

Note

When the shapes do not match, the shape of mean is used as the shape for the returned output tensor

Note

When std is a CUDA tensor, this function synchronizes its device with the CPU.

Args:

mean (Tensor): the tensor of per-element means.

std (Tensor): the tensor of per-element standard deviations.

Keyword args:

generator (torch.Generator, optional): a pseudorandom number generator for sampling.

out (Tensor, optional): the output tensor.

Example:

>>> torch.normal(mean=torch.arange(1., 11.), std=torch.arange(1, 0, -0.1))
tensor([  1.0425,   3.5672,   2.7969,   4.2925,   4.7229,   6.2134,
          8.0505,   8.1408,   9.0563,  10.0566])
d2l.torch.normal(mean=0.0, std, *, out=None) → Tensor

Similar to the function above, but the means are shared among all drawn elements.

Args:

mean (float, optional): the mean for all distributions

std (Tensor): the tensor of per-element standard deviations

Keyword args:

out (Tensor, optional): the output tensor.

Example:

>>> torch.normal(mean=0.5, std=torch.arange(1., 6.))
tensor([-1.2793, -1.0732, -2.0687,  5.1177, -1.2303])
d2l.torch.normal(mean, std=1.0, *, out=None) → Tensor

Similar to the function above, but the standard deviations are shared among all drawn elements.

Args:

mean (Tensor): the tensor of per-element means

std (float, optional): the standard deviation for all distributions

Keyword args:

out (Tensor, optional): the output tensor

Example:

>>> torch.normal(mean=torch.arange(1., 6.))
tensor([ 1.1552,  2.6148,  2.6535,  5.8318,  4.2361])
d2l.torch.normal(mean, std, size, *, out=None) → Tensor

Similar to the function above, but the means and standard deviations are shared among all drawn elements. The resulting tensor has size given by size.

Args:

mean (float): the mean for all distributions

std (float): the standard deviation for all distributions

size (int…): a sequence of integers defining the shape of the output tensor.

Keyword args:

out (Tensor, optional): the output tensor.

Example:

>>> torch.normal(2, 3, size=(1, 4))
tensor([[-1.3987, -1.9544,  3.6048,  0.7909]])
d2l.torch.numpy(x, *args, **kwargs)
d2l.torch.offset_boxes(anchors, assigned_bb, eps=1e-06)[source]

Transform for anchor box offsets.

Defined in Section 13.4.3

d2l.torch.offset_inverse(anchors, offset_preds)[source]

Predict bounding boxes based on anchor boxes with predicted offsets.

Defined in Section 13.4.3

d2l.torch.ones(*size, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor

Returns a tensor filled with the scalar value 1, with the shape defined by the variable argument size.

Args:
size (int…): a sequence of integers defining the shape of the output tensor. Can be a variable number of arguments or a collection like a list or tuple.

Keyword arguments:

out (Tensor, optional): the output tensor.

dtype (torch.dtype, optional): the desired data type of the returned tensor. Default: if None, uses a global default (see torch.set_default_tensor_type()).

layout (torch.layout, optional): the desired layout of the returned Tensor. Default: torch.strided.

device (torch.device, optional): the desired device of the returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.

requires_grad (bool, optional): If autograd should record operations on the returned tensor. Default: False.

Example:

>>> torch.ones(2, 3)
tensor([[ 1.,  1.,  1.],
        [ 1.,  1.,  1.]])

>>> torch.ones(5)
tensor([ 1.,  1.,  1.,  1.,  1.])
d2l.torch.plot(X, Y=None, xlabel=None, ylabel=None, legend=None, xlim=None, ylim=None, xscale='linear', yscale='linear', fmts=('-', 'm--', 'g-.', 'r:'), figsize=(3.5, 2.5), axes=None)[source]

Plot data points.

Defined in Section 2.4

d2l.torch.predict_ch3(net, test_iter, n=6)[source]

Predict labels (defined in Chapter 3).

Defined in Section 3.6

d2l.torch.predict_ch8(prefix, num_preds, net, vocab, device)[source]

Generate new characters following the prefix.

Defined in Section 8.5

d2l.torch.predict_sentiment(net, vocab, sequence)[source]

Predict the sentiment of a text sequence.

Defined in Section 15.2

d2l.torch.predict_seq2seq(net, src_sentence, src_vocab, tgt_vocab, num_steps, device, save_attention_weights=False)[source]

Predict for sequence to sequence.

Defined in Section 9.7.4

d2l.torch.predict_snli(net, vocab, premise, hypothesis)[source]

Predict the logical relationship between the premise and hypothesis.

Defined in Section 15.5

d2l.torch.preprocess_nmt(text)[source]

Preprocess the English-French dataset.

Defined in Section 9.5

d2l.torch.rand(*size, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor

Returns a tensor filled with random numbers from a uniform distribution on the interval \([0, 1)\).

The shape of the tensor is defined by the variable argument size.

Args:
size (int…): a sequence of integers defining the shape of the output tensor. Can be a variable number of arguments or a collection like a list or tuple.

Keyword args:

generator (torch.Generator, optional): a pseudorandom number generator for sampling

out (Tensor, optional): the output tensor.

dtype (torch.dtype, optional): the desired data type of the returned tensor. Default: if None, uses a global default (see torch.set_default_tensor_type()).

layout (torch.layout, optional): the desired layout of the returned Tensor. Default: torch.strided.

device (torch.device, optional): the desired device of the returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.

requires_grad (bool, optional): If autograd should record operations on the returned tensor. Default: False.

Example:

>>> torch.rand(4)
tensor([ 0.5204,  0.2503,  0.3525,  0.5673])
>>> torch.rand(2, 3)
tensor([[ 0.8237,  0.5781,  0.6879],
        [ 0.3816,  0.7249,  0.0998]])
d2l.torch.read_csv_labels(fname)[source]

Read fname to return a filename to label dictionary.

Defined in Section 13.13

d2l.torch.read_data_bananas(is_train=True)[source]

Read the banana detection dataset images and labels.

Defined in Section 13.6

d2l.torch.read_data_nmt()[source]

Load the English-French dataset.

Defined in Section 9.5

d2l.torch.read_imdb(data_dir, is_train)[source]

Read the IMDb review dataset text sequences and labels.

Defined in Section 15.1

d2l.torch.read_ptb()[source]

Load the PTB dataset into a list of text lines.

Defined in Section 14.3

d2l.torch.read_snli(data_dir, is_train)[source]

Read the SNLI dataset into premises, hypotheses, and labels.

Defined in Section 15.4

d2l.torch.read_time_machine()[source]

Load the time machine dataset into a list of text lines.

Defined in Section 8.2

d2l.torch.read_voc_images(voc_dir, is_train=True)[source]

Read all VOC feature and label images.

Defined in Section 13.9

d2l.torch.reduce_sum(x, *args, **kwargs)
d2l.torch.reorg_test(data_dir)[source]

Organize the testing set for data loading during prediction.

Defined in Section 13.13

d2l.torch.reorg_train_valid(data_dir, labels, valid_ratio)[source]

Split the validation set out of the original training set.

Defined in Section 13.13

d2l.torch.reshape(x, *args, **kwargs)
d2l.torch.resnet18(num_classes, in_channels=1)[source]

A slightly modified ResNet-18 model.

Defined in Section 12.6

d2l.torch.seq_data_iter_random(corpus, batch_size, num_steps)[source]

Generate a minibatch of subsequences using random sampling.

Defined in Section 8.3

d2l.torch.seq_data_iter_sequential(corpus, batch_size, num_steps)[source]

Generate a minibatch of subsequences using sequential partitioning.

Defined in Section 8.3
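
A quick sketch on a toy corpus (the initial offset is random, so the exact values vary between runs):

>>> my_seq = list(range(35))
>>> for X, Y in d2l.seq_data_iter_sequential(my_seq, batch_size=2, num_steps=5):
...     print('X:', X, '\nY:', Y)  # Y is X shifted one position to the right
...     break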

d2l.torch.sequence_mask(X, valid_len, value=0)[source]

Mask irrelevant entries in sequences.

Defined in Section 9.7.2
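
Entries beyond each sequence’s valid length are overwritten with value; a minimal sketch mirroring the book’s example:

>>> X = torch.tensor([[1, 2, 3], [4, 5, 6]])
>>> d2l.sequence_mask(X, torch.tensor([1, 2]))
tensor([[1, 0, 0],
        [4, 5, 0]])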

d2l.torch.set_axes(axes, xlabel, ylabel, xlim, ylim, xscale, yscale, legend)[source]

Set the axes for matplotlib.

Defined in Section 2.4

d2l.torch.set_figsize(figsize=(3.5, 2.5))[source]

Set the figure size for matplotlib.

Defined in Section 2.4

d2l.torch.sgd(params, lr, batch_size)[source]

Minibatch stochastic gradient descent.

Defined in Section 3.2
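
Each parameter is updated in place as param -= lr * param.grad / batch_size and its gradient is then zeroed; a minimal sketch with a hand-set gradient:

>>> w = torch.tensor([1.0], requires_grad=True)
>>> w.grad = torch.tensor([2.0])  # pretend a backward pass produced this gradient
>>> d2l.sgd([w], lr=0.1, batch_size=1)
>>> w  # 1.0 - 0.1 * 2.0 / 1
tensor([0.8000], requires_grad=True)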

d2l.torch.show_bboxes(axes, bboxes, labels=None, colors=None)[source]

Show bounding boxes.

Defined in Section 13.4

d2l.torch.show_heatmaps(matrices, xlabel, ylabel, titles=None, figsize=(2.5, 2.5), cmap='Reds')[source]

Show heatmaps of matrices.

Defined in Section 10.1

d2l.torch.show_images(imgs, num_rows, num_cols, titles=None, scale=1.5)[source]

Plot a list of images.

Defined in Section 3.5

d2l.torch.show_list_len_pair_hist(legend, xlabel, ylabel, xlist, ylist)[source]

Plot the histogram for list length pairs.

Defined in Section 9.5

d2l.torch.show_trace_2d(f, results)[source]

Show the trace of 2D variables during optimization.

Defined in Section 11.3.1.1

d2l.torch.sin(input, *, out=None) → Tensor

Returns a new tensor with the sine of the elements of input.

(19.7.8) \[\text{out}_{i} = \sin(\text{input}_{i})\]
Args:

input (Tensor): the input tensor.

Keyword args:

out (Tensor, optional): the output tensor.

Example:

>>> a = torch.randn(4)
>>> a
tensor([-0.5461,  0.1347, -2.7266, -0.2746])
>>> torch.sin(a)
tensor([-0.5194,  0.1343, -0.4032, -0.2711])
d2l.torch.sinh(input, *, out=None) → Tensor

Returns a new tensor with the hyperbolic sine of the elements of input.

(19.7.9) \[\text{out}_{i} = \sinh(\text{input}_{i})\]
Args:

input (Tensor): the input tensor.

Keyword args:

out (Tensor, optional): the output tensor.

Example:

>>> a = torch.randn(4)
>>> a
tensor([ 0.5380, -0.8632, -0.1265,  0.9399])
>>> torch.sinh(a)
tensor([ 0.5644, -0.9744, -0.1268,  1.0845])

Note

When input is on the CPU, the implementation of torch.sinh may use the Sleef library, which rounds very large results to infinity or negative infinity. See the Sleef documentation for details.

d2l.torch.size(x, *args, **kwargs)
d2l.torch.split_batch(X, y, devices)[source]

Split X and y into multiple devices.

Defined in Section 12.5

d2l.torch.squared_loss(y_hat, y)[source]

Squared loss.

Defined in Section 3.2
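
The loss is computed elementwise as (y_hat - y)**2 / 2; for example:

>>> y_hat = torch.tensor([2.0, 4.0])
>>> y = torch.tensor([1.0, 4.0])
>>> d2l.squared_loss(y_hat, y)
tensor([0.5000, 0.0000])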

d2l.torch.stack(tensors, dim=0, *, out=None) → Tensor

Concatenates a sequence of tensors along a new dimension.

All tensors need to be of the same size.

Arguments:

tensors (sequence of Tensors): sequence of tensors to concatenate

dim (int): dimension to insert. Has to be between 0 and the number of dimensions of concatenated tensors (inclusive)

Keyword args:

out (Tensor, optional): the output tensor.
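
Unlike torch.cat, torch.stack inserts a new dimension; a short sketch:

>>> x = torch.randn(2, 3)
>>> torch.stack((x, x), dim=0).size()  # new leading dimension
torch.Size([2, 2, 3])
>>> torch.stack((x, x), dim=2).size()  # new trailing dimension
torch.Size([2, 3, 2])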

d2l.torch.subsample(sentences, vocab)[source]

Subsample high-frequency words.

Defined in Section 14.3

d2l.torch.synthetic_data(w, b, num_examples)[source]

Generate y = Xw + b + noise.

Defined in Section 3.2
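
A minimal sketch (the weights and bias are illustrative): features has one row per example and labels has shape (num_examples, 1).

>>> true_w = torch.tensor([2.0, -3.4])
>>> features, labels = d2l.synthetic_data(true_w, 4.2, 1000)
>>> features.shape, labels.shape
(torch.Size([1000, 2]), torch.Size([1000, 1]))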

d2l.torch.tanh(input, *, out=None) → Tensor

Returns a new tensor with the hyperbolic tangent of the elements of input.

(19.7.10) \[\text{out}_{i} = \tanh(\text{input}_{i})\]
Args:

input (Tensor): the input tensor.

Keyword args:

out (Tensor, optional): the output tensor.

Example:

>>> a = torch.randn(4)
>>> a
tensor([ 0.8986, -0.7279,  1.1745,  0.2611])
>>> torch.tanh(a)
tensor([ 0.7156, -0.6218,  0.8257,  0.2553])
d2l.torch.tensor(data, *, dtype=None, device=None, requires_grad=False, pin_memory=False) → Tensor

Constructs a tensor with no autograd history (also known as a “leaf tensor”; see the autograd mechanics notes) by copying data.

Warning

When working with tensors prefer using torch.Tensor.clone(), torch.Tensor.detach(), and torch.Tensor.requires_grad_() for readability. Letting t be a tensor, torch.tensor(t) is equivalent to t.clone().detach(), and torch.tensor(t, requires_grad=True) is equivalent to t.clone().detach().requires_grad_(True).

See also

torch.as_tensor() preserves autograd history and avoids copies where possible. torch.from_numpy() creates a tensor that shares storage with a NumPy array.

Args:
data (array_like): Initial data for the tensor. Can be a list, tuple, NumPy ndarray, scalar, and other types.

Keyword args:

dtype (torch.dtype, optional): the desired data type of the returned tensor. Default: if None, infers data type from data.

device (torch.device, optional): the device of the constructed tensor. If None and data is a tensor then the device of data is used. If None and data is not a tensor then the result tensor is constructed on the CPU.

requires_grad (bool, optional): If autograd should record operations on the returned tensor. Default: False.

pin_memory (bool, optional): If set, the returned tensor is allocated in pinned memory. Works only for CPU tensors. Default: False.

Example:

>>> torch.tensor([[0.1, 1.2], [2.2, 3.1], [4.9, 5.2]])
tensor([[ 0.1000,  1.2000],
        [ 2.2000,  3.1000],
        [ 4.9000,  5.2000]])

>>> torch.tensor([0, 1])  # Type inference on data
tensor([ 0,  1])

>>> torch.tensor([[0.11111, 0.222222, 0.3333333]],
...              dtype=torch.float64,
...              device=torch.device('cuda:0'))  # creates a double tensor on a CUDA device
tensor([[ 0.1111,  0.2222,  0.3333]], dtype=torch.float64, device='cuda:0')

>>> torch.tensor(3.14159)  # Create a zero-dimensional (scalar) tensor
tensor(3.1416)

>>> torch.tensor([])  # Create an empty tensor (of size (0,))
tensor([])
d2l.torch.to(x, *args, **kwargs)
d2l.torch.tokenize(lines, token='word')[source]

Split text lines into word or character tokens.

Defined in Section 8.2
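
For example:

>>> d2l.tokenize(['the time machine', 'by h g wells'])
[['the', 'time', 'machine'], ['by', 'h', 'g', 'wells']]
>>> d2l.tokenize(['abc'], token='char')
[['a', 'b', 'c']]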

d2l.torch.tokenize_nmt(text, num_examples=None)[source]

Tokenize the English-French dataset.

Defined in Section 9.5

d2l.torch.train_2d(trainer, steps=20, f_grad=None)[source]

Optimize a 2D objective function with a customized trainer.

Defined in Section 11.3.1.1

d2l.torch.train_batch_ch13(net, X, y, loss, trainer, devices)[source]

Train for a minibatch with multiple GPUs (defined in Chapter 13).

Defined in Section 13.1

d2l.torch.train_ch11(trainer_fn, states, hyperparams, data_iter, feature_dim, num_epochs=2)[source]

Defined in Section 11.5.2

d2l.torch.train_ch13(net, train_iter, test_iter, loss, trainer, num_epochs, devices=[device(type='cuda', index=0), device(type='cuda', index=1), device(type='cuda', index=2), device(type='cuda', index=3)])[source]

Train a model with multiple GPUs (defined in Chapter 13).

Defined in Section 13.1

d2l.torch.train_ch3(net, train_iter, test_iter, loss, num_epochs, updater)[source]

Train a model (defined in Chapter 3).

Defined in Section 3.6

d2l.torch.train_ch6(net, train_iter, test_iter, num_epochs, lr, device)[source]

Train a model with a GPU (defined in Chapter 6).

Defined in Section 6.6

d2l.torch.train_ch8(net, train_iter, vocab, lr, num_epochs, device, use_random_iter=False)[source]

Train a model (defined in Chapter 8).

Defined in Section 8.5

d2l.torch.train_concise_ch11(trainer_fn, hyperparams, data_iter, num_epochs=4)[source]

Defined in Section 11.5.2

d2l.torch.train_epoch_ch3(net, train_iter, loss, updater)[source]

The training loop defined in Chapter 3.

Defined in Section 3.6

d2l.torch.train_epoch_ch8(net, train_iter, loss, updater, device, use_random_iter)[source]

Train a net within one epoch (defined in Chapter 8).

Defined in Section 8.5

d2l.torch.train_seq2seq(net, data_iter, lr, num_epochs, tgt_vocab, device)[source]

Train a model for sequence to sequence.

Defined in Section 9.7.2

d2l.torch.transpose(x, *args, **kwargs)
d2l.torch.transpose_output(X, num_heads)[source]

Reverse the operation of transpose_qkv.

Defined in Section 10.5

d2l.torch.transpose_qkv(X, num_heads)[source]

Transposition for parallel computation of multiple attention heads.

Defined in Section 10.5

d2l.torch.truncate_pad(line, num_steps, padding_token)[source]

Truncate or pad sequences.

Defined in Section 9.5
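
Lines longer than num_steps are truncated; shorter lines are padded with padding_token. For example:

>>> d2l.truncate_pad([47, 4], 5, 0)
[47, 4, 0, 0, 0]
>>> d2l.truncate_pad([1, 2, 3, 4, 5, 6], 5, 0)
[1, 2, 3, 4, 5]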

d2l.torch.try_all_gpus()[source]

Return all available GPUs, or [cpu(),] if no GPU exists.

Defined in Section 5.6

d2l.torch.try_gpu(i=0)[source]

Return gpu(i) if exists, otherwise return cpu().

Defined in Section 5.6
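
A sketch of both helpers on a machine with a single CUDA device (the output depends on the available hardware):

>>> d2l.try_gpu()
device(type='cuda', index=0)
>>> d2l.try_gpu(10)  # gpu(10) does not exist, so fall back to the CPU
device(type='cpu')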

d2l.torch.update_D(X, Z, net_D, net_G, loss, trainer_D)[source]

Update discriminator.

Defined in Section 17.1

d2l.torch.update_G(Z, net_D, net_G, loss, trainer_G)[source]

Update generator.

Defined in Section 17.1

d2l.torch.use_svg_display()[source]

Use the svg format to display a plot in Jupyter.

Defined in Section 2.4

d2l.torch.voc_colormap2label()[source]

Build the mapping from RGB to class indices for VOC labels.

Defined in Section 13.9

d2l.torch.voc_label_indices(colormap, colormap2label)[source]

Map any RGB values in VOC labels to their class indices.

Defined in Section 13.9

d2l.torch.voc_rand_crop(feature, label, height, width)[source]

Randomly crop both feature and label images.

Defined in Section 13.9

d2l.torch.zeros(*size, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor

Returns a tensor filled with the scalar value 0, with the shape defined by the variable argument size.

Args:
size (int…): a sequence of integers defining the shape of the output tensor. Can be a variable number of arguments or a collection like a list or tuple.

Keyword args:

out (Tensor, optional): the output tensor.

dtype (torch.dtype, optional): the desired data type of the returned tensor. Default: if None, uses a global default (see torch.set_default_tensor_type()).

layout (torch.layout, optional): the desired layout of the returned Tensor. Default: torch.strided.

device (torch.device, optional): the desired device of the returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.

requires_grad (bool, optional): If autograd should record operations on the returned tensor. Default: False.

Example:

>>> torch.zeros(2, 3)
tensor([[ 0.,  0.,  0.],
        [ 0.,  0.,  0.]])

>>> torch.zeros(5)
tensor([ 0.,  0.,  0.,  0.,  0.])
class d2l.tensorflow.Accumulator(n)[source]

Bases: object

For accumulating sums over n variables.

add(*args)[source]
reset()[source]
class d2l.tensorflow.AddNorm(*args, **kwargs)[source]

Bases: keras.engine.base_layer.Layer

Residual connection followed by layer normalization.

Defined in Section 10.7

call(X, Y, **kwargs)[source]

This is where the layer’s logic lives.

The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state in __init__(), or the build() method that is called automatically before call() executes the first time.

Args:
inputs: Input tensor, or dict/list/tuple of input tensors.

The first positional inputs argument is subject to special rules:

  • inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.

  • NumPy array or Python scalar values in inputs get cast as tensors.

  • Keras mask metadata is only collected from inputs.

  • Layers are built (build(input_shape) method) using shape info from inputs only.

  • input_spec compatibility is only checked against inputs.

  • Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.

  • The SavedModel input specification is generated using inputs only.

  • Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc. is only supported for inputs and not for tensors in positional and keyword arguments.

*args: Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.

**kwargs: Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:

  • training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.

  • mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).

Returns:

A tensor or list/tuple of tensors.

class d2l.tensorflow.AdditiveAttention(*args, **kwargs)[source]

Bases: keras.engine.base_layer.Layer

Additive attention.

Defined in Section 10.3

call(queries, keys, values, valid_lens, **kwargs)[source]

This is where the layer’s logic lives.

The remainder of the inherited keras Layer.call() docstring is identical to the copy reproduced under d2l.tensorflow.AddNorm.call() above.

class d2l.tensorflow.Animator(xlabel=None, ylabel=None, legend=None, xlim=None, ylim=None, xscale='linear', yscale='linear', fmts=('-', 'm--', 'g-.', 'r:'), nrows=1, ncols=1, figsize=(3.5, 2.5))[source]

Bases: object

For plotting data in animation.

add(x, y)[source]
class d2l.tensorflow.AttentionDecoder(*args, **kwargs)[source]

Bases: d2l.tensorflow.Decoder

The base attention-based decoder interface.

Defined in Section 10.4

property attention_weights
class d2l.tensorflow.Benchmark(description='Done')[source]

Bases: object

For measuring running time.

class d2l.tensorflow.Decoder(*args, **kwargs)[source]

Bases: keras.engine.base_layer.Layer

The base decoder interface for the encoder-decoder architecture.

Defined in Section 9.6

call(X, state, **kwargs)[source]

This is where the layer’s logic lives.

The remainder of the inherited keras Layer.call() docstring is identical to the copy reproduced under d2l.tensorflow.AddNorm.call() above.

init_state(enc_outputs, *args)[source]
class d2l.tensorflow.DotProductAttention(*args, **kwargs)[source]

Bases: keras.engine.base_layer.Layer

Scaled dot product attention.

Defined in Section 10.3.2

call(queries, keys, values, valid_lens, **kwargs)[source]

This is where the layer’s logic lives.

The remainder of the inherited keras Layer.call() docstring is identical to the copy reproduced under d2l.tensorflow.AddNorm.call() above.

class d2l.tensorflow.Encoder(*args, **kwargs)[source]

Bases: keras.engine.base_layer.Layer

The base encoder interface for the encoder-decoder architecture.

call(X, *args, **kwargs)[source]

This is where the layer’s logic lives.

The remainder of the inherited keras Layer.call() docstring is identical to the copy reproduced under d2l.tensorflow.AddNorm.call() above.

class d2l.tensorflow.EncoderBlock(*args, **kwargs)[source]

Bases: keras.engine.base_layer.Layer

Transformer encoder block.

Defined in Section 10.7

call(X, valid_lens, **kwargs)[source]

This is where the layer’s logic lives.

The remainder of the inherited keras Layer.call() docstring is identical to the copy reproduced under d2l.tensorflow.AddNorm.call() above.

class d2l.tensorflow.EncoderDecoder(*args, **kwargs)[source]

Bases: keras.engine.training.Model

The base class for the encoder-decoder architecture.

Defined in Section 9.6

call(enc_X, dec_X, *args, **kwargs)[source]

Calls the model on new inputs and returns the outputs as tensors.

In this case call() just reapplies all ops in the graph to the new inputs (e.g. build a new computational graph from the provided inputs).

Note: This method should not be called directly. It is only meant to be overridden when subclassing tf.keras.Model. To call a model on an input, always use the __call__() method, i.e. model(inputs), which relies on the underlying call() method.

Args:

inputs: Input tensor, or dict/list/tuple of input tensors.

training: Boolean or boolean scalar tensor, indicating whether to run the Network in training mode or inference mode.

mask: A mask or list of masks. A mask can be either a boolean tensor or None (no mask). For more details, check the guide [here](https://www.tensorflow.org/guide/keras/masking_and_padding).

Returns:

A tensor if there is a single output, or a list of tensors if there are more than one outputs.

class d2l.tensorflow.MaskedSoftmaxCELoss(valid_len)[source]

Bases: keras.losses.Loss

The softmax cross-entropy loss with masks.

Defined in Section 9.7.2

call(label, pred)[source]

Invokes the Loss instance.

Args:
y_true: Ground truth values. shape = [batch_size, d0, .. dN], except sparse loss functions such as sparse categorical crossentropy where shape = [batch_size, d0, .. dN-1]

y_pred: The predicted values. shape = [batch_size, d0, .. dN]

Returns:

Loss values with the shape [batch_size, d0, .. dN-1].

class d2l.tensorflow.MultiHeadAttention(*args, **kwargs)[source]

Bases: keras.engine.base_layer.Layer

Multi-head attention.

Defined in Section 10.5

call(queries, keys, values, valid_lens, **kwargs)[source]

This is where the layer’s logic lives.

The remainder of the inherited keras Layer.call() docstring is identical to the copy reproduced under d2l.tensorflow.AddNorm.call() above.

class d2l.tensorflow.PositionWiseFFN(*args, **kwargs)[source]

Bases: keras.engine.base_layer.Layer

Positionwise feed-forward network.

Defined in Section 10.7

call(X)[source]

This is where the layer’s logic lives.

The remainder of the inherited keras Layer.call() docstring is identical to the copy reproduced under d2l.tensorflow.AddNorm.call() above.

class d2l.tensorflow.PositionalEncoding(*args, **kwargs)[source]

Bases: keras.engine.base_layer.Layer

Positional encoding.

Defined in Section 10.6

call(X, **kwargs)[source]

This is where the layer’s logic lives.

The remainder of the inherited keras Layer.call() docstring is identical to the copy reproduced under d2l.tensorflow.AddNorm.call() above.

class d2l.tensorflow.RNNModel(*args, **kwargs)[source]

Bases: keras.engine.base_layer.Layer

Defined in Section 8.6

begin_state(*args, **kwargs)[source]
call(inputs, state)[source]

This is where the layer’s logic lives.

The remainder of the inherited keras Layer.call() docstring is identical to the copy reproduced under d2l.tensorflow.AddNorm.call() above.

class d2l.tensorflow.RNNModelScratch(vocab_size, num_hiddens, init_state, forward_fn, get_params)[source]

Bases: object

An RNN model implemented from scratch.

begin_state(batch_size, *args, **kwargs)[source]
class d2l.tensorflow.Residual(*args, **kwargs)[source]

Bases: keras.engine.training.Model

The Residual block of ResNet.

call(X)[source]

Calls the model on new inputs and returns the outputs as tensors.

The remainder of the inherited keras Model.call() docstring is identical to the copy reproduced under d2l.tensorflow.EncoderDecoder.call() above.

class d2l.tensorflow.Seq2SeqEncoder(*args, **kwargs)[source]

Bases: d2l.tensorflow.Encoder

The RNN encoder for sequence to sequence learning.

Defined in Section 9.7

call(X, *args, **kwargs)[source]

This is where the layer’s logic lives.

The remainder of the inherited keras Layer.call() docstring is identical to the copy reproduced under d2l.tensorflow.AddNorm.call() above.

class d2l.tensorflow.SeqDataLoader(batch_size, num_steps, use_random_iter, max_tokens)[source]

Bases: object

An iterator to load sequence data.

class d2l.tensorflow.Timer[source]

Bases: object

Record multiple running times.

avg()[source]

Return the average time.

cumsum()[source]

Return the accumulated time.

start()[source]

Start the timer.

stop()[source]

Stop the timer and record the time in a list.

sum()[source]

Return the sum of time.

class d2l.tensorflow.TrainCallback(net, train_iter, test_iter, num_epochs, device_name)[source]

Bases: keras.callbacks.Callback

A callback to visualize the training progress.

Defined in Section 6.6

on_epoch_begin(epoch, logs=None)[source]

Called at the start of an epoch.

Subclasses should override for any actions to run. This function should only be called during TRAIN mode.

Args:

epoch: Integer, index of epoch.

logs: Dict. Currently no data is passed to this argument for this method but that may change in the future.

on_epoch_end(epoch, logs)[source]

Called at the end of an epoch.

Subclasses should override for any actions to run. This function should only be called during TRAIN mode.

Args:

epoch: Integer, index of epoch.

logs: Dict, metric results for this training epoch, and for the validation epoch if validation is performed. Validation result keys are prefixed with val_. For the training epoch, the values of the Model’s metrics are returned. Example: {'loss': 0.2, 'accuracy': 0.7}.

class d2l.tensorflow.TransformerEncoder(*args, **kwargs)[source]

Bases: d2l.tensorflow.Encoder

Transformer encoder.

Defined in Section 10.7

call(X, valid_lens, **kwargs)[source]

This is where the layer’s logic lives.

The remainder of the inherited keras Layer.call() docstring is identical to the copy reproduced under d2l.tensorflow.AddNorm.call() above.

class d2l.tensorflow.Updater(params, lr)[source]

Bases: object

For updating parameters using minibatch stochastic gradient descent.

Defined in Section 3.6

class d2l.tensorflow.Vocab(tokens=None, min_freq=0, reserved_tokens=None)[source]

Bases: object

Vocabulary for text.

to_tokens(indices)[source]
property token_freqs
property unk
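
A sketch of typical usage (assuming the time machine corpus can be downloaded via DATA_HUB; the exact indices depend on token frequencies):

>>> tokens = d2l.tokenize(d2l.read_time_machine())
>>> vocab = d2l.Vocab(tokens)
>>> list(vocab.token_to_idx.items())[:4]
[('<unk>', 0), ('the', 1), ('i', 2), ('and', 3)]
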
d2l.tensorflow.accuracy(y_hat, y)[source]

Compute the number of correct predictions.

Defined in Section 3.6
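
Note that the count of correct predictions is returned, not the rate; a minimal sketch (assuming import tensorflow as tf and from d2l import tensorflow as d2l):

>>> y_hat = tf.constant([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
>>> y = tf.constant([0, 2])
>>> d2l.accuracy(y_hat, y)  # only the second prediction (argmax 2) is correct
1.0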

d2l.tensorflow.annotate(text, xy, xytext)[source]
d2l.tensorflow.bbox_to_rect(bbox, color)[source]

Convert bounding box to matplotlib format.

Defined in Section 13.3

d2l.tensorflow.bleu(pred_seq, label_seq, k)[source]

Compute the BLEU.

Defined in Section 9.7.4
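
k is the maximum n-gram order used in the score; a quick sketch with hypothetical sequences:

>>> pred, label = 'he is a good man', 'he is a good person'
>>> round(d2l.bleu(pred, label, k=2), 3)
0.832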

d2l.tensorflow.box_center_to_corner(boxes)[source]

Convert from (center, width, height) to (upper-left, lower-right).

Defined in Section 13.3

d2l.tensorflow.box_corner_to_center(boxes)[source]

Convert from (upper-left, lower-right) to (center, width, height).

Defined in Section 13.3
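
The two conversions are inverses of each other; a sketch with a hypothetical box:

>>> boxes = tf.constant([[60.0, 45.0, 378.0, 516.0]])  # (x1, y1, x2, y2)
>>> centers = d2l.box_corner_to_center(boxes)          # (cx, cy, w, h)
>>> bool(tf.reduce_all(d2l.box_center_to_corner(centers) == boxes))
True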

d2l.tensorflow.build_array_nmt(lines, vocab, num_steps)[source]

Transform text sequences of machine translation into minibatches.

Defined in Section 9.5.4

d2l.tensorflow.corr2d(X, K)[source]

Compute 2D cross-correlation.

d2l.tensorflow.count_corpus(tokens)[source]

Count token frequencies.

Defined in Section 8.2

d2l.tensorflow.download(name, cache_dir='../data')[source]

Download a file inserted into DATA_HUB, return the local filename.

Defined in Section 4.10

d2l.tensorflow.download_all()[source]

Download all files in the DATA_HUB.

Defined in Section 4.10

d2l.tensorflow.download_extract(name, folder=None)[source]

Download and extract a zip/tar file.

Defined in Section 4.10

d2l.tensorflow.evaluate_accuracy(net, data_iter)[source]

Compute the accuracy for a model on a dataset.

Defined in Section 3.6

d2l.tensorflow.evaluate_loss(net, data_iter, loss)[source]

Evaluate the loss of a model on the given dataset.

Defined in Section 4.4

d2l.tensorflow.get_data_ch11(batch_size=10, n=1500)[source]

Defined in Section 11.5.2

d2l.tensorflow.get_fashion_mnist_labels(labels)[source]

Return text labels for the Fashion-MNIST dataset.

Defined in Section 3.5

d2l.tensorflow.grad_clipping(grads, theta)[source]

Clip the gradient.

Defined in Section 8.5

d2l.tensorflow.linreg(X, w, b)[source]

The linear regression model.

Defined in Section 3.2

d2l.tensorflow.load_array(data_arrays, batch_size, is_train=True)[source]

Construct a TensorFlow data iterator.

Defined in Section 3.3
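
A minimal sketch wrapping in-memory arrays as a shuffled, batched tf.data pipeline:

>>> features = tf.random.normal((1000, 2))
>>> labels = tf.random.normal((1000, 1))
>>> data_iter = d2l.load_array((features, labels), batch_size=10)
>>> X, y = next(iter(data_iter))
>>> X.shape, y.shape
(TensorShape([10, 2]), TensorShape([10, 1]))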

d2l.tensorflow.load_corpus_time_machine(max_tokens=-1)[source]

Return token indices and the vocabulary of the time machine dataset.

Defined in Section 8.2

d2l.tensorflow.load_data_fashion_mnist(batch_size, resize=None)[source]

Download the Fashion-MNIST dataset and then load it into memory.

Defined in Section 3.5

d2l.tensorflow.load_data_nmt(batch_size, num_steps, num_examples=600)[source]

Return the iterator and the vocabularies of the translation dataset.

Defined in Section 9.5.4

d2l.tensorflow.load_data_time_machine(batch_size, num_steps, use_random_iter=False, max_tokens=10000)[source]

Return the iterator and the vocabulary of the time machine dataset.

Defined in Section 8.3

d2l.tensorflow.masked_softmax(X, valid_lens)[source]

Perform softmax operation by masking elements on the last axis.

Defined in Section 10.3
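
A shape-preserving sketch: entries beyond each valid length are replaced by a large negative value before the softmax, so their weights come out (numerically) zero.

>>> X = tf.random.uniform((2, 2, 4))
>>> d2l.masked_softmax(X, tf.constant([2, 3])).shape
TensorShape([2, 2, 4])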

d2l.tensorflow.numpy(x, *args, **kwargs)
d2l.tensorflow.plot(X, Y=None, xlabel=None, ylabel=None, legend=None, xlim=None, ylim=None, xscale='linear', yscale='linear', fmts=('-', 'm--', 'g-.', 'r:'), figsize=(3.5, 2.5), axes=None)[source]

Plot data points.

Defined in Section 2.4

d2l.tensorflow.predict_ch3(net, test_iter, n=6)[source]

Predict labels (defined in Chapter 3).

Defined in Section 3.6

d2l.tensorflow.predict_ch8(prefix, num_preds, net, vocab)[source]

Generate new characters following the prefix.

Defined in Section 8.5

d2l.tensorflow.predict_seq2seq(net, src_sentence, src_vocab, tgt_vocab, num_steps, save_attention_weights=False)[source]

Predict for sequence to sequence.

Defined in Section 9.7.4

d2l.tensorflow.preprocess_nmt(text)[source]

Preprocess the English-French dataset.

Defined in Section 9.5

d2l.tensorflow.read_data_nmt()[source]

Load the English-French dataset.

Defined in Section 9.5

d2l.tensorflow.read_time_machine()[source]

Load the time machine dataset into a list of text lines.

Defined in Section 8.2

d2l.tensorflow.seq_data_iter_random(corpus, batch_size, num_steps)[source]

Generate a minibatch of subsequences using random sampling.

Defined in Section 8.3

d2l.tensorflow.seq_data_iter_sequential(corpus, batch_size, num_steps)[source]

Generate a minibatch of subsequences using sequential partitioning.

Defined in Section 8.3

d2l.tensorflow.sequence_mask(X, valid_len, value=0)[source]

Mask irrelevant entries in sequences.

Defined in Section 9.7.2

d2l.tensorflow.set_axes(axes, xlabel, ylabel, xlim, ylim, xscale, yscale, legend)[source]

Set the axes for matplotlib.

Defined in Section 2.4

d2l.tensorflow.set_figsize(figsize=(3.5, 2.5))[source]

Set the figure size for matplotlib.

Defined in Section 2.4

d2l.tensorflow.sgd(params, grads, lr, batch_size)[source]

Minibatch stochastic gradient descent.

Defined in Section 3.2

d2l.tensorflow.show_heatmaps(matrices, xlabel, ylabel, titles=None, figsize=(2.5, 2.5), cmap='Reds')[source]

Show heatmaps of matrices.

Defined in Section 10.1

d2l.tensorflow.show_images(imgs, num_rows, num_cols, titles=None, scale=1.5)[source]

Plot a list of images.

Defined in Section 3.5

d2l.tensorflow.show_list_len_pair_hist(legend, xlabel, ylabel, xlist, ylist)[source]

Plot the histogram for list length pairs.

Defined in Section 9.5

d2l.tensorflow.show_trace_2d(f, results)[source]

Show the trace of 2D variables during optimization.

Defined in Section 11.3.1.1

d2l.tensorflow.size(a)
d2l.tensorflow.squared_loss(y_hat, y)[source]

Squared loss.

Defined in Section 3.2

d2l.tensorflow.synthetic_data(w, b, num_examples)[source]

Generate y = Xw + b + noise.

Defined in Section 3.2

d2l.tensorflow.tokenize(lines, token='word')[source]

Split text lines into word or character tokens.

Defined in Section 8.2

d2l.tensorflow.tokenize_nmt(text, num_examples=None)[source]

Tokenize the English-French dataset.

Defined in Section 9.5

d2l.tensorflow.train_2d(trainer, steps=20, f_grad=None)[source]

Optimize a 2D objective function with a customized trainer.

Defined in Section 11.3.1.1

d2l.tensorflow.train_ch11(trainer_fn, states, hyperparams, data_iter, feature_dim, num_epochs=2)[source]

Defined in Section 11.5.2

d2l.tensorflow.train_ch3(net, train_iter, test_iter, loss, num_epochs, updater)[source]

Train a model (defined in Chapter 3).

Defined in Section 3.6

d2l.tensorflow.train_ch6(net_fn, train_iter, test_iter, num_epochs, lr, device)[source]

Train a model with a GPU (defined in Chapter 6).

Defined in Section 6.6

d2l.tensorflow.train_ch8(net, train_iter, vocab, lr, num_epochs, strategy, use_random_iter=False)[source]

Train a model (defined in Chapter 8).

Defined in Section 8.5

d2l.tensorflow.train_concise_ch11(trainer_fn, hyperparams, data_iter, num_epochs=2)[source]

Defined in Section 11.5.2

d2l.tensorflow.train_epoch_ch3(net, train_iter, loss, updater)[source]

The training loop defined in Chapter 3.

Defined in Section 3.6

d2l.tensorflow.train_epoch_ch8(net, train_iter, loss, updater, use_random_iter)[source]

Train a model within one epoch (defined in Chapter 8).

Defined in Section 8.5

d2l.tensorflow.train_seq2seq(net, data_iter, lr, num_epochs, tgt_vocab, device)[source]

Train a model for sequence to sequence.

Defined in Section 9.7.2

d2l.tensorflow.transpose_output(X, num_heads)[source]

Reverse the operation of transpose_qkv.

Defined in Section 10.5

d2l.tensorflow.transpose_qkv(X, num_heads)[source]

Transposition for parallel computation of multiple attention heads.

Defined in Section 10.5

d2l.tensorflow.truncate_pad(line, num_steps, padding_token)[source]

Truncate or pad sequences.

Defined in Section 9.5

d2l.tensorflow.try_all_gpus()[source]

Return all available GPUs, or [cpu(),] if no GPU exists.

Defined in Section 5.6

d2l.tensorflow.try_gpu(i=0)[source]

Return gpu(i) if exists, otherwise return cpu().

Defined in Section 5.6

d2l.tensorflow.update_D(X, Z, net_D, net_G, loss, optimizer_D)[source]

Update discriminator.

Defined in Section 17.1

d2l.tensorflow.update_G(Z, net_D, net_G, loss, optimizer_G)[source]

Update generator.

Defined in Section 17.1

d2l.tensorflow.use_svg_display()[source]

Use the svg format to display a plot in Jupyter.

Defined in Section 2.4