Weekly Knowledge Fragments 13

1️⃣[attention]

A summary of attention mechanisms:

Attention? Attention!


2️⃣[PyTorch]

①torch.no_grad() can significantly reduce memory usage; model.eval() cannot, because eval() does not turn off autograd's history tracking.

model.eval() will notify all your layers that you are in eval mode; that way, batchnorm or dropout layers will work in eval mode instead of training mode.
torch.no_grad() impacts the autograd engine and deactivates it. It will reduce memory usage and speed up computations, but you won't be able to backprop (which you don't want in an eval script).

Reference:
Does model.eval() & with torch.set_grad_enabled(is_train) have the same effect for grad history?

‘model.eval()’ vs ‘with torch.no_grad()’
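A minimal sketch of the difference (the model and shapes here are made up for illustration): eval() only switches layer behavior, while no_grad() is what actually stops the graph from being recorded.

```python
import torch
import torch.nn as nn

# A tiny model with dropout, just to show what eval() changes
model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.randn(2, 4)

# model.eval() switches layer behavior (dropout off, batchnorm uses
# running stats), but autograd still records the computation graph:
model.eval()
out = model(x)
print(out.requires_grad)  # True: history is still tracked

# torch.no_grad() disables the autograd engine, so no graph is built
# and intermediate activations are not kept around, saving memory:
with torch.no_grad():
    out = model(x)
print(out.requires_grad)  # False: no history, no backprop possible
```

In an evaluation script you typically want both: model.eval() for correct layer behavior and torch.no_grad() for the memory/speed savings.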

②torch.full(size, fill_value) returns a tensor of shape size in which every element equals fill_value.
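For example:

```python
import torch

# Create a 2x3 tensor where every element is 7.0
t = torch.full((2, 3), 7.0)
print(t)
# tensor([[7., 7., 7.],
#         [7., 7., 7.]])

# The dtype is inferred from fill_value; pass dtype= to override:
mask = torch.full((4,), -1, dtype=torch.long)
print(mask)
# tensor([-1, -1, -1, -1])
```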