## 2️⃣[Self-Attention with Relative Position Representations]

The specific form of $a_{ij}^K$: in the paper, $a_{ij}^K = w^K_{\mathrm{clip}(j-i,\,k)}$, where $\mathrm{clip}(x, k) = \max(-k, \min(k, x))$ limits the relative distance to $\pm k$, and $w^K_{-k}, \dots, w^K_{k}$ are learned relative-position embeddings added to the keys.
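A minimal NumPy sketch of how the clipped relative-position embeddings $a_{ij}^K$ enter the attention logits ($e_{ij} = x_i W^Q (x_j W^K + a_{ij}^K)^\top / \sqrt{d}$). The function names and single-head shapes here are illustrative, not from the paper:

```python
import numpy as np

def relative_position_index(seq_len, k):
    # clip(j - i, -k, k), shifted to [0, 2k] so it can index an embedding table
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return np.clip(j - i, -k, k) + k

def attention_logits_with_relative_keys(x, Wq, Wk, a_table, k):
    # x: (seq_len, d_model); a_table: (2k+1, d_head), the learned w^K vectors
    q = x @ Wq                       # (seq_len, d_head)
    key = x @ Wk                     # (seq_len, d_head)
    idx = relative_position_index(x.shape[0], k)
    a = a_table[idx]                 # (seq_len, seq_len, d_head) = a_{ij}^K
    d = q.shape[-1]
    # e_ij = q_i . (k_j + a_ij) / sqrt(d): content term plus relative-position term
    return (q @ key.T + np.einsum('id,ijd->ij', q, a)) / np.sqrt(d)
```

The paper also computes the position term with a reshaped batched matmul for efficiency; the `einsum` above is the direct form of the same quantity.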

## 3️⃣[WEIGHTED TRANSFORMER NETWORK FOR MACHINE TRANSLATION]

κ can be interpreted as a learned concatenation weight and α as a learned addition weight; in the paper both are trained jointly with the network and normalized so that each sums to one across the attention branches.
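A minimal sketch of the branched combination: each attention branch's output is scaled by κ, passed through the feed-forward block, and the results are summed with weights α. The function name and the identity-FFN default are my own illustration, not the paper's API:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def branched_attention(branch_outputs, kappa_logits, alpha_logits, ffn):
    # branch_outputs: list of M arrays (seq_len, d_model), one per attention branch
    # kappa, alpha: learned per-branch weights, normalized to sum to 1 (softmax
    # here is one way to enforce the paper's sum-to-one constraint)
    kappa = softmax(kappa_logits)
    alpha = softmax(alpha_logits)
    # out = sum_m alpha_m * FFN(kappa_m * head_m)
    return sum(a * ffn(k * h)
               for h, k, a in zip(branch_outputs, kappa, alpha))
```

With uniform logits and an identity FFN this reduces to a plain average of the scaled branches, which makes the roles of κ (scaling before the FFN) and α (mixing after it) easy to see.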

① pre-processing

② model

③ post-processing