[NIPS2017] Attention is all you need



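This paper is wildly popular and there are already plenty of write-ups online; here I summarize a few of my own thoughts after reading it.
After the first read-through I had a lot of questions, and I only felt I understood the paper once I had worked through each of them. This write-up may be incomplete; it is just a summary so that I can look things up when I forget.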
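I: What exactly are Q, K, and V?
In the traditional seq2seq framework:
query: the decoder hidden state $s_{t-1}$ of the seq2seq model, written $q_{t-1}$; Q is the matrix formed by stacking multiple queries
value: the encoder hidden state $h_i$ of the seq2seq model, written $v_i$; V is the matrix formed by stacking the $v_i$ of the n words in the input sequence
key: the vector obtained by applying a linear mapping to $h_i$, written $k_i$; K is formed in the same way as V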
In the Transformer of this paper, reading the text together with the architecture figure:
(1) encoder self-attention
The input sequence ($w_1, w_2, \dots, w_i, \dots, w_n$) is mapped to word embeddings ($x_1, x_2, \dots, x_i, \dots, x_n$). Then Q = ($x_1, x_2, \dots, x_i, \dots, x_n$), and K = V = Q. (Inside multi-head attention, each head then applies its own learned linear projections to Q, K, and V.)
(2) decoder self-attention
At t=0, the Q of decoder self-attention is the embedding of <bos>. At t=j, Q = ($E_{\text{<bos>}}$, $E_{y_1}$, …, $E_{y_{j-1}}$), where $y_{j-1}$ is the decoder output at step t=j-1. K = V = Q.
(3) encoder-decoder attention
K = V is the encoder output. Passing the encoder output to the decoder lets the decoder access information about the input sequence $X$, much like decoder-side attention in traditional seq2seq. Q is the output of the decoder self-attention sublayer.
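The three cases above differ only in where Q, K, and V come from. A minimal sketch in plain Python, assuming a single head with no learned projections, no positional encoding, no mask, and no residual or feed-forward layers; the function names and the toy embeddings `X` and `Y` are hypothetical:

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, with matrices stored as lists of row vectors."""
    d_k = len(K[0])
    output = []
    for q in Q:
        # One score per key: scaled dot product between this query and the key.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted sum of the value rows, column by column.
        output.append([sum(w * v[c] for w, v in zip(weights, V))
                       for c in range(len(V[0]))])
    return output

# Hypothetical toy embeddings: 3 source words, 2 target-side tokens (<bos>, y_1).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = [[0.5, 0.5], [1.0, 0.0]]

enc_out = attention(X, X, X)                   # (1) Q = K = V = X
dec_self = attention(Y, Y, Y)                  # (2) Q = K = V = Y (causal mask omitted here)
cross = attention(dec_self, enc_out, enc_out)  # (3) Q from the decoder, K = V from the encoder

# The number of output rows always follows Q, not K or V.
print(len(enc_out), len(dec_self), len(cross))
```

In case (3) the output has one row per decoder position even though K and V have one row per source word, which is how the decoder reads from the input sequence.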

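II: What is self-attention, how is it computed, and why use it?
(1) In the encoder stage of traditional seq2seq, the input $X = (x_1, x_2, \dots, x_i, \dots, x_n)$ is passed through an RNN or LSTM to obtain the sequence of hidden states $H = (h_1, h_2, \dots, h_i, \dots, h_n)$. This paper drops the RNN, so the encoder no longer has hidden states. What is self-attention computed on, then? The input sequence has n words; mapping each word to an embedding gives n embeddings, and these embeddings can stand in for the hidden states when computing self-attention. So Q is a matrix with n rows and $d_k$ columns holding the embeddings of the n words (taking the embedding width to be $d_k$, that is, setting aside the per-head projections from $d_{\text{model}}$), and Q = K = V. Why is Q called the query? Because each time, one word's embedding is used to compute how well it matches the embeddings of the words in the sentence (the other n-1 words and, in practice, itself). That match score is the attention weight, and this is what self-attention means.

For n words, this operation is carried out n times in total, once per word.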

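(2) First, the query is compared with each key to obtain a weight; commonly used similarity functions include the dot product, concatenation, and a perceptron.
A softmax function then normalizes these weights, and finally the values are summed, weighted by the normalized weights, to give the post-attention context.
(3) Every word in the sentence attends to all the words in that sentence. The goal is to learn the dependencies between words within the sentence and to capture its internal structure.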
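The three steps in (2), repeated once per word as described in (1), can be sketched as follows. This is an illustrative sketch rather than the paper's implementation: it assumes the plain dot product as the similarity function and uses the raw embeddings as query, key, and value; `attend` and the toy `embeddings` are hypothetical names and values.

```python
import math

def dot(u, v):
    """Dot-product similarity between two vectors."""
    return sum(a * b for a, b in zip(u, v))

def attend(query, keys, values, similarity=dot):
    """Return (weights, context) for one query against all keys and values."""
    # Step 1: similarity between the query and every key.
    scores = [similarity(query, k) for k in keys]
    # Step 2: softmax normalization (shifted by the max for numerical stability).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Step 3: weighted sum of the values gives the context vector.
    dim = len(values[0])
    context = [sum(w * v[c] for w, v in zip(weights, values)) for c in range(dim)]
    return weights, context

# Hypothetical embeddings for a 3-word sentence.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# n rounds: each word in turn acts as the query against the whole sentence.
results = [attend(x, embeddings, embeddings) for x in embeddings]

for i, (weights, context) in enumerate(results):
    print(i, [round(w, 3) for w in weights], [round(c, 3) for c in context])
```

Swapping `similarity` for another function (for example a small learned network) changes only step 1; the normalization and weighted sum stay the same.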
III: How to understand Masked Multi-Head Attention in decoder self-attention
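A minimal sketch of the masking idea, under the assumption (taken from the paper's description rather than from this post) that masking means setting the scores of future positions to negative infinity before the softmax, so that position i can only attend to positions up to and including i. A single head without learned projections is used, and the function names and toy embeddings are hypothetical.

```python
import math

NEG_INF = float("-inf")

def masked_softmax(scores, allowed):
    """Softmax in which disallowed positions receive exactly zero weight."""
    masked = [s if ok else NEG_INF for s, ok in zip(scores, allowed)]
    m = max(masked)                           # finite: a position may always see itself
    exps = [math.exp(s - m) for s in masked]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]

def masked_self_attention(Y):
    """Causal self-attention over the rows of Y; returns (weights, outputs)."""
    n, d = len(Y), len(Y[0])
    all_weights, outputs = [], []
    for i, q in enumerate(Y):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in Y]
        allowed = [j <= i for j in range(n)]  # hide positions after i
        w = masked_softmax(scores, allowed)
        all_weights.append(w)
        outputs.append([sum(wj * Y[j][c] for j, wj in enumerate(w)) for c in range(d)])
    return all_weights, outputs

# Hypothetical embeddings for <bos>, y_1, y_2.
Y = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
weights, outputs = masked_self_attention(Y)

for row in weights:
    print([round(w, 3) for w in row])  # entries above the diagonal are 0.0
```

With this mask, all target positions can be processed in one pass during training while each position still sees only the tokens a step-by-step decoder would have produced so far.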

IV: How should formula (1) and the scaling factor $\frac{1}{\sqrt{d_k}}$ be understood?

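Formula (1) is $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$. The $\mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)$ term is analogous to the computation of $a_{ij}$ (and, once multiplied by V, of $c_i$) in traditional seq2seq attention; Q and K play the roles of $s_{i-1}$ and $h_j$ in the computation of $e_{ij}$.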
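For reference, the seq2seq quantities named here are usually written as below. The notation is an assumption based on the standard additive-attention formulation of seq2seq; $a(\cdot,\cdot)$ stands for whichever scoring function the model uses and n is the source length.

$$
e_{ij} = a(s_{i-1}, h_j), \qquad
a_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{n} \exp(e_{ik})}, \qquad
c_i = \sum_{j=1}^{n} a_{ij}\, h_j
$$

Read against formula (1): entry $(i, j)$ of $\frac{QK^T}{\sqrt{d_k}}$ corresponds to $e_{ij}$, the row-wise softmax produces $a_{ij}$, and multiplying by V gives the context $c_i$ for each query.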

In the figure above, Q is replaced by q and K by k.
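On the scaling factor itself, the paper's argument is that if the components of q and k are independent with mean 0 and variance 1, their dot product has mean 0 and variance $d_k$, so for large $d_k$ the scores become large and push the softmax into a region with very small gradients. A minimal numerical check of that claim, assuming Gaussian components and a hypothetical choice of $d_k = 64$ and 2000 samples:

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def variance(values):
    """Population variance of a list of floats."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

rng = random.Random(0)  # fixed seed so the run is repeatable
d_k = 64
n_samples = 2000

raw, scaled = [], []
for _ in range(n_samples):
    q = [rng.gauss(0.0, 1.0) for _ in range(d_k)]
    k = [rng.gauss(0.0, 1.0) for _ in range(d_k)]
    score = sum(a * b for a, b in zip(q, k))
    raw.append(score)
    scaled.append(score / math.sqrt(d_k))

var_raw = variance(raw)        # expected to be close to d_k
var_scaled = variance(scaled)  # expected to be close to 1

# Effect on the softmax: treat 10 of the scores as the logits of one query.
logits = raw[:10]
peak_raw = max(softmax(logits))
peak_scaled = max(softmax([s / math.sqrt(d_k) for s in logits]))

print(var_raw, var_scaled)    # spread of the scores before and after scaling
print(peak_raw, peak_scaled)  # largest attention weight before and after scaling
```

The unscaled softmax puts at least as much weight on its largest entry as the scaled one, which is the saturation that dividing by $\sqrt{d_k}$ is meant to reduce.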

