Understanding the Encoder of the Transformer
Learn about the high-level structure and components of the encoder of the transformer.
The transformer consists of a stack of N encoders. As shown in the following figure, each encoder sends its output as input to the encoder above it, and the final encoder returns the representation of the given source sentence as output. That is, we feed the source sentence into the bottom encoder and obtain its representation from the top.
A stack of N encoders
Note that in the transformer paper "Attention Is All You Need," the authors used N=6, meaning that they stacked up six encoders, one above the other. However, we can try out different values of N. For simplicity and better understanding, let's keep N=2:
A stack of 2 encoders
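To make the stacking concrete, here is a minimal sketch using PyTorch's built-in modules (the framework is an assumption; this section doesn't prescribe one). The embedding size of 512 and the 8 attention heads follow the paper's defaults and are not part of this section's text:

```python
import torch
import torch.nn as nn

N = 2            # number of stacked encoder blocks
embed_dim = 512  # size of each token representation (paper's default)
num_heads = 8    # number of attention heads (paper's default)

# One encoder block; nn.TransformerEncoder stacks N copies of it
block = nn.TransformerEncoderLayer(
    d_model=embed_dim, nhead=num_heads, batch_first=True
)
encoder = nn.TransformerEncoder(block, num_layers=N)

# A toy "source sentence": 10 token embeddings in a batch of 1.
# Each block's output becomes the input to the block above it;
# the final block returns the representation of the sentence.
source = torch.rand(1, 10, embed_dim)
representation = encoder(source)
print(representation.shape)  # torch.Size([1, 10, 512])
```

Notice that the input and output shapes match, which is exactly what allows the encoders to be stacked one above the other.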
Encoder's components
Okay, the question is, how exactly does the encoder work? How does it generate the representation for the given source sentence (the input sentence)? To understand this, let's look inside the encoder and examine its components. The following figure shows the components of the encoder:
From the preceding figure, we can understand that all the encoder blocks are identical. We can also observe that each encoder block consists of two sublayers:
- Multi-head attention
- Feedforward network
Now, we will get into the details and learn how exactly these two sublayers work.
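Before we do, here is a rough sketch of a single encoder block, showing how the two sublayers fit together. The residual connections and layer normalization wrapped around each sublayer come from the original paper and are assumptions here, as are the dimensions (512-dimensional embeddings, 8 heads, 2048 feedforward units):

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    def __init__(self, embed_dim=512, num_heads=8, ff_dim=2048):
        super().__init__()
        # Sublayer 1: multi-head attention
        self.attention = nn.MultiheadAttention(
            embed_dim, num_heads, batch_first=True
        )
        # Sublayer 2: position-wise feedforward network
        self.feedforward = nn.Sequential(
            nn.Linear(embed_dim, ff_dim),
            nn.ReLU(),
            nn.Linear(ff_dim, embed_dim),
        )
        self.norm1 = nn.LayerNorm(embed_dim)
        self.norm2 = nn.LayerNorm(embed_dim)

    def forward(self, x):
        # Attention sublayer, with residual connection and layer norm
        # (the "Add & Norm" step from the paper)
        attn_out, _ = self.attention(x, x, x)
        x = self.norm1(x + attn_out)
        # Feedforward sublayer, again with residual connection and layer norm
        ff_out = self.feedforward(x)
        return self.norm2(x + ff_out)

block = EncoderBlock()
out = block(torch.rand(1, 10, 512))  # same shape in, same shape out
```

This is only a structural outline; the next sections explain what each of these two sublayers actually computes.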