Understanding the Encoder of the Transformer
Learn about the high-level structure and components of the encoder of the transformer.
The transformer consists of a stack of N encoders. As shown in the following figure, each encoder sends its output as input to the encoder above it, and the final encoder returns the representation of the given source sentence as output. That is, we feed the source sentence into the bottom encoder and obtain its representation from the top.
A stack of N encoders
Note that in the transformer paper "Attention Is All You Need," the authors used N=6, meaning that they stacked up six encoders, one above the other. However, we can try out different values of N. For simplicity and better understanding, let's keep N=2:
A stack of 2 encoders
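To make the stacking concrete, here is a minimal sketch using PyTorch's built-in modules (the framework is an assumption; this section doesn't prescribe one). The embedding size of 512 and the 8 attention heads follow the paper's defaults and are not part of this section's text:

```python
import torch
import torch.nn as nn

N = 2            # number of stacked encoder blocks
embed_dim = 512  # size of each token representation (paper's default)
num_heads = 8    # number of attention heads (paper's default)

# One encoder block; nn.TransformerEncoder stacks N copies of it
block = nn.TransformerEncoderLayer(
    d_model=embed_dim, nhead=num_heads, batch_first=True
)
encoder = nn.TransformerEncoder(block, num_layers=N)

# A toy "source sentence": 10 token embeddings in a batch of 1.
# Each block's output becomes the input to the block above it;
# the final block returns the representation of the sentence.
source = torch.rand(1, 10, embed_dim)
representation = encoder(source)
print(representation.shape)  # torch.Size([1, 10, 512])
```

Notice that the input and output shapes match, which is exactly what allows the encoders to be stacked one above the other.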
Encoder's components
Okay, the question is, how exactly does the encoder work? How does it generate the representation for the given source sentence (the input sentence)? To understand this, let's look inside the encoder and examine its components. The following figure shows the components of the encoder:
From the preceding figure, we can understand that all the encoder blocks are identical. We can also observe that each encoder block consists of two sublayers:
- Multi-head attention
- Feedforward network
Now, we will get into the details and learn how exactly these two sublayers work.
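Before we do, here is a rough sketch of a single encoder block, showing how the two sublayers fit together. The residual connections and layer normalization wrapped around each sublayer come from the original paper and are assumptions here, as are the dimensions (512-dimensional embeddings, 8 heads, 2048 feedforward units):

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    def __init__(self, embed_dim=512, num_heads=8, ff_dim=2048):
        super().__init__()
        # Sublayer 1: multi-head attention
        self.attention = nn.MultiheadAttention(
            embed_dim, num_heads, batch_first=True
        )
        # Sublayer 2: position-wise feedforward network
        self.feedforward = nn.Sequential(
            nn.Linear(embed_dim, ff_dim),
            nn.ReLU(),
            nn.Linear(ff_dim, embed_dim),
        )
        self.norm1 = nn.LayerNorm(embed_dim)
        self.norm2 = nn.LayerNorm(embed_dim)

    def forward(self, x):
        # Attention sublayer, with residual connection and layer norm
        # (the "Add & Norm" step from the paper)
        attn_out, _ = self.attention(x, x, x)
        x = self.norm1(x + attn_out)
        # Feedforward sublayer, again with residual connection and layer norm
        ff_out = self.feedforward(x)
        return self.norm2(x + ff_out)

block = EncoderBlock()
out = block(torch.rand(1, 10, 512))  # same shape in, same shape out
```

This is only a structural outline; the next sections explain what each of these two sublayers actually computes.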