BERT Architecture
Transformer Encoder
The transformer encoder is the core architecture of
BERT: a stack of identically structured layers. Each
layer contains a multi-head self-attention sublayer
and a position-wise feed-forward network, with each
sublayer wrapped in a residual connection followed by
layer normalization. The encoder processes all input
tokens simultaneously, allowing the model to capture
bidirectional relationships in text.
FFN(x) = max(0, x W1 + b1) W2 + b2
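The layer structure above can be sketched in NumPy. This is a minimal illustration, not BERT's actual implementation: the weight names, head count, and dimensions are placeholders, and details such as dropout, attention masking, and bias terms in the projections are omitted.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # normalize each token vector to zero mean, unit variance
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(x, Wq, Wk, Wv, Wo, n_heads):
    # x: (seq_len, d_model); split projections into n_heads heads
    seq_len, d_model = x.shape
    d_head = d_model // n_heads
    def split(W):
        return (x @ W).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    q, k, v = split(Wq), split(Wk), split(Wv)
    # scaled dot-product attention per head
    scores = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head))
    out = (scores @ v).transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ Wo

def ffn(x, W1, b1, W2, b2):
    # FFN(x) = max(0, x W1 + b1) W2 + b2  (ReLU between two linear maps)
    return np.maximum(0, x @ W1 + b1) @ W2 + b2

def encoder_layer(x, p, n_heads=2):
    # sublayer 1: self-attention, then residual + layer norm
    x = layer_norm(x + multi_head_attention(
        x, p["Wq"], p["Wk"], p["Wv"], p["Wo"], n_heads))
    # sublayer 2: feed-forward network, then residual + layer norm
    return layer_norm(x + ffn(x, p["W1"], p["b1"], p["W2"], p["b2"]))
```

A quick shape check with toy dimensions (d_model = 8, d_ff = 16, 4 tokens): the layer maps a (4, 8) input to a (4, 8) output, so layers can be stacked.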