You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
SimplifiedTransformer simplifies transformer block without affecting training. Skip connections, projection parameters, sequential sub-blocks, and normalization layers are removed. Experimental results confirm similar training speed and performance.
The author presents an implementation for Simplifying Transformer Blocks. The standard transformer blocks are complex and can lead to architecture instability. In this work, the author investigates how the standard transformer block can be simplified. Through signal propagation theory and empirical observations, the author proposes modifications that remove several components without sacrificing training speed or performance. The simplified transformers achieve the same training speed and performance as standard transformers, while being 15% faster in training throughput and using 15% fewer parameters.
SimplifiedTransformer simplifies transformer block without affecting training. Skip connections, projection parameters, sequential sub-blocks, and normalization layers are removed. Experimental results confirm similar training speed and performance.