DualPipe is an innovative bidirectional pipeline parallelism algorithm introduced in the DeepSeek-V3 Technical Report. It fully overlaps computation and communication across the forward and backward phases, and it also reduces pipeline bubbles. For detailed information on computation-communication overlap, please refer to the profile data.
Schedules
Example DualPipe scheduling for 8 PP ranks and 20 micro-batches in two directions.
The micro-batches in the reverse direction are symmetric to those in the forward direction, so
we omit their batch ID for illustration simplicity. Two cells enclosed by a shared black border
have mutually overlapped computation and communication.
DualPipeV
DualPipeV is a concise V-shape schedule derived from DualPipe using the "cut-in-half" procedure that Sea AI Lab introduced in their blog post. Thanks to them for this efficient schedule!
Schedules
Example DualPipeV scheduling for 4 PP ranks (8 PP stages) and 10 micro-batches.
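To make the "4 PP ranks (8 PP stages)" layout concrete, here is a minimal sketch of a V-shape stage placement. It assumes the pipeline is folded so that rank `i` holds stage `i` on the way down and stage `2 * num_ranks - 1 - i` on the way back up; that mapping and the helper name `v_shape_stages` are illustrative assumptions, not taken from the text above.

```python
def v_shape_stages(rank: int, num_ranks: int) -> tuple[int, int]:
    """Return the two pipeline stages held by `rank` under an assumed V-shape fold.

    Hypothetical helper: stage indices run from 0 to 2 * num_ranks - 1, and the
    second half of the stages is laid out in reverse rank order.
    """
    if not 0 <= rank < num_ranks:
        raise ValueError("rank must be in [0, num_ranks)")
    return rank, 2 * num_ranks - 1 - rank


if __name__ == "__main__":
    # 4 PP ranks hosting 8 PP stages, as in the example schedule.
    for r in range(4):
        print(f"rank {r}: stages {v_shape_stages(r, 4)}")
```

Under this assumed placement, the first rank holds both the first and the last stage, so every rank hosts two stages on a single device.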
Pipeline Bubbles and Memory Usage Comparison (based on the same number of PP stages)
| Method | Bubble | Parameter Per Device | Activation Per Device | #Devices |
|---|---|---|---|---|
| 1F1B | (PP-1)(𝐹+𝐵) | 1× | PP | PP |
| ZB1P | (PP-1)(𝐹+𝐵-2𝑊) | 1× | PP | PP |
| DualPipe | (PP/2-1)(𝐹&𝐵+𝐵-3𝑊) | 2× | PP+1 | PP |
| DualPipeV | (PP/2-1)(𝐹&𝐵+𝐵-3𝑊) | 2× | PP+1 | PP/2 |
PP denotes the number of PP (pipeline parallel) stages, which is even.
𝐹 denotes the execution time of a forward chunk, 𝐵 denotes the execution time of a
full backward chunk, 𝑊 denotes the execution time of a "backward for weights" chunk, and 𝐹&𝐵
denotes the execution time of two mutually overlapped forward and backward chunks.