[XPU] feat: add xpu async memory copy to enable zero cost checkpoint #71168
Your PR was submitted successfully. Thank you for your contribution to the open-source project!
Branch updated from 033a925 to 406fb2d.
Branch updated from 406fb2d to dd8b145.
std::shared_ptr<distributed::XpuAsyncLoad::Task>>(
*m, "XpuAsyncLoadTask")
.def("is_completed",
&distributed::XpuAsyncLoad::Task::IsCompleted,
Review comment: This interface is not covered by any test.
return;
}
// platform::MemcpySyncH2D(dst, src, num, dst_place);
xpu_memcpy_async(dst, src, num, XPU_HOST_TO_DEVICE, stream);
Review comment: The return value is not checked here.
return;
}
// platform::MemcpySyncD2H(dst, src, num, src_place);
xpu_memcpy_async(dst, src, num, XPU_DEVICE_TO_HOST, stream);
Review comment: The return value is not checked.
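Both comments above point at the same gap: the status returned by xpu_memcpy_async is discarded, so a failed enqueue goes unnoticed. The sketch below shows the check-and-raise pattern in Python for illustration only. `fake_xpu_memcpy_async` is a hypothetical stand-in for the runtime call, and treating 0 as success (`XPU_SUCCESS`) is an assumption; in the C++ code the equivalent would be wrapping each call in Paddle's XPU enforce helper (assumed here to be `PADDLE_ENFORCE_XPU_SUCCESS`).

```python
XPU_SUCCESS = 0  # assumption: the runtime reports success as 0


class XpuRuntimeError(RuntimeError):
    """Raised when a (hypothetical) XPU runtime call reports failure."""


def fake_xpu_memcpy_async(dst, src, num, kind, stream, fail_code=XPU_SUCCESS):
    """Hypothetical stand-in for xpu_memcpy_async.

    Copies the first `num` elements immediately and returns a status code.
    `kind` and `stream` only mirror the real call's arguments; `fail_code`
    lets a caller inject an error.
    """
    if fail_code != XPU_SUCCESS:
        return fail_code
    dst[:num] = src[:num]
    return XPU_SUCCESS


def check_xpu_ret(ret, what):
    """Convert a non-zero status into an exception that names the failed call."""
    if ret != XPU_SUCCESS:
        raise XpuRuntimeError(f"{what} failed with XPU error code {ret}")


def memcpy_h2d(dst, src, num, stream=None, fail_code=XPU_SUCCESS):
    ret = fake_xpu_memcpy_async(dst, src, num, "XPU_HOST_TO_DEVICE", stream, fail_code)
    check_xpu_ret(ret, "xpu_memcpy_async(XPU_HOST_TO_DEVICE)")


def memcpy_d2h(dst, src, num, stream=None, fail_code=XPU_SUCCESS):
    ret = fake_xpu_memcpy_async(dst, src, num, "XPU_DEVICE_TO_HOST", stream, fail_code)
    check_xpu_ret(ret, "xpu_memcpy_async(XPU_DEVICE_TO_HOST)")


# Usage
host, device = [1, 2, 3], [0, 0, 0]
memcpy_h2d(device, host, 3)
print(device)  # [1, 2, 3]
try:
    memcpy_d2h(host, device, 3, fail_code=3)
except XpuRuntimeError as err:
    print(err)  # xpu_memcpy_async(XPU_DEVICE_TO_HOST) failed with XPU error code 3
```

Raising at the call site keeps the error attached to the copy that caused it, instead of surfacing later as corrupted checkpoint data.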
// (but let's store a CPU event just so we can return a reference).
// In a real design, you might do a separate approach.

phi::Place event_place = is_xpu_place(place) ? phi::CPUPlace() : place;
Review comment: Why doesn't XPU need to create an event?
data1 = paddle.randn([10, 10])
print_debug_info(data1, "data1 (for compute)")

# Offload data0 -> pinned memory (usually on CPU)
Review comment: I don't see anywhere that specifies the memory type as CPU pinned memory.
void XpuAsyncLoad::SyncCalcuStream(const Place& place,
phi::XPUContext* offload_ctx,
platform::DeviceEvent* calc_event) {
if (is_xpu_place(place)) {
Review comment: This place seems to be the offload source place, i.e. an XPU place. Why doesn't it need an event wait inserted?
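The question is about cross-stream ordering: if the offload copy runs on a different stream from the one that produces the tensor, and nothing makes the offload stream wait on an event recorded on the compute stream, the copy can read the source before the producing kernel finishes. The toy model below illustrates the hazard in Python, with single-worker thread pools standing in for XPU streams. `Stream`, `record_event`, `wait_event`, and `offload` are hypothetical names for illustration, not Paddle or XPU APIs.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class Stream:
    """Toy stream model: one worker thread runs work in submission order."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=1)

    def launch(self, fn, *args):
        return self._pool.submit(fn, *args)

    def record_event(self):
        # Completes only after all work queued earlier on this stream is done.
        return self._pool.submit(lambda: None)

    def wait_event(self, event):
        # Work queued later on this stream cannot start until `event` completes.
        self._pool.submit(event.result)

    def synchronize(self):
        self.record_event().result()

    def shutdown(self):
        self._pool.shutdown(wait=True)


def compute(buf):
    time.sleep(0.05)  # simulate a long kernel that produces the tensor
    buf[0] = 42


def offload_copy(src, dst):
    dst[0] = src[0]  # simulate the D2H copy issued on the offload stream


def offload(insert_event_wait):
    calc_stream, offload_stream = Stream(), Stream()
    src, dst = [0], [None]
    calc_stream.launch(compute, src)
    if insert_event_wait:
        # Analogous to recording calc_event and making the offload stream wait on it.
        offload_stream.wait_event(calc_stream.record_event())
    offload_stream.launch(offload_copy, src, dst)
    offload_stream.synchronize()
    calc_stream.shutdown()
    offload_stream.shutdown()
    return dst[0]


print(offload(insert_event_wait=True))   # 42: the copy ran after compute finished
print(offload(insert_event_wait=False))  # racy: may print 0, the copy saw stale data
```

Whether the real XPU path is safe without the wait depends on which streams the compute and the offload copy actually use, which the model does not decide.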
src.place(),
src_ptr,
size,
/*stream=*/nullptr);
Review comment: This passes a nullptr stream, whereas task->UpdateWaitChain(*load_ctx_) uses load_ctx_'s stream. That may cause a synchronization problem: task.wait can return successfully while the copy has not yet completed.
src.place(),
src_ptr,
size,
/*stream=*/nullptr);
Review comment: Same as above.
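The concern in the two comments above can be reproduced in a toy model. If the copy is enqueued on one stream (standing in for the nullptr stream) while the task's completion event is recorded on another (standing in for load_ctx_'s stream), waiting on that event says nothing about the copy. This is a minimal Python sketch with single-worker thread pools playing the role of XPU streams; `Stream`, `slow_copy`, and `offload_then_wait` are hypothetical names. Passing `copy_on_load_stream=True` models the fix the comment implies: issue the copy on the same stream the wait chain tracks.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class Stream:
    """Toy in-order stream: one worker thread executes work in submission order."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=1)

    def launch(self, fn, *args):
        return self._pool.submit(fn, *args)

    def record_event(self):
        # Completes once all work queued earlier on this stream has finished.
        return self._pool.submit(lambda: None)

    def shutdown(self):
        self._pool.shutdown(wait=True)


def slow_copy(dst, src):
    time.sleep(0.05)  # simulate a large memcpy
    dst[:] = src


def offload_then_wait(copy_on_load_stream):
    default_stream, load_stream = Stream(), Stream()
    src, dst = [1, 2, 3], [0, 0, 0]
    copy_stream = load_stream if copy_on_load_stream else default_stream
    copy_stream.launch(slow_copy, dst, src)
    # Analogous to task->UpdateWaitChain(*load_ctx_): only load_stream is tracked.
    task_event = load_stream.record_event()
    task_event.result()      # "task.wait()" returns here
    snapshot = list(dst)     # what the caller observes right after wait()
    default_stream.shutdown()
    load_stream.shutdown()
    return snapshot


print(offload_then_wait(copy_on_load_stream=True))   # [1, 2, 3]
print(offload_then_wait(copy_on_load_stream=False))  # may print [0, 0, 0]: wait() returned early
```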
return task;
}

/* ------------ Reload (CPU -> XPU) ------------ */
Review comment: For Reload, if the CPU source is not pinned memory, xpu_memcpy_async may fall back to a synchronous xpu_memcpy.
LGTM
LGTM
…addlePaddle#71168)
* [XPU] feat: add xpu async memory copy to enable zero cost checkpoint (same message on all 7 commits)
PR Category
Custom Device
PR Types
New features
Description
Add XPU asynchronous memory copy to enable zero-cost checkpointing.