[Profile] Split record event into Global and Local for more accurate profile #62722

HydrogenSulfate · 2024-03-14T08:28:39Z

PR types

Function optimization

PR changes

Others

Description

Pcard-75624

This PR fixes the ambiguity of the node execution event. Its first segment covers the call to the GradNode function and should be attributed to the XXXGradNode computation cost. Its second segment covers potential gradient accumulation in the backward queue and should not be counted toward XXXGradNode; otherwise it misleads users who profile Paddle programs with Nsight or other visualization software.

So this PR uses Local_XXXGradNode to represent the execution time of the XXXGradNode function, and Global_XXXGradNode to represent the execution time of Local_XXXGradNode plus potential gradient accumulation. Thus the Global_XXXGradNode range should always be at least as long as the Local_XXXGradNode range, and Local_XXXGradNode is the more meaningful of the two for profiling.
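The nesting of the two ranges can be illustrated with a minimal sketch. It does not use Paddle's profiler: `EventLog`, `run_node`, and the fake integer clock are hypothetical names introduced only to show that the Local range sits inside the Global range, and that the Global range additionally covers accumulation.

```python
from contextlib import contextmanager


class EventLog:
    """Hypothetical stand-in for a profiler: stores (start, end) per event name."""

    def __init__(self):
        self.clock = 0      # fake clock, advanced manually for determinism
        self.spans = {}

    def tick(self, cost):
        # Pretend that `cost` units of work were done.
        self.clock += cost

    @contextmanager
    def record(self, name):
        start = self.clock
        try:
            yield
        finally:
            self.spans[name] = (start, self.clock)

    def duration(self, name):
        start, end = self.spans[name]
        return end - start


def run_node(log, node_name, compute_cost, accumulate_cost):
    # Global range: node execution plus potential gradient accumulation.
    with log.record(f"Global_{node_name}"):
        # Local range: only the grad node function itself.
        with log.record(f"Local_{node_name}"):
            log.tick(compute_cost)
        # Accumulation into the next node's input buffer (outside Local).
        log.tick(accumulate_cost)


log = EventLog()
run_node(log, "MultiplyGradNode", compute_cost=5, accumulate_cost=2)
local_time = log.duration("Local_MultiplyGradNode")
global_time = log.duration("Global_MultiplyGradNode")
```

With these assumed costs, the Local range measures only the node function, while the Global range also includes the accumulation step, so it can never be the shorter of the two.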

To achieve this, this PR modifies eager_gen.py and several manually written XXXnode.cc files, and moves event creation next to node execution so that node(s) skipped in the backward node queue are ignored (i.e.

).
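The effect of moving event creation next to node execution can be sketched with a toy queue. How Paddle decides that a node is skipped is not shown in this description, so the `skipped` flag, the `backward_events` helper, and the node names other than MultiplyGradNode are placeholders for illustration only.

```python
def backward_events(queue, event_before_skip_check):
    """Return the event names emitted while draining a toy backward queue.

    queue: list of (node_name, skipped) pairs; `skipped` is a hypothetical flag.
    event_before_skip_check: True mimics opening the event early,
    False mimics opening it right at node execution.
    """
    events = []
    for name, skipped in queue:
        if event_before_skip_check:
            # Early placement: an event appears even if the node never runs.
            events.append(name)
        if skipped:
            continue
        if not event_before_skip_check:
            # Placement next to execution: only executed nodes get an event.
            events.append(name)
        # The node function would be executed here.
    return events


queue = [
    ("MultiplyGradNode", False),
    ("AddGradNode", True),       # assumed to be skipped in the queue
    ("MatmulGradNode", False),
]
early_events = backward_events(queue, event_before_skip_check=True)
late_events = backward_events(queue, event_before_skip_check=False)
```

In this sketch the early placement reports an event for the skipped node, while the placement next to execution reports events only for nodes that actually run.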

before:

after:

As depicted below, Global_MultiplyGradNode includes 2 parts: Local_MultiplyGradNode execution (grad_output_tensors = (*node)(node_input_buffer->Buffers(), create_graph, is_general_grad);) and gradient accumulation (node_input_buffers_dict[next_node]->add(edge_rank.first, edge_rank.second, grad_output_tensor, create_graph);).

paddle-bot · 2024-03-14T08:28:44Z

Your PR has been submitted. Thanks for your contribution to the open-source project!
Please wait for the CI results first. See the Paddle CI Manual for details.

plit record event into Global and Local for more accurate profile

wanghuancoder

LGTM

cxxly

LGTM

sctivate py39

5ddb621

plit record event into Global and Local for more accurate profile

HydrogenSulfate force-pushed the fix_event_range branch 2 times, most recently from 3d481a4 to c05787e, March 15, 2024 03:20

update include file event_tracing.h

e6e2fb6

HydrogenSulfate force-pushed the fix_event_range branch from c05787e to e6e2fb6, March 15, 2024 03:20

wanghuancoder approved these changes Mar 15, 2024


luotao1 approved these changes Mar 15, 2024


zyfncg approved these changes Mar 15, 2024


cxxly approved these changes Mar 18, 2024


cxxly merged commit 5e7c7af into PaddlePaddle:develop Mar 18, 2024

HydrogenSulfate deleted the fix_event_range branch March 18, 2024 02:32
