Currently, although indexing (i.e. `__getitem__` / `__setitem__`) is fully supported functionally, almost all of the index parsing runs in Python, which adds significant performance overhead.
In this PR, we move index parsing for dygraph mode from Python to Pybind to improve performance (static mode still parses in Python).
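To make the overhead concrete, here is a hypothetical, much-simplified sketch of the per-item dispatch that an index parser has to do on every `__getitem__` call. It is not Paddle's parser: `classify_index` and `Recorder` are illustrative names, and the real code also has to handle tensors, shapes and strides. The point is only that each indexing call walks the index tuple in interpreted Python, which is the kind of work this PR moves into C++.

```python
def classify_index(index):
    """Label each item of an index expression (illustrative only)."""
    # A bare index such as x[3] arrives as a single object, not a tuple.
    items = index if isinstance(index, tuple) else (index,)
    kinds = []
    for item in items:
        if item is None:
            kinds.append("newaxis")
        elif item is Ellipsis:
            kinds.append("ellipsis")
        elif isinstance(item, bool):
            # Must be tested before int, because bool is a subclass of int.
            kinds.append("bool")
        elif isinstance(item, int):
            kinds.append("int")
        elif isinstance(item, slice):
            kinds.append("slice")
        else:
            # Lists, arrays, tensors: these trigger advanced indexing.
            kinds.append("advanced")
    return kinds


class Recorder:
    """Stand-in for a tensor; returns the parsed kinds instead of data."""

    def __getitem__(self, index):
        return classify_index(index)


r = Recorder()
print(r[0, 1:3, None, ..., [1, 2]])
```

Every branch above runs in the interpreter for every indexing call, so for small tensors the parsing can cost more than the kernel it dispatches to.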
Other changes in this PR:
* Add a fast path for getting with a single index tensor (like `x[tensor[1,2]]`) by replacing the `stack` with `unsqueeze`.
* Remove unnecessary `transpose` calls in advanced getting.
* Remove the `any` check in bool-mask indexing, since the `any` kernel and the GPU->CPU memcpy are quite slow.
* Remove the `cond` in bool-mask indexing, since this conditional block currently cannot be parsed in PIR.
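The single-tensor fast path can be illustrated with plain Python lists. This is a sketch under an assumption: that advanced getting builds `gather_nd`-style coordinates by stacking the index tensors along a trailing axis. The helpers `stack_last`, `unsqueeze_last` and `gather_nd` below are hypothetical list-based stand-ins, not Paddle APIs. With only one index tensor, stacking produces the same coordinates as adding a trailing axis of size 1, so the cheaper operation can be used.

```python
def stack_last(index_lists):
    """Combine k equal-length index lists into n rows of k coordinates."""
    return [list(coords) for coords in zip(*index_lists)]


def unsqueeze_last(index_list):
    """Turn a length-n index list into n rows of 1 coordinate."""
    return [[i] for i in index_list]


def gather_nd(data, coords):
    """For each coordinate row, walk into the nested list `data`."""
    out = []
    for row in coords:
        item = data
        for c in row:
            item = item[c]
        out.append(item)
    return out


x = [[10, 11], [20, 21], [30, 31]]
idx = [1, 2]

general = gather_nd(x, stack_last([idx]))   # general path: stack all indices
fast = gather_nd(x, unsqueeze_last(idx))    # fast path: single index tensor
print(general == fast, fast)
```

With two or more index tensors the rows carry several coordinates each, so the general `stack` path is still needed there.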
To make review easier, the old index-parsing code will be deleted later.
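As a rough illustration of why the `any` check in bool-mask indexing can be dropped, the sketch below assumes the mask is lowered to "find the true positions, then gather". All function names are hypothetical and operate on Python lists, not tensors. As long as the gather step accepts an empty index list, an all-false mask already yields an empty result, so the up-front `any` pass (which on GPU would also force a device-to-host copy to read the flag) adds cost without changing the answer.

```python
def nonzero(mask):
    """Positions where the mask is true."""
    return [i for i, keep in enumerate(mask) if keep]


def gather(data, indices):
    """Select data at the given positions; an empty list gives an empty result."""
    return [data[i] for i in indices]


def select_with_any_check(data, mask):
    # Extra pass over the mask before doing any real work.
    if not any(mask):
        return []
    return gather(data, nonzero(mask))


def select_without_check(data, mask):
    # Same result, one pass fewer: the empty case falls out naturally.
    return gather(data, nonzero(mask))


data = [5, 6, 7, 8]
for mask in ([True, False, True, False], [False, False, False, False]):
    print(select_with_any_check(data, mask) == select_without_check(data, mask))
```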
Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.
* start down to pybind
* basic-getitem down
* fix compile error; basic-getitem seems to work fine
* finish advanced-getitem; runs well but still has bugs to fix
* fix parsing bug; getitem seems ok
* finish setitem; still has bugs to fix
* fix some bugs in advanced setitem
* fix CI error about bool-tensor indexing
* fix advanced-indexing CI error
* remove transpose if there is no need
* support single py_boolean in index
* add DenseTensor assertion to avoid errors in auto-parallel mode
* add fast path for a single int tensor index in getitem
* change set_value_dygraph_function to ad_func
* change vector type and fix the set_value_grad ut
* fix inplace version error in backward by assign
* remove any-check in bool tensor to boost getitem
* remove unused Python code, keep XPU code to be fixed; add the stack-to-unsqueeze fast path to the Python code
* fix py_bool CUDA error
* fix cuda config error when gpu_memory_available after all-false bool-tensor indexing
* fix ci in 0-size gather_nd indexing
* add dist tensor check
PR types
Performance optimization
PR changes
Others
Description
Pcard-66985