[PHI][CINN] Fix paddle.where api for big tensor #72717

huangjiyi · 2025-05-14T12:07:49Z

PR Category

Operator Mechanism

PR Types

Bug fixes

Description

Pcard-85711

Fixes int overflow in the memory-access indices of paddle.where (the underlying kernels involved are SelectKernel and WhereGradCUDAKernel).

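Performance: SelectKernel with int64 indices is about 2% slower (not templated on the index type); WhereGradCUDAKernel with int64 indices is about 10% slower (templated on the index type).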
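To make the "templated" remark concrete, here is a minimal host-side sketch of how a where-grad loop might be templated on its index type so that small tensors keep the 32-bit path. It is a CPU stand-in, not the PR's kernel: the names `WhereGradLoop` and `WhereGrad`, and the dispatch on `numel`, are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// CPU stand-in for a where-grad loop, templated on the index type.
// For out = where(cond, x, y): dx gets dout where cond is true, dy elsewhere.
template <typename T, typename IndexT>
void WhereGradLoop(const bool* cond, const T* dout, T* dx, T* dy,
                   IndexT numel) {
  for (IndexT i = 0; i < numel; ++i) {
    dx[i] = cond[i] ? dout[i] : T(0);
    dy[i] = cond[i] ? T(0) : dout[i];
  }
}

// Hypothetical dispatcher: use 32-bit indices whenever they can address
// every element, and fall back to 64-bit indices only for big tensors.
// Returns true when the 64-bit path was taken.
template <typename T>
bool WhereGrad(const bool* cond, const T* dout, T* dx, T* dy, int64_t numel) {
  assert(numel >= 0);
  if (numel <= std::numeric_limits<int32_t>::max()) {
    WhereGradLoop<T, int32_t>(cond, dout, dx, dy,
                              static_cast<int32_t>(numel));
    return false;
  }
  WhereGradLoop<T, int64_t>(cond, dout, dx, dy, numel);
  return true;
}
```

With a dispatch of this shape, only tensors that actually need 64-bit indexing pay the extra cost; whether the merged kernel dispatches this way is not stated in the description.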

paddle-bot · 2025-05-14T12:07:54Z

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

wanghuancoder · 2025-05-16T01:31:59Z

paddle/phi/kernels/funcs/select_impl.cu.h

+  IdT thread_fix =
      (static_cast<int>(cumsum_thread[0] - num_thread[0]) * store_rank);
  // get how many data need to store
-  int store_num = static_cast<int>(num_thread[0]) * store_rank;
+  IdT store_num = static_cast<int>(num_thread[0]) * store_rank;
  // thread store num data, each thread may has different num


Is the static_cast here still needed?
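One possible reading of this question: if the operand is first cast to int, the multiplication is still carried out in int and only widened afterwards, so declaring the result as IdT does not help. The sketch below casts to the index type before multiplying. `ScaledCount` and its parameter names are hypothetical, and the real types of `num_thread` and `store_rank` are not shown in the quoted diff.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: scale a per-thread element count by the output rank,
// doing the arithmetic in the index type IdT rather than in int.
template <typename IdT, typename CountT>
inline IdT ScaledCount(CountT count, int store_rank) {
  assert(store_rank > 0);
  // Cast first, multiply second: the product is computed in IdT.
  return static_cast<IdT>(count) * store_rank;
}
```

For example, `ScaledCount<int64_t>(1500000000, 2)` is computed entirely in 64 bits, whereas the same product in int would exceed the 32-bit signed range.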

lshpku · 2025-05-16T02:04:37Z

paddle/phi/kernels/funcs/select_impl.cu.h

+  int64_t data_offset = BLOCK_ID_X * BLOCK_NUM_X * VecSize;
+  int64_t stride = BLOCK_NUM_X * GRID_NUM_X * VecSize;


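BLOCK_ID_X and BLOCK_NUM_X are both uint32_t, so the multiplication overflows before the result is ever assigned to data_offset. The cast should be applied to BLOCK_ID_X; see

Paddle/paddle/phi/backends/gpu/cuda/cuda_helper.h

Line 75 in 8165526

static_cast<int64_t>(blockIdx.x) * blockDim.x + threadIdx.x; \

Once the first operand is cast, the remaining operands are promoted automatically.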
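The point can be reproduced on the host with plain 32-bit unsigned values. In this sketch, `LaunchIndex`, `OffsetNarrow`, and `OffsetWide` are hypothetical stand-ins for the kernel's built-in indices and offset computation, and it assumes a platform where `unsigned int` is 32 bits wide.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the kernel's built-in values, which are
// 32-bit unsigned like blockIdx.x and blockDim.x in CUDA.
struct LaunchIndex {
  uint32_t block_id;   // plays the role of BLOCK_ID_X
  uint32_t block_dim;  // plays the role of BLOCK_NUM_X
};

// Problematic form: the product is computed in 32 bits and wraps around
// before it is widened to int64_t by the return conversion.
inline int64_t OffsetNarrow(LaunchIndex idx, uint32_t vec_size) {
  return idx.block_id * idx.block_dim * vec_size;
}

// Suggested form: casting the first operand makes every later
// multiplication in the chain happen in int64_t.
inline int64_t OffsetWide(LaunchIndex idx, uint32_t vec_size) {
  assert(vec_size > 0);
  return static_cast<int64_t>(idx.block_id) * idx.block_dim * vec_size;
}
```

With `block_id = 4194304`, `block_dim = 256`, and `vec_size = 4`, the true product is 2^32: `OffsetNarrow` wraps to 0, while `OffsetWide` returns 4294967296.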

* refine forrange (#72360)
* refine forrange
* refine forrange
* reduce support big tensor (#71970)
* reduce support big tensor
* [PHI] Fix gridDim limit for reduce kernel (#72507)
* [API] isclose support bigtensor (#72516)
* isclose support bigtensor
* refine
* [API] isnan isinf isfinite support bigtensor (#72517)
* isnan isinf isfinite support bigtensor
* refine
* [PHI] Fix cum kernel for big tensor (#72562)
* [PHI] Preliminary fix for elementwise broadcast int32 shape overflow (#72584)
* [PHI] Align linalg.solve kernel with torch (#72608)
* Update strided copy kernel (#72662)
* [PHI] Fix grid sample kernel for big tensor (#72628)
* [PHI] Fix argsort big tensor bug (#72712)
* [PHI] Fixed argsort big tensor bug
* [PHI] Fixed shape mismatch problem.
* [PHI] Fix contiguous kernel for big tensor (#72705)
* [PHI] Fix flatten and split kernel for big tensor (#72634)
* [PHI] Fix out-of-bound issue of paddle.take_along_axis (#72757)
* [PHI] fix paddle.diag with big tensor (#72638)
* [API] fix paddle.cross with big tensor (#72652)
* [PHI] Fix paddle.where api for big tensor (#72717)
* [PHI] Fix bincount kernel for big tensor (#72706)
* fix bincount kernel for big tensor
* use HostAlloc to alloc memory
* add cpu test case
* [PHI] Fix full_like kernel for big tensor (#72831)
* [API] Fix int overflow and float16 support for paddle.frac (#72815)
* [PHI] Align paddle.inner with torch in matmul logic (#72843)
* [PHI] Fix paddle.var & paddle.std float16 overflow (#72650)
* [PHI] Fix logsumexp precision problem (#72681)
* [PHI] Debug for logsumexp, bug source found
* [PHI] Removed GetNumBlocks func to get correct logsumexp
* [PHI] Removed redundant debug VLOG
* [PHI] Elegant grid bounded solution
* [Accuracy diff No.55-56、76-77] Fix accuracy diff for var&std API (#72879)
* [Accuracy diff No.21] Fix accuracy diff for heaviside API (#72894)

---------

Co-authored-by: Shuhao Liang <50269654+lshpku@users.noreply.github.com>
Co-authored-by: Qianyue He <46109954+Enigmatisms@users.noreply.github.com>
Co-authored-by: Lei Ding <69283446+Dmovic@users.noreply.github.com>
Co-authored-by: ggggxm <66855582+ggggxm@users.noreply.github.com>
Co-authored-by: xkkkkkk23 <xiekeke@baidu.com>
Co-authored-by: Zx <zhangxiao35@baidu.com>
Co-authored-by: huangjiyi <43315610+huangjiyi@users.noreply.github.com>
Co-authored-by: ooo oo <106524776+ooooo-create@users.noreply.github.com>

[PHI]Fix SelectKernel for big tensor

6295749

update WhereGradCUDAKernel

08a6a5c

huangjiyi changed the title ~~[PHI][CINN] Fix SelectKernel for big tensor~~ [PHI][CINN] Fix paddle.where api for big tensor May 14, 2025

huangjiyi added 2 commits May 15, 2025 17:08

add template for WhereGradCUDAKernel

9b3f183

rerun ci

a97b832

wanghuancoder reviewed May 16, 2025

lshpku reviewed May 16, 2025

huangjiyi added 2 commits May 16, 2025 11:16

refine

6e52d08

decrease kps change

fde5312

lshpku approved these changes May 19, 2025

huangjiyi merged commit b2ae891 into develop May 19, 2025
49 of 50 checks passed

huangjiyi deleted the fix_big_tensor_for_where branch May 19, 2025 04:09

wanghuancoder pushed a commit to wanghuancoder/Paddle that referenced this pull request May 27, 2025

[PHI] Fix paddle.where api for big tensor (PaddlePaddle#72717)

3a81f1c
