[AOTI] Fix a special case compile time data type codegen for sym int variables by YUNQIUGUO · Pull Request #138106 · pytorch/pytorch


[AOTI] Fix a special case compile time data type codegen for sym int variables #138106


Closed

YUNQIUGUO wants to merge 1 commit into pytorch:main from YUNQIUGUO:export-D64490039


Conversation 15 Commits 1 Checks 276 Files changed


YUNQIUGUO commented Oct 16, 2024 (edited by pytorch-bot bot)

Summary:
This change unblocks the CFR AOTI lowering, which was failing with a runtime error.

TL;DR:

In this model, one Triton kernel expects a scalar input of dtype i64 but receives an i32. The cause is that `auto` deduces the type of its initializer, so if the value passed in is an i32, the wrapper variable is also an i32, which leads to a CUDA illegal memory access (IMA).
Original problematic kernel: `triton_poi_fused_add_ge_logical_and_logical_or_lt_46_grid_100`.

This diff explicitly casts every symbolic argument to i64 at compile time when the Triton kernel input is i64, instead of emitting `auto var_x = {arg}` in the cpp wrapper code.

Test Plan:
Verified in FLB locally:

```
PYTORCH_NO_CUDA_MEMORY_CACHING=1 AOT_INDUCTOR_DEBUG_INTERMEDIATE_VALUE_PRINTER=3 TORCH_LOGS="output_code" TORCHINDUCTOR_MAX_AUTOTUNE=1 TORCH_SHOW_CPP_STACKTRACES=1 CUDA_LAUNCH_BLOCKING=1 ~/fbsource/buck-out/v2/gen/fbcode/98e643f8bb44fe9d/hpc/new/models/feed/benchmark/__feed_lower_benchmark__/feed_lower_benchmark.par --skip-eager --skip-flop-estimation --lower-backend="AOT_INDUCTOR" --sync-mode=0 --precision bf16 --output-precision bf16  --lower-presets="ifr_cint;disable_new_lowering_weights;disable_dper_passes:passes=fuse_parallel_linear_no_weight_change" --remove-unexpected-type-cast=False --load="manifold://ads_storage_fblearner/tree/user/facebook/fblearner/predictor/924293663/0/gpu_lowering/input.merge"
```
Differential Revision: D64490039
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov


pytorch-bot bot commented Oct 16, 2024 (edited)

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138106

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 2267787 with merge base 620039c:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

inductor-periodic / cuda12.1-py3.10-gcc9-sm80 / test (inductor_torchbench_smoketest_perf, 1, 1, linux.gcp.a100) (gh) (detected as infra flaky with no log or failing log classifier)

This comment was automatically generated by Dr. CI and updates every 15 minutes.


pytorch-bot bot added ciflow/inductor module: inductor labels


facebook-github-bot commented Oct 16, 2024

This pull request was exported from Phabricator. Differential Revision: D64490039


facebook-github-bot added the fb-exported label

YUNQIUGUO force-pushed the export-D64490039 branch from 67f74f2 to 9640411 Compare

October 16, 2024 20:15


YUNQIUGUO added the topic: not user facing topic category label

YUNQIUGUO force-pushed the export-D64490039 branch from 9640411 to 52d4e80 Compare

October 16, 2024 21:25


YUNQIUGUO force-pushed the export-D64490039 branch from 52d4e80 to 68df8f6 Compare

October 16, 2024 22:27

YUNQIUGUO added a commit to YUNQIUGUO/pytorch that referenced this pull request

[AOTI] Fix a special case compile time data type codegen for sym int variables (pytorch#138106)

68df8f6

Summary:
This change unblocks the CFR AOTI lowering, which was failing with a runtime error.
TL;DR:
In this model, one Triton kernel expects a scalar input of dtype i64 but receives an i32. The cause is that `auto` deduces the type of its initializer, so if the value passed in is an i32, the wrapper variable is also an i32, which leads to a CUDA illegal memory access (IMA).
Original problematic kernel: `triton_poi_fused_add_ge_logical_and_logical_or_lt_46_grid_100`; its third input was declared as `auto var_402 = u0`.
This diff explicitly specifies i64 at compile time for every symbolic argument that feeds an i64 Triton kernel input, instead of emitting `auto var_x = {arg}` in the cpp wrapper code.
Test Plan:
Verified in FLB locally:
```
PYTORCH_NO_CUDA_MEMORY_CACHING=1 AOT_INDUCTOR_DEBUG_INTERMEDIATE_VALUE_PRINTER=3 TORCH_LOGS="output_code" TORCHINDUCTOR_MAX_AUTOTUNE=1 TORCH_SHOW_CPP_STACKTRACES=1 CUDA_LAUNCH_BLOCKING=1 ~/fbsource/buck-out/v2/gen/fbcode/98e643f8bb44fe9d/hpc/new/models/feed/benchmark/__feed_lower_benchmark__/feed_lower_benchmark.par --skip-eager --skip-flop-estimation --lower-backend="AOT_INDUCTOR" --sync-mode=0 --precision bf16 --output-precision bf16  --lower-presets="ifr_cint;disable_new_lowering_weights;disable_dper_passes:passes=fuse_parallel_linear_no_weight_change" --remove-unexpected-type-cast=False --load="manifold://ads_storage_fblearner/tree/user/facebook/fblearner/predictor/924293663/0/gpu_lowering/input.merge"
```
Differential Revision: D64490039


YUNQIUGUO force-pushed the export-D64490039 branch from 68df8f6 to c4aa337 Compare

October 17, 2024 00:47


YUNQIUGUO force-pushed the export-D64490039 branch from c4aa337 to 98dacff Compare

October 17, 2024 05:33

YUNQIUGUO added a commit to YUNQIUGUO/pytorch that referenced this pull request

98dacff

YUNQIUGUO force-pushed the export-D64490039 branch from 98dacff to 4be7f82 Compare

October 17, 2024 16:32

YUNQIUGUO added a commit to YUNQIUGUO/pytorch that referenced this pull request

4be7f82

YUNQIUGUO force-pushed the export-D64490039 branch from 4be7f82 to 12cab86 Compare

October 17, 2024 18:19


YUNQIUGUO force-pushed the export-D64490039 branch from 12cab86 to aa6d908 Compare

October 17, 2024 20:37


YUNQIUGUO force-pushed the export-D64490039 branch from aa6d908 to 0240146 Compare

October 17, 2024 23:38


YUNQIUGUO added a commit to YUNQIUGUO/pytorch that referenced this pull request

2d258c8

YUNQIUGUO force-pushed the export-D64490039 branch from 0240146 to 2d258c8 Compare

October 17, 2024 23:45


YUNQIUGUO force-pushed the export-D64490039 branch from 2d258c8 to 2267787 Compare

October 18, 2024 18:01


ColinPeppler approved these changes

pytorch-bot bot added the ciflow/trunk label

facebook-github-bot commented Oct 19, 2024

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

pytorch-bot[bot] reacted with thumbs up emoji


pytorchmergebot added the merging label


pytorchmergebot commented Oct 19, 2024

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status
here


pytorchmergebot added the Merged label

pytorchmergebot closed this in

ea412d5

pytorchmergebot removed the merging label


Reviewers

ColinPeppler (approved these changes)

Assignees

No one assigned

Labels

ciflow/inductor, ciflow/trunk, fb-exported, Merged, module: inductor, topic: not user facing

Projects

None yet

Milestone

No milestone

4 participants
