Run inductor micro benchmark on x86 metal runner #135042
Conversation
🔗 Helpful Links 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/135042
Note: Links to docs will display an error until the docs builds have been completed. ✅ No Failures. As of commit b531007 with merge base 6c37674. This comment was automatically generated by Dr. CI and updates every 15 minutes.
We have to update the pre-defined numbers for CPU as well. What does the dashboard look like after this change? Are CUDA and CPU in the same tab or in separate ones? I'd prefer separate ones.
I'm OK with doing this in a follow-up PR.
I expect the results to be on the same page at https://hud.pytorch.org/benchmark/llms, but listed under different device types, i.e. cpu and cuda.
```diff
@@ -596,6 +596,9 @@ test_single_dynamo_benchmark() {

test_inductor_micro_benchmark() {
  TEST_REPORTS_DIR=$(pwd)/test/test-reports
  if [[ "${TEST_CONFIG}" == *cpu* ]]; then
```
> We have to update the pre-defined numbers for CPU as well.
I was about to add this part but accidentally removed it because of a merge conflict. Is there anything else you have in mind that we need?
(Just curious to learn more about how the CPU benchmark is set up; if the remaining part is complex, let's do that in a separate PR.)
Actually, the perf target in each experiment is for CUDA only. We should extend it to support multiple targets on different devices, but I think we can do that in a separate PR.
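For illustration, a minimal sketch of what per-device targets could look like; the `PERF_TARGETS` table and `get_target` helper are hypothetical names, and the numbers are placeholders rather than real targets:

```python
# Hypothetical sketch of per-device perf targets; today the target column
# carries a single CUDA-only number. All values below are placeholders.
PERF_TARGETS = {
    # (experiment, metric) -> per-device targets
    ("gemv", "memory_bandwidth(GB/s)"): {"cuda": 870, "cpu": 240},
    ("Llama-2-7b-chat-hf", "token_per_sec"): {"cuda": 94, "cpu": 14},
}

def get_target(experiment: str, metric: str, device: str) -> float | None:
    """Return the target for this device, or None if none is defined."""
    return PERF_TARGETS.get((experiment, metric), {}).get(device)
```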
@pytorchbot merge -f 'This should just require lint I guess'
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
The upload stats workflow currently skips this job (https://github.com/pytorch/pytorch/actions/runs/10807251335/job/29977650639); this is a miss from #135042. So the workflow is running, but nothing has been uploaded yet. Pull Request resolved: #135780. Approved by: https://github.com/atalman
With pytorch/pytorch#135042, the benchmark now reports the device arch, which lets us separate different CUDA or CPU types. Instead of showing a bare device like cuda, we need to be more specific, for example:

* cpu (x86_64)
* cpu (arm64)
* cuda (NVIDIA A100-SXM4-40GB)

### Testing

https://torchci-git-fork-huydhn-add-cpu-device-llm-a5f029-fbopensource.vercel.app/benchmark/llms shows different devices and their archs.
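A minimal sketch of that label change, written in Python to keep the examples here in one language (torchci itself is TypeScript, and `format_device_label` is a hypothetical name):

```python
def format_device_label(device: str, arch: str | None) -> str:
    """Render 'cpu (x86_64)' or 'cuda (NVIDIA A100-SXM4-40GB)' instead of a bare device."""
    return f"{device} ({arch})" if arch else device

# Examples from the description above:
assert format_device_label("cpu", "x86_64") == "cpu (x86_64)"
assert format_device_label("cuda", "NVIDIA A100-SXM4-40GB") == "cuda (NVIDIA A100-SXM4-40GB)"
```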
This enables the inductor micro benchmark on CPU (x86):

* Running on an AWS metal runner for more accurate benchmarking
* I add a new `arch` column, which will be either x86_64 or arm64 for CPU, or the GPU name for GPU. We can use this later to differentiate between different setups, i.e. cuda (a100) vs cuda (a10g), or cpu (x86_64) vs cpu (arm64).

The next step would be to run this on cpu arm64 and cuda (a10g).

### Testing

Here are the CSV results from my test run https://github.com/pytorch/pytorch/actions/runs/10709344180

```
name,metric,target,actual,dtype,device,arch,is_model
mlp_layer_norm_gelu,flops_utilization,0.8,17.36,bfloat16,cpu,x86_64,False
gather_gemv,memory_bandwidth(GB/s),990,170.80,int8,cpu,x86_64,False
gather_gemv,memory_bandwidth(GB/s),1060,204.78,bfloat16,cpu,x86_64,False
Mixtral-8x7B-v0.1,token_per_sec,175,26.68,int8,cpu,x86_64,True
Mixtral-8x7B-v0.1,memory_bandwidth(GB/s),1130,171.91,int8,cpu,x86_64,True
Mixtral-8x7B-v0.1,compilation_time(s),162,47.36,int8,cpu,x86_64,True
gemv,memory_bandwidth(GB/s),870,236.36,int8,cpu,x86_64,False
gemv,memory_bandwidth(GB/s),990,305.71,bfloat16,cpu,x86_64,False
Llama-2-7b-chat-hf,token_per_sec,94,14.01,bfloat16,cpu,x86_64,True
Llama-2-7b-chat-hf,memory_bandwidth(GB/s),1253,185.18,bfloat16,cpu,x86_64,True
Llama-2-7b-chat-hf,compilation_time(s),162,74.99,bfloat16,cpu,x86_64,True
Llama-2-7b-chat-hf,token_per_sec,144,25.09,int8,cpu,x86_64,True
Llama-2-7b-chat-hf,memory_bandwidth(GB/s),957,165.83,int8,cpu,x86_64,True
Llama-2-7b-chat-hf,compilation_time(s),172,70.69,int8,cpu,x86_64,True
layer_norm,memory_bandwidth(GB/s),950,172.03,bfloat16,cpu,x86_64,False
```

Pull Request resolved: pytorch#135042
Approved by: https://github.com/yanboliang
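As a rough sketch of how the new `arch` column could be populated (`get_arch` is a hypothetical helper, not the actual benchmark code; note that on Linux arm64, `platform.machine()` reports `aarch64` rather than `arm64`):

```python
import platform

import torch

def get_arch(device: str) -> str:
    """x86_64/arm64 (or aarch64) for CPU; the GPU name for CUDA."""
    if device == "cuda":
        # e.g. 'NVIDIA A100-SXM4-40GB'; requires a visible CUDA device
        return torch.cuda.get_device_name(0)
    return platform.machine()
```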