[Bug]: Slowness and/or broken metrics visualization when Lineage metrics is large · Issue #32649 · apache/beam · GitHub
Closed
Description
What happened?
Beam Java 2.59.0 introduced Lineage metrics support for file-based IO (FileIO, TextIO, etc).
- When a pipeline reads from many files (e.g. using a file pattern that matches a large number of files), the metrics-based components of the Dataflow UI break. For example, live throughput is no longer shown, the progress bar goes stale, and user counters increment incompletely.
This is due to an internal limit on the total job status response size of the Dataflow runner (gRPC limit, ~20 MB). When the response exceeds this limit, all metrics updates (counter, stringset, etc.) are dropped.
- When a pipeline writes to many files (e.g. with a large shard number), one observes the following slowness:
Operation ongoing in step Write content to files/WriteFiles/FinalizeTempFileBundles/Finalize for at least 15m00s without outputting or completing in state process in thread pool-3-thread-2 with id 27
at org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.ImmutableSet$RegularSetBuilderImpl.insertInHashTable(ImmutableSet.java:780)
at org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.ImmutableSet$RegularSetBuilderImpl.add(ImmutableSet.java:763)
at org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.ImmutableSet$Builder.add(ImmutableSet.java:527)
at org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.ImmutableSet$Builder.add(ImmutableSet.java:478)
at org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.ImmutableCollection$Builder.addAll(ImmutableCollection.java:475)
at org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.ImmutableSet$Builder.addAll(ImmutableSet.java:549)
at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.metrics.StringSetData.combine(StringSetData.java:58)
at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.metrics.StringSetCell.update(StringSetCell.java:62)
at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.metrics.StringSetCell.add(StringSetCell.java:104)
at org.apache.beam.sdk.metrics.Metrics$DelegatingStringSet.add(Metrics.java:179)
at org.apache.beam.sdk.metrics.Lineage.add(Lineage.java:133)
This happens because the stringset metric is added in the finalize write step (after moving temp files to their final destination), which runs on a single worker. Unfortunately, the current implementation of stringSetData.addAll has O(N^2) complexity: each add copies the existing contents into a new ImmutableSet, and this is repeated for N elements.
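To illustrate the quadratic behavior described above, here is a minimal sketch that uses only the JDK. It is not Beam's actual code: the class `StringSetCombineSketch`, its methods, and the `gs://bucket/output-` paths are hypothetical stand-ins. `copyOnAdd` mimics the pattern of building a fresh immutable set on every add, while `mutableAdd` accumulates into one mutable set and wraps it once at the end.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class StringSetCombineSketch {

  // Hypothetical stand-in for the copy-on-add pattern: every add builds a new
  // unmodifiable set from the previous contents plus one new element.
  static Set<String> copyOnAdd(int n) {
    Set<String> current = Collections.emptySet();
    for (int i = 0; i < n; i++) {
      Set<String> next = new HashSet<>(current); // copies all existing elements
      next.add("gs://bucket/output-" + i);       // made-up file name
      current = Collections.unmodifiableSet(next);
    }
    return current;
  }

  // Number of element copies performed by copyOnAdd: 0 + 1 + ... + (n - 1).
  static long copiesForCopyOnAdd(int n) {
    return (long) n * (n - 1) / 2;
  }

  // Linear alternative: accumulate in one mutable set, wrap once at the end.
  static Set<String> mutableAdd(int n) {
    Set<String> acc = new HashSet<>();
    for (int i = 0; i < n; i++) {
      acc.add("gs://bucket/output-" + i);
    }
    return Collections.unmodifiableSet(acc);
  }

  public static void main(String[] args) {
    int n = 5_000;
    long t0 = System.nanoTime();
    Set<String> slow = copyOnAdd(n);
    long t1 = System.nanoTime();
    Set<String> fast = mutableAdd(n);
    long t2 = System.nanoTime();
    System.out.printf("copy-on-add: %d ms, %d element copies%n",
        (t1 - t0) / 1_000_000, copiesForCopyOnAdd(n));
    System.out.printf("mutable add: %d ms%n", (t2 - t1) / 1_000_000);
    System.out.println("same contents: " + slow.equals(fast));
  }
}
```

Both methods produce the same set, but the copy count of the first grows as N(N-1)/2, which is consistent with a finalize step that stalls on a single worker once the shard count becomes large.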
Issue Priority
Priority: 1 (data loss / total loss of function)
Issue Components
- Component: Python SDK
- Component: Java SDK
- Component: Go SDK
- Component: Typescript SDK
- Component: IO connector
- Component: Beam YAML
- Component: Beam examples
- Component: Beam playground
- Component: Beam katas
- Component: Website
- Component: Infrastructure
- Component: Spark Runner
- Component: Flink Runner
- Component: Samza Runner
- Component: Twister2 Runner
- Component: Hazelcast Jet Runner
- Component: Google Cloud Dataflow Runner