Model: gemini/gemini-2.5-pro
files-to-prompt . -e go
on a checkout of https://github.com/google/oss-rebuild
Describe where the underlying data lives and provide CLI commands I can run to access a list of files in those buckets
Based on the provided documents, the underlying data for the OSS Rebuild project primarily lives in several Google Cloud Storage (GCS) buckets. Different types of data, such as build artifacts, logs, and final attestations, are stored in separate buckets for organization and access control.
Here is a description of where the data lives and the corresponding CLI commands to list the files in those buckets.
The system uses the following GCS buckets to store various types of data:
- Attestation Bucket: This is the most important bucket for end users. It stores the final, signed attestation bundles that verify the successful rebuild of a package. The `oss-rebuild` CLI tool pulls from this bucket by default.
  - Flag Name: `--attestation-bucket` (in `cmd/api/main.go`) or `--bucket` (in `cmd/oss-rebuild/main.go`)
  - Default Name: `google-rebuild-attestations`
- Metadata Bucket: This bucket holds intermediate artifacts and metadata generated during the rebuild process. This includes the rebuilt package itself (`.whl`, `.tgz`, etc.) and logs from various tools used in the build.
  - Flag Name: `--metadata-bucket`
- Logs Bucket: This bucket is dedicated to storing the raw build logs from Google Cloud Build (GCB), which executes the remote rebuilds.
  - Flag Name: `--logs-bucket`
- Prebuild Tools Bucket: This bucket stores pre-compiled utility binaries (like `timewarp` and a write-only `gsutil`) that are used inside the build environment.
  - Flag Name: `--prebuild-bucket`
- Git Cache Bucket: To speed up build times, this bucket is used as a cache for git repositories, storing them as `.tgz` archives.
  - Flag Name: `--bucket` (in `cmd/git_cache/main.go`)
- Agent Sessions & Metadata Buckets: When using the experimental AI agent features, these buckets store session data and build metadata related to the agent's operations.
  - Flag Names: `--agent-sessions-bucket`, `--agent-metadata-bucket`
You can use the `gsutil` command-line tool to list the files within these GCS buckets. The `-r` flag requests a recursive listing, which is helpful for browsing the directory structure.
The attestation bucket is the primary bucket you would interact with to find rebuild results.
- To list all attestations in the default bucket:
  gsutil ls -r gs://google-rebuild-attestations/**
- To list attestations for a specific package (e.g., `requests-2.28.1` from PyPI):
  # Usage: gsutil ls -r gs://<bucket-name>/<ecosystem>/<package-name>/<version>/**
  gsutil ls -r gs://google-rebuild-attestations/pypi/requests/2.28.1/**
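The path layout in the usage comment above can be wrapped in a small helper. This is a minimal sketch: `attestation_prefix` is a hypothetical name introduced here for illustration, and the layout is taken only from the `<bucket-name>/<ecosystem>/<package-name>/<version>/` pattern shown above.

```shell
# attestation_prefix BUCKET ECOSYSTEM PACKAGE VERSION
# Hypothetical helper: prints the gs:// prefix under which the
# attestations for one package version are expected to live.
attestation_prefix() {
  printf 'gs://%s/%s/%s/%s/\n' "$1" "$2" "$3" "$4"
}
```

With that defined, `gsutil ls -r "$(attestation_prefix google-rebuild-attestations pypi requests 2.28.1)**"` should be equivalent to the second command above.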
To inspect the intermediate artifacts from a build, you need the name of the metadata bucket used for your deployment.
- To list all metadata files (replace `<METADATA_BUCKET>` with the actual bucket name):
  gsutil ls -r gs://<METADATA_BUCKET>/**
To view the raw GCB logs for all builds, you need the name of the logs bucket.
- To list all GCB log files (replace `<LOGS_BUCKET>` with the actual bucket name):
  gsutil ls -r gs://<LOGS_BUCKET>/**
To see the utility binaries available to the build environment, list the prebuild tools bucket.
- To list all prebuilt tools (replace `<PREBUILD_BUCKET>` with the actual bucket name):
  gsutil ls -r gs://<PREBUILD_BUCKET>/**
To see which git repositories have been cached, list the git cache bucket.
- To list all cached repos (replace `<GIT_CACHE_BUCKET>` with the actual bucket name):
  gsutil ls -r gs://<GIT_CACHE_BUCKET>/**
559,213 input, 3,343 output, {"candidatesTokenCount": 996, "promptTokensDetails": [{"modality": "TEXT", "tokenCount": 559213}], "thoughtsTokenCount": 2347}
If I have a thingy with this https://storage.googleapis.com/google-rebuild-attestations/pypi/python-slugify/6.1.2/python_slugify-6.1.2-py2.py3-none-any.whl/rebuild.intoto.jsonl as one of the URLs, what would the full URLs be to any other interesting pages about that - metadata or whatever - given the default buckets for this tool?
Of course. Given the URL to an attestation file and the default bucket naming conventions from the tool's documentation, you can trace back to other interesting artifacts like build logs and the rebuilt package itself.
The key to linking everything together is the `invocationId` stored within the attestation file. This ID corresponds to an `ObliviousID` used to sandbox build artifacts.
Here’s a step-by-step guide to finding the related URLs.
First, you need to download the `rebuild.intoto.jsonl` file and extract the `invocationId`. This file is JSONL (newline-delimited JSON) containing multiple attestation statements. The `invocationId` will be the same in all of them.
- Download the file:
  curl -o rebuild.intoto.jsonl "https://storage.googleapis.com/google-rebuild-attestations/pypi/python-slugify/6.1.2/python_slugify-6.1.2-py2.py3-none-any.whl/rebuild.intoto.jsonl"
- Extract the `invocationId`: You can use a tool like `jq` to parse the JSON and get the value. This ID is the unique identifier for the build run.
  # Read the invocationId from each JSON object in the file and keep the first one
  INVOCATION_ID=$(jq -r '.predicate.runDetails.metadata.invocationId' rebuild.intoto.jsonl | head -n 1)
  echo "$INVOCATION_ID"
  This will give you a UUID, for example: `f4b6c2d0-1b9a-4c8e-a9d7-8e4f0a2b1c3d`. Let's call this `<INVOCATION_ID>`.
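If the `jq` command above prints `null`, one possible reason is that each line wraps the statement in a signed envelope rather than exposing `predicate` at the top level. The sketch below handles both shapes. It rests on an assumption that is not confirmed by the transcript: that a wrapped line carries the statement base64-encoded in a top-level `payload` field (the DSSE convention). `extract_invocation_id` is a hypothetical helper name, and `@base64d` requires jq 1.6 or later.

```shell
# extract_invocation_id FILE
# Hypothetical helper: prints the first invocationId found in a JSONL file.
# Assumption: each line is either a bare in-toto statement, or an envelope
# whose "payload" field is the base64-encoded statement.
extract_invocation_id() {
  jq -r '
    (if type == "object" and has("payload")
       then (.payload | @base64d | fromjson)   # unwrap the envelope
       else .                                   # already a bare statement
     end)
    | .predicate.runDetails.metadata.invocationId // empty
  ' "$1" | head -n 1
}
```

Usage would be `INVOCATION_ID=$(extract_invocation_id rebuild.intoto.jsonl)`; an empty result means neither shape matched and the file should be inspected by hand.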
To get the most detailed logs, you need the Google Cloud Build (GCB) ID. This is stored in an `info.json` file in the debug storage bucket.
Note: The debug storage bucket has no standardized default name; it must be configured with the `--debug-storage` flag during the API service deployment. You would need to know this bucket's name. Let's assume its name is `<DEBUG_STORAGE_BUCKET>`.
The `info.json` file contains valuable metadata, including the GCB Build ID. Unfortunately, its path is keyed by a `RunID`, which is not present in the public attestation.
If you were the one who triggered the build, you would have the `RunID`. A third-party observer cannot currently link a public attestation back to the debug logs without this `RunID`.
However, if you do know the `RunID` (e.g., from Firestore or logs if you are an operator), the path would be:
gs://<DEBUG_STORAGE_BUCKET>/<RUN_ID>/pypi/python-slugify/6.1.2/python_slugify-6.1.2-py2.py3-none-any.whl/info.json
Assuming you could access it, you would extract the `BuildID` like this:
# Hypothetical if you knew the RUN_ID
gsutil cat gs://<DEBUG_STORAGE_BUCKET>/<RUN_ID>/.../info.json | jq -r '.BuildID'
This would give you the `<GCB_BUILD_ID>`.
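For operators who do have a `RunID`, the path shape above can be assembled mechanically. This is a sketch only: `debug_info_uri` is a hypothetical helper name, and the segment order is copied from the example path given above rather than verified against a live bucket.

```shell
# debug_info_uri DEBUG_BUCKET RUN_ID ECOSYSTEM PACKAGE VERSION ARTIFACT
# Hypothetical helper: prints the gs:// URI of info.json for one rebuild,
# following the <RUN_ID>/<ecosystem>/<package>/<version>/<artifact> layout.
debug_info_uri() {
  printf 'gs://%s/%s/%s/%s/%s/%s/info.json\n' "$1" "$2" "$3" "$4" "$5" "$6"
}
```

The result can then be passed to `gsutil cat` and piped through `jq -r '.BuildID'` as in the command above.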
With the `<INVOCATION_ID>` and bucket names, you can now construct the URLs.
GCB build logs:
- Location: The default logs bucket is typically named `<PROJECT_ID>_cloudbuild` or configured via the `--logs-bucket` flag.
- Path: The log file is named `log-<GCB_BUILD_ID>.txt`.
- URL (if you found `<GCB_BUILD_ID>`):
  - GCS URI: `gs://<LOGS_BUCKET>/log-<GCB_BUILD_ID>.txt`
  - HTTP URL: `https://storage.googleapis.com/<LOGS_BUCKET>/log-<GCB_BUILD_ID>.txt`
  - Cloud Console URL: `https://console.cloud.google.com/cloud-build/builds/<GCB_BUILD_ID>?project=<PROJECT_ID>`
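The three log locations listed above differ only in how the bucket, build ID and project are substituted, so they can be generated together. A minimal sketch, assuming you already know all three values; `gcb_log_locations` is a hypothetical helper name.

```shell
# gcb_log_locations LOGS_BUCKET GCB_BUILD_ID PROJECT_ID
# Hypothetical helper: prints the GCS URI, the HTTP URL, and the
# Cloud Console URL for one Cloud Build log, one per line.
gcb_log_locations() {
  printf 'gs://%s/log-%s.txt\n' "$1" "$2"
  printf 'https://storage.googleapis.com/%s/log-%s.txt\n' "$1" "$2"
  printf 'https://console.cloud.google.com/cloud-build/builds/%s?project=%s\n' "$2" "$3"
}
```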
Rebuilt artifact:
- Location: The metadata bucket, configured via `--metadata-bucket`. Let's assume a default name of `google-rebuild-metadata`.
- Path: The artifacts are stored in a path keyed by the `<INVOCATION_ID>`.
- URL:
  - GCS URI: `gs://google-rebuild-metadata/<INVOCATION_ID>/python_slugify-6.1.2-py2.py3-none-any.whl`
  - HTTP URL: `https://storage.googleapis.com/google-rebuild-metadata/<INVOCATION_ID>/python_slugify-6.1.2-py2.py3-none-any.whl`
The metadata bucket also contains other useful byproducts from the build, all keyed by the same `<INVOCATION_ID>`.
- Container Image (`image.tgz`):
  - URL: `https://storage.googleapis.com/google-rebuild-metadata/<INVOCATION_ID>/image.tgz`
- Network Log (`netlog.json`), if a proxied build was used:
  - URL: `https://storage.googleapis.com/google-rebuild-metadata/<INVOCATION_ID>/netlog.json`
- Syscall Log (`tetragon.jsonl`), if syscall monitoring was used:
  - URL: `https://storage.googleapis.com/google-rebuild-metadata/<INVOCATION_ID>/tetragon.jsonl`
Here is a summary of the URLs you can construct, using the `<INVOCATION_ID>` from the attestation file and assuming default bucket names:
Item | Default Bucket Name | URL / Path Structure |
---|---|---|
Rebuilt Artifact | `google-rebuild-metadata` | `https://storage.googleapis.com/google-rebuild-metadata/<INVOCATION_ID>/python_slugify-6.1.2-py2.py3-none-any.whl` |
Container Image | `google-rebuild-metadata` | `https://storage.googleapis.com/google-rebuild-metadata/<INVOCATION_ID>/image.tgz` |
Network Log | `google-rebuild-metadata` | `https://storage.googleapis.com/google-rebuild-metadata/<INVOCATION_ID>/netlog.json` |
Syscall Log | `google-rebuild-metadata` | `https://storage.googleapis.com/google-rebuild-metadata/<INVOCATION_ID>/tetragon.jsonl` |
GCB Build Logs | `<LOGS_BUCKET>` | `https://storage.googleapis.com/<LOGS_BUCKET>/log-<GCB_BUILD_ID>.txt` (requires `BuildID` from debug storage) |
Debug Info (`info.json`) | `<DEBUG_STORAGE_BUCKET>` | `gs://<DEBUG_STORAGE_BUCKET>/<RUN_ID>/.../info.json` (requires `RunID`, which is not in the public attestation) |
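The metadata-bucket rows of the table all share one base URL, so they can be produced from the bucket name, the invocation ID and the artifact filename. A sketch under the same assumption the answer makes (that the metadata bucket is named `google-rebuild-metadata`); `metadata_urls` is a hypothetical helper name.

```shell
# metadata_urls BUCKET INVOCATION_ID ARTIFACT
# Hypothetical helper: prints the HTTP URLs of the rebuilt artifact and the
# three byproduct files, all keyed by the same invocation ID.
metadata_urls() {
  local base="https://storage.googleapis.com/$1/$2"
  local name
  for name in "$3" image.tgz netlog.json tetragon.jsonl; do
    printf '%s/%s\n' "$base" "$name"
  done
}
```

For example, `metadata_urls google-rebuild-metadata "$INVOCATION_ID" python_slugify-6.1.2-py2.py3-none-any.whl` prints four URLs; whether each object exists depends on how the build was run.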
560,302 input, 5,170 output, {"candidatesTokenCount": 1778, "promptTokensDetails": [{"modality": "TEXT", "tokenCount": 560302}], "thoughtsTokenCount": 3392}
path normalized/absl_py-2.0.0-py3-none-any.whl and invocation ID ae2407af-473c-48fd-9267-93b6499c5972
Of course! Using the invocation ID `ae2407af-473c-48fd-9267-93b6499c5972`, we can construct the direct URLs to the build artifacts and other metadata stored in the default GCS buckets.
The "normalized path" you mentioned likely refers to how files are organized locally by the `ctl` tool, but the remote GCS storage uses the invocation ID as the primary key for sandboxing build artifacts.
These files are stored in the metadata bucket (assumed name, as before: `google-rebuild-metadata`) under a path named with the invocation ID.
- Rebuilt Artifact (`.whl` file): This is the actual package file produced by the build.
  - HTTP URL: `https://storage.googleapis.com/google-rebuild-metadata/ae2407af-473c-48fd-9267-93b6499c5972/absl_py-2.0.0-py3-none-any.whl`
- Build Container Image (`image.tgz`): This is the gzipped tarball of the Docker container image that was used to perform the build. It contains the full state of the build environment after the build script has run.
  - HTTP URL: `https://storage.googleapis.com/google-rebuild-metadata/ae2407af-473c-48fd-9267-93b6499c5972/image.tgz`
- Network Log (`netlog.json`): If the build was run with the network proxy enabled, this file contains a summary of all network requests made during the build.
  - HTTP URL: `https://storage.googleapis.com/google-rebuild-metadata/ae2407af-473c-48fd-9267-93b6499c5972/netlog.json`
- Syscall Log (`tetragon.jsonl`): If syscall monitoring was enabled, this file contains a log of system call events.
  - HTTP URL: `https://storage.googleapis.com/google-rebuild-metadata/ae2407af-473c-48fd-9267-93b6499c5972/tetragon.jsonl`
The detailed, line-by-line build logs are stored in a separate logs bucket. Accessing them requires the GCB Build ID, which is different from the invocation ID.
Limitation: The GCB Build ID is stored in a debug file (`info.json`) that is keyed by a `RunID`. Unfortunately, the public attestation does not contain the `RunID`, making it difficult for an external observer to link back to these logs.
However, if you were an operator with access to the system's Firestore database or debug storage, you could find the GCB Build ID. If you had it (let's call it `<GCB_BUILD_ID>`), the URLs would be:
- Raw Log File: Stored in the logs bucket (e.g., `my-gcp-project_cloudbuild`).
  - HTTP URL: `https://storage.googleapis.com/<LOGS_BUCKET>/log-<GCB_BUILD_ID>.txt`
- Cloud Build UI: The user-friendly view in the Google Cloud Console.
  - URL: `https://console.cloud.google.com/cloud-build/builds/<GCB_BUILD_ID>?project=<PROJECT_ID>`
Here is a summary of the most relevant URLs you can access directly with the information you have:
Item | Default Bucket Name | URL |
---|---|---|
Rebuilt Artifact | `google-rebuild-metadata` | `https://storage.googleapis.com/google-rebuild-metadata/ae2407af-473c-48fd-9267-93b6499c5972/absl_py-2.0.0-py3-none-any.whl` |
Container Image | `google-rebuild-metadata` | `https://storage.googleapis.com/google-rebuild-metadata/ae2407af-473c-48fd-9267-93b6499c5972/image.tgz` |
You can use the `gsutil` command to list all the files associated with this specific invocation ID:
gsutil ls -r gs://google-rebuild-metadata/ae2407af-473c-48fd-9267-93b6499c5972/**
This will confirm which of the byproduct files (rebuilt artifact, image, netlog, etc.) exist for this build.
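If `gsutil` is not set up, a rough alternative is to send an HTTP HEAD request to each candidate URL and look at the status code. This is a sketch that assumes the objects are publicly readable over HTTPS, which the transcript does not confirm; `http_status` and `check_urls` are hypothetical helper names.

```shell
# http_status URL
# Prints only the HTTP status code returned for a HEAD request.
http_status() {
  curl -s -o /dev/null -I -w '%{http_code}' "$1"
}

# check_urls URL...
# Hypothetical helper: prints "<status> <url>" for every URL given.
check_urls() {
  local url
  for url in "$@"; do
    printf '%s %s\n' "$(http_status "$url")" "$url"
  done
}
```

A `200` would suggest the object exists and is readable; `403` or `404` would mean it is either missing or not public, and the two cases cannot always be told apart from outside.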
562,155 input, 3,110 output, {"candidatesTokenCount": 1207, "cachedContentTokenCount": 557018, "promptTokensDetails": [{"modality": "TEXT", "tokenCount": 562155}], "cacheTokensDetails": [{"modality": "TEXT", "tokenCount": 557018}], "thoughtsTokenCount": 1903}