Secret reporting endpoint

## What's the problem this feature will solve?

As part of our [ongoing collaboration](https://blog.gitguardian.com/uncovering-thousands-of-unique-secrets-in-pypi-packages/) to find exposed secrets in PyPI packages, we are working on a scanning pipeline that automatically scans newly released packages. In order to report our findings, we will need an endpoint we can call, with an agreed-upon schema.

## Describe the solution you'd like

### Schema

Ideally, the endpoint’s payload would be on a per-artifact basis, allowing us to include metadata about the artifact alongside the list of secrets found in it. Here is a possible schema for the payload:

```json
{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "title": "Artifact scanning report",
    "description": "The detail of all the findings for a given artifact",
    "type": "object",
    "required": [
        "release",
        "scan_info",
        "scan_results"
    ],
    "properties": {
        "release": {
            "type": "object",
            "required": [
                "title",
                "package_name",
                "version"
            ],
            "properties": {
                "title": {
                    "type": "string",
                    "examples": [
                        "ggshield 1.0.2"
                    ]
                },
                "package_name": {
                    "type": "string",
                    "examples": [
                        "ggshield"
                    ]
                },
                "version": {
                    "type": "string",
                    "examples": [
                        "1.0.2"
                    ]
                }
            }
        },
        "scan_info": {
            "type": "object",
            "required": [
                "scanner_version",
                "scanned_at"
            ],
            "properties": {
                "scanner_version": {
                    "type": "string",
                    "examples": [
                        "2.99.0"
                    ]
                },
                "scanned_at": {
                    "type": "string",
                    "format": "date-time",
                    "examples": [
                        "2023-11-16T17:10:25Z"
                    ]
                }
            }
        },
        "scan_results": {
            "type": "array",
            "items": {
                "type": "object",
                "required": [
                    "artifact",
                    "secrets"
                ],
                "properties": {
                    "artifact": {
                        "type": "object",
                        "required": [
                            "name",
                            "sha256_digest"
                        ],
                        "properties": {
                            "name": {
                                "type": "string",
                                "examples": [
                                    "ggshield-1.0.2.zip"
                                ]
                            },
                            "sha256_digest": {
                                "type": "string",
                                "examples": [
                                    "13550350a8681c84c861aac2e5b440161c2b33a3e4f302ac680ca5b686de48de"
                                ]
                            }
                        }
                    },
                    "secrets": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "required": [
                                "detector_name",
                                "detector_display_name",
                                "company_name",
                                "filepath",
                                "matches",
                                "validity_status"
                            ],
                            "properties": {
                                "detector_name": {
                                    "type": "string",
                                    "examples": [
                                        "google_aiza"
                                    ]
                                },
                                "detector_display_name": {
                                    "type": "string",
                                    "examples": [
                                        "Google API Key"
                                    ]
                                },
                                "company_name": {
                                    "type": "string",
                                    "examples": [
                                        "Google"
                                    ]
                                },
                                "documentation_url": {
                                    "type": "string",
                                    "format": "uri",
                                    "examples": [
                                        "https://docs.gg.com/google_aiza"
                                    ]
                                },
                                "filepath": {
                                    "type": "string",
                                    "examples": [
                                        "/ggshield/connect/google.py"
                                    ]
                                },
                                "matches": {
                                    "type": "array",
                                    "items": {
                                        "type": "object",
                                        "required": [
                                            "match_name",
                                            "index_start",
                                            "index_end"
                                        ],
                                        "properties": {
                                            "match_name": {
                                                "type": "string",
                                                "examples": [
                                                    "apikey"
                                                ]
                                            },
                                            "index_start": {
                                                "type": "integer",
                                                "examples": [
                                                    12
                                                ]
                                            },
                                            "index_end": {
                                                "type": "integer",
                                                "examples": [
                                                    32
                                                ]
                                            }
                                        }
                                    }
                                },
                                "validity_status": {
                                    "type": "string",
                                    "enum": [
                                        "NO_CHECKER",
                                        "FAILED_TO_CHECK",
                                        "VALID",
                                        "INVALID"
                                    ],
                                    "examples": [
                                        "VALID"
                                    ]
                                }
                            }
                        }
                    }
                }
            }
        }
    },
    "examples": [
        {
            "release": {
                "title": "ggshield 1.0.2",
                "package_name": "ggshield",
                "version": "1.0.2"
            },
            "scan_info": {
                "scanner_version": "2.99.0",
                "scanned_at": "2023-11-16T17:10:25Z"
            },
            "scan_results": [
                {
                    "artifact": {
                        "name": "ggshield-1.0.2.zip",
                        "sha256_digest": "13550350a8681c84c861aac2e5b440161c2b33a3e4f302ac680ca5b686de48de"
                    },
                    "secrets": [
                        {
                            "detector_name": "google_aiza",
                            "detector_display_name": "Google API Key",
                            "company_name": "Google",
                            "documentation_url": "https://docs.gg.com/google_aiza",
                            "filepath": "/ggshield/connect/google.py",
                            "matches": [
                                {
                                    "match_name": "apikey",
                                    "index_start": 12,
                                    "index_end": 32
                                }
                            ],
                            "validity_status": "VALID"
                        }
                    ]
                }
            ]
        }
    ]
}
```
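
To make the shape of the payload concrete, here is a minimal Python sketch of how a report following the proposed schema could be assembled on the scanner side. The function name `build_report` and its parameters are hypothetical and only illustrate the structure; they are not part of any existing API.

```python
import json
from datetime import datetime, timezone

# Allowed values, taken from the "validity_status" enum in the proposed schema.
VALIDITY_STATUSES = {"NO_CHECKER", "FAILED_TO_CHECK", "VALID", "INVALID"}


def build_report(package_name, version, scanner_version, scanned_at, scan_results):
    """Assemble a report dict shaped like the proposed schema (hypothetical helper).

    `scanned_at` is expected to be a timezone-aware datetime.
    `scan_results` is a list of {"artifact": {...}, "secrets": [...]} dicts.
    """
    for result in scan_results:
        for secret in result["secrets"]:
            status = secret["validity_status"]
            if status not in VALIDITY_STATUSES:
                raise ValueError(f"unknown validity_status: {status!r}")
    return {
        "release": {
            "title": f"{package_name} {version}",
            "package_name": package_name,
            "version": version,
        },
        "scan_info": {
            "scanner_version": scanner_version,
            # Serialize as a UTC timestamp, matching the schema's example format.
            "scanned_at": scanned_at.astimezone(timezone.utc).strftime(
                "%Y-%m-%dT%H:%M:%SZ"
            ),
        },
        "scan_results": scan_results,
    }


if __name__ == "__main__":
    report = build_report(
        "ggshield",
        "1.0.2",
        "2.99.0",
        datetime(2023, 11, 16, 17, 10, 25, tzinfo=timezone.utc),
        [],
    )
    print(json.dumps(report, indent=4))
```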

### Response

We do not expect the endpoint to return any data. We only need to distinguish a successful call from a failed one, so standard HTTP status codes should be more than enough.
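
As an illustration of how little we would need from the response, here is a sketch of how our client might interpret the status code. The three outcome labels and the treatment of 429 and 5xx as transient are assumptions on our side, not requirements on the endpoint.

```python
def classify_response(status_code: int) -> str:
    """Map an HTTP status code to a client-side outcome (hypothetical helper)."""
    if 200 <= status_code < 300:
        # Any 2xx means the report was accepted; no response body is needed.
        return "success"
    if status_code == 429 or 500 <= status_code < 600:
        # Assumed transient: keep the report and try again later.
        return "retry_later"
    # Anything else (for example 400, 401, 403) is treated as a rejection
    # that needs investigation rather than a blind retry.
    return "rejected"
```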

### API versioning

We have no strong requirement on this point, and will be fine with whichever solution you choose for the versioning of the schema.

### Call volume and rate limiting

Since we are planning to call the endpoint once per artifact in which we find secrets, the worst case would be that we find secrets in **every single artifact**. In that case, our volume of calls would be directly proportional to the number of releases. We consequently don’t expect our volume of calls to be high enough to be restricted by rate limiting.

### Authentication

This endpoint should not be publicly available. A possible approach would be to use both authentication via a secret (ideally just an API key) and an IP allowlist, to guarantee that only known entities have access to the endpoint.
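
For illustration only, the combined check could look like the following sketch. The function name, its parameters, and the way the key and allowlist are stored are all assumptions; the addresses used in the usage note come from ranges reserved for documentation.

```python
import hmac
import ipaddress


def is_authorized(presented_key, expected_key, client_ip, allowed_networks):
    """Return True only if both the API key and the client IP are accepted.

    Hypothetical helper: `allowed_networks` is a list of CIDR strings.
    """
    # Constant-time comparison avoids leaking key contents through timing.
    key_ok = hmac.compare_digest(presented_key.encode(), expected_key.encode())
    ip = ipaddress.ip_address(client_ip)
    ip_ok = any(ip in ipaddress.ip_network(net) for net in allowed_networks)
    return key_ok and ip_ok
```

With an allowlist of `["192.0.2.0/24"]`, a request from `192.0.2.10` carrying the right key would be accepted, while the same key presented from `198.51.100.1` would be refused.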

### Remediation

In the case of prolonged downtime of the endpoint, we won’t be able to upload our findings. They will be persisted on our end and can be re-uploaded at a later point. We do not plan to automate this: re-uploads will be done manually, on an ad hoc basis.
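
A rough sketch of what this could look like on our side, assuming reports are spooled to a local directory as JSON files and replayed by an operator. The helper names, the file naming, and the `send` callable (which returns `True` when the endpoint accepts a report) are hypothetical.

```python
import json
from pathlib import Path


def persist_report(spool_dir: Path, report: dict) -> Path:
    """Write a report that could not be uploaded to a local spool directory."""
    spool_dir.mkdir(parents=True, exist_ok=True)
    release = report["release"]
    # One file per release; a later report for the same release overwrites it.
    path = spool_dir / f"{release['package_name']}-{release['version']}.json"
    path.write_text(json.dumps(report))
    return path


def replay_reports(spool_dir: Path, send) -> int:
    """Re-send spooled reports, deleting each one only after `send` succeeds.

    Returns the number of reports that were accepted.
    """
    sent = 0
    for path in sorted(spool_dir.glob("*.json")):
        report = json.loads(path.read_text())
        if send(report):
            path.unlink()
            sent += 1
    return sent
```

Reports that fail again stay in the spool directory, so the replay can simply be run once more later.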

We would also probably need an automated way to revoke and renew our own API key, so that we can remediate any leak on our end immediately.
