CARVIEW |
Navigation Menu
-
Notifications
You must be signed in to change notification settings - Fork 676
Add semver_ord(num)
SQL function and version.semver_ord
column
#10763
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this excellent PR. This will significantly simplify the work involving version sorting. 💪 💪 💪
I've left some non-blocking nitpicks, but everything is currently great! 👍
-- In JSONB a number has higher precedence than a string but in | ||
-- semver it is the other way around, so we use true/false to | ||
-- work around this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are some references to facilitate the review. The ordering for jsonb, as excerpted from the official doc1, is as follows:
Object > Array > Boolean > Number > String > null
Object with n pairs > object with n - 1 pairs
Array with n elements > array with n - 1 elements
The SemVer's pre-release ordering, as excerpted from the spec2, is as follows:
Precedence for two pre-release versions with the same major, minor, and patch version MUST be determined by comparing each dot separated identifier from left to right until a difference is found as follows:
1. Identifiers consisting of only digits are compared numerically.
2. Identifiers with letters or hyphens are compared lexically in ASCII sort order.
3. Numeric identifiers always have lower precedence than non-numeric identifiers.
4. A larger set of pre-release fields has a higher precedence than a smaller set, if all of the preceding identifiers are equal.
Footnotes
@Gankra you know a lot about semver edge cases. do you want to try and break this before we actually start using the new column? :D |
ahahahaha I think as long as it handles |
Also dang I know that page induces NaNs (improved from "bootloops crates.io's client") but what a chaotic sort haha. |
it's looking good so far :) |
are you sure you were in semver sorting mode? https://crates.io/crates/cursed-trying-to-break-cargo/versions?sort=semver looks okayish to me 😅 |
Oh you're totally right, it's so rare to see date and semver not be 1:1 for crates I forgot date is the default :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My PL/pgSQL is very rusty, but this LGTM. 👍
insta::assert_snapshot!(check("0.0.0").await, @r#"[0, 0, 0, {}]"#); | ||
insta::assert_snapshot!(check("1.0.0-alpha.1").await, @r#"[1, 0, 0, [true, "alpha", false, 1, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, ""]]"#); | ||
|
||
// see https://crates.io/crates/cursed-trying-to-break-cargo/1.0.0-0.HDTV-BluRay.1020p.YTSUB.L33TRip.mkv – thanks @Gankra! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love that this is represented as-is. 😆
-- A JSONB object has higher precedence than an array, and versions with | ||
-- prerelease specifiers should have lower precedence than those without. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went to check that we're not relying on any undocumented behaviour, and found this gem:
The btree ordering for jsonb datums is seldom of great interest[...]
Little did they know!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, see #10763 (comment) :)
We currently rely on the Rust `semver` crate to implement our "sort by semantic versioning" functionality, that is used by web interface, but also to determine the "default version". This has the downside that we need to load the full list of version numbers for a crate from the database to the API server, sort it and then throw away the ones that we don't need. This commit implements a `semver_ord(num)` pgSQL function that returns a JSONB array, which has the same ordering precedence as the Semantic Versioning spec (https://semver.org/#spec-item-11), with the small caveat that it only supports up to 15 prerelease parts. The maximum number of prerelease parts in our current dataset is 7, so 15 should be plenty. The database migration in this commit also adds a new `semver_ord` column to the `versions` table, and an on-insert trigger function that automatically derives the `semver_ord` column from the `num` column value. Once this migration has run, the existing versions can be backfilled by running the following SQL script, until all versions are processed: ```sql with versions_to_update as ( select id, num from versions where semver_ord = 'null'::jsonb limit 1000 ) update versions set semver_ord = semver_ord(num) where id in (select id from versions_to_update); ```
-- with versions_to_update as ( | ||
-- select id, num | ||
-- from versions | ||
-- where semver_ord = 'null'::jsonb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just noticed that I didn't update this correctly. this line should now be where semver_ord is null
.
We currently rely on the Rust
semver
crate to implement our "sort by semantic versioning" functionality, that is used by web interface, but also to determine the "default version". This has the downside that we need to load the full list of version numbers for a crate from the database to the API server, sort it and then throw away the ones that we don't need.Ideally, we would be using https://pgxn.org/dist/semver/ or https://github.com/pgcentralfoundation/pgrx to solve this, but unfortunately most PostgreSQL hosters don't allow/support custom extensions, and it would also make local development a bit more challenging.
As a workaround, this PR implements a
semver_ord(num)
pgSQL function that returns a JSONB array, which has the same ordering precedence as the Semantic Versioning spec (https://semver.org/#spec-item-11), with the small caveat that it only supports up to 15 prerelease parts. The maximum number of prerelease parts in our current dataset is 7, so 15 should be plenty.The database migration in this commit also adds a new
semver_ord
column to theversions
table, and an on-insert trigger function that automatically derives thesemver_ord
column from thenum
column value:Once this migration has run, the existing versions can be backfilled by running the following SQL script, until all versions are processed:
This PR does not yet implement any code to actually use this new column, since it will need to be backfilled first. Once that has happened and verified to produce correct results, we can start to migrate our codebase to move the semver sorting into the database and potentially even calculate the "default version" directly in the database too.