random-100 through random-104: five random baselines for comparison (each example's score is an independent uniform random sample)
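For concreteness, a minimal sketch of how such baselines could be generated (the seeds, naming, and NumPy usage are illustrative assumptions, not the authors' released procedure):

```python
import numpy as np

# Sketch only: five independent uniform-random score vectors, one value per
# ImageNet train example. Seeds and naming here are assumptions.
n_examples = 1_281_167  # ImageNet-1k training set size
random_scores = {
    f"random-{seed}": np.random.default_rng(seed).uniform(size=n_examples)
    for seed in range(100, 105)
}
```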
Thanks to the respective authors for their contributions; please make sure to cite them if you use the scores!
A few notes regarding the scores:
While different scores have different ranges, they are all oriented so that a high score indicates that keeping an example is beneficial (according to that metric). This is the natural orientation for some metrics, but perhaps counter-intuitive for the scores that had to be reversed for consistency. For instance, the prototype scores are based on cosine similarity, which in its raw form is high for high similarity (i.e., high redundancy); we reversed those scores so that, consistent with the other scores, a high score now means low cosine similarity, i.e., keeping the example is good because it adds less redundant information.
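As an illustration, the reversal can be as simple as a monotone flip of the raw similarities (a sketch; whether the released scores use negation, 1 - cos, or another transform is not specified here):

```python
import numpy as np

# Hypothetical raw prototype scores: cosine similarity to a class prototype,
# where HIGH values indicate HIGH redundancy (bad to keep).
raw_cosine = np.array([0.95, 0.10, 0.60])

# Flip the orientation so that, like the other metrics, HIGH now means
# "keeping this example is beneficial". Negation preserves the ranking.
prototype_scores = -raw_cosine
# An equivalent monotone flip would be 1.0 - raw_cosine.
```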
active-learning is essentially a binary score: the method selects examples such that 80% of ImageNet is retained. To adapt it for our purposes, we assigned each image a value of 1 or 0 depending on whether it was included in this subset. This means the active-learning scores here aren't calibrated for pruning more or less than 20% of ImageNet.
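A sketch of that binarization (the selected-subset indices below are randomly generated stand-ins, not the method's actual selection):

```python
import numpy as np

n_examples = 1_281_167  # ImageNet-1k training set size
rng = np.random.default_rng(0)

# Stand-in for the 80% of ImageNet the active-learning method selected.
selected = rng.choice(n_examples, size=int(0.8 * n_examples), replace=False)

# 1 = in the selected subset (keep), 0 = not selected.
scores = np.zeros(n_examples)
scores[selected] = 1.0
```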
About
Metrics for "Beyond neural scaling laws: beating power law scaling via data pruning" (NeurIPS 2022 Outstanding Paper Award)