Releases · huggingface/evaluate · GitHub
v0.4.6
a4fcb45
This commit was created on GitHub.com and signed with GitHub's verified signature.
What's Changed
- Remove deprecated HfFolder by @Wauplin in #701
  - this change adds support for huggingface_hub>=1.0
- Update index.mdx by @meg-huggingface in #694
- Fix parity tests ci by @lhoestq in #696
- add leaderboards to docs by @burtenshaw in #697
- Pin hfh in CI for updating repos by @lhoestq in #702
New Contributors
- @burtenshaw made their first contribution in #697
Full Changelog: v0.4.5...v0.4.6
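Since v0.4.6 is the release that adds support for huggingface_hub>=1.0, it can be handy to check which version is installed before upgrading. The following is a minimal sketch using only the standard library; the helper names (`parse_version`, `is_at_least`, `installed_version`) are hypothetical and not part of evaluate or huggingface_hub. It compares only the leading numeric components and ignores pre-release tags, so for anything stricter the `packaging.version` module is the usual tool.

```python
from importlib import metadata


def parse_version(text):
    """Return the leading numeric components of a dotted version string.

    '1.0.2' -> (1, 0, 2); '1.0.0rc1' -> (1, 0, 0). Pre-release tags are ignored.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if ch not in "0123456789":
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):
            # Stop at the first component carrying a suffix such as 'rc1'.
            break
    return tuple(parts)


def is_at_least(installed, minimum):
    """Numerically compare two version strings, padding the shorter with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    hub = installed_version("huggingface_hub")
    if hub is None:
        print("huggingface_hub is not installed")
    elif is_at_least(hub, "1.0"):
        print(f"huggingface_hub {hub}: 1.x, supported from evaluate v0.4.6")
    else:
        print(f"huggingface_hub {hub}: pre-1.0")
```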
v0.4.5
53b3324
v0.4.4
fab953d
Bug fixes
- support jiwer 4.0 by @lhoestq in #685
- Fix Perplexity Score For Tokenizers without bos_token_id by @kylehowells in #682
- Fix size attribute error for precision/recall/f1 by @Maxwell-Jia in #656
Other changes
- Add required hf_token secret to build main documentation by @albertvillanova in #635
- Pin numpy<2 as required by tensorflow to fix doc building by @albertvillanova in #631
- Support nltk>=3.9 to fix vulnerability by @albertvillanova in #629
- add tip in docs and readme referring to lighteval by @MoritzLaurer in #618
New Contributors
- @MoritzLaurer made their first contribution in #618
- @Maxwell-Jia made their first contribution in #656
- @kylehowells made their first contribution in #682
Full Changelog: v0.4.3...v0.4.4
0.4.3
5310084
This release adds support for datasets>=3.0 by removing calls to deprecated code.
What's Changed
- Fix CI with temporary pin nltk<3.9 by @albertvillanova in #623
- Replace deprecated use_auth_token with token by @albertvillanova in #621
- remove ignore_url_params by @lhoestq in #624
Full Changelog: v0.4.2...v0.4.3
v0.4.2
a4bdc10
What's Changed
- Update the documentation and citation of mauve by @krishnap25 in #416
- Remove unused dependency by @daskol in #507
- Add confusion matrix by @osanseviero in #528
- Update python to 3.8 by @qubvel in #571
- Fix FileFreeLock by @lhoestq in #578
- Fix example doc in load function by @alexrs in #575
- Speeding up mean_iou metric computation by @qubvel in #569
New Contributors
- @rtrompier made their first contribution in #510
- @daskol made their first contribution in #507
- @qubvel made their first contribution in #571
- @alexrs made their first contribution in #575
Full Changelog: v0.4.1...v0.4.2
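For readers unfamiliar with the confusion matrix added in this release, the sketch below shows the general idea in plain Python. It is an illustration only, not the library's implementation; the function name and the convention that rows are reference labels and columns are predicted labels are assumptions made here.

```python
def confusion_matrix(references, predictions, labels=None):
    """Count (reference, prediction) pairs.

    Returns (labels, matrix) where matrix[i][j] is the number of examples
    whose reference is labels[i] and whose prediction is labels[j].
    """
    if len(references) != len(predictions):
        raise ValueError("references and predictions must have the same length")
    if labels is None:
        # Assumed default: every label seen on either side, in sorted order.
        labels = sorted(set(references) | set(predictions))
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for ref, pred in zip(references, predictions):
        matrix[index[ref]][index[pred]] += 1
    return labels, matrix


if __name__ == "__main__":
    labels, matrix = confusion_matrix([0, 1, 1, 2, 2, 2], [0, 1, 2, 2, 2, 0])
    for label, row in zip(labels, matrix):
        print(label, row)
```

Correct predictions land on the diagonal, so accuracy is the diagonal sum divided by the number of examples.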
v0.4.1
87f7b37
What's Changed
- Add code example to docstrings by @stevhliu in #374
- [Minor fix] Typo by @cakiki in #403
- [Docs] fixed a typo in bertscore readme by @hazrulakmal in #386
- Add max_length kwarg to docstring of Perplexity measurement by @kdutia in #411
- Fix minor typo in a_quick_tour.mdx by @tupini07 in #417
- Fix Docs base_evaluator.mdx by @jorahn in #418
- Update Gradio description to clarify text-based input by @BramVanroy in #427
- fix add method by @hazrulakmal in #424
- Fix broken link in docs/a_quick_tour.mdx by @tupini07 in #419
- resolve #379 audio classification evaluator + docs by @Plutone11011 in #405
- fixed kwargs not being passed in combine by @Plutone11011 in #425
- add r^2 metric by @TKaanKoc in #407
- Update spaces gradio version to 3.19.1 by @BramVanroy in #426
- replace evaluate DownloadConfig with datasets by @lvwerra in #447
- Render Text2TextGenerationEvaluators' docstring examples by @mariosasko in #463
- Trigger CI on ci-* branches by @Wauplin in #467
- Update comet by @ricardorei in #443
- Fix datasets import in Meteor metric by @mariosasko in #490
- fix scikit-learn package name suggestion by @bzz in #498
- Release: 0.4.1 by @lhoestq in #505
New Contributors
- @cakiki made their first contribution in #403
- @hazrulakmal made their first contribution in #386
- @kdutia made their first contribution in #411
- @tupini07 made their first contribution in #417
- @jorahn made their first contribution in #418
- @Plutone11011 made their first contribution in #405
- @TKaanKoc made their first contribution in #407
- @mariosasko made their first contribution in #463
- @Wauplin made their first contribution in #467
- @ricardorei made their first contribution in #443
- @bzz made their first contribution in #498
- @lhoestq made their first contribution in #505
Full Changelog: v0.4.0...v0.4.1
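The r^2 metric added in this release is the coefficient of determination. As a rough illustration of the textbook formula, R² = 1 - SS_res / SS_tot, here is a plain-Python sketch; the function name is hypothetical and this is not the library's code.

```python
def r_squared(references, predictions):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(references) != len(predictions) or not references:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_ref = sum(references) / len(references)
    # Residual sum of squares: error of the predictions.
    ss_res = sum((r - p) ** 2 for r, p in zip(references, predictions))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((r - mean_ref) ** 2 for r in references)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all references are equal")
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    print(r_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]))
```

A perfect prediction gives 1, and always predicting the mean of the references gives 0.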
v0.4.0
What's Changed
- add trainer integration docs by @lvwerra in #325
- Stop using model-defined truncation in perplexity calculation by @mathemakitten in #333
- Don't use eval for Evaluator instances in the doc by @fxmarty in #341
- fix caching by @lvwerra in #336
- Fix #327 set default row of gradio webui to 1 and drop empty/blank row by @Raibows in #335
- Update pr docs actions by @mishig25 in #344
- Fix scikit-learn install in spaces by @lvwerra in #345
- added MASE, sMAPE and MAPE metrics by @kashif in #330
- fix sklearn dependency in mape, mase and smape by @lvwerra in #346
- Update link text by @stevhliu in #360
- Corrected range of MAE by @clefourrier in #359
- Revert "Update pr docs actions" by @mishig25 in #363
- Evaluation suite by @mathemakitten in #337
- Matthews correlation coefficient by @sanderland in #362
- fix tf version by @lvwerra in #372
- Add TextGeneration Evaluator by @NimaBoscarino in #350
- Fix typo in rouge types by @davebulaval in #364
- Add Evaluate usage for scikit-learn by @awinml in #368
- Adding metric visualization by @sashavor in #342
- Add NIST metric by @BramVanroy in #250
- add GitHub Actions CI by @lvwerra in #375
- Add Evaluate Usage for Keras and Tensorflow by @arjunpatel7 in #370
- fix version by @lvwerra in #380
- CharacTER: MT metric by @BramVanroy in #286
- CharCut: another character-based MT evaluation metric by @BramVanroy in #290
- asr model evaluator addition + doc by @bayartsogt-ya in #378
- Docs for EvaluationSuite by @mathemakitten in #340
- Update the documentation of Mauve by @krishnap25 in #377
- fix-ci-badge by @lvwerra in #385
New Contributors
- @Raibows made their first contribution in #335
- @kashif made their first contribution in #330
- @clefourrier made their first contribution in #359
- @davebulaval made their first contribution in #364
- @awinml made their first contribution in #368
- @arjunpatel7 made their first contribution in #370
- @bayartsogt-ya made their first contribution in #378
- @krishnap25 made their first contribution in #377
Full Changelog: v0.3.0...v0.4.0
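The MAPE, sMAPE and MASE metrics added in this release are forecasting errors. The sketch below uses common textbook definitions in plain Python and is not the library's implementation; the function names are hypothetical. sMAPE in particular has several variants in the literature, and the one shown (with a factor of 2 in the numerator) is an assumption.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error; undefined if any true value is 0."""
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


def smape(y_true, y_pred):
    """Symmetric MAPE (one common variant); undefined if t and p are both 0."""
    return sum(
        2 * abs(t - p) / (abs(t) + abs(p)) for t, p in zip(y_true, y_pred)
    ) / len(y_true)


def mase(y_true, y_pred, training):
    """Mean absolute scaled error.

    Scales the forecast MAE by the in-sample MAE of the naive one-step
    forecast (each training value predicted by its predecessor).
    """
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
    naive_mae = sum(
        abs(training[i] - training[i - 1]) for i in range(1, len(training))
    ) / (len(training) - 1)
    return mae / naive_mae


if __name__ == "__main__":
    y_true, y_pred = [100, 200, 400], [110, 180, 400]
    print(mape(y_true, y_pred))
    print(smape(y_true, y_pred))
    print(mase(y_true, y_pred, training=[90, 100, 120]))
```

A MASE below 1 means the forecast beats the naive baseline on the training scale.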
v0.3.0
What's Changed
- add multilabel f1 eval usage by @fcakyon in #221
- Force get_supported_tasks() to return a list instead of dict keys by @mathemakitten in #227
- Unpin rouge_score by @albertvillanova in #220
- Remove import statement in Measurement Card by @meg-huggingface in #231
- make rouge support multi-ref by @lvwerra in #229
- Fix enforce string by @lvwerra in #230
- Fix examples in perplexity measurement docs by @mathemakitten in #238
- Add Wilcoxon's signed rank test by @douwekiela in #237
- Add support for two input columns for TextClassificationEvaluator by @fxmarty in #205
- fix bug in TEMPLATE_REQUIRE: add comma by @BramVanroy in #248
- Minor quicktour doc suggestions by @stevhliu in #236
- Clarify error message for ChrF no. references by @BramVanroy in #247
- only track unique missing dependencies by @BramVanroy in #246
- Update evaluate in spaces by @lvwerra in #228
- add commit_hash to args by @lvwerra in #253
- Change perplexity to be calculated with base e by @mathemakitten in #242
- Rebase for previous PR by @mathemakitten in #254
- Fix docstrings with new perplexities with base e by @mathemakitten in #255
- add a tokenizer option to rouge by @lvwerra in #258
- Adding list_duplicates=True to example. by @meg-huggingface in #263
- Minor change in describing what this does. by @meg-huggingface in #267
- Mapping example output to returned output. by @meg-huggingface in #268
- Changes "duplicates_list" to "duplicates_dict" (since it's dict) by @meg-huggingface in #265
- Changes "duplicates_list" to "duplicates_dict" in the example. by @meg-huggingface in #264
- Add slow flag to two column parity test by @lvwerra in #273
- Remove handle_impossible_answer from the default PIPELINE_KWARGS in the question answering evaluator by @fxmarty in #272
- Toxicity Measurement by @sashavor in #262
- Automatically choose dataset split if none provided by @mathemakitten in #232
- Fix YAML in Toxicity by @lvwerra in #278
- Added metric Brier Score by @kadirnar in #275
- Check for mismatch in device setup in evaluator by @mathemakitten in #287
- Fix transfomers import in the evaluator by @mathemakitten in #291
- Add support for name field when loading data by @mathemakitten in #283
- Adding regard measurement by @sashavor in #271
- Raise exception instead of assert in BertScore by @BramVanroy in #292
- fix regard yaml by @lvwerra in #295
- Add CONTRIBUTING.md by @mathemakitten in #293
- Refactor kwargs and configs by @lvwerra in #188
- Revert "Refactor kwargs and configs" by @lvwerra in #299
- Add missing split and subset kwarg into other evaluators by @mathemakitten in #301
- Adding HONEST score by @sashavor in #279
- fix wrong sorting in check by @sanderland in #305
- Fix HONEST yaml by @lvwerra in #303
- Refactor current_features to selected_feature_format by @mathemakitten in #306
- replace datasets list with local list of tasks by @lvwerra in #309
- Adding torch to the requirements by @sashavor in #311
- Honest space fix by @sashavor in #312
- Use HTML relative paths for tiles by @lewtun in #318
- Test for valid YAML files by @mathemakitten in #308
- add versioning the HubEvaluationModuleFactory by @lvwerra in #314
- Add text2text evaluator by @lvwerra in #261
- try main if tag does not work by @lvwerra in #322
New Contributors
- @fcakyon made their first contribution in #221
- @meg-huggingface made their first contribution in #231
- @stevhliu made their first contribution in #236
- @kadirnar made their first contribution in #275
- @sanderland made their first contribution in #305
Full Changelog: v0.2.2...v0.3.0
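One entry in this release changes perplexity to be calculated with base e. To illustrate why the base matters, here is a minimal sketch (not the library's code; the function name and the `base` parameter are hypothetical): perplexity is the exponential of the mean negative log-likelihood, and the exponent base has to match the base of the logarithms.

```python
import math


def perplexity(token_log_probs, base=math.e):
    """Perplexity as base ** (mean negative log-probability).

    `base` must match the logarithm base used for `token_log_probs`.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return base ** mean_nll


if __name__ == "__main__":
    # A model that assigns probability 0.25 to every token.
    nats = [math.log(0.25)] * 3
    bits = [math.log2(0.25)] * 3
    print(perplexity(nats))           # natural logs with base e
    print(perplexity(bits, base=2))   # base-2 logs with base 2
    print(perplexity(nats, base=2))   # mismatched bases give a different value
```

With matching bases both calls yield a perplexity of about 4 for this uniform-over-four model, while mixing natural-log probabilities with base 2 does not.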
v0.2.2
What's Changed
- Update CLI docs by @lvwerra in #218
- Add a fingerprint for each EvaluationModule by @mathemakitten in #206
- Fix loading error by @lvwerra in #222
Full Changelog: v0.2.1...v0.2.2
v0.2.1
What's Changed
- Add measurements to quality and style checks by @lvwerra in #203
- Add comparisons and measurements to code quality tests by @lvwerra in #204
- Remove mention to datasets from docs by @albertvillanova in #207
- Adding label distribution measurement by @sashavor in #202
- Fix spaces tagging by @lvwerra in #217
- set datasets to >=2.0.0 by @lvwerra in #216
Full Changelog: v0.2.0...v0.2.1