Releases · huggingface/evaluate · GitHub
Release dates:
- v0.4.5: 10 Jul 13:26
- v0.4.4: 20 Jun 17:49
- 0.4.3: 11 Sep 10:17
- v0.4.2: 30 Apr 09:45
- v0.4.1: 13 Oct 15:57
- v0.4.0: 13 Dec 13:35
- v0.3.0: 13 Oct 13:04
- v0.2.2: 29 Jul 14:58
- v0.2.1: 28 Jul 13:13
- v0.2.0: 25 Jul 14:34
v0.4.5
53b3324
v0.4.4
fab953d
Bug fixes
- support jiwer 4.0 by @lhoestq in #685
- Fix Perplexity Score For Tokenizers without bos_token_id by @kylehowells in #682
- Fix size attribute error for precision/recall/f1 by @Maxwell-Jia in #656
Other changes
- Add required hf_token secret to build main documentation by @albertvillanova in #635
- Pin numpy<2 as required by tensorflow to fix doc building by @albertvillanova in #631
- Support nltk>=3.9 to fix vulnerability by @albertvillanova in #629
- add tip in docs and readme referring to lighteval by @MoritzLaurer in #618
New Contributors
- @MoritzLaurer made their first contribution in #618
- @Maxwell-Jia made their first contribution in #656
- @kylehowells made their first contribution in #682
Full Changelog: v0.4.3...v0.4.4
0.4.3
5310084
This release adds support for `datasets>=3.0` by removing calls to deprecated code.
What's Changed
- Fix CI with temporary pin nltk<3.9 by @albertvillanova in #623
- Replace deprecated use_auth_token with token by @albertvillanova in #621
- remove ignore_url_params by @lhoestq in #624
Full Changelog: v0.4.2...v0.4.3
v0.4.2
a4bdc10
What's Changed
- Update the documentation and citation of mauve by @krishnap25 in #416
- Remove unused dependency by @daskol in #507
- Add confusion matrix by @osanseviero in #528
- Update python to 3.8 by @qubvel in #571
- Fix FileFreeLock by @lhoestq in #578
- Fix example doc in load function by @alexrs in #575
- Speeding up mean_iou metric computation by @qubvel in #569
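For readers new to the confusion matrix added in #528, here is a minimal pure-Python sketch of what it counts. It assumes scikit-learn's layout (rows are true labels, columns are predicted labels, both in sorted order); the function name and signature are illustrative only and are not the `evaluate` module's interface.

```python
from collections import Counter


def confusion_matrix(references, predictions, labels=None):
    """Count (true label, predicted label) pairs.

    Row i, column j holds the number of samples whose true label is labels[i]
    and whose predicted label is labels[j].
    """
    if len(references) != len(predictions):
        raise ValueError("references and predictions must have the same length")
    if labels is None:
        # Default label order: every label seen in either input, sorted.
        labels = sorted(set(references) | set(predictions))
    pair_counts = Counter(zip(references, predictions))
    return [[pair_counts[(true, pred)] for pred in labels] for true in labels]


print(confusion_matrix(references=[0, 1, 1, 2], predictions=[0, 2, 1, 2]))
# [[1, 0, 0], [0, 1, 1], [0, 0, 1]]
```

The diagonal holds correct predictions; the off-diagonal entry in row 1, column 2 records the single sample with true label 1 that was predicted as 2.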
New Contributors
- @rtrompier made their first contribution in #510
- @daskol made their first contribution in #507
- @qubvel made their first contribution in #571
- @alexrs made their first contribution in #575
Full Changelog: v0.4.1...v0.4.2
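As a reference point for the `mean_iou` change in #569, the sketch below shows the quantity that metric reports for segmentation: per-class intersection-over-union, averaged over classes, computed on flattened per-pixel label lists. It is a hypothetical plain-Python helper, makes no claim about how #569 achieved its speed-up, and omits options a full implementation may support (such as ignored labels). Skipping classes that occur in neither input is an assumption of this sketch.

```python
def mean_iou(predictions, references, num_labels):
    """Mean intersection-over-union across classes 0..num_labels-1.

    predictions and references are flat sequences of integer class ids,
    one entry per pixel. Classes absent from both inputs are skipped.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    intersection = [0] * num_labels
    union = [0] * num_labels
    for pred, ref in zip(predictions, references):
        if pred == ref:
            # Pixel counted once in both the intersection and the union of its class.
            intersection[pred] += 1
            union[pred] += 1
        else:
            # Mismatch: the pixel belongs to the union of both classes involved.
            union[pred] += 1
            union[ref] += 1
    ious = [inter / uni for inter, uni in zip(intersection, union) if uni > 0]
    if not ious:
        raise ValueError("no class occurs in the inputs")
    return sum(ious) / len(ious)


# Class 0: IoU 1/2, class 1: IoU 2/3, so the mean is 7/12.
print(mean_iou(predictions=[0, 0, 1, 1], references=[0, 1, 1, 1], num_labels=2))
```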
v0.4.1
87f7b37
What's Changed
- Add code example to docstrings by @stevhliu in #374
- [Minor fix] Typo by @cakiki in #403
- [Docs] fixed a typo in bertscore readme by @hazrulakmal in #386
- Add max_length kwarg to docstring of Perplexity measurement by @kdutia in #411
- Fix minor typo in a_quick_tour.mdx by @tupini07 in #417
- Fix Docs base_evaluator.mdx by @jorahn in #418
- Update Gradio description to clarify text-based input by @BramVanroy in #427
- fix `add` method by @hazrulakmal in #424
- Fix broken link in docs/a_quick_tour.mdx by @tupini07 in #419
- resolve #379 audio classification evaluator + docs by @Plutone11011 in #405
- fixed kwargs not being passed in combine by @Plutone11011 in #425
- add r^2 metric by @TKaanKoc in #407
- Update spaces gradio version to 3.19.1 by @BramVanroy in #426
- replace evaluate DownloadConfig with datasets by @lvwerra in #447
- Render Text2TextGenerationEvaluators' docstring examples by @mariosasko in #463
- Trigger CI on ci-* branches by @Wauplin in #467
- Update comet by @ricardorei in #443
- Fix `datasets` import in Meteor metric by @mariosasko in #490
- fix scikit-learn package name suggestion by @bzz in #498
- Release: 0.4.1 by @lhoestq in #505
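The r^2 metric added in #407 is the coefficient of determination. A minimal sketch of the textbook definition, 1 - SS_res / SS_tot, follows; the function name and argument order are illustrative assumptions, not the `evaluate` module's signature.

```python
def r_squared(predictions, references):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(predictions) != len(references) or not references:
        raise ValueError("need two non-empty sequences of equal length")
    mean_ref = sum(references) / len(references)
    # Residual sum of squares: error left after the predictions.
    ss_res = sum((ref - pred) ** 2 for pred, ref in zip(predictions, references))
    # Total sum of squares: spread of the references around their mean.
    ss_tot = sum((ref - mean_ref) ** 2 for ref in references)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all references are equal")
    return 1 - ss_res / ss_tot


# SS_res = 1 and SS_tot = 5 here, so R^2 = 0.8.
print(r_squared(predictions=[1, 2, 3, 5], references=[1, 2, 3, 4]))
```

R^2 equals 1 for perfect predictions and becomes negative when the predictions fit worse than always predicting the mean of the references.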
New Contributors
- @cakiki made their first contribution in #403
- @hazrulakmal made their first contribution in #386
- @kdutia made their first contribution in #411
- @tupini07 made their first contribution in #417
- @jorahn made their first contribution in #418
- @Plutone11011 made their first contribution in #405
- @TKaanKoc made their first contribution in #407
- @mariosasko made their first contribution in #463
- @Wauplin made their first contribution in #467
- @ricardorei made their first contribution in #443
- @bzz made their first contribution in #498
- @lhoestq made their first contribution in #505
Full Changelog: v0.4.0...v0.4.1
v0.4.0
What's Changed
- add trainer integration docs by @lvwerra in #325
- Stop using model-defined truncation in perplexity calculation by @mathemakitten in #333
- Don't use eval for Evaluator instances in the doc by @fxmarty in #341
- fix caching by @lvwerra in #336
- Fix #327 set default row of gradio webui to 1 and drop empty/blank row by @Raibows in #335
- Update pr docs actions by @mishig25 in #344
- Fix `scikit-learn` install in spaces by @lvwerra in #345
- added MASE, sMAPE and MAPE metrics by @kashif in #330
- fix sklearn dependency in mape, mase and smape by @lvwerra in #346
- Update link text by @stevhliu in #360
- Corrected range of MAE by @clefourrier in #359
- Revert "Update pr docs actions" by @mishig25 in #363
- Evaluation suite by @mathemakitten in #337
- Matthews correlation coefficient by @sanderland in #362
- fix tf version by @lvwerra in #372
- Add TextGeneration Evaluator by @NimaBoscarino in #350
- Fix typo in rouge types by @davebulaval in #364
- Add `Evaluate` usage for `scikit-learn` by @awinml in #368
- Adding metric visualization by @sashavor in #342
- Add NIST metric by @BramVanroy in #250
- add GitHub Actions CI by @lvwerra in #375
- Add Evaluate Usage for Keras and Tensorflow by @arjunpatel7 in #370
- fix version by @lvwerra in #380
- CharacTER: MT metric by @BramVanroy in #286
- CharCut: another character-based MT evaluation metric by @BramVanroy in #290
- asr model evaluator addition + doc by @bayartsogt-ya in #378
- Docs for EvaluationSuite by @mathemakitten in #340
- Update the documentation of Mauve by @krishnap25 in #377
- fix-ci-badge by @lvwerra in #385
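Two of the forecasting metrics added in #330 can be sketched in a few lines. This assumes the common textbook definitions (MAPE as the mean of |r - p| / |r|, sMAPE as the mean of 2|r - p| / (|r| + |p|)); the `evaluate` modules may differ in details such as scaling to percent or handling multiple outputs. MASE is left out because it also needs the training series for its scaling term.

```python
def _check(predictions, references):
    if len(predictions) != len(references) or not references:
        raise ValueError("need two non-empty sequences of equal length")


def mape(predictions, references):
    """Mean absolute percentage error as a fraction (multiply by 100 for percent)."""
    _check(predictions, references)
    if any(ref == 0 for ref in references):
        raise ValueError("MAPE is undefined when a reference value is 0")
    return sum(abs((ref - pred) / ref) for pred, ref in zip(predictions, references)) / len(references)


def smape(predictions, references):
    """Symmetric MAPE: mean of 2|ref - pred| / (|ref| + |pred|), bounded by [0, 2]."""
    _check(predictions, references)
    total = 0.0
    for pred, ref in zip(predictions, references):
        denom = abs(ref) + abs(pred)
        # Assumed convention: a pair where both values are 0 contributes no error.
        total += 0.0 if denom == 0 else 2 * abs(ref - pred) / denom
    return total / len(references)


print(mape(predictions=[110, 180], references=[100, 200]))  # both errors are 10%
print(smape(predictions=[110, 180], references=[100, 200]))
```

The symmetric variant divides by the average magnitude of prediction and reference, so over- and under-forecasts of the same size are penalized slightly differently than in MAPE.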
New Contributors
- @Raibows made their first contribution in #335
- @kashif made their first contribution in #330
- @clefourrier made their first contribution in #359
- @davebulaval made their first contribution in #364
- @awinml made their first contribution in #368
- @arjunpatel7 made their first contribution in #370
- @bayartsogt-ya made their first contribution in #378
- @krishnap25 made their first contribution in #377
Full Changelog: v0.3.0...v0.4.0
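The Matthews correlation coefficient added in #362 summarizes a binary confusion matrix in a single value between -1 and 1. The sketch below covers only the binary 0/1 case and uses a hypothetical function name; returning 0 when the denominator vanishes is an assumed convention.

```python
import math


def matthews_corrcoef(references, predictions):
    """Binary Matthews correlation coefficient for labels in {0, 1}."""
    if len(references) != len(predictions):
        raise ValueError("references and predictions must have the same length")
    tp = tn = fp = fn = 0
    for ref, pred in zip(references, predictions):
        if ref not in (0, 1) or pred not in (0, 1):
            raise ValueError("this sketch only handles 0/1 labels")
        if ref == 1 and pred == 1:
            tp += 1
        elif ref == 0 and pred == 0:
            tn += 1
        elif ref == 0:
            fp += 1
        else:
            fn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0 when any marginal count is 0 (denominator is 0).
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


print(matthews_corrcoef(references=[1, 1, 0, 0], predictions=[1, 0, 0, 0]))  # 1/sqrt(3)
```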
v0.3.0
What's Changed
- add multilabel f1 eval usage by @fcakyon in #221
- Force get_supported_tasks() to return a list instead of dict keys by @mathemakitten in #227
- Unpin rouge_score by @albertvillanova in #220
- Remove import statement in Measurement Card by @meg-huggingface in #231
- make rouge support multi-ref by @lvwerra in #229
- Fix enforce string by @lvwerra in #230
- Fix examples in perplexity measurement docs by @mathemakitten in #238
- Add Wilcoxon's signed rank test by @douwekiela in #237
- Add support for two input columns for TextClassificationEvaluator by @fxmarty in #205
- fix bug in TEMPLATE_REQUIRE: add comma by @BramVanroy in #248
- Minor quicktour doc suggestions by @stevhliu in #236
- Clarify error message for ChrF no. references by @BramVanroy in #247
- only track unique missing dependencies by @BramVanroy in #246
- Update evaluate in spaces by @lvwerra in #228
- add `commit_hash` to args by @lvwerra in #253
- Change perplexity to be calculated with base e by @mathemakitten in #242
- Rebase for previous PR by @mathemakitten in #254
- Fix docstrings with new perplexities with base e by @mathemakitten in #255
- add a tokenizer option to rouge by @lvwerra in #258
- Adding list_duplicates=True to example. by @meg-huggingface in #263
- Minor change in describing what this does. by @meg-huggingface in #267
- Mapping example output to returned output. by @meg-huggingface in #268
- Changes "duplicates_list" to "duplicates_dict" (since it's dict) by @meg-huggingface in #265
- Changes "duplicates_list" to "duplicates_dict" in the example. by @meg-huggingface in #264
- Add slow flag to two column parity test by @lvwerra in #273
- Remove `handle_impossible_answer` from the default `PIPELINE_KWARGS` in the question answering evaluator by @fxmarty in #272
- Toxicity Measurement by @sashavor in #262
- Automatically choose dataset split if none provided by @mathemakitten in #232
- Fix YAML in Toxicity by @lvwerra in #278
- Added metric Brier Score by @kadirnar in #275
- Check for mismatch in device setup in evaluator by @mathemakitten in #287
- Fix transformers import in the evaluator by @mathemakitten in #291
- Add support for name field when loading data by @mathemakitten in #283
- Adding regard measurement by @sashavor in #271
- Raise exception instead of assert in BertScore by @BramVanroy in #292
- fix regard yaml by @lvwerra in #295
- Add CONTRIBUTING.md by @mathemakitten in #293
- Refactor kwargs and configs by @lvwerra in #188
- Revert "Refactor kwargs and configs" by @lvwerra in #299
- Add missing `split` and `subset` kwarg into other evaluators by @mathemakitten in #301
- Adding HONEST score by @sashavor in #279
- fix wrong sorting in check by @sanderland in #305
- Fix HONEST yaml by @lvwerra in #303
- Refactor current_features to selected_feature_format by @mathemakitten in #306
- replace datasets list with local list of tasks by @lvwerra in #309
- Adding torch to the requirements by @sashavor in #311
- Honest space fix by @sashavor in #312
- Use HTML relative paths for tiles by @lewtun in #318
- Test for valid YAML files by @mathemakitten in #308
- add versioning the `HubEvaluationModuleFactory` by @lvwerra in #314
- Add text2text evaluator by @lvwerra in #261
- try main if tag does not work by @lvwerra in #322
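Several items above (#242, #255) concern computing perplexity with base e. As a sketch, assuming the per-token natural-log probabilities have already been obtained from a causal language model, perplexity is the exponential of the mean negative log-likelihood. The second helper shows that working in bits gives the same value as long as the log-probabilities are converted to base 2 first; both function names are hypothetical.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities (base e).

    PPL = exp(-(1/N) * sum_i log p(x_i | x_<i))
    """
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def perplexity_base2(token_log_probs):
    """Same quantity computed in bits: 2 ** (mean negative log2-likelihood)."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll_bits = -sum(lp / math.log(2) for lp in token_log_probs) / len(token_log_probs)
    return 2 ** mean_nll_bits


# A model that gives every token probability 1/4 has perplexity 4 in either base.
log_probs = [math.log(0.25)] * 3
print(perplexity(log_probs), perplexity_base2(log_probs))
```

Mixing bases, for example raising 2 to a loss measured in nats, would give a different and smaller number, which is why the base has to be applied consistently.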
New Contributors
- @fcakyon made their first contribution in #221
- @meg-huggingface made their first contribution in #231
- @stevhliu made their first contribution in #236
- @kadirnar made their first contribution in #275
- @sanderland made their first contribution in #305
Full Changelog: v0.2.2...v0.3.0
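The Brier score added in #275 measures the calibration of probabilistic predictions. A minimal sketch of the binary form follows, assuming one predicted probability for the positive class per sample; the function name and arguments are illustrative and may not match the `evaluate` module.

```python
def brier_score(references, probabilities):
    """Binary Brier score: mean of (p - y)^2, where p is the predicted
    probability of the positive class and y is the 0/1 outcome. Lower is better."""
    if len(references) != len(probabilities) or not references:
        raise ValueError("need two non-empty sequences of equal length")
    if any(not 0.0 <= p <= 1.0 for p in probabilities):
        raise ValueError("probabilities must lie in [0, 1]")
    return sum((p - y) ** 2 for y, p in zip(references, probabilities)) / len(references)


# Squared errors 0.01, 0.04 and 0.16 average to 0.07.
print(brier_score(references=[1, 0, 1], probabilities=[0.9, 0.2, 0.6]))
```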
v0.2.2
What's Changed
- Update CLI docs by @lvwerra in #218
- Add a fingerprint for each EvaluationModule by @mathemakitten in #206
- Fix loading error by @lvwerra in #222
Full Changelog: v0.2.1...v0.2.2
v0.2.1
What's Changed
- Add measurements to quality and style checks by @lvwerra in #203
- Add comparisons and measurements to code quality tests by @lvwerra in #204
- Remove mention to datasets from docs by @albertvillanova in #207
- Adding label distribution measurement by @sashavor in #202
- Fix spaces tagging by @lvwerra in #217
- set datasets to >=2.0.0 by @lvwerra in #216
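The label distribution measurement added in #202 describes how labels are spread across a dataset. The core of that idea is a normalized count, sketched below with a hypothetical function name; the actual measurement may return a different structure and report additional statistics.

```python
from collections import Counter


def label_distribution(labels):
    """Map each distinct label to the fraction of samples that carry it."""
    if not labels:
        raise ValueError("need at least one label")
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in sorted(counts.items())}


print(label_distribution(["pos", "neg", "pos", "pos"]))
# {'neg': 0.25, 'pos': 0.75}
```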
Full Changelog: v0.2.0...v0.2.1
v0.2.0
What's New
`evaluator`
The `evaluator` has been extended to three new tasks:
- "image-classification"
- "token-classification"
- "question-answering"
`combine`
With `combine`, one can bundle several metrics into a single object that can be evaluated in one call and also used in combination with the `evaluator`.
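The library exposes this as `evaluate.combine`, which takes a list of metrics. The sketch below does not call it; it is a hypothetical, dependency-free illustration of the idea of bundling metric functions behind one call, and the `CombinedMetrics` class and both metric functions are invented for the example.

```python
from typing import Callable, Dict, Sequence

# A metric here is any function of (predictions, references) returning a number.
Metric = Callable[[Sequence[int], Sequence[int]], float]


class CombinedMetrics:
    """Hypothetical bundle of metrics evaluated with a single compute() call."""

    def __init__(self, metrics: Dict[str, Metric]):
        self.metrics = dict(metrics)

    def compute(self, predictions, references) -> Dict[str, float]:
        # Every metric receives the same inputs; results merge into one dict.
        return {name: fn(predictions, references) for name, fn in self.metrics.items()}


def accuracy(predictions, references):
    return sum(p == r for p, r in zip(predictions, references)) / len(references)


def positive_recall(predictions, references):
    # Recall of the positive class (label 1); 0.0 if no positives exist.
    hits = [p for p, r in zip(predictions, references) if r == 1]
    return sum(p == 1 for p in hits) / len(hits) if hits else 0.0


bundle = CombinedMetrics({"accuracy": accuracy, "recall": positive_recall})
print(bundle.compute(predictions=[0, 1, 0], references=[0, 1, 1]))
# {'accuracy': 0.6666666666666666, 'recall': 0.5}
```

Keeping a single `compute` entry point means the bundle can be passed anywhere one metric is expected, which is the property that lets a combined object work alongside an evaluator.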
What's Changed
- Fix typo in WER docs by @pn11 in #147
- Fix rouge outputs by @lvwerra in #158
- add tutorial for custom pipeline by @lvwerra in #154
- refactor `evaluator` tests by @lvwerra in #155
- rename `input_texts` to `predictions` in perplexity by @lvwerra in #157
- Add link to GitHub author by @lewtun in #166
- Add `combine` to compose multiple evaluations by @lvwerra in #150
- test string casting only on first element by @lvwerra in #159
- remove unused fixtures from unittests by @lvwerra in #170
- Add a test to check that Evaluator evaluations match transformers examples by @fxmarty in #163
- Add smaller model for `TextClassificationEvaluator` test by @fxmarty in #172
- Add tags to spaces by @lvwerra in #162
- Rename evaluation modules by @lvwerra in #160
- Update push_evaluations_to_hub.py by @lvwerra in #174
- update evaluate dependency for spaces by @lvwerra in #175
- Add `ImageClassificationEvaluator` by @fxmarty in #173
- attempting to let meteor handle multiple references per prediction by @sashavor in #164
- fixed duplicate calculation of spearmanr function in metrics wrapper. by @benlipkin in #176
- forbid hyphens in template for module names by @lvwerra in #177
- switch from Github to Hub module factory for canonical modules by @lvwerra in #180
- Fix bertscore idf by @lvwerra in #183
- refactor evaluator base and task classes by @lvwerra in #185
- Avoid importing tensorflow when importing evaluate by @NouamaneTazi in #135
- Add QuestionAnsweringEvaluator by @fxmarty in #179
- Evaluator perf by @ola13 in #178
- Fix QuestionAnsweringEvaluator for squad v2, fix examples by @fxmarty in #190
- Rename perf metric evaluator by @lvwerra in #191
- Fix typos in QA Evaluator by @lewtun in #192
- Evaluator device placement by @lvwerra in #193
- Change test command in installation.mdx to use exact_match by @mathemakitten in #194
- Add `TokenClassificationEvaluator` by @fxmarty in #167
- Pin rouge_score by @albertvillanova in #197
- add poseval by @lvwerra in #195
- Combine docs by @lvwerra in #201
- Evaluator column loading by @lvwerra in #200
- Evaluator documentation by @lvwerra in #199
New Contributors
- @pn11 made their first contribution in #147
- @fxmarty made their first contribution in #163
- @benlipkin made their first contribution in #176
- @NouamaneTazi made their first contribution in #135
- @mathemakitten made their first contribution in #194
Full Changelog: v0.1.2...v0.2.0