v0.4.6

@Wauplin

What's Changed

Remove deprecated HfFolder by @Wauplin in #701
- this change adds support for huggingface_hub>=1.0
Update index.mdx by @meg-huggingface in #694
Fix parity tests ci by @lhoestq in #696
add leaderboards to docs by @burtenshaw in #697
Pin hfh in CI for updating repos by @lhoestq in #702

New Contributors

@burtenshaw made their first contribution in #697

Full Changelog: v0.4.5...v0.4.6

v0.4.5

@lhoestq

What's Changed

Support datasets 4 by @lhoestq in #689

Full Changelog: v0.4.4...v0.4.5

v0.4.4

@lhoestq

Bug fixes

support jiwer 4.0 by @lhoestq in #685
Fix Perplexity Score For Tokenizers without bos_token_id by @kylehowells in #682
Fix size attribute error for precision/recall/f1 by @Maxwell-Jia in #656

Other changes

Add required hf_token secret to build main documentation by @albertvillanova in #635
Pin numpy<2 as required by tensorflow to fix doc building by @albertvillanova in #631
Support nltk>=3.9 to fix vulnerability by @albertvillanova in #629
add tip in docs and readme referring to lighteval by @MoritzLaurer in #618

New Contributors

@MoritzLaurer made their first contribution in #618
@Maxwell-Jia made their first contribution in #656
@kylehowells made their first contribution in #682

Full Changelog: v0.4.3...v0.4.4

0.4.3

@albertvillanova

This release adds support for datasets>=3.0 by removing calls to deprecated code

What's Changed

Fix CI with temporary pin nltk<3.9 by @albertvillanova in #623
Replace deprecated use_auth_token with token by @albertvillanova in #621
remove ignore_url_params by @lhoestq in #624

Full Changelog: v0.4.2...v0.4.3
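
For downstream code affected by the `use_auth_token` to `token` rename noted in this release, one common migration pattern is a small shim that still accepts the old keyword but warns about it. The sketch below is illustrative only: `resolve_token` is a hypothetical helper written for this example, not a function from evaluate, datasets, or huggingface_hub.

```python
import warnings


def resolve_token(token=None, use_auth_token=None):
    """Hypothetical helper: prefer `token`, tolerate the legacy keyword.

    Returns the value to pass on as `token`. If the caller still uses
    `use_auth_token`, a FutureWarning is emitted and the legacy value is
    used only when `token` was not given.
    """
    if use_auth_token is not None:
        warnings.warn(
            "`use_auth_token` is deprecated; pass `token` instead.",
            FutureWarning,
            stacklevel=2,
        )
        if token is None:
            token = use_auth_token
    return token


# New-style call: the value is passed through unchanged, no warning.
new_style = resolve_token(token="hf_example")

# Legacy call: still works, but emits a FutureWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy_style = resolve_token(use_auth_token="hf_example")
```

Keeping the shim in one place means call sites can be migrated gradually and the legacy branch deleted once no caller triggers the warning.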

v0.4.2

@krishnap25

What's Changed

Update the documentation and citation of mauve by @krishnap25 in #416
Remove unused dependency by @daskol in #507
Add confusion matrix by @osanseviero in #528
Update python to 3.8 by @qubvel in #571
Fix FileFreeLock by @lhoestq in #578
Fix example doc in load function by @alexrs in #575
Speeding up mean_iou metric computation by @qubvel in #569

New Contributors

@rtrompier made their first contribution in #510
@daskol made their first contribution in #507
@qubvel made their first contribution in #571
@alexrs made their first contribution in #575

Full Changelog: v0.4.1...v0.4.2

v0.4.1

@stevhliu

What's Changed

Add code example to docstrings by @stevhliu in #374
[Minor fix] Typo by @cakiki in #403
[Docs] fixed a typo in bertscore readme by @hazrulakmal in #386
Add max_length kwarg to docstring of Perplexity measurement by @kdutia in #411
Fix minor typo in a_quick_tour.mdx by @tupini07 in #417
Fix Docs base_evaluator.mdx by @jorahn in #418
Update Gradio description to clarify text-based input by @BramVanroy in #427
fix add method by @hazrulakmal in #424
Fix broken link in docs/a_quick_tour.mdx by @tupini07 in #419
resolve #379 audio classification evaluator + docs by @Plutone11011 in #405
fixed kwargs not being passed in combine by @Plutone11011 in #425
add r^2 metric by @TKaanKoc in #407
Update spaces gradio version to 3.19.1 by @BramVanroy in #426
replace evaluate DownloadConfig with datasets by @lvwerra in #447
Render Text2TextGenerationEvaluators' docstring examples by @mariosasko in #463
Trigger CI on ci-* branches by @Wauplin in #467
Update comet by @ricardorei in #443
Fix datasets import in Meteor metric by @mariosasko in #490
fix scikit-learn package name suggestion by @bzz in #498
Release: 0.4.1 by @lhoestq in #505

New Contributors

@cakiki made their first contribution in #403
@hazrulakmal made their first contribution in #386
@kdutia made their first contribution in #411
@tupini07 made their first contribution in #417
@jorahn made their first contribution in #418
@Plutone11011 made their first contribution in #405
@TKaanKoc made their first contribution in #407
@mariosasko made their first contribution in #463
@Wauplin made their first contribution in #467
@ricardorei made their first contribution in #443
@bzz made their first contribution in #498
@lhoestq made their first contribution in #505

Full Changelog: v0.4.0...v0.4.1

v0.4.0

@lvwerra

What's Changed

add trainer integration docs by @lvwerra in #325
Stop using model-defined truncation in perplexity calculation by @mathemakitten in #333
Don't use eval for Evaluator instances in the doc by @fxmarty in #341
fix caching by @lvwerra in #336
Fix #327 set default row of gradio webui to 1 and drop empty/blank row by @Raibows in #335
Update pr docs actions by @mishig25 in #344
Fix scikit-learn install in spaces by @lvwerra in #345
added MASE, sMAPE and MAPE metrics by @kashif in #330
fix sklearn dependency in mape, mase and smape by @lvwerra in #346
Update link text by @stevhliu in #360
Corrected range of MAE by @clefourrier in #359
Revert "Update pr docs actions" by @mishig25 in #363
Evaluation suite by @mathemakitten in #337
Matthews correlation coefficient by @sanderland in #362
fix tf version by @lvwerra in #372
Add TextGeneration Evaluator by @NimaBoscarino in #350
Fix typo in rouge types by @davebulaval in #364
Add Evaluate usage for scikit-learn by @awinml in #368
Adding metric visualization by @sashavor in #342
Add NIST metric by @BramVanroy in #250
add GitHub Actions CI by @lvwerra in #375
Add Evaluate Usage for Keras and Tensorflow by @arjunpatel7 in #370
fix version by @lvwerra in #380
CharacTER: MT metric by @BramVanroy in #286
CharCut: another character-based MT evaluation metric by @BramVanroy in #290
asr model evaluator addition + doc by @bayartsogt-ya in #378
Docs for EvaluationSuite by @mathemakitten in #340
Update the documentation of Mauve by @krishnap25 in #377
fix-ci-badge by @lvwerra in #385

New Contributors

@Raibows made their first contribution in #335
@kashif made their first contribution in #330
@clefourrier made their first contribution in #359
@davebulaval made their first contribution in #364
@awinml made their first contribution in #368
@arjunpatel7 made their first contribution in #370
@bayartsogt-ya made their first contribution in #378
@krishnap25 made their first contribution in #377

Full Changelog: v0.3.0...v0.4.0

v0.3.0

@fcakyon

What's Changed

add multilabel f1 eval usage by @fcakyon in #221
Force get_supported_tasks() to return a list instead of dict keys by @mathemakitten in #227
Unpin rouge_score by @albertvillanova in #220
Remove import statement in Measurement Card by @meg-huggingface in #231
make rouge support multi-ref by @lvwerra in #229
Fix enforce string by @lvwerra in #230
Fix examples in perplexity measurement docs by @mathemakitten in #238
Add Wilcoxon's signed rank test by @douwekiela in #237
Add support for two input columns for TextClassificationEvaluator by @fxmarty in #205
fix bug in TEMPLATE_REQUIRE: add comma by @BramVanroy in #248
Minor quicktour doc suggestions by @stevhliu in #236
Clarify error message for ChrF no. references by @BramVanroy in #247
only track unique missing dependencies by @BramVanroy in #246
Update evaluate in spaces by @lvwerra in #228
add commit_hash to args by @lvwerra in #253
Change perplexity to be calculated with base e by @mathemakitten in #242
Rebase for previous PR by @mathemakitten in #254
Fix docstrings with new perplexities with base e by @mathemakitten in #255
add a tokenizer option to rouge by @lvwerra in #258
Adding list_duplicates=True to example. by @meg-huggingface in #263
Minor change in describing what this does. by @meg-huggingface in #267
Mapping example output to returned output. by @meg-huggingface in #268
Changes "duplicates_list" to "duplicates_dict" (since it's dict) by @meg-huggingface in #265
Changes "duplicates_list" to "duplicates_dict" in the example. by @meg-huggingface in #264
Add slow flag to two column parity test by @lvwerra in #273
Remove handle_impossible_answer from the default PIPELINE_KWARGS in the question answering evaluator by @fxmarty in #272
Toxicity Measurement by @sashavor in #262
Automatically choose dataset split if none provided by @mathemakitten in #232
Fix YAML in Toxicity by @lvwerra in #278
Added metric Brier Score by @kadirnar in #275
Check for mismatch in device setup in evaluator by @mathemakitten in #287
Fix transformers import in the evaluator by @mathemakitten in #291
Add support for name field when loading data by @mathemakitten in #283
Adding regard measurement by @sashavor in #271
Raise exception instead of assert in BertScore by @BramVanroy in #292
fix regard yaml by @lvwerra in #295
Add CONTRIBUTING.md by @mathemakitten in #293
Refactor kwargs and configs by @lvwerra in #188
Revert "Refactor kwargs and configs" by @lvwerra in #299
Add missing split and subset kwarg into other evaluators by @mathemakitten in #301
Adding HONEST score by @sashavor in #279
fix wrong sorting in check by @sanderland in #305
Fix HONEST yaml by @lvwerra in #303
Refactor current_features to selected_feature_format by @mathemakitten in #306
replace datasets list with local list of tasks by @lvwerra in #309
Adding torch to the requirements by @sashavor in #311
Honest space fix by @sashavor in #312
Use HTML relative paths for tiles by @lewtun in #318
Test for valid YAML files by @mathemakitten in #308
add versioning the HubEvaluationModuleFactory by @lvwerra in #314
Add text2text evaluator by @lvwerra in #261
try main if tag does not work by @lvwerra in #322

New Contributors

@fcakyon made their first contribution in #221
@meg-huggingface made their first contribution in #231
@stevhliu made their first contribution in #236
@kadirnar made their first contribution in #275
@sanderland made their first contribution in #305

Full Changelog: v0.2.2...v0.3.0
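
As background for the "Change perplexity to be calculated with base e" entry (#242) in the v0.3.0 list, here is a minimal sketch of what base-e perplexity means in general: the exponential of the mean negative natural-log likelihood of the tokens. It is an illustration under stated assumptions, not the code from that PR; the function name `perplexity_base_e` and the toy probabilities are made up for the example.

```python
import math


def perplexity_base_e(token_probs):
    """Perplexity as exp(mean negative natural-log likelihood).

    `token_probs` holds the probability a model assigned to each
    observed token (toy values here, not real model output).
    """
    nll = [-math.log(p) for p in token_probs]  # per-token loss in nats
    return math.exp(sum(nll) / len(nll))


probs = [0.25, 0.5, 0.125, 0.5]
ppl = perplexity_base_e(probs)

# Same quantity computed entirely in base 2, for comparison.
bits = [-math.log2(p) for p in probs]
ppl_base_2 = 2 ** (sum(bits) / len(bits))
```

Taking the log and the exponent in the same base gives the same perplexity either way, so `ppl` and `ppl_base_2` agree. The base only matters when the two are mixed, for example a natural-log loss exponentiated with base 2.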

v0.2.2

@lvwerra

What's Changed

Update CLI docs by @lvwerra in #218
Add a fingerprint for each EvaluationModule by @mathemakitten in #206
Fix loading error by @lvwerra in #222

Full Changelog: v0.2.1...v0.2.2

v0.2.1

@lvwerra

What's Changed

Add measurements to quality and style checks by @lvwerra in #203
Add comparisons and measurements to code quality tests by @lvwerra in #204
Remove mention to datasets from docs by @albertvillanova in #207
Adding label distribution measurement by @sashavor in #202
Fix spaces tagging by @lvwerra in #217
set datasets to >=2.0.0 by @lvwerra in #216

Full Changelog: v0.2.0...v0.2.1

Releases: huggingface/evaluate
