The Lemon dataset was prepared to investigate approaches to fruit quality control. It contains 2690 annotated images (1056 x 1056 pixels). The raw lemon images were captured using the procedure described in the following blog post and manually annotated using CVAT.
[Figure: example of raw, unannotated data]
[Figure: annotated samples]
Labels
| Name | Attribute | Type | Default | Description |
|---|---|---|---|---|
| condition | healthy | boolean | true | Whether the fruit is healthy. If not, the regions with identified issues are annotated. |
| condition | greening | boolean | false | Whether the fruit has green areas, i.e. is not uniformly yellow. |
| image_quality | blurry | boolean | false | The fruit image is blurry. |
| image_quality | cropped | boolean | false | Not all parts of the fruit are in the image. |
| image_quality | unnatural_color | boolean | false | There are issues with color representation. |
| image_quality | no_data | boolean | false | There are black spots on the fruit image that do not contain data. |
| illness | - | region | - | - |
| gangrene | - | region | - | - |
| mould | - | region | - | - |
| blemish | artificial | region + boolean | - | - |
| dark_style_remains | - | region | - | After pollination, the remains of the style are preserved in the fruit. A dark area around the style remains indicates an unhealthy fruit. This is the region from which the fruit starts rotting or catches mould. |
| pedicel | - | region | - | The pedicel is the structure connecting a single flower to its inflorescence. |
| artifact | - | region | - | The image contains artifacts, i.e. regions that are not related to the fruit and result from incorrect image processing. Those regions should be identified and described. |
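To make the label scheme concrete, here is a minimal Python sketch of the image-level boolean attributes and their defaults as listed above. The class name `ImageAttributes`, the `REGION_LABELS` tuple, and the helper `quality_issues` are hypothetical names chosen for illustration. They are not part of the dataset or of any CVAT export format, and the actual annotation files may be structured differently.

```python
from dataclasses import dataclass

# Labels annotated as regions, per the table above.
REGION_LABELS = (
    "illness", "gangrene", "mould", "blemish",
    "dark_style_remains", "pedicel", "artifact",
)

# Attributes that belong to the image_quality label.
IMAGE_QUALITY_FLAGS = ("blurry", "cropped", "unnatural_color", "no_data")


@dataclass
class ImageAttributes:
    """Hypothetical container for the boolean attributes and their defaults."""
    healthy: bool = True            # condition
    greening: bool = False          # condition
    blurry: bool = False            # image_quality
    cropped: bool = False           # image_quality
    unnatural_color: bool = False   # image_quality
    no_data: bool = False           # image_quality


def quality_issues(attrs: ImageAttributes) -> list:
    """Return the names of the image_quality flags that are set."""
    return [name for name in IMAGE_QUALITY_FLAGS if getattr(attrs, name)]


if __name__ == "__main__":
    sample = ImageAttributes(healthy=False, blurry=True)
    print(quality_issues(sample))
```

A helper like this could be used to skip images with quality problems before training, but whether that filtering is appropriate depends on the task.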
File name
File names are composed of several components that together form a specific identifier, e.g.:
0037_G_I_120_A: 0037 (individual fruit instance), 120 (relative photo angle), A (photo position). Some components are restricted to the original project and cannot be published.
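The naming scheme above can be parsed with a few lines of Python. This is a minimal sketch under stated assumptions: components are separated by underscores, the first component is the fruit instance, the last two are the angle and the position, and the angle is an integer (as in the example). The names `LemonFileId` and `parse_lemon_name` are hypothetical, and the components in between are kept as opaque strings because their meaning is not published.

```python
from dataclasses import dataclass
from pathlib import PurePath


@dataclass(frozen=True)
class LemonFileId:
    fruit_id: str        # individual fruit instance, e.g. "0037"
    angle: int           # relative photo angle, e.g. 120
    position: str        # photo position, e.g. "A"
    undocumented: tuple  # remaining components, meaning not published


def parse_lemon_name(name: str) -> LemonFileId:
    """Split a file name such as '0037_G_I_120_A' into its components."""
    stem = PurePath(name).stem  # drop directory and extension, if present
    parts = stem.split("_")
    if len(parts) < 3:
        raise ValueError(f"unexpected file name format: {name!r}")
    return LemonFileId(
        fruit_id=parts[0],
        angle=int(parts[-2]),  # assumption: the angle is always an integer
        position=parts[-1],
        undocumented=tuple(parts[1:-2]),
    )


if __name__ == "__main__":
    print(parse_lemon_name("0037_G_I_120_A"))
```

Grouping images by `fruit_id` is one way to keep all photos of the same lemon in the same train or validation split.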
If you use the lemons data set in a scientific publication, we would appreciate references to the following paper:
Biblatex entry:
@misc{softwaremill_2020,
  author = {Maciej Adamiak},
  title = {Lemons quality control dataset},
  institution = {SoftwareMill},
  month = jul,
  year = 2020,
  doi = {10.5281/zenodo.3965568},
  url = {https://github.com/softwaremill/lemon-dataset}
}
License
MIT License
Copyright (c) 2020 SoftwareMill
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.