Why CORe50?

Continuous/Lifelong learning of high-dimensional data streams is a challenging research problem. In fact, fully retraining models each time new data becomes available is infeasible, due to computational and storage issues, while naïve incremental strategies have been shown to suffer from catastrophic forgetting. In the context of real-world object recognition applications (e.g. robotics), where continuous learning is crucial, very few datasets and benchmarks are available to evaluate and compare emerging techniques.

On this page we provide a new dataset and benchmark, CORe50, specifically designed for Continuous Object Recognition, and introduce baseline approaches for different continuous learning scenarios. If you plan to use this dataset or the other resources you'll find on this page, please cite our latest paper:

Vincenzo Lomonaco and Davide Maltoni. "CORe50: a new Dataset and Benchmark for Continuous Object Recognition". arXiv preprint arXiv:1705.03550 (2017).

Dataset

CORe50, specifically designed for (C)ontinuous (O)bject (Re)cognition, is a collection of 50 domestic objects belonging to 10 categories: plug adapters, mobile phones, scissors, light bulbs, cans, glasses, balls, markers, cups and remote controls. Classification can be performed at object level (50 classes) or at category level (10 classes). The first task (the default one) is much more challenging because objects of the same category are very difficult to distinguish under certain poses.

The dataset has been collected in 11 distinct sessions (8 indoor and 3 outdoor) characterized by different backgrounds and lighting. For each session and for each object, a 15-second video (at 20 fps) has been recorded with a Kinect 2.0 sensor, delivering 300 RGB-D frames.

Objects are hand-held by the operator, and the camera point of view is that of the operator's eyes. The operator is required to extend their arm and smoothly move/rotate the object in front of the camera. A subjective point of view with objects at grab distance is well suited for a number of robotic applications. The grabbing hand (left or right) changes throughout the sessions, and significant object occlusions are often produced by the hand itself.


Fig. 1 Example images of the 50 objects in CORe50. Each column denotes one of the 10 categories.


The presence of temporally coherent sessions (i.e., videos where the objects gently move in front of the camera) is another key feature, since temporal smoothness can be used to simplify object detection, improve classification accuracy and address semi-supervised (or unsupervised) scenarios.

In Fig. 1 you can see some example images of the 50 objects in CORe50, where each column denotes one of the 10 categories and each row a different object. The full dataset consists of 164,866 128×128 RGB-D images: 11 sessions × 50 objects × (around 300) frames per session. Three of the eleven sessions (#3, #7 and #10) have been selected for testing and the remaining 8 sessions are used for training. We tried to balance as much as possible the difficulty of the training and test sessions with respect to indoor/outdoor setting, holding hand (left or right) and complexity of the background. For more information about the dataset, take a look at the section "CORe50" in the paper.

Benchmark

Popular datasets such as ImageNet and Pascal VOC provide a very good playground for classification and detection approaches. However, they have been designed with “static” evaluation protocols in mind: the entire dataset is split into just two parts, a training set used for (one-shot) learning and a separate test set used for accuracy evaluation.

Splitting the training set into a number of batches is essential to train and test continuous learning approaches, a hot research topic that is currently receiving much attention. Unfortunately, most of the existing datasets are not well suited to this purpose because they lack a fundamental ingredient: the presence of multiple (unconstrained) views of the same objects taken in different sessions (varying background, lighting, pose, occlusions, etc.). Focusing on Object Recognition, we consider three continuous learning scenarios:

New Instances (NI): new training patterns of the same classes become available in subsequent batches.
New Classes (NC): new training patterns belonging to previously unseen classes become available in subsequent batches.
New Instances and Classes (NIC): new training patterns of both known and new classes become available in subsequent batches.


Fig. 2 Mid-CaffeNet accuracy in the NI, NC and NIC scenarios (average over 10 runs).


As argued by many researchers, naïve approaches cannot avoid catastrophic forgetting in complex real-world scenarios such as NC and NIC. In our work we have designed simple baselines which perform markedly better than naïve strategies but still leave much room for improvement (see Fig. 2). Check out the full results in our paper, or download them as TSV files or Python dicts in the section below!

Download

In order to facilitate the usage of the benchmark, we freely release the dataset, the code to reproduce the baselines and all the materials which could be useful for speeding up the creation of new continuous learning strategies on CORe50.

Dataset

The dataset directory tree is not that different from what you may expect. For each session (s1, s2, ..., s11) we have 50 directories (o1, o2, ..., o50) representing the 50 objects contained in the dataset. Below you can see which class each object instance ID corresponds to (a small Python sketch after the list shows the same mapping programmatically):

[o1, ..., o5] -> plug adapters
[o6, ..., o10] -> mobile phones
[o11, ..., o15] -> scissors
[o16, ..., o20] -> light bulbs
[o21, ..., o25] -> cans
[o26, ..., o30] -> glasses
[o31, ..., o35] -> balls
[o36, ..., o40] -> markers
[o41, ..., o45] -> cups
[o46, ..., o50] -> remote controls
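
For convenience, here is a minimal Python sketch (the helper is ours, not part of the released code) that maps an object instance ID to its category following the numbering above:

# Hypothetical helper, assuming the contiguous numbering above (5 objects per category).
CATEGORIES = ["plug adapters", "mobile phones", "scissors", "light bulbs", "cans",
              "glasses", "balls", "markers", "cups", "remote controls"]

def category_of(obj_id):
    """Return the category name for an object instance id in [1, 50]."""
    assert 1 <= obj_id <= 50
    return CATEGORIES[(obj_id - 1) // 5]

print(category_of(33))   # -> "balls", since o31, ..., o35 are balls
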
In each object directory, the temporally coherent frames are characterized by a unique filename with the format "C_[session_num]_[obj_num]_[frame_seq_id].png" (a small traversal sketch follows the tree below):

CORe50/
  |
  |--- s1/
  |    |------ o1/
  |    |       |---- C_01_01_XXX.png
  |    |       |---- ...
  |    | 
  |    |------ o2/
  |    |------ ...
  |    |------ o50/
  |
  |--- s2/
  |--- s3/
  |--- ...
  |--- s11/
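
As a quick sanity check, the tree above can be walked with a few lines of Python; this is only a sketch, and the root folder name is an assumption about where you unzipped the archive:

import glob, os

root = "CORe50"   # hypothetical local path to the unzipped dataset

# Filenames follow "C_[session_num]_[obj_num]_[frame_seq_id].png".
for path in sorted(glob.glob(os.path.join(root, "s*", "o*", "C_*.png"))):
    name = os.path.splitext(os.path.basename(path))[0]        # e.g. "C_01_01_000"
    _, session_num, obj_num, frame_seq_id = name.split("_")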

Since we make available both the 350 x 350 original images and their cropped version (128 x 128), we thought it would also be useful to release the bounding boxes relative to the original image size.
The bbox coordinates for each image are automatically extracted with a very simple tracking technique, briefly described in the paper. In the bbox.zip archive you can download below, you will find a different txt file for each object and session. Each file follows this format:

Color000: 142 160 269 287
Color001: 143 160 270 287
Color002: 145 160 272 287
Color003: 149 160 276 287
Color004: 149 159 276 286
...

So, for each image ColorID, we have the bbox in the common format [min x, min y, max x, max y] in the image coordinate system.
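
For example, a bbox file can be parsed and used to crop the corresponding full-size frames along these lines; this is only a sketch with Pillow, and the local paths are assumptions (the exact layout inside bbox.zip may differ):

from PIL import Image   # assumes Pillow is installed

bbox_file = "bbox/s1/o1.txt"                          # hypothetical path
frame = Image.open("CORe50/s1/o1/C_01_01_000.png")    # hypothetical 350 x 350 frame

boxes = {}
with open(bbox_file) as f:
    for line in f:
        if not line.strip():
            continue
        frame_id, coords = line.split(":")
        x_min, y_min, x_max, y_max = map(int, coords.split())
        boxes[frame_id] = (x_min, y_min, x_max, y_max)

# PIL's crop() expects (left, upper, right, lower), i.e. (min x, min y, max x, max y).
crop = frame.crop(boxes["Color000"])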

Full-size_350x350_images.zip. 350 x 350 original images (before the tracking-based cropping), in png format.
Cropped_128x128_images.zip. 128 x 128 cropped images used for the CORe50 benchmark, in png format.
Bbox.zip. Bounding boxes for the full-size version, in txt format.

In order to better track the moving objects or to further improve object recognition accuracy, we also release the depth maps, organized in the same way as the color images:

CORe50/
  |
  |--- s1/
  |    |------ o1/
  |    |       |---- D_01_01_XXX.png
  |    |       |---- ...
  |    | 
  |    |------ o2/
  |    |------ ...
  |    |------ o50/
  |
  |--- s2/
  |--- s3/
  |--- ...
  |--- s11/

As you can see from Fig. 3, the depth map is not perfect (further enhancement/preprocessing steps can be applied), but it can also be easily converted into a segmentation map using a moving threshold (a small sketch follows the download links below).


Fig. 3 Example of a depth map (dark is far) for object 33 with a complex background. The chessboard pattern marks pixels where the depth information is missing.


Full-size_350x350_depth.zip. Depth maps for the full-size version in png format (grayscale, transparent where the depth information is missing).
Cropped_128x128_depth.zip. Depth maps for the cropped version in png format (grayscale, transparent where the depth information is missing).
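
As a rough illustration of the thresholding idea mentioned above, here is a minimal Python sketch; the file path and the fixed threshold value are assumptions (a fixed threshold is used here for simplicity instead of a moving one), and transparent pixels (missing depth) are treated as background:

import numpy as np
from PIL import Image

img = Image.open("CORe50_depth/s1/o1/D_01_01_000.png").convert("LA")   # hypothetical path
arr = np.asarray(img)
gray, alpha = arr[..., 0], arr[..., 1]

# "Dark is far": the hand-held object is the closest, hence brightest, region.
# 200 is an arbitrary example threshold; missing depth (alpha == 0) counts as background.
mask = (gray > 200) & (alpha > 0)
Image.fromarray((mask * 255).astype(np.uint8)).save("segmentation_000.png")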

Finally, to make sure everything is there, you can check the exact number of (color + depth) images you're going to find for each object and session:

Dataset_dims.zip. Exact number of frames for each object and session.

Configuration Files

Do you want to use a different DL framework or programming language but still be able to compare your results with our benchmark? That's easy! Just download the batch filelists for each experiment in plain .txt format. The filelists directory tree looks like this:

filelists/
  |
  |--- NI_inc/
  |       |------ Run0/
  |       |         |------ train_batch_00_filelist.txt
  |       |         |------ train_batch_01_filelist.txt
  |       |         |------ ...
  |       |         |------ test_filelist.txt
  |       |         
  |       |------ Run1/
  |       |------ ...
  |       |------ Run9/
  |
  |--- NI_cum/
  |--- NC_inc/
  |--- NC_cum/
  |--- NIC_inc/
  |--- NIC_cum/

So, for each scenario (NI, NC, NIC) we have two different filelist folders depending on the main strategy, cumulative or incremental, with the _cum or _inc suffix respectively. Each of these folders contains the configuration files of 10 different runs; for each run a number of "train_batch_XX_filelist.txt" files is available along with a "test_filelist.txt". Each file is formatted in the Caffe filelist format (see the parsing sketch below).

Please note that: (i) the batch order may strongly impact accuracy in a continuous learning scenario, which is why a multiple-run configuration is needed; (ii) the labels are coherent only within each run (across its train and test filelists) and do not necessarily stay the same across scenarios, due to implementation details.
batches_filelists.zip. Plain text filelists for each experiment and run.
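
If you are not using Caffe, a filelist can be loaded with a few lines of Python; this sketch assumes the usual Caffe ImageData layout of one "<relative_image_path> <integer_label>" pair per line, and the example path is just illustrative:

def read_filelist(path):
    """Return a list of (image_path, label) pairs from a Caffe-style filelist."""
    samples = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            img_path, label = line.rsplit(" ", 1)
            samples.append((img_path, int(label)))
    return samples

train_batch0 = read_filelist("filelists/NI_inc/Run0/train_batch_00_filelist.txt")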

Results

If you want to compare your strategy with our benchmark or just take a closer look at the tabular results, we make available the tab-separated values (.tsv) files for each experiment, which you can easily import in Excel or parse with your favourite programming language ;-) The format of the text files is the following (a small parsing sketch follows the example):
###################################
# scenario: NI
# net: mid-caffenet
# strategy: naive
###################################
RunID	Batch0	Batch1	Batch2	Batch3	Batch4	Batch5	Batch6	Batch7
0	44,30%	35,50%	55,26%	55,86%	54,39%	48,90%	47,77%	60,71%
1	47,56%	48,94%	51,83%	53,64%	51,26%	43,39%	58,98%	55,61%
2	40,06%	53,04%	46,65%	45,19%	44,09%	50,92%	54,42%	57,01%
3	46,29%	48,66%	54,64%	52,48%	44,36%	51,55%	50,64%	44,26%
4	28,40%	43,77%	43,27%	51,00%	58,26%	56,89%	52,08%	61,94%
5	34,65%	34,36%	50,14%	56,89%	59,67%	56,66%	56,16%	52,66%
6	40,80%	51,92%	51,44%	53,12%	58,19%	49,72%	50,33%	48,14%
7	30,96%	51,54%	46,28%	51,69%	56,88%	55,44%	55,72%	49,98%
8	34,32%	32,19%	40,55%	52,06%	57,86%	57,41%	57,80%	61,86%
9	46,06%	47,62%	51,80%	46,92%	42,30%	57,70%	59,98%	50,01%
avg	38,59%	44,44%	48,90%	52,44%	53,88%	52,32%	53,77%	54,69%
dev.std	6,87%	7,88%	4,84%	3,59%	6,75%	4,74%	4,06%	6,18%
....
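
If you prefer working directly on the raw .tsv files rather than the pickle below, a minimal parsing sketch could look like this; note the decimal commas and percent signs in the example above, and the file name used here is hypothetical:

import csv

def parse_pct(cell):
    """Convert a cell like '44,30%' into a float (44.30)."""
    return float(cell.replace("%", "").replace(",", "."))

rows = {}
with open("NI_results.tsv") as f:             # hypothetical file name
    for row in csv.reader(f, delimiter="\t"):
        if not row or row[0].startswith("#") or row[0] == "RunID":
            continue                          # skip comment and header lines
        rows[row[0]] = [parse_pct(c) for c in row[1:]]
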
For convenience, we have split the .tsv into three main files, one per scenario, but with this format you can easily concatenate them into a single file if you need to. For an even simpler analysis we have already prepared for you a Python pickled OrderedDict which you can simply load and use to access/plot the results:
# loading the pickled file
>>> import pickle as pkl
>>> pkl_file = open('results.pkl', 'rb')
>>> exps = pkl.load(pkl_file)

# using the dict like exps[scenario][net][strategy][run][batch]
>>> print(list(exps['NI']['mid-caffenet']['naive']['avg'].values()))
[38.59, 44.44, 48.9, 52.44, 53.88, 52.32, 53.77, 54.69]
>>> print(exps['NI']['mid-caffenet']['naive']['avg']['Batch0'])
38.59
A similar approach can be used with the sequential experiments of Fig. 5 in the paper. You can find more code examples on extracting the results data in the README.txt of the GitHub repo. Otherwise, you can directly download the TSVs and the Python dicts here:
tsv_results.zip. Tab-separated-values results for each scenario.
results.pkl. Pre-loaded Python dict (pickled).
seq.tsv. Tab-separated-values results for the sequential experiments.
seq_results.pkl. Pre-loaded Python dict (pickled) for the sequential experiments.

Code

The raw code is already available in the master branch of this GitHub repository! However, more scripts, examples and documentation notebooks are coming... stay tuned! If you want to contribute or you find a bug, please make a PR or simply open an issue (questions are also welcome)! We guarantee an answer within 48 hours! :-)
core50-master.zip. All the code, materials and scripts needed to use/reproduce the benchmark.

Contacts

This dataset and benchmark have been developed at the University of Bologna through the effort of several people.


For further inquiries, feel free to contact Vincenzo Lomonaco by email: vincenzo.lomonaco@unibo.it