Computer Vision and Machine Learning Datasets
Computer Vision and Machine Learning require standardised datasets for evaluation, comparison, and learning. Datasets associated with our projects are linked below.
Download page: PascalVOC2012 train and Cellpose
Download page: PascalVOC2012 validation
For our eye tracking-based experiments, we used the PascalVOC2012 dataset, a widely adopted benchmark in computer vision and deep learning research. It is a publicly available dataset designed for image segmentation, detection, and classification, and consists exclusively of real-world scenes. The annotated objects fall into 20 distinct classes. We used the latest version of the dataset, which was continuously extended between 2005 and 2012 and comprises 11,630 images with 27,450 annotated objects and 6,929 segmentation masks.
For our experiments in segmentation and classification tasks, we focused on the segmentation subset of PascalVOC2012, which is divided into training and validation splits. As an additional comparison, we also included a part of the Cellpose dataset, which contains segmentation masks of cells from various microscopy images.
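For convenience, the two PascalVOC2012 segmentation splits can also be obtained programmatically. The sketch below uses torchvision's VOCSegmentation wrapper; the local root path is an arbitrary choice for this example, and torchvision must be installed. The Cellpose images and masks are distributed separately via the download links above and are not covered by this wrapper.

# Minimal sketch: loading the PascalVOC2012 segmentation splits via torchvision.
# The root path below is an assumption; point it at your own data directory.
from torchvision.datasets import VOCSegmentation

root = "./data/voc2012"  # hypothetical local directory

train_set = VOCSegmentation(root, year="2012", image_set="train", download=True)
val_set = VOCSegmentation(root, year="2012", image_set="val", download=True)

image, mask = train_set[0]           # PIL images: RGB photo and class-index mask
print(len(train_set), len(val_set))  # number of annotated segmentation samples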
Gaze data was recorded using a screen-based eye tracker (Tobii Pro Fusion) with a sampling frequency of 120 Hz. Before starting the annotation process, each participant underwent a calibration procedure to ensure accurate eye tracking data. In total, 11 different individuals contributed to the dataset. The setup was designed to guide users through the annotation process in a consistent way: to inform participants about the object of interest, the bounding box and ground truth polygon mask were displayed for 0.5 seconds prior to observation. For very small objects, an initial zoom-in was applied, and there was no time limit for viewing. To ensure clean segmentation-related gaze data, participants were instructed to indicate the start and end of each object inspection by pressing a button. Our annotation tool also allowed users to repeat an observation if needed and to freely navigate within the image using standard mouse interactions such as dragging, scrolling, and zooming.
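To give an idea of how such a recording might be processed, the sketch below trims one gaze export to a single object inspection. The file name, column names, and event labels are purely illustrative assumptions and do not describe the actual schema of the raw files linked on this page.

# Illustrative sketch only: file name, column names, and event labels are
# assumptions for this example, not the actual schema of the released data.
import pandas as pd

gaze = pd.read_csv("gaze_recording.csv")  # hypothetical export of one recording

# Keep only the samples between the button presses that mark the start and end
# of an object inspection (here assumed to be flagged in an 'event' column).
start = gaze.index[gaze["event"] == "inspection_start"][0]
end = gaze.index[gaze["event"] == "inspection_end"][0]
inspection = gaze.loc[start:end, ["timestamp", "gaze_x", "gaze_y"]]

# With a 120 Hz sampling rate, the inspection duration is roughly samples / 120.
print(f"{len(inspection) / 120.0:.2f} s of gaze data for this object")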
The corresponding raw gaze data used in our studies are available through the links above.
Kockwelp, J., Beckmann, D., & Risse, B. (2025). Human Gaze Improves Vision Transformers by Token Masking. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops.
Beckmann, D., Kockwelp, J., Gromoll, J., Kiefer, F., & Risse, B. (2024). SAM meets Gaze: Passive Eye Tracking for Prompt-based Instance Segmentation. Proceedings of Machine Learning Research, 2023.
Kockwelp, J., Gromoll, J., Wistuba, J., & Risse, B. (2023). EyeGuide - From Gaze Data to Instance Segmentation. In Proceedings of the British Machine Vision Conference (BMVC), Aberdeen.
The acute myeloid leukemia (AML) cell dataset consists of 96x96 pixel images of individual cells extracted from whole slide images (WSIs) of bone marrow smears. Each cell is linked to the genetic assessment of one of 408 patients, with a focus on specific mutations. Due to the sensitive nature of this data, access is restricted and available only through a qualified request process. For more information, please contact Benjamin Risse (b.risse@uni-muenster.de).
The collision database (called Larvae Collision Dataset 2 Animals; short: LCD2A) contains 1,352 image sequences with approximately 159,300 individual images resulting from an interaction analysis (collisions) experiment with Drosophila larvae. The images were acquired using the FIM2c setup. For further information, please refer to
This dataset (called "Larvae Collision Dataset 2 to 3" or LCD2t3) is a refined version of the original collision database LCD2A, resulting from an interaction analysis (collisions) experiment with Drosophila larvae. The images were acquired using the FIM2c setup. For further information, please refer to
The heartbeat database contains 39 image sequences with approximately 52,700 individual images showing the (irregular) heartbeat of Drosophila melanogaster pupae. The images were acquired using the FIM setup. For further information, please refer to