Computer Vision & Graphics Group

Fish Tracking and Segmentation Dataset

 

Introduction

The fish tracking and segmentation dataset is designed to facilitate benchmarking of tracking and segmentation methods in aquaculture applications. Two videos (A and B) of cultivated Sulawesi ricefish shoals were recorded under controlled conditions and semi-automatically labeled. The dataset is characterized by high fish density, frequent occlusions, and varying illumination, making it a challenging benchmark for object tracking and segmentation algorithms. Each video contains 101 frames annotated with bounding boxes, segmentation masks, and track IDs that are consistent across frames.

Downloads:

For convenience, the dataset is provided in three formats. We recommend the Datumaro format, as it is the only one that supports bounding boxes, segmentation masks, and consistent tracks. The Yolo_ultralytics format is convenient for object detection tasks on bounding boxes, while the MOT format is suitable for multi-object tracking evaluation.
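As a minimal sketch of working with the MOT export, the snippet below parses ground-truth lines and groups boxes by track ID, assuming the dataset follows the common MOTChallenge text layout (`frame,id,x,y,w,h,conf,...` per line). The sample lines are hypothetical and not taken from this dataset.

```python
# Sketch: group MOT-style ground-truth boxes by track ID.
# Assumes the common MOTChallenge layout: frame,id,x,y,w,h,conf,... per line.
from collections import defaultdict

def parse_mot(lines):
    """Return {track_id: [(frame, x, y, w, h), ...]} from MOT-style lines."""
    tracks = defaultdict(list)
    for line in lines:
        fields = line.strip().split(",")
        frame, track_id = int(fields[0]), int(fields[1])
        x, y, w, h = (float(v) for v in fields[2:6])
        tracks[track_id].append((frame, x, y, w, h))
    return dict(tracks)

# Hypothetical sample lines, for illustration only:
sample = [
    "1,1,100.0,200.0,40.0,25.0,1,-1,-1,-1",
    "2,1,104.0,198.0,40.0,25.0,1,-1,-1,-1",
    "1,2,900.0,750.0,38.0,22.0,1,-1,-1,-1",
]
tracks = parse_mot(sample)
print(len(tracks))     # number of distinct tracks
print(len(tracks[1]))  # number of frames in which track 1 appears
```

Grouping by track ID like this is the typical first step before computing per-track statistics or feeding detections into a tracking evaluator.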

All submitted papers or any publicly available text using this Fish Tracking and Segmentation Dataset must cite the following paper: Palm T., Seibold C., Hilsmann A., Eisert P., Video-Based Locomotion Analysis for Fish Health Monitoring, Proc. Int. Conf. on Computer Vision Theory and Applications (VISAPP), Marbella, Spain, March 2026.

Profile

            Video A               Video B
Resolution  2448x2048             2448x2048
#Frames     101                   101
#Tracks     256                   45
Avg. #fish  ~91.2                 ~19.5
Camera      a2A2448-75ucPRO (both videos)
Lens        Ricoh FL-CC0614A-2M (6mm) (both videos)


Below, the spatial locations of the fish instances in Videos A and B are shown as heatmaps, together with a scatterplot of the fish sizes relative to the image size.

Figure: Heatmaps of fish positions for Video A and Video B.
Figure: Scatterplot of fish sizes relative to the image size.

Related publication

Palm T., Seibold C., Hilsmann A., Eisert P., Video-Based Locomotion Analysis for Fish Health Monitoring, Proc. Int. Conf. on Computer Vision Theory and Applications (VISAPP), Marbella, Spain, March 2026.

Acknowledgements

This work was partly funded by the German Federal Ministry for Economic Affairs and Energy (FischFitPro, grant no. 16KN095536) and the Federal Ministry of Research, Technology and Space (REFRAME, grant no. 01IS24073A).