Data
Training Data Overview
The dataset we are providing will consist of eight 225-frame videos of stereo camera images acquired from a da Vinci Xi robot during several different porcine procedures. To avoid redundancy, the frames are sampled from the 30 Hz video at 2 Hz. Each 1280x1024 camera image is extracted from its video frame by cropping a region of that size starting at pixel (320, 28).
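A minimal Python sketch of the cropping and subsampling steps, assuming NumPy arrays of shape (height, width[, channels]) and assuming that (320, 28) is the (column, row) position of the crop's top-left corner. The 2 Hz sampling is modelled as keeping every 15th frame of the 30 Hz video starting from frame 0; the starting offset is also an assumption.

```python
import numpy as np

# Values taken from the description above.
CROP_X, CROP_Y = 320, 28      # assumed (column, row) of the crop's top-left corner
CROP_W, CROP_H = 1280, 1024   # size of the extracted camera image
SOURCE_HZ, TARGET_HZ = 30, 2  # video frame rate and sampling rate


def crop_camera_image(frame: np.ndarray) -> np.ndarray:
    """Extract the 1280x1024 camera image from a full video frame (H x W [x C])."""
    h, w = frame.shape[:2]
    if h < CROP_Y + CROP_H or w < CROP_X + CROP_W:
        raise ValueError(f"frame of size {w}x{h} is too small for the crop")
    return frame[CROP_Y:CROP_Y + CROP_H, CROP_X:CROP_X + CROP_W]


def sampled_frame_indices(num_video_frames: int) -> list[int]:
    """Indices of the video frames kept when subsampling 30 Hz footage to 2 Hz."""
    step = SOURCE_HZ // TARGET_HZ  # keep every 15th frame (assumed to start at frame 0)
    return list(range(0, num_video_frames, step))
```

For example, under these assumptions a 225-frame training video corresponds to 225 x 15 = 3375 frames of the original 30 Hz footage.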
In each frame we have hand-labelled the articulated parts of the robotic surgical instruments, dividing each instrument into a rigid shaft, an articulated wrist and claspers. We also label a separate miscellaneous category for any other surgical instrument, such as a laparoscopic instrument or a drop-in ultrasound probe, so that you can train your algorithms not to mistake these objects for robotic surgical instruments.
The training set directory will contain a file called parts_mapping.json, which maps the numerical value assigned to each part to that part's name. It will also contain a file called type_mapping.json, which maps a unique numerical value to the real name of each instrument type.
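The exact JSON layout of parts_mapping.json and type_mapping.json is not given here, so the following sketch is hedged: it assumes each file is a flat JSON object and normalizes either a {name: value} or a {"value": name} layout into a {value: name} dictionary. The function names are hypothetical, and the part names and values used in the checks are made up for illustration only.

```python
from __future__ import annotations

import json
from pathlib import Path


def normalize_mapping(raw: dict) -> dict[int, str]:
    """Return a {numeric value: name} dict from a parsed mapping file.

    Assumes a flat JSON object in one of two layouts:
    {"Shaft": 10, ...} (name -> value) or {"10": "Shaft", ...} (value -> name).
    """
    mapping: dict[int, str] = {}
    for key, value in raw.items():
        if isinstance(value, int):
            mapping[value] = str(key)       # name -> value layout
        else:
            mapping[int(key)] = str(value)  # value -> name layout (JSON keys are strings)
    return mapping


def load_mapping(path: str | Path) -> dict[int, str]:
    """Load parts_mapping.json or type_mapping.json as {value: name}."""
    with open(path, encoding="utf-8") as f:
        return normalize_mapping(json.load(f))
```

If the real files turn out to use a nested structure, normalize_mapping would need adjusting; inspect the files before relying on it.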
Example Frames
Test Data and Evaluation Overview
The test set will consist of eight 75-frame sequences, each containing footage sampled immediately after the corresponding training sequence, and two full 300-frame sequences. All test sequences will be sampled at the same rate as the training set (2 Hz).
Participants will be evaluated on each test sequence separately. If you take a machine learning approach, exclude the corresponding training sequence when evaluating on each of the 75-frame sequences: because that test footage immediately follows its training sequence, training on it would bias the results.
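One way to encode this exclusion rule is sketched below in Python. The sequence names and the numbering of training sequences 1 to 8 are hypothetical, and the sketch reads the rule as applying only to the 75-frame sequences, so all eight training sequences remain available for the two 300-frame sequences.

```python
from __future__ import annotations

from dataclasses import dataclass

TRAIN_IDS = list(range(1, 9))  # training sequences 1..8 (numbering is an assumption)


@dataclass(frozen=True)
class EvalSequence:
    name: str                     # hypothetical identifier
    num_frames: int
    follows_train_id: int | None  # training sequence this footage continues, if any


def allowed_training_ids(seq: EvalSequence) -> list[int]:
    """Training sequences that may be used when evaluating on `seq`."""
    if seq.follows_train_id is None:
        return list(TRAIN_IDS)  # 300-frame sequences: no exclusion stated
    return [i for i in TRAIN_IDS if i != seq.follows_train_id]


# Eight 75-frame sequences (one per training sequence) plus two 300-frame sequences.
TEST_SEQUENCES = [EvalSequence(f"short_{i}", 75, i) for i in TRAIN_IDS] + [
    EvalSequence(f"long_{j}", 300, None) for j in (1, 2)
]
```

In practice this means training up to eight separate models for the 75-frame sequences (a leave-one-sequence-out scheme), while a single model trained on all eight sequences can be used for the 300-frame sequences.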
Submissions will be compared with hand-labelled ground truth using intersection-over-union (IoU). We invite participants to attempt three different sub-tasks:
- Compute binary instrument segmentations, where each instrument pixel should be labelled 255 and each background pixel should be labelled 0.
- Compute multi-label instrument segmentations, where each instrument pixel should be labelled with the index of its part as given in parts_mapping.json and each background pixel should be labelled 0.
- Compute instrument type segmentations, where each instrument pixel should be labelled with the index of its instrument type as given in type_mapping.json and each background pixel should be labelled 0.
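The sketch below shows how a label mask can be converted to the binary output format and how IoU can be computed. It is illustrative only: the official averaging scheme is not described here, so averaging per-label IoU over the non-background labels present in either mask is an assumption. The optional `exclude` argument of the hypothetical `to_binary_mask` lets you map label values (for example, the miscellaneous category) to background if you decide they should not count as robotic instruments.

```python
import numpy as np


def to_binary_mask(label_mask: np.ndarray, exclude=()) -> np.ndarray:
    """Binary task format: 255 for instrument pixels, 0 for background.

    Label values listed in `exclude` are treated as background (assumption).
    """
    instrument = (label_mask > 0) & ~np.isin(label_mask, exclude)
    return np.where(instrument, 255, 0).astype(np.uint8)


def iou(pred: np.ndarray, gt: np.ndarray, label: int) -> float:
    """IoU for one label value; 1.0 if the label is absent from both masks."""
    p = pred == label
    g = gt == label
    union = np.logical_or(p, g).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(p, g).sum() / union)


def mean_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean IoU over the non-background (non-zero) labels present in either mask."""
    labels = np.union1d(np.unique(pred), np.unique(gt))
    labels = labels[labels != 0]
    if labels.size == 0:
        return 1.0
    return float(np.mean([iou(pred, gt, int(v)) for v in labels]))
```

For the binary sub-task, `mean_iou` reduces to the IoU of label 255, since 0 is the only other value.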
Each participant's total score will be a weighted average of the three sub-task scores, weighted 50%, 35% and 15% in the order listed above. You do not have to enter all of the tasks, but you have a much higher chance of winning if you do!
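The weighting can be written out as a small Python function. The task keys are hypothetical, the mapping of 50/35/15 to the binary, multi-label and type sub-tasks follows the order in which they are listed, and scoring a task you did not enter as 0 is an assumption.

```python
WEIGHTS = {"binary": 0.50, "parts": 0.35, "type": 0.15}  # hypothetical task keys


def total_score(task_scores: dict[str, float]) -> float:
    """Weighted average of sub-task scores; tasks not entered count as 0 (assumption)."""
    unknown = set(task_scores) - set(WEIGHTS)
    if unknown:
        raise KeyError(f"unknown task(s): {sorted(unknown)}")
    return sum(w * task_scores.get(task, 0.0) for task, w in WEIGHTS.items())
```

Under this reading, a perfect binary segmentation entered alone would score at most 0.5, which is why entering all three sub-tasks matters.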