Colonoscopy 3D Video Dataset (C3VD)
from Colonoscopy 3D Video Dataset with Paired Depth from 2D-3D Registration
Johns Hopkins University

Abstract
Screening colonoscopy is an important clinical application for several 3D computer vision techniques, including depth estimation, surface reconstruction, and missing region detection. However, the development, evaluation, and comparison of these techniques in real colonoscopy videos remain largely qualitative due to the difficulty of acquiring ground truth data. In this work, we present a Colonoscopy 3D Video Dataset (C3VD) acquired with a high definition clinical colonoscope and high-fidelity colon models for benchmarking computer vision methods in colonoscopy. We introduce a novel multimodal 2D-3D registration technique to register optical video sequences with ground truth rendered views of a known 3D model. The different modalities are registered by transforming optical images to depth maps with a Generative Adversarial Network and aligning edge features with an evolutionary optimizer. This registration method achieves an average translation error of 0.321 millimeters and an average rotation error of 0.159 degrees in simulation experiments where error-free ground truth is available. The method also leverages video information, improving registration accuracy by 55.6% for translation and 60.4% for rotation compared to single frame registration. 22 short video sequences were registered to generate 10,015 total frames with paired ground truth depth, surface normals, optical flow, occlusion, six degree-of-freedom pose, coverage maps, and 3D models. The dataset also includes screening videos acquired by a gastroenterologist with paired ground truth pose and 3D surface models. The dataset and registration source code are available at durr.jhu.edu/C3VD.
Citation
Please cite our publication if you use code or data from this site.
@article{bobrow2023,
title={Colonoscopy 3D video dataset with paired depth from 2D-3D registration},
author={Bobrow, Taylor L and Golhar, Mayank and Vijayan, Rohan and Akshintala, Venkata S and Garcia, Juan R and Durr, Nicholas J},
journal={Medical Image Analysis},
pages={102956},
year={2023},
publisher={Elsevier},
}
Results
Colonoscopy video frames (left) are registered with rendered views of a ground truth 3D model (right). Edge features (overlay) are aligned by optimizing a loss function (bottom).
Real colonoscope frames are paired with registered ground truth depth, surface normals, occlusion, and optical flow frames
Dataset
C3VD contains 22 registered videos with paired ground truth depth, surface normals, optical flow, occlusion, six degree-of-freedom pose, coverage maps, and 3D models. The dataset also includes 4 screening colonoscopy videos acquired by a gastroenterologist with paired ground truth pose and 3D surface models. 3D model files and molds are also available for download. Registration and rendering code is made available on GitHub.
Registered Videos
For each registered video frame, the dataset includes:
- Depth frame: depth along the camera frame’s z-axis, clamped to the range 0-100 millimeters. Values are linearly scaled and encoded as a 16-bit grayscale image.
- Surface normal frame: reported with respect to the camera coordinate system. X/Y/Z components are stored in separate R/G/B color channels. Components are linearly scaled from ± 1 to 0-65535. Values are encoded as a 16-bit color image.
- Optical flow frame: computed from the current frame to the previous frame, so the first frame in the sequence has no flow value. Values are saved in a color image, where the R-channel contains X-direction motion (left→right, -20 to 20 pixels), and the G-channel contains Y-direction motion (up→down, -20 to 20 pixels). Values are linearly scaled to 0-65535 and encoded as a 16-bit color image.
- Occlusion frame: encoded as an 8-bit binary image. Pixels occluding other mesh faces within 100mm of the camera origin are assigned a value of 255, and all other pixels are assigned a value of 0.
- Camera pose: saved in a file named pose.txt. Each line contains a homogeneous camera-to-world transformation matrix (flattened in row-major order) corresponding to each frame.
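The encodings listed above can be inverted with a few lines of arithmetic. The following is a minimal sketch, not the dataset's official loader: the function names are hypothetical, and it assumes that raw value 0 maps to the lower end of each stated range and 65535 to the upper end.

```python
MAX_U16 = 65535  # largest value a 16-bit channel can hold


def decode_depth_mm(raw, max_depth_mm=100.0):
    """Raw 16-bit depth -> millimeters (assumes 0 -> 0 mm, 65535 -> 100 mm)."""
    return raw / MAX_U16 * max_depth_mm


def decode_normal(raw):
    """Raw 16-bit channel -> normal component (assumes 0 -> -1, 65535 -> +1)."""
    return raw / MAX_U16 * 2.0 - 1.0


def decode_flow_px(raw, max_flow_px=20.0):
    """Raw 16-bit channel -> flow in pixels (assumes 0 -> -20 px, 65535 -> +20 px)."""
    return raw / MAX_U16 * (2.0 * max_flow_px) - max_flow_px


def is_occluding(raw):
    """8-bit occlusion value -> True where the pixel is flagged (value 255)."""
    return raw == 255
```

Because the functions use only arithmetic and comparison, they should accept either a single integer or a NumPy array holding a whole frame. When loading the images, make sure the reader preserves the 16-bit values and check which channel order it returns, since R/G/B carry X/Y/Z (normals) and X/Y (flow).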
For each video sequence, we also provide:
- 3D model and coverage map: ground truth triangulated mesh, stored as a Wavefront OBJ file named coverage_mesh.obj. Coverage is embedded in the OBJ file by texture vertices assigned to each face (vt=1 is observed, vt=2 is unobserved).
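As an illustration of how the coverage labels might be read back, the sketch below counts observed and unobserved faces from the lines of an OBJ file. It is a hedged example under stated assumptions: the function name is hypothetical, faces are assumed to be written as `f v/vt ...` (optionally `v/vt/vn`), and every corner of a face is assumed to carry the same texture-vertex index, so only the first corner is inspected.

```python
def count_coverage_faces(lines):
    """Return (observed, unobserved) face counts from OBJ text lines.

    Assumes each face token has the form v/vt or v/vt/vn, with vt == 1
    marking an observed face and vt == 2 an unobserved face.
    """
    observed = 0
    unobserved = 0
    for line in lines:
        parts = line.split()
        if not parts or parts[0] != "f":
            continue  # skip vertices, texture vertices, comments, blank lines
        fields = parts[1].split("/")  # first corner of the face
        if len(fields) < 2 or not fields[1]:
            continue  # face has no texture-vertex index
        vt_index = int(fields[1])
        if vt_index == 1:
            observed += 1
        elif vt_index == 2:
            unobserved += 1
    return observed, unobserved


# Usage with a file on disk (path shown for illustration):
# with open("coverage_mesh.obj") as f:
#     seen, unseen = count_coverage_faces(f)
```

The ratio `seen / (seen + unseen)` gives a per-face coverage fraction. It is not weighted by face area, so it only approximates surface coverage when triangles differ in size.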
| Model | Texture | Video | # Frames | Preview | Download |
|---|---|---|---|---|---|
| Cecum | 1 | a | 276 | Preview | cecum_t1_a.zip (2.86 GB) |
| Cecum | 1 | b | 765 | Preview | cecum_t1_b.zip (8.36 GB) |
| Cecum | 2 | a | 370 | Preview | cecum_t2_a.zip (3.71 GB) |
| Cecum | 2 | b | 1,142 | Preview | cecum_t2_b.zip (11.06 GB) |
| Cecum | 2 | c | 595 | Preview | cecum_t2_c.zip (6.13 GB) |
| Cecum | 3 | a | 730 | Preview | cecum_t3_a.zip (6.80 GB) |
| Cecum | 4 | a | 465 | Preview | cecum_t4_a.zip (5.04 GB) |
| Cecum | 4 | b | 425 | Preview | cecum_t4_b.zip (4.41 GB) |
| Descending Colon | 4 | a | 148 | Preview | desc_t4_a.zip (1.24 GB) |
| Sigmoid Colon | 1 | a | 700 | Preview | sigmoid_t1_a.zip (5.20 GB) |
| Sigmoid Colon | 2 | a | 514 | Preview | sigmoid_t2_a.zip (4.22 GB) |
| Sigmoid Colon | 3 | a | 613 | Preview | sigmoid_t3_a.zip (4.58 GB) |
| Sigmoid Colon | 3 | b | 536 | Preview | sigmoid_t3_b.zip (4.21 GB) |
| Transverse Colon | 1 | a | 61 | Preview | trans_t1_a.zip (0.59 GB) |
| Transverse Colon | 1 | b | 700 | Preview | trans_t1_b.zip (5.07 GB) |
| Transverse Colon | 2 | a | 194 | Preview | trans_t2_a.zip (1.58 GB) |
| Transverse Colon | 2 | b | 103 | Preview | trans_t2_b.zip (0.97 GB) |
| Transverse Colon | 2 | c | 235 | Preview | trans_t2_c.zip (1.83 GB) |
| Transverse Colon | 3 | a | 250 | Preview | trans_t3_a.zip (1.83 GB) |
| Transverse Colon | 3 | b | 214 | Preview | trans_t3_b.zip (1.66 GB) |
| Transverse Colon | 4 | a | 382 | Preview | trans_t4_a.zip (3.10 GB) |
| Transverse Colon | 4 | b | 597 | Preview | trans_t4_b.zip (4.61 GB) |
Screening Videos
In addition to the video sequence, each archive also contains camera pose information saved in a file named pose.txt. Each line contains a homogeneous pose matrix (flattened in row-major order) for the corresponding frame.
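A pose file in this layout could be read roughly as follows. This is a sketch with a hypothetical function name. It assumes each line holds the 16 values of a 4x4 matrix, and because the delimiter is not specified here, it accepts both commas and whitespace.

```python
def parse_pose_lines(lines):
    """Parse flattened row-major 4x4 matrices, one per non-empty line.

    Returns a list of poses, each a list of four rows of four floats.
    Raises ValueError if a line does not contain exactly 16 numbers.
    """
    poses = []
    for line_no, line in enumerate(lines, start=1):
        tokens = line.replace(",", " ").split()
        if not tokens:
            continue  # tolerate blank lines
        if len(tokens) != 16:
            raise ValueError(f"line {line_no}: expected 16 values, got {len(tokens)}")
        values = [float(t) for t in tokens]
        # Row-major order: consecutive groups of four values form the rows.
        poses.append([values[r * 4:(r + 1) * 4] for r in range(4)])
    return poses


# Usage with a file on disk:
# with open("pose.txt") as f:
#     poses = parse_pose_lines(f)
```

After parsing, `poses[i][r][c]` is the entry at row `r`, column `c` of the matrix for frame `i`. It may be worth inspecting one matrix to confirm where the translation components sit before composing transforms.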
| Model | Texture | # Frames | Preview | Download |
|---|---|---|---|---|
| Full Colon | 1 | 5,458 | Preview | screening_t1.zip (8.13 GB) |
| Full Colon | 2 | 5,100 | Preview | screening_t2.zip (7.09 GB) |
| Full Colon | 3 | 4,726 | Preview | screening_t3.zip (7.07 GB) |
| Full Colon | 4 | 4,774 | Preview | screening_t4.zip (7.36 GB) |
3D Model Files
| Model | Object Download | Mold Download |
|---|---|---|
| Ascending Colon | ascend_model.obj (25.4 MB) | ascend_mold.zip (18.7 MB) |
| Cecum | cecum_model.obj (54.8 MB) | cecum_mold.zip (24.9 MB) |
| Descending Colon | desc_model.obj (38.0 MB) | desc_mold.zip (26.6 MB) |
| Sigmoid Colon | sigmoid_model.obj (20.8 MB) | sigmoid_mold.zip (42.2 MB) |
| Transverse Colon | trans_model.obj (18.3 MB) | trans_mold.zip (24.1 MB) |
| Full Colon | full_model.obj (194.8 MB) | |
Calibration Files
Revision History
10/14/2023 | Updated the dataset file names to reflect peer-review completion.
05/03/2023 | Revised ground truth surface normal frames and updated naming convention:
- Corrected an error in the rendering code that clipped negative surface normal z-components to 0 and resulted in some surface normals having a non-unit length.
- Surface normal axes were updated from +x pointing right, +y pointing up, and +z pointing out of the screen to +x pointing right, +y pointing down, and +z pointing into the screen to be consistent with the camera coordinate system as shown in Figure 3 of the paper.
- The naming convention of the frames was updated to include zero padding (e.g. 0005_color.png).
This work is licensed under CC BY-NC-SA 4.0
