EPOSH - Honda Research Institute USA
Introduction
Most video clips are 10–30 seconds long and are recorded around construction zones. For each video, about 10 frames are manually selected and annotated, yielding a total of 5,630 annotated perspective images.
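The EPOSH frames were selected manually, so no fixed sampling rule applies; as a point of reference, the sketch below shows a uniform-sampling baseline for pulling roughly 10 frames from a clip with OpenCV. The path and function name are illustrative placeholders, not part of the released code.

import cv2

def sample_frames(video_path, num_frames=10):
    # Uniformly sample frame indices across the clip. EPOSH frames were
    # chosen manually; this is only a uniform-sampling stand-in.
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = [int(i * total / num_frames) for i in range(num_frames)]
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if ok:
            frames.append(frame)
    cap.release()
    return frames

frames = sample_frames("clip_0001.mp4")  # hypothetical clip path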
Using COLMAP, we reconstruct a dense 3D point cloud from each video clip and then manually annotate the semantic labels of the 3D points. In total, about 70,000 BEV image / ground truth pairs are constructed.
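For readers unfamiliar with COLMAP, the following is a minimal sketch of the stock COLMAP command-line pipeline for producing a dense point cloud from extracted frames. The exact reconstruction settings used for EPOSH are not specified here, and the directory paths are hypothetical placeholders.

import os
import subprocess

def reconstruct_dense(image_dir, work_dir):
    # Standard COLMAP workflow: features -> matching -> sparse mapping ->
    # undistortion -> patch-match stereo -> fused dense point cloud.
    db = f"{work_dir}/database.db"
    sparse = f"{work_dir}/sparse"
    dense = f"{work_dir}/dense"
    os.makedirs(sparse, exist_ok=True)
    os.makedirs(dense, exist_ok=True)
    steps = [
        ["colmap", "feature_extractor", "--database_path", db,
         "--image_path", image_dir],
        ["colmap", "exhaustive_matcher", "--database_path", db],
        ["colmap", "mapper", "--database_path", db,
         "--image_path", image_dir, "--output_path", sparse],
        ["colmap", "image_undistorter", "--image_path", image_dir,
         "--input_path", f"{sparse}/0", "--output_path", dense],
        ["colmap", "patch_match_stereo", "--workspace_path", dense],
        ["colmap", "stereo_fusion", "--workspace_path", dense,
         "--output_path", f"{dense}/fused.ply"],
    ]
    for cmd in steps:
        subprocess.run(cmd, check=True)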
The figure below shows the distribution of classes and their corresponding attributes in the perspective EPOSH dataset. The left subplot shows the classes and the right subplot shows the corresponding attribute and affordance classes.
The figure below shows a sample annotation from the EPOSH dataset.
Availability of code
The EPOSH dataset and the code used for this work will be available soon. If you would like to be notified upon release, please leave your email here.