Ordered Atomic Activity for Fine-Grained Interactive Traffic Scenario Understanding
IEEE International Conference on Computer Vision (ICCV) 2023, Paris
We introduce a novel representation called Ordered Atomic Activity for interactive scenario understanding. The representation decomposes each scenario into a set of ordered atomic activities, where each activity consists of an action and the corresponding actors involved and the order denotes the temporal development of the scenario. The design also helps in identifying important interactive relationships such as yielding. The action is a high-level semantic motion pattern that is grounded in the surrounding road topology, which we decompose into zones and corners with unique IDs. For example, a group of pedestrians crossing on the left side is denoted as C1 → C4: P+, as depicted in Figure 1. We collect a new large-scale dataset called OATS (Ordered Atomic Activities in interactive Traffic Scenarios), comprising 1026 video clips (∼ 20s) captured at intersections. Each clip is labeled with the proposed language, resulting in 59 activity categories and 6512 annotated activity instances. We propose three fine-grained scenario understanding tasks, i.e., multi-label Atomic Activity recognition, recognition, activity order prediction, and interactive scenario retrieval. We implement various state-of-the-art algorithms and conduct extensive experiments on OATS. We found the existing methods cannot achieve satisfactory performance, indicating new opportunities for the community to develop new algorithms for these tasks toward better interactive scenario understanding