ComPhy:

Compositional Physical Reasoning of Objects and Events from Videos


Abstract

Objects’ motions in nature are governed by complex interactions and their properties. While some properties, such as shape and material, can be identified via the object’s visual appearance, others like mass and electric charge are not directly visible. The compositionality between the visible and hidden properties poses unique challenges for AI models to reason from the physical world, whereas humans can effortlessly infer them with limited observations. Existing studies on video reasoning mainly focus on visually observable elements such as object appearance, movement, and contact interaction. In this paper, we take an initial step to highlight the importance of inferring the hidden physical properties not directly observable from visual appearances, by introducing the Compositional Physical Reasoning (ComPhy) dataset. For a given set of objects, ComPhy includes a few videos of them moving and interacting under different initial conditions. The model is evaluated based on its capability to unravel the compositional hidden properties, such as mass and charge, and use this knowledge to answer a set of questions posed about one of the videos. Evaluation results of several state-of-the-art video reasoning models on ComPhy show unsatisfactory performance as they fail to capture these hidden properties. We further propose an oracle neural-symbolic framework named Compositional Physics Learner (CPL), combining visual perception, physical property learning, dynamic prediction, and symbolic execution into a unified framework. CPL can effectively identify objects’ physical properties from their interactions and predict their dynamics to answer questions.
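To make concrete why a hidden property such as mass can in principle be recovered from motion alone, here is a minimal sketch based on momentum conservation in a two-body collision. It is not the CPL method or any part of the ComPhy code; the function name and the velocity values are hypothetical, and the sketch assumes no external forces (friction, charge interaction) act during the collision.

```python
import math

def mass_ratio_from_collision(v1_before, v1_after, v2_before, v2_after):
    """Return m1 / m2 implied by momentum conservation.

    Each argument is an (x, y) velocity tuple. Conservation of momentum gives
    m1 * dv1 + m2 * dv2 = 0, hence m1 / m2 = |dv2| / |dv1|.
    Assumption: no external force acts on either object during the collision.
    """
    dv1 = (v1_after[0] - v1_before[0], v1_after[1] - v1_before[1])
    dv2 = (v2_after[0] - v2_before[0], v2_after[1] - v2_before[1])
    change1 = math.hypot(*dv1)
    change2 = math.hypot(*dv2)
    if change1 == 0.0:
        raise ValueError("object 1 did not change velocity; ratio is undefined")
    return change2 / change1

# Hypothetical observation: object 1 stops, object 2 leaves at half the speed.
ratio = mass_ratio_from_collision((2.0, 0.0), (0.0, 0.0), (0.0, 0.0), (1.0, 0.0))
print(ratio)  # 0.5, so object 1 is half as heavy as object 2
```

A single collision only yields a ratio between the two objects involved, which is one way to see why several videos of the same objects under different initial conditions are useful.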


Example

In each example, models will be given one target video, four reference videos and a set of questions related to the target video. To answer the questions, the models need to unravel objects' compositional hidden properties, such as mass and charge, and use this knowledge to predict objects' dynamics.
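The structure of one example described above can be sketched as a small Python data model. This is only an illustration: the class names, field names, and file paths are hypothetical and do not reflect the dataset's actual annotation format. Only the count of four reference videos and the question text are taken from this page.

```python
from dataclasses import dataclass, field
from typing import List, Optional

NUM_REFERENCE_VIDEOS = 4  # stated on this page: four reference videos per example

@dataclass
class Question:
    kind: str                       # "factual", "counterfactual", or "predictive"
    text: str
    choices: List[str] = field(default_factory=list)  # empty for open-ended questions
    answer: Optional[str] = None    # None when the answer is not given here

@dataclass
class Example:
    target_video: str               # hypothetical path to the target video
    reference_videos: List[str]     # hypothetical paths to the reference videos
    questions: List[Question]

    def __post_init__(self):
        if len(self.reference_videos) != NUM_REFERENCE_VIDEOS:
            raise ValueError(
                f"expected {NUM_REFERENCE_VIDEOS} reference videos, "
                f"got {len(self.reference_videos)}"
            )

example = Example(
    target_video="target.mp4",  # placeholder file name
    reference_videos=[f"reference_{i}.mp4" for i in range(NUM_REFERENCE_VIDEOS)],
    questions=[
        Question(
            kind="factual",
            text="Is the cyan cube heavier than the rubber cylinder?",
            answer="No.",
        ),
        Question(
            kind="predictive",
            text="What will happen next?",
            choices=[
                "The rubber cylinder and the metal object collide",
                "The rubber cylinder and the sphere collide",
                "The cube collides with the sphere",
            ],
        ),
    ],
)
```

The multiple-choice question leaves `answer` unset because this page does not state which options are correct.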

Example 1

Target Video


Reference Videos

Questions

Factual Question

Is the cyan cube heavier than the rubber cylinder?

No.


Are there any uncharged blue cylinders that enter the scene?

Yes.


Counterfactual Question

If the rubber cylinder were lighter, which of the following would happen?

The cube would collide with the rubber cylinder


The rubber cylinder and the sphere would collide


The metal object would collide with the sphere

Predictive Question

What will happen next?

The rubber cylinder and the metal object collide


The rubber cylinder and the sphere collide


The cube collides with the sphere


Example 2

Target Video


Reference Videos

Questions

Factual Question

What are the colors of the two objects that are charged?

Yellow and blue.


What is the direction of the blue cube when the video ends?

Left.


Counterfactual Question

If the blue sphere were oppositely charged, what would happen?

The yellow sphere and the rubber cube would collide


The yellow object and the blue sphere would collide


The blue cube and the metal cube would collide


The yellow object and the red object would collide

Predictive Question

What will happen next?

The blue cube and the red cube collide


The blue sphere collides with the metal cube


Example 3

Target Video


Reference Videos

Questions

Factual Question

What color is the moving rubber object that is uncharged?

Red.


How many moving cyan objects are charged?

1.


Counterfactual Question 1

If the sphere were oppositely charged, which of the following would happen?

The cylinder and the sphere would collide


The cylinder would collide with the cyan cube

Counterfactual Question 2

If the cyan object were uncharged, which event would happen?

The sphere and the cyan object would collide


The cylinder would collide with the sphere


The cylinder would collide with the gray cube


The sphere would collide with the yellow object


Dataset

Training and Validation Videos
Training and Validation Video Annotations
Reference Videos
Reference Annotations
Training Questions and Answers
Validation Questions and Answers
Testing Videos
Testing Reference Questions



Paper & Code

ComPhy: Compositional Physical Reasoning of Objects and Events from Videos
Zhenfang Chen, Kexin Yi, Yunzhu Li, Mingyu Ding, Antonio Torralba, Joshua B. Tenenbaum, Chuang Gan






Built and maintained by MIT-IBM Watson AI Lab.
Copyright © 2021 ComPhy