Single View 3D Reconstruction and Parsing Using Geometric Commonsense for Scene Understanding

Abstract

My thesis studies scene understanding from three perspectives: (1) 3D scene reconstruction, to understand the 3D structure of a scene; (2) geometric and physical reasoning, to understand the relationships among objects in a scene; and (3) the interaction between human actions and objects in a scene.

For 3D reconstruction, we build a unified grammatical framework capable of reconstructing a variety of scene types from a single input image. The key idea is a novel commonsense reasoning framework that exploits two types of prior knowledge: prior distributions over a single dimension of an object, e.g., that the length of a sedan is about 4.5 meters; and pairwise relationships between the dimensions of scene entities, e.g., that a sedan is shorter than a bus. Such unary and relative geometric knowledge, once extracted, is fairly stable across different types of natural scenes and is informative for understanding scenes in both 2D images and the 3D world. Methodologically, we construct a hierarchical graph as a unified representation of the input image and the related geometric knowledge. We formulate these objectives in a unified probabilistic formulation and develop a data-driven Monte Carlo method that infers the optimal solution using both bottom-up and top-down computations. Comparisons on public datasets show that our method clearly outperforms the alternative methods.

For geometric and physical reasoning, we present an approach to scene understanding that reasons about the physical stability of objects from point clouds. We rely on a simple observation: by human design, objects in static scenes should be stable with respect to gravity. This assumption applies to all scene categories and imposes useful constraints on the plausible interpretations of a scene. Our method consists of two major steps: 1) geometric reasoning, which recovers solid 3D volumetric primitives from defective point clouds; and 2) physical reasoning, which groups unstable primitives into physically stable objects by optimizing stability and the scene prior. We use a novel disconnectivity graph to represent the energy landscape and a Swendsen-Wang Cut method for optimization. In experiments, the algorithm achieves substantially better performance than state-of-the-art methods in i) object segmentation, ii) 3D volumetric recovery of the scene, and iii) scene parsing, on both a public dataset and our own new dataset.

Detecting potential dangers in the environment is a fundamental ability of living beings. To endow a robot with this ability, my thesis presents an algorithm for detecting potentially falling objects, i.e., physically unsafe objects, given 3D point clouds captured by range sensors. We formulate the falling risk as a probability, or potential, that an object may fall under human action or natural disturbances such as earthquakes and wind. Our approach differs from the traditional object detection paradigm: it first infers hidden and situated "causes" of the scene, and then uses intuitive physical mechanics to predict possible "effects" as consequences of those causes. In particular, we infer a disturbance field by using motion capture data as a rich source of common human pose movements. We show that, by applying various disturbance fields, our model achieves a human-level recognition rate for potential falling objects on a dataset of challenging and realistic indoor scenes.
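
The two kinds of geometric prior named in the abstract, a unary prior on a single dimension and a pairwise relation between dimensions, can be illustrated with a toy scoring function. The sketch below is not the thesis's formulation: the Gaussian form, the standard deviations, the bus length, the penalty value, and all function names are assumptions made for illustration. Only the 4.5 m sedan length and the "sedan is shorter than a bus" relation come from the abstract.

```python
# Toy energy over hypothesized object lengths (lower means more plausible).
# Assumption: unary priors are Gaussian, given as (mean, std) in meters.
# Only the 4.5 m sedan mean is taken from the abstract; the rest is illustrative.
UNARY_PRIORS = {
    "sedan": (4.5, 0.4),
    "bus": (12.0, 1.5),  # hypothetical value
}

# Pairwise relations as (shorter, longer) pairs, following the abstract's example.
SHORTER_THAN = [("sedan", "bus")]


def unary_energy(label, length):
    """Negative log of an unnormalized Gaussian prior on one dimension."""
    mean, std = UNARY_PRIORS[label]
    return 0.5 * ((length - mean) / std) ** 2


def pairwise_energy(lengths, penalty=10.0):
    """Add a fixed (hypothetical) penalty for each violated ordering."""
    total = 0.0
    for shorter, longer in SHORTER_THAN:
        if shorter in lengths and longer in lengths:
            if lengths[shorter] >= lengths[longer]:
                total += penalty
    return total


def total_energy(lengths):
    """Sum the unary and pairwise terms for one hypothesis."""
    unary = sum(unary_energy(label, value) for label, value in lengths.items())
    return unary + pairwise_energy(lengths)


if __name__ == "__main__":
    plausible = {"sedan": 4.6, "bus": 11.5}
    implausible = {"sedan": 13.0, "bus": 12.0}
    print(total_energy(plausible), total_energy(implausible))
```

A hypothesis whose sedan is longer than its bus is penalized by both terms, so a sampler that compares candidate reconstructions by such an energy would prefer the plausible one.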


Analytics

Added to PP
2017-06-07
