Learning to grasp, be it from experience or data, has transformed how we view robotic grasping. In the last decade alone, the idea of learning has resulted in numerous approaches that can quickly generate successful grasps on a wide variety of unknown objects, outperforming previous non-learning-based methods by a large margin. However, many of the learning-based methods reach this performance by limiting themselves to generating 4-degree-of-freedom (DoF) top-down parallel-jaw grasps on singulated objects. These limitations facilitate learning by constraining the search space, but they also prevent sampling 6-DoF multi-finger grasps, which are useful in, for example, semantic grasping. This dissertation aims to determine whether explicit scene understanding, such as completely reconstructing object shapes from partial point clouds instead of using the point clouds directly, can lift these limitations. More specifically, it investigates the methods and benefits of including explicit scene understanding when learning 6-DoF parallel-jaw and multi-finger grasp samplers for singulated objects and objects in clutter. To this end, we first explore 4-DoF grasping and present a shape reconstruction method that enables 4-DoF top-down grasping methods to generate complete 6-DoF grasps. The same reconstruction method is also applied to enable quick 6-DoF multi-finger grasp sampling. Then, we investigate how to represent object shape and composition uncertainties in grasping, resulting in two grasp planners: one robust grasp planner over uncertain shape completions and one POMDP planner over object composition uncertainties. Finally, we explore grasping objects in cluttered scenes, where we propose to reconstruct every object in the scene with object segmentation and object reconstruction to facilitate grasping. This process, named scene completion, was fundamental for developing a fast target-driven multi-finger grasp sampler for grasping objects in clutter.
Taken together, the results indicate that explicit scene understanding does increase the generality, robustness, and performance of approaches that learn 6-DoF parallel-jaw and multi-finger grasping, albeit at a higher computational cost. Consequently, we recommend that roboticists consider explicit scene understanding when developing new grasping approaches.
|Translated title of the contribution
|Towards Robust 6-DoF Multi-Finger Grasping in Clutter with Explicit Scene Understanding
|Published - 2022