Past Research
2022 Funded Projects
Project Description
The main objective of this project is to investigate the benefits and challenges of targeted 3D and semantic reconstruction and to develop quality-adaptive, semantically guided Simultaneous Localization and Mapping (SLAM) algorithms. The goal is to enable an agent (e.g., a Boston Dynamics Spot robot) to navigate and find a target object (or other semantic entity) in an unknown or partially known environment while reconstructing the scene in a quality-adaptive manner. We interpret quality-adaptive to mean that the reconstruction accuracy and level of detail depend on finding the target class; i.e., we reconstruct only until we are certain that the observed object does not belong to the target class.
Principal Investigator
Prof. Marc Pollefeys
Dr. Iro Armeni
Dr. Daniel Barath
Duration
01.09.2022 - 01.03.2024 (18 months)
Most important achieved milestones
1. Quality-Adaptive 3D Semantic Reconstruction. We designed an algorithm for quality-adaptive semantic reconstruction that employs a multi-layer voxel structure to represent the environment. Each voxel stores a truncated signed distance function (TSDF) value indicating the distance to the nearest 3D surface, alongside color and texture information, the surface normal, and potential semantic classes. Adaptive subdivision of a voxel into eight smaller voxels is governed by multiple criteria, including predefined target semantic categories. This lets users separate objects requiring high-resolution reconstruction from those less critical to the task at hand. The algorithm supports three resolution levels: coarse (8 cm voxel size), medium (4 cm), and fine (1 cm), adjustable to task requirements. In addition, a criterion based on geometric complexity ensures high-quality automatic reconstruction of complex structures irrespective of their semantic class.
Our current extension of this method separates the SLAM reconstruction's geometric complexity from its texture detail, aiming for high-quality renderings without storing excessively detailed geometry. This is particularly relevant for simple geometries with complex textures, where current methods produce unwarranted geometric complexity and substantial storage demands. The proposed solution uses a coarse, adaptive voxel structure for the geometry and stores color data in 3D texture boxes, leveraging a triplanar mapping algorithm to achieve high rendering quality with minimal geometric detail (both ideas are sketched below).
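To make the quality-adaptive refinement concrete, below is a minimal sketch of target-class-driven voxel subdivision. All names are illustrative assumptions, not the project's actual implementation; only the three resolution levels (8 cm, 4 cm, 1 cm) and the subdivision criteria come from the description above.

```python
# Illustrative sketch only; names and thresholds are assumptions.
from dataclasses import dataclass, field

COARSE, MEDIUM, FINE = 0.08, 0.04, 0.01  # voxel edge lengths in meters

@dataclass
class Voxel:
    size: float                                   # edge length in meters
    tsdf: float = 1.0                             # truncated signed distance to the nearest surface
    semantics: set = field(default_factory=set)   # candidate semantic classes observed in this voxel
    children: list = field(default_factory=list)  # eight sub-voxels once subdivided

    def should_subdivide(self, target_classes, geometric_complexity, threshold=0.5):
        """Refine when the voxel may contain a target class or complex geometry."""
        if self.size <= FINE:
            return False
        return bool(self.semantics & target_classes) or geometric_complexity > threshold

    def subdivide(self):
        """Split the voxel into eight octants at half the edge length."""
        if not self.children:
            self.children = [Voxel(size=self.size / 2) for _ in range(8)]
        return self.children

# Usage: stay coarse unless the voxel might contain the target class "chair".
root = Voxel(size=COARSE, semantics={"chair", "table"})
if root.should_subdivide(target_classes={"chair"}, geometric_complexity=0.1):
    children = root.subdivide()  # eight 4 cm voxels; repeated splits reach 1 cm
```

The texture extension relies on triplanar mapping, a standard rendering technique that samples a texture along the three axis-aligned planes and blends the samples by the surface normal. A minimal sketch, assuming simple 2D texture arrays (the project's exact formulation may differ):

```python
# Standard triplanar blending; an illustrative sketch, not the project's renderer.
import numpy as np

def triplanar_weights(normal, sharpness=4.0):
    """Blend weights for the YZ/XZ/XY texture planes, derived from the surface normal."""
    w = np.abs(np.asarray(normal, dtype=float)) ** sharpness
    return w / w.sum()

def triplanar_sample(tex_yz, tex_xz, tex_xy, point, normal):
    """Sample three axis-aligned textures at a surface point and blend them."""
    def sample(tex, u, v):
        h, w = tex.shape[:2]
        return tex[int(u * h) % h, int(v * w) % w]

    x, y, z = point
    wx, wy, wz = triplanar_weights(normal)
    return (wx * sample(tex_yz, y, z)    # projection along the x axis
            + wy * sample(tex_xz, x, z)  # projection along the y axis
            + wz * sample(tex_xy, x, y)) # projection along the z axis
```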
2. An algorithm was introduced that significantly enhances Voxblox++, enabling high-quality, real-time, incremental 3D panoptic segmentation of the environment. The method combines 2D-to-3D semantic and instance mapping and surpasses the accuracy of recent 2D-to-3D semantic instance segmentation techniques on large-scale public datasets. Improvements over Voxblox++ include (the first is sketched after the list):
- a novel application of 2D semantic prediction confidence in the mapping process,
- a new method for segmenting semantic- and instance-consistent surface regions (super-points), and
- a new graph optimization-based approach for semantic labeling and instance refinement.
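A minimal sketch of the first improvement, confidence-weighted 2D-to-3D semantic fusion, assuming a per-voxel log-odds accumulator (the names and the exact fusion rule are illustrative assumptions, not the published method):

```python
# Illustrative simplification of confidence-weighted semantic fusion.
import numpy as np

NUM_CLASSES = 20

class SemanticVoxel:
    def __init__(self):
        self.log_odds = np.zeros(NUM_CLASSES)  # accumulated per-class label evidence

    def fuse(self, class_id, confidence, eps=1e-6):
        """Weight each 2D observation by its prediction confidence,
        instead of counting every observation equally."""
        p = np.clip(confidence, eps, 1.0 - eps)
        self.log_odds[class_id] += np.log(p / (1.0 - p))

    def label(self):
        return int(np.argmax(self.log_odds))

# Usage: two confident "chair" views outweigh one uncertain "sofa" view.
v = SemanticVoxel()
v.fuse(class_id=3, confidence=0.9)   # chair
v.fuse(class_id=3, confidence=0.8)   # chair
v.fuse(class_id=7, confidence=0.55)  # sofa
assert v.label() == 3
```

Weighting observations by prediction confidence lets a few confident views outweigh many uncertain ones, rather than treating every 2D prediction as equally reliable.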
3. Another significant contribution of the project is a novel matching algorithm that incorporates semantics to improve feature identification within a SLAM pipeline. The method generates a semantic descriptor from each feature's vicinity and combines it with the conventional visual descriptor for feature matching. Accuracy improvements, verified on publicly available datasets, underscore the method's effectiveness while maintaining real-time performance (sketched below).
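A minimal sketch of how a semantic descriptor might be combined with a visual one for matching, assuming a label-histogram semantic descriptor and mutual nearest-neighbor matching (both choices are assumptions for illustration; the paper's construction may differ):

```python
# Illustrative sketch; descriptor construction and weighting are assumptions.
import numpy as np

def semantic_descriptor(patch_labels, num_classes=20):
    """Histogram of semantic labels in a feature's local neighborhood."""
    hist = np.bincount(patch_labels.ravel(), minlength=num_classes).astype(float)
    return hist / max(hist.sum(), 1.0)

def combined_descriptor(visual, semantic, weight=0.3):
    """Concatenate L2-normalized visual and semantic parts, weighted."""
    v = visual / (np.linalg.norm(visual) + 1e-12)
    s = semantic / (np.linalg.norm(semantic) + 1e-12)
    return np.concatenate([(1 - weight) * v, weight * s])

def match(desc_a, desc_b):
    """Mutual nearest-neighbor matching on combined descriptors."""
    d = np.linalg.norm(desc_a[:, None] - desc_b[None, :], axis=-1)
    ab, ba = d.argmin(axis=1), d.argmin(axis=0)
    return [(i, j) for i, j in enumerate(ab) if ba[j] == i]
```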
Most important publications
- Oguzhan Ilter, Iro Armeni, Marc Pollefeys, Daniel Barath (ICRA 2024): Semantically Guided Feature Matching for Visual SLAM
- Yang Miao, Iro Armeni, Marc Pollefeys, Daniel Barath (IROS 2024): Volumetric Semantically Consistent 3D Panoptic Mapping
- Jianhao Zheng, Daniel Barath, Marc Pollefeys, Iro Armeni (ECCV 2024): MAP-ADAPT: Real-Time Quality-Adaptive Semantic 3D Maps
Project Description
Within this project, we investigate representations of soft and/or articulated robots and objects that enable general manipulation pipelines. To apply these representations to real manipulation tasks, we develop a dexterous robotic platform.
Principal Investigator
Prof. Robert Katzschmann
Prof. Fisher Yu
Duration
01.07.2022 - 01.01.2024 (18 months)
Most important achieved milestones
1. We presented ICGNet, which uses point cloud data to create an embedding that contains both surface and volumetric information and can be used to predict occupancy, object classes, and physics- or application-specific details such as grasp poses (see the sketches after this list).
2. We developed a tracking framework for soft and articulated robots that reconstructs meshes from point cloud data in real time, with point-wise errors almost an order of magnitude lower than the state of the art (an error-metric sketch also follows the list).
3. As an application platform, we constructed a dexterous robotic hand that is capable of precise and fast manipulation of objects.
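A hedged sketch of an instance-centric embedding with multiple prediction heads, loosely inspired by the ICGNet idea in milestone 1; the architecture and all names here are assumptions for illustration, not the published model:

```python
# Illustrative multi-head design; not the actual ICGNet architecture.
import torch
import torch.nn as nn

class InstanceEmbeddingNet(nn.Module):
    def __init__(self, embed_dim=256, num_classes=10, grasp_dim=7):
        super().__init__()
        # Shared per-point encoder, max-pooled into one instance embedding.
        self.encoder = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, embed_dim), nn.ReLU(),
        )
        self.occupancy_head = nn.Linear(embed_dim + 3, 1)    # query point -> inside/outside
        self.class_head = nn.Linear(embed_dim, num_classes)  # object category
        self.grasp_head = nn.Linear(embed_dim, grasp_dim)    # e.g., grasp pose parameters

    def forward(self, points, queries):
        feats = self.encoder(points)         # (N, D) per-point features
        embedding = feats.max(dim=0).values  # (D,) instance embedding
        occ_in = torch.cat([embedding.expand(queries.shape[0], -1), queries], dim=-1)
        return {
            "occupancy": self.occupancy_head(occ_in).squeeze(-1),
            "class_logits": self.class_head(embedding),
            "grasp": self.grasp_head(embedding),
        }

# Usage on a random partial point cloud and volumetric query points.
net = InstanceEmbeddingNet()
out = net(torch.randn(1024, 3), torch.randn(64, 3))
```

For milestone 2, a minimal sketch of the kind of point-wise error such a tracking framework can be evaluated with (an illustrative metric, not necessarily the paper's evaluation protocol):

```python
# Illustrative error metric: observed points vs. reconstructed mesh vertices.
import numpy as np
from scipy.spatial import cKDTree

def pointwise_error(observed_points, mesh_vertices):
    """Distance from each observed point to the nearest mesh vertex."""
    tree = cKDTree(mesh_vertices)
    dists, _ = tree.query(observed_points)
    return dists.mean(), dists.max()

mean_err, max_err = pointwise_error(np.random.rand(5000, 3), np.random.rand(2000, 3))
```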
Most important publications
- Yasunori Toshimitsu, Benedek Forrai, Barnabas Gavin Cangan, Ulrich Steger, Manuel Knecht, Stefan Weirich, Robert K. Katzschmann (Humanoids 2023): Getting the Ball Rolling: Learning a Dexterous Policy for a Biomimetic Tendon-Driven Hand with Rolling Contact Joints
- René Zurbrügg, Yifan Liu, Francis Engelmann, Suryansh Kumar, Marco Hutter, Vaishakh Patil, Fisher Yu (ICRA 2024): ICGNet: A Unified Approach for Instance-Centric Grasping
- Elham Amin Mansour, Hehui Zheng, Robert K. Katzschmann (ROBOVIS 2024): Fast Point Cloud to Mesh Reconstruction for Deformable Object Tracking
Links to images and videos
ICGNet Architecture (CC BY-NC-ND 4.0)
Grasp prediction pipeline (CC BY-NC-ND 4.0)
Predicted grasps with ICGNet (CC BY-NC-ND 4.0)
Dexterous robotic hand demo video (Apache-2.0, BSD-3-Clause)
Robotic hands learning in simulation (Apache-2.0, BSD-3-Clause)
Robotic hands + rendering point clouds for ICGNet in simulation (Apache-2.0, BSD-3-Clause)