Robotic Grasp Prediction for Selective Objects in a Cluttered Environment using Object Centric Masking

Video demo

Abstract

Intelligent grasping is a highly active field of research in robotics. In the current state of the art, a robot can identify an object's pose (position and orientation) and pick it up by predicting a grasp rectangle, but it cannot predict grasp parameters for a particular class of object among multiple objects in a cluttered environment. Moreover, the architectures used for predicting grasp parameters typically struggle to find regions of interest (ROIs) in large images or images with multiple objects, which degrades prediction quality. To address this shortcoming, we propose a method that identifies the ROI and focuses the grasp prediction model on it. We use the state-of-the-art object detector YOLOv5 and the grasp prediction model GR-ConvNet as the major components of our method. The YOLOv5 network localizes the object of interest in the scene and outputs its position coordinates. We propose a novel center-passing mechanism for GR-ConvNet that uses the object location as a reference during image pre-processing, applying a white-masking technique to isolate the object of interest. With our white-masking approach, we achieve an accuracy of 96.25% on the Cornell Dataset. Our method is also tested in real-world scenarios with two objects, a duster and a screwdriver, in cluttered scenes. Experimentally, we demonstrate successful grasp execution on a 7-DOF Baxter robot with 87.80% and 79.07% accuracy on sparsely and densely placed objects, respectively.
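The sketch below illustrates the core of the pipeline described above: detect the target object with YOLOv5, white-mask everything outside its bounding box, and compute the box center to pass to the grasp network. This is a minimal illustration, not the authors' released code; it assumes images loaded with OpenCV (BGR) and the public Ultralytics YOLOv5 hub model, and the function name `white_mask_roi` is hypothetical.

```python
import cv2
import numpy as np
import torch

# Load a pretrained YOLOv5 model from the Ultralytics hub to localize objects.
detector = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

def white_mask_roi(image_bgr, target_class):
    """Isolate the object of interest by painting everything outside its
    bounding box white; return the masked image and the box center."""
    results = detector(image_bgr[..., ::-1])        # YOLOv5 expects RGB input
    detections = results.xyxy[0].cpu().numpy()      # rows: [x1, y1, x2, y2, conf, cls]
    names = results.names

    for x1, y1, x2, y2, conf, cls in detections:
        if names[int(cls)] == target_class:
            x1, y1, x2, y2 = map(int, (x1, y1, x2, y2))
            masked = np.full_like(image_bgr, 255)   # all-white canvas
            masked[y1:y2, x1:x2] = image_bgr[y1:y2, x1:x2]  # keep only the ROI
            center = ((x1 + x2) // 2, (y1 + y2) // 2)       # reference for GR-ConvNet
            return masked, center
    return None, None                               # target class not detected
```

In the full pipeline, the masked image and the returned center would be handed to the GR-ConvNet pre-processing stage (the center-passing mechanism), so that grasp prediction concentrates on the isolated object rather than the whole cluttered scene.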