Task-motion planning via tree-based Q-learning approach for robotic object displacement in cluttered spaces
Golluccio G.; Di Vito D.; Marino A.; Bria A.; Antonelli G.
2021-01-01
Abstract
This paper addresses the problem of grasping a target object from clutter with a robotic arm using a Reinforcement Learning approach. A layered architecture is devised for this purpose. The bottom layer plans the robot motion needed to relocate objects while taking robot constraints into account, whereas the top layer decides which obstacles to relocate. To generate an optimal sequence of obstacles according to some metrics, a tree is dynamically built in which nodes represent sequences of relocated objects and edge weights are updated according to a Q-learning-inspired algorithm. Four exploration strategies of the solution tree are considered, ranging from a random strategy to an ε-Greedy learning-based exploration. The four strategies are compared on predefined metrics in scenarios of differing complexity. The learning-based approaches are able to provide optimal relocation sequences despite the high-dimensional search space, with the ε-Greedy strategy showing better performance, especially in complex scenarios.
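The top-layer idea described in the abstract (a dynamically built tree whose edge weights follow a Q-learning-style update, explored with an ε-Greedy rule) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the update rule, the reward, or the metrics, so the standard tabular Q-learning update, the reward values (-1 per relocation, +10 once the target is reachable), the `blockers` set standing in for the motion-planning layer, and all function names below are assumptions made only for illustration.

```python
import random


def epsilon_greedy(q, node, actions, epsilon, rng):
    """With probability epsilon pick a random obstacle, else the best-valued edge."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q.get((node, a), 0.0))


def learn_sequence(obstacles, blockers, episodes=300, alpha=0.5, gamma=0.9,
                   epsilon=0.2, seed=0):
    """Toy tree-based Q-learning over relocation sequences.

    Assumption: the target becomes graspable once every obstacle in
    `blockers` (a subset of `obstacles`) has been relocated. In a real
    system this check would come from the motion-planning layer.
    """
    rng = random.Random(seed)
    q = {}  # edge weights: (node, obstacle) -> value; the tree grows lazily
    for _ in range(episodes):
        node = ()  # root node: no obstacle relocated yet
        while not blockers.issubset(node):
            actions = [o for o in obstacles if o not in node]
            a = epsilon_greedy(q, node, actions, epsilon, rng)
            child = node + (a,)  # child node = sequence extended by one obstacle
            done = blockers.issubset(child)
            reward = 10.0 if done else -1.0  # hypothetical reward shaping
            remaining = [o for o in obstacles if o not in child]
            future = 0.0 if done else max(q.get((child, o), 0.0) for o in remaining)
            old = q.get((node, a), 0.0)
            # Standard Q-learning update applied to the edge (node -> child).
            q[(node, a)] = old + alpha * (reward + gamma * future - old)
            node = child
    # Greedy read-out: follow the best-valued edges from the root.
    node = ()
    while not blockers.issubset(node):
        actions = [o for o in obstacles if o not in node]
        node = node + (max(actions, key=lambda a: q.get((node, a), 0.0)),)
    return list(node), q


if __name__ == "__main__":
    sequence, _ = learn_sequence([0, 1, 2, 3], {1, 3})
    print("relocation sequence:", sequence)
```

Setting `epsilon=1.0` in this sketch gives a purely random exploration, while smaller values exploit the learned edge weights more often, loosely mirroring the range of strategies the abstract mentions. The returned sequence always contains every blocking obstacle, but it is only as short as the learned edge weights allow.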