Learning Navigation Behaviors End to End
A longstanding goal of behavior-based robotics is to solve high-level navigation tasks using end-to-end navigation behaviors that directly map sensors to actions. Navigation behaviors, such as reaching a goal or following a path without collisions, can be learned from exploration and interaction with the environment, but are constrained by the type and quality of a robot's sensors, dynamics, and actuators. Traditional motion planning handles varied robot geometry and dynamics but typically assumes high-quality observations. Modern vision-based navigation typically considers imperfect or partial observations but simplifies the robot action space. With both approaches, the transition from simulation to reality can be difficult. Here, we learn two end-to-end navigation behaviors that avoid moving obstacles: point to point and path following. These policies receive noisy lidar observations and output robot linear and angular velocities. We train these policies in small, static environments with Shaped-DDPG, an adaptation of the Deep Deterministic Policy Gradient (DDPG) reinforcement learning method that optimizes reward and network architecture. Over 500 meters of on-robot experiments show that these policies generalize to new environments and moving obstacles, are robust to sensor, actuator, and localization noise, and can serve as robust building blocks for larger navigation tasks. The path following and point to point policies are 83
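To make the DDPG ingredients concrete, here is a minimal sketch in plain Python. It is not Shaped-DDPG: the reward shaping and architecture search are omitted. It only shows a deterministic actor that maps a noisy lidar observation to bounded linear and angular velocities, a critic Q(s, a), the critic's temporal-difference target, soft (Polyak) target-network updates, and exploration noise. The observation layout (8 lidar ranges plus goal distance and bearing), the layer sizes, the velocity limits, and every helper name (`init_mlp`, `actor`, `critic`, `td_target`, `soft_update`, `explore`) are hypothetical choices made for illustration.

```python
import copy
import math
import random


def init_mlp(n_in, n_hidden, n_out, rng):
    """One-hidden-layer network parameters with small random weights (hypothetical sizes)."""
    return {
        "W1": [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "W2": [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_out)],
        "b2": [0.0] * n_out,
    }


def linear(weights, bias, x):
    """Dense layer: W @ x + b on plain Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def actor(params, obs, v_max=1.0, w_max=1.0):
    """Deterministic policy mu(s) -> (linear velocity, angular velocity)."""
    hidden = [math.tanh(h) for h in linear(params["W1"], params["b1"], obs)]
    out = [math.tanh(o) for o in linear(params["W2"], params["b2"], hidden)]
    # tanh outputs lie in [-1, 1]; map the first to [0, v_max], the second to [-w_max, w_max].
    return (v_max * (out[0] + 1.0) / 2.0, w_max * out[1])


def critic(params, obs, action):
    """Action-value estimate Q(s, a) from the concatenated observation and action."""
    x = list(obs) + list(action)
    hidden = [math.tanh(h) for h in linear(params["W1"], params["b1"], x)]
    return linear(params["W2"], params["b2"], hidden)[0]


def td_target(reward, done, next_obs, target_actor, target_critic, gamma=0.99):
    """DDPG critic target: y = r + gamma * (1 - done) * Q'(s', mu'(s'))."""
    next_action = actor(target_actor, next_obs)
    return reward + gamma * (1.0 - float(done)) * critic(target_critic, next_obs, next_action)


def _blend(target, source, tau):
    """Recursively compute tau * source + (1 - tau) * target over nested lists."""
    if isinstance(source, list):
        return [_blend(t, s, tau) for t, s in zip(target, source)]
    return tau * source + (1.0 - tau) * target


def soft_update(target, source, tau=0.005):
    """Polyak-average the source parameters into the target network, in place."""
    for key, value in source.items():
        target[key] = _blend(target[key], value, tau)


def explore(action, rng, sigma=0.1, v_max=1.0, w_max=1.0):
    """Add Gaussian noise to a deterministic action, clipped to the velocity limits."""
    v, w = action
    v = min(max(v + rng.gauss(0.0, sigma), 0.0), v_max)
    w = min(max(w + rng.gauss(0.0, sigma), -w_max), w_max)
    return (v, w)


if __name__ == "__main__":
    rng = random.Random(0)
    obs_dim, act_dim = 10, 2  # hypothetical: 8 lidar ranges + goal distance and bearing
    actor_params = init_mlp(obs_dim, 16, act_dim, rng)
    critic_params = init_mlp(obs_dim + act_dim, 16, 1, rng)
    target_actor = copy.deepcopy(actor_params)
    target_critic = copy.deepcopy(critic_params)

    # Simulated noisy lidar ranges in meters, clipped to a hypothetical 5 m max range.
    lidar = [min(5.0, max(0.0, 2.0 + rng.gauss(0.0, 0.05))) for _ in range(8)]
    obs = lidar + [3.0, 0.5]  # goal 3 m away at a 0.5 rad bearing
    action = explore(actor(actor_params, obs), rng)
    y = td_target(reward=-0.1, done=False, next_obs=obs,
                  target_actor=target_actor, target_critic=target_critic)
    soft_update(target_critic, critic_params, tau=0.005)
    print("action (v, w):", action, "critic target:", y)
```

The slowly tracking target networks and the `(1 - done)` mask follow the standard DDPG recipe for stabilizing critic learning. Gaussian exploration noise is used here as a common simplification of the Ornstein-Uhlenbeck noise in the original DDPG paper, and a real implementation would use an autodiff library to compute the actor and critic gradients that this sketch leaves out.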