The design of high-performing robotic controllers constitutes an example of expensive optimization in uncertain environments due to the often large parameter space and noisy performance metrics. There are several evaluative techniques that can be employed for on-line controller design. Adequate benchmarks help in the choice of the right algorithm in terms of final performance and evaluation time. In this paper, we use multi-robot obstacle avoidance as a benchmark to compare two different evaluative learning techniques: Particle Swarm Optimization and Q-learning. For Q-learning, we implement two different approaches: one with discrete states and discrete actions, and another one with discrete actions but a continuous state space. We show that continuous PSO has the highest fitness overall, and Q-learning with continuous states performs significantly better than Q-learning with discrete states. We also show that in the single robot case, PSO and Q-learning with discrete states require a similar amount of total learning time to converge, while the time required with Q-learning with continuous states is significantly larger. In the multi-robot case, both Q-learning approaches require a similar amount of time as in the single robot case, but the time required by PSO can be significantly reduced due to the distributed nature of the algorithm.
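For readers unfamiliar with the discrete-state, discrete-action variant mentioned above, the following is a minimal sketch of the standard tabular Q-learning update with an epsilon-greedy policy. It is not the authors' implementation: the table size (4 states, 3 actions), the learning rate `alpha`, the discount `gamma`, and the function names are illustrative assumptions.

```python
import random

def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q[state][action]

def epsilon_greedy(q, state, epsilon=0.1, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q[state]))
    row = q[state]
    return max(range(len(row)), key=row.__getitem__)

# Hypothetical sizes: 4 discretized sensor states, 3 discrete actions.
q_table = [[0.0] * 3 for _ in range(4)]
```

With a continuous state space the table would have to be replaced by a function approximator, which this sketch does not cover.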
Population-based learning techniques have proven effective in dealing with noise and are thus promising tools for the optimization of robotic controllers, which have inherently noisy performance evaluations. This article discusses how the results and guidelines derived from tests on benchmark functions can be extended to the fitness distributions encountered in robotic learning. We show that the large-amplitude noise found in robotic evaluations is disruptive to the initial phases of the learning process of PSO. Under these conditions, neither increasing the population size nor increasing the number of iterations is an efficient strategy for improving learning performance. We also show that PSO is more sensitive to spuriously good evaluations of bad solutions than to spuriously bad evaluations of good solutions, i.e., noise has an asymmetric effect on learning performance.
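One common way to limit the damage from a spuriously good evaluation is to re-evaluate each particle's personal best at every iteration and average the samples, so that a lucky score is gradually corrected. The sketch below illustrates that idea under stated assumptions: the noisy sphere function stands in for a robotic fitness measurement, lower values are better, the swarm uses a global-best topology, and the coefficients `w`, `c1`, `c2` are conventional illustrative values. It is not necessarily the variant used in the papers listed here.

```python
import random

def noisy_sphere(x, rng, sigma=0.5):
    """Hypothetical noisy fitness (lower is better): sphere plus Gaussian noise."""
    return sum(v * v for v in x) + rng.gauss(0.0, sigma)

def pso_with_reevaluation(dim=2, n_particles=10, n_iter=50, seed=0,
                          w=0.7, c1=1.5, c2=1.5):
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [noisy_sphere(p, rng) for p in pos]
    pbest_n = [1] * n_particles  # samples averaged into each pbest_fit

    for _ in range(n_iter):
        # Re-evaluate every personal best and fold the sample into a running mean.
        for i in range(n_particles):
            sample = noisy_sphere(pbest[i], rng)
            pbest_n[i] += 1
            pbest_fit[i] += (sample - pbest_fit[i]) / pbest_n[i]
        g = min(range(n_particles), key=pbest_fit.__getitem__)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (pbest[g][d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            fit = noisy_sphere(pos[i], rng)
            if fit < pbest_fit[i]:
                # A new personal best starts over with a single sample.
                pbest[i], pbest_fit[i], pbest_n[i] = pos[i][:], fit, 1
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    return pbest[g], pbest_fit[g]
```

Note the cost of this design choice: each iteration now needs two evaluations per particle, which doubles the evaluation time for the same number of iterations.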
In this paper, we study the automatic synthesis of robotic controllers for the coordinated movement of multiple mobile robots. The algorithm used to learn the controllers is a noise-resistant version of Particle Swarm Optimization, which is applied in two different settings: centralized and distributed learning. In centralized learning, every robot runs the same controller and the performance is evaluated with a global metric. In distributed learning, robots run different controllers and the performance is evaluated independently on each robot with a local metric. Our results from learning in simulation show that it is possible to learn a cooperative task in a fully distributed way employing a local metric, and we validate the simulations with real robot experiments in which the best solutions from distributed and centralized learning achieve similar performance.
Evaluative techniques offer a tremendous potential for on-line controller design. However, when the optimization space is large and the performance metric is noisy, the time needed to properly evaluate candidate solutions becomes prohibitively large and, as a consequence, the overall adaptation process becomes extremely time consuming. Distributing the adaptation process reduces the required time and increases robustness to failure of individual agents. In this paper, we analyze the role of the four algorithmic parameters that determine the total evaluation time in a distributed implementation of a Particle Swarm Optimization algorithm. For a multi-robot obstacle avoidance case study, we explore in simulation the lower boundaries of these parameters with the goal of reducing the total evaluation time so that it is feasible to implement the adaptation process within a limited amount of time determined by the robots’ energy autonomy. We show that each parameter has a different impact on the final fitness and propose some guidelines for choosing these parameters for real robot implementations.
Evaluative techniques offer a tremendous potential for online controller design. However, when the optimization space is large and the performance metric is noisy, the overall adaptation process becomes extremely time consuming. Distributing the adaptation process reduces the required time and increases robustness to failure of individual agents. In this paper, we analyze the role of the four algorithmic parameters that determine the total evaluation time in a distributed implementation of a Particle Swarm Optimization (PSO) algorithm. For an obstacle avoidance case study using up to eight robots, we explore in simulation the lower boundaries of these parameters and propose a set of empirical guidelines for choosing their values. We then apply these guidelines to a real robot implementation and show that it is feasible to optimize 24 control parameters per robot within 2 h, a limited amount of time determined by the robots’ battery life. We also show that a hybrid simulate-and-transfer approach coupled with a noise-resistant PSO algorithm can be used to further reduce experimental time as compared to a pure real-robot implementation.
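The effect of distribution on learning time can be illustrated with a small back-of-the-envelope calculation. The abstract does not name the four parameters, so the sketch below assumes they are the number of particles, the number of iterations, the evaluation time per candidate, and the number of evaluations per particle per iteration (2 here, assuming one new candidate plus one re-evaluation). The function name and all numeric values are hypothetical and are not taken from the paper.

```python
def total_evaluation_time_s(n_particles, n_iterations, eval_time_s,
                            evals_per_iteration=2, n_robots=1):
    """Wall-clock learning time when particles are spread evenly over robots."""
    if n_particles % n_robots != 0:
        raise ValueError("n_particles must be a multiple of n_robots")
    particles_per_robot = n_particles // n_robots
    return particles_per_robot * n_iterations * evals_per_iteration * eval_time_s

# Hypothetical values for illustration only.
single = total_evaluation_time_s(8, 20, 30.0)                # one robot
parallel = total_evaluation_time_s(8, 20, 30.0, n_robots=8)  # eight robots
```

Under these assumed values, evaluating in parallel on eight robots divides the wall-clock time by eight, which is the kind of reduction that makes a battery-limited time budget attainable.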
The ability to move in complex environments is a fundamental requirement for robots to be a part of our daily lives. Increasing the controller complexity may be a desirable choice in order to obtain improved performance. However, both the complexity of the environment and that of the controller may pose a considerable challenge for the optimization of robotic controllers. In this paper, we study the trade-offs between the complexity of reactive controllers and the complexity of the environment in the optimization of multi-robot obstacle avoidance for resource-constrained platforms. The optimization is carried out in simulation using a distributed, noise-resistant implementation of Particle Swarm Optimization, and the resulting controllers are evaluated both in simulation and with real robots. We show that in a simple environment, linear controllers with only two parameters perform similarly to more complex non-linear controllers with up to twenty parameters, even though the latter require more evaluation time to be learned. In a more complicated environment, we show that performance increases when the controllers can differentiate between front and back sensors, but further increasing the number of sensors and adding non-linear activation functions provides no further benefit. In both environments, augmenting reactive control laws with simple memory capabilities yields the largest increase in performance. We also show that in the complex environment the performance measurements are noisier, the optimal parameter region is smaller, and more iterations are required for the optimization process to converge.