
Research » LearningControl

The "learning" component in a perception-action-learning system is a hallmark of autonomous systems. It is the ability to learn and to adapt to new tasks and a changing environment that makes human and non-human animals so superior to any artifact we have created so far. The Autonomous Motion Department has a specific focus on learning for control.

First of all, it is worth highlighting why learning control deserves a special status in the world of machine learning and statistical learning. The majority of learning projects in machine learning operate in a specialized setting: someone provides one or multiple datasets, and the goal of learning is to discover some structure in this data, e.g., by clustering, density estimation, classification, or regression. The computational cost of extracting the structure is often of lower importance, as long as the problem remains computable within a reasonable time, potentially using large computational resources like multi-node supercomputers.

In learning control, several different issues arise. First of all, movement systems need to generate their own data. Thus, there is often an interesting trade-off between exploiting what has been learned so far and exploring new parts of the world. This problem is known as the exploration-exploitation trade-off.
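As an illustration of this trade-off, here is a minimal sketch using epsilon-greedy action selection on a toy multi-armed bandit. Epsilon-greedy is only one common heuristic and is not claimed to be the method used by the group; the function name `run_epsilon_greedy`, the deterministic rewards, and all parameter values are assumptions made for the example.

```python
import random


def run_epsilon_greedy(arm_rewards, epsilon, steps, seed=0):
    """Toy bandit: each arm returns a fixed (noise-free) reward.

    With probability `epsilon` a random arm is tried (exploration);
    otherwise the arm with the best current estimate is used (exploitation).
    """
    rng = random.Random(seed)
    n_arms = len(arm_rewards)
    estimates = [0.0] * n_arms  # running mean reward per arm
    counts = [0] * n_arms       # how often each arm was pulled

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = arm_rewards[arm]
        counts[arm] += 1
        # Incremental mean update, so no reward history has to be stored.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


if __name__ == "__main__":
    rewards = [0.1, 0.5, 0.9]
    print(run_epsilon_greedy(rewards, epsilon=0.0, steps=500))
    print(run_epsilon_greedy(rewards, epsilon=0.1, steps=2000))
```

With `epsilon=0.0` the learner in this toy setup never leaves the first arm it tried, because it generates no data about the alternatives. A small amount of exploration lets it discover the better arms, at the cost of occasionally taking a worse action.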

Second, the movement system never stops creating new data. It is therefore important to have learning systems that can continue learning forever, that are able to incorporate new data at a sufficiently fast time-scale, and that are able to grow their representational power as a function of the complexity of the data experienced.
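One standard way to incorporate data points one at a time is recursive least squares, sketched below for a linear model. This is an illustrative assumption, not the group's algorithm: the class name `RecursiveLeastSquares`, the forgetting factor, and the prior scale are hypothetical choices, and the sketch does not grow its representational power as the text asks.

```python
class RecursiveLeastSquares:
    """Linear model y ~ w . phi, updated one sample at a time."""

    def __init__(self, n_features, forgetting=1.0, prior_scale=1e4):
        self.n = n_features
        self.w = [0.0] * n_features
        # P approximates the inverse of the (weighted) input covariance.
        self.P = [[prior_scale if i == j else 0.0 for j in range(n_features)]
                  for i in range(n_features)]
        # forgetting < 1 discounts old data, useful if the world changes.
        self.forgetting = forgetting

    def predict(self, phi):
        return sum(wi * pi for wi, pi in zip(self.w, phi))

    def update(self, phi, y):
        # P is symmetric, so P @ phi also gives (phi^T P) transposed.
        p_phi = [sum(pij * pj for pij, pj in zip(row, phi)) for row in self.P]
        denom = self.forgetting + sum(pi * v for pi, v in zip(phi, p_phi))
        gain = [v / denom for v in p_phi]
        error = y - self.predict(phi)
        self.w = [wi + g * error for wi, g in zip(self.w, gain)]
        self.P = [[(self.P[i][j] - gain[i] * p_phi[j]) / self.forgetting
                   for j in range(self.n)] for i in range(self.n)]


if __name__ == "__main__":
    model = RecursiveLeastSquares(n_features=2)
    for i in range(20):
        x = 0.5 * i
        model.update([1.0, x], 2.0 * x + 1.0)  # samples of y = 2x + 1
    print(model.w)
```

Each update costs on the order of the squared number of features, independent of how many samples have been seen, so the model can keep learning without storing or revisiting old data.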

A third problem is that movement systems have a rather high-dimensional state, resulting from many sensors and sensory modalities and a large number of degrees of freedom for movement. Thus, learning in hundreds, thousands, or even hundreds of thousands of dimensions is not an unreasonable request. Besides the complexity of learning from data sets with a huge number of dimensions, many of these dimensions carry either no information or only redundant information for the task to be learned. Detecting such redundancy and irrelevancy, and exploiting it for the task goal, are further interesting and complex requirements for learning control.

Fourth, accomplishing learning with maximal computational and data efficiency is of great importance. Computational efficiency refers to the amount of computation that is required to add a new data point to the learning system. Data efficiency concerns how much information is "squeezed out" of every data point; e.g., a gradient descent update is usually quite inferior to a Newton update. Given that movement systems have to compute largely with on-board computers, computing resources are limited.
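The gradient-versus-Newton remark can be made concrete on a toy problem. The sketch below minimizes a quadratic cost with very different curvature along its two axes; the cost function, step size, and function names are assumptions chosen for illustration only.

```python
def gradient(w, curvature):
    """Gradient of the cost 0.5 * sum(c_i * w_i**2), minimized at w = 0."""
    return [c * wi for c, wi in zip(curvature, w)]


def gradient_step(w, curvature, learning_rate):
    """Plain gradient descent: one global step size for all directions."""
    g = gradient(w, curvature)
    return [wi - learning_rate * gi for wi, gi in zip(w, g)]


def newton_step(w, curvature):
    """Newton update: rescale the gradient by the inverse (diagonal) Hessian."""
    g = gradient(w, curvature)
    return [wi - gi / c for wi, gi, c in zip(w, g, curvature)]


if __name__ == "__main__":
    curvature = [100.0, 1.0]  # ill-conditioned: steep in one axis, flat in the other
    start = [1.0, 1.0]

    w_gd = start
    for _ in range(100):
        # Step size limited by the steepest direction (1 / largest curvature).
        w_gd = gradient_step(w_gd, curvature, learning_rate=0.01)

    w_newton = newton_step(start, curvature)
    print("gradient descent after 100 steps:", w_gd)
    print("Newton after 1 step:", w_newton)
```

On this quadratic, a single Newton step lands on the minimum, while gradient descent is still far from it along the flat direction after 100 steps. The price is that the Newton update needs curvature information, which costs more computation per data point. This is the tension between data efficiency and computational efficiency.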

As a final point, learning control has an additional issue: wrong decisions can lead to physical harm to the environment and to the movement system itself. Thus, robustness, stability, and confidence in the actions taken are of great importance. Ideally, learning algorithms come with proofs of stability, convergence, and boundedness.

The "Learning Control" group in the Autonomous Motion Department focuses on developing learning algorithms for control that can work in the above scenarios, ideally in a completely black-box fashion, without the need to tune any open parameters and with convergence guarantees. Incremental learning with growing representational structure has been one of our main research thrusts, particularly in the probabilistic setting of learning function approximators with local linear models. A second important topic is Reinforcement Learning, i.e., how to improve movement execution through trial-and-error learning. A novel upcoming topic addresses how to learn new feedback controllers from a large number of sensory feedback signals or from feature vectors derived from sensory feedback. Often, we prefer algorithms that have analytical inference equations but only approximate the best possible inference over optimal inference that may require massive computation or sampling.
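To give a flavor of function approximation with local linear models, the sketch below fits a weighted straight line around each query point (batch locally weighted regression in one dimension). It is a deliberately simplified stand-in: it is neither incremental nor probabilistic, and the function name `lwr_predict`, the Gaussian kernel, and the bandwidth value are assumptions for illustration, not the group's algorithms.

```python
import math


def lwr_predict(xs, ys, x_query, bandwidth):
    """Predict y at x_query from a line fitted to nearby, kernel-weighted data."""
    # Gaussian kernel: points close to the query get weights near 1.
    weights = [math.exp(-0.5 * ((x - x_query) / bandwidth) ** 2) for x in xs]
    total = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    cov_xy = sum(w * (x - mean_x) * (y - mean_y)
                 for w, x, y in zip(weights, xs, ys))
    var_x = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    slope = cov_xy / var_x  # weighted least-squares slope of the local line
    return mean_y + slope * (x_query - mean_x)


if __name__ == "__main__":
    xs = [0.1 * i for i in range(63)]  # roughly one period of a sine wave
    ys = [math.sin(x) for x in xs]
    print(lwr_predict(xs, ys, x_query=1.0, bandwidth=0.2), math.sin(1.0))
```

Each local line only has to be accurate in a small neighborhood, so a nonlinear function can be covered by many simple models. The bandwidth controls the size of that neighborhood and trades bias against sensitivity to noise.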

Designed by: Nerses Ohanyan & Jan Peters
Page last modified on March 18, 2014, at 04:17 PM