Continuous Inverse Optimal Control With Locally Optimal Examples
Brief paper
Continuous-time inverse quadratic optimal control problem
Abstract
In this paper, the problem of finite-horizon inverse optimal control (IOC) is investigated, where the quadratic cost function of a dynamic process is to be recovered from observations of optimal control sequences. We present the first complete necessary and sufficient condition for the existence of corresponding standard linear quadratic (LQ) cost functions. In feasible cases, the analytic expression of the whole solution space is derived and the equivalence of weighting matrices in LQ problems is discussed. For infeasible problems, an infinite-dimensional convex problem is formulated to obtain a best-fit approximate solution with minimal control residual, and the resulting optimality condition is solved in a static quadratic programming framework to facilitate computation. Finally, numerical simulations demonstrate the effectiveness and feasibility of the proposed methods.
Introduction
The optimality principle has long been investigated as an important tool for analyzing natural phenomena, such as Fermat's principle in optics and Lagrangian dynamics in mechanics (Pauwels, Henrion, & Lasserre, 2016). A common hypothesis is that natural processes are generated according to some optimality criterion, which leads to the promising topic of inverse optimization (Aswani, Shen, & Siddiq, 2018). Such problems arise in many forms in control theory, machine learning, and game theory (Hadfield-Menell et al., 2016, Konstantakopoulos et al., 2017, Pavan et al., 2014).
In this paper we mainly focus on the problem of inverse optimal control, namely how to recover the optimization criterion of a dynamic system from observations of its optimal policies. Such an estimate can help us better understand the physical system and make it possible to predict future decisions or to reproduce a similar optimal controller in other applications. In recent years, inverse optimal control has regained popularity in robotics, bionics, economics and operations research (Chittaro et al., 2013, Finn et al., 2016, Mombaur et al., 2010). For example, inverse optimal control is a promising tool for investigating the mechanisms underlying human locomotion and for applying them to the design of humanoid robots (Berret et al., 2011, Berret and Jean, 2016, Mainprice et al., 2016). Other applications include environmental amenities valuation (Zhou, 2017) and market-bidding modeling (Saez-Gallego, Morales, Zugno, & Madsen, 2016).
The problem of reconstructing cost functions has been studied intensively. In the existing literature, one well-studied direction treats it as a parameter identification problem, for which many numerical methods have been developed. In most works the cost function is assumed to be a linear combination of certain basis functions, whose weights are to be identified. On the one hand, some papers, such as Berret et al. (2011) and Mombaur et al. (2010), solve the problem in a bi-level hierarchical framework using learning methods. However, a forward optimal control problem must then be solved in every inner loop to test the optimality of a candidate cost function, which creates a computational bottleneck. On the other hand, Hatz et al. (2012), Johnson et al. (2013), Keshavarz et al. (2011) and Pauwels et al. (2016) exploit the problem structure better by characterizing the optimal control model through its optimality conditions. The problem is then reformulated as a residual optimization problem, in which the inner-loop forward optimal control problem is replaced by a set of constraints based on the Karush–Kuhn–Tucker conditions or the Hamilton–Jacobi–Bellman equation.
This paper is restricted to inverse linear quadratic (LQ) problems, since the LQ cost is not only well defined but also widely used in applications. Moreover, this structure makes it possible to investigate the well-posedness of the inverse problem and to characterize its solution space analytically. In this setting, positive semi-definiteness must be imposed on the weighting matrices, which in fact makes the inverse problem harder: the weighting matrices have to be estimated in semi-definite cones rather than in an unconstrained space, as in other methods.
The infinite-horizon case, in which a pair of positive definite weighting matrices is recovered from a constant feedback matrix, has been well studied. Kalman first studied the single-input case in the frequency domain using the return difference condition, which was later extended to the multi-input case by Anderson and Moore (1989). In the time domain, based on the study of matrix equations, Jameson and Kreindler (1973) give a necessary and sufficient condition for the existence of a symmetric matrix solution. However, the obtained weighting matrix is not guaranteed to be constant and nonnegative. Since then, the results of Anderson and Jameson have been refined to derive various existence results, such as Fujii (1987) and Sugimoto and Yamamoto (1987). More recently, linear matrix inequalities (LMIs) and convex optimization have been used by Boyd, El Ghaoui, Feron, and Balakrishnan (1994) and Priess, Conway, Choi, Popovich, and Radcliffe (2015) to estimate positive definite weighting matrices.
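Why LMIs fit the infinite-horizon case can be seen from a minimal sketch (our illustration, not taken from the cited works). Assume the standard cost $\int_0^\infty (x^\top Q x + u^\top R u)\,dt$ with feedback $u = -Kx$, $K = R^{-1}B^\top P$. Substituting $B^\top P = RK$ into the algebraic Riccati equation gives two conditions that are affine in $(P, Q, R)$ once $K$ is fixed: $B^\top P - RK = 0$ and $A^\top P + PA - K^\top R K + Q = 0$. Together with $Q \succeq 0$, $R \succ 0$ and a normalization that excludes the trivial zero solution, these form an LMI feasibility problem. The code generates a gain from known weights with SciPy's `solve_continuous_are` and checks both residuals. The helper names and the example system are hypothetical.

```python
import numpy as np
from scipy.linalg import solve_continuous_are


def forward_gain(A, B, Q, R):
    """Infinite-horizon LQ: stabilizing ARE solution P and gain K = R^{-1} B^T P."""
    P = solve_continuous_are(A, B, Q, R)
    K = np.linalg.solve(R, B.T @ P)
    return P, K


def inverse_residuals(A, B, K, P, Q, R):
    """Norms of the two inverse conditions, both affine in (P, Q, R) for a fixed K."""
    gain_res = B.T @ P - R @ K                      # K = R^{-1} B^T P
    are_res = A.T @ P + P @ A - K.T @ R @ K + Q     # ARE, using P B R^{-1} B^T P = K^T R K
    return np.linalg.norm(gain_res), np.linalg.norm(are_res)


# Arbitrary example system and weights (assumptions for illustration only).
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

P, K = forward_gain(A, B, Q, R)
print(inverse_residuals(A, B, K, P, Q, R))                     # both close to zero
print(inverse_residuals(A, B, K, P, Q + 0.5 * np.eye(2), R))   # ARE residual no longer zero
```

Because the unknowns enter linearly for fixed $K$, a semidefinite solver can search over $(P, Q, R)$ directly instead of repeatedly solving the forward problem.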
However, the finite-horizon inverse LQ problem is still open. The difficulty lies in the fact that the feedback matrix is time-varying and a differential Riccati equation (DRE) is involved. To the best of our knowledge, only a few partial results exist. Based on a parametrization of the solutions of the DRE (see Ferrante, Marro, & Ntogramatzidis, 2005), Nori and Frezza (2004) propose a systematic method to reconstruct a canonical cost, relaxing the problem by allowing a non-zero cross-term. Jean and Maslovskaya (2018) then take a step further and investigate the uniqueness of the canonical class. Under this framework, however, a non-trivial mixed identification–estimation problem has to be solved first to ensure identifiability, and how to recover a standard LQ cost without cross-terms remains open.
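To see why the finite-horizon gain is time-varying, consider the standard cost $\int_0^T (x^\top Q x + u^\top R u)\,dt$ for $\dot x = Ax + Bu$. Its optimal feedback is $u = -K(t)x$ with $K(t) = R^{-1}B^\top P(t)$, where $P$ solves the DRE $-\dot P = A^\top P + PA - PBR^{-1}B^\top P + Q$. The sketch integrates the DRE backward in time with SciPy's `solve_ivp`. The zero terminal condition $P(T) = 0$, the helper `dre_gains`, the example system and the weights are all assumptions made for illustration. The gain vanishes at $t = T$ and approaches the constant infinite-horizon gain far from the terminal time.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.linalg import solve_continuous_are


def dre_gains(A, B, Q, R, T, ts):
    """K(t) = R^{-1} B^T P(t) on the increasing grid ts in [0, T].

    P solves -dP/dt = A^T P + P A - P B R^{-1} B^T P + Q with the assumed
    terminal condition P(T) = 0. Integrating in reversed time s = T - t
    (dP/ds = -dP/dt) turns it into an initial value problem.
    """
    n = A.shape[0]
    Rinv = np.linalg.inv(R)

    def rhs(s, p):
        P = p.reshape(n, n)
        return (A.T @ P + P @ A - P @ B @ Rinv @ B.T @ P + Q).ravel()

    s_eval = T - ts[::-1]                        # increasing in s
    sol = solve_ivp(rhs, (0.0, T), np.zeros(n * n), t_eval=s_eval,
                    rtol=1e-10, atol=1e-12)
    Ps = sol.y.T[::-1].reshape(-1, n, n)         # Ps[i] corresponds to ts[i]
    return np.array([Rinv @ B.T @ P for P in Ps])


# Hypothetical example data.
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
T = 10.0
ts = np.linspace(0.0, T, 101)

K = dre_gains(A, B, Q, R, T, ts)                 # shape (101, 1, 2)
K_inf = np.linalg.solve(R, B.T @ solve_continuous_are(A, B, Q, R))
print(K[-1])            # zero gain at t = T
print(K[0], K_inf)      # nearly equal for this long horizon
```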
Recall that the motivation for inverse optimal control is to explain the optimality behind a dynamic process. Hence the recovered optimality criterion is expected to carry some practical meaning, such as minimal energy consumption. This is why cross-terms are typically excluded in forward LQ problems. With this motivation, in this paper we focus on the standard finite-horizon inverse LQ problem without cross-terms. The main contribution of this paper is threefold:
- (a) To the best of our knowledge, this is the first complete result on the well-posedness of the standard finite-horizon inverse LQ problem. The necessary and sufficient condition for the existence of corresponding LQ cost functions is given by a set of LMI conditions. For feasible cases, the analytic expression of the whole solution space is derived and its uniqueness is analyzed, which also sheds new light on the equivalence of weighting matrices in forward LQ problems.
- (b) In infeasible cases, a best-fit approximate cost function is computed through a well-posed infinite-dimensional convex problem that minimizes the control residual. The optimality condition is derived in the form of a matrix boundary value problem (BVP) constrained in positive semi-definite cones, which is solved by transforming it into a static quadratic programming problem.
- (c) Our results provide a more general perspective, in the sense that the canonical form with a non-zero cross-term used in existing works can also be obtained by our method, but not vice versa.
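Item (b) involves optimization over positive semi-definite cones. A standard building block for such problems is the Frobenius-norm projection of a matrix onto the cone of positive semi-definite matrices, obtained by symmetrizing and setting negative eigenvalues to zero. It is shown here only as an illustration and is not necessarily part of the authors' quadratic programming reformulation. The helper name `project_psd` is hypothetical.

```python
import numpy as np


def project_psd(X):
    """Frobenius-norm projection of a square matrix onto the PSD cone.

    The skew-symmetric part is orthogonal to all symmetric matrices, so only
    the symmetric part matters; its negative eigenvalues are then clipped.
    """
    S = 0.5 * (X + X.T)                 # symmetric part
    w, V = np.linalg.eigh(S)            # S = V diag(w) V^T
    w_clipped = np.clip(w, 0.0, None)   # drop negative eigenvalues
    return (V * w_clipped) @ V.T        # V diag(w_clipped) V^T


X = np.array([[1.0, 0.0], [0.0, -2.0]])
print(project_psd(X))   # expected: diag(1, 0)
```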
Section snippets
Notations and mathematical preliminaries
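In this paper, we denote by $\mathbb{R}^n$ and $\mathbb{R}^{n\times m}$ the spaces of $n$-dimensional column vectors and $n\times m$ matrices, respectively. For any two matrices $X$ and $Y$, $X \succeq Y$ means that $X - Y$ is positive semi-definite. We use $C[0,T]$ and $NBV[0,T]$ to denote the spaces of continuous functions and of normalized functions of bounded variation over $[0,T]$, respectively. Among special matrix spaces in $\mathbb{R}^{n\times n}$, we denote by $\mathbb{S}^n$ the space of Hermitian matrices and by $\mathbb{S}^n_+$ the positive semi-definite cone. $\mathbb{S}^n$ is a Hilbert space and $\mathbb{S}^n_+$ is a closed convex cone in it, on which the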
Problem formulations
Consider the standard finite time LQ problem: where , , and .
Here we make the standard assumptions that $(A, B)$ is controllable and $B$ has full column rank. For the forward problem, there exists a unique optimal feedback control $u^*(t) = -R^{-1}B^\top P(t)\,x(t)$ that minimizes the quadratic cost function, where $P(t)$ is the positive semi-definite solution to the DRE
$$-\dot P(t) = A^\top P(t) + P(t)A - P(t)BR^{-1}B^\top P(t) + Q.$$
Then the inverse optimal control problem is formulated as
Exact solution to the inverse problem
In this section the analytic solutions to Problem 2 are investigated. For any feasible solution, the associated solution of the DRE is unique and its expression can be computed explicitly. The existence problem is then equivalent to the feasibility of an LMI problem. Furthermore, for feasible problems, the structure of the solution space is analyzed, and an optimal solution can be obtained through semi-definite programming (SDP).
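The reduction to an LMI can be illustrated numerically. Because $K = R^{-1}B^\top P$, the quadratic term $PBR^{-1}B^\top P$ of the DRE equals $K^\top R K$. For a fixed observed gain $K(\cdot)$, the equation $-\dot P = A^\top P + PA - K^\top R K + Q$ is therefore linear in $(Q, R)$, and so is the gain residual $B^\top P(t) - R K(t)$. The sketch below is our own illustration rather than the authors' construction. It assumes a zero terminal condition $P(T) = 0$ and generates the "observed" gain from hypothetical true weights $(Q_0, R_0)$. It then evaluates the residual for each basis element of the symmetric matrices and stacks the results as columns of a matrix `M`. Weight pairs that reproduce the observed gain lie in the null space of `M`. Adding $Q \succeq 0$, $R \succ 0$ and a scale normalization gives a convex feasibility problem. The names `sym_basis` and `gain_residual` are hypothetical.

```python
import numpy as np
from scipy.integrate import solve_ivp


def sym_basis(k):
    """Basis of the k x k symmetric matrices (upper-triangular ordering)."""
    basis = []
    for i in range(k):
        for j in range(i, k):
            E = np.zeros((k, k))
            E[i, j] = E[j, i] = 1.0
            basis.append(E)
    return basis


def gain_residual(A, B, Q0, R0, Q, R, T, ts):
    """Stacked residual B^T P(t) - R K(t) on the grid ts.

    K(t) comes from the true weights (Q0, R0) via the DRE; P solves the linear
    ODE -dP/dt = A^T P + P A - K^T R K + Q. Both use the assumed P(T) = 0 and
    are integrated together in reversed time s = T - t.
    """
    n, m = B.shape
    R0inv = np.linalg.inv(R0)

    def rhs(s, y):
        P0 = y[:n * n].reshape(n, n)
        P = y[n * n:].reshape(n, n)
        K = R0inv @ B.T @ P0
        dP0 = A.T @ P0 + P0 @ A - K.T @ R0 @ K + Q0   # true DRE
        dP = A.T @ P + P @ A - K.T @ R @ K + Q        # linear in (Q, R)
        return np.concatenate([dP0.ravel(), dP.ravel()])

    sol = solve_ivp(rhs, (0.0, T), np.zeros(2 * n * n), t_eval=np.sort(T - ts),
                    rtol=1e-10, atol=1e-12)
    res = []
    for y in sol.y.T:
        P0 = y[:n * n].reshape(n, n)
        P = y[n * n:].reshape(n, n)
        res.append((B.T @ P - R @ (R0inv @ B.T @ P0)).ravel())
    return np.concatenate(res)


# Hypothetical example: true weights Q0 = diag(1, 2), R0 = 1.
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
Q0, R0 = np.diag([1.0, 2.0]), np.array([[1.0]])
T, ts = 2.0, np.linspace(0.0, 2.0, 21)
n, m = B.shape

cols = [gain_residual(A, B, Q0, R0, E, np.zeros((m, m)), T, ts) for E in sym_basis(n)]
cols += [gain_residual(A, B, Q0, R0, np.zeros((n, n)), F, T, ts) for F in sym_basis(m)]
M = np.column_stack(cols)     # residual = M @ theta, theta = (Q11, Q12, Q22, R11)

theta0 = np.array([1.0, 0.0, 2.0, 1.0])     # coordinates of (Q0, R0) in this basis
print(np.linalg.norm(M @ theta0))            # close to zero: (Q0, R0) is in the null space
print(np.linalg.svd(M, compute_uv=False))    # singular values of the linear map
```

Any positive multiple of $(Q_0, R_0)$ also gives a zero residual, which is why a normalization is needed before the solution set can be described as bounded.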
Approximate solution for infeasible cases
In this section we consider infeasible cases of Problem 2, where exact solutions do not exist. This can happen, for example, when the optimal controller is observed through noisy experimental data. Following our previous paper (see Li, Zhang, Yao and Hu, 2018), a convex programming problem is formulated to obtain an optimal approximation of the cost function by minimizing the control residual.
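When the observed gain is corrupted by noise, the gain residual can no longer be driven exactly to zero. The paper minimizes the control residual. The sketch below instead uses a simpler heuristic, chosen purely for illustration and not equivalent to solving the constrained convex problem. First, it fixes the scale by setting $R = 1$, an assumed normalization. Next, it fits $Q$ by ordinary least squares on the gain residual $B^\top P(t) - R\tilde K(t)$, with the noisy gain $\tilde K$ linearly interpolated in time. Finally, it projects the estimate onto the positive semi-definite cone by eigenvalue clipping. The zero terminal condition $P(T) = 0$, the noise level, the example system and all helper names are hypothetical.

```python
import numpy as np
from scipy.integrate import solve_ivp


def dre_gains(A, B, Q, R, T, ts):
    """Forward problem: K(t_i) = R^{-1} B^T P(t_i), DRE with assumed P(T) = 0."""
    n = A.shape[0]
    Rinv = np.linalg.inv(R)

    def rhs(s, p):  # reversed time s = T - t
        P = p.reshape(n, n)
        return (A.T @ P + P @ A - P @ B @ Rinv @ B.T @ P + Q).ravel()

    sol = solve_ivp(rhs, (0.0, T), np.zeros(n * n), t_eval=T - ts[::-1], rtol=1e-10, atol=1e-12)
    Ps = sol.y.T[::-1].reshape(-1, n, n)
    return np.array([Rinv @ B.T @ P for P in Ps])


def gain_residual(A, B, ts, K_obs, Q, R, T):
    """Stacked B^T P(t_i) - R K_obs(t_i); -dP/dt = A^T P + P A - K^T R K + Q, P(T) = 0."""
    n, m = B.shape

    def K_at(t):  # linear interpolation of the observed gains
        return np.array([[np.interp(t, ts, K_obs[:, i, j]) for j in range(n)] for i in range(m)])

    def rhs(s, p):
        P = p.reshape(n, n)
        K = K_at(T - s)
        return (A.T @ P + P @ A - K.T @ R @ K + Q).ravel()

    sol = solve_ivp(rhs, (0.0, T), np.zeros(n * n), t_eval=T - ts[::-1], rtol=1e-8, atol=1e-10)
    Ps = sol.y.T[::-1].reshape(-1, n, n)
    return np.concatenate([(B.T @ P - R @ K).ravel() for P, K in zip(Ps, K_obs)])


def project_psd(X):
    w, V = np.linalg.eigh(0.5 * (X + X.T))
    return (V * np.clip(w, 0.0, None)) @ V.T


# Hypothetical data: true Q = diag(1, 2), R normalized to 1, additive gain noise.
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
R = np.array([[1.0]])
T, ts = 2.0, np.linspace(0.0, 2.0, 41)
rng = np.random.default_rng(0)
K_obs = dre_gains(A, B, np.diag([1.0, 2.0]), R, T, ts) + 0.01 * rng.standard_normal((41, 1, 2))

basis = [np.array([[1.0, 0.0], [0.0, 0.0]]),
         np.array([[0.0, 1.0], [1.0, 0.0]]),
         np.array([[0.0, 0.0], [0.0, 1.0]])]
# The residual is linear in (Q, R): r(Q, R) = sum_k q_k r(E_k, 0) + r(0, R).
M = np.column_stack([gain_residual(A, B, ts, K_obs, E, np.zeros((1, 1)), T) for E in basis])
b = gain_residual(A, B, ts, K_obs, np.zeros((2, 2)), R, T)
q, *_ = np.linalg.lstsq(M, -b, rcond=None)     # minimize ||M q + b||
Q_fit = project_psd(sum(c * E for c, E in zip(q, basis)))
print(Q_fit)
```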
Problem 4
The optimal approximate cost function is obtained by solving the following convex optimization problem on and
Simulation results
In this section, numerical simulations are given to illustrate the proposed methods for solving the inverse LQ problem for both feasible and infeasible cases.
Conclusions
In this paper, the identifiability and the solutions of the inverse LQ problem are investigated. The necessary and sufficient condition for the existence of corresponding LQ cost functions is given in the form of LMI conditions. For feasible cases, the whole solution space is shown to be a closed and bounded convex set, namely the intersection of an affine manifold and the positive semi-definite cone. A sufficient condition for a unique cost function is also proposed. For infeasible
Yibei Li received her B.E. degree in Automation from Harbin Institute of Technology, China, in 2015. She received her licentiate degree in Applied and Computational Mathematics from KTH Royal Institute of Technology, Sweden, in 2019. She is currently a Ph.D. student at the Division of Optimization and Systems Theory, Department of Mathematics, KTH. Her research interests include inverse optimization, nonlinear control and multi-agent systems.
References (29)
- Zhou (2017). Valuing environmental amenities through inverse optimization: Theory and case study. Journal of Environmental Economics and Management.
- Ferrante, Marro, and Ntogramatzidis (2005). A parametrization of the solutions of the finite-horizon LQ problem with general cost and boundary conditions. Automatica.
- Anderson and Moore (1989). Optimal control: Linear quadratic methods.
- Aswani, Shen, and Siddiq (2018). Inverse optimization with noisy data. Operations Research.
- Berret et al. (2011). Evidence for composite cost functions in arm movement planning: An inverse optimal control approach. PLoS Computational Biology.
- Berret and Jean (2016). Why don't we move slower? The value of time in the neural control of action. Journal of Neuroscience.
- Boyd, El Ghaoui, Feron, and Balakrishnan (1994). Linear matrix inequalities in system and control theory, Vol. 15.
- Numerical mathematics and computing (2012).
- Chittaro et al. (2013). On inverse optimal control problems of human locomotion: Stability and robustness of the minimizers. Journal of Mathematical Sciences.
- Finn et al. (2016). Guided cost learning: Deep inverse optimal control via policy optimization.
- Fujii (1987). A new approach to the LQ design from the viewpoint of the inverse regulator problem. IEEE Transactions on Automatic Control.
- Hadfield-Menell et al. (2016). Cooperative inverse reinforcement learning.
- Hatz et al. (2012). Estimating parameters in optimal control problems. SIAM Journal on Scientific Computing.
- Jameson and Kreindler (1973). Inverse problem of linear optimal control. SIAM Journal on Control.
Yu Yao received his B.Sc., M.Sc. and Ph.D. degrees in Automatic Control in 1983, 1986 and 1990, respectively, all from Harbin Institute of Technology, China. He is currently a professor in the School of Astronautics at Harbin Institute of Technology, China. His research interests include robust control, nonlinear systems and flight control.
Xiaoming Hu received the B.S. degree from the University of Science and Technology of China in 1983, and the M.S. and Ph.D. degrees from Arizona State University in 1986 and 1989, respectively. He served as a research assistant at the Institute of Automation, Chinese Academy of Sciences, from 1983 to 1984. From 1989 to 1990 he was a Gustafsson Postdoctoral Fellow at KTH Royal Institute of Technology, Stockholm, where he is currently a professor of Optimization and Systems Theory. His main research interests are in nonlinear control systems, nonlinear observer design, sensing and active perception, motion planning, control of multi-agent systems, and mobile manipulation.
© 2020 Elsevier Ltd. All rights reserved.
Source: https://www.sciencedirect.com/science/article/pii/S0005109820301758