[Optimization] Levenberg-Marquardt Least-Squares Optimization

The LM (Levenberg-Marquardt) algorithm is a trust-region method: it keeps the step h within a trust region, so that the Taylor expansion remains a good approximation of the objective.
The LM algorithm can be viewed as a damped Gauss-Newton method.

1. Theory

The least-squares problem:

x=arg\min_{x} {F(x)}=arg\min_{x} \frac{1}{2} \sum_{i=1}^N\parallel f_i(x) \parallel ^2 \\ F(x)=\frac{1}{2} \sum_{i=1}^N\parallel f_i(x) \parallel ^2=\frac{1}{2}\parallel\mathbf{f}(x)\parallel^2=\frac{1}{2}\mathbf{f}(x)^T\mathbf{f}(x)

First-order Taylor expansion of \mathbf{f}:

\mathbf{f}(x+h)=\mathbf{f}(x)+\mathbf{J}(x) h + O(h^Th) \\

Dropping the higher-order term and substituting into F:

F(x+h)\approx L(h)=\frac{1}{2}\mathbf{f}^T\mathbf{f}+h^T\mathbf{J}^T\mathbf{f}+\frac{1}{2}h^T\mathbf{J}^T\mathbf{J}h

The idea of the damped method is to add a damping term \frac{1}{2}\mu h^Th to this model:

h=arg\min_h {\mathbf{G}(h)} = arg\min_h \frac{1}{2}\mathbf{f}^T\mathbf{f}+h^T\mathbf{J}^T\mathbf{f}+\frac{1}{2}h^T\mathbf{J}^T\mathbf{J}h + \frac{1}{2}\mu h^Th\\

Differentiating with respect to h and setting the derivative to zero:


\mathbf{J}^T\mathbf{f}+\mathbf{J}^T\mathbf{J}h+\mu h= 0 \\ (\mathbf{J}^T\mathbf{J}+\mu \mathbf{I})h=-\mathbf{J}^T\mathbf{f}\\ (\mathbf{H}+\mu \mathbf{I} )h=-g
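In Eigen, one damped step amounts to assembling and solving this linear system. A minimal sketch follows (the function name dampedStep is mine; J, f, and mu are assumed to be already evaluated at the current x):

#include <eigen3/Eigen/Dense>

// Solve (J^T J + mu*I) h = -J^T f for the damped step h.
Eigen::VectorXd dampedStep(const Eigen::MatrixXd& J,
                           const Eigen::VectorXd& f, double mu)
{
    const int n = J.cols();
    Eigen::MatrixXd A = J.transpose() * J
                      + mu * Eigen::MatrixXd::Identity(n, n);
    Eigen::VectorXd g = J.transpose() * f;
    return A.ldlt().solve(-g);  // A is positive definite for mu > 0, so LDLT is safe
}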


The damping parameter \mu plays the following roles:

1. For \mu>0, (\mathbf{H}+\mu \mathbf{I}) is positive definite, which guarantees that h is a descent direction.
2. When \mu is large: h\approx-\frac{1}{\mu}g=-\frac{1}{\mu}F'(x), which is essentially steepest (gradient) descent; this works well when the iterate is far from the solution.
3. When \mu is small, the method approaches Gauss-Newton, which attains nearly quadratic convergence close to the solution.

Evidently, the choice of \mu matters. At initialization, take

\mathbf{A}_0=\mathbf{J}(x_0)^T\mathbf{J}(x_0)\\ \mu_0=\tau \cdot \max_i\{a_{ii}^0\}

where a_{ii}^0 denotes a diagonal entry of \mathbf{A}_0 and \tau is chosen by the user. In subsequent iterations, \mu is updated from the gain ratio, the actual cost decrease divided by the decrease predicted by the model L:

\varrho=\frac{F(x)-F(x+h)}{ L(0)-L(h)}
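A large \varrho means the linear model predicted the actual decrease well, so \mu can be reduced; \varrho \le 0 means the step must be rejected and \mu increased. A minimal C++ sketch of this update (Nielsen's rule, matching the pseudocode below; the function name is mine):

#include <algorithm>
#include <cmath>

// Update the damping mu and the growth factor nu from the gain ratio rho.
void updateDamping(double rho, double& mu, double& nu)
{
    if (rho > 0) {   // step accepted: the model predicted the decrease well
        mu *= std::max(1.0 / 3.0, 1.0 - std::pow(2.0 * rho - 1.0, 3.0));
        nu = 2.0;
    } else {         // step rejected: raise damping, move toward gradient descent
        mu *= nu;
        nu *= 2.0;
    }
}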

Stopping criteria:

1. The gradient vanishes: F'(x)=g(x)=0; in practice, test \parallel g(x) \parallel_\infty < \varepsilon_1, where \varepsilon_1 is a user-set threshold;
2. The step becomes sufficiently small: \parallel h\parallel=\parallel x_{new}-x \parallel < \varepsilon_2( \parallel x \parallel + \varepsilon_2 );
3. The maximum number of iterations is exceeded.
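A minimal sketch of these three tests in Eigen-based C++ (the function name is mine, not from the repository):

#include <eigen3/Eigen/Core>

// Return true if any of the three stopping criteria fires.
bool shouldStop(const Eigen::VectorXd& g, const Eigen::VectorXd& h,
                const Eigen::VectorXd& x, int k,
                double eps1, double eps2, int k_max)
{
    if (g.lpNorm<Eigen::Infinity>() < eps1)  return true;  // 1. gradient ~ 0
    if (h.norm() < eps2 * (x.norm() + eps2)) return true;  // 2. step too small
    return k >= k_max;                                     // 3. iteration budget spent
}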

The LM algorithm proceeds as follows:

begin
    k := 0; \nu := 2; x := x_0
    H := J^TJ; g := J^Tf
    found := (\parallel g \parallel_\infty < \varepsilon_1); \mu := \tau \cdot \max\{a_{ii}\}
    while (not found) and (k < k_{max})
        k := k + 1; solve (H + \mu I)h = -g
        if \parallel h \parallel < \varepsilon_2(\parallel x \parallel + \varepsilon_2)
            found := true
        else
            x_{new} := x + h
            \varrho := \frac{F(x) - F(x_{new})}{L(0) - L(h)}
            if \varrho > 0    {accept this step?}
                x := x_{new}
                H := J^TJ; g := J^Tf
                found := (\parallel g \parallel_\infty < \varepsilon_1)
                \mu := \mu \cdot \max\{\frac{1}{3}, 1 - (2\varrho - 1)^3\}; \nu := 2
            else
                \mu := \mu \cdot \nu; \nu := 2 \cdot \nu

end

2. Implementation

Problem (the same one as in the Gauss-Newton article): the nonlinear model y=exp(ax^2+bx+c). Given N observations \{x_i,y_i\}, estimate the coefficients X=[ a,b,c ]^T.

Analysis: let f(X)=y-exp(ax^2+bx+c). The N observations form a large nonlinear system:

F(X)=\left[\begin{array}[c]{c} y_1-exp(ax_1^2+bx_1+c)\\ \dots\\ y_N-exp(ax_N^2+bx_N+c)\end{array} \right]

This gives the least-squares problem:

X = \mathrm{arg}\min_{X}\frac{1}{2}\parallel F(X) \parallel^2 .

To solve it, as the derivation shows, we need the Jacobian:

J(X)=\left[\begin{matrix} -x_1^2exp(ax_1^2+bx_1+c) & -x_1exp(ax_1^2+bx_1+c) &-exp(ax_1^2+bx_1+c) \\ \dots & \dots & \dots \\ -x_N^2exp(ax_N^2+bx_N+c) & -x_Nexp(ax_N^2+bx_N+c) &-exp(ax_N^2+bx_N+c) \end{matrix} \right]
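Analytic Jacobians are easy to get wrong, so it is worth cross-checking them numerically. A hypothetical helper (not part of the original code) that builds one row of J by central differences for comparison against the formula above:

#include <eigen3/Eigen/Core>
#include <cmath>

// Numeric Jacobian row at one observation x. Since y is constant, the
// derivative of f = y - exp(a*x^2 + b*x + c) equals that of -exp(...).
Eigen::RowVector3d numericJacobianRow(double x, double a, double b, double c,
                                      double eps = 1e-6)
{
    auto r = [x](double a_, double b_, double c_) {
        return -std::exp(a_ * x * x + b_ * x + c_);
    };
    Eigen::RowVector3d J;
    J(0) = (r(a + eps, b, c) - r(a - eps, b, c)) / (2 * eps);
    J(1) = (r(a, b + eps, c) - r(a, b - eps, c)) / (2 * eps);
    J(2) = (r(a, b, c + eps) - r(a, b, c - eps)) / (2 * eps);
    return J;
}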

With the steps described in the derivation, the problem can be solved. Code implementation:

Code: ydsf16/LevenbergMarquardt (https://github.com/ydsf16/LevenbergMarquardt)
/**
 * This file is part of LevenbergMarquardt Solver.
 *
 * Copyright (C) 2018-2020 Dongsheng Yang <ydsf16@buaa.edu.cn> (Beihang University)
 * For more information see <https://github.com/ydsf16/LevenbergMarquardt>
 *
 * LevenbergMarquardt is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * LevenbergMarquardt is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with LevenbergMarquardt. If not, see <http://www.gnu.org/licenses/>.
 */

#include <iostream>
#include <iomanip>
#include <vector>
#include <algorithm>
#include <chrono>
#include <eigen3/Eigen/Core>
#include <eigen3/Eigen/Dense>
#include <eigen3/Eigen/Cholesky>
#include <opencv2/opencv.hpp>

/* Timer class */
class Runtimer{
public:
    inline void start()
    {
        t_s_  = std::chrono::steady_clock::now();
    }
    
    inline void stop()
    {
        t_e_ = std::chrono::steady_clock::now();
    }
    
    inline double duration()  // elapsed time in milliseconds
    {
        return std::chrono::duration_cast<std::chrono::duration<double>>(t_e_ - t_s_).count() * 1000.0;
    }
    
private:
    std::chrono::steady_clock::time_point t_s_; // start time point
    std::chrono::steady_clock::time_point t_e_; // stop time point
};

/* The LM optimizer */
class LevenbergMarquardt{
public:
    LevenbergMarquardt(double* a, double* b, double* c):
    a_(a), b_(b), c_(c)
    {
        epsilon_1_ = 1e-6;
        epsilon_2_ = 1e-6;
        max_iter_ = 50;
        is_out_ = true;
    }
    
    void setParameters(double epsilon_1, double epsilon_2, int max_iter, bool is_out)
    {
        epsilon_1_ = epsilon_1;
        epsilon_2_ = epsilon_2;
        max_iter_ = max_iter;
        is_out_ = is_out;
    }
    
    void addObservation(const double& x, const double& y)
    {
        obs_.push_back(Eigen::Vector2d(x, y));
    }
    
    void calcJ_fx()
    {
        J_ .resize(obs_.size(), 3);
        fx_.resize(obs_.size(), 1);
        
        for ( size_t i = 0; i < obs_.size(); i ++)
        {
            const Eigen::Vector2d& ob = obs_.at(i);
            const double& x = ob(0);
            const double& y = ob(1);
            double j1 = -x*x*exp(*a_ * x*x + *b_*x + *c_);
            double j2 = -x*exp(*a_ * x*x + *b_*x + *c_);
            double j3 = -exp(*a_ * x*x + *b_*x + *c_);
            J_(i, 0 ) = j1;
            J_(i, 1) = j2;
            J_(i, 2) = j3;
            fx_(i, 0) = y - exp( *a_ *x*x + *b_*x +*c_);
        }
    }
    
    void calcH_g()
    {
        H_ = J_.transpose() * J_;
        g_ = -J_.transpose() * fx_;  // note the sign: g_ = -J^T f, so solve() below takes g_ directly
    }
        
    double getCost()
    {
        Eigen::MatrixXd cost= fx_.transpose() * fx_;
        return cost(0,0);
    }
    
    double F(double a, double b, double c)
    {
        Eigen::MatrixXd fx;
        fx.resize(obs_.size(), 1);
        
        for ( size_t i = 0; i < obs_.size(); i ++)
        {
            const Eigen::Vector2d& ob = obs_.at(i);
            const double& x = ob(0);
            const double& y = ob(1);
            fx(i, 0) = y - exp( a *x*x + b*x +c);
        }
        Eigen::MatrixXd F = 0.5 * fx.transpose() * fx;
        return F(0,0);
    }
    
    double L0_L( Eigen::Vector3d& h)
    {
        // Predicted decrease of the model: L(0) - L(h) = -h^T J^T f - 0.5 h^T J^T J h
        Eigen::MatrixXd L = -h.transpose() * J_.transpose() * fx_ - 0.5 * h.transpose() * J_.transpose() * J_ * h;
        return L(0,0);
    }

    void solve()
    {
        int k = 0;
        double nu = 2.0;
        calcJ_fx();
        calcH_g();
        bool found = ( g_.lpNorm<Eigen::Infinity>() < epsilon_1_ );
        
        // mu_0 = tau * max_i{a_ii} with tau = 1: largest diagonal element of H
        std::vector<double> A;
        A.push_back( H_(0, 0) );
        A.push_back( H_(1, 1) );
        A.push_back( H_(2, 2) );
        auto max_p = std::max_element(A.begin(), A.end());
        double mu = *max_p;
        
        double sumt = 0;  // accumulated runtime in ms

        while ( !found && k < max_iter_)
        {
            Runtimer t;
            t.start();
            
            k = k +1;
            Eigen::Matrix3d G = H_ + mu * Eigen::Matrix3d::Identity();
            Eigen::Vector3d h = G.ldlt().solve(g_);
            
            if( h.norm() <= epsilon_2_ * ( sqrt(*a_**a_ + *b_**b_ + *c_**c_ ) + epsilon_2_ ) )  // ||h|| <= eps2 * (||x|| + eps2)
                found = true;
            else
            {
                double na = *a_ + h(0);
                double nb = *b_ + h(1);
                double nc = *c_ + h(2);
                
                double rho =( F(*a_, *b_, *c_) - F(na, nb, nc) )  / L0_L(h);

                if( rho > 0)
                {
                    *a_ = na;
                    *b_ = nb;
                    *c_ = nc;
                    calcJ_fx();
                    calcH_g();
                                      
                    found = ( g_.lpNorm<Eigen::Infinity>() < epsilon_1_ );
                    mu = mu * std::max<double>(1.0 / 3.0, 1 - std::pow(2*rho - 1, 3));
                    nu = 2.0;
                }
                else
                {
                    mu = mu * nu; 
                    nu = 2*nu;
                }// if rho > 0
            }// if step is too small
            
            t.stop();
            if( is_out_ )
            {
                std::cout << "Iter: " << std::left <<std::setw(3) << k << " Result: "<< std::left <<std::setw(10)  << *a_ << " " << std::left <<std::setw(10)  << *b_ << " " << std::left <<std::setw(10) << *c_ << 
                " step: " << std::left <<std::setw(14) << h.norm() << " cost: "<< std::left <<std::setw(14)  << getCost() << " time: " << std::left <<std::setw(14) << t.duration()  <<
                " total_time: "<< std::left <<std::setw(14) << (sumt += t.duration()) << std::endl;
            }   
        } // while
        
        if( found  == true)
            std::cout << "\nConverged\n\n";
        else
            std::cout << "\nDiverged\n\n";
        
    }//function 
    
    
    
    Eigen::MatrixXd fx_;                // residual vector f
    Eigen::MatrixXd J_;                 // Jacobian matrix
    Eigen::Matrix3d H_;                 // H = J^T J
    Eigen::Vector3d g_;                 // g = -J^T f

    std::vector< Eigen::Vector2d> obs_; // observations

    /* the three parameters to estimate */
    double *a_, *b_, *c_;

    /* parameters */
    double epsilon_1_, epsilon_2_;
    int max_iter_;
    bool is_out_;
};//class LevenbergMarquardt
int main(int argc, char **argv) {
    const double aa = 0.1, bb = 0.5, cc = 2; // ground-truth parameters
    double a = 0.0, b = 0.0, c = 0.0;        // initial guess
    
    /* Construct the problem */
    LevenbergMarquardt lm(&a, &b, &c);
    lm.setParameters(1e-10, 1e-10, 100, true);
    
    /* Generate data */
    const size_t N = 100; // number of data points
    cv::RNG rng(cv::getTickCount());
    for( size_t i = 0; i < N; i ++)
    {
        /* Generate a sample with Gaussian noise */
        double x = rng.uniform(0.0, 1.0) ;
        double y = exp(aa*x*x + bb*x + cc) + rng.gaussian(0.05);
        
        /* Add it to the observations */
        lm.addObservation(x, y);
    }
    /* Solve with LM */
    lm.solve();
    
    return 0;
}
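For a cross-check, the same fit can be posed in Ceres Solver, whose default trust-region strategy is Levenberg-Marquardt. This is only a sketch assuming Ceres is installed; ExpResidual and the noise-free data are introduced here for brevity:

#include <ceres/ceres.h>
#include <cmath>
#include <iostream>

// Residual y - exp(a*x^2 + b*x + c); Ceres differentiates it automatically.
struct ExpResidual {
    ExpResidual(double x, double y) : x_(x), y_(y) {}
    template <typename T>
    bool operator()(const T* const abc, T* residual) const {
        residual[0] = y_ - exp(abc[0] * x_ * x_ + abc[1] * x_ + abc[2]);
        return true;
    }
    const double x_, y_;
};

int main() {
    double abc[3] = {0.0, 0.0, 0.0};  // initial guess, as in the example above
    ceres::Problem problem;
    for (int i = 0; i < 100; ++i) {
        double x = i / 100.0;
        double y = std::exp(0.1 * x * x + 0.5 * x + 2.0);
        problem.AddResidualBlock(
            new ceres::AutoDiffCostFunction<ExpResidual, 1, 3>(new ExpResidual(x, y)),
            nullptr, abc);
    }
    ceres::Solver::Options options;   // defaults already use the LM trust region
    ceres::Solver::Summary summary;
    ceres::Solve(options, &problem, &summary);
    std::cout << summary.BriefReport() << "\n"
              << "a=" << abc[0] << " b=" << abc[1] << " c=" << abc[2] << "\n";
    return 0;
}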
