Support Vector Machine Series (5): Solving the Dual Problem with the SMO Algorithm

邵德鑫

The dual problem:

\min\limits_{\alpha} \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}y_iy_j\alpha_i\alpha_j(x_i \cdot x_j)-\sum_{i=1}^{N}\alpha_i

s.t. \sum_{i=1}^{N}\alpha_iy_i=0 and 0\leq\alpha_i\leq C

K_{ij}=(x_i \cdot x_j), \quad v_i=\sum_{j=3}^{N}y_j\alpha_jK_{ij}
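To make the notation concrete, here is a minimal sketch that evaluates the dual objective and checks the two constraints for a given \alpha . The helper names (`gram_matrix`, `dual_objective`, `is_feasible`) and the toy data are assumptions made for illustration; they are not part of the original derivation.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product (x_i . x_j), the kernel used in the text."""
    return sum(a * b for a, b in zip(u, v))


def gram_matrix(X: Sequence[Vector]) -> List[List[float]]:
    """K[i][j] = x_i . x_j for every pair of samples."""
    return [[dot(xi, xj) for xj in X] for xi in X]


def dual_objective(alpha: Sequence[float], y: Sequence[int],
                   K: Sequence[Sequence[float]]) -> float:
    """0.5 * sum_ij y_i y_j alpha_i alpha_j K_ij - sum_i alpha_i."""
    n = len(alpha)
    quad = sum(y[i] * y[j] * alpha[i] * alpha[j] * K[i][j]
               for i in range(n) for j in range(n))
    return 0.5 * quad - sum(alpha)


def is_feasible(alpha: Sequence[float], y: Sequence[int],
                C: float, tol: float = 1e-9) -> bool:
    """Check sum_i alpha_i y_i = 0 and 0 <= alpha_i <= C (up to tol)."""
    balanced = abs(sum(a * yi for a, yi in zip(alpha, y))) <= tol
    boxed = all(-tol <= a <= C + tol for a in alpha)
    return balanced and boxed


# Hypothetical toy data, chosen only so that sum(alpha_i * y_i) is zero.
X = [[1.0, 2.0], [2.0, 0.0], [0.0, 1.0], [3.0, 1.0]]
y = [1, -1, 1, -1]
alpha = [0.5, 0.3, 0.1, 0.3]
K = gram_matrix(X)
print(is_feasible(alpha, y, C=1.0), dual_objective(alpha, y, K))
```

The double loop mirrors the double sum in the objective directly; it is quadratic in N and meant for checking small examples by hand, not for training.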

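The SMO algorithm selects two multipliers at a time, denoted \alpha_1 and \alpha_2 , and optimizes over them while holding all the other \alpha_i fixed. The equality constraint then gives \alpha_1y_1+\alpha_2y_2=k , where k=-\sum_{i=3}^{N}\alpha_iy_i is a constant. Since y_1 and y_2 take values in \{-1,1\} , we have y_1^2=y_2^2=1 , and therefore \alpha_1=ky_1-\alpha_2y_1y_2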
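The elimination of \alpha_1 can be sketched in a few lines. The function names and the sample numbers below are hypothetical; the point is only that whatever value \alpha_2 takes, recomputing \alpha_1 this way keeps \alpha_1y_1+\alpha_2y_2 equal to k.

```python
from typing import Sequence


def pair_constant(alpha: Sequence[float], y: Sequence[int]) -> float:
    """k = alpha_1 y_1 + alpha_2 y_2 (indices 0 and 1 in Python)."""
    return alpha[0] * y[0] + alpha[1] * y[1]


def alpha1_from_alpha2(alpha2: float, y1: int, y2: int, k: float) -> float:
    """alpha_1 = k y_1 - alpha_2 y_1 y_2, valid because y_1^2 = 1."""
    return k * y1 - alpha2 * y1 * y2


# Hypothetical values: alpha_1 = 0.5, alpha_2 = 0.3, y_1 = +1, y_2 = -1.
alpha = [0.5, 0.3, 0.1, 0.3]
y = [1, -1, 1, -1]
k = pair_constant(alpha, y)

# Move alpha_2 to an arbitrary new value and recover the matching alpha_1.
new_alpha2 = 0.7
new_alpha1 = alpha1_from_alpha2(new_alpha2, y[0], y[1], k)
print(new_alpha1 * y[0] + new_alpha2 * y[1], k)  # the two values agree
```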

\min\limits_{\alpha} \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}y_iy_j\alpha_i\alpha_j(x_i \cdot x_j)-\sum_{i=1}^{N}\alpha_i

=\min\limits_{\alpha} \frac{1}{2}\sum_{i=3}^{N}\sum_{j=3}^{N}y_iy_j\alpha_i\alpha_jK_{ij}-\sum_{i=3}^{N}\alpha_i+ \frac{1}{2}(y_1^2\alpha_1^2K_{11}+y_2^2\alpha_2^2K_{22}+2y_1y_2\alpha_1\alpha_2K_{12}+ 2y_1\alpha_1v_1+2y_2\alpha_2v_2)-(\alpha_1+\alpha_2)

The terms that involve only \alpha_3,\dots,\alpha_N are constant, and y_1^2=y_2^2=1 , so the problem reduces to

\min\limits_{\alpha_1,\alpha_2} \frac{1}{2}(\alpha_1^2K_{11}+\alpha_2^2K_{22}+2y_1y_2\alpha_1\alpha_2K_{12}+2y_1\alpha_1v_1+2y_2\alpha_2v_2)-(\alpha_1+\alpha_2)

Using \alpha_1=ky_1-\alpha_2y_1y_2 , each term that contains \alpha_1 can be written in terms of \alpha_2 :

\alpha_1^2=(k-\alpha_2y_2)^2=k^2-2k\alpha_2y_2+\alpha_2^2 ,

2y_1y_2\alpha_1\alpha_2K_{12}=2y_2(k-\alpha_2y_2)\alpha_2K_{12}=2(ky_2-\alpha_2)\alpha_2K_{12} ,

2y_1\alpha_1v_1=2(k-\alpha_2y_2)v_1

Substituting these into the original expression:

\min\limits_{\alpha} \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}y_iy_j\alpha_i\alpha_j(x_i \cdot x_j)-\sum_{i=1}^{N}\alpha_i

=\min\limits_{\alpha_2} \frac{1}{2}((k^2-2k\alpha_2y_2+\alpha_2^2)K_{11}+\alpha_2^2K_{22}+2(ky_2-\alpha_2)\alpha_2K_{12}+

2(k-\alpha_2y_2)v_1+2y_2\alpha_2v_2)-(ky_1-\alpha_2y_1y_2+\alpha_2)

=\min\limits_{\alpha_2} \frac{1}{2}(K_{11}+K_{22}-2K_{12})\alpha_2^2+

(y_1y_2-1-ky_2K_{11}+ky_2K_{12}-y_2v_1+y_2v_2)\alpha_2+ 0.5k^2K_{11}+kv_1-ky_1

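Denote this by \min\limits_{\alpha_2} W(\alpha_2) . W is a quadratic in \alpha_2 whose leading coefficient \frac{1}{2}(K_{11}+K_{22}-2K_{12}) is nonnegative for the inner-product kernel, so it is convex and, when that coefficient is positive, attains its minimum where \frac{\partial{W}}{\partial{\alpha_2}}=0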

\frac{\partial{W}}{\partial{\alpha_2}}=(K_{11}+K_{22}-2K_{12})\alpha_2+y_1y_2-1-ky_2K_{11}+ky_2K_{12}-y_2v_1+y_2v_2

=0
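The algebra above can be checked numerically. The sketch below builds the three coefficients of W(\alpha_2) and confirms that W differs from the full dual objective only by a constant as \alpha_2 moves (with \alpha_1 adjusted to keep k fixed). All function names and the toy data are assumptions for illustration, and the box constraint 0\leq\alpha_i\leq C is not enforced here.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def dual_objective(alpha: Sequence[float], y: Sequence[int], K: Matrix) -> float:
    """Full dual objective over all N multipliers."""
    n = len(alpha)
    quad = sum(y[i] * y[j] * alpha[i] * alpha[j] * K[i][j]
               for i in range(n) for j in range(n))
    return 0.5 * quad - sum(alpha)


def reduced_coefficients(alpha: Sequence[float], y: Sequence[int],
                         K: Matrix) -> Tuple[float, float, float]:
    """Coefficients (a, b, c) of W(alpha_2) = a*alpha_2^2 + b*alpha_2 + c."""
    y1, y2 = y[0], y[1]
    k = alpha[0] * y1 + alpha[1] * y2
    # v_1, v_2 depend only on the fixed multipliers alpha_3..alpha_N.
    v1 = sum(y[j] * alpha[j] * K[0][j] for j in range(2, len(alpha)))
    v2 = sum(y[j] * alpha[j] * K[1][j] for j in range(2, len(alpha)))
    a = 0.5 * (K[0][0] + K[1][1] - 2.0 * K[0][1])
    b = (y1 * y2 - 1.0 - k * y2 * K[0][0] + k * y2 * K[0][1]
         - y2 * v1 + y2 * v2)
    c = 0.5 * k * k * K[0][0] + k * v1 - k * y1
    return a, b, c


def W(alpha2: float, coeffs: Tuple[float, float, float]) -> float:
    a, b, c = coeffs
    return a * alpha2 ** 2 + b * alpha2 + c


def dW(alpha2: float, coeffs: Tuple[float, float, float]) -> float:
    """Derivative of W: (K11 + K22 - 2 K12) * alpha_2 + b."""
    a, b, _ = coeffs
    return 2.0 * a * alpha2 + b


def with_alpha2(alpha: Sequence[float], y: Sequence[int],
                alpha2: float) -> List[float]:
    """Copy of alpha with a new alpha_2 and alpha_1 = k y_1 - alpha_2 y_1 y_2."""
    k = alpha[0] * y[0] + alpha[1] * y[1]
    out = list(alpha)
    out[1] = alpha2
    out[0] = k * y[0] - alpha2 * y[0] * y[1]
    return out


# Hypothetical toy data with a linear kernel.
X = [[1.0, 2.0], [2.0, 0.0], [0.0, 1.0], [3.0, 1.0]]
K = [[sum(p * q for p, q in zip(xi, xj)) for xj in X] for xi in X]
y = [1, -1, 1, -1]
alpha = [0.5, 0.3, 0.1, 0.3]

coeffs = reduced_coefficients(alpha, y, K)
gaps = [dual_objective(with_alpha2(alpha, y, a2), y, K) - W(a2, coeffs)
        for a2 in (0.0, 0.3, 0.7)]
print(gaps)  # the three gaps coincide up to rounding
```

The constant gap is exactly the part of the objective that depends only on \alpha_3,\dots,\alpha_N , which is why dropping it does not change the minimizer.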

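In the expression above, K_{11} , y_1 , and so on are known values, and v_1 , v_2 are fixed once \alpha_1 , \alpha_2 have been initialized (denoted \alpha_1^{old} , \alpha_2^{old} ). The \alpha_2 that solves the equation is the updated value, denoted \alpha_2^{new}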

K_{11}+K_{22}-2K_{12}=\|x_1\|^2+\|x_2\|^2-2(x_1 \cdot x_2)=\|x_1-x_2\|^2 , denoted \kappa
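As a quick sanity check of this identity for the inner-product kernel, the following sketch compares \kappa computed from kernel values with the squared Euclidean distance. The function names and sample points are hypothetical.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def kappa(x1: Vector, x2: Vector) -> float:
    """K_11 + K_22 - 2 K_12 with K_ij = x_i . x_j."""
    return dot(x1, x1) + dot(x2, x2) - 2.0 * dot(x1, x2)


def squared_distance(x1: Vector, x2: Vector) -> float:
    """||x_1 - x_2||^2 computed coordinate by coordinate."""
    return sum((a - b) ** 2 for a, b in zip(x1, x2))


x1, x2 = [1.0, 2.0], [2.0, 0.0]
print(kappa(x1, x2), squared_distance(x1, x2))  # both print 5.0
```

Note that \kappa is zero when the two samples coincide, in which case W is linear in \alpha_2 and the stationary-point argument does not give a unique minimizer.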

g(x_i)=\sum_{j=1}^{N}y_j\alpha_jK_{ij}

v_1=\sum_{j=3}^{N}y_j\alpha_jK_{1j}=g(x_1)-y_1\alpha_1K_{11}-y_2\alpha_2K_{12}

=g(x_1)-y_1(ky_1-\alpha_2y_1y_2)K_{11}-y_2\alpha_2K_{12}

=g(x_1)-(k-\alpha_2y_2)K_{11}-y_2\alpha_2K_{12}

=g(x_1)+y_2\alpha_2(K_{11}-K_{12})-kK_{11}

v_2=\sum_{j=3}^{N}y_j\alpha_jK_{2j}=g(x_2)-y_1\alpha_1K_{21}-y_2\alpha_2K_{22}

=g(x_2)-y_1(ky_1-\alpha_2y_1y_2)K_{21}-y_2\alpha_2K_{22}

=g(x_2)-(k-\alpha_2y_2)K_{21}-y_2\alpha_2K_{22}

=g(x_2)+y_2\alpha_2(K_{21}-K_{22})-kK_{21}

v_1-v_2=g(x_1)-g(x_2)+y_2(K_{11}+K_{22}-2K_{12})\alpha_2-kK_{11}+kK_{12}

\frac{\partial{W}}{\partial{\alpha_2}}=\kappa\alpha_2+y_1y_2-1-ky_2K_{11}+ky_2K_{12}-y_2(v_1-v_2)=0

\kappa\alpha_2^{new}=-y_1y_2+1+ky_2K_{11}-ky_2K_{12}+y_2(v_1-v_2)

y_2\kappa\alpha_2^{new}=-y_1+y_2+kK_{11}-kK_{12}+(v_1-v_2)

=-y_1+y_2+kK_{11}-kK_{12}+g(x_1)-g(x_2)+ y_2(K_{11}+K_{22}-2K_{12})\alpha_2^{old}-kK_{11}+kK_{12}

=(g(x_1)-y_1)-(g(x_2)-y_2)+y_2\kappa\alpha_2^{old}

E_i=g(x_i)-y_i

\alpha_2^{new}=\alpha_2^{old}+\frac{y_2(E_1-E_2)}{\kappa}
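The final formula translates into a short update routine. This is a minimal sketch under stated assumptions, not a full SMO solver: it implements only the unconstrained update derived above, so it does not enforce 0\leq\alpha_i\leq C , it uses g(x_i) without a bias term exactly as defined in the text, and the index parameters `i1`, `i2` (defaulting to the first two samples) are a hypothetical generalization of \alpha_1 , \alpha_2 .

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def dual_objective(alpha: Sequence[float], y: Sequence[int], K: Matrix) -> float:
    n = len(alpha)
    quad = sum(y[i] * y[j] * alpha[i] * alpha[j] * K[i][j]
               for i in range(n) for j in range(n))
    return 0.5 * quad - sum(alpha)


def g(i: int, alpha: Sequence[float], y: Sequence[int], K: Matrix) -> float:
    """g(x_i) = sum_j y_j alpha_j K_ij, as defined in the text."""
    return sum(y[j] * alpha[j] * K[i][j] for j in range(len(alpha)))


def smo_pair_update(alpha: Sequence[float], y: Sequence[int], K: Matrix,
                    i1: int = 0, i2: int = 1) -> List[float]:
    """Return a copy of alpha after one unclipped update of the pair (i1, i2)."""
    kappa = K[i1][i1] + K[i2][i2] - 2.0 * K[i1][i2]
    if kappa <= 0.0:
        # The closed-form update divides by kappa, so it needs kappa > 0.
        raise ValueError("kappa must be positive for this update")
    E1 = g(i1, alpha, y, K) - y[i1]
    E2 = g(i2, alpha, y, K) - y[i2]
    k = alpha[i1] * y[i1] + alpha[i2] * y[i2]
    # alpha_2^new = alpha_2^old + y_2 (E_1 - E_2) / kappa
    a2 = alpha[i2] + y[i2] * (E1 - E2) / kappa
    # alpha_1 follows from the equality constraint.
    a1 = k * y[i1] - a2 * y[i1] * y[i2]
    out = list(alpha)
    out[i1], out[i2] = a1, a2
    return out


# Hypothetical toy data with a linear kernel.
X = [[1.0, 2.0], [2.0, 0.0], [0.0, 1.0], [3.0, 1.0]]
K = [[sum(p * q for p, q in zip(xi, xj)) for xj in X] for xi in X]
y = [1, -1, 1, -1]
alpha = [0.5, 0.3, 0.1, 0.3]

updated = smo_pair_update(alpha, y, K)
print(updated)
print(dual_objective(alpha, y, K), dual_objective(updated, y, K))
```

On this toy data the objective does not increase after the update, and applying the update to the same pair a second time leaves the multipliers unchanged, because E_1-E_2 is zero at the minimizer of W .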
