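Introduction
Gradient boosting is, at its core, a mathematical optimization problem: each step looks for the function that minimizes an objective.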
Mathematical Description
The objective function is
$$
obj(\theta) = \sum_{i=1}^n l(y_i, \hat y_i^{(t)}) + \sum_{k=1}^t w(f_k)
$$
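where $t$ is the number of trees in the ensemble, $l$ is the loss on a single sample, $w(f_k)$ is the regularization term of the $k$-th tree, and the prediction is built up additively: $\hat y_i^{(t)} = \hat y_i^{(t - 1)} + f_t(x_i)$.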
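The additive update $\hat y_i^{(t)} = \hat y_i^{(t - 1)} + f_t(x_i)$ can be illustrated with a minimal Python sketch. The `stump` and `predict` helpers and the numeric values are hypothetical, chosen only for illustration; a real implementation would learn each tree from data.

```python
from typing import Callable, List

# A "tree" here is any function mapping a feature value to a score (assumption).
Tree = Callable[[float], float]


def stump(threshold: float, left: float, right: float) -> Tree:
    """Hypothetical one-split tree: returns `left` if x < threshold, else `right`."""
    def f(x: float) -> float:
        return left if x < threshold else right
    return f


def predict(trees: List[Tree], x: float, t: int) -> float:
    """Prediction after t trees: the sum f_1(x) + ... + f_t(x)."""
    y_hat = 0.0
    for f in trees[:t]:
        y_hat += f(x)  # y_hat^(k) = y_hat^(k-1) + f_k(x)
    return y_hat


# Two illustrative stumps standing in for f_1 and f_2.
trees = [stump(0.5, -1.0, 1.0), stump(0.25, 0.5, -0.25)]
```

Calling `predict(trees, x, t)` and `predict(trees, x, t - 1)` for the same `x` differs by exactly `trees[t - 1](x)`, which is the recursion stated above.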
When the $t$-th tree is added to the ensemble, the objective function becomes
$$
\begin{align}
obj^{(t)} &= \sum_{i=1}^n l(y_i, \hat y_i^{(t)}) + \sum_{k=1}^t w(f_k) \\
&= \sum_{i=1}^n l(y_i, \hat y_i^{(t - 1)} + f_t(x_i)) + w(f_t) + constant
\end{align}
$$
Expanding $l(y_i, \hat y_i^{(t - 1)} + f_t(x_i))$ as a Taylor series around $\hat y_i^{(t - 1)}$ and keeping terms up to second order gives
$$
l(y_i, \hat y_i^{(t - 1)} + f_t(x_i)) \approx l(y_i, \hat y_i^{(t - 1)}) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)
$$
where
$$
g_i = \partial_{\hat y_i^{(t - 1)}} l(y_i, \hat y_i^{(t - 1)}), \quad h_i = \partial_{\hat y_i^{(t - 1)}}^2 l(y_i, \hat y_i^{(t - 1)})
$$
Substituting this expansion into the objective and dropping the constant terms (including $l(y_i, \hat y_i^{(t - 1)})$, which does not depend on $f_t$) gives
$$
obj^{(t)} = \sum_{i=1}^n \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + w(f_t)
$$
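As a concrete check of the simplified objective, the sketch below assumes a squared-error loss $l(y, \hat y) = (y - \hat y)^2$, for which $g_i = 2(\hat y_i^{(t - 1)} - y_i)$ and $h_i = 2$. The choice of loss, the function names, and the sample values are assumptions made for illustration, and the regularization term $w(f_t)$ is passed in as a plain number rather than computed from a tree.

```python
from typing import List, Tuple


def squared_error(y: float, y_hat: float) -> float:
    """Assumed loss for this sketch: l(y, y_hat) = (y - y_hat)^2."""
    return (y - y_hat) ** 2


def grad_hess(y: float, y_hat_prev: float) -> Tuple[float, float]:
    """First and second derivatives of the squared error w.r.t. y_hat_prev."""
    g = 2.0 * (y_hat_prev - y)
    h = 2.0
    return g, h


def approx_objective(y: List[float], y_hat_prev: List[float],
                     f_t: List[float], penalty: float = 0.0) -> float:
    """Sum of g_i * f_t(x_i) + 0.5 * h_i * f_t(x_i)^2, plus w(f_t) as `penalty`."""
    total = penalty
    for yi, pi, fi in zip(y, y_hat_prev, f_t):
        g, h = grad_hess(yi, pi)
        total += g * fi + 0.5 * h * fi * fi
    return total


# Illustrative values: targets, previous predictions, outputs of the new tree.
y = [1.0, 0.0, 2.0]
y_hat_prev = [0.5, 0.5, 1.0]
f_t = [0.25, -0.5, 0.5]

approx = approx_objective(y, y_hat_prev, f_t)

# Exact change in total loss caused by adding f_t (the dropped constant removed).
exact_change = sum(
    squared_error(yi, pi + fi) - squared_error(yi, pi)
    for yi, pi, fi in zip(y, y_hat_prev, f_t)
)
```

For squared error the third and higher derivatives vanish, so `approx` and `exact_change` agree here; for other losses the second-order form is only an approximation of the true change in loss.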