Main Components
Boosting
void GBDT::Init(const Config* gbdt_config, const Dataset* train_data, const ObjectiveFunction* objective_function, const std::vector<const Metric*>& training_metrics) override
Initialization: creates the sample sampling strategy data_sample_strategy_, sets the objective function objective_function_, creates tree_learner_, creates train_score_updater_, and configures training_metrics_.
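The members this sets up can be pictured as a small struct. A minimal sketch, assuming simplified types (raw pointers to forward-declared classes stand in for LightGBM's actual member types):

#include <vector>

class SampleStrategy;     // row sampling: bagging / GOSS
class ObjectiveFunction;  // supplies per-sample gradients and Hessians
class TreeLearner;        // fits one tree on the current gradients
class ScoreUpdater;       // maintains the running scores f(x) on a dataset
class Metric;

struct GbdtState {
  SampleStrategy* data_sample_strategy_;
  const ObjectiveFunction* objective_function_;
  TreeLearner* tree_learner_;
  ScoreUpdater* train_score_updater_;
  std::vector<const Metric*> training_metrics_;
};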
void GBDT::Train(int snapshot_freq, const std::string& model_output_path) override
The top-level training loop: runs TrainOneIter repeatedly and periodically writes model snapshots (controlled by snapshot_freq and model_output_path); see the consolidated sketch after this method list.
bool GBDT::TrainOneIter(const score_t* gradients, const score_t* hessians) override
Performs a single boosting iteration; when gradients/hessians are passed as nullptr, they are computed internally via Boosting().
void GBDT::Boosting()
Computes the gradients and Hessians from the objective function.
void UpdateScore(const Tree* tree, const int cur_tree_id)
Updates the scores after a tree has been trained.
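How these methods fit together is easiest to see in a self-contained toy version of the loop. This is a sketch under heavy simplification, not LightGBM's implementation: a "tree" is collapsed into a single constant leaf fitted by the Newton step -sum(g)/sum(h), there is one tree per iteration, and snapshotting, sampling, and shrinkage are all omitted.

#include <cmath>
#include <cstdio>
#include <vector>

// Toy model mirroring the call structure Train -> TrainOneIter -> Boosting ->
// UpdateScore described above. A "tree" here is a single constant leaf value.
struct ToyGBDT {
  std::vector<double> labels_;   // y in {-1, +1}
  std::vector<double> scores_;   // running f(x) per sample (train_score_updater_)
  std::vector<double> gradients_;
  std::vector<double> hessians_;

  explicit ToyGBDT(std::vector<double> labels)
      : labels_(std::move(labels)),
        scores_(labels_.size(), 0.0),
        gradients_(labels_.size()),
        hessians_(labels_.size()) {}

  // Boosting(): gradients/Hessians of binary log loss at the current scores.
  void Boosting() {
    for (size_t i = 0; i < labels_.size(); ++i) {
      const double p = 1.0 / (1.0 + std::exp(labels_[i] * scores_[i]));
      gradients_[i] = -labels_[i] * p;  // dL/df
      hessians_[i] = p * (1.0 - p);     // d^2L/df^2
    }
  }

  // TrainOneIter(): fit one "tree" (a constant Newton step), then update scores.
  void TrainOneIter() {
    Boosting();
    double g = 0.0, h = 0.0;
    for (size_t i = 0; i < labels_.size(); ++i) {
      g += gradients_[i];
      h += hessians_[i];
    }
    UpdateScore(-g / (h + 1e-12));
  }

  // UpdateScore(): add the new tree's output to every sample's running score.
  void UpdateScore(double leaf) {
    for (double& s : scores_) s += leaf;
  }

  // Train(): the outer iteration loop.
  void Train(int num_iterations) {
    for (int iter = 0; iter < num_iterations; ++iter) TrainOneIter();
  }
};

int main() {
  ToyGBDT model({+1.0, +1.0, -1.0, +1.0});
  model.Train(10);
  std::printf("score of sample 0 after 10 iterations: %f\n", model.scores_[0]);
  return 0;
}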
TreeLearner
Objective Function: ObjectiveFunction
Binary Log Loss
It is commonly defined as
$L(y, f(x)) = \log(1 + \exp(-y \cdot f(x)))$
where $y$ is the label, taking values in $\{-1, 1\}$, and $f(x)$ is the score output by the model. Letting $z = y \cdot f(x)$, the loss becomes $L = \log(1 + \exp(-z))$.
Differentiating with respect to $z$ gives $\frac{\partial L}{\partial z} = \frac{-\exp(-z)}{1 + \exp(-z)} = -\frac{1}{1 + \exp(z)}$, so by the chain rule the derivative with respect to $f(x)$ is
$\frac{\partial L}{\partial f(x)} = \frac{\partial L}{\partial z} \cdot \frac{\partial z}{\partial f(x)} = -\frac{y}{1 + \exp(y \cdot f(x))}$
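This closed form is easy to sanity-check numerically. A short self-contained check, comparing it against a central finite difference (the grid of test points and the 1e-6 tolerance are arbitrary choices for illustration):

#include <cassert>
#include <cmath>
#include <cstdio>

// Binary log loss L(y, f) = log(1 + exp(-y * f)), with y in {-1, +1}.
static double Loss(double y, double f) { return std::log(1.0 + std::exp(-y * f)); }

// Closed-form derivative derived above: dL/df = -y / (1 + exp(y * f)).
static double Grad(double y, double f) { return -y / (1.0 + std::exp(y * f)); }

int main() {
  const double eps = 1e-6;
  for (double y : {-1.0, 1.0}) {
    for (double f : {-2.0, -0.5, 0.0, 1.3}) {
      const double numeric = (Loss(y, f + eps) - Loss(y, f - eps)) / (2.0 * eps);
      std::printf("y=%+.0f f=%+.1f closed=%.8f numeric=%.8f\n", y, f, Grad(y, f), numeric);
      assert(std::fabs(Grad(y, f) - numeric) < 1e-6);  // closed form matches
    }
  }
  return 0;
}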
In BinaryLogloss, a scaling factor sigmoid_ (denoted $\sigma$) is applied inside the loss, i.e.
$L(y, f(x)) = \log(1 + \exp(-y \cdot \sigma \cdot f(x)))$
Differentiating with respect to $f(x)$:
$\frac{\partial L}{\partial f(x)} = -\frac{y \cdot \sigma}{1 + \exp(y \cdot \sigma \cdot f(x))}$
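GBDT also needs second derivatives (Boosting() above computes both). Differentiating once more, a step the original derivation leaves out, and using $y^2 = 1$:
$\frac{\partial^2 L}{\partial f(x)^2} = \frac{\sigma^2 \cdot \exp(y \cdot \sigma \cdot f(x))}{\left(1 + \exp(y \cdot \sigma \cdot f(x))\right)^2} = \sigma^2 \cdot p \cdot (1 - p), \quad \text{where } p = \frac{1}{1 + \exp(y \cdot \sigma \cdot f(x))}$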
When computing the gradient, BinaryLogloss additionally applies the per-sample weight weights_[i] and the label weight label_weight:
const double response = -label * sigmoid_ / (1.0f + std::exp(label * sigmoid_ * score[i]));  // dL/df(x) for sample i
gradients[i] = static_cast<score_t>(response * label_weight * weights_[i]);  // scale by label and sample weights
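The same loop also fills in the Hessian. The lines below are an assumption about that step (consistent with the second derivative above, quoted from memory of binary_objective.hpp rather than copied verbatim): since $|\text{response}| = \sigma \cdot p$, the product $|\text{response}| \cdot (\sigma - |\text{response}|)$ equals $\sigma^2 \cdot p \cdot (1 - p)$, exactly the second derivative derived earlier.

// Assumed companion Hessian computation (not verbatim source):
const double abs_response = std::fabs(response);  // equals sigma * p
hessians[i] = static_cast<score_t>(abs_response * (sigmoid_ - abs_response) * label_weight * weights_[i]);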