# Ensemble Methods

## Ensemble Learning

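### 1. Terminology

Ensemble learning: building and **combining multiple learners** to complete a learning task.

Base learning method / strong learner: an ensemble that combines multiple instances of a single existing algorithm. For example, n decision trees combined form the random forest ensemble.

Base learner / weak learner: an individual learner inside a base learning method, such as a decision tree in a random forest.

Component learner: an individual learner in an ensemble that combines different algorithms, such as decision trees together with neural networks and others.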

Reference: <https://www.cnblogs.com/pinard/p/6131423.html>

![img](https://images2015.cnblogs.com/blog/1042406/201612/1042406-20161204191919974-1029671964.png)

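### 2. Principles for Designing an Ensemble

Individual learners should be "good and different" (accurate yet diverse). As in the proverb about three cobblers together matching a master strategist, it only works if the three are of different types, as this figure shows: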

![img](https://img-blog.csdnimg.cn/20190222204512201.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2ppYW5nNDI1Nzc2MDI0,size_16,color_FFFFFF,t_70)

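Error reduction:

Consider a binary classification problem with labels $$y \in \{-1,+1\}$$ and true function $$f$$, and assume every base classifier $$h_i$$ has the same error rate $$\epsilon$$: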

$$
P(h_i(x) \ne f(x))=\epsilon
$$

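Combine $$T$$ base classifiers by simple voting, so that the ensemble is correct whenever more than half of the base classifiers are correct. The ensemble prediction is: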

$$
H(x)=\operatorname{sign}\left(\sum_{i=1}^{T}h_i(x)\right) \in \{-1,+1\}
$$
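The voting rule above can be sketched in a few lines of Python. This is only an illustration: the function names `sign` and `majority_vote` and the sample votes are hypothetical, not part of any library.

```python
def sign(value):
    """Return +1 for a positive number, -1 for a negative one, and 0 for zero."""
    return (value > 0) - (value < 0)


def majority_vote(predictions):
    """Simple voting: H(x) = sign(sum_i h_i(x)) for votes h_i(x) in {-1, +1}."""
    return sign(sum(predictions))


# Three hypothetical base classifiers vote on one sample.
votes = [+1, -1, +1]
print(majority_vote(votes))  # 1
```

With an even number of voters the sum can be zero, so in this sketch a tie returns 0; using an odd $$T$$ avoids that case.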

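Assuming the errors of the base classifiers are mutually independent, the ensemble is wrong exactly when at most $$\lfloor T/2 \rfloor$$ base classifiers are correct. By Hoeffding's inequality (for $$\epsilon < 0.5$$), its error rate is:

$$
P(H(x) \ne f(x))=\sum_{k=0}^{\lfloor T/2 \rfloor}C_T^k(1-\epsilon)^k\epsilon^{T-k} \le e^{-\frac{1}{2}T(1-2\epsilon)^2}
$$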
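To get a feel for the numbers, the sketch below evaluates the binomial tail $$\sum_{k=0}^{\lfloor T/2 \rfloor} C_T^k (1-\epsilon)^k \epsilon^{T-k}$$ and the exponential bound for a few ensemble sizes. The function names and the choice $$\epsilon = 0.3$$ are assumptions made for illustration, and the calculation relies on the independence assumption stated above.

```python
from math import comb, exp


def ensemble_error(T, eps):
    """Probability that at most floor(T/2) of T independent base classifiers are correct."""
    return sum(
        comb(T, k) * (1 - eps) ** k * eps ** (T - k)
        for k in range(T // 2 + 1)
    )


def hoeffding_bound(T, eps):
    """Upper bound exp(-T * (1 - 2*eps)**2 / 2); informative only for eps < 0.5."""
    return exp(-0.5 * T * (1 - 2 * eps) ** 2)


# Compare the exact tail with the bound as the ensemble grows.
for T in (1, 5, 25, 101):
    print(T, ensemble_error(T, 0.3), hoeffding_bound(T, 0.3))
```

For $$T = 1$$ the exact value is just $$\epsilon$$, and for odd $$T$$ both columns shrink as $$T$$ grows, with the exact error always below the bound.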

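The bound shows that **as the number of base learners $$T$$ grows, the ensemble error rate falls exponentially**, provided $$\epsilon < 0.5$$. In practice, however, the base learners are all trained for the same problem, so their errors cannot be independent. **"Accuracy" and "diversity" are fundamentally in tension** and cannot both be optimal at once. How to produce individual learners that are "good and different" is therefore the core problem of ensemble learning.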
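The effect of dependent errors can be illustrated with a small simulation. The correlation model here is a hypothetical one chosen for simplicity: with probability `rho` a base classifier copies a mistake indicator shared by the whole ensemble, otherwise it draws its own, so every classifier still has marginal error rate `eps`. The function name and parameters are assumptions, not taken from the source.

```python
import random


def simulate_ensemble_error(T, eps, rho, trials=5000, seed=0):
    """Estimate the majority-vote error rate of T base classifiers.

    rho = 0 gives independent errors; rho = 1 makes all classifiers
    err together (hypothetical correlation model).
    """
    rng = random.Random(seed)
    wrong = 0
    for _ in range(trials):
        shared_mistake = rng.random() < eps
        mistakes = 0
        for _ in range(T):
            if rng.random() < rho:
                mistakes += shared_mistake      # copy the shared outcome
            else:
                mistakes += rng.random() < eps  # independent outcome
        # The ensemble is wrong unless more than half of the votes are correct.
        if T - mistakes <= T // 2:
            wrong += 1
    return wrong / trials


independent = simulate_ensemble_error(T=25, eps=0.3, rho=0.0)
correlated = simulate_ensemble_error(T=25, eps=0.3, rho=0.5)
print(independent, correlated)
```

Under this model the correlated ensemble stays close to the error rate of a single classifier, while the independent one is far more accurate, which is why diversity matters as much as individual accuracy.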

### 3. Categories of Ensemble Learning Algorithms

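By how the individual learners are generated:

1) Sequential methods, where the individual learners have **strong dependencies and must be generated serially**: Boosting (AdaBoost, GBDT, XGBoost)

2) Parallel methods, where there are **no strong dependencies, so the learners can be generated at the same time**: Bagging and Random Forest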
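The difference between the two families can be sketched with decision stumps on 1-D toy data, using only the standard library. This is a minimal illustration under stated assumptions, not a reference implementation: the helper names, the toy data, and the stump search are hypothetical, and AdaBoost stands in for the Boosting family.

```python
import math
import random


def stump_predict(stump, x):
    """A stump (threshold, polarity) predicts polarity for x >= threshold, else -polarity."""
    threshold, polarity = stump
    return polarity if x >= threshold else -polarity


def fit_stump(xs, ys, weights):
    """Return the stump with the lowest weighted error (first one found on ties)."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (+1, -1):
            stump = (threshold, polarity)
            error = sum(w for x, y, w in zip(xs, ys, weights)
                        if stump_predict(stump, x) != y)
            if best is None or error < best[0]:
                best = (error, stump)
    return best[1]


def bagging(xs, ys, rounds, rng):
    """Parallel style: each stump is fit on its own bootstrap sample, independent of the rest."""
    n = len(xs)
    stumps = []
    for _ in range(rounds):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx], [1.0 / n] * n))
    return lambda x: 1 if sum(stump_predict(s, x) for s in stumps) >= 0 else -1


def adaboost(xs, ys, rounds):
    """Sequential style: the sample weights of each round depend on the previous stump's mistakes."""
    n = len(xs)
    weights = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        stump = fit_stump(xs, ys, weights)
        preds = [stump_predict(stump, x) for x in xs]
        error = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
        error = min(max(error, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - error) / error)
        weights = [w * math.exp(-alpha * y * p) for w, p, y in zip(weights, preds, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
        model.append((alpha, stump))
    return lambda x: 1 if sum(a * stump_predict(s, x) for a, s in model) >= 0 else -1


def accuracy(model, xs, ys):
    return sum(model(x) == y for x, y in zip(xs, ys)) / len(xs)


# Toy data: the positive class is the interval 3..6, which no single stump can separate.
xs = list(range(10))
ys = [1 if 3 <= x <= 6 else -1 for x in xs]

print("bagging :", accuracy(bagging(xs, ys, rounds=25, rng=random.Random(0)), xs, ys))
print("adaboost:", accuracy(adaboost(xs, ys, rounds=3), xs, ys))
```

The loop in `bagging` has no dependency between iterations, so its rounds could run in any order or concurrently. The loop in `adaboost` cannot, because each round reads the weights written by the one before it.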

