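# Implementing SVC/SVR with sklearn

The scikit-learn SVM library wraps the libsvm and liblinear implementations and only rewrites the interface layer. Its classes fall into two groups:

* Classification: SVC, NuSVC, and LinearSVC.
* Regression: SVR, NuSVR, and LinearSVR.

All of these classes live in the sklearn.svm module.

Official sklearn SVM API:

## Classification

* SVC, NuSVC, and LinearSVC
* SVC and NuSVC are similar; the difference is that NuSVC has an additional nu parameter.
* LinearSVC is a linear classifier. It does not support the kernel functions that map data from a low-dimensional to a high-dimensional space and only supports the linear kernel, so **it is not suited to data that needs a non-linear decision boundary**.

| Parameter | LinearSVC / SVC | NuSVC |
| ----- | ----- | ----- |
| nu | LinearSVC and SVC do not have this parameter. They use the penalty coefficient C to control the strength of the penalty. | NuSVC has it. nu is an upper bound on the fraction of margin errors on the training set, or equivalently a lower bound on the fraction of support vectors. Its range is (0, 1] and the default is 0.5. Like the penalty coefficient C, it controls the strength of the penalty. |
| Penalty coefficient C | The penalty coefficient C that appears in the primal and dual forms of the SVM classification model. The default is 1, and a suitable value is usually chosen by cross-validation. In general, the noisier the data, the smaller C should be. | NuSVC does not have this parameter. It uses nu to control the training error rate, which is equivalent to choosing a C such that the trained model meets a given error rate. |

## Regression

SVR, NuSVR, and LinearSVR. SVR and NuSVR are much the same and differ only in how the loss is measured; NuSVR has the nu parameter:

| Parameter | LinearSVR / SVR | NuSVR |
| -- | -- | -- |
| nu | LinearSVR and SVR do not have this parameter. They use ϵ (the `epsilon` parameter) to control the error rate. | nu is an upper bound on the fraction of training errors, or equivalently a lower bound on the fraction of support vectors. Its range is (0, 1] and the default is 0.5. Choosing a different error rate yields a different distance tolerance ϵ, so nu here plays the same role as the ϵ parameter of LinearSVR and SVR. |

* 1) Normalize the data before training. The test set must be normalized as well, using the same transformation fitted on the training set.
* 2) **When the number of features is very large, or the number of samples is far smaller than the number of features**, the linear kernel already works well, and only the penalty coefficient C needs to be chosen.
* 3) When choosing a kernel, if a linear fit is poor, the default Gaussian kernel 'rbf' is generally recommended. In that case the main work is tuning the penalty coefficient C and the kernel parameter γ, selecting both through several rounds of cross-validation.
* 4) **In theory the Gaussian kernel will not do worse than the linear kernel**, but that result assumes more time is spent on tuning. In practice, if the linear kernel solves the problem, prefer the linear kernel.
* Here is a good Chinese-language table describing the parameters:

## Official examples

* LinearSVC:
* LinearSVR:
* NuSVC:
* NuSVR:
* SVC:
* SVR:

## An SVC implementation

```python
from sklearn.svm import SVC
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt


def create_data():
    iris = load_iris()
    df = pd.DataFrame(iris.data, columns=iris.feature_names)
    df['label'] = iris.target
    df.columns = ['sepal length', 'sepal width', 'petal length', 'petal width', 'label']
    # Keep the first 100 samples (classes 0 and 1) and the two sepal features.
    data = np.array(df.iloc[:100, [0, 1, -1]])
    # Relabel class 0 as -1 so the labels are {-1, +1}.
    for i in range(len(data)):
        if data[i, -1] == 0:
            data[i, -1] = -1
    # print(data)
    return data[:, :2], data[:, -1]


X, y = create_data()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

clf = SVC()
clf.fit(X_train, y_train)
# The split is random, so the score varies between runs; one run gave 0.96.
print('Score:', clf.score(X_test, y_test))
```

Visualization: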
![img](https://img-blog.csdnimg.cn/20190221181856511.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2ppYW5nNDI1Nzc2MDI0,size_16,color_FFFFFF,t_70)
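
The tuning advice given earlier (normalize the features, then pick the penalty coefficient C and the RBF kernel parameter γ by cross-validation) can be sketched roughly as below. This is only a minimal illustration and not part of the original example: the grid values, the `random_state`, the step names `scale` and `svc`, and the choice of `StandardScaler` as the normalizer are all assumptions made for the sketch. It reuses the same two-class, two-feature slice of the iris data as the SVC example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# First 100 iris samples (classes 0 and 1), sepal length and sepal width only.
iris = load_iris()
X = iris.data[:100, :2]
y = iris.target[:100]

# random_state is fixed here only to make the sketch reproducible (assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Putting the scaler inside the pipeline means it is fitted on the training
# folds only, and the same transformation is then applied to held-out data.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("svc", SVC(kernel="rbf")),
])

# Illustrative grid over C and gamma; these values are assumptions.
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": [0.01, 0.1, 1, 10],
}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print("Best parameters:", search.best_params_)
print("Test accuracy:", test_accuracy)
```

Wrapping the scaler and the classifier in one `Pipeline` keeps the test set out of the normalization statistics, which is what tip 1) asks for; swapping `SVC(kernel="rbf")` for `SVC(kernel="linear")` and dropping `svc__gamma` from the grid gives the cheaper linear-kernel search from tip 2).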