# spark-ML-pipeline

Spark ML lets you assemble every stage of a machine learning workflow into a single pipeline.

![](/files/-LoyKLswh_vTQ-o9rFJ0)

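1) Define the machine learning workflow: for example, several data processing stages followed by a machine learning algorithm.

2) Train with `pipeline.fit()`: training runs each stage in order and produces a `PipelineModel`.

3) Predict with `PipelineModel.transform()`: the fitted stages again run in order and produce the predictions.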

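Benefits of the Spark pipeline workflow:

* A uniform data format: every stage reads and writes a DataFrame
* Modular data processing: stages are easy to reuse
* Easy to swap one machine learning algorithm for another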
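To see why these properties make stages interchangeable, here is a small pure-Python sketch that imitates the fit/transform contract. It is not Spark's API: every class name (`Scaler`, `ThresholdClassifier`, `MajorityClassifier`, `ToyPipeline`) is hypothetical, and a list of dicts stands in for the DataFrame. Because each stage consumes and returns the same row format, the final algorithm can be replaced without touching the preprocessing stage.

```python
class Scaler:
    """Hypothetical preprocessing stage: learns the largest x seen in training."""
    def fit(self, rows):
        return ScalerModel(max(row["x"] for row in rows))

class ScalerModel:
    def __init__(self, peak):
        self.peak = peak
    def transform(self, rows):
        # Adds a "feature" column; existing columns are kept.
        return [{**row, "feature": row["x"] / self.peak} for row in rows]

class ThresholdClassifier:
    """Hypothetical algorithm: predicts 1 above the mean training feature."""
    def fit(self, rows):
        mean = sum(row["feature"] for row in rows) / len(rows)
        return ThresholdModel(mean)

class ThresholdModel:
    def __init__(self, threshold):
        self.threshold = threshold
    def transform(self, rows):
        return [
            {**row, "prediction": int(row["feature"] > self.threshold)}
            for row in rows
        ]

class MajorityClassifier:
    """Hypothetical algorithm: always predicts the most common training label."""
    def fit(self, rows):
        labels = [row["label"] for row in rows]
        return MajorityModel(max(set(labels), key=labels.count))

class MajorityModel:
    def __init__(self, label):
        self.label = label
    def transform(self, rows):
        return [{**row, "prediction": self.label} for row in rows]

class ToyPipeline:
    def __init__(self, stages):
        self.stages = stages
    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            model = stage.fit(rows)        # train this stage on the current rows
            rows = model.transform(rows)   # pass its output to the next stage
            fitted.append(model)
        return ToyPipelineModel(fitted)

class ToyPipelineModel:
    def __init__(self, stages):
        self.stages = stages
    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows

train = [{"x": 1.0, "label": 0}, {"x": 2.0, "label": 0}, {"x": 9.0, "label": 1}]
new_rows = [{"x": 8.0}, {"x": 1.5}]

# Same preprocessing stage, two different algorithms swapped in at the end.
results = {}
for algorithm in (ThresholdClassifier(), MajorityClassifier()):
    model = ToyPipeline([Scaler(), algorithm]).fit(train)
    results[type(algorithm).__name__] = [
        row["prediction"] for row in model.transform(new_rows)
    ]
print(results)
```

In Spark the same idea applies by replacing the last entry of `stages` with a different estimator, while the feature stages stay unchanged.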

