## Preface

Machine learning is widely used in industrial scenarios, with good results. This experiment analyzes power generation data from a combined cycle power plant to show how machine learning is applied to real industrial production scenarios.

This experiment uses combined cycle power plant data from the UCI Machine Learning Repository. For a power plant, the output power determines the energy that a generating unit can produce. By collecting operating metrics, a power plant can predict its final output power. An accurate prediction also lets the plant make production schedules that minimize resource waste.

## Load and explore data

Load the dataset, which includes 9,568 data samples from a combined cycle power plant. Each sample has five columns: AT (atmospheric temperature), V (exhaust vacuum), AP (atmospheric pressure), RH (relative humidity), and PE (output power). The following figure shows the data preview.

In the left-side navigation pane, choose **Components** > **Statistical Analysis**, and drag and drop **Correlation Coefficient Matrix** to the right section. View the features related to PE (output power) to find the factor that has the greatest impact on PE (output power).

Right-click the completed component and choose **View Analytics Report** to obtain the correlation analysis result. The correlation chart ranks the features by their degree of correlation with PE (output power), in descending order: AT (atmospheric temperature) -> V (exhaust vacuum) -> RH (relative humidity) -> AP (atmospheric pressure).
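The ranking that the report produces can be sketched outside the console. The following is a minimal illustration that assumes the component computes Pearson correlation coefficients, which the source does not state. The function names `pearson` and `rank_by_correlation` are hypothetical, and the sample values are placeholders, not rows from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rank_by_correlation(columns, target):
    """Return (name, r) pairs sorted by |r| against the target, strongest first."""
    scores = [
        (name, pearson(values, columns[target]))
        for name, values in columns.items()
        if name != target
    ]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Placeholder values for illustration only; they are NOT taken from the dataset.
toy = {
    "AT": [10.0, 15.0, 20.0, 25.0, 30.0],
    "RH": [80.0, 60.0, 75.0, 65.0, 70.0],
    "PE": [480.0, 470.0, 458.0, 449.0, 440.0],
}

for name, r in rank_by_correlation(toy, "PE"):
    print(f"{name}: {r:+.3f}")
```

Sorting by the absolute value matters here: a strongly negative coefficient indicates as strong a linear relationship as a strongly positive one.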

## Model data

In the left-side navigation pane, choose **Components** > **Data Preprocessing**, and drag and drop **Split** to the right section to split the data into a training set and a test set. Then, choose **Components** > **Machine Learning** > **Regression**, and drag and drop **Linear Regression** to the right section to perform regression modeling on the training data. Select AT, V, AP, and RH as the feature columns (X) and PE as the label column (Y).
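For readers who want to see what the split and regression steps do numerically, here is a rough NumPy sketch. It is not the platform's implementation. The 80/20 split ratio, the helper names `train_test_split` and `fit_linear_regression`, and the synthetic data (including `true_w` and the intercept of 450) are all assumptions made for illustration.

```python
import numpy as np


def train_test_split(X, y, test_fraction=0.2, seed=0):
    """Shuffle row indices and split them into training and test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_test = int(round(len(X) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


def fit_linear_regression(X, y):
    """Ordinary least squares fit; returns (weights, intercept)."""
    # Append a column of ones so the last coefficient acts as the intercept.
    design = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[:-1], coef[-1]


# Synthetic stand-in for the four feature columns (AT, V, AP, RH) and PE.
# These numbers are invented for the sketch and are NOT the real dataset.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
true_w = np.array([-2.0, -0.3, 0.1, -0.2])
y = X @ true_w + 450.0 + rng.normal(scale=0.01, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y)
weights, intercept = fit_linear_regression(X_train, y_train)
print("weights:", np.round(weights, 3), "intercept:", round(float(intercept), 3))
```

The model is fitted on the training rows only, so the held-out test rows remain available for an unbiased evaluation.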

## Predict and evaluate the regression model

After modeling is complete, choose **Components** > **Machine Learning** and drag and drop **Prediction** to the right section to apply the model to the test dataset. Select AT, V, AP, and RH for **Feature Columns**, and select all options for **Reserved Output Column**.

Right-click the model and choose **Show Model** to view the weight that the model assigns to each feature.
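To make the role of the weights concrete, the sketch below shows how a linear regression model turns one row of features into a prediction: the intercept plus the weighted sum of the feature values. The `predict` helper is hypothetical, and the weights, intercept, and input row are invented numbers, not the values that **Show Model** reports for this experiment.

```python
def predict(weights, intercept, row):
    """Linear model prediction: intercept + sum(weight * feature value)."""
    return intercept + sum(weights[name] * row[name] for name in weights)


# Hypothetical weights and intercept, chosen only to illustrate the arithmetic.
weights = {"AT": -2.0, "V": -0.25, "AP": 0.05, "RH": -0.15}
intercept = 450.0

# Hypothetical input row; NOT a sample from the dataset.
row = {"AT": 20.0, "V": 50.0, "AP": 1010.0, "RH": 70.0}

print(f"predicted PE: {predict(weights, intercept, row):.2f}")
```

A weight's sign shows the direction in which the feature moves the prediction. Weights are only comparable in magnitude when the features are on similar scales.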

In the left-side navigation pane, choose **Components** > **Machine Learning** > **Evaluation**, and drag and drop **Regression Model Evaluation** to the right section to evaluate the model. Right-click **Regression Model Evaluation** and choose **View Analytics Report**. The root mean square error (RMSE) is 4.57. The following figure shows the completed experiment.

This completes the experiment, which uses a linear regression model to create a power prediction model for a combined cycle power plant. After being deployed, the model can predict the power generation of the power plant in real time. This helps the power plant make a better power production schedule with minimum resource waste.
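As a reference for how to read the RMSE figure, the metric is the square root of the mean squared difference between the actual and predicted values, so it is expressed in the same units as PE. The following is a minimal sketch of that standard formula. The `rmse` helper and the sample numbers are illustrative and unrelated to the experiment's results.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Illustrative values only; they are NOT predictions from the experiment.
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]
print(f"RMSE: {rmse(actual, predicted):.4f}")
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones.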