Platform For AI: Build models to predict the hazy weather

Last Updated: Sep 19, 2023

This topic describes how to build models that predict hazy weather by analyzing air quality data collected in Beijing over one year. The models can also be used to identify the pollutant that is most likely to cause hazy weather. Hazy weather is measured by the PM 2.5 concentration.

Datasets

The sample experiment uses air quality data that was collected every hour in Beijing during 2016. The following table describes the fields of the air quality data.

| Field | Data type | Description |
| --- | --- | --- |
| time | STRING | The date. This field is accurate to the day. |
| hour | STRING | The hour in which the data is collected. |
| pm2 | STRING | The PM 2.5 index. |
| pm10 | STRING | The PM 10 index. |
| so2 | STRING | The sulfur dioxide index. |
| co | STRING | The carbon monoxide index. |
| no2 | STRING | The nitrogen dioxide index. |

Build models to predict the hazy weather

  1. Go to the Machine Learning Designer page.

    1. Log on to the Machine Learning Platform for AI console.

    2. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.

    3. In the left-side navigation pane, choose Model Training > Visualized Modeling (Designer) to go to the Machine Learning Designer page.

  2. Create a pipeline.

    1. On the Visualized Modeling (Designer) page, click the Preset Templates tab.

    2. In the Air Quality Prediction section of the Preset Templates tab, click Create.

    3. In the Create Pipeline dialog box, configure the parameters. You can use their default values.

      The value specified for the Pipeline Data Path parameter is the Object Storage Service (OSS) bucket path of the temporary data and models generated during the runtime of the pipeline.

    4. Click OK.

      Creating the pipeline takes about 10 seconds.

    5. On the Pipelines tab, double-click the created Air Quality Prediction template to open it.

    6. View the components of the pipeline on the canvas as shown in the following figure. The system automatically creates the pipeline based on the preset template.

      Figure: Haze prediction experiment

      The following paragraphs describe each section of the pipeline.

      The components displayed in this section read and preprocess data.

      1. The data source component reads the source data.

      2. The type transform component converts the source data from the STRING type to the DOUBLE type.

      3. The sql component converts the values in the label column to binary values of 0 or 1. In this pipeline, the pm2 column is the label column. In the pm2 column, values greater than 200 indicate heavy haze. The sql component marks the values greater than 200 in the pm2 column as 1 and the values that are smaller than or equal to 200 as 0. The following SQL statement provides an example:

        select time,hour,(case when pm2>200 then 1 else 0 end),pm10,so2,co,no2 from ${t1};
      4. The normalize component converts pollutant concentrations with different units to normalized values without units.
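The preprocessing steps above can be sketched outside the Designer canvas in plain Python. This is only an illustration under stated assumptions: the sample rows are hypothetical, the output column name `label` is made up for readability (the SQL statement leaves that column unnamed), and min-max scaling is assumed because the source does not say which method the normalize component applies.

```python
# Hypothetical sample rows; every field arrives as a string, as in the dataset table.
rows = [
    {"time": "2016-01-01", "hour": "0", "pm2": "250", "pm10": "300", "so2": "40", "co": "3.2", "no2": "110"},
    {"time": "2016-01-01", "hour": "1", "pm2": "180", "pm10": "210", "so2": "30", "co": "2.1", "no2": "90"},
    {"time": "2016-01-01", "hour": "2", "pm2": "35", "pm10": "60", "so2": "10", "co": "0.8", "no2": "40"},
]

FEATURES = ["pm10", "so2", "co", "no2"]


def to_double(row):
    """Type transform step: cast the pollutant strings to floats."""
    return {**row, **{k: float(row[k]) for k in ["pm2"] + FEATURES}}


def add_label(row, threshold=200.0):
    """Label step: replace pm2 with 1 if pm2 > threshold, else 0."""
    out = {k: v for k, v in row.items() if k != "pm2"}
    out["label"] = 1 if row["pm2"] > threshold else 0  # "label" is a hypothetical name
    return out


def min_max_normalize(rows, columns):
    """Assumed normalization: scale each column to [0, 1]; a constant column maps to 0.0."""
    result = [dict(r) for r in rows]
    for col in columns:
        lo = min(r[col] for r in rows)
        hi = max(r[col] for r in rows)
        span = hi - lo
        for r in result:
            r[col] = (r[col] - lo) / span if span else 0.0
    return result


prepared = min_max_normalize([add_label(to_double(r)) for r in rows], FEATURES)
for r in prepared:
    print(r)
```

Only the feature columns are normalized here; the label stays 0 or 1, and the time and hour columns pass through unchanged.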

      The components displayed in this section perform statistical analysis.

      1. The histograms component visualizes the distribution of each pollutant.

        For example, the following figure shows that the interval that contains the most PM 2.5 concentrations is 11.74 to 15.61. A total of 430 PM 2.5 concentrations fall in this interval.

        Figure: PM 2.5 distribution

      2. The data view component visualizes the impact of different intervals of each pollutant on the results.

        For example, the following figure shows the data of the nitrogen dioxide concentration. Of the samples whose nitrogen dioxide concentration falls in the interval of 112.33 to 113.9, seven have a label of 0 and nine have a label of 1. This indicates that heavy haze is likely when the nitrogen dioxide concentration falls in this interval. Entropy and Gini indicate the impact of the feature interval on the target value in terms of the amount of information. A larger value indicates a greater impact.
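To make the statistics above concrete, the following sketch counts values per equal-width interval and computes the textbook Shannon entropy and Gini impurity of the labels inside one interval, using the seven-versus-nine split from the nitrogen dioxide example. It is an assumption that these standard definitions resemble what the data view component reports; the source does not give the exact formulas, and the function names and the binning values are hypothetical.

```python
import math
from collections import Counter


def equal_width_bins(values, n_bins):
    """Count how many values fall in each of n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value is placed in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1) if width else 0
        counts[idx] += 1
    return counts


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


# Labels in the nitrogen dioxide interval from the text: seven 0s and nine 1s.
interval_labels = [0] * 7 + [1] * 9
print(sum(interval_labels) / len(interval_labels))  # share of heavy-haze samples
print(entropy(interval_labels), gini(interval_labels))

# Hypothetical values, only to show the binning.
print(equal_width_bins([1.0, 2.0, 2.5, 9.0, 10.0], n_bins=3))
```

Both measures describe how mixed the labels are within a single interval. How the component combines them across intervals into an impact score is not stated in the source.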

      The components displayed in this section train models and make predictions. In this pipeline, the random forests and logistic regression components train the models.

      The components displayed in this section evaluate the models.
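As a rough illustration of what a training component does, the sketch below fits a logistic regression model with batch gradient descent in plain Python. It is not the implementation behind the logistic regression component: the data is synthetic, the learning rate and epoch count are arbitrary, and all function names are hypothetical. Random forests are omitted to keep the example short.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Predicted probability of class 1 for one feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic_regression(X, y, lr=1.0, epochs=1000):
    """Minimize the mean log loss with batch gradient descent."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Synthetic, hypothetical data: two normalized features; label 1 when their sum is large.
random.seed(0)
X = [[random.random(), random.random()] for _ in range(200)]
y = [1 if x[0] + x[1] > 1.2 else 0 for x in X]

w, b = train_logistic_regression(X, y)
accuracy = sum((predict_proba(w, b, x) >= 0.5) == bool(t) for x, t in zip(X, y)) / len(X)
print(w, b, accuracy)
```

The accuracy printed here is measured on the training data itself, so it only shows that the fit works. A real evaluation would use held-out data.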

  3. Run the pipeline and view the results.

    1. In the upper-left corner of the canvas, click the Run icon.

    2. After the pipeline is run, right-click the evaluate component that is connected as a downstream component of the random forests component. In the shortcut menu that appears, click Visual Analysis.

    3. In the evaluate section, click the Evaluation Chart tab to view the prediction results of the models that are trained by the random forests component.

      The area under the curve (AUC) value in the preceding figure is higher than 0.99, which indicates that the model trained by the random forests component predicts air quality with high reliability.

    4. Right-click the evaluate component that is connected as a downstream component of the logistic regression component. In the shortcut menu that appears, click Visual Analysis.

    5. In the evaluate section, click the Evaluation Chart tab to view the prediction results of the models that are trained by the logistic regression component.

      The AUC value in the preceding figure is higher than 0.98, which indicates that the model trained by the logistic regression component predicts hazy weather with high reliability.
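For readers who want to see how an AUC value is derived, the following sketch computes it as the fraction of positive and negative sample pairs in which the positive sample receives the higher score, with ties counted as half. The labels and scores are hypothetical, and this is not the code behind the evaluate component. The example also shows that AUC measures ranking quality, which can differ from accuracy at a fixed threshold.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Share of samples classified correctly at the given score threshold."""
    return sum((s >= threshold) == (y == 1) for s, y in zip(scores, labels)) / len(labels)


# Hypothetical scores: every positive outranks every negative,
# but both positives score below the 0.5 threshold.
labels = [0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40]
print(roc_auc(labels, scores))
print(accuracy(labels, scores))
```

The pairwise loop is quadratic in the number of samples, which is fine for a small illustration. A sort-based computation is the usual choice for larger datasets.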