This topic describes how to use the experiment template that is provided by Machine Learning Studio to build a model for identifying users who steal electricity or are involved in electricity leakage. This way, electricity theft and leakage can be automatically detected. This reduces the inspection workload of electrical inspection staff to a large degree and ensures normal and safe electricity usage.

Background information

Traditional methods of identifying electricity theft, electricity leakage, and metering device failures include regular inspections, regular checks of electricity meters, and user reports. These methods rely on manual operations and are inefficient at identifying users who steal electricity or are involved in electricity leakage. Staff of power supply bureaus also use the existing automated metering system to monitor electricity usage online: the system triggers alerts for abnormal electricity usage and lets staff query electricity usage data. For example, the system collects data about abnormal electricity usage, abnormal load, alerts reported by terminals and primary sites, and abnormal line loss, which helps staff identify electricity theft, electricity leakage, and metering device failures. After an alert is triggered, the system builds models that analyze abnormal electricity usage based on the current, voltage, and load before and after the alert time, which further helps staff identify these issues.

The existing automated system for metering electricity usage can monitor abnormal electricity usage. However, due to frequent false positives and false negatives, it is difficult to precisely identify users who steal electricity or are involved in electricity leakage. In addition, experts need to determine the weight of each metric for the model to be built based on their knowledge and experience. This process is subjective.

The existing automated system for metering electricity usage can collect all kinds of electricity load data, such as the current, voltage, and power data, and alert data that terminals report. Such data can reflect the electricity usage of users. Electrical inspection staff can also collect electricity theft and leakage data from the online inspection system or by conducting on-site inspection. Based on the preceding data, PAI can abstract key features of users who steal electricity or are involved in electricity leakage and build a model for identifying such users. This way, electricity theft or leakage can be automatically detected. This reduces the inspection workload of electrical inspection staff to a large degree and ensures normal and safe electricity usage.

Dataset

The following table describes the fields that are used in the dataset of this experiment.
Field | Data type | Description
power_usage_decline_level | BIGINT | The electricity usage trend.
line_loss_rate | BIGINT | The line loss.
warning_num | BIGINT | The number of alerts.
is_theft | BIGINT | Indicates whether users steal electricity or are involved in electricity leakage.
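
For readers who want to inspect such data outside Studio, the schema above can be mirrored in a few lines of Python. This is only a minimal sketch: the class name `MeterRecord`, the helper `parse_row`, the comma-separated input format, and the sample values are all hypothetical and are not part of the experiment template. It also assumes that a label value of 1 marks theft or leakage, which the table does not state.

```python
from dataclasses import dataclass

# Column order follows the field table; all four columns are BIGINT there.
FIELDS = ("power_usage_decline_level", "line_loss_rate", "warning_num", "is_theft")


@dataclass(frozen=True)
class MeterRecord:  # hypothetical name, used for illustration only
    power_usage_decline_level: int  # electricity usage trend
    line_loss_rate: int             # line loss
    warning_num: int                # number of alerts
    is_theft: int                   # label (assumed: 1 = theft or leakage)


def parse_row(line: str) -> MeterRecord:
    """Parse one comma-separated row (an assumed format) into a record."""
    values = [int(v) for v in line.strip().split(",")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(values)}")
    return MeterRecord(*values)


# Made-up sample row, not taken from the real dataset.
record = parse_row("4,1,1,1")
features = (record.power_usage_decline_level, record.line_loss_rate, record.warning_num)
```

Keeping the three feature columns separate from the `is_theft` label in this way matches how the experiment later treats them.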

Procedure

  1. Go to the Machine Learning Studio console.
    1. Log on to the PAI console.
    2. In the left-side navigation pane, choose Model Training > Studio-Modeling Visualization.
    3. On the PAI Visualization Modeling page, find the project in which you want to create an experiment and click Machine Learning in the Operation column.
  2. Create an experiment.
    1. In the left-side navigation pane, click Home.
    2. In the Templates section, click Create below Theft power identification.
    3. In the New Experiment dialog box, set the experiment parameters. You can use the default values for the parameters.
      Parameter | Description
      Name | The name of the experiment. Default value: Theft power identification.
      Project | The project in which you want to create the experiment. You cannot change the value of this parameter.
      Description | The description of the experiment. Default value: By constructing the identification model of the user who steals electricity, it can automatically check and determine whether the user has the leakage of electricity, greatly reducing the workload of the inspection staff.
      Save To | The directory for storing the experiment. Default value: My Experiments.
    4. Click OK.
    5. Optional: Wait about 10 seconds. Then, click Experiments in the left-side navigation pane.
    6. Optional: Click Theft power identification_XX under My Experiments. The canvas of the experiment appears.
      My Experiments is the directory for storing the experiment that you created and Theft power identification_XX is the name of the experiment. In the experiment name, _XX is the ID that the system automatically creates for the experiment.
    7. View the components of the experiment on the canvas, as shown in the following figure. The system automatically creates the experiment based on the preset template.
      Experiment on electricity theft identification
      Area No. | Description
      1 | The components in this area perform statistical analysis:
        1. The Correlation Coefficient Matrix component analyzes the impact of each feature on determining whether users steal electricity or are involved in electricity leakage. After the experiment is run, right-click Correlation Coefficient Matrix-1 on the canvas and select View Analytics Report to view the impact of each feature on determining whether users steal electricity or are involved in electricity leakage.
        2. The Data Pivoting component visualizes the distribution of data in the feature columns and the label column. In this experiment, the feature columns are power_usage_decline_level, line_loss_rate, and warning_num. The label column is is_theft.
      2 | The Split-1 component divides the dataset into a training dataset and a prediction dataset at the ratio of 8 to 2.
      3 | The Logistic Regression for Binary Classification component is used to perform regression modeling on the training dataset. In the training dataset of this experiment, the feature columns are power_usage_decline_level, line_loss_rate, and warning_num. The label column is is_theft.
      4 | The Prediction component predicts the result of applying the model to the prediction dataset. The Binary Classification Evaluation component evaluates the prediction accuracy.
  3. Run the experiment and view the result.
    1. In the top toolbar of the canvas, click Run.
    2. After the experiment is run, right-click Correlation Coefficient Matrix-1 on the canvas and select View Analytics Report.
    3. In the Correlation Coefficient Matrix dialog box, select Show Correlation to view the impact of each feature on determining whether users steal electricity or are involved in electricity leakage.
      Correlation analysis result
      None of the power_usage_decline_level, line_loss_rate, and warning_num features has an obvious correlation with is_theft on its own. In other words, no single feature determines whether users steal electricity or are involved in electricity leakage. The determination depends on a combination of features.
    4. Right-click Logistic Regression for Binary Classification-1 on the canvas and choose Model Option > Show Model to view the weight of each feature on the prediction result.
    5. Right-click Binary Classification Evaluation-1 on the canvas and select View Evaluation Report.
    6. In the Evaluation Report dialog box, click the Charts tab to view the model evaluation metrics.
      Model evaluation report
      The Area Under Curve (AUC) value in the preceding figure is greater than 0.91, which indicates that the model performs well in identifying electricity leakage and theft.
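
To make the four areas of the experiment more concrete, the following sketch walks through the same stages in plain Python: per-feature correlation with the label, an 8:2 split, logistic regression, and AUC evaluation. It is an illustration under stated assumptions, not the implementation that the Studio components use: the logistic regression here is fitted with simple batch gradient descent, the rows are synthetic values generated for this example rather than the template's dataset, and all function names are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.05, epochs=300):
    """Fit weights and a bias with batch gradient descent on the log loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X, w, b):
    """Predicted probability of the positive class for each row."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def auc(y_true, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both classes")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Synthetic rows, made up for illustration (NOT the template's dataset):
# (power_usage_decline_level, line_loss_rate, warning_num, is_theft)
rng = random.Random(0)
rows = []
for _ in range(300):
    d, l, w_num = rng.randint(0, 10), rng.randint(0, 1), rng.randint(0, 4)
    signal = 0.5 * d + 2.0 * l + 0.8 * w_num - 5.0 + rng.gauss(0, 1)
    rows.append((d, l, w_num, 1 if signal > 0 else 0))

# Area 1: correlation of each feature column with the label column.
labels = [r[3] for r in rows]
correlations = [pearson([r[j] for r in rows], labels) for j in range(3)]

# Area 2: shuffle, then split into training and prediction sets at 8:2.
rng.shuffle(rows)
cut = len(rows) * 8 // 10
train_rows, test_rows = rows[:cut], rows[cut:]

# Area 3: fit the binary classifier on the training set.
weights, bias = train_logistic([r[:3] for r in train_rows], [r[3] for r in train_rows])

# Area 4: score the prediction set and evaluate the ranking quality.
scores = predict_proba([r[:3] for r in test_rows], weights, bias)
auc_value = auc([r[3] for r in test_rows], scores)
print("correlations:", [round(c, 3) for c in correlations])
print("weights:", [round(v, 3) for v in weights], "AUC:", round(auc_value, 3))
```

Note that AUC measures how well the model ranks positive cases above negative ones across all thresholds. It is related to, but not the same as, classification accuracy at a single threshold.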