Platform for AI: Lorenz Curve

Last Updated: Dec 27, 2024

A Lorenz curve is a graph that shows how unequally a quantity is distributed across a dataset. It is commonly used to show the distribution of income or wealth within an economy. The curve plots the cumulative percentage of resources received against the cumulative percentage of the population, ordered from the smallest share to the largest, which makes distribution inequality easy to see. In machine learning, a Lorenz curve can be used to evaluate the fairness of model predictions or the bias in resource allocation.
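To make the definition concrete, the following minimal Python sketch computes the points of a Lorenz curve for a small list of non-negative values. The function name `lorenz_points` and the sample incomes are illustrative assumptions and are not part of the component.

```python
from itertools import accumulate


def lorenz_points(values):
    """Return (population share, resource share) pairs for a Lorenz curve.

    Assumes non-negative values with a positive total.
    """
    ordered = sorted(values)              # smallest holders first
    total = sum(ordered)
    if total <= 0:
        raise ValueError("values must have a positive sum")
    n = len(ordered)
    points = [(0.0, 0.0)]                 # the curve starts at the origin
    for i, running in enumerate(accumulate(ordered), start=1):
        points.append((i / n, running / total))
    return points


# Hypothetical incomes: one member holds half of the total.
incomes = [10, 20, 30, 40, 100]
for pop_share, res_share in lorenz_points(incomes):
    print(f"{pop_share:.2f} of the population holds {res_share:.2f} of the total")
```

If the distribution were perfectly equal, every point would lie on the diagonal. In this sample, the bottom 80% of the population holds only 50% of the total.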

Configure the component

Method 1: Configure the component on the pipeline page

On the pipeline details page in Machine Learning Designer, add the Lorenz Curve component to the pipeline and configure the parameters described in the following table.

| Tab | Parameter | Description |
| --- | --- | --- |
| Fields Setting | Select Fields | The feature column used to plot the curve. The column contains the data whose distribution inequality you want to analyze, such as income, wealth, or score. |
| Parameters Setting | Quantile | The number of equal-probability intervals into which the dataset is divided to plot the curve. A larger value produces a finer-grained curve and a more detailed view of the inequality in the data distribution. |
| Tuning | Computing Cores | The number of cores used in computing. The value must be a positive integer. |
| Tuning | Memory Size per Core (Unit: MB) | The memory size of each core. |
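The effect of the Quantile setting on granularity can be sketched as follows. This sketch assumes that the curve is sampled at N + 1 evenly spaced population shares and that each point covers the smallest values up to a nearest-rank cutoff. The component's exact rule is not documented in this topic and may differ. The function name `sample_lorenz` and the sample scores are hypothetical.

```python
from itertools import accumulate


def sample_lorenz(values, n_quantiles):
    """Sample the cumulative share curve at n_quantiles + 1 points.

    Assumption: point q covers the ceil(q * len(values) / n_quantiles)
    smallest values (a nearest-rank rule chosen for illustration only).
    """
    if n_quantiles < 1:
        raise ValueError("n_quantiles must be a positive integer")
    ordered = sorted(values)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("values must have a positive sum")
    running = [0] + list(accumulate(ordered))   # running[k] = sum of the k smallest
    size = len(ordered)
    shares = []
    for q in range(n_quantiles + 1):
        covered = (q * size + n_quantiles - 1) // n_quantiles   # integer ceiling
        shares.append(running[covered] / total)
    return shares


scores = [1, 1, 2, 3, 5, 8, 13, 21]    # hypothetical feature column
coarse = sample_lorenz(scores, 4)       # 5 points
fine = sample_lorenz(scores, 8)         # 9 points
print(len(coarse), coarse)
print(len(fine), fine)
```

A larger quantile count yields more points on the same underlying curve. In this sample, every point of the coarse curve also appears in the fine one.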

Method 2: Use PAI commands

Configure the component parameters by using Platform for AI (PAI) commands. You can use the SQL Script component to call PAI commands. For more information, see Scenario 4: Execute PAI commands within the SQL script component.

PAI -name LorenzCurve
    -project algo_public
    -DinputTableName=maple_test_lorenz_basic10_input
    -DcolName=col0
    -DoutputTableName=maple_test_lorenz_basic10_output
    -DcoreNum=20
    -DmemSizePerCore=110;

| Parameter | Required | Default value | Description |
| --- | --- | --- | --- |
| inputTableName | Yes | No default value | The name of the input table. |
| outputTableName | Yes | No default value | The name of the output table. |
| colName | No | No default value | The columns selected from the input table. To select multiple columns, separate them with commas (,). |
| N | No | 100 | The number of quantiles, that is, the number of equal-probability intervals into which the dataset is divided. |
| inputTablePartitions | No | No default value | The partitions selected from the input table. Supported formats: partition_name=value, or name1=value1/name2=value2 for multi-level partitions. Note: If you specify multiple partitions, separate them with commas (,). Example: name1=value1,value2. |
| lifecycle | No | 28 | The lifecycle of the output table. Unit: days. The value must be an integer. |
| coreNum | No | Determined by the system | The number of cores used in computing. This parameter is used with memSizePerCore. The value must be a positive integer. By default, the system calculates the number of instances based on the amount of input data. |
| memSizePerCore | No | Determined by the system | The memory size of each core. Unit: MB. The value must be a positive integer. Recommended values: (1024, 64 × 1024). |
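As a sketch of how these parameters fit together, the following Python helper assembles a LorenzCurve command string from keyword arguments. The helper `build_lorenz_command`, the table names `my_input` and `my_output`, and the partition value `ds=20240101` are hypothetical. Only the parameter names come from the preceding table. The resulting string is meant to be pasted into an SQL Script component, not run from Python.

```python
def build_lorenz_command(input_table, output_table, **options):
    """Assemble a PAI LorenzCurve command from the documented parameters."""
    allowed = {
        "colName", "N", "inputTablePartitions",
        "lifecycle", "coreNum", "memSizePerCore",
    }
    unknown = set(options) - allowed
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")

    # coreNum and memSizePerCore are documented as positive integers.
    for name in ("coreNum", "memSizePerCore"):
        if name in options:
            value = options[name]
            if not isinstance(value, int) or value <= 0:
                raise ValueError(f"{name} must be a positive integer")

    parts = [
        "PAI -name LorenzCurve",
        "-project algo_public",
        f"-DinputTableName={input_table}",
        f"-DoutputTableName={output_table}",
    ]
    parts += [f"-D{name}={value}" for name, value in options.items()]
    return "\n    ".join(parts) + ";"


command = build_lorenz_command(
    "my_input",                          # hypothetical input table
    "my_output",                         # hypothetical output table
    colName="col0",
    N=10,
    inputTablePartitions="ds=20240101",  # hypothetical partition
    lifecycle=7,
)
print(command)
```

Optional parameters that are omitted, such as coreNum and memSizePerCore in this call, are left out of the command so that the defaults in the table apply.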

Example

  1. Generate the following test data:

    | col0:double |
    | --- |
    | 4 |
    | 7 |
    | 2 |
    | 8 |
    | 6 |
    | 3 |
    | 9 |
    | 5 |
    | 0 |
    | 1 |
    | 10 |

  2. Run the following PAI command:

    PAI -name LorenzCurve
        -project algo_public
        -DinputTableName=maple_test_lorenz_basic10_input
        -DcolName=col0
        -DoutputTableName=maple_test_lorenz_basic10_output
        -DcoreNum=20
        -DmemSizePerCore=110;
  3. View the output as described in the following table.

    | quantile | col0 |
    | --- | --- |
    | 0 | 0 |
    | 1 | 0.01818181818181818 |
    | 2 | 0.01818181818181818 |
    | 3 | 0.01818181818181818 |
    | 4 | 0.01818181818181818 |
    | 5 | 0.01818181818181818 |
    | 6 | 0.01818181818181818 |
    | 7 | 0.01818181818181818 |
    | 8 | 0.01818181818181818 |
    | 9 | 0.01818181818181818 |
    | 10 | 0.01818181818181818 |
    | 11 | 0.05454545454545454 |
    | 12 | 0.05454545454545454 |
    | 13 | 0.05454545454545454 |
    | 14 | 0.05454545454545454 |
    | ... | ... |
    | 85 | 0.8181818181818182 |
    | 86 | 0.8181818181818182 |
    | 87 | 0.8181818181818182 |
    | 88 | 0.8181818181818182 |
    | 89 | 0.8181818181818182 |
    | 90 | 1 |
    | 91 | 1 |
    | 92 | 1 |
    | 93 | 1 |
    | 94 | 1 |
    | 95 | 1 |
    | 96 | 1 |
    | 97 | 1 |
    | 98 | 1 |
    | 99 | 1 |
    | 100 | 1 |
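The distinct values in this output can be checked by hand. The test data sums to 55, and the values 0.01818..., 0.05454..., and 0.81818... equal 1/55, 3/55, and 45/55, which are running totals of the sorted data divided by the sum. The following Python sketch recomputes those cumulative shares. It illustrates the arithmetic only: this topic does not state how the component maps each quantile row to a cumulative share, so the sketch does not attempt to reproduce the rows one by one.

```python
from fractions import Fraction
from itertools import accumulate

# Test data from step 1 of the example.
col0 = [4, 7, 2, 8, 6, 3, 9, 5, 0, 1, 10]

ordered = sorted(col0)
total = sum(ordered)                              # 55
running_totals = list(accumulate(ordered))        # 0, 1, 3, 6, ..., 45, 55
shares = [Fraction(t, total) for t in running_totals]

# Print each cumulative share as an exact fraction and as a float.
for value, share in zip(ordered, shares):
    print(f"values up to {value:>2}: {str(share):>5} = {float(share)}")
```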