Lorenz Curve

A Lorenz curve shows the income distribution of a country or region. The curvature of the curve indicates the degree of income inequality: the further the curve bows away from the diagonal line of perfect equality, the more unequal the income distribution.

In a rectangle, the height represents the total wealth and is divided equally into N parts. The width represents the families, arranged from least wealthy to most wealthy, and is also divided equally into N parts. The first part represents the least wealthy 1/N of families. Each point gives the cumulative share of wealth held by the least wealthy k/N of families, and the points are connected to form the Lorenz curve.
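The construction above can be sketched in Python. This is a minimal illustration, not the component's implementation: the function name `lorenz_points` is hypothetical, and rounding the family count down at each quantile boundary is an assumption, so the values may differ from what the Lorenz Curve component outputs.

```python
def lorenz_points(wealth, n):
    """Return n + 1 cumulative wealth shares, one per quantile boundary 0..n."""
    values = sorted(wealth)  # families ordered from least to most wealthy
    total = sum(values)
    if n <= 0 or not values or total <= 0:
        raise ValueError("need n > 0 and a positive total wealth")
    points = []
    for k in range(n + 1):
        # Number of families in the least wealthy k/n of the population
        # (rounded down; this rounding rule is an assumption).
        count = (k * len(values)) // n
        points.append(sum(values[:count]) / total)
    return points


points = lorenz_points([4, 1, 3, 2], 4)
print(points)
```

With four families holding 1, 2, 3, and 4 units of wealth, the least wealthy quarter holds 10% of the total and the least wealthy half holds 30%, so the curve sits below the diagonal.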

Configure the component

You can use one of the following methods to configure the Lorenz Curve component.

Method 1: Configure the component on the pipeline page

You can configure the parameters of the Lorenz Curve component on the pipeline page of Machine Learning Designer of Machine Learning Platform for AI (PAI). Machine Learning Designer was formerly known as Machine Learning Studio. The following table describes the parameters.

Tab	Parameter	Description
Fields Setting	Columns	The columns selected from the input table.
Parameters Setting	Quantile	The number of quantiles. Default value: 100.
Tuning	Computing Cores	The number of cores used in computing. The value must be a positive integer.
Tuning	Memory Size per Core (Unit: MB)	The memory size of each core.

Method 2: Use PAI commands

Configure the component parameters by using PAI commands. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.

PAI -name LorenzCurve
    -project algo_public
    -DinputTableName=maple_test_lorenz_basic10_input
    -DcolName=col0
    -DoutputTableName=maple_test_lorenz_basic10_output
    -DcoreNum=20
    -DmemSizePerCore=110;

Parameter	Required	Description	Default value
inputTableName	Yes	The name of the input table.	No default value
outputTableName	Yes	The name of the output table.	No default value
colName	No	The columns selected from the input table. To select multiple columns, separate them with commas (,).	No default value
N	No	The number of quantiles.	100
inputTablePartitions	No	The partitions selected from the input table. Supported formats: partition_name=value for a single partition, and name1=value1/name2=value2 for multi-level partitions. If you specify multiple partitions, separate them with commas (,).	No default value
lifecycle	No	The lifecycle of the output table. This value must be an integer. Unit: days.	28
coreNum	No	The number of cores used in computing. This parameter is used with memSizePerCore. The value must be a positive integer. If you do not specify a value, the system calculates the number of instances based on the amount of input data.	Determined by the system
memSizePerCore	No	The memory size of each core. Unit: MB. The value must be a positive integer. Recommended values: (1024,64 × 1024).	Determined by the system
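When the command is generated from a script, the parameters in the table can be assembled programmatically. The following Python sketch is only an illustration: `build_lorenz_command` is a hypothetical helper, and it assumes that every parameter is passed as `-D<name>=<value>`, as the sample command above does for the parameters it shows.

```python
def build_lorenz_command(input_table, output_table, **options):
    """Assemble a PAI command string for the LorenzCurve component."""
    parts = [
        "PAI -name LorenzCurve",
        "-project algo_public",
        f"-DinputTableName={input_table}",
        f"-DoutputTableName={output_table}",
    ]
    for name, value in options.items():
        # Skip optional parameters that were left unset.
        if value is not None:
            parts.append(f"-D{name}={value}")
    # One parameter per line, indented like the sample command.
    return "\n    ".join(parts) + ";"


command = build_lorenz_command(
    "maple_test_lorenz_basic10_input",
    "maple_test_lorenz_basic10_output",
    colName="col0",
    coreNum=20,
    memSizePerCore=110,
)
print(command)
```

The helper only builds text. The resulting string still has to be run through the SQL Script component or another PAI command entry point.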

Example

Generate the following test data:
col0:double
4
7
2
8
6
3
9
5
0
1
10

Run the following PAI command:

PAI -name LorenzCurve
    -project algo_public
    -DinputTableName=maple_test_lorenz_basic10_input
    -DcolName=col0
    -DoutputTableName=maple_test_lorenz_basic10_output
    -DcoreNum=20
    -DmemSizePerCore=110;

The following table shows the output.

quantile	col0
0	0
1	0.01818181818181818
2	0.01818181818181818
3	0.01818181818181818
4	0.01818181818181818
5	0.01818181818181818
6	0.01818181818181818
7	0.01818181818181818
8	0.01818181818181818
9	0.01818181818181818
10	0.01818181818181818
11	0.05454545454545454
12	0.05454545454545454
13	0.05454545454545454
14	0.05454545454545454
...	...
85	0.8181818181818182
86	0.8181818181818182
87	0.8181818181818182
88	0.8181818181818182
89	0.8181818181818182
90	1
91	1
92	1
93	1
94	1
95	1
96	1
97	1
98	1
99	1
100	1
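As a sanity check, the distinct values in the output correspond to cumulative shares of the sorted test data, whose total is 55. The following Python sketch computes those shares. It does not reproduce how the component maps each quantile index to a share, which is not derived here.

```python
from itertools import accumulate

# Test data from the example above.
data = [4, 7, 2, 8, 6, 3, 9, 5, 0, 1, 10]

# Running totals of the values sorted from smallest to largest.
running = list(accumulate(sorted(data)))

# Cumulative share of the total held by the smallest k + 1 values.
shares = [s / running[-1] for s in running]

for k, share in enumerate(shares):
    print(k, share)
```

The shares 1/55, 3/55, and 45/55 match the values 0.01818..., 0.05454..., and 0.81818... in the output table, and the final share is 1.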
