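The k-core algorithm finds the subgraph of a graph that has a specified coreness. In this component, the k-core is the subgraph that remains after vertices whose degree is less than or equal to k are repeatedly removed. The largest coreness of any vertex in a graph is regarded as the coreness of the graph. This topic describes the K-Core component provided by Machine Learning Designer (formerly known as Machine Learning Studio).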
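To make the notion of coreness concrete, the following Python sketch computes it by peeling. It uses the textbook convention: a vertex has coreness c if it belongs to a subgraph in which every vertex has degree at least c, but to no such subgraph for c + 1. Each edge is treated as undirected. The function names `core_numbers` and `graph_coreness` are hypothetical and are not part of PAI. Judging from the example in this topic, where k = 2 keeps only vertices 1 to 4 (textbook coreness 3), the component's k parameter appears to be offset by one from this convention.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {vertex: coreness} for an undirected graph given as (u, v) pairs.

    Hypothetical helper using the textbook definition of coreness.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops; duplicate edges collapse in the sets
            adj[u].add(v)
            adj[v].add(u)
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    coreness = {}
    level = 0
    while remaining:
        # Peel a vertex of minimum degree in the remaining subgraph.
        v = min(remaining, key=degree.__getitem__)
        # Coreness never decreases along the peeling order.
        level = max(level, degree[v])
        coreness[v] = level
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1  # removing v lowers each remaining neighbor's degree
    return coreness


def graph_coreness(edges):
    """Coreness of the graph: the largest vertex coreness (0 for no edges)."""
    return max(core_numbers(edges).values(), default=0)
```

For the edge table generated in the example in this topic, `core_numbers` returns 3 for vertices 1 to 4 and 2 for vertices 5 and 6, so `graph_coreness` returns 3. Scanning for the minimum-degree vertex on every step keeps the sketch short but makes it roughly quadratic in the number of vertices; a bucket queue keyed by degree brings peeling down to linear time.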
You can configure the component by using either of the following methods: the Machine Learning Platform for AI (PAI) console or a PAI command.
PAI console
Tab | Parameter | Description |
---|---|---|
Fields Setting | Source Vertex Column | The start vertex column in the edge table. |
Fields Setting | Target Vertex Column | The end vertex column in the edge table. |
Parameters Setting | k | The value of k for the k-core. Default value: 3. This parameter is required. |
Tuning | Workers | The number of workers used to run the job in parallel. A larger value increases the degree of parallelism but also the communication overhead of the framework. |
Tuning | Memory Size per Worker | The maximum amount of memory, in MB, that a single worker can use. Default value: 4096. If a worker uses more memory than this value, an OutOfMemory exception is reported. |
PAI command
PAI -name KCore
-project algo_public
-DinputEdgeTableName=KCore_func_test_edge
-DfromVertexCol=flow_out_id
-DtoVertexCol=flow_in_id
-DoutputTableName=KCore_func_test_result
-Dk=2;
Parameter | Required | Description | Default value |
---|---|---|---|
inputEdgeTableName | Yes | The name of the input edge table. | No default value |
inputEdgeTablePartitions | No | The partitions in the input edge table. | Full table |
fromVertexCol | Yes | The start vertex column in the input edge table. | No default value |
toVertexCol | Yes | The end vertex column in the input edge table. | No default value |
outputTableName | Yes | The name of the output table. | No default value |
outputTablePartitions | No | The partitions in the output table. | No default value |
lifecycle | No | The lifecycle of the output table, in days. | No default value |
workerNum | No | The number of workers used to run the job in parallel. A larger value increases the degree of parallelism but also the communication overhead of the framework. | Not specified |
workerMem | No | The maximum amount of memory, in MB, that a single worker can use. If a worker uses more memory than this value, an OutOfMemory exception is reported. | 4096 |
splitSize | No | The data split size. | 64 |
k | Yes | The value of k for the k-core. | 3 |
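Because the PAI command is plain text, it can be assembled from the parameter table programmatically. The sketch below uses a hypothetical Python helper, `build_kcore_command`, that checks the parameters marked as required in the table and emits the supplied ones in table order. It only formats a string and does not submit anything to PAI; values are inserted verbatim, so quoting of special characters is left to the caller.

```python
# Parameter names in the order of the parameter table.
PARAM_ORDER = (
    "inputEdgeTableName", "inputEdgeTablePartitions", "fromVertexCol",
    "toVertexCol", "outputTableName", "outputTablePartitions", "lifecycle",
    "workerNum", "workerMem", "splitSize", "k",
)
# Parameters marked "Required: Yes" in the table.
REQUIRED = {"inputEdgeTableName", "fromVertexCol", "toVertexCol", "outputTableName", "k"}


def build_kcore_command(params: dict) -> str:
    """Format a KCore PAI command string (hypothetical helper; runs nothing)."""
    missing = [name for name in PARAM_ORDER if name in REQUIRED and name not in params]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    unknown = sorted(set(params) - set(PARAM_ORDER))
    if unknown:
        raise ValueError("unknown parameters: " + ", ".join(unknown))
    lines = ["PAI -name KCore", "-project algo_public"]
    # Emit only supplied parameters; omitted optional ones are left to the component's defaults.
    lines += [f"-D{name}={params[name]}" for name in PARAM_ORDER if name in params]
    return "\n".join(lines) + ";"


if __name__ == "__main__":
    print(build_kcore_command({
        "inputEdgeTableName": "KCore_func_test_edge",
        "fromVertexCol": "flow_out_id",
        "toVertexCol": "flow_in_id",
        "outputTableName": "KCore_func_test_result",
        "k": 2,
    }))
```

Called with the values from the sample command, the helper prints the same seven lines, ending with `-Dk=2;`.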
Example
- Generate the training data.
drop table if exists KCore_func_test_edge;
create table KCore_func_test_edge as
select * from (
    select '1' as flow_out_id, '2' as flow_in_id from dual union all
    select '1' as flow_out_id, '3' as flow_in_id from dual union all
    select '1' as flow_out_id, '4' as flow_in_id from dual union all
    select '2' as flow_out_id, '3' as flow_in_id from dual union all
    select '2' as flow_out_id, '4' as flow_in_id from dual union all
    select '3' as flow_out_id, '4' as flow_in_id from dual union all
    select '3' as flow_out_id, '5' as flow_in_id from dual union all
    select '3' as flow_out_id, '6' as flow_in_id from dual union all
    select '5' as flow_out_id, '6' as flow_in_id from dual
) tmp;
The following figure shows the structure of the k-core graph.
- Set k to 2 and view the results. The output table contains the following rows:
node1 | node2 |
---|---|
1 | 2 |
1 | 3 |
1 | 4 |
2 | 1 |
2 | 3 |
2 | 4 |
3 | 1 |
3 | 2 |
3 | 4 |
4 | 1 |
4 | 2 |
4 | 3 |
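The result for k = 2 can be reproduced locally with a short Python sketch that peels the sample graph. It assumes, based on this example's output rather than on any documented implementation detail, that the component treats the edge table as an undirected graph, repeatedly removes vertices whose degree is less than or equal to k, and writes each remaining edge once in each direction. The function name `kcore_edges` is hypothetical.

```python
from collections import defaultdict

# Rows of KCore_func_test_edge from the example: (flow_out_id, flow_in_id).
EDGES = [
    ("1", "2"), ("1", "3"), ("1", "4"), ("2", "3"), ("2", "4"),
    ("3", "4"), ("3", "5"), ("3", "6"), ("5", "6"),
]


def kcore_edges(edges, k):
    """Return the remaining edges as sorted (node1, node2) pairs, both directions.

    Assumed convention (inferred from the example output): vertices whose
    degree is <= k are removed repeatedly until no such vertex is left.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # treat the edge table as an undirected simple graph
            adj[u].add(v)
            adj[v].add(u)
    stack = [v for v in adj if len(adj[v]) <= k]
    removed = set()
    while stack:
        v = stack.pop()
        if v in removed:
            continue  # a vertex can be pushed more than once
        removed.add(v)
        for w in adj[v]:
            adj[w].discard(v)  # deleting v lowers the degree of each neighbor
            if w not in removed and len(adj[w]) <= k:
                stack.append(w)
    return sorted((u, w) for u in adj if u not in removed for w in adj[u])


if __name__ == "__main__":
    for node1, node2 in kcore_edges(EDGES, k=2):
        print(node1, node2)
```

Under this convention, the vertices that survive for a given k are exactly those whose textbook coreness is at least k + 1. That is why vertices 5 and 6, each with degree 2, are dropped for k = 2, and the remaining vertices 1 to 4 form the 12 rows shown above.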