DataWorks provides Graph Database (GDB) Reader and GDB Writer for you to read data from and write data to GDB data sources. This topic describes the capabilities of synchronizing data from or to GDB data sources.
Limits
Batch data read | Batch data write |
Develop a data synchronization task
For the entry point and the procedure for configuring a data synchronization task, see the following sections. For information about the parameter settings, view the infotip of each parameter on the configuration tab of the task.
Add a data source
Before you configure a data synchronization task to synchronize data from or to a specific data source, you must add the data source to DataWorks. For more information, see Add and manage data sources.
Configure a batch synchronization task to synchronize data of a single table
For more information about the configuration procedure, see Configure a batch synchronization task by using the codeless UI and Configure a batch synchronization task by using the code editor.
For information about all parameters that are configured and the code that is run when you use the code editor to configure a batch synchronization task, see Appendix: Code and parameters.
Appendix: Code and parameters
Appendix: Configure a batch synchronization task by using the code editor
If you use the code editor to configure a batch synchronization task, you must configure the parameters of the reader and writer for the data source based on the format requirements of the code editor. For more information about the format requirements, see Configure a batch synchronization task by using the code editor. The following sections describe the reader and writer parameters in the code editor.
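The code editor scripts in this topic contain // comments, and a standard JSON parser rejects those. If you keep scripts in version control or generate them, a local syntax check can catch mistakes such as a missing comma or a trailing comma before ]. The following Python sketch strips the comments and then parses the script. It assumes that only // line comments occur and does not handle /* */ blocks. The helper names strip_line_comments and parse_job_script are hypothetical. This is not a DataWorks tool, and it does not check whether parameter values are valid.

```python
import json


def strip_line_comments(text: str) -> str:
    """Remove // line comments that appear outside JSON string literals."""
    out: list[str] = []
    in_string = False
    escaped = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False  # This character was escaped; keep scanning the string.
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Drop the comment but keep the newline that ends it.
            newline = text.find("\n", i)
            if newline == -1:
                break
            i = newline
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def parse_job_script(text: str) -> dict:
    """Parse a commented script; raises json.JSONDecodeError on syntax errors."""
    return json.loads(strip_line_comments(text))
```

In this sketch, each // comment runs to the end of its line. A script collapsed onto a single line would therefore lose everything after its first comment. Keeping one key per line, as in the examples in this topic, avoids that.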
Code for GDB Reader
The following code configures two synchronization tasks that read data from a GDB instance: one reads vertices and the other reads edges.
Configure a synchronization task to read data about vertices from a GDB instance
{
  "order": {
    "hops": [{ "from": "Reader", "to": "Writer" }]
  },
  "setting": {
    "errorLimit": {
      "record": "100" // The maximum number of dirty data records allowed.
    },
    "jvmOption": "",
    "speed": {
      "concurrent": 3, // The maximum number of parallel threads.
      "throttle": true, // Specifies whether to enable throttling. The value true enables throttling, and the value false disables it. The mbps parameter takes effect only when the throttle parameter is set to true.
      "mbps": "12" // The maximum transmission rate. Unit: MB/s.
    }
  },
  "steps": [
    {
      "category": "reader",
      "name": "Reader",
      "parameter": {
        "host": "gdb-xxxxxx.aliyuncs.com", // The endpoint that is used to connect to the GDB instance.
        "port": 8182, // The port number that is used to connect to the GDB instance.
        "username": "gdb", // The username that is used to connect to the GDB instance.
        "password": "gdb", // The password that is used to connect to the GDB instance.
        "labelType": "VERTEX", // The type of the label. The value VERTEX indicates a vertex.
        "labels": ["label1", "label2"], // The labels of the vertices to be synchronized. If this parameter is left empty, all vertices are synchronized.
        "column": [
          {
            "name": "id", // The name of the vertex property.
            "type": "string", // The data type for storing the data to be synchronized.
            "columnType": "primaryKey" // The primary key of the vertex, which is of the STRING type in the GDB instance.
          },
          {
            "name": "label",
            "type": "string",
            "columnType": "primaryLabel" // The label of the vertex, which is of the STRING type in the GDB instance.
          },
          {
            "name": "age",
            "type": "int",
            "columnType": "vertexProperty" // A common vertex property.
          }
        ]
      },
      "stepType": "gdb"
    },
    {
      "category": "writer",
      "name": "Writer",
      "parameter": {
        "print": true
      },
      "stepType": "stream"
    }
  ]
}
Configure a synchronization task to read data about edges from a GDB instance
{
  "order": {
    "hops": [{ "from": "Reader", "to": "Writer" }]
  },
  "setting": {
    "errorLimit": {
      "record": "100" // The maximum number of dirty data records allowed.
    },
    "jvmOption": "",
    "speed": {
      "concurrent": 3, // The maximum number of parallel threads.
      "throttle": true, // Specifies whether to enable throttling. The value true enables throttling, and the value false disables it. The mbps parameter takes effect only when the throttle parameter is set to true.
      "mbps": "12" // The maximum transmission rate. Unit: MB/s.
    }
  },
  "steps": [
    {
      "category": "reader",
      "name": "Reader",
      "parameter": {
        "host": "gdb-xxxxxx.aliyuncs.com", // The endpoint that is used to connect to the GDB instance.
        "port": 8182, // The port number that is used to connect to the GDB instance.
        "username": "gdb", // The username that is used to connect to the GDB instance.
        "password": "gdb", // The password that is used to connect to the GDB instance.
        "labelType": "EDGE", // The type of the label. The value EDGE indicates an edge.
        "labels": ["label1", "label2"], // The labels of the edges to be synchronized. If this parameter is left empty, all edges are synchronized.
        "column": [
          {
            "name": "id", // The name of the edge property.
            "type": "string", // The data type for storing the data to be synchronized.
            "columnType": "primaryKey" // The primary key of the edge, which is of the STRING type in the GDB instance.
          },
          {
            "name": "label",
            "type": "string",
            "columnType": "primaryLabel" // The label of the edge, which is of the STRING type in the GDB instance.
          },
          {
            "name": "srcId",
            "type": "string",
            "columnType": "srcPrimaryKey" // The primary key of the start vertex, which is of the STRING type in the GDB instance.
          },
          {
            "name": "srcLabel",
            "type": "string",
            "columnType": "srcPrimaryLabel" // The label of the start vertex, which is of the STRING type in the GDB instance.
          },
          {
            "name": "dstId",
            "type": "string",
            "columnType": "dstPrimaryKey" // The primary key of the end vertex, which is of the STRING type in the GDB instance.
          },
          {
            "name": "dstLabel",
            "type": "string",
            "columnType": "dstPrimaryLabel" // The label of the end vertex, which is of the STRING type in the GDB instance.
          },
          {
            "name": "weight",
            "type": "double",
            "columnType": "edgeProperty" // A common edge property.
          }
        ]
      },
      "stepType": "gdb"
    },
    {
      "category": "writer",
      "name": "Writer",
      "parameter": {
        "print": true
      },
      "stepType": "stream"
    }
  ]
}
Parameters in code for GDB Reader
Parameter | Description | Required | Default value |
host | The endpoint that is used to connect to the GDB instance. To find it, log on to the GDB console, find the instance that you want to configure, and click View Instance Details in the Actions column to view the Intranet URL. | Yes | No default value |
port | The port number that is used to connect to the GDB instance. | Yes | 8182 |
username | The username that is used to connect to the GDB instance. | Yes | No default value |
password | The password that is used to connect to the GDB instance. | Yes | No default value |
labels | The labels, which are the names of the vertices or edges to be synchronized. GDB Reader can read vertices or edges with multiple labels at a time. In this case, set this parameter to an array, such as ["label1", "label2"]. | Yes | No default value |
labelType | The type of the label. Valid values: VERTEX (vertex) and EDGE (edge). | Yes | No default value |
column | The properties of the vertices or edges to be synchronized. | Yes | No default value |
column -> name | The name of the vertex or edge property to be synchronized. This parameter is required if vertex or edge properties are to be synchronized. | Yes | No default value |
column -> type | The data type for storing the vertex or edge property to be synchronized. The sample code uses string, int, and double. | Yes | No default value |
column -> columnType | The category of the vertex or edge property to be synchronized. For vertices, the sample code uses primaryKey, primaryLabel, and vertexProperty. For edges, it uses primaryKey, primaryLabel, srcPrimaryKey, srcPrimaryLabel, dstPrimaryKey, dstPrimaryLabel, and edgeProperty. | Yes | No default value |
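The labels, labelType, and column parameters together decide which elements are read and what each output record contains. The following Python sketch illustrates that projection. It is a hypothetical model, not GDB Reader's implementation. Each vertex or edge is assumed to be a dict with id, label, and properties keys, and edges also have src and dst dicts. The names to_record, read, and ALLOWED are invented for this example. Following the comments in the sample code, an empty labels list selects every element.

```python
from typing import Any, Callable

# columnType values from the sample code, grouped by labelType.
ALLOWED: dict[str, set[str]] = {
    "VERTEX": {"primaryKey", "primaryLabel", "vertexProperty"},
    "EDGE": {"primaryKey", "primaryLabel", "srcPrimaryKey", "srcPrimaryLabel",
             "dstPrimaryKey", "dstPrimaryLabel", "edgeProperty"},
}

# How each non-property columnType picks a value from the hypothetical element dict.
FIELD_GETTERS: dict[str, Callable[[dict[str, Any]], Any]] = {
    "primaryKey": lambda e: e["id"],
    "primaryLabel": lambda e: e["label"],
    "srcPrimaryKey": lambda e: e["src"]["id"],
    "srcPrimaryLabel": lambda e: e["src"]["label"],
    "dstPrimaryKey": lambda e: e["dst"]["id"],
    "dstPrimaryLabel": lambda e: e["dst"]["label"],
}


def to_record(element: dict[str, Any], parameter: dict[str, Any]) -> list[Any]:
    """Project one vertex or edge onto the column list, in column order."""
    label_type = parameter["labelType"]
    if label_type not in ALLOWED:
        raise ValueError(f"labelType must be VERTEX or EDGE, got {label_type!r}")
    record: list[Any] = []
    for col in parameter["column"]:
        kind = col["columnType"]
        if kind not in ALLOWED[label_type]:
            raise ValueError(f"columnType {kind!r} is not valid for {label_type}")
        if kind.endswith("Property"):
            # A property the element does not have becomes None in this sketch.
            record.append(element["properties"].get(col["name"]))
        else:
            record.append(FIELD_GETTERS[kind](element))
    return record


def read(elements: list[dict[str, Any]], parameter: dict[str, Any]) -> list[list[Any]]:
    """Keep elements whose label is listed (or all, if labels is empty), then project them."""
    labels = parameter.get("labels") or []
    return [to_record(e, parameter) for e in elements if not labels or e["label"] in labels]
```

With the column list from the edge example, an edge e1 from v1 to v2 with a weight of 0.5 becomes a record of seven values, ordered as the column entries are.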
Code for GDB Writer
Configure a synchronization task to write data about vertices to a GDB instance
{
  "order": {
    "hops": [{ "from": "Reader", "to": "Writer" }]
  },
  "setting": {
    "errorLimit": {
      "record": "100" // The maximum number of dirty data records allowed.
    },
    "speed": {
      "throttle": true, // Specifies whether to enable throttling. The value true enables throttling, and the value false disables it. The mbps parameter takes effect only when the throttle parameter is set to true.
      "concurrent": 3, // The maximum number of parallel threads.
      "mbps": "12" // The maximum transmission rate. Unit: MB/s.
    }
  },
  "steps": [
    {
      "category": "reader",
      "name": "Reader",
      "parameter": {
        "column": ["*"],
        "datasource": "_ODPS",
        "emptyAsNull": true,
        "guid": "",
        "isCompress": false,
        "partition": [],
        "table": ""
      },
      "stepType": "odps"
    },
    {
      "category": "writer",
      "name": "Writer",
      "parameter": {
        "datasource": "testGDB", // The name of the data source.
        "label": "person", // The label, which is the name of the vertex.
        "srcLabel": "", // Not required for a vertex.
        "dstLabel": "", // Not required for a vertex.
        "labelType": "VERTEX", // The type of the label. The value VERTEX indicates a vertex.
        "writeMode": "INSERT", // The mode in which GDB Writer processes data records with duplicate primary keys.
        "idTransRule": "labelPrefix", // The rule for converting the primary key of the vertex.
        "srcIdTransRule": "none", // Not required for a vertex.
        "dstIdTransRule": "none", // Not required for a vertex.
        "column": [
          {
            // The primary key of the vertex. The value must be an ID of the STRING type and must be specified.
            "name": "id", // The name of the vertex property.
            "value": "#{0}", // The value of the first source column (column index 0). Values from multiple source columns can be concatenated.
            "type": "string", // The data type of the vertex property.
            "columnType": "primaryKey" // The value primaryKey indicates the primary key.
          },
          {
            // A common property of the vertex. The value can be of the INT, LONG, FLOAT, DOUBLE, BOOLEAN, or STRING type.
            "name": "person_age",
            "value": "#{1}", // The value of the second source column. Values from multiple source columns can be concatenated.
            "type": "int",
            "columnType": "vertexProperty" // The value vertexProperty indicates a common vertex property.
          },
          {
            // A common property of the vertex.
            "name": "person_credit",
            "value": "#{2}", // The value of the third source column. Values from multiple source columns can be concatenated.
            "type": "string",
            "columnType": "vertexProperty"
          }
        ]
      },
      "stepType": "gdb"
    }
  ],
  "type": "job",
  "version": "2.0"
}
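In the column parameter of GDB Writer, a value such as #{1} refers to a source column by its 0-based index. The following Python sketch shows one way to read that mapping. It substitutes the placeholders from a source record and converts the resulting text to the configured type. It is a simplified illustration, not GDB Writer's implementation. The names render, build_vertex, and CASTS are hypothetical. The concatenation form (for example #{0}_#{1}) and the boolean parsing rule are assumptions. ID conversion rules such as labelPrefix are not applied here.

```python
import re
from typing import Any, Callable

# Matches placeholders such as #{0} or #{12}; the number is a 0-based source column index.
PLACEHOLDER = re.compile(r"#\{(\d+)\}")

# Assumed text-to-type conversions for the property types named in this topic.
CASTS: dict[str, Callable[[str], Any]] = {
    "string": str,
    "int": int,
    "long": int,
    "float": float,
    "double": float,
    "boolean": lambda s: s.strip().lower() == "true",
}


def render(template: str, row: list[Any]) -> str:
    """Replace every #{n} in template with the text of source column n."""
    return PLACEHOLDER.sub(lambda m: str(row[int(m.group(1))]), template)


def build_vertex(label: str, columns: list[dict[str, str]], row: list[Any]) -> dict[str, Any]:
    """Build one vertex from one source record (ID conversion rules are not applied)."""
    vertex: dict[str, Any] = {"label": label, "id": None, "properties": {}}
    for col in columns:
        value = CASTS[col["type"]](render(col["value"], row))
        if col["columnType"] == "primaryKey":
            vertex["id"] = value
        elif col["columnType"] == "vertexProperty":
            vertex["properties"][col["name"]] = value
        else:
            raise ValueError(f"unsupported columnType for a vertex: {col['columnType']}")
    if vertex["id"] is None:
        raise ValueError("a primaryKey column is required")
    return vertex
```

For example, suppose you pass the three column entries from the preceding vertex example and the source record ["p1", "29", "high"] to build_vertex with the label person. The result is a vertex whose id is "p1" and whose properties are {"person_age": 29, "person_credit": "high"}.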
Configure a synchronization task to write data about edges to a GDB instance
{
  "order": {
    "hops": [{ "from": "Reader", "to": "Writer" }]
  },
  "setting": {
    "errorLimit": {
      "record": "100" // The maximum number of dirty data records allowed.
    },
    "jvmOption": "",
    "speed": {
      "throttle": true, // Specifies whether to enable throttling. The value true enables throttling, and the value false disables it. The mbps parameter takes effect only when the throttle parameter is set to true.
      "concurrent": 3, // The maximum number of parallel threads.
      "mbps": "12" // The maximum transmission rate. Unit: MB/s.
    }
  },
  "steps": [
    {
      "category": "reader",
      "name": "Reader",
      "parameter": {
        "column": ["*"],
        "datasource": "_ODPS",
        "emptyAsNull": true,
        "guid": "",
        "isCompress": false,
        "partition": [],
        "table": ""
      },
      "stepType": "odps"
    },
    {
      "category": "writer",
      "name": "Writer",
      "parameter": {
        "datasource": "testGDB", // The name of the data source.
        "label": "use", // The label, which is the name of the edge.
        "labelType": "EDGE", // The type of the label. The value EDGE indicates an edge.
        "srcLabel": "person", // The label of the start vertex of the edge.
        "dstLabel": "software", // The label of the end vertex of the edge.
        "writeMode": "INSERT", // The mode in which GDB Writer processes data records with duplicate primary keys.
        "idTransRule": "labelPrefix", // The rule for converting the primary key of the edge.
        "srcIdTransRule": "labelPrefix", // The rule for converting the primary key of the start vertex of the edge.
        "dstIdTransRule": "labelPrefix", // The rule for converting the primary key of the end vertex of the edge.
        "column": [
          {
            // The primary key of the edge. The value must be an ID of the STRING type and must be specified.
            "name": "id", // The name of the edge property.
            "value": "#{0}", // The value of the first source column. Values from multiple source columns can be concatenated.
            "type": "string", // The data type of the edge property.
            "columnType": "primaryKey" // The value primaryKey indicates the primary key.
          },
          {
            // The primary key of the start vertex. The value must be an ID of the STRING type and must be specified.
            "name": "id",
            "value": "#{1}", // The value of the second source column. Values from multiple source columns can be concatenated. The mapping rule must be the same as the rule configured when you import the vertices.
            "type": "string",
            "columnType": "srcPrimaryKey" // The value srcPrimaryKey indicates the primary key of the start vertex.
          },
          {
            // The primary key of the end vertex. The value must be an ID of the STRING type and must be specified.
            "name": "id",
            "value": "#{2}", // The value of the third source column. Values from multiple source columns can be concatenated. The mapping rule must be the same as the rule configured when you import the vertices.
            "type": "string",
            "columnType": "dstPrimaryKey" // The value dstPrimaryKey indicates the primary key of the end vertex.
          },
          {
            // A common property of the edge. The value can be of the INT, LONG, FLOAT, DOUBLE, BOOLEAN, or STRING type.
            "name": "person_use_software_time",
            "value": "#{3}", // The value of the fourth source column. Values from multiple source columns can be concatenated.
            "type": "long",
            "columnType": "edgeProperty" // The value edgeProperty indicates a common edge property.
          },
          {
            // A common property of the edge.
            "name": "person_regist_software_name",
            "value": "#{4}", // The value of the fifth source column. Values from multiple source columns can be concatenated.
            "type": "string",
            "columnType": "edgeProperty"
          },
          {
            // A common property of the edge that stores an ID. Unlike the primary key, this property is optional.
            "name": "id",
            "value": "#{5}", // The value of the sixth source column. Values from multiple source columns can be concatenated.
            "type": "long",
            "columnType": "edgeProperty"
          }
        ]
      },
      "stepType": "gdb"
    }
  ],
  "type": "job",
  "version": "2.0"
}
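The srcPrimaryKey and dstPrimaryKey columns link an edge to existing vertices only if the start and end vertex IDs are converted in the same way as when the vertices were imported. The following Python sketch illustrates that requirement. It assumes that labelPrefix prepends the label to the source value with no separator; confirm the actual ID format in your GDB instance before you rely on it. The function names trans_id and edge_ids are hypothetical. The sketch also assumes that source columns 0, 1, and 2 hold the edge, start vertex, and end vertex keys, as in the preceding example.

```python
from typing import Any


def trans_id(raw_id: str, label: str, rule: str) -> str:
    """Apply an ID conversion rule; the labelPrefix format here is an assumption."""
    if rule == "none":
        return raw_id  # The value is used without conversion.
    if rule == "labelPrefix":
        return f"{label}{raw_id}"  # Assumed: label prepended with no separator.
    raise ValueError(f"unknown ID conversion rule: {rule}")


def edge_ids(writer: dict[str, Any], row: list[Any]) -> tuple[str, str, str]:
    """Return (edge ID, start vertex ID, end vertex ID) for one source record."""
    edge_id = trans_id(str(row[0]), writer["label"], writer["idTransRule"])
    src_id = trans_id(str(row[1]), writer["srcLabel"], writer["srcIdTransRule"])
    dst_id = trans_id(str(row[2]), writer["dstLabel"], writer["dstIdTransRule"])
    return edge_id, src_id, dst_id
```

With the sample settings, the source value p1 maps to the same ID in two cases. The first is importing it as a person vertex with idTransRule set to labelPrefix. The second is referencing it as the start vertex of a use edge with srcIdTransRule set to labelPrefix. Under the assumed format, that ID is personp1. If srcIdTransRule were none instead, the edge would reference p1, which no imported vertex has.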
Parameters in code for GDB Writer
Parameter | Description | Required | Default value |
datasource | The name of the data source. It must be the same as the name of the added data source. You can add data sources by using the code editor. | Yes | No default value |
label | The label, which is the name of the vertex or edge. GDB Writer can obtain labels from columns in the source table. For example, if you set this parameter to #{0}, GDB Writer uses the value of the first column as the label. The column index starts from 0. | Yes | No default value |
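labelType | The type of the label. Valid values: VERTEX (vertex) and EDGE (edge). | Yes | No default value |
srcLabel | The label of the start vertex of the edge. You do not need to configure this parameter if the labelType parameter is set to VERTEX. | No | No default value |
dstLabel | The label of the end vertex of the edge. You do not need to configure this parameter if the labelType parameter is set to VERTEX. | No | No default value |
writeMode | The mode in which GDB Writer processes data records with duplicate primary keys. The sample code uses INSERT. | Yes | INSERT |
idTransRule | The rule for converting the primary key. The sample code uses labelPrefix. The value none indicates that the primary key is not converted. | Yes | none |
srcIdTransRule | The rule for converting the primary key of the start vertex when the labelType parameter is set to EDGE. The sample code uses labelPrefix. | Required when the labelType parameter is set to EDGE | none |
dstIdTransRule | The rule for converting the primary key of the end vertex when the labelType parameter is set to EDGE. The sample code uses labelPrefix. | Required when the labelType parameter is set to EDGE | none |
column | The properties of the vertices or edges that you want to synchronize. Each entry specifies name, value, type, and columnType. For samples, see the column parameter in the preceding GDB Writer code. | Yes | No default value |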
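The following Python sketch checks a GDB Writer parameter object against the preceding table and resolves a label such as #{0} from a source record. It is a local sanity check under stated assumptions, not the validation that DataWorks performs. It applies the default values INSERT and none listed in the table for writeMode and idTransRule. It requires srcIdTransRule and dstIdTransRule to be set explicitly when labelType is EDGE. The names check_writer_parameter and resolve_label are hypothetical.

```python
import re
from typing import Any

# Matches placeholders such as #{0}; the number is a 0-based source column index.
PLACEHOLDER = re.compile(r"#\{(\d+)\}")

# Default values listed in the GDB Writer parameter table.
DEFAULTS = {"writeMode": "INSERT", "idTransRule": "none"}


def check_writer_parameter(param: dict[str, Any]) -> dict[str, Any]:
    """Return the parameters with table defaults applied, or raise ValueError."""
    for key in ("datasource", "label", "labelType", "column"):
        if key not in param:
            raise ValueError(f"missing required parameter: {key}")
    if param["labelType"] not in ("VERTEX", "EDGE"):
        raise ValueError(f"labelType must be VERTEX or EDGE, got {param['labelType']!r}")
    if param["labelType"] == "EDGE":
        # This sketch treats the EDGE-only rules as mandatory when writing edges.
        for key in ("srcIdTransRule", "dstIdTransRule"):
            if key not in param:
                raise ValueError(f"{key} is required when labelType is EDGE")
    return {**DEFAULTS, **param}


def resolve_label(label: str, row: list[Any]) -> str:
    """Turn a label such as #{0} into that source column's value; literal labels pass through."""
    return PLACEHOLDER.sub(lambda m: str(row[int(m.group(1))]), label)
```

For example, a vertex writer whose label is #{0} writes each source record ["person", "p1", ...] under the label person.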