A Graph Database (GDB) data source lets you read data from and write data to GDB. This topic describes the data synchronization capabilities that DataWorks provides for GDB.
Prerequisites
Before you develop a synchronization task, make sure you have:
Access to a serverless resource group (recommended) or an exclusive resource group for Data Integration. Due to network restrictions, Data Integration tasks for GDB must run in one of these resource groups. Before you start, purchase the resource group and associate it with the virtual private cloud (VPC) where your GDB instance resides.
GDB added as a data source in DataWorks. See Add a data source.
Limits
Offline read
Configure separate tasks for vertices and edges. Each export task traverses data based on the label names of the vertices or edges being exported.
Primary key ID fields for both vertices and edges are of the STRING type. If you configure a numeric type such as LONG, GDB Reader attempts a type conversion. A failed conversion causes the record to be lost.
Property values must match the storage class. If they don't match, GDB Reader attempts a type conversion. A failed conversion may cause the record to be lost.
When exporting a SET property value from a vertex, the same value is not guaranteed to be exported each time.
When all properties are exported in JSON format, a SET property with only one value is output as a regular property.
Field names and enumeration values in the examples are case-sensitive unless otherwise specified.
The GDB server supports UTF-8 encoding only. All exported data is in UTF-8 format.
GDB must be upgraded to version 1.0.20 or later to support SET properties. Verify the instance version before using SET properties.
Offline write
Run the vertex sync task first. After it completes successfully, run the edge sync task.
Field names and enumeration values in the examples are case-sensitive unless otherwise specified.
The GDB server supports UTF-8 encoding only. Source data must also be in UTF-8 format.
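The vertex-before-edge ordering above suggests that an edge should only be written once both of its endpoint vertices exist. The following Python sketch shows one way to check this before submitting the edge task. It assumes the vertex IDs and edge endpoints have already been loaded into memory; `find_dangling_edges` and the record shape are hypothetical and not part of DataWorks or GDB.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical edge record: (edge_id, src_vertex_id, dst_vertex_id).
Edge = Tuple[str, str, str]


def find_dangling_edges(vertex_ids: Iterable[object], edges: Iterable[Edge]) -> List[Edge]:
    """Return edges whose source or destination vertex ID is not in vertex_ids."""
    # Primary key IDs are STRING in GDB, so compare everything as strings.
    known: Set[str] = {str(v) for v in vertex_ids}
    return [e for e in edges if str(e[1]) not in known or str(e[2]) not in known]


vertices = ["person-1", "software-1"]
edges = [
    ("use-1", "person-1", "software-1"),
    ("use-2", "person-2", "software-1"),  # person-2 was never written as a vertex
]
dangling = find_dangling_edges(vertices, edges)
print(dangling)
```

An empty result means every edge in the batch points at vertices that are present in the vertex data set.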
Vertex constraints
Constraint | Details |
Type name | Required. A vertex must have a type name (vertex name) that corresponds to the |
Primary key ID | Required. Must be unique among all vertices and must be of the STRING type. GDB Writer force-converts non-STRING types. |
| If set to |
Edge constraints
Constraint | Details |
Type name | Required. An edge must have a type name (edge name) that corresponds to the |
Primary key ID | Optional. If specified, it must be globally unique across all edges. If not specified, the GDB server generates a UUID. The type must be STRING; GDB Writer force-converts non-STRING types. |
| If set to |
| Required. Must be consistent with the |
Add a data source
Before developing a synchronization task, add GDB as a data source in DataWorks. Follow the instructions in Data source management. Parameter descriptions are available in the DataWorks console when you add the data source.
Develop a data synchronization task
For the entry point and configuration procedure, see the following guides.
Configuration guide for an offline sync task for a single table
For parameters and a script demo using the code editor, see Appendix: Script demo and parameter description.
Appendix: Script demo and parameter description
Configure a batch synchronization task using the code editor
To configure a batch synchronization task using the code editor, configure the parameters in your script following the unified script format. For more information, see Configure a task in the code editor. The following sections describe the parameters required for GDB data sources.
Reader script demo
GDB Reader exports vertices and edges using separate tasks. All examples use the labelType parameter to specify whether the task targets vertices (VERTEX) or edges (EDGE).
Vertex configuration example
{
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  },
  "setting": {
    "errorLimit": {
      "record": "100"
    },
    "jvmOption": "",
    "speed": {
      "concurrent": 3,
      "throttle": true,
      "mbps": "12"
    }
  },
  "steps": [
    {
      "category": "reader",
      "name": "Reader",
      "parameter": {
        "host": "gdb-xxxxxx.aliyuncs.com",
        "port": 8182,
        "username": "gdb",
        "password": "gdb",
        "labelType": "VERTEX",
        "labels": ["label1", "label2"],
        "column": [
          {
            "name": "id",
            "type": "string",
            "columnType": "primaryKey"
          },
          {
            "name": "label",
            "type": "string",
            "columnType": "primaryLabel"
          },
          {
            "name": "age",
            "type": "int",
            "columnType": "vertexProperty"
          }
        ]
      },
      "stepType": "gdb"
    },
    {
      "category": "writer",
      "name": "Writer",
      "parameter": {
        "print": true
      },
      "stepType": "stream"
    }
  ]
}
Edge configuration example
{
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  },
  "setting": {
    "errorLimit": {
      "record": "100"
    },
    "jvmOption": "",
    "speed": {
      "concurrent": 3,
      "throttle": true,
      "mbps": "12"
    }
  },
  "steps": [
    {
      "category": "reader",
      "name": "Reader",
      "parameter": {
        "host": "gdb-xxxxxx.aliyuncs.com",
        "port": 8182,
        "username": "gdb",
        "password": "gdb",
        "labelType": "EDGE",
        "labels": ["label1", "label2"],
        "column": [
          {
            "name": "id",
            "type": "string",
            "columnType": "primaryKey"
          },
          {
            "name": "label",
            "type": "string",
            "columnType": "primaryLabel"
          },
          {
            "name": "srcId",
            "type": "string",
            "columnType": "srcPrimaryKey"
          },
          {
            "name": "srcLabel",
            "type": "string",
            "columnType": "srcPrimaryLabel"
          },
          {
            "name": "dstId",
            "type": "string",
            "columnType": "dstPrimaryKey"
          },
          {
            "name": "dstLabel",
            "type": "string",
            "columnType": "dstPrimaryLabel"
          },
          {
            "name": "weight",
            "type": "double",
            "columnType": "edgeProperty"
          }
        ]
      },
      "stepType": "gdb"
    },
    {
      "category": "writer",
      "name": "Writer",
      "parameter": {
        "print": true
      },
      "stepType": "stream"
    }
  ]
}
Reader script parameters
Parameter | Description | Required | Default |
`host` | The endpoint of the GDB instance. In the Graph Database console, click Manage next to the instance to view the Internal Endpoint. | Yes | None |
`port` | The port used to connect to the GDB instance. | Yes | |
`username` | The account name for the GDB instance. | Yes | None |
`password` | The password for the GDB instance account. | Yes | None |
`labels` | The label names to read. Accepts an array, for example, `["label1", "label2"]`. | Yes | None |
`labelType` | The type of data to read: `VERTEX` for vertices or `EDGE` for edges. | Yes | None |
`column` | The field mapping configuration for the vertex or edge. | Yes | None |
`name` (in `column`) | The field name for the vertex or edge. For properties, provide the property name. | Yes | None |
`type` (in `column`) | The type of the field value. Supported types for regular properties: | Yes | None |
`columnType` (in `column`) | The role of the field. See the table below for supported values. | Yes | None |
Supported `columnType` values
Value | Applies to | Description |
`primaryKey` | Vertices and edges | The primary key ID. |
`primaryLabel` | Vertices and edges | The label name. |
`vertexProperty` | Vertices (`VERTEX`) | A basic-type property of the vertex. |
`vertexJsonProperty` | Vertices (`VERTEX`) | All vertex properties packed into a single JSON column. Cannot be combined with other property types in the same `column` configuration. |
`srcPrimaryKey` | Edges (`EDGE`) | The primary key ID of the source vertex. |
`dstPrimaryKey` | Edges (`EDGE`) | The primary key ID of the destination vertex. |
`srcPrimaryLabel` | Edges (`EDGE`) | The label name of the source vertex. |
`dstPrimaryLabel` | Edges (`EDGE`) | The label name of the destination vertex. |
`edgeProperty` | Edges (`EDGE`) | A property of the edge. |
`edgeJsonProperty` | Edges (`EDGE`) | All edge properties packed into a single JSON column. Cannot be combined with other property types in the same `column` configuration. |
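The rules in this table can be checked before a task is submitted. The following Python sketch validates a reader `column` list against the two constraints stated here: each `columnType` must apply to the task's `labelType`, and a JSON property column must not be mixed with regular property columns. `check_reader_columns` is a hypothetical helper, not a DataWorks API.

```python
from typing import Dict, List

VERTEX_ONLY = {"vertexProperty", "vertexJsonProperty"}
EDGE_ONLY = {
    "srcPrimaryKey", "dstPrimaryKey", "srcPrimaryLabel",
    "dstPrimaryLabel", "edgeProperty", "edgeJsonProperty",
}
JSON_ROLES = {"vertexJsonProperty", "edgeJsonProperty"}
PLAIN_ROLES = {"vertexProperty", "edgeProperty"}


def check_reader_columns(label_type: str, columns: List[Dict[str, str]]) -> List[str]:
    """Return a list of problems found in the column list (empty if none)."""
    if label_type not in ("VERTEX", "EDGE"):
        return [f"unknown labelType {label_type}"]
    problems = []
    roles = [c["columnType"] for c in columns]
    not_allowed = EDGE_ONLY if label_type == "VERTEX" else VERTEX_ONLY
    for role in roles:
        if role in not_allowed:
            problems.append(f"{role} does not apply to labelType {label_type}")
    # A JSON property column already carries all properties of the element.
    if any(r in JSON_ROLES for r in roles) and any(r in PLAIN_ROLES for r in roles):
        problems.append("JSON property column mixed with regular property columns")
    return problems


vertex_columns = [
    {"name": "id", "type": "string", "columnType": "primaryKey"},
    {"name": "label", "type": "string", "columnType": "primaryLabel"},
    {"name": "age", "type": "int", "columnType": "vertexProperty"},
]
print(check_reader_columns("VERTEX", vertex_columns))
```

The column list from the vertex example passes with no problems; adding a `vertexJsonProperty` column next to `age` would be reported.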
`vertexJsonProperty` format
{
  "properties": [
    {"k": "name", "t": "string", "v": "tom", "c": "set"},
    {"k": "name", "t": "string", "v": "jack", "c": "set"},
    {"k": "sex", "t": "string", "v": "male", "c": "single"}
  ]
}
The `name` property above is multi-valued (two values). If a multi-valued property in GDB contains only one value, it is exported as a single-valued property.
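A consumer of this column usually needs the entries grouped by property name. The following Python sketch parses the exported JSON and collects entries marked `"c": "set"` into lists. `group_properties` is a hypothetical helper, and the sketch keeps every value as the string found in `v` rather than converting it according to `t`.

```python
import json
from typing import Dict, List, Union


def group_properties(raw: str) -> Dict[str, Union[str, List[str]]]:
    """Group k/v entries by property name; "set" entries become lists."""
    result: Dict[str, Union[str, List[str]]] = {}
    for prop in json.loads(raw)["properties"]:
        key, value = prop["k"], prop["v"]
        if prop.get("c") == "set":
            result.setdefault(key, []).append(value)  # multi-valued property
        else:
            result[key] = value  # single-valued property
    return result


exported = """
{"properties": [
  {"k": "name", "t": "string", "v": "tom", "c": "set"},
  {"k": "name", "t": "string", "v": "jack", "c": "set"},
  {"k": "sex", "t": "string", "v": "male", "c": "single"}
]}
"""
grouped = group_properties(exported)
print(grouped)  # {'name': ['tom', 'jack'], 'sex': 'male'}
```

Because a SET property holding one value is exported as single-valued, downstream code should accept either a string or a list for the same property name.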
`edgeJsonProperty` format
{
  "properties": [
    {"k": "name", "t": "string", "v": "tom"},
    {"k": "sex", "t": "string", "v": "male"}
  ]
}
Writer script demo
Vertex configuration example
{
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  },
  "setting": {
    "errorLimit": {
      "record": "100"
    },
    "speed": {
      "throttle": true,
      "concurrent": 3,
      "mbps": "12"
    }
  },
  "steps": [
    {
      "category": "reader",
      "name": "Reader",
      "parameter": {
        "column": ["*"],
        "datasource": "_ODPS",
        "emptyAsNull": true,
        "guid": "",
        "isCompress": false,
        "partition": [],
        "table": ""
      },
      "stepType": "odps"
    },
    {
      "category": "writer",
      "name": "Writer",
      "parameter": {
        "datasource": "testGDB",
        "label": "person",
        "srcLabel": "",
        "dstLabel": "",
        "labelType": "VERTEX",
        "writeMode": "INSERT",
        "idTransRule": "labelPrefix",
        "srcIdTransRule": "none",
        "dstIdTransRule": "none",
        "column": [
          {
            "name": "id",
            "value": "#{0}",
            "type": "string",
            "columnType": "primaryKey"
          },
          {
            "name": "person_age",
            "value": "#{1}",
            "type": "int",
            "columnType": "vertexProperty"
          },
          {
            "name": "person_credit",
            "value": "#{2}",
            "type": "string",
            "columnType": "vertexProperty"
          }
        ]
      },
      "stepType": "gdb"
    }
  ],
  "type": "job",
  "version": "2.0"
}
Edge configuration example
{
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  },
  "setting": {
    "errorLimit": {
      "record": "100"
    },
    "jvmOption": "",
    "speed": {
      "throttle": true,
      "concurrent": 3,
      "mbps": "12"
    }
  },
  "steps": [
    {
      "category": "reader",
      "name": "Reader",
      "parameter": {
        "column": ["*"],
        "datasource": "_ODPS",
        "emptyAsNull": true,
        "guid": "",
        "isCompress": false,
        "partition": [],
        "table": ""
      },
      "stepType": "odps"
    },
    {
      "category": "writer",
      "name": "Writer",
      "parameter": {
        "datasource": "testGDB",
        "label": "use",
        "labelType": "EDGE",
        "srcLabel": "person",
        "dstLabel": "software",
        "writeMode": "INSERT",
        "idTransRule": "labelPrefix",
        "srcIdTransRule": "labelPrefix",
        "dstIdTransRule": "labelPrefix",
        "column": [
          {
            "name": "id",
            "value": "#{0}",
            "type": "string",
            "columnType": "primaryKey"
          },
          {
            "name": "id",
            "value": "#{1}",
            "type": "string",
            "columnType": "srcPrimaryKey"
          },
          {
            "name": "id",
            "value": "#{2}",
            "type": "string",
            "columnType": "dstPrimaryKey"
          },
          {
            "name": "person_use_software_time",
            "value": "#{3}",
            "type": "long",
            "columnType": "edgeProperty"
          },
          {
            "name": "person_regist_software_name",
            "value": "#{4}",
            "type": "string",
            "columnType": "edgeProperty"
          },
          {
            "name": "id",
            "value": "#{5}",
            "type": "long",
            "columnType": "edgeProperty"
          }
        ]
      },
      "stepType": "gdb"
    }
  ],
  "type": "job",
  "version": "2.0"
}
Writer script parameters
Parameter | Description | Required | Default |
`datasource` | The data source name. Must match the name of the data source added in DataWorks. | Yes | None |
`label` | The type name (vertex or edge name). Can be read from a source column using a `#{N}` placeholder. | Yes | None |
`labelType` | The type of the label: `VERTEX` for vertices or `EDGE` for edges. | Yes | None |
`srcLabel` | The source vertex name. Required when | No | None |
`dstLabel` | The destination vertex name. Required when | No | None |
`writeMode` | How to handle duplicate IDs during import. | Yes | |
`idTransRule` | The transform rule for the primary key ID. | Yes | |
`srcIdTransRule` | The transform rule for the source vertex primary key ID. | Required if | |
`dstIdTransRule` | The transform rule for the destination vertex primary key ID. | Required if | |
`column` | The field mapping configuration for vertices or edges. See the field descriptions below. | Yes | None |
`column` field descriptions
Field | Description |
`name` | The field name of the vertex or edge. |
`value` | The mapped value. |
`type` | The type of the mapped value. The primary key ID accepts STRING only; GDB Writer force-converts other types. Regular properties support: |
`columnType` | The role of the mapped field. See the table below for supported values. |
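In the Writer examples, every `value` is a placeholder such as `#{0}`. The following Python sketch shows how such a placeholder could be resolved against one source record. It assumes that `#{N}` stands for the N-th source column counted from 0, which is what the examples suggest; `resolve_value` is a hypothetical helper and not the GDB Writer implementation.

```python
import re
from typing import Sequence

# Matches placeholders of the form #{N}, where N is a non-negative integer.
_PLACEHOLDER = re.compile(r"#\{(\d+)\}")


def resolve_value(template: str, record: Sequence[object]) -> str:
    """Replace every #{N} in template with str(record[N])."""
    return _PLACEHOLDER.sub(lambda m: str(record[int(m.group(1))]), template)


# One hypothetical source row: id, age, credit.
record = ["1001", 29, "good"]
print(resolve_value("#{0}", record))  # 1001
print(resolve_value("#{1}", record))  # 29
```

An index beyond the end of the record raises `IndexError` in this sketch, which makes a mismatch between the mapping and the source columns visible early.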
Supported `columnType` values for Writer
Value | Applies to | Description |
`primaryKey` | Vertices and edges | The primary key ID. Required for vertices; optional for edges. |
`vertexProperty` | Vertices (`VERTEX`) | A regular property of the vertex. |
`vertexJsonProperty` | Vertices (`VERTEX`) | A JSON property of the vertex. For the value structure, see the `properties` example. |
`srcPrimaryKey` | Edges (`EDGE`) | The primary key ID of the source vertex. |
`dstPrimaryKey` | Edges (`EDGE`) | The primary key ID of the destination vertex. |
`edgeProperty` | Edges (`EDGE`) | A regular property of the edge. |
`edgeJsonProperty` | Edges (`EDGE`) | A JSON property of the edge. For the value structure, see the `properties` example. |
`properties` example
{
  "properties": [
    {"k": "name", "t": "string", "v": "tom"},
    {"k": "age", "t": "int", "v": "20"},
    {"k": "sex", "t": "string", "v": "male"}
  ]
}
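When the source data holds properties as ordinary key-value pairs, the structure above has to be produced before the write. The following Python sketch builds it from a dictionary. `to_properties_json` is a hypothetical helper; the mapping of Python types to the `t` tags `string`, `int`, and `double` is an assumption based on the type names that appear in the examples in this topic.

```python
import json
from typing import Dict, Union

Value = Union[str, int, float]


def to_properties_json(props: Dict[str, Value]) -> str:
    """Serialize a dict into the {"properties": [{"k", "t", "v"}, ...]} form."""
    entries = []
    for key, value in props.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool):
            raise TypeError(f"unsupported type for {key!r}: bool")
        if isinstance(value, str):
            tag = "string"
        elif isinstance(value, int):
            tag = "int"
        elif isinstance(value, float):
            tag = "double"
        else:
            raise TypeError(f"unsupported type for {key!r}: {type(value).__name__}")
        # Values are written as strings, as in the example above ("v": "20").
        entries.append({"k": key, "t": tag, "v": str(value)})
    return json.dumps({"properties": entries})


payload = to_properties_json({"name": "tom", "age": 20, "sex": "male"})
print(payload)
```

Raising on unsupported types keeps the helper from silently emitting a `t` tag that the target may not accept.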