DataWorks:GDB data source

Last Updated:Nov 15, 2023

DataWorks provides GDB Reader and GDB Writer for reading data from and writing data to Graph Database (GDB) data sources. This topic describes the capabilities of synchronizing data from or to GDB data sources.

Limits

Batch data read

  • You must configure two synchronization tasks to synchronize vertex data and edge data separately.

  • The vertices or edges to be synchronized must have labels (names) so that DataWorks can traverse them and obtain the related data.

  • The primary key values of vertices and edges in GDB are of the STRING type, so the type of the primary key column to be synchronized must be set to STRING. If you set a numeric type, such as LONG, GDB Reader forcibly converts the primary key values to that type. Primary key values that cannot be converted are lost.

  • The data type configured for a vertex or edge property must be the same as the original data type of the property in the GDB instance. Otherwise, GDB Reader forcibly converts the property values to the configured data type, and values that cannot be converted are lost.

  • If you run a task that synchronizes vertex data multiple times, the values obtained for a SET property may differ between runs.

  • If you read all properties in the JSON format by using vertexJsonProperty, a SET property that contains only one value is treated as a single-value property.

  • Unless otherwise specified, field names and enumerated values in this topic are case-sensitive.

  • GDB Reader supports only the UTF-8 encoding format. The synchronized data must be encoded in UTF-8.

  • Only GDB 1.0.20 or later supports the SET property. Confirm the GDB version before you use the SET property.

Batch data write

  • You must run the task that synchronizes vertex data before you run the task that synchronizes edge data.

  • Limits on vertices:

    • A vertex must have a name, which is specified by the label parameter.

    • A vertex must have a unique primary key of the STRING type. If the primary key is not a string, GDB Writer forcibly converts the primary key into a string.

    • Exercise caution when you configure the idTransRule parameter. If you set this parameter to none, you must make sure that the primary key of each vertex is unique among all vertices, including vertices that have different labels.

  • Limits on edges:

    • An edge must have a name, which is specified by the label parameter.

    • A primary key is optional for an edge.

      • If you specify a primary key for an edge, you must make sure that the primary key is unique among all edges. If the primary key is not a string, GDB Writer forcibly converts it into a string.

      • If you do not specify a primary key for an edge, GDB Writer automatically generates a universally unique identifier (UUID) of the STRING type as the primary key of the edge.

    • Exercise caution when you configure the idTransRule parameter. If you set this parameter to none, you must make sure that the primary key of each edge is unique among all edges, including edges that have different labels.

    • The srcIdTransRule and dstIdTransRule parameters are required for an edge. Their values must be the same as the idTransRule values that were used to write the start vertices and the end vertices, respectively.

  • Unless otherwise specified, field names and enumerated values in this topic are case-sensitive.

  • GDB Writer supports only the UTF-8 encoding format. Source data must be encoded in UTF-8.

  • Due to network constraints, synchronization tasks that write data to GDB can run only on exclusive resource groups for Data Integration. You must purchase an exclusive resource group for Data Integration in advance and associate it with the virtual private cloud (VPC) in which the GDB instance resides.
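The idTransRule limits above are about ID uniqueness. The following Python sketch is a hypothetical pre-check that is not part of DataWorks. It assumes that labelPrefix produces IDs in the documented {label}-{column in source} form with a literal hyphen. It shows how source keys that are unique within each label can still collide when idTransRule is set to none. The functions transform_id and find_id_collisions are illustrative names only.

```python
from collections import Counter


def transform_id(label: str, raw_id, rule: str) -> str:
    """Convert a source primary key the way idTransRule is assumed to work.

    Assumption: labelPrefix yields "{label}-{value}"; none keeps the value,
    forcibly converted to a string.
    """
    if rule == "labelPrefix":
        return f"{label}-{raw_id}"
    if rule == "none":
        return str(raw_id)
    raise ValueError(f"unknown idTransRule: {rule}")


def find_id_collisions(records, rule: str) -> list:
    """Return converted IDs that occur more than once across ALL labels."""
    counts = Counter(transform_id(label, raw_id, rule) for label, raw_id in records)
    return sorted(vid for vid, n in counts.items() if n > 1)


# Hypothetical (label, primary key) pairs from two source tables.
# Each key is unique within its own label.
vertices = [("person", 1), ("person", 2), ("software", 1)]

print(find_id_collisions(vertices, "none"))         # ['1']
print(find_id_collisions(vertices, "labelPrefix"))  # []
```

With none, person 1 and software 1 both become the ID "1". With labelPrefix, they become person-1 and software-1 and do not collide.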

Develop a data synchronization task

For the entry point and the procedure for configuring a data synchronization task, see the following sections. For parameter settings, see the infotip of each parameter on the configuration tab of the task.

Add a data source

Before you configure a data synchronization task to synchronize data from or to a specific data source, you must add the data source to DataWorks. For more information, see Add and manage data sources.

Configure a batch synchronization task to synchronize data of a single table

Appendix: Code and parameters

Appendix: Configure a batch synchronization task by using the code editor

If you use the code editor to configure a batch synchronization task, you must configure the reader and writer parameters of the related data source in the format that the code editor requires. For more information about the format requirements, see Configure a batch synchronization task by using the code editor. The following sections describe the reader and writer parameters in the code editor.

Code for GDB Reader

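The following examples configure two synchronization tasks that read data from a GDB instance. One task reads vertex data and the other reads edge data. Both tasks use a Stream writer that prints the data that is read.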

  • Configure a synchronization task to read data about vertices from a GDB instance

    {
        "order":{
            "hops":[
                {
                    "from":"Reader",
                    "to":"Writer"
                }
            ]
        },
        "setting":{
            "errorLimit":{
                "record":"100"  // The maximum number of dirty data records allowed. 
            },
            "jvmOption":"",
            "speed":{
                "concurrent":3,
                "throttle":true,// Specifies whether to enable throttling. The value false indicates that throttling is disabled, and the value true indicates that throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true. 
                "mbps":"12"// The maximum transmission rate. Unit: MB/s. 
            }
        },
        "steps":[
            {
                "category":"reader",
                "name":"Reader",
                "parameter":{
                    "host": "gdb-xxxxxx.aliyuncs.com", // The endpoint that is used to connect to the GDB instance. 
                    "port": 8182, // The port number that is used to connect to the GDB instance. 
                    "username": "gdb", // The username that is used to connect to the GDB instance. 
                    "password": "gdb", // The password that is used to connect to the GDB instance. 
                    "labelType": "VERTEX", // The type of the label. The value VERTEX indicates a vertex. 
                    "labels": ["label1", "label2"],  // The labels of the vertices to be synchronized. If this parameter is left empty, all vertices are synchronized. 
                    "column": [
                        {
                            "name": "id",               // The name of the vertex property. 
                            "type": "string",           // The data type for storing the data to be synchronized. 
                            "columnType": "primaryKey"  // The category of the vertex property. The value primaryKey indicates that the synchronized data is the primary key of the vertex and is of the STRING type in the GDB instance. 
                        },
                        {
                            "name": "label",              // The name of the vertex property. 
                            "type": "string",             // The data type for storing the data to be synchronized. 
                            "columnType": "primaryLabel"  // The category of the vertex property. The value primaryLabel indicates that the synchronized data is the label of the vertex and is of the STRING type in the GDB instance. 
                        },
                        {
                            "name": "age",                   // The name of the vertex property. 
                            "type": "int",                   // The data type for storing the data to be synchronized. 
                            "columnType": "vertexProperty"   // The category of the vertex property. The value vertexProperty indicates a common vertex property. 
                        }
                    ]
                },
                "stepType":"gdb"
            },
            {
                "category":"writer",
                "name":"Writer",
                "parameter":{
                    "print": true
                },
                "stepType":"stream"
            }
        ]
    }
  • Configure a synchronization task to read data about edges from a GDB instance

    {
        "order":{
            "hops":[
                {
                    "from":"Reader",
                    "to":"Writer"
                }
            ]
        },
        "setting":{
            "errorLimit":{
                "record":"100"  // The maximum number of dirty data records allowed. 
            },
            "jvmOption":"",
            "speed":{
                "concurrent":3,
                "throttle":true,// Specifies whether to enable throttling. The value false indicates that throttling is disabled, and the value true indicates that throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true. 
                "mbps":"12"// The maximum transmission rate. Unit: MB/s. 
            }
        },
        "steps":[
            {
                "category":"reader",
                "name":"Reader",
                "parameter":{
                    "host": "gdb-xxxxxx.aliyuncs.com", // The endpoint that is used to connect to the GDB instance. 
                    "port": 8182, // The port number that is used to connect to the GDB instance. 
                    "username": "gdb", // The username that is used to connect to the GDB instance. 
                    "password": "gdb", // The password that is used to connect to the GDB instance. 
                    "labelType": "EDGE", // The type of the label. The value EDGE indicates an edge. 
                    "labels": ["label1", "label2"],  // The labels of the edges to be synchronized. If this parameter is left empty, all edges are synchronized. 
                    "column": [
                        {
                            "name": "id",               // The name of the edge property. 
                            "type": "string",           // The data type for storing the data to be synchronized. 
                            "columnType": "primaryKey"  // The category of the edge property. The value primaryKey indicates that the synchronized data is the primary key of the edge and is of the STRING type in the GDB instance. 
                        },
                        {
                            "name": "label",              // The name of the edge property. 
                            "type": "string",             // The data type for storing the data to be synchronized. 
                            "columnType": "primaryLabel"  // The category of the edge property. The value primaryLabel indicates that the synchronized data is the label of the edge and is of the STRING type in the GDB instance. 
                        },
                        {
                            "name": "srcId",               // The name of the edge property. 
                            "type": "string",              // The data type for storing the data to be synchronized. 
                            "columnType": "srcPrimaryKey"  // The category of the edge property. The value srcPrimaryKey indicates that the synchronized data is the primary key of the start vertex and is of the STRING type in the GDB instance. 
                        },
                        {
                            "name": "srcLabel",               // The name of the edge property. 
                            "type": "string",                 // The data type for storing the data to be synchronized. 
                            "columnType": "srcPrimaryLabel"   // The category of the edge property. The value srcPrimaryLabel indicates that the synchronized data is the label of the start vertex and is of the STRING type in the GDB instance. 
                        },
                        {
                            "name": "dstId",                    // The name of the edge property. 
                            "type": "string",                   // The data type for storing the data to be synchronized. 
                            "columnType": "dstPrimaryKey"       // The category of the edge property. The value dstPrimaryKey indicates that the synchronized data is the primary key of the end vertex and is of the STRING type in the GDB instance. 
                        },
                        {
                            "name": "dstLabel",                 // The name of the edge property. 
                            "type": "string",                   // The data type for storing the data to be synchronized. 
                            "columnType": "dstPrimaryLabel"     // The category of the edge property. The value dstPrimaryLabel indicates that the synchronized data is the label of the end vertex and is of the STRING type in the GDB instance. 
                        },
                        {
                            "name": "weight",               // The name of the edge property. 
                            "type": "double",               // The data type for storing the data to be synchronized. 
                            "columnType": "edgeProperty"    // The category of the edge property. The value edgeProperty indicates a common edge property. 
                        }
                    ]
                },
                "stepType":"gdb"
            },
            {
                "category":"writer",
                "name":"Writer",
                "parameter":{
                    "print": true
                },
                "stepType":"stream"
            }
        ]
    }
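You may want to generate job definitions from a script instead of typing them in the code editor. In that case, a helper such as the hypothetical build_gdb_reader_job below could assemble the same structure as the preceding examples. This is a sketch that assumes the fields and values shown in the examples are sufficient for your task. Standard JSON has no comment syntax, so json.dumps emits the configuration without the // annotations.

```python
import json


def build_gdb_reader_job(host, username, password, label_type, labels, columns,
                         port=8182, concurrent=3, mbps="12"):
    """Assemble a GDB read job shaped like the examples above (hypothetical helper).

    columns is a list of (name, type, columnType) tuples.
    """
    if label_type not in ("VERTEX", "EDGE"):
        raise ValueError("labelType must be VERTEX or EDGE")
    return {
        "order": {"hops": [{"from": "Reader", "to": "Writer"}]},
        "setting": {
            "errorLimit": {"record": "100"},  # maximum number of dirty data records
            "jvmOption": "",
            "speed": {"concurrent": concurrent, "throttle": True, "mbps": mbps},
        },
        "steps": [
            {
                "category": "reader",
                "name": "Reader",
                "parameter": {
                    "host": host,
                    "port": port,
                    "username": username,
                    "password": password,
                    "labelType": label_type,
                    "labels": list(labels),
                    "column": [
                        {"name": name, "type": type_, "columnType": column_type}
                        for name, type_, column_type in columns
                    ],
                },
                "stepType": "gdb",
            },
            # Stream writer that prints the records, as in the examples.
            {"category": "writer", "name": "Writer",
             "parameter": {"print": True}, "stepType": "stream"},
        ],
    }


vertex_job = build_gdb_reader_job(
    host="gdb-xxxxxx.aliyuncs.com",
    username="gdb",
    password="gdb",
    label_type="VERTEX",
    labels=["label1", "label2"],
    columns=[
        ("id", "string", "primaryKey"),
        ("label", "string", "primaryLabel"),
        ("age", "int", "vertexProperty"),
    ],
)
# Standard JSON has no comment syntax, so the output carries no // annotations.
print(json.dumps(vertex_job, indent=2))
```

For an edge read task, pass label_type="EDGE" and add the srcPrimaryKey, srcPrimaryLabel, dstPrimaryKey, and dstPrimaryLabel columns shown in the second example.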

Parameters in code for GDB Reader

Parameter

Description

Required

Default value

host

The endpoint that is used to connect to the GDB instance. To obtain it, log on to the GDB console, find the instance, and click View Instance Details in the Actions column. Then, view the Intranet URL.

Yes

No default value

port

The port number that is used to connect to the GDB instance.

Yes

8182

username

The username that is used to connect to the GDB instance.

Yes

No default value

password

The password that is used to connect to the GDB instance.

Yes

No default value

labels

The label, which is the name of the vertex or edge. GDB Reader can read data from multiple vertices or edges at a time. In this case, the value of this parameter is an array, such as ["label1", "label2"].

Yes

No default value

labelType

The type of the label. Valid values:

  • VERTEX

  • EDGE

Yes

No default value

column

The vertex or edge properties to be synchronized. Each element describes one property by using the name, type, and columnType fields.

Yes

No default value

column -> name

The name of the vertex or edge property to be synchronized. This parameter is required if vertex or edge properties are to be synchronized.

Yes

No default value

column -> type

The data type for storing the vertex or edge property to be synchronized.

  • The primary key and label can only be of the STRING type. If you do not set the data type to STRING, data conversion fails.

  • Other properties can be of the INT, LONG, FLOAT, DOUBLE, BOOLEAN, or STRING type.

  • GDB Reader forcibly converts the obtained data to the specified type. If the conversion fails, the data record is lost.

Yes

No default value
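The forced conversion described for column -> type can be modeled roughly as follows. This is a minimal sketch of the documented behavior, not GDB Reader's implementation. The hypothetical coerce function returns None to represent a lost value. It does not model whether GDB Reader drops only the value or the whole record. Its string-parsing rules are Python's, which may differ from GDB Reader's.

```python
def _to_bool(value):
    """Accept real booleans or the strings "true"/"false" (case-insensitive)."""
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text not in ("true", "false"):
        raise ValueError(f"not a boolean: {value!r}")
    return text == "true"


# Python ints are unbounded, so INT and LONG both map to int in this sketch,
# and FLOAT and DOUBLE both map to float.
_CONVERTERS = {
    "int": int,
    "long": int,
    "float": float,
    "double": float,
    "boolean": _to_bool,
    "string": str,
}


def coerce(value, type_name: str):
    """Convert value to the configured column type; None stands for a lost value."""
    try:
        return _CONVERTERS[type_name.lower()](value)
    except (ValueError, TypeError):
        return None


print(coerce("42", "int"))        # 42
print(coerce("abc", "LONG"))      # None: conversion fails, value is lost
print(coerce("1.5", "double"))    # 1.5
print(coerce("true", "boolean"))  # True
```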

column -> columnType

The category of the vertex or edge property to be synchronized.

  • For both vertices and edges:

    • primaryKey: the primary key.

    • primaryLabel: the label.

  • For vertices:

    • vertexProperty: a common property of the vertex.

    • vertexJsonProperty: a collection of the properties of the vertex, in the JSON format. If you set the columnType parameter to vertexJsonProperty, all vertex properties are returned in this column, and no other column can contain vertex properties.

      Example of vertexJsonProperty:

      {
          "properties":[
              {"k":"name","t":"string","v":"tom","c":"set"},
              {"k":"name","t":"string","v":"jack","c":"set"},
              {"k":"sex","t":"string","v":"male","c":"single"}
          ]
      }
                                                          

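      In each entry, k is the property name, t is the data type, v is the value, and c indicates whether the property is a multi-value (set) or single-value (single) property. In the preceding example, name is a multi-value property with two values, tom and jack, and sex is a single-value property. Even if sex were defined as a multi-value property in GDB, it would be regarded as a single-value property because it has only one value.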

  • For edges:

    • srcPrimaryKey: the primary key of the start vertex.

    • dstPrimaryKey: the primary key of the end vertex.

    • srcPrimaryLabel: the label of the start vertex.

    • dstPrimaryLabel: the label of the end vertex.

    • edgeProperty: a property of the edge.

    • edgeJsonProperty: a collection of the properties of the edge, in the JSON format. If you set the columnType parameter to edgeJsonProperty, all properties are listed in this column. Other columns cannot contain the property of the edge.

      Example of edgeJsonProperty:

      {
          "properties":[
              {"k":"name","t":"string","v":"tom"},
              {"k":"sex","t":"string","v":"male"}
          ]
      }
                                                          

      An edge does not support multi-value properties or the c field.

Yes

No default value
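A downstream consumer of a vertexJsonProperty or edgeJsonProperty column might group the entries by key, as in the following sketch. The function parse_json_properties is a hypothetical name. It assumes that each entry has the k, t, v, and optional c fields shown in the examples, and it ignores the t field. SET values may differ between runs, so compare the lists as sets if you need a stable comparison.

```python
import json


def parse_json_properties(payload: str) -> dict:
    """Group the entries of a vertexJsonProperty or edgeJsonProperty value by key.

    Entries with "c": "set" are collected into a list. Entries with "c": "single",
    or without a "c" field (as on edges), keep one value. The "t" field is ignored.
    """
    result = {}
    for entry in json.loads(payload)["properties"]:
        key, value = entry["k"], entry["v"]
        if entry.get("c") == "set":
            result.setdefault(key, []).append(value)
        else:
            result[key] = value
    return result


vertex_payload = """{"properties":[
    {"k":"name","t":"string","v":"tom","c":"set"},
    {"k":"name","t":"string","v":"jack","c":"set"},
    {"k":"sex","t":"string","v":"male","c":"single"}
]}"""

edge_payload = '{"properties":[{"k":"name","t":"string","v":"tom"},{"k":"sex","t":"string","v":"male"}]}'

print(parse_json_properties(vertex_payload))  # {'name': ['tom', 'jack'], 'sex': 'male'}
print(parse_json_properties(edge_payload))    # {'name': 'tom', 'sex': 'male'}
```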

Code for GDB Writer

  • Configure a synchronization task to write data about vertices to a GDB database

    {
        "order":{
            "hops":[
                {
                    "from":"Reader",
                    "to":"Writer"
                }
            ]
        },
        "setting":{
            "errorLimit":{
                "record":"100"  // The maximum number of dirty data records allowed. 
            },
            "speed":{
                "throttle":true,// Specifies whether to enable throttling. The value false indicates that throttling is disabled, and the value true indicates that throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true. 
                "concurrent":3, // The maximum number of parallel threads. 
                "mbps":"12"// The maximum transmission rate. Unit: MB/s. 
            }
        },
        "steps":[
            {
                "category":"reader",
                "name":"Reader",
                "parameter":{
                    "column":[
                        "*"
                    ],
                    "datasource":"_ODPS",
                    "emptyAsNull":true,
                    "guid":"",
                    "isCompress":false,
                    "partition":[],
                    "table":""
                },
                "stepType":"odps"
            },
            {
                "category":"writer",
                "name":"Writer",
                "parameter": {
                    "datasource": "testGDB", // The name of the data source. 
                    "label": "person", // The label, which is the name of the vertex. 
                    "srcLabel": "", // You do not need to configure this parameter for a vertex. 
                    "dstLabel": "", // You do not need to configure this parameter for a vertex. 
                    "labelType": "VERTEX", // The type of the label. The value VERTEX indicates a vertex. 
                    "writeMode": "INSERT", // The mode in which GDB Writer processes data records with duplicate primary keys. 
                    "idTransRule": "labelPrefix", // The rule for converting the primary key of a vertex. 
                    "srcIdTransRule": "none", // You do not need to configure this parameter for a vertex. 
                    "dstIdTransRule": "none", // You do not need to configure this parameter for a vertex. 
                    "column": [
                        {
                            "name": "id", // The name of the vertex property. 
                            "value": "#{0}", // The value of the first column in the source is used as the value of the vertex property. If multiple columns are specified, the columns can be concatenated. In this example, 0 is the column index. 
                            "type": "string", // The data type of the vertex property. 
                            "columnType": "primaryKey" // The category of the vertex property. The value primaryKey indicates the primary key. 
                        }, // The primary key of the vertex. The value must be an ID of the STRING type, and the record must exist. 
                        {
                            "name": "person_age",
                            "value": "#{1}", // The value of the second column in the source is used as the value of the vertex property. If multiple columns are specified, the columns can be concatenated. 
                            "type": "int",
                            "columnType": "vertexProperty" // The category of the vertex property. The value vertexProperty indicates a common vertex property. 
                        }, // A common property of the vertex. The value can be of the INT, LONG, FLOAT, DOUBLE, BOOLEAN, or STRING type. 
                        {
                            "name": "person_credit",
                            "value": "#{2}", // The value of the third column in the source is used as the value of the vertex property. If multiple columns are specified, the columns can be concatenated. 
                            "type": "string",
                            "columnType": "vertexProperty"
                        } // A common property of the vertex. 
                    ]
                },
                "stepType":"gdb"
            }
        ],
        "type":"job",
        "version":"2.0"
    }
  • Configure a synchronization task to write data about edges to a GDB database

    {
        "order":{
            "hops":[
                {
                    "from":"Reader",
                    "to":"Writer"
                }
            ]
        },
        "setting":{
            "errorLimit":{
                "record":"100" // The maximum number of dirty data records allowed. 
            },
            "jvmOption":"",
            "speed":{
                "throttle":true,// Specifies whether to enable throttling. The value false indicates that throttling is disabled, and the value true indicates that throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true. 
                "concurrent":3, // The maximum number of parallel threads. 
                "mbps":"12"// The maximum transmission rate. Unit: MB/s. 
            }
        },
        "steps":[
            {
                "category":"reader",
                "name":"Reader",
                "parameter":{
                    "column":[
                        "*"
                    ],
                    "datasource":"_ODPS",
                    "emptyAsNull":true,
                    "guid":"",
                    "isCompress":false,
                    "partition":[],
                    "table":""
                },
                "stepType":"odps"
            },
            {
                "category":"writer",
                "name":"Writer",
                "parameter": {
                    "datasource": "testGDB", // The name of the data source. 
                    "label": "use", // The label, which is the name of the edge. 
                    "labelType": "EDGE", // The type of the label. The value EDGE indicates an edge. 
                    "srcLabel": "person", // The name of the start vertex in the edge. 
                    "dstLabel": "software", // The name of the end vertex in the edge. 
                    "writeMode": "INSERT", // The mode in which GDB Writer processes data records with duplicate primary keys. 
                    "idTransRule": "labelPrefix", // The rule for converting the primary key of the edge. 
                    "srcIdTransRule": "labelPrefix", // The rule for converting the primary key of the start vertex in the edge. 
                    "dstIdTransRule": "labelPrefix", // The rule for converting the primary key of the end vertex in the edge. 
                    "column": [
                        {
                            "name": "id", // The name of the edge property. 
                            "value": "#{0}", // The value of the first column in the source is used as the value of the edge property. If multiple columns are specified, the columns can be concatenated. 
                            "type": "string", // The data type of the edge property. 
                            "columnType": "primaryKey" // The category of the edge property. The value primaryKey indicates the primary key. 
                        }, // The primary key of the edge. The value must be an ID of the STRING type, and the record must exist. 
                        {
                            "name": "id",
                            "value": "#{1}", // The value of the second column in the source is used as the value of the edge property. If multiple columns are specified, the columns can be concatenated. The mapping rule must be the same as that configured when you import the vertex. 
                            "type": "string",
                            "columnType": "srcPrimaryKey" // The category of the edge property. The value srcPrimaryKey indicates the primary key of the start vertex. 
                        }, // The primary key of the start vertex. The value must be an ID of the STRING type, and the record must exist. 
                        {
                            "name": "id",
                            "value": "#{2}", // The value of the third column in the source is used as the value of the edge property. If multiple columns are specified, the columns can be concatenated. The mapping rule must be the same as that configured when you import the vertex. 
                            "type": "string",
                            "columnType": "dstPrimaryKey" // The category of the edge property. The value dstPrimaryKey indicates the primary key of the end vertex. 
                        }, // The primary key of the end vertex. The value must be an ID of the STRING type, and the record must exist. 
                        {
                            "name": "person_use_software_time",
                            "value": "#{3}", // The value of the fourth column in the source is used as the value of the edge property. If multiple columns are specified, the columns can be concatenated. 
                            "type": "long",
                            "columnType": "edgeProperty" // The category of the edge property. The value edgeProperty indicates a common edge property. 
                        }, // A common property of the edge. The value can be of the INT, LONG, FLOAT, DOUBLE, BOOLEAN, or STRING type. 
                        {
                            "name": "person_regist_software_name",
                            "value": "#{4}", // The value of the fifth column in the source is used as the value of the edge property. If multiple columns are specified, the columns can be concatenated. 
                            "type": "string",
                            "columnType": "edgeProperty"
                        }, // A common property of the edge.
                        {
                            "name": "id",
                            "value": "#{5}", // The value of the sixth column in the source is used as the value of the edge property. If multiple columns are specified, the columns can be concatenated. 
                            "type": "long",
                            "columnType": "edgeProperty"
                        } // A common property of the edge. The value is an ID. Different from the primary key, this property is optional. 
                    ]
                },
                "stepType":"gdb"
            }
        ],
        "type":"job",
        "version":"2.0"
    }
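Before you run the edge task, you could sanity-check the writer parameters against the limits listed at the beginning of this topic. First, srcIdTransRule and dstIdTransRule are required. Second, srcLabel and dstLabel may be empty only when the matching rule is none. Third, each rule must match the idTransRule used for the corresponding vertices. The following check_edge_writer function is a hypothetical sketch and not a DataWorks feature. The vertex_rules mapping is bookkeeping that you would maintain yourself.

```python
def check_edge_writer(edge_param: dict, vertex_rules: dict) -> list:
    """Return problems found in an EDGE writer "parameter" block.

    vertex_rules maps each vertex label to the idTransRule that was used when
    vertices with that label were written (hypothetical bookkeeping).
    """
    if edge_param.get("labelType") != "EDGE":
        return ["labelType must be EDGE"]
    problems = []
    for side in ("src", "dst"):
        rule = edge_param.get(f"{side}IdTransRule")
        label = edge_param.get(f"{side}Label", "")
        if rule is None:
            problems.append(f"{side}IdTransRule is required for an edge")
            continue
        # srcLabel/dstLabel may be empty only when the rule is none.
        if rule != "none" and not label:
            problems.append(f"{side}Label is required when {side}IdTransRule is {rule}")
        # The edge-side rule must match the rule used for the vertex.
        if label in vertex_rules and vertex_rules[label] != rule:
            problems.append(
                f"{side}IdTransRule={rule} does not match idTransRule="
                f"{vertex_rules[label]} used for vertex label {label}"
            )
    return problems


edge_param = {
    "label": "use",
    "labelType": "EDGE",
    "srcLabel": "person",
    "dstLabel": "software",
    "idTransRule": "labelPrefix",
    "srcIdTransRule": "labelPrefix",
    "dstIdTransRule": "labelPrefix",
}

print(check_edge_writer(edge_param, {"person": "labelPrefix", "software": "labelPrefix"}))
# []
print(check_edge_writer(edge_param, {"person": "labelPrefix", "software": "none"}))
# ['dstIdTransRule=labelPrefix does not match idTransRule=none used for vertex label software']
```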

Parameters in code for GDB Writer

Parameter

Description

Required

Default value

datasource

The name of the data source. It must be the same as the name of the added data source. You can add data sources by using the code editor.

Yes

No default value

label

The label, which is the name of the vertex or edge.

GDB Writer can obtain labels from columns in the source table. For example, if you set this parameter to #{0}, GDB Writer uses the value of the first column as the label. The column index starts from 0.

Yes

No default value

labelType

The type of the label. Valid values:

  • VERTEX

  • EDGE

Yes

No default value

srcLabel

  • The name of the start vertex in an edge when the labelType parameter is set to EDGE.

    This parameter can be left empty if srcIdTransRule is set to none. If srcIdTransRule is set to another value, this parameter is required.

  • Leave this parameter empty if the labelType parameter is set to VERTEX.

No

No default value

dstLabel

  • The name of the end vertex in an edge when the labelType parameter is set to EDGE.

    This parameter can be left empty if dstIdTransRule is set to none. If dstIdTransRule is set to another value, this parameter is required.

  • Leave this parameter empty if the labelType parameter is set to VERTEX.

No

No default value

writeMode

The mode in which GDB Writer processes data records with duplicate primary keys. Valid values:

  • INSERT: reports an error for the record with the duplicate primary key and increases the number of error data records by 1.

  • MERGE: overwrites the existing data record with the new one.

Yes

INSERT
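The difference between the two writeMode values can be illustrated with an in-memory model. This is a sketch only: write_records is a hypothetical name, and the dictionary stands in for the GDB instance. The sketch follows the documented description, in which MERGE overwrites the existing record with the new one. It replaces the whole property set, and it does not model whether GDB preserves properties that the new record omits.

```python
def write_records(store: dict, records, write_mode: str = "INSERT") -> int:
    """Apply (primary key, properties) records to an in-memory store.

    Returns the number of records rejected as errors. This models only the
    duplicate-key behavior documented for writeMode.
    """
    if write_mode not in ("INSERT", "MERGE"):
        raise ValueError(f"unsupported writeMode: {write_mode}")
    errors = 0
    for key, props in records:
        if key in store and write_mode == "INSERT":
            errors += 1  # INSERT: a duplicate primary key is reported as an error record
            continue
        store[key] = dict(props)  # new key, or MERGE replacing the existing record
    return errors


rows = [("person-1", {"age": 30}), ("person-1", {"age": 31})]

inserted = {}
print(write_records(inserted, rows, "INSERT"), inserted)  # 1 {'person-1': {'age': 30}}

merged = {}
print(write_records(merged, rows, "MERGE"), merged)       # 0 {'person-1': {'age': 31}}
```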

idTransRule

The rule for converting the primary key. Valid values:

  • labelPrefix: converts the primary key into the {label}-{column in source} format.

  • none: does not convert the primary key.

Yes

none

srcIdTransRule

The rule for converting the primary key of the start vertex when the labelType parameter is set to EDGE. Valid values:

  • labelPrefix: converts the primary key into the {label}-{column in source} format.

  • none: does not convert the primary key. In this case, the srcLabel parameter can be left empty.

Required when the labelType parameter is set to EDGE

none

dstIdTransRule

The rule for converting the primary key of the end vertex when the labelType parameter is set to EDGE. Valid values:

  • labelPrefix: converts the primary key into the {label}-{column in source} format.

  • none: does not convert the primary key. In this case, the dstLabel parameter can be left empty.

Required when the labelType parameter is set to EDGE

none
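The following sketch shows why srcIdTransRule and dstIdTransRule must match the rule that was used to write the vertices. If the rules differ, the converted endpoint ID points to a vertex that does not exist. It assumes that labelPrefix produces "{label}-{value}". The functions convert_id and dangling_endpoints are hypothetical names used for illustration.

```python
def convert_id(label: str, raw_id, rule: str) -> str:
    """Assumed conversion: labelPrefix -> "{label}-{value}", none -> value as a string."""
    return f"{label}-{raw_id}" if rule == "labelPrefix" else str(raw_id)


def dangling_endpoints(vertex_ids: set, edge_row: dict, params: dict) -> list:
    """Return the converted start/end vertex IDs of one edge that match no written vertex."""
    src = convert_id(params.get("srcLabel", ""), edge_row["src"], params["srcIdTransRule"])
    dst = convert_id(params.get("dstLabel", ""), edge_row["dst"], params["dstIdTransRule"])
    return [vid for vid in (src, dst) if vid not in vertex_ids]


# Vertices were written with idTransRule set to labelPrefix.
vertex_ids = {"person-1", "software-7"}
row = {"src": 1, "dst": 7}  # hypothetical edge row from the source table

matching = {"srcLabel": "person", "dstLabel": "software",
            "srcIdTransRule": "labelPrefix", "dstIdTransRule": "labelPrefix"}
mismatched = dict(matching, dstIdTransRule="none")

print(dangling_endpoints(vertex_ids, row, matching))    # []
print(dangling_endpoints(vertex_ids, row, mismatched))  # ['7']
```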

column

The properties of the vertices or edges that you want to synchronize. Each property is described by the following fields:

  • name: the name of the vertex or edge property.

  • value: the value of the vertex or edge property. You can customize a value only in the code editor.

    • #{N}: uses the value of the Nth column in the source as the value of the vertex or edge property. N indicates the column index, which starts from 0.

    • #{0}: uses the value of the first column in the source as the value of the vertex or edge property.

    • test-#{0}: adds a fixed string, such as test-, before or after the value of #{0}.

    • #{0}-#{1}: combines the values of multiple columns in the source as the value of a vertex or edge property. You can also add fixed strings at any positions, such as test-#{0}-test1-#{1}-test2.

  • type: the data type of the vertex or edge property.

    The primary key must be of the STRING type. If the value obtained from the source is not a string, GDB Writer forcibly converts the value into a string. Make sure that the value can be converted into a string.

    Other properties can be of the INT, LONG, FLOAT, DOUBLE, BOOLEAN, or STRING type.

  • columnType: the category of the vertex or edge property that you want to synchronize.

    • For both vertices and edges

      • primaryKey: the primary key.

    • For vertices

      • vertexProperty: a common property of a vertex.

      • vertexJsonProperty: a JSON property of the vertex. For more information about the value structure, see the sample of properties.

    • For edges

      • srcPrimaryKey: the primary key of the start vertex.

      • dstPrimaryKey: the primary key of the end vertex.

      • edgeProperty: a common property of an edge.

      • edgeJsonProperty: a JSON property of an edge. For more information about the value structure, see the sample of properties.

Sample of properties

{"properties":[
    {"k":"name","t":"string","v":"tom"},
    {"k":"age","t":"int","v":"20"},
    {"k":"sex","t":"string","v":"male"}
]}

Yes

No default value
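The #{N} substitution rules for column -> value, and for label, can be approximated with a regular expression. The following render_value function is a hypothetical sketch. It assumes that every #{N} is replaced with the string form of the Nth source column (0-based) and that all other text is copied as-is. How GDB Writer handles escaping or null column values is not covered here.

```python
import re

# Matches #{N}, where N is a 0-based source column index.
_PLACEHOLDER = re.compile(r"#\{(\d+)\}")


def render_value(template: str, row: list) -> str:
    """Replace every #{N} in template with the string form of row[N]."""
    def substitute(match):
        index = int(match.group(1))
        if index >= len(row):
            raise IndexError(f"#{{{index}}} refers to a missing source column")
        return str(row[index])

    return _PLACEHOLDER.sub(substitute, template)


row = ["1001", 28, "A"]  # hypothetical source record

print(render_value("#{0}", row))                        # 1001
print(render_value("test-#{0}", row))                   # test-1001
print(render_value("#{0}-#{1}", row))                   # 1001-28
print(render_value("test-#{0}-test1-#{1}-test2", row))  # test-1001-test1-28-test2
```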