
DataWorks: Salesforce data source

Last Updated: Jan 08, 2024

Salesforce provides customer relationship management (CRM) software that focuses on contact management, product catalog management, order management, opportunity management, and sales management. DataWorks provides Salesforce Reader, which you can use to read data from Salesforce data sources. This topic describes the data synchronization capabilities that DataWorks provides for Salesforce data sources.

Data type mappings

Data type | Data type in code editor
address | STRING
anyType | STRING
base64 | BYTES
boolean | BOOL
combobox | STRING
complexvalue | STRING
currency | DOUBLE
date | DATE
datetime | DATE
double | DOUBLE
email | STRING
encryptedstring | STRING
id | STRING
int | LONG
json | STRING
long | LONG
multipicklist | STRING
percent | DOUBLE
phone | STRING
picklist | STRING
reference | STRING
string | STRING
textarea | STRING
time | DATE
url | STRING
geolocation | STRING
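
When you declare the column parameter in the code editor, each Salesforce field is declared with the code editor data type that it maps to. The following snippet is a minimal sketch for a hypothetical object that contains an email field, a currency field, and a datetime field; the field names are examples only.

"column": [
  {
    "name": "Email",
    "type": "STRING"
  },
  {
    "name": "AnnualRevenue",
    "type": "DOUBLE"
  },
  {
    "name": "CreatedDate",
    "type": "DATE"
  }
]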

Develop a data synchronization task

For information about the entry point for configuring a data synchronization task and the configuration procedure, see the following sections. For information about the parameter settings, view the infotip of each parameter on the configuration tab of the task.

Add a data source

Before you configure a data synchronization task to synchronize data from a specific data source, you must add the data source to DataWorks. For more information, see Add and manage data sources.

Configure a batch synchronization task to synchronize data of a single table

Appendix: Code and parameters

Configure a batch synchronization task by using the code editor

If you use the code editor to configure a batch synchronization task, you must configure the parameters for the reader of the related data source based on the format requirements of the code editor. For more information about the format requirements, see Configure a batch synchronization task by using the code editor. The following sections describe how to configure the parameters for Salesforce Reader in the code editor.

Code for Salesforce Reader

Example 1: Query Salesforce objects

{
  "type":"job",
  "version":"2.0",
  "steps":[
    {
      "stepType":"salesforce",
      "parameter":{
        "datasource":"",
        "serviceType": "sobject",
        "table": "Account",
        "beginDateTime": "20230817184200",
        "endDateTime": "20231017184200",
        "where": "",
        "column": [
          {
            "type": "STRING",
            "name": "Id"
          },
          {
            "type": "STRING",
            "name": "Name"
          },
          {
            "type": "BOOL",
            "name": "IsDeleted"
          },
          {
            "type": "DATE",
            "name": "CreatedDate"
          }
        ]
      },
      "name":"Reader",
      "category":"reader"
    },
    {
      "stepType":"stream",
      "parameter":{},
      "name":"Writer",
      "category":"writer"
    }
  ],
  "setting":{
    "errorLimit":{
      "record":"0"
    },
    "speed":{
      "throttle":true,
      "concurrent":1,
      "mbps":"12"
    }
  },
  "order":{
    "hops":[
      {
        "from":"Reader",
        "to":"Writer"
      }
    ]
  }
}

Example 2: Query Salesforce objects in batches

{
  "type":"job",
  "version":"2.0",
  "steps":[
    {
      "stepType":"salesforce",
      "parameter":{
        "datasource":"",
        "serviceType": "bulk2",
        "table": "Account",
        "beginDateTime": "20230817184200",
        "endDateTime": "20231017184200",
        "where": "",
        "blockCompoundColumn":true,
        "bulkQueryJobTimeoutSeconds":86400,
        "column": [
          {
            "type": "STRING",
            "name": "Id"
          },
          {
            "type": "STRING",
            "name": "Name"
          },
          {
            "type": "BOOL",
            "name": "IsDeleted"
          },
          {
            "type": "DATE",
            "name": "CreatedDate"
          }
        ]
      },
      "name":"Reader",
      "category":"reader"
    },
    {
      "stepType":"stream",
      "parameter":{},
      "name":"Writer",
      "category":"writer"
    }
  ],
  "setting":{
    "errorLimit":{
      "record":"0"
    },
    "speed":{
      "throttle":true,
      "concurrent":1,
      "mbps":"12"
    }
  },
  "order":{
    "hops":[
      {
        "from":"Reader",
        "to":"Writer"
      }
    ]
  }
}

Example 3: Execute a Salesforce Object Query Language (SOQL) statement to query data

{
  "type":"job",
  "version":"2.0",
  "steps":[
    {
      "stepType":"salesforce",
      "parameter":{
        "datasource":"",
        "serviceType": "query",
        "query": "select Id, Name, IsDeleted, CreatedDate from Account where Name!='Aliyun' ",
        "column": [
          {
            "type": "STRING",
            "name": "Id"
          },
          {
            "type": "STRING",
            "name": "Name"
          },
          {
            "type": "BOOL",
            "name": "IsDeleted"
          },
          {
            "type": "DATE",
            "name": "CreatedDate"
          }
        ]
      },
      "name":"Reader",
      "category":"reader"
    },
    {
      "stepType":"stream",
      "parameter":{},
      "name":"Writer",
      "category":"writer"
    }
  ],
  "setting":{
    "errorLimit":{
      "record":"0"
    },
    "speed":{
      "throttle":true,
      "concurrent":1,
      "mbps":"12"
    }
  },
  "order":{
    "hops":[
      {
        "from":"Reader",
        "to":"Writer"
      }
    ]
  }
}

Parameters in code for Salesforce Reader

The following entries describe the parameters that you can configure for Salesforce Reader in the code editor. For each parameter, the entry lists, in order, whether the parameter is required, its description, and its default value.

datasource

Yes

The name of the data source. It must be the same as the name of the added data source. You can add data sources by using the code editor.

No default value

serviceType

No

The synchronization mode. Valid values:

  • sobject: queries Salesforce objects.

  • query: queries data by executing SOQL statements.

  • bulk2: queries Salesforce objects by using Salesforce Bulk API 2.0.

    Important
    • The bulk2 mode does not support columns of combined data types, such as columns of the address and geolocation data types.

    • The bulk2 mode does not support distributed tasks.

sobject

table

Yes

The Salesforce object, such as Account, Case, or Group. Objects are equivalent to tables. This parameter is required if you set the serviceType parameter to sobject or bulk2.

No default value

beginDateTime and endDateTime

No

  • The start time and end time of data consumption. These parameters are required if you set the serviceType parameter to sobject or bulk2.

  • If you filter data based on the last modified time of Salesforce objects, the time fields that are related to object modification are used in the following order of precedence: SystemModstamp > LastModifiedDate > CreatedDate.

  • The time range is a left-closed, right-open interval.

  • The time is in the yyyymmddhhmmss format. You can use these parameters together with the scheduling parameters in DataWorks to read incremental data, as shown in the sketch after this entry.

No default value
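
The following snippet is a minimal sketch of an incremental configuration. It assumes that two custom scheduling parameters named startTime and endTime are defined for the node, for example with expressions such as $[yyyymmddhh24miss-1/24] and $[yyyymmddhh24miss]; the parameter names and expressions are illustrative only.

"parameter": {
  "datasource": "",
  "serviceType": "sobject",
  "table": "Account",
  "beginDateTime": "${startTime}",
  "endDateTime": "${endTime}",
  "column": [
    {
      "type": "STRING",
      "name": "Id"
    }
  ]
}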

splitPk

No

  • The field that is used for data sharding. This parameter takes effect only if you set the serviceType parameter to sobject.

  • If you configure this parameter, the source object is sharded based on the value of this parameter, and Data Integration runs parallel threads to read data. This way, data can be synchronized more efficiently. See the sketch after this entry.

  • You can set this parameter to a field of the datetime, int, or long data type. If you specify a field of another data type, an error is reported.

No default value
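
The following snippet is a minimal sketch that shards the source object by a datetime field; the field name is illustrative only. The number of parallel read threads is controlled by the concurrent field in the speed settings of the task, as shown in the preceding full examples.

"parameter": {
  "datasource": "",
  "serviceType": "sobject",
  "table": "Account",
  "splitPk": "CreatedDate",
  "column": [
    {
      "type": "STRING",
      "name": "Id"
    }
  ]
}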

blockCompoundColumn

No

Specifies whether to block columns of combined data types. This parameter takes effect only if you set the serviceType parameter to bulk2. Valid values:

  • true: If columns of combined data types exist, the task fails. You must remove the column mappings of the combined data types and then rerun the task.

  • false: Data in columns of combined data types is read as NULL values.

true

bulkQueryJobTimeoutSeconds

No

  • The timeout period for preparing batch data. Unit: seconds. This parameter takes effect only if you set the serviceType parameter to bulk2.

  • Before Salesforce Reader starts to read data, the Salesforce server runs a task to prepare batch data. If the running duration of the task exceeds the specified timeout period, the task times out and fails.

86400

batchSize

No

  • The number of data records to download at a time. This parameter takes effect only if you set the serviceType parameter to bulk2.

  • To achieve optimal download performance, you can set this parameter to a value that is slightly greater than the size of the shards that Salesforce automatically creates when it prepares batch data.

  • Data is downloaded in streaming mode. Therefore, increasing the value of this parameter does not consume more memory.

  • This parameter is an advanced parameter and is available only in the code editor. See the sketch after this entry.

300000
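
The following snippet is a minimal sketch of a bulk2 reader fragment that specifies batchSize in the code editor; the value is illustrative only, not a recommendation.

"parameter": {
  "datasource": "",
  "serviceType": "bulk2",
  "table": "Account",
  "batchSize": 500000,
  "column": [
    {
      "type": "STRING",
      "name": "Id"
    }
  ]
}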

where

No

  • The WHERE clause. This parameter takes effect only if you set the serviceType parameter to sobject or bulk2.

  • In actual business scenarios, you can specify a WHERE clause to filter data. Example: Name != 'Aliyun'.

  • If the where parameter is not specified or is left empty, Salesforce Reader reads all data.

  • Do not set the where parameter to a value such as limit 10. Such a value does not conform to the constraints of Salesforce on the SOQL WHERE clause.

No default value

query

No

  • The query statement that is used for refined data filtering. This parameter is required if you set the serviceType parameter to query.

  • If you configure this parameter, Data Integration filters data based on the value of this parameter. In addition, Salesforce Reader ignores the table, column, beginDateTime, endDateTime, where, and splitPk parameters that you configured. Example: select Id, Name, IsDeleted from Account where Name != 'Aliyun'.

  • This parameter is an advanced parameter, which is available only in the code editor.

No default value

queryAll

No

  • Specifies whether to query all data, including deleted data. This parameter takes effect only if you set the serviceType parameter to sobject or query.

  • If you set this parameter to true, all data, including deleted data, is queried. You can use the IsDeleted field to determine whether a data record is deleted. See the sketch after this entry.

false
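
The following snippet is a minimal sketch that queries all records, including deleted records, in query mode; the SOQL statement is illustrative only.

"parameter": {
  "datasource": "",
  "serviceType": "query",
  "queryAll": true,
  "query": "select Id, Name, IsDeleted from Account",
  "column": [
    {
      "type": "STRING",
      "name": "Id"
    },
    {
      "type": "STRING",
      "name": "Name"
    },
    {
      "type": "BOOL",
      "name": "IsDeleted"
    }
  ]
}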

column

Yes

The names of the columns that you want to synchronize. Specify the names in a JSON array.

  • You can read data from all or some of the columns.

  • The column order can be changed. You can specify columns in an order different from the order defined by the schema of the source object.

  • Constants are supported. Example:

    [
      {
        "name": "Id",
        "type": "STRING"
      },
      {
        "name": "Name",
        "type": "STRING"
      },
      {
        "name": "'123'",
        "type": "LONG"
      },
      {
        "name": "'abc'",
        "type": "STRING"
      }
    ]
    Note
    • Id and Name: column names.

    • '123': an integer constant, which is enclosed in single quotation marks (').

    • 'abc': a string constant, which is enclosed in single quotation marks (').

  • The column parameter must explicitly specify all the columns from which you want to read data. This parameter cannot be left empty.

No default value

connectTimeoutSeconds

No

  • The timeout period for establishing an HTTP connection. Unit: seconds. If the specified timeout period is exceeded, the task fails.

  • This parameter is an advanced parameter, which is available only in the code editor.

30

socketTimeoutSeconds

No

  • The timeout period for HTTP responses. Unit: seconds. If the interval between two packets is greater than the specified timeout period, the task fails.

  • This parameter is an advanced parameter, which is available only in the code editor.

600

retryIntervalSeconds

No

  • The retry interval. Unit: seconds.

  • This parameter is an advanced parameter, which is available only in the code editor.

60

retryTimes

No

  • The number of retries.

  • This parameter is an advanced parameter, which is available only in the code editor.

3
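
The following snippet is a minimal sketch of a reader fragment that sets the advanced HTTP timeout and retry parameters in the code editor; the values shown are illustrative only, not recommendations.

"parameter": {
  "datasource": "",
  "serviceType": "sobject",
  "table": "Account",
  "connectTimeoutSeconds": 30,
  "socketTimeoutSeconds": 600,
  "retryIntervalSeconds": 60,
  "retryTimes": 3,
  "column": [
    {
      "type": "STRING",
      "name": "Id"
    }
  ]
}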