
DataWorks: Salesforce data source

Last Updated: Sep 09, 2024

Salesforce provides customer relationship management (CRM) software that is focused on contact management, product catalog management, order management, opportunity management, and sales management. DataWorks provides Salesforce Reader for you to read data from Salesforce data sources. This topic describes the capabilities of synchronizing data from Salesforce data sources.

Data type mappings

Data type       | Data type in code editor
----------------|-------------------------
address         | STRING
anyType         | STRING
base64          | BYTES
boolean         | BOOL
combobox        | STRING
complexvalue    | STRING
currency        | DOUBLE
date            | DATE
datetime        | DATE
double          | DOUBLE
email           | STRING
encryptedstring | STRING
id              | STRING
int             | LONG
json            | STRING
long            | LONG
multipicklist   | STRING
percent         | DOUBLE
phone           | STRING
picklist        | STRING
reference       | STRING
string          | STRING
textarea        | STRING
time            | DATE
url             | STRING
geolocation     | STRING
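
For example, based on the preceding mappings, a few common fields of the Account object can be declared in the column parameter of the code editor as shown in the following sketch. The field names are illustrative and must match the fields of your own Salesforce object.

"column": [
  {"type": "STRING", "name": "Id"},
  {"type": "STRING", "name": "Name"},
  {"type": "BOOL", "name": "IsDeleted"},
  {"type": "DOUBLE", "name": "AnnualRevenue"},
  {"type": "DATE", "name": "CreatedDate"}
]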

Develop a data synchronization task

For information about the entry point for configuring a data synchronization task and the configuration procedure, see the following sections. For information about the parameter settings, view the infotip of each parameter on the configuration tab of the task.

Add a data source

Before you configure a data synchronization task to synchronize data from a specific data source, you must add the data source to DataWorks. For more information, see Add and manage data sources.

DataWorks allows you to add a Salesforce data source by using one of the following methods:

  • Salesforce official: Log on to the Salesforce official website to obtain the access address of Salesforce. Then, add a data source based on the access address.

  • Custom: Specify the consumer key and consumer secret of a connected application to obtain the access address of Salesforce. Then, add a data source based on the access address.

    Add a data source in custom mode

    Create a connected application

    1. Go to the creation page.

      1. Log on to the Salesforce official website.

      2. In the top navigation bar, click the Setup (gear) icon. In the left-side navigation pane, choose Apps > App Manager.

      3. On the Lightning Experience App Manager page, click New Connected App.


    2. Configure the parameters.


      The following table describes the key parameters.

      Area 1: Enter a valid application name in the Connected App Name field, a valid API name in the API Name field, and an email address in the Contact Email field based on your business requirements.

      Area 2: Select Enable OAuth Settings. Set the Callback URL parameter to https://bff-cn-shanghai.data.aliyun.com/di/oauth/callback/index.html.

      Area 3: Select the following scopes for the Selected OAuth Scopes parameter:

        • Access Connect REST API resources (chatter_api)

        • Access the identity URL service (id, profile, email, address, phone)

        • Access unique user identifiers (openid)

        • Manage user data via APIs (api)

        • Perform requests at any time (refresh_token, offline_access)

      Area 4: Configure the following options:

        • Clear Require Proof Key for Code Exchange (PKCE) Extension for Supported Authorization Flows.

        • Select Require Secret for Web Server Flow.

        • Select Require Secret for Refresh Token Flow.

    3. View the consumer key and consumer secret of the connected application.

      1. Find the created application on the App Manager page, click the drop-down arrow icon to the right of the application, and then click View.

      2. In the API (Enable OAuth Settings) section of the Manage Connected Apps page, view the consumer key and consumer secret.


      3. Copy the consumer key and consumer secret.


    Configure the parameters that are required to add a data source

    1. Go to the Data Integration page.

      Log on to the DataWorks console. In the top navigation bar, select the desired region. Then, click Data Integration in the left-side navigation pane. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Integration.

    2. In the left-side navigation pane, click Data Source to go to the Data Source page.

    3. On the Data Sources page, click Add Data Source. In the Add Data Source dialog box, search for Salesforce and click Salesforce. In the Add Salesforce Data Source dialog box, select Custom for the Data Source Type parameter.


      The following table describes the key parameters.

      Login Page URL: Enter https://<Salesforce domain name>/services/oauth2/authorize in the field.

      Token Sign URL: Enter https://<Salesforce domain name>/services/oauth2/token in the field.

      Consumer Key and Consumer Secret: Enter the consumer key and consumer secret that you obtained in the previous step.

      Click Log On to Salesforce. On the page that appears, enter the username and password and click Allow.

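      For example, if the Salesforce domain name of your organization is mycompany.my.salesforce.com (a hypothetical domain that is used here only for illustration), you would enter https://mycompany.my.salesforce.com/services/oauth2/authorize as the Login Page URL and https://mycompany.my.salesforce.com/services/oauth2/token as the Token Sign URL.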

Configure a batch synchronization task to synchronize data of a single table

Appendix: Code and parameters

Configure a batch synchronization task by using the code editor

If you use the code editor to configure a batch synchronization task, you must configure parameters for the reader of the related data source based on the format requirements in the code editor. For more information about the format requirements, see Configure a batch synchronization task by using the code editor. The following information describes the configuration details of parameters for the reader in the code editor.

Code for Salesforce Reader

Example 1: Query Salesforce objects

{
  "type":"job",
  "version":"2.0",
  "steps":[
    {
      "stepType":"salesforce",
      "parameter":{
        "datasource":"",
        "serviceType": "sobject",
        "table": "Account",
        "beginDateTime": "20230817184200",
        "endDateTime": "20231017184200",
        "where": "",
        "column": [
          {
            "type": "STRING",
            "name": "Id"
          },
          {
            "type": "STRING",
            "name": "Name"
          },
          {
            "type": "BOOL",
            "name": "IsDeleted"
          },
          {
            "type": "DATE",
            "name": "CreatedDate"
          }
        ]
      },
      "name":"Reader",
      "category":"reader"
    },
    {
      "stepType":"stream",
      "parameter":{},
      "name":"Writer",
      "category":"writer"
    }
  ],
  "setting":{
    "errorLimit":{
      "record":"0"
    },
    "speed":{
      "throttle":true,
      "concurrent":1,
      "mbps":"12"
    }
  },
  "order":{
    "hops":[
      {
        "from":"Reader",
        "to":"Writer"
      }
    ]
  }
}

Example 2: Use Bulk API 1.0 to query Salesforce objects

{
  "type":"job",
  "version":"2.0",
  "steps":[
    {
      "stepType":"salesforce",
      "parameter":{
        "datasource":"",
        "serviceType": "bulk1",
        "table": "Account",
        "beginDateTime": "20230817184200",
        "endDateTime": "20231017184200",
        "where": "",
        "blockCompoundColumn":true,
        "bulkQueryJobTimeoutSeconds":86400,
        "column": [
          {
            "type": "STRING",
            "name": "Id"
          },
          {
            "type": "STRING",
            "name": "Name"
          },
          {
            "type": "BOOL",
            "name": "IsDeleted"
          },
          {
            "type": "DATE",
            "name": "CreatedDate"
          }
        ]
      },
      "name":"Reader",
      "category":"reader"
    },
    {
      "stepType":"stream",
      "parameter":{
        "print": true
      },
      "name":"Writer",
      "category":"writer"
    }
  ],
  "setting":{
    "errorLimit":{
      "record":"0"
    },
    "speed":{
      "concurrent":1
    }
  },
  "order":{
    "hops":[
      {
        "from":"Reader",
        "to":"Writer"
      }
    ]
  }
}

Example 3: Use Bulk API 2.0 to query Salesforce objects

{
  "type":"job",
  "version":"2.0",
  "steps":[
    {
      "stepType":"salesforce",
      "parameter":{
        "datasource":"",
        "serviceType": "bulk2",
        "table": "Account",
        "beginDateTime": "20230817184200",
        "endDateTime": "20231017184200",
        "where": "",
        "blockCompoundColumn":true,
        "bulkQueryJobTimeoutSeconds":86400,
        "column": [
          {
            "type": "STRING",
            "name": "Id"
          },
          {
            "type": "STRING",
            "name": "Name"
          },
          {
            "type": "BOOL",
            "name": "IsDeleted"
          },
          {
            "type": "DATE",
            "name": "CreatedDate"
          }
        ]
      },
      "name":"Reader",
      "category":"reader"
    },
    {
      "stepType":"stream",
      "parameter":{},
      "name":"Writer",
      "category":"writer"
    }
  ],
  "setting":{
    "errorLimit":{
      "record":"0"
    },
    "speed":{
      "throttle":true,
      "concurrent":1,
      "mbps":"12"
    }
  },
  "order":{
    "hops":[
      {
        "from":"Reader",
        "to":"Writer"
      }
    ]
  }
}

Example 4: Execute a Salesforce Object Query Language (SOQL) statement to query data

{
  "type":"job",
  "version":"2.0",
  "steps":[
    {
      "stepType":"salesforce",
      "parameter":{
        "datasource":"",
        "serviceType": "query",
        "query": "select Id, Name, IsDeleted, CreatedDate from Account where Name!='Aliyun' ",
        "column": [
          {
            "type": "STRING",
            "name": "Id"
          },
          {
            "type": "STRING",
            "name": "Name"
          },
          {
            "type": "BOOL",
            "name": "IsDeleted"
          },
          {
            "type": "DATE",
            "name": "CreatedDate"
          }
        ]
      },
      "name":"Reader",
      "category":"reader"
    },
    {
      "stepType":"stream",
      "parameter":{},
      "name":"Writer",
      "category":"writer"
    }
  ],
  "setting":{
    "errorLimit":{
      "record":"0"
    },
    "speed":{
      "throttle":true,
      "concurrent":1,
      "mbps":"12"
    }
  },
  "order":{
    "hops":[
      {
        "from":"Reader",
        "to":"Writer"
      }
    ]
  }
}

Parameters in code for Salesforce Reader

datasource

Required: Yes. Default value: no default value.

The name of the data source. It must be the same as the name of the added data source. You can add data sources by using the code editor.

serviceType

Required: No. Default value: sobject.

The synchronization mode. Valid values:

  • sobject: queries Salesforce objects.

  • query: queries data by executing SOQL statements.

  • bulk1: queries Salesforce objects by using Salesforce Bulk API 1.0.

  • bulk2: queries Salesforce objects by using Salesforce Bulk API 2.0.

For sample configurations of each mode, see Examples 1 to 4 in the preceding Code for Salesforce Reader section.

    Important
    • The bulk1 and bulk2 modes do not support columns of compound data types, such as columns of the address and geolocation data types.

    • The bulk2 mode does not support distributed tasks.

    • In some cases, bulk1 outperforms bulk2 in query performance. You can test the query performance on your own Salesforce objects and select a synchronization mode based on your business requirements.

table

Required: Yes. Default value: no default value.

The Salesforce object, such as Account, Case, or Group. Objects are equivalent to tables. This parameter is required if you set the serviceType parameter to sobject, bulk1, or bulk2.

beginDateTime, endDateTime

Required: No. Default value: no default value.

  • The start time and end time of the data to be synchronized. The parameters are required if you set the serviceType parameter to sobject, bulk1, or bulk2.

  • Data is filtered based on the last modified time of Salesforce objects. The time field that is used for filtering is selected in the following order: SystemModstamp > LastModifiedDate > CreatedDate.

  • The time range is a left-closed, right-open interval.

  • The time is in the yyyymmddhhmmss format, for example, 20230817184200. You can use these parameters together with the scheduling parameters in DataWorks to read incremental data, as shown in the sketch after this list.
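
The following snippet is a minimal sketch of how beginDateTime and endDateTime can be combined with DataWorks scheduling parameters for incremental synchronization. It assumes that two scheduling parameters named startTime and endTime are defined for the task; the variable names, their time expressions, and the data source name are illustrative.

"parameter": {
  "datasource": "my_salesforce_source",
  "serviceType": "sobject",
  "table": "Account",
  "beginDateTime": "${startTime}",
  "endDateTime": "${endTime}",
  "column": [
    {"type": "STRING", "name": "Id"},
    {"type": "DATE", "name": "SystemModstamp"}
  ]
}

In the scheduling configuration, startTime and endTime would be assigned time expressions in the yyyymmddhhmmss format so that each scheduled instance reads only the data that was modified within its own time window.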

splitPk

Required: No. Default value: no default value.

  • The field that is used for data sharding. This parameter takes effect only if you set the serviceType parameter to sobject.

  • If you configure this parameter, the source object is sharded based on the value of this parameter, and Data Integration runs parallel threads to read data. This improves synchronization efficiency, as shown in the sketch after this list.

  • You can set this parameter only to a field of the datetime, int, or long data type. If you specify a field of another data type, an error is reported.
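
The following fragment is a sketch of using splitPk in sobject mode. It shards the Account object by the SystemModstamp field, which is of the datetime type, so that the parallel threads configured by the concurrent setting can read different shards at the same time. The data source name and the field choice are illustrative.

"parameter": {
  "datasource": "my_salesforce_source",
  "serviceType": "sobject",
  "table": "Account",
  "splitPk": "SystemModstamp",
  "column": [
    {"type": "STRING", "name": "Id"},
    {"type": "STRING", "name": "Name"}
  ]
}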

blockCompoundColumn

Required: No. Default value: true.

Specifies whether to block columns of compound data types, such as the address and geolocation data types. This parameter takes effect only if you set the serviceType parameter to bulk1 or bulk2. Valid values:

  • true: If columns of compound data types exist, the task fails. Remove the column mappings of the compound data types and rerun the task.

  • false: Data in columns of compound data types is read as NULL values.

bulkQueryJobTimeoutSeconds

Required: No. Default value: 86400.

  • The timeout period for preparing batch data. Unit: seconds. This parameter takes effect only if you set the serviceType parameter to bulk1 or bulk2.

  • Before Salesforce Reader starts to read data, the Salesforce server runs a job to prepare the batch data. If the running duration of the job exceeds the specified timeout period, the synchronization task times out and fails.

batchSize

Required: No. Default value: 300000.

  • The number of data records to download at a time. This parameter takes effect only if you set the serviceType parameter to bulk1 or bulk2.

  • To achieve optimal download performance, you can set this parameter to a value that is slightly greater than the batch size that Salesforce uses when it automatically shards the batch data preparation job.

  • Data is downloaded in streaming mode. Therefore, increasing the value of this parameter does not consume more memory.

  • This is an advanced parameter that is available only in the code editor.

where

Required: No. Default value: no default value.

  • The WHERE clause that is used to filter data. This parameter takes effect only if you set the serviceType parameter to sobject, bulk1, or bulk2.

  • In actual business scenarios, you can specify a WHERE clause such as Name != 'Aliyun' to filter data. A configuration sketch follows this list.

  • If the where parameter is not specified or is left empty, Salesforce Reader reads all data.

  • Do not set the where parameter to a value such as limit 10. Such a value does not conform to the constraints of Salesforce on the SOQL WHERE clause.
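
For example, the following sobject-mode fragment reads only the accounts whose Name is not Aliyun. This is an illustrative sketch; the data source name and the filter condition are examples.

"parameter": {
  "datasource": "my_salesforce_source",
  "serviceType": "sobject",
  "table": "Account",
  "where": "Name != 'Aliyun'",
  "column": [
    {"type": "STRING", "name": "Id"},
    {"type": "STRING", "name": "Name"}
  ]
}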

query

Required: No. Default value: no default value.

  • The SOQL statement that is used for fine-grained data filtering. This parameter is required if you set the serviceType parameter to query.

  • If you configure this parameter, Data Integration filters data based on the value of this parameter. In addition, Salesforce Reader ignores the table, column, beginDateTime, endDateTime, where, and splitPk parameters that you configured. Example: select Id, Name, IsDeleted from Account where Name != 'Aliyun'.

  • This is an advanced parameter that is available only in the code editor.

queryAll

Required: No. Default value: false.

  • Specifies whether to query all data, including deleted data. This parameter takes effect only if you set the serviceType parameter to sobject or query.

  • If you set this parameter to true, all data, including deleted data, is queried. You can use the IsDeleted field to determine whether a data record has been deleted. A sketch follows this list.
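
The following fragment is a sketch of using queryAll in query mode to read both existing and deleted records. The SOQL statement mirrors Example 4 and is illustrative, as is the data source name.

"parameter": {
  "datasource": "my_salesforce_source",
  "serviceType": "query",
  "queryAll": true,
  "query": "select Id, Name, IsDeleted from Account",
  "column": [
    {"type": "STRING", "name": "Id"},
    {"type": "STRING", "name": "Name"},
    {"type": "BOOL", "name": "IsDeleted"}
  ]
}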

column

Required: Yes. Default value: no default value.

The names of the columns that you want to synchronize, specified as a JSON array.

  • You can read data from all or some of the columns.

  • The column order can be changed. This means that you can specify columns in an order that is different from the order specified by the schema of the source table.

  • Constants are supported. Example:

    [
      {
        "name": "Id",
        "type": "STRING"
      },
      {
        "name": "Name",
        "type": "STRING"
      },
      {
        "name": "'123'",
        "type": "LONG"
      },
      {
        "name": "'abc'",
        "type": "STRING"
      }
    ]
    Note
    • Id and Name: column names.

    • '123': an integer constant, which is enclosed in single quotation marks (').

    • 'abc': a string constant, which is enclosed in single quotation marks (').

  • The column parameter must explicitly specify all the columns from which you want to read data. This parameter cannot be left empty.


connectTimeoutSeconds

Required: No. Default value: 30.

  • The timeout period for establishing HTTP connections. Unit: seconds. If the specified timeout period is exceeded, the task fails.

  • This is an advanced parameter that is available only in the code editor.

socketTimeoutSeconds

Required: No. Default value: 600.

  • The timeout period for HTTP responses. Unit: seconds. If the interval between two received packets is greater than the specified timeout period, the task fails.

  • This is an advanced parameter that is available only in the code editor.

retryIntervalSeconds

Required: No. Default value: 60.

  • The interval between two consecutive retries. Unit: seconds.

  • This is an advanced parameter that is available only in the code editor.

retryTimes

Required: No. Default value: 3.

  • The maximum number of retries.

  • This is an advanced parameter that is available only in the code editor.