
DataWorks: Salesforce data source

Last Updated: Sep 09, 2024

Salesforce provides customer relationship management (CRM) software that is focused on contact management, product catalog management, order management, opportunity management, and sales management. DataWorks provides Salesforce Reader for you to read data from Salesforce data sources. This topic describes the capabilities of synchronizing data from Salesforce data sources.

Data type mappings

Data type       | Data type in code editor
----------------|-------------------------
address         | STRING
anyType         | STRING
base64          | BYTES
boolean         | BOOL
combobox        | STRING
complexvalue    | STRING
currency        | DOUBLE
date            | DATE
datetime        | DATE
double          | DOUBLE
email           | STRING
encryptedstring | STRING
id              | STRING
int             | LONG
json            | STRING
long            | LONG
multipicklist   | STRING
percent         | DOUBLE
phone           | STRING
picklist        | STRING
reference       | STRING
string          | STRING
textarea        | STRING
time            | DATE
url             | STRING
geolocation     | STRING
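
For example, based on the preceding mappings, a few common fields of the Account object can be declared in the column parameter of the code editor as shown in the following sketch. The field names are illustrative and must match the fields of your own Salesforce object.

"column": [
  {"type": "STRING", "name": "Id"},
  {"type": "STRING", "name": "Name"},
  {"type": "BOOL", "name": "IsDeleted"},
  {"type": "DOUBLE", "name": "AnnualRevenue"},
  {"type": "DATE", "name": "CreatedDate"}
]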

Develop a data synchronization task

For information about the entry point for configuring a data synchronization task and the configuration procedure, see the following sections. For information about the parameter settings, view the infotip of each parameter on the configuration tab of the task.

Add a data source

Before you configure a data synchronization task to synchronize data from a specific data source, you must add the data source to DataWorks. For more information, see Add and manage data sources.

DataWorks allows you to add a Salesforce data source by using one of the following methods:

  • Salesforce official: Log on to the Salesforce official website to obtain the access address of Salesforce. Then, add a data source based on the access address.

  • Custom: Specify the consumer key and consumer secret of a connected application to obtain the access address of Salesforce. Then, add a data source based on the access address.

    Add a data source in custom mode

    Create a connected application

    1. Go to the creation page.

      1. Log on to the Salesforce official website.

      2. In the top navigation bar, click the Setup (gear) icon. In the left-side navigation pane, choose Apps > App Manager.

      3. On the Lightning Experience App Manager page, click New Connected App.


    2. Configure the parameters.


      The following table describes the key parameters.

      Area 1: Enter a valid application name in the Connected App Name field, a valid API name in the API Name field, and an email address in the Contact Email field based on your business requirements.

      Area 2: Select Enable OAuth Settings. Set the Callback URL parameter to https://bff-cn-shanghai.data.aliyun.com/di/oauth/callback/index.html.

      Area 3: Select the following scopes for the Selected OAuth Scopes parameter:

        • Access Connect REST API resources (chatter_api)

        • Access the identity URL service (id, profile, email, address, phone)

        • Access unique user identifiers (openid)

        • Manage user data via APIs (api)

        • Perform requests at any time (refresh_token, offline_access)

      Area 4: Configure the following options:

        • Clear Require Proof Key for Code Exchange (PKCE) Extension for Supported Authorization Flows.

        • Select Require Secret for Web Server Flow.

        • Select Require Secret for Refresh Token Flow.

    3. View the consumer key and consumer secret of the connected application.

      1. Find the created application on the App Manager page, click the drop-down arrow icon to the right of the application, and then click View.

      2. In the API (Enable OAuth Settings) section of the Manage Connected Apps page, view the consumer key and consumer secret.


      3. Copy the consumer key and consumer secret.


    Configure the parameters that are required to add a data source

    1. Go to the Data Integration page.

      Log on to the DataWorks console. In the top navigation bar, select the desired region. Then, click Data Integration in the left-side navigation pane. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Integration.

    2. In the left-side navigation pane, click Data Source to go to the Data Source page.

    3. On the Data Sources page, click Add Data Source. In the Add Data Source dialog box, search for Salesforce and click Salesforce. In the Add Salesforce Data Source dialog box, select Custom for the Data Source Type parameter.


      The following table describes the key parameters.

      Login Page URL: Enter https://<Salesforce domain name>/services/oauth2/authorize in the field.

      Token Sign URL: Enter https://<Salesforce domain name>/services/oauth2/token in the field.

      Consumer Key and Consumer Secret: Enter the consumer key and consumer secret that you obtained in the previous step.

      Click Log On to Salesforce. On the page that appears, enter the username and password and click Allow.

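      For example, if the Salesforce domain name of your organization is mycompany.my.salesforce.com (a hypothetical domain that is used here only for illustration), you would enter https://mycompany.my.salesforce.com/services/oauth2/authorize as the Login Page URL and https://mycompany.my.salesforce.com/services/oauth2/token as the Token Sign URL.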

Configure a batch synchronization task to synchronize data of a single table

Appendix: Code and parameters

Configure a batch synchronization task by using the code editor

If you use the code editor to configure a batch synchronization task, you must configure parameters for the reader of the related data source based on the format requirements in the code editor. For more information about the format requirements, see Configure a batch synchronization task by using the code editor. The following information describes the configuration details of parameters for the reader in the code editor.

Code for Salesforce Reader

Example 1: Query Salesforce objects

{
  "type":"job",
  "version":"2.0",
  "steps":[
    {
      "stepType":"salesforce",
      "parameter":{
        "datasource":"",
        "serviceType": "sobject",
        "table": "Account",
        "beginDateTime": "20230817184200",
        "endDateTime": "20231017184200",
        "where": "",
        "column": [
          {
            "type": "STRING",
            "name": "Id"
          },
          {
            "type": "STRING",
            "name": "Name"
          },
          {
            "type": "BOOL",
            "name": "IsDeleted"
          },
          {
            "type": "DATE",
            "name": "CreatedDate"
          }
        ]
      },
      "name":"Reader",
      "category":"reader"
    },
    {
      "stepType":"stream",
      "parameter":{},
      "name":"Writer",
      "category":"writer"
    }
  ],
  "setting":{
    "errorLimit":{
      "record":"0"
    },
    "speed":{
      "throttle":true,
      "concurrent":1,
      "mbps":"12"
    }
  },
  "order":{
    "hops":[
      {
        "from":"Reader",
        "to":"Writer"
      }
    ]
  }
}

Example 2: Use Bulk API 1.0 to query Salesforce objects

{
  "type":"job",
  "version":"2.0",
  "steps":[
    {
      "stepType":"salesforce",
      "parameter":{
        "datasource":"",
        "serviceType": "bulk1",
        "table": "Account",
        "beginDateTime": "20230817184200",
        "endDateTime": "20231017184200",
        "where": "",
        "blockCompoundColumn":true,
        "bulkQueryJobTimeoutSeconds":86400,
        "column": [
          {
            "type": "STRING",
            "name": "Id"
          },
          {
            "type": "STRING",
            "name": "Name"
          },
          {
            "type": "BOOL",
            "name": "IsDeleted"
          },
          {
            "type": "DATE",
            "name": "CreatedDate"
          }
        ]
      },
      "name":"Reader",
      "category":"reader"
    },
    {
      "stepType":"stream",
      "parameter":{
        "print": true
      },
      "name":"Writer",
      "category":"writer"
    }
  ],
  "setting":{
    "errorLimit":{
      "record":"0"
    },
    "speed":{
      "concurrent":1
    }
  },
  "order":{
    "hops":[
      {
        "from":"Reader",
        "to":"Writer"
      }
    ]
  }
}

Example 3: Use Bulk API 2.0 to query Salesforce objects

{
  "type":"job",
  "version":"2.0",
  "steps":[
    {
      "stepType":"salesforce",
      "parameter":{
        "datasource":"",
        "serviceType": "bulk2",
        "table": "Account",
        "beginDateTime": "20230817184200",
        "endDateTime": "20231017184200",
        "where": "",
        "blockCompoundColumn":true,
        "bulkQueryJobTimeoutSeconds":86400,
        "column": [
          {
            "type": "STRING",
            "name": "Id"
          },
          {
            "type": "STRING",
            "name": "Name"
          },
          {
            "type": "BOOL",
            "name": "IsDeleted"
          },
          {
            "type": "DATE",
            "name": "CreatedDate"
          }
        ]
      },
      "name":"Reader",
      "category":"reader"
    },
    {
      "stepType":"stream",
      "parameter":{},
      "name":"Writer",
      "category":"writer"
    }
  ],
  "setting":{
    "errorLimit":{
      "record":"0"
    },
    "speed":{
      "throttle":true,
      "concurrent":1,
      "mbps":"12"
    }
  },
  "order":{
    "hops":[
      {
        "from":"Reader",
        "to":"Writer"
      }
    ]
  }
}

Example 4: Execute a Salesforce Object Query Language (SOQL) statement to query data

{
  "type":"job",
  "version":"2.0",
  "steps":[
    {
      "stepType":"salesforce",
      "parameter":{
        "datasource":"",
        "serviceType": "query",
        "query": "select Id, Name, IsDeleted, CreatedDate from Account where Name!='Aliyun' ",
        "column": [
          {
            "type": "STRING",
            "name": "Id"
          },
          {
            "type": "STRING",
            "name": "Name"
          },
          {
            "type": "BOOL",
            "name": "IsDeleted"
          },
          {
            "type": "DATE",
            "name": "CreatedDate"
          }
        ]
      },
      "name":"Reader",
      "category":"reader"
    },
    {
      "stepType":"stream",
      "parameter":{},
      "name":"Writer",
      "category":"writer"
    }
  ],
  "setting":{
    "errorLimit":{
      "record":"0"
    },
    "speed":{
      "throttle":true,
      "concurrent":1,
      "mbps":"12"
    }
  },
  "order":{
    "hops":[
      {
        "from":"Reader",
        "to":"Writer"
      }
    ]
  }
}

Parameters in code for Salesforce Reader

datasource

Required: Yes. Default value: no default value.

The name of the data source. It must be the same as the name of the added data source. You can add data sources by using the code editor.

serviceType

Required: No. Default value: sobject.

The synchronization mode. Valid values:

  • sobject: queries Salesforce objects.

  • query: queries data by executing SOQL statements.

  • bulk1: queries Salesforce objects by using Salesforce Bulk API 1.0.

  • bulk2: queries Salesforce objects by using Salesforce Bulk API 2.0.

For sample configurations of each mode, see Examples 1 to 4 in the preceding Code for Salesforce Reader section.

    Important
    • The bulk1 and bulk2 modes do not support columns of compound data types, such as columns of the address and geolocation data types.

    • The bulk2 mode does not support distributed tasks.

    • In some cases, bulk1 outperforms bulk2 in query performance. You can test the query performance on your own Salesforce objects and select a synchronization mode based on your business requirements.

table

Required: Yes. Default value: no default value.

The Salesforce object, such as Account, Case, or Group. Objects are equivalent to tables. This parameter is required if you set the serviceType parameter to sobject, bulk1, or bulk2.

beginDateTime, endDateTime

Required: No. Default value: no default value.

  • The start time and end time of the data to be synchronized. The parameters are required if you set the serviceType parameter to sobject, bulk1, or bulk2.

  • Data is filtered based on the last modified time of Salesforce objects. The time field that is used for filtering is selected in the following order: SystemModstamp > LastModifiedDate > CreatedDate.

  • The time range is a left-closed, right-open interval.

  • The time is in the yyyymmddhhmmss format, for example, 20230817184200. You can use these parameters together with the scheduling parameters in DataWorks to read incremental data, as shown in the sketch after this list.
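
The following snippet is a minimal sketch of how beginDateTime and endDateTime can be combined with DataWorks scheduling parameters for incremental synchronization. It assumes that two scheduling parameters named startTime and endTime are defined for the task; the variable names, their time expressions, and the data source name are illustrative.

"parameter": {
  "datasource": "my_salesforce_source",
  "serviceType": "sobject",
  "table": "Account",
  "beginDateTime": "${startTime}",
  "endDateTime": "${endTime}",
  "column": [
    {"type": "STRING", "name": "Id"},
    {"type": "DATE", "name": "SystemModstamp"}
  ]
}

In the scheduling configuration, startTime and endTime would be assigned time expressions in the yyyymmddhhmmss format so that each scheduled instance reads only the data that was modified within its own time window.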

splitPk

Required: No. Default value: no default value.

  • The field that is used for data sharding. This parameter takes effect only if you set the serviceType parameter to sobject.

  • If you configure this parameter, the source object is sharded based on the value of this parameter, and Data Integration runs parallel threads to read data. This improves synchronization efficiency, as shown in the sketch after this list.

  • You can set this parameter only to a field of the datetime, int, or long data type. If you specify a field of another data type, an error is reported.
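
The following fragment is a sketch of using splitPk in sobject mode. It shards the Account object by the SystemModstamp field, which is of the datetime type, so that the parallel threads configured by the concurrent setting can read different shards at the same time. The data source name and the field choice are illustrative.

"parameter": {
  "datasource": "my_salesforce_source",
  "serviceType": "sobject",
  "table": "Account",
  "splitPk": "SystemModstamp",
  "column": [
    {"type": "STRING", "name": "Id"},
    {"type": "STRING", "name": "Name"}
  ]
}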

blockCompoundColumn

Required: No. Default value: true.

Specifies whether to block columns of compound data types, such as the address and geolocation data types. This parameter takes effect only if you set the serviceType parameter to bulk1 or bulk2. Valid values:

  • true: If columns of compound data types exist, the task fails. Remove the column mappings of the compound data types and rerun the task.

  • false: Data in columns of compound data types is read as NULL values.

bulkQueryJobTimeoutSeconds

Required: No. Default value: 86400.

  • The timeout period for preparing batch data. Unit: seconds. This parameter takes effect only if you set the serviceType parameter to bulk1 or bulk2.

  • Before Salesforce Reader starts to read data, the Salesforce server runs a job to prepare the batch data. If the running duration of the job exceeds the specified timeout period, the synchronization task times out and fails.

batchSize

Required: No. Default value: 300000.

  • The number of data records to download at a time. This parameter takes effect only if you set the serviceType parameter to bulk1 or bulk2.

  • To achieve optimal download performance, you can set this parameter to a value that is slightly greater than the batch size that Salesforce uses when it automatically shards the batch data preparation job.

  • Data is downloaded in streaming mode. Therefore, increasing the value of this parameter does not consume more memory.

  • This is an advanced parameter that is available only in the code editor.

where

Required: No. Default value: no default value.

  • The WHERE clause that is used to filter data. This parameter takes effect only if you set the serviceType parameter to sobject, bulk1, or bulk2.

  • In actual business scenarios, you can specify a WHERE clause such as Name != 'Aliyun' to filter data. A configuration sketch follows this list.

  • If the where parameter is not specified or is left empty, Salesforce Reader reads all data.

  • Do not set the where parameter to a value such as limit 10. Such a value does not conform to the constraints of Salesforce on the SOQL WHERE clause.
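
For example, the following sobject-mode fragment reads only the accounts whose Name is not Aliyun. This is an illustrative sketch; the data source name and the filter condition are examples.

"parameter": {
  "datasource": "my_salesforce_source",
  "serviceType": "sobject",
  "table": "Account",
  "where": "Name != 'Aliyun'",
  "column": [
    {"type": "STRING", "name": "Id"},
    {"type": "STRING", "name": "Name"}
  ]
}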

query

Required: No. Default value: no default value.

  • The SOQL statement that is used for fine-grained data filtering. This parameter is required if you set the serviceType parameter to query.

  • If you configure this parameter, Data Integration filters data based on the value of this parameter. In addition, Salesforce Reader ignores the table, column, beginDateTime, endDateTime, where, and splitPk parameters that you configured. Example: select Id, Name, IsDeleted from Account where Name != 'Aliyun'.

  • This is an advanced parameter that is available only in the code editor.

queryAll

Required: No. Default value: false.

  • Specifies whether to query all data, including deleted data. This parameter takes effect only if you set the serviceType parameter to sobject or query.

  • If you set this parameter to true, all data, including deleted data, is queried. You can use the IsDeleted field to determine whether a data record has been deleted. A sketch follows this list.
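
The following fragment is a sketch of using queryAll in query mode to read both existing and deleted records. The SOQL statement mirrors Example 4 and is illustrative, as is the data source name.

"parameter": {
  "datasource": "my_salesforce_source",
  "serviceType": "query",
  "queryAll": true,
  "query": "select Id, Name, IsDeleted from Account",
  "column": [
    {"type": "STRING", "name": "Id"},
    {"type": "STRING", "name": "Name"},
    {"type": "BOOL", "name": "IsDeleted"}
  ]
}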

column

Required: Yes. Default value: no default value.

The names of the columns that you want to synchronize, specified as a JSON array.

  • You can read data from all or some of the columns.

  • The column order can be changed. This means that you can specify columns in an order that is different from the order specified by the schema of the source table.

  • Constants are supported. Example:

    [
      {
        "name": "Id",
        "type": "STRING"
      },
      {
        "name": "Name",
        "type": "STRING"
      },
      {
        "name": "'123'",
        "type": "LONG"
      },
      {
        "name": "'abc'",
        "type": "STRING"
      }
    ]
    Note
    • Id and Name: column names.

    • '123': an integer constant, which is enclosed in single quotation marks (').

    • 'abc': a string constant, which is enclosed in single quotation marks (').

  • The column parameter must explicitly specify all the columns from which you want to read data. This parameter cannot be left empty.


connectTimeoutSeconds

Required: No. Default value: 30.

  • The timeout period for establishing HTTP connections. Unit: seconds. If the specified timeout period is exceeded, the task fails.

  • This is an advanced parameter that is available only in the code editor.

socketTimeoutSeconds

Required: No. Default value: 600.

  • The timeout period for HTTP responses. Unit: seconds. If the interval between two received packets is greater than the specified timeout period, the task fails.

  • This is an advanced parameter that is available only in the code editor.

retryIntervalSeconds

Required: No. Default value: 60.

  • The interval between two consecutive retries. Unit: seconds.

  • This is an advanced parameter that is available only in the code editor.

retryTimes

Required: No. Default value: 3.

  • The maximum number of retries.

  • This is an advanced parameter that is available only in the code editor.