DataWorks:Salesforce

Last Updated:Mar 26, 2026

Salesforce provides customer relationship management (CRM) software focused on contact management, product catalog management, order management, opportunity management, and sales management. DataWorks provides Salesforce Reader to batch-synchronize data from Salesforce into your data warehouse or data lake. Salesforce Reader supports four sync modes — standard object sync, Bulk API 1.0, Bulk API 2.0, and SOQL query — each suited to different data volumes and column types.

Prerequisites

Before you begin, make sure you have:

  • A resource group whose virtual private cloud (VPC) network has outbound connectivity to your Salesforce domain. Without this, the data source connection fails.

Data type mappings

The following table shows how Salesforce data types map to DataWorks types in the code editor.

| Salesforce type | DataWorks type |
| --- | --- |
| address | STRING |
| anyType | STRING |
| base64 | BYTES |
| boolean | BOOL |
| combobox | STRING |
| complexvalue | STRING |
| currency | DOUBLE |
| date | DATE |
| datetime | DATE |
| double | DOUBLE |
| email | STRING |
| encryptedstring | STRING |
| id | STRING |
| int | LONG |
| json | STRING |
| long | LONG |
| multipicklist | STRING |
| percent | DOUBLE |
| phone | STRING |
| picklist | STRING |
| reference | STRING |
| string | STRING |
| textarea | STRING |
| time | DATE |
| url | STRING |
| geolocation | STRING |

The currency, double, and percent types all map to DOUBLE. If your data has high-precision decimal values, verify that DOUBLE precision meets your requirements before running a production sync.
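
When the column array is generated by a script rather than typed by hand, the mapping above can be kept as a lookup table. The following is a minimal Python sketch and not part of DataWorks: the names SALESFORCE_TO_DATAWORKS and to_column are hypothetical, and the list of field names and Salesforce types is assumed to come from your own description of the object.

```python
# Lookup table transcribed from the data type mapping table in this section.
SALESFORCE_TO_DATAWORKS = {
    "address": "STRING", "anyType": "STRING", "base64": "BYTES",
    "boolean": "BOOL", "combobox": "STRING", "complexvalue": "STRING",
    "currency": "DOUBLE", "date": "DATE", "datetime": "DATE",
    "double": "DOUBLE", "email": "STRING", "encryptedstring": "STRING",
    "id": "STRING", "int": "LONG", "json": "STRING", "long": "LONG",
    "multipicklist": "STRING", "percent": "DOUBLE", "phone": "STRING",
    "picklist": "STRING", "reference": "STRING", "string": "STRING",
    "textarea": "STRING", "time": "DATE", "url": "STRING",
    "geolocation": "STRING",
}


def to_column(name, salesforce_type):
    """Build one entry of the Reader "column" array (hypothetical helper)."""
    try:
        dataworks_type = SALESFORCE_TO_DATAWORKS[salesforce_type]
    except KeyError:
        raise ValueError(f"unmapped Salesforce type: {salesforce_type}")
    return {"type": dataworks_type, "name": name}


if __name__ == "__main__":
    # Assumed sample fields; replace with the fields of your own object.
    fields = [("Id", "id"), ("AnnualRevenue", "currency"), ("CreatedDate", "datetime")]
    for field_name, field_type in fields:
        print(to_column(field_name, field_type))
```

Raising an error for an unmapped type, instead of silently defaulting to STRING, makes a missing mapping visible before the task runs.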

Add a data source

Before configuring a batch synchronization task, add Salesforce as a data source in DataWorks. For the general procedure, see Data source management.

Salesforce supports two authentication modes:

  • Salesforce official: Log on to the official Salesforce website to obtain your Salesforce access address, then add the data source using that address.

  • Custom: Create a Connected App in Salesforce to issue dedicated OAuth credentials (Consumer Key and Consumer Secret) for DataWorks. Use this mode when you need a service-specific account with minimum required permissions.

Add a data source in custom mode

Custom mode requires you to first create a Connected App in Salesforce, then register its credentials in DataWorks.

Create a Connected App in Salesforce

  1. Go to the app creation page.

    1. Log in to your Salesforce system.

    2. In the top navigation bar, click the image icon. In the left-side navigation pane, choose Apps > App Manager.

    3. On the Lightning Experience App Manager page, click New Connected App.

  2. Configure the app settings.

    | Area | Setting |
    | --- | --- |
    | 1 | Enter a Connected App Name, API Name, and Contact Email. |
    | 2 | Select Enable OAuth Settings. Set Callback URL to https://bff-cn-shanghai.data.aliyun.com/di/oauth/callback/index.html. |
    | 3 | Under Selected OAuth Scopes, add: Access Connect REST API resources (chatter api), Access the identity URL service (id, profile, email, address, phone), Access unique user identifiers (openid), Manage user data via APIs (api), and Perform requests at any time (refresh token, offline_access). |
    | 4 | Clear Require Proof Key for Code Exchange (PKCE) Extension for Supported Authorization Flows. Select both Require Secret for Web Server Flow and Require Secret for Refresh Token Flow. |

  3. Copy the Consumer Key and Consumer Secret.

    1. On the App Manager page, find the app you created. Click the image icon to its right, then click View.

    2. In the API (Enable OAuth Settings) section, locate the Consumer Key and Consumer Secret.

    3. Copy both values.

Register the data source in DataWorks

  1. Go to the Data Integration page. Log in to the DataWorks console. In the top navigation bar, select your region. In the left-side navigation pane, choose Data Integration > Data Integration. Select your workspace and click Go to Data Integration.

  2. In the left-side navigation pane, click Data Source.

  3. On the Data Sources page, click Add Data Source. Search for Salesforce, select it, then choose Custom for Data Source Type.

    | Parameter | Value |
    | --- | --- |
    | Login Page URL | https://<Salesforce domain name>/services/oauth2/authorize |
    | Token Sign URL | https://<Salesforce domain name>/services/oauth2/token |
    | Consumer Key | The Consumer Key from your Connected App |
    | Consumer Secret | The Consumer Secret from your Connected App |

  4. Click Log On to Salesforce. Enter your username and password on the Salesforce login page, then click Allow.

Important

Salesforce is a third-party service. Ensure that the VPC network bound to the resource group has connectivity to this platform. Otherwise, the data source creation will fail.
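
The Login Page URL and Token Sign URL used in custom mode differ only in their final path segment, so both can be derived from the Salesforce domain name. The sketch below illustrates this in Python. The helper name oauth_endpoints and the domain example.my.salesforce.com are placeholders for illustration, not values defined by DataWorks or Salesforce.

```python
def oauth_endpoints(salesforce_domain):
    """Return the two OAuth URLs requested by the custom-mode form.

    Hypothetical helper: accepts a domain with or without a scheme or
    trailing slash and always emits https URLs.
    """
    domain = salesforce_domain.strip().rstrip("/")
    for prefix in ("https://", "http://"):
        if domain.startswith(prefix):
            domain = domain[len(prefix):]
    if not domain or "/" in domain:
        raise ValueError("expected a bare domain name")
    base = f"https://{domain}/services/oauth2"
    return {
        "Login Page URL": f"{base}/authorize",
        "Token Sign URL": f"{base}/token",
    }


if __name__ == "__main__":
    # Placeholder domain; substitute your own Salesforce domain name.
    for label, url in oauth_endpoints("example.my.salesforce.com").items():
        print(f"{label}: {url}")
```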

Configure a batch synchronization task

After adding the data source, configure a batch synchronization task to read data from Salesforce.

Choose a sync mode

Salesforce Reader supports four sync modes via the serviceType parameter. Use the following table to select the right mode.

| Mode | serviceType value | When to use |
| --- | --- | --- |
| Standard object sync | sobject (default) | General-purpose sync; supports data sharding via splitPk for parallel reads; required for objects with compound columns (e.g., address, geolocation) |
| SOQL query | query | When you need to query data by executing a SOQL statement with custom filtering conditions |
| Bulk API 1.0 | bulk1 | Large-volume syncs; may outperform bulk2 for some objects |
| Bulk API 2.0 | bulk2 | Large-volume syncs; does not support distributed tasks |

Important
  • bulk1 and bulk2 do not support columns of compound data types such as address and geolocation. If those columns exist in your object, use sobject instead, or set blockCompoundColumn to false to read them as NULL.

  • bulk2 does not support distributed tasks.

  • Test both bulk1 and bulk2 against your specific Salesforce objects to determine which performs better.
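
The selection guidance above can be written down as a small decision function. This is only a sketch of the rules stated in this section, and candidate_service_types and its arguments are hypothetical names. It returns a list because, for large non-distributed syncs, the guidance is to benchmark bulk1 and bulk2 rather than pick one in advance.

```python
def candidate_service_types(custom_soql=False, has_compound_columns=False,
                            large_volume=False, distributed=False):
    """Return the serviceType values worth trying, in no particular order."""
    if custom_soql:
        # A hand-written SOQL statement is only accepted in query mode.
        return ["query"]
    if has_compound_columns:
        # bulk1 and bulk2 cannot read compound columns such as address.
        return ["sobject"]
    if large_volume:
        # The source states only that bulk2 lacks distributed-task support.
        return ["bulk1"] if distributed else ["bulk1", "bulk2"]
    return ["sobject"]


if __name__ == "__main__":
    print(candidate_service_types(large_volume=True))
    print(candidate_service_types(large_volume=True, has_compound_columns=True))
```

The compound-column check comes before the volume check because this sketch assumes the compound values must actually be read; if NULL is acceptable for them, bulk1 or bulk2 with blockCompoundColumn set to false remains an option.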

Incremental sync

When serviceType is sobject, bulk1, or bulk2, set beginDateTime and endDateTime to sync only records modified within a time window. DataWorks filters records using the following timestamp fields, in priority order:

  1. SystemModstamp

  2. LastModifiedDate

  3. CreatedDate

The time range is a left-closed, right-open interval (beginDateTime is included, endDateTime is excluded).

Use beginDateTime and endDateTime together with DataWorks scheduling parameters to automate incremental data reads.
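
To make the window semantics concrete, the following Python sketch computes a one-day window in the 14-digit format used by the examples in this document (for example, 20230817184200) and checks membership with the left-closed, right-open rule. In a scheduled task the two values would normally come from DataWorks scheduling parameters; the helpers daily_window and in_window are hypothetical and only illustrate the arithmetic.

```python
from datetime import datetime, timedelta

# Four-digit year, then month, day, 24-hour hour, minute, second.
FORMAT = "%Y%m%d%H%M%S"


def daily_window(run_date):
    """Return (beginDateTime, endDateTime) covering the day before run_date."""
    end = datetime(run_date.year, run_date.month, run_date.day)
    begin = end - timedelta(days=1)
    return begin.strftime(FORMAT), end.strftime(FORMAT)


def in_window(modified, begin, end):
    """Left-closed, right-open check: begin <= modified < end."""
    return datetime.strptime(begin, FORMAT) <= modified < datetime.strptime(end, FORMAT)


if __name__ == "__main__":
    begin, end = daily_window(datetime(2023, 10, 17, 18, 42))
    print(begin, end)
    # A record stamped exactly at endDateTime belongs to the next window.
    print(in_window(datetime(2023, 10, 17), begin, end))
```

Because the interval is right-open, consecutive daily windows neither overlap nor leave gaps: a record modified exactly at midnight is picked up by exactly one run.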

Appendix: Code and parameters

Code examples

All four examples use the same job structure. The serviceType parameter in the Reader parameter block determines the sync mode.

Example 1: Standard object sync (sobject)

{
  "type": "job",
  "version": "2.0",
  "steps": [
    {
      "stepType": "salesforce",
      "parameter": {
        "datasource": "",
        "serviceType": "sobject",
        "table": "Account",
        "beginDateTime": "20230817184200",
        "endDateTime": "20231017184200",
        "where": "",
        "column": [
          { "type": "STRING", "name": "Id" },
          { "type": "STRING", "name": "Name" },
          { "type": "BOOL",   "name": "IsDeleted" },
          { "type": "DATE",   "name": "CreatedDate" }
        ]
      },
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "stream",
      "parameter": {},
      "name": "Writer",
      "category": "writer"
    }
  ],
  "setting": {
    "errorLimit": { "record": "0" },
    "speed": { "throttle": true, "concurrent": 1, "mbps": "12" }
  },
  "order": { "hops": [{ "from": "Reader", "to": "Writer" }] }
}

Example 2: Bulk API 1.0

bulk1 adds the blockCompoundColumn and bulkQueryJobTimeoutSeconds parameters. The example below sets only concurrent under speed and omits throttle and mbps.

{
  "type": "job",
  "version": "2.0",
  "steps": [
    {
      "stepType": "salesforce",
      "parameter": {
        "datasource": "",
        "serviceType": "bulk1",
        "table": "Account",
        "beginDateTime": "20230817184200",
        "endDateTime": "20231017184200",
        "where": "",
        "blockCompoundColumn": true,
        "bulkQueryJobTimeoutSeconds": 86400,
        "column": [
          { "type": "STRING", "name": "Id" },
          { "type": "STRING", "name": "Name" },
          { "type": "BOOL",   "name": "IsDeleted" },
          { "type": "DATE",   "name": "CreatedDate" }
        ]
      },
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "stream",
      "parameter": { "print": true },
      "name": "Writer",
      "category": "writer"
    }
  ],
  "setting": {
    "errorLimit": { "record": "0" },
    "speed": { "concurrent": 1 }
  },
  "order": { "hops": [{ "from": "Reader", "to": "Writer" }] }
}

Example 3: Bulk API 2.0

bulk2 uses the same extra parameters as bulk1. Note that bulk2 does not support distributed tasks.

{
  "type": "job",
  "version": "2.0",
  "steps": [
    {
      "stepType": "salesforce",
      "parameter": {
        "datasource": "",
        "serviceType": "bulk2",
        "table": "Account",
        "beginDateTime": "20230817184200",
        "endDateTime": "20231017184200",
        "where": "",
        "blockCompoundColumn": true,
        "bulkQueryJobTimeoutSeconds": 86400,
        "column": [
          { "type": "STRING", "name": "Id" },
          { "type": "STRING", "name": "Name" },
          { "type": "BOOL",   "name": "IsDeleted" },
          { "type": "DATE",   "name": "CreatedDate" }
        ]
      },
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "stream",
      "parameter": {},
      "name": "Writer",
      "category": "writer"
    }
  ],
  "setting": {
    "errorLimit": { "record": "0" },
    "speed": { "throttle": true, "concurrent": 1, "mbps": "12" }
  },
  "order": { "hops": [{ "from": "Reader", "to": "Writer" }] }
}

Example 4: SOQL query

When serviceType is query, use the query field to provide a full SOQL statement. DataWorks ignores table, column, beginDateTime, endDateTime, where, and splitPk.

{
  "type": "job",
  "version": "2.0",
  "steps": [
    {
      "stepType": "salesforce",
      "parameter": {
        "datasource": "",
        "serviceType": "query",
        "query": "select Id, Name, IsDeleted, CreatedDate from Account where Name!='Aliyun'",
        "column": [
          { "type": "STRING", "name": "Id" },
          { "type": "STRING", "name": "Name" },
          { "type": "BOOL",   "name": "IsDeleted" },
          { "type": "DATE",   "name": "CreatedDate" }
        ]
      },
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "stream",
      "parameter": {},
      "name": "Writer",
      "category": "writer"
    }
  ],
  "setting": {
    "errorLimit": { "record": "0" },
    "speed": { "throttle": true, "concurrent": 1, "mbps": "12" }
  },
  "order": { "hops": [{ "from": "Reader", "to": "Writer" }] }
}
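
Because the four examples share one job skeleton, the configuration can also be generated rather than copied. The Python sketch below assembles the same structure from a serviceType and a dictionary of Reader parameters. The function build_job is a hypothetical helper, the empty datasource mirrors the placeholder in the examples, and the speed block is reduced to concurrent as in Example 2.

```python
import json


def build_job(service_type, reader_params, columns, concurrent=1):
    """Assemble a job dict with the structure shown in Examples 1 to 4."""
    reader = {"datasource": "", "serviceType": service_type, "column": columns}
    reader.update(reader_params)  # mode-specific parameters, e.g. table or query
    return {
        "type": "job",
        "version": "2.0",
        "steps": [
            {"stepType": "salesforce", "parameter": reader,
             "name": "Reader", "category": "reader"},
            {"stepType": "stream", "parameter": {},
             "name": "Writer", "category": "writer"},
        ],
        "setting": {
            "errorLimit": {"record": "0"},
            "speed": {"concurrent": concurrent},
        },
        "order": {"hops": [{"from": "Reader", "to": "Writer"}]},
    }


if __name__ == "__main__":
    columns = [{"type": "STRING", "name": "Id"}, {"type": "STRING", "name": "Name"}]
    job = build_job("bulk1", {"table": "Account", "blockCompoundColumn": True}, columns)
    print(json.dumps(job, indent=2))
```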

Reader parameters

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| datasource | Yes | - | Name of the Salesforce data source as added in DataWorks. |
| serviceType | No | sobject | Sync mode. Valid values: sobject, query, bulk1, bulk2. See Choose a sync mode. |
| table | Yes (sobject, bulk1, bulk2) | - | Salesforce object name, such as Account, Case, or Group. Objects are equivalent to tables. |
| beginDateTime | No | - | Start of the sync time window in yyyyMMddHHmmss format (for example, 20230817184200). Applies to sobject, bulk1, and bulk2. The interval is left-closed and right-open. |
| endDateTime | No | - | End of the sync time window. Same format as beginDateTime. |
| splitPk | No | - | Field used for data sharding, enabling parallel reads. Applies to sobject. Supported field types: datetime, int, long. Other field types cause an error. |
| blockCompoundColumn | No | true | Controls behavior when compound-type columns (e.g., address, geolocation) are present. Applies to bulk1 and bulk2. true: the task fails if compound columns exist; false: compound columns are read as NULL. |
| bulkQueryJobTimeoutSeconds | No | 86400 | Timeout for Salesforce's batch data preparation phase, in seconds. Applies to bulk1 and bulk2. If Salesforce exceeds this duration preparing data, the task fails. |
| batchSize | No | 300000 | Number of records to download per batch. Applies to bulk1 and bulk2. Set slightly above Salesforce's automatic shard size for optimal throughput. Data is streamed, so a larger value does not increase memory usage. Advanced parameter (code editor only). |
| where | No | - | WHERE clause for filtering data. Applies to sobject, bulk1, and bulk2. If blank, all records are returned. Do not put a LIMIT clause such as limit 10 here; Salesforce does not support LIMIT in a standalone WHERE clause. |
| query | No | - | Full SOQL statement. Applies to query mode only. When set, DataWorks ignores table, column, beginDateTime, endDateTime, where, and splitPk. Example: select Id, Name, IsDeleted from Account where Name!='Aliyun'. Advanced parameter (code editor only). |
| queryAll | No | false | When true, includes deleted records in the result. Applies to sobject and query. Use the IsDeleted field to identify deleted records. |
| column | Yes | - | JSON array of columns to sync. Each entry specifies name and type. Supports constants enclosed in single quotation marks (e.g., '123' for an integer constant, 'abc' for a string constant). Cannot be empty. |
| connectTimeoutSeconds | No | 30 | HTTP request timeout, in seconds. Advanced parameter (code editor only). |
| socketTimeoutSeconds | No | 600 | HTTP response timeout, in seconds. The task fails if the interval between two packets exceeds this value. Advanced parameter (code editor only). |
| retryIntervalSeconds | No | 60 | Interval between retries, in seconds. Advanced parameter (code editor only). |
| retryTimes | No | 3 | Number of retry attempts. Advanced parameter (code editor only). |
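
Several parameters apply only to certain sync modes, which is easy to get wrong when editing JSON by hand. The following Python sketch checks a Reader parameter block against the applicability rules listed in the table. It is an illustrative pre-flight check, not something DataWorks provides; check_reader and APPLIES_TO are hypothetical names, and DataWorks itself may ignore a misplaced parameter rather than reject it.

```python
import re

VALID_SERVICE_TYPES = {"sobject", "query", "bulk1", "bulk2"}

# Parameter name -> sync modes it applies to, per the parameter table.
APPLIES_TO = {
    "splitPk": {"sobject"},
    "blockCompoundColumn": {"bulk1", "bulk2"},
    "bulkQueryJobTimeoutSeconds": {"bulk1", "bulk2"},
    "batchSize": {"bulk1", "bulk2"},
    "queryAll": {"sobject", "query"},
    "beginDateTime": {"sobject", "bulk1", "bulk2"},
    "endDateTime": {"sobject", "bulk1", "bulk2"},
    "where": {"sobject", "bulk1", "bulk2"},
    "query": {"query"},
}


def check_reader(params):
    """Return a list of human-readable problems; an empty list means none found."""
    mode = params.get("serviceType", "sobject")  # sobject is the default
    if mode not in VALID_SERVICE_TYPES:
        return [f"unknown serviceType: {mode}"]
    problems = []
    # column is marked required in the table, so it is checked for every mode.
    if not params.get("column"):
        problems.append("column must not be empty")
    if mode == "query":
        if not params.get("query"):
            problems.append("query mode needs a query statement")
    elif not params.get("table"):
        problems.append(f"table is required for {mode}")
    for name, modes in APPLIES_TO.items():
        if name in params and mode not in modes:
            problems.append(f"{name} does not apply to {mode}")
    for name in ("beginDateTime", "endDateTime"):
        value = params.get(name)
        if value and not re.fullmatch(r"\d{14}", str(value)):
            problems.append(f"{name} must have 14 digits")
    return problems


if __name__ == "__main__":
    print(check_reader({"serviceType": "bulk2", "table": "Account",
                        "splitPk": "CreatedDate",
                        "column": [{"type": "STRING", "name": "Id"}]}))
```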

Column constants example:

[
  { "name": "Id",    "type": "STRING" },
  { "name": "Name",  "type": "STRING" },
  { "name": "'123'", "type": "LONG"   },
  { "name": "'abc'", "type": "STRING" }
]

Id and Name are column names. '123' and 'abc' are constants (an integer and a string, respectively) enclosed in single quotation marks.
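
The quoting rule can be checked mechanically. The short Python sketch below separates constants from field names in a column array by testing for the enclosing single quotation marks. classify_column is a hypothetical helper, and how DataWorks parses these entries internally is not described in this document.

```python
def classify_column(entry):
    """Return ("constant", value) or ("field", name) for one column entry."""
    name = entry["name"]
    # A constant is wrapped in single quotation marks, e.g. '123' or 'abc'.
    if len(name) >= 2 and name.startswith("'") and name.endswith("'"):
        return "constant", name[1:-1]
    return "field", name


if __name__ == "__main__":
    columns = [
        {"name": "Id", "type": "STRING"},
        {"name": "'123'", "type": "LONG"},
        {"name": "'abc'", "type": "STRING"},
    ]
    for column in columns:
        print(classify_column(column))
```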