An API data source allows Dataphin to retrieve business data from an API or write data to it. This topic explains how to create an API data source.
Permissions
To create a Data Source, you must have a Custom Global Role that includes the Create Data Source permission, or one of the following System Roles: Super Administrator, Data Source Administrator, Section Architect, or Project Administrator.
Procedure
In the top navigation bar on the Dataphin homepage, click Management Center > Data Source Management.
On the Data Sources page, click + Add Data Source.
On the New Data Source page, select API from the Semi-structured Storage section.
If you have recently used the API connector, it appears in the Recently Used section. You can also use the search box to find it.
On the New API Data Source page, configure the connection parameters.
Configure basic information
Parameter
Description
Data source name
The name must follow these rules:
It can contain only Chinese characters, letters, digits, underscores (_), and hyphens (-).
It must not exceed 64 characters in length.
Data Source Code
The Data Source Code allows you to reference tables in Flink SQL tasks by using the format data_source_code.table_name or data_source_code.schema.table_name. To automatically access the data source for the current environment, use the variable format ${data_source_code}.table or ${data_source_code}.schema.table. For more information, see Develop with Dataphin Data Source tables.
Important: The data source code cannot be modified after it is configured.
You can preview data on the object details page in the asset directory and asset checklist only after the data source code is configured.
In Flink SQL, only MySQL, Hologres, MaxCompute, Oracle, StarRocks, Hive, SelectDB, and GaussDB data warehouse service (DWS) data sources are currently supported.
Data source description
A brief description of the data source. The description cannot exceed 128 characters.
Data source configuration
Select the type of data source to configure:
Select Production + Development Data Source if you use separate production and development environments.
Select Production Data Source if you do not use separate environments.
Tags
You can assign tags to the data source for classification. To learn how to create tags, see Manage Data Source Tags.
Configure connection parameters.
Note: As a best practice, configure the Production and Development Data Sources separately for environment isolation. This prevents development activities from affecting the production environment.
Parameter
Description
URL
The request URL of the API.
Authentication Method
The authentication method used by the API.
Basic auth
Username: Enter the username.
Password: Enter the password.
Alibaba Cloud appKey auth
AppKey: Enter the AppKey.
AppSecret: Enter the AppSecret.
None: No authentication is required.
API key
Key: Enter the authentication key.
Value: Enter the authentication value.
Add to: Specifies where to add the API key in the request. Options include Parameters, Headers, or Body.
Bearer token: Enter the token. This information is added to the API request header in the format Authorization: Bearer {token}.
OAuth 2.0: Enter the Token Prefix and Access Token, and configure the Access Token Retrieval Configuration below.
Token prefix (Optional): Enter a prefix for the token. The default is Bearer. You can leave this field empty.
Access token: Enter the JSON path to the access token within the response from the token acquisition request. Multi-level paths are supported. For example, data.access_token.
Access token acquisition configuration
Note: This section is available only when the Authentication Method is set to OAuth 2.0.
Request method: Select POST or GET. The default value is GET.
Token URL: Enter the URL of the token endpoint. For example, https://example.com/oauth/token.
Client ID: Enter the Client ID.
Client Secret: Enter the Client Secret.
Client authentication: Select Send basic authentication information in the header or Send client credentials in the request body. The default value is Send basic authentication information in the header.
Send basic authentication information in the header: Sends the authentication information in the Authorization header of the HTTP request. The format is Authorization: Basic {credentials}, where {credentials} is the Base64-encoded Client ID and Client Secret.
Send client credentials in the request body: Sends the client authentication information in the request body as parameters with the keys client_id and client_secret.
Advanced Settings
Note: This section is available only when the Authentication Method is set to OAuth 2.0.
Request parameters: Specify any additional parameters that the token request requires. This field is empty by default. If the parameters specified here conflict with those automatically added by the authentication configuration above, the parameters specified here take precedence.
Parameter name: Can contain only letters, digits, underscores (_), and hyphens (-). The maximum length is 256 characters.
Add to: Select Parameter, Header, or Body. The default is Parameter. The Body option is available only when the Request Method is POST.
Test Connection: After you click Test Connection, the system automatically verifies the Token URL, Client ID, Client Secret, and Client Authentication. After the connection test is complete, you can click Expand Query Result to view the formatted JSON.
Advanced Settings
Connection retries: The number of times to retry a failed API connection. If all retries fail, the connection attempt is marked as Failed.
Click OK to create the API data source.
External API integration
Core capabilities
This table outlines the core capabilities of an API Data Source. See the following sections for detailed parameter configurations.
Capability | Description |
Authentication method | Supports Basic Authentication, Alibaba Cloud AppKey Auth, API Key Authentication, Bearer Token Authentication, OAuth 2.0 (Authorization Code Grant), and Signature Authentication. |
Request protocol | Supports HTTP and HTTPS. We recommend using HTTPS to protect data in transit. Important: You can connect to both HTTP and HTTPS APIs. For HTTPS, however, Dataphin supports only APIs that allow certificate verification to be skipped. APIs that require mandatory certificate validation are not supported. |
Request method |
|
Paginated API (Looping Call) | Supports iterative calls to APIs that use page numbers, offsets, or cursors for pagination. |
API authentication methods
This section describes the supported API Authentication Methods, their core parameters, and usage guidelines.
In summary, Basic Authentication and Alibaba Cloud AppKey Auth serve specific scenarios. API Key Authentication and Bearer Token Authentication are for general-purpose use. OAuth 2.0 is ideal for complex authorization scenarios, and the None option is for public APIs.
None (no authentication)
Overview
This method allows API calls without any authentication credentials. It is suitable for publicly accessible endpoints, such as public query services, or APIs that rely on other security measures like an IP Allowlist or internal business logic for access control.
Configuration
No authentication parameters are required. You can call the API directly.
Basic authentication (Basic Auth)
Overview
This is a simple authentication scheme based on the standard HTTP authentication framework. The username and password are combined, encoded in Base64, and included in the request header (Authorization: Basic <encoded_string>). We recommend using this method only over HTTPS to prevent exposing credentials in plaintext.
Required parameters
Parameter
Description
Example
Username
The username provided by the API provider for basic authentication. It is typically a string that supports letters, numbers, and special characters.
api_user_01
Password
The password provided by the API provider. It must be used with the corresponding username.
e89s76d9@2026
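The encoding step described in the overview can be sketched as follows. This is a minimal illustration of the standard HTTP Basic scheme (username and password joined by a colon, then Base64-encoded), not Dataphin's internal implementation; the function name basic_auth_header is hypothetical.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build the Authorization header for HTTP Basic authentication."""
    # Join the credentials with a colon, then Base64-encode the UTF-8 bytes.
    credentials = f"{username}:{password}".encode("utf-8")
    encoded = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}


# Example values taken from the parameter table above.
print(basic_auth_header("api_user_01", "e89s76d9@2026"))
```

Because Base64 is an encoding and not encryption, anyone who can read the header can recover the password, which is why HTTPS is recommended for this method.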
Alibaba Cloud AppKey Auth
Overview
As the standard authentication method for the Alibaba Cloud open platform, it validates requests based on an AppKey and AppSecret. Signature rules may vary slightly across Alibaba Cloud products but typically involve encrypting request parameters, a timestamp, and other values. For more information about signature calculation, see Offline integration with API components. This method is ideal for calling various Alibaba Cloud open APIs, such as Object Storage Service (OSS), ApsaraDB RDS, and Short Message Service (SMS).
Required parameters
Parameter
Description
Example
AppKey
The AppKey assigned by the API provider, which serves as a unique identifier for your application.
api_user_01
AppSecret
The secret key assigned by the API provider. It must be used with the corresponding AppKey.
e89s76d9@2026
API key authentication
Overview
This method uses a custom key-value pair for authentication. You can send the key in the request parameters (URL), request header, or request body. It is a highly versatile method compatible with most internal, custom-built, and third-party APIs.
Required parameters
Parameter
Description
Example
Key
The name of the authentication key as defined by the API provider, such as api-key, app-key, or token-id.
api_key
Value
The secret value associated with the key. This value is provided by the API provider and should be kept confidential.
789d87s6a987d6s9876d987s6987d
Add to
Specifies where to place the key-value pair in the request. Options include:
Parameter: Appends the key to the URL as a query parameter. For example, https://api.example.com?api_key=789d87s6....
Header: Includes the key in the request header. For example, api_key: 789d87s6....
Body: Includes the key in the POST request body.
Header
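The three Add to options can be illustrated with a small helper that places the key-value pair into the URL, the headers, or the body. This is a sketch only: apply_api_key is a hypothetical name, and treating the body as a flat dictionary is an assumption about the target API, not documented Dataphin behavior.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def apply_api_key(url, headers, body, key, value, add_to):
    """Return (url, headers, body) with the API key placed according to add_to."""
    headers = dict(headers)  # copy so the caller's objects are not mutated
    body = dict(body)
    if add_to == "Parameter":
        # Append the key to the existing query string.
        parts = urlsplit(url)
        query = parse_qsl(parts.query, keep_blank_values=True)
        query.append((key, value))
        url = urlunsplit(parts._replace(query=urlencode(query)))
    elif add_to == "Header":
        headers[key] = value
    elif add_to == "Body":
        # Assumption: the POST body is a flat key-value structure.
        body[key] = value
    else:
        raise ValueError(f"Unsupported location: {add_to}")
    return url, headers, body


url, headers, body = apply_api_key(
    "https://api.example.com/items?page=1", {}, {}, "api_key", "abc", "Parameter"
)
print(url)
```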
Bearer token authentication
Overview
This is a simple authentication scheme based on the HTTP Bearer token. The token is sent directly in the request header (Authorization: Bearer <token_value>). It is suitable for scenarios requiring single-use or short-lived tokens, such as temporary access tokens.
Required parameters
Parameter
Description
Example
Token
The Bearer token issued by the API provider. It is typically a string, such as a JSON Web Token (JWT) or a random string. Ensure the token is valid and not expired.
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
OAuth 2.0 (authorization code grant)
Overview
As an industry-standard authorization protocol, OAuth 2.0 requires you to obtain an access token before making API calls. This method is ideal for scenarios that require user authorization or multi-client access, such as third-party platform integrations.
In Offline Integration tasks, for example, Dataphin retrieves a new access token before each request to the target API and then makes the request with the new token.
Required parameters
Parameter
Description
Example
Basic Configuration
Token prefix
The prefix for the token in the request header. Common values are Bearer, Token, or OAuth.
Bearer
Access token path
The JSONPath to the access token value in the API response. For example, if the response is:
{
  "access_token": "access_token_return",
  "token_type": "Bearer",
  "expires_in": 3600,
  "refresh_token": "d68297697d37d97197d7a0c986f9d77989118912"
}
Enter access_token.
access_token
Access token retrieval configuration
Request method
The HTTP method for retrieving the token. Supports GET and POST. Most services use POST.
POST
Token URL
The API endpoint provided by the server for retrieving the access token.
https://oauth.example.com/token
Client ID
The unique identifier for your application, provided by the API provider.
client_123456789
Client Secret
The secret key for your application, provided by the API provider. This key must be kept confidential.
secret_987654321
Client authentication (Optional)
Specifies how to send the Client ID and Client Secret. Options include:
Request Header: Sends credentials in the Authorization header using the Basic Auth format.
Request Body: Sends credentials as parameters in the body of the token request.
Request Header
Additional request parameters for token retrieval (Optional)
Specifies any extra parameters required to retrieve the token. You can send them as URL parameters or in the header. For example, grant_type=client_credentials or scope=read_write.
Parameter: grant_type=client_credentials
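The token flow configured above can be sketched in three steps: assemble the token request, extract the token from the response by its path, and build the header for the data request. The helper names are hypothetical and no network call is made. How Dataphin formats the header when the prefix is empty is not documented here, so the stripping behavior below is an assumption.

```python
import base64


def build_token_request(client_id, client_secret, client_auth="header", extra_params=None):
    """Return (headers, params) for the token request."""
    headers, params = {}, {}
    if client_auth == "header":
        raw = f"{client_id}:{client_secret}".encode("utf-8")
        headers["Authorization"] = "Basic " + base64.b64encode(raw).decode("ascii")
    elif client_auth == "body":
        params.update({"client_id": client_id, "client_secret": client_secret})
    else:
        raise ValueError(f"Unsupported client authentication: {client_auth}")
    # User-specified parameters take precedence over automatically added ones.
    params.update(extra_params or {})
    return headers, params


def extract_by_path(response: dict, path: str):
    """Follow a dot-separated path such as data.access_token."""
    value = response
    for key in path.split("."):
        value = value[key]
    return value


def authorization_header(token: str, prefix: str = "Bearer") -> dict:
    # Assumption: an empty prefix yields the bare token.
    return {"Authorization": f"{prefix} {token}".strip()}


# Sample response taken from the Access token path example above.
response = {"access_token": "access_token_return", "token_type": "Bearer", "expires_in": 3600}
token = extract_by_path(response, "access_token")
print(authorization_header(token))
```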
API requirements
Data Read APIs
The response from a data read API must be a JSON object in which every field of a single record, including fields inside nested objects, holds a scalar value. Fields cannot contain an Array.
Supported data structure (compliant)
All fields in a single record, including those within a Nested Object, must contain primitive types such as strings, numbers, or booleans.
// Paginated API responses must include pagination information like page_no and page_size.
{
  "code": 200,
  "msg": "Request successful",
  "request_id": "req-20260228001", // Unique request ID for troubleshooting.
  "data": [{
    "user_id": 10001,
    "user_name": "Zhang San",
    "user_phone": "13800138000",
    "user_email": "zhangsan@example.com",
    "user_hobbies": {
      "sports": "Badminton, Table Tennis", // Multi-level nested objects are supported.
      "instrument": "Guitar"
    },
    "create_time": "2026-02-28 10:00:00"
  }, {
    "user_id": 10002,
    "user_name": "Li Si",
    "user_phone": "13800138001",
    "user_email": "lisi@example.com",
    "user_hobbies": {
      "sports": "Basketball",
      "instrument": "Piano, Erhu"
    },
    "create_time": "2026-02-28 10:00:00"
  }],
  "page_no": 1, // Current page number. Required for paginated APIs.
  "page_size": 10, // Number of items per page. Required for paginated APIs.
  "total": 23, // Total number of items. Optional.
  "pages": 3 // Total number of pages. Optional.
}
Unsupported data structure (non-compliant)
Fields in a single record, including those within a Nested Object, cannot contain an Array. The following structure is non-compliant and cannot be parsed.
{
  "code": 200,
  "msg": "Request successful",
  "request_id": "req-20260228001",
  "data": [{
    "user_id": 10001,
    "user_name": "Zhang San",
    "user_phone": "13800138000",
    "user_email": "zhangsan@example.com",
    "user_hobbies": {
      "sports": ["Basketball"], // Prohibited: Field value is an array.
      "instrument": ["Piano", "Erhu"] // Prohibited: Field value is an array.
    },
    "create_time": "2026-02-28 10:00:00"
  }]
}
Data Write APIs
Data write APIs must accept a request body where field values are scalar types. Nested Objects and Array values are not supported.
Supported data structure (compliant)
All field values in a single record must be primitive types such as strings, numbers, or booleans. Nested objects and arrays are not supported.
[{
  "user_id": 10001,
  "user_name": "Zhang San",
  "user_phone": "13800138000",
  "user_email": "zhangsan@example.com",
  "create_time": "2026-02-28 10:00:00"
}, {
  "user_id": 10002,
  "user_name": "Li Si",
  "user_phone": "13800138001",
  "user_email": "lisi@example.com",
  "create_time": "2026-02-28 10:00:00"
}]
Unsupported data structure (non-compliant)
The value of any field in a single record cannot be a Nested Object or an Array. The following structure does not meet the requirements and cannot be parsed.
[{
  "user_id": 10001,
  "user_name": "Zhang San",
  "user_hobbies": ["Basketball", "Piano"], // Prohibited: Array type.
  "user_info": { // Prohibited: Nested Object type.
    "phone": "13800138001",
    "email": "zhangsan@example.com"
  },
  "create_time": "2026-02-28 10:00:00"
}, {
  "user_id": 10002,
  "user_name": "Li Si",
  "user_hobbies": ["Basketball", "Piano"], // Prohibited: Array type.
  "user_info": { // Prohibited: Nested Object type.
    "phone": "13800138002",
    "email": "lisi@example.com"
  },
  "create_time": "2026-02-28 10:00:00"
}]
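To check a payload against the read and write requirements before connecting it, a small validator along the following lines may help. It is an illustrative sketch: find_violations is a hypothetical helper, not a Dataphin function. It flags arrays everywhere, and flags nested objects only when they are not allowed (the write case).

```python
def find_violations(record: dict, allow_nested_objects: bool, path: str = "") -> list:
    """List the fields of one record that break the structure rules."""
    violations = []
    for key, value in record.items():
        field = f"{path}.{key}" if path else key
        if isinstance(value, list):
            violations.append(f"{field}: array")
        elif isinstance(value, dict):
            if allow_nested_objects:
                # Read APIs: nested objects are fine, but check their fields too.
                violations.extend(find_violations(value, True, field))
            else:
                violations.append(f"{field}: nested object")
    return violations


read_record = {
    "user_id": 10001,
    "user_hobbies": {"sports": ["Basketball"], "instrument": ["Piano", "Erhu"]},
}
write_record = {
    "user_id": 10001,
    "user_hobbies": ["Basketball", "Piano"],
    "user_info": {"phone": "13800138001"},
}
print(find_violations(read_record, allow_nested_objects=True))
print(find_violations(write_record, allow_nested_objects=False))
```

An empty list means the record fits the stated rules for that direction.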
Offline integration with API components
Offline integration is a primary use case for API data sources. The capabilities and requirements for input and output scenarios differ slightly. For more information about configuring the API Input and Output components, see the following topics:
Signature authentication
Signature Authentication provides robust identity verification. The client generates a signature string by applying a hash function to critical request information, such as request parameters, a timestamp, a nonce (a random number), and a secret key. The server then recalculates the signature using the same rule and compares it with the received signature to prevent request tampering and replay attacks. You can use it independently or in combination with other authentication methods to enhance API security. If your API provider requires signature authentication, refer to this section.
Required parameters
The following parameters are used to generate the signature string.
Parameter | Description | Example |
Signature name | The name of the signature parameter or header in the request. The server uses this name to identify the signature. |
|
Signature location | Specifies where to place the signature string. Options include:
| Params |
Generation function | The hash algorithm used for the signature. Supported algorithms include: MD5HEX, HMAC_MD5, SHA1HEX, HMAC_SHA1, SHA256, SHA256HEX, HMAC_SHA256, SHA512HEX, and HMAC_SHA512. | MD5HEX |
Secret key | The secret key used for signature calculation. This key is agreed upon with the API provider and must be kept confidential. |
|
Content to concatenate | The rule for constructing the original string before signature generation. Options include:
| Custom |
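The sign-then-verify idea can be sketched with HMAC_SHA256 from the list of generation functions. The concatenation rule used here (key=value pairs sorted by key and joined with &) and the hex output encoding are assumptions for illustration only; the actual rule is whatever you agree on with the API provider under Content to concatenate.

```python
import hashlib
import hmac


def build_content(params: dict, timestamp: str, nonce: str) -> str:
    """Concatenate request values into the string to sign (assumed rule)."""
    pairs = dict(params, timestamp=timestamp, nonce=nonce)
    return "&".join(f"{key}={pairs[key]}" for key in sorted(pairs))


def sign(content: str, secret_key: str) -> str:
    """HMAC-SHA256 over the content, hex-encoded (encoding is an assumption)."""
    digest = hmac.new(secret_key.encode("utf-8"), content.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def verify(content: str, secret_key: str, received_signature: str) -> bool:
    """Server side: recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign(content, secret_key), received_signature)


content = build_content({"user_id": "10001"}, timestamp="1700000000", nonce="abc")
signature = sign(content, "my-secret")
print(content)
print(verify(content, "my-secret", signature))
```

Any change to the signed parameters, or use of a different secret key, makes verification fail, which is how tampering is detected.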
Data reading (API Input component)
Required parameters
Parameter | Description | Example |
Request method | The API Input Component supports GET and POST. | GET |
Number of requests | Supports single and multiple (iterative) requests. | Multiple Requests |
Multiple (iterative) requests: Pagination loop
For APIs that do not return all data in a single request (such as paginated or scrolling APIs), use the automatic Looping Call feature. It supports two modes, Page Number/Offset loop and Cursor loop, and repeats the request until a termination condition is met, automatically aggregating all returned data. This makes it suitable for various batch data retrieval scenarios.
Page Number/Offset loop
Use this mode for standard Paginated APIs that accept Page Number and page size or Offset and limit as input parameters to return data for a specific page (for example, page=1&size=100 or offset=0&limit=100).
Parameter
Description
Example (page and size)
Example (offset and limit)
Request parameters
Select the location (Params or Body only) and the name of the looping input parameter.
Location: Params
Parameter Name: page
Location: Params
Parameter Name: offset
Initial value
The initial value of the loop parameter.
1 (Page number starts at 1)
0 (Offset starts at 0)
Step
The increment for the parameter after each loop. For page-number pagination, this is typically 1. For offset pagination, this value should match the number of items per page.
1 (If each page has 100 items, the page number increments by 1)
100 (100 items per page. The offset is incremented by 100 for each request.)
Termination condition
The rule that determines when to stop the loop.
You can compare response parameters, request parameters, constants, or the number of requests.
Rule: Response parameter current_page = Constant 10 (Stops after fetching 10 pages. This condition is flexible.)
Rule: Response parameter has_more = Constant false (Stops when there is no more data. This condition is flexible.)
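The interaction of initial value, step, and termination condition can be sketched as a plain loop. Everything here is illustrative: fetch_all and fake_api are hypothetical names, the in-memory API stands in for a real endpoint, and evaluating the termination condition after each response has been collected is an assumption, not documented behavior.

```python
def fetch_all(fetch_page, param_name, initial, step, should_stop, max_requests=1000):
    """Call fetch_page repeatedly, advancing param_name by step each time."""
    records = []
    value = initial
    for request_count in range(1, max_requests + 1):
        response = fetch_page({param_name: value})
        records.extend(response["data"])
        # Termination condition: may inspect the response or the request count.
        if should_stop(response, request_count):
            break
        value += step
    return records


# Hypothetical in-memory API standing in for an offset/limit endpoint.
ITEMS = list(range(250))


def fake_api(params, limit=100):
    offset = params["offset"]
    return {
        "data": ITEMS[offset:offset + limit],
        "has_more": offset + limit < len(ITEMS),
    }


rows = fetch_all(
    fake_api, "offset", initial=0, step=100,
    should_stop=lambda response, count: response["has_more"] is False,
)
print(len(rows))
```

The max_requests guard is a safety net added for the sketch so that a termination condition that never matches cannot loop forever.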
Cursor loop
This mode is ideal for high-concurrency or large-volume data APIs. It uses a Cursor value from the current response as an input for the next request, which avoids the "missed or duplicate data" issues common with page-number pagination.
Parameter
Description
Example
Request parameters
Select the location (Params or Body) and the name of the cursor parameter.
Location: Params
Parameter Name: next_cursor
Cursor parameter
Defines the source of the cursor value for the next request. The value is extracted from the response of the current request. JSONPath is supported.
Response parameter + data.next_cursor
Initial value
The initial cursor value for the first request, as defined by the API. The default is 0.
0
Termination condition
The rule that determines when to stop the loop.
Common conditions include checking if the cursor value from the response is a constant (like an empty string or 0) or if the response data is empty.
Rule: Response parameter data.next_cursor = Constant "" (empty string)
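A cursor loop with the example settings above (parameter next_cursor, cursor path data.next_cursor, initial value 0, stop on an empty string) might look like the following sketch. The helper names, the data.items field, and the in-memory API are hypothetical and exist only to make the example runnable.

```python
def get_path(obj, path):
    """Follow a dot-separated path such as data.next_cursor."""
    for key in path.split("."):
        obj = obj[key]
    return obj


def fetch_by_cursor(fetch, cursor_param, cursor_path, initial, stop_value, max_requests=1000):
    """Feed the cursor from each response into the next request."""
    records = []
    cursor = initial
    for _ in range(max_requests):
        response = fetch({cursor_param: cursor})
        records.extend(response["data"]["items"])  # "items" is a hypothetical field
        cursor = get_path(response, cursor_path)
        if cursor == stop_value:
            break
    return records


# Hypothetical in-memory API: maps a cursor to (items, next cursor).
PAGES = {"0": (["a", "b"], "c1"), "c1": (["c"], "c2"), "c2": (["d"], "")}


def fake_api(params):
    items, next_cursor = PAGES[params["next_cursor"]]
    return {"data": {"items": items, "next_cursor": next_cursor}}


rows = fetch_by_cursor(fake_api, "next_cursor", "data.next_cursor", "0", "")
print(rows)
```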
Multiple (iterative) requests: Parameter traversal loop
This mode iterates through a fixed list of parameter values. In each request, one value from the list is passed to the specified input parameter of the API. The loop continues until all values in the list have been used. This is suitable for scenarios where you need to query data based on a batch of fixed dimensions, such as city, department, or product ID. It supports two methods for providing the parameter list: Manual Input and Retrieve via API.
Parameter level | Parameter | Description | Example |
Loop List Configuration | Retrieval Method | You can select:
| Manual Input |
Manual Input Mode | Parameter value list | When you select Manual Input, enter the parameter values to iterate through. Separate each value with a line break. | 101 102 103 (Represents iterating through city IDs) |
API Retrieval Mode | API Configuration for List Retrieval | When you select Retrieve via API, you must configure the preliminary API's request URL, request method, authentication method, and parameters, as well as the path for extracting parameter values (JSONPath is supported). The system first calls this API to get the list and then iterates through it. | Preliminary API: Extraction Path: |
The Retrieve via API mode only supports APIs that use the None (no authentication) method.
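In Manual Input mode, the traversal amounts to splitting the value list on line breaks and issuing one request per value. The sketch below assumes a hypothetical city_id parameter and an in-memory stand-in for the target API; parse_value_list and traverse are illustrative names.

```python
def parse_value_list(text: str) -> list:
    """Split a manually entered list on line breaks, ignoring blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def traverse(values, fetch, param_name):
    """Make one request per value and aggregate the returned records."""
    results = []
    for value in values:
        results.extend(fetch({param_name: value}))
    return results


# Hypothetical stand-in for the target API: returns one record per city ID.
def fake_api(params):
    city_id = params["city_id"]
    return [{"city_id": city_id, "name": f"city-{city_id}"}]


values = parse_value_list("101\n102\n103")
rows = traverse(values, fake_api, "city_id")
print(rows)
```

In Retrieve via API mode, the only difference in this sketch would be where values comes from: a preliminary request whose response is reduced to a list through the configured extraction path.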
Data writing (API Output component)
Parameter | Description | Example |
Request method | The API Output Component only supports POST. | POST |
Number of requests | Only single requests are supported. | Default. No configuration required. |
Request data structure | The format of the JSON data sent in the request.
| Single Record |