DataWorks includes a built-in public dataset data source that requires no configuration. Use it to run single-table offline sync tasks against real data without setting up your own data source.
Prerequisites
Before you begin, make sure you have:
- A DataWorks workspace in a supported region
- A subscription to the dataset you want to use
To subscribe, go to DataWorks Gallery, open the Alibaba Cloud Marketplace Datasets category, find the dataset, and subscribe. A dataset only appears as a usable data source in a sync task after you subscribe.
Supported regions
The public dataset data source is available in the following regions:
Beijing, Shanghai, Hangzhou, Shenzhen, Zhangjiakou, Chengdu, Ulanqab, China (Hong Kong), Japan (Tokyo), Singapore, Malaysia (Kuala Lumpur), Indonesia (Jakarta), Germany (Frankfurt), UK (London), US (Silicon Valley), and US (Virginia).
Run a single-table offline sync task
Configure the sync task using the codeless UI or the code editor:
- Codeless UI: Configure in codeless UI
- Code editor: Configure in the code editor
For the code editor script format, parameters, and a working example, see Appendix: Script demo and parameter descriptions.
Appendix: Script demo and parameter descriptions
Reader script demo
The following script reads columns from the good_reads_books table in the Curated Book Dataset public dataset. Set stepType to public_dataset for all public dataset readers.
    {
      "type": "job",
      "version": "2.0",
      "steps": [
        {
          "stepType": "public_dataset",
          "parameter": {
            "datasource": "Curated Book Dataset",
            "column": [
              "bookid",
              "title",
              "authors",
              "average_rating",
              "isbn",
              "isbn13",
              "language_code",
              "__num_pages",
              "ratings_count",
              "text_reviews_count",
              "publication_date",
              "publisher"
            ],
            "table": "good_reads_books"
          },
          "name": "Reader",
          "category": "reader"
        },
        {
          "stepType": "stream",
          "parameter": {
            "print": true
          },
          "name": "Writer",
          "category": "writer"
        }
      ],
      "setting": {
        "errorLimit": {
          "record": "0"
        },
        "locale": "zh_CN",
        "speed": {
          "concurrent": 2,
          "throttle": false
        }
      }
    }
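
If the same job shape is reused for several tables, the script can be generated rather than edited by hand. The following is a minimal Python sketch, not a DataWorks SDK call: the helper name `build_reader_job` is hypothetical, and the stream writer and `setting` values are copied from the demo above rather than taken from any documented defaults.

```python
import json


def build_reader_job(datasource, table, columns, concurrent=2):
    """Return a job script dict that mirrors the reader demo.

    Hypothetical helper for illustration only; not a DataWorks API.
    """
    if not columns:
        raise ValueError("column must list at least one column name")
    return {
        "type": "job",
        "version": "2.0",
        "steps": [
            {
                # stepType is public_dataset for all public dataset readers.
                "stepType": "public_dataset",
                "parameter": {
                    "datasource": datasource,
                    "column": list(columns),
                    "table": table,
                },
                "name": "Reader",
                "category": "reader",
            },
            {
                # Stream writer copied from the demo; it prints the rows read.
                "stepType": "stream",
                "parameter": {"print": True},
                "name": "Writer",
                "category": "writer",
            },
        ],
        "setting": {
            "errorLimit": {"record": "0"},
            "locale": "zh_CN",
            "speed": {"concurrent": concurrent, "throttle": False},
        },
    }


job = build_reader_job(
    "Curated Book Dataset", "good_reads_books", ["bookid", "title", "authors"]
)
script_text = json.dumps(job, indent=2)
print(script_text)
```

The printed JSON can then be pasted into the code editor. Passing a shorter column list, as here, reads only those columns from the table.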
Reader script parameters
| Parameter | Description | Required | Default value |
|---|---|---|---|
| datasource | The dataset name as shown in DataWorks Gallery after subscribing, for example, Curated Book Dataset. | Yes | None |
| table | The name of the table to sync. Find this in the dataset details on DataWorks Gallery. | Yes | None |
| column | The columns to read from the table, for example, ["bookid", "title", "authors"]. | Yes | None |
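
Because all three reader parameters are required and have no default, it can help to check a reader step before submitting the script. The sketch below is an illustrative local check in Python, not part of DataWorks: the function name `check_reader_step` is hypothetical, and the rule that `column` must be a list is an assumption based on the example in the table.

```python
REQUIRED_READER_PARAMS = ("datasource", "table", "column")


def check_reader_step(step):
    """Return a list of problems found in a public dataset reader step.

    Hypothetical helper for illustration only; not a DataWorks API.
    """
    problems = []
    if step.get("stepType") != "public_dataset":
        problems.append("stepType must be public_dataset")
    params = step.get("parameter", {})
    for name in REQUIRED_READER_PARAMS:
        # A missing or empty value counts as missing: there is no default.
        if not params.get(name):
            problems.append(f"missing required parameter: {name}")
    column = params.get("column")
    # Assumption: column is a JSON array of names, as in the demo script.
    if column and not isinstance(column, list):
        problems.append("column must be a list of column names")
    return problems


step = {
    "stepType": "public_dataset",
    "parameter": {"datasource": "Curated Book Dataset", "column": ["bookid"]},
    "name": "Reader",
    "category": "reader",
}
print(check_reader_step(step))  # ['missing required parameter: table']
```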