DataWorks: Public dataset

Last Updated: Mar 26, 2026

DataWorks includes a built-in public dataset data source that requires no configuration. Use it to run single-table offline sync tasks against real data without setting up your own data source.

Prerequisites

Before you begin, make sure you have:

  • A DataWorks workspace in a supported region

  • A subscription to the dataset you want to use

To subscribe, go to DataWorks Gallery, open the Alibaba Cloud Marketplace Datasets category, find the dataset, and subscribe. A dataset only appears as a usable data source in a sync task after you subscribe.

Supported regions

The public dataset data source is available in the following regions:

Beijing, Shanghai, Hangzhou, Shenzhen, Zhangjiakou, Chengdu, Ulanqab, China (Hong Kong), Japan (Tokyo), Singapore, Malaysia (Kuala Lumpur), Indonesia (Jakarta), Germany (Frankfurt), UK (London), US (Silicon Valley), and US (Virginia).

Run a single-table offline sync task

Configure the sync task using either the codeless UI or the code editor.

For the code editor script format, parameters, and a working example, see Appendix: Script demo and parameter descriptions.

Appendix: Script demo and parameter descriptions

Reader script demo

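The following script reads columns from the good_reads_books table in the Curated Book Dataset public dataset. Set stepType to public_dataset for all public dataset readers. The writer step in this demo uses the stream type with print set to true, so the rows that are read are printed instead of being written to a destination table.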

{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "public_dataset",
            "parameter": {
                "datasource": "Curated Book Dataset",
                "column": [
                    "bookid",
                    "title",
                    "authors",
                    "average_rating",
                    "isbn",
                    "isbn13",
                    "language_code",
                    "__num_pages",
                    "ratings_count",
                    "text_reviews_count",
                    "publication_date",
                    "publisher"
                ],
                "table": "good_reads_books"
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "stream",
            "parameter": {
                "print": true
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "errorLimit": {
            "record": "0"
        },
        "locale": "zh_CN",
        "speed": {
            "concurrent": 2,
            "throttle": false
        }
    }
}

Reader script parameters

| Parameter | Description | Required | Default value |
| --- | --- | --- | --- |
| datasource | The dataset name as shown in DataWorks Gallery after subscribing, for example, Curated Book Dataset. | Yes | None |
| table | The name of the table to sync. Find this in the dataset details on DataWorks Gallery. | Yes | None |
| column | The columns to read from the table, for example, ["bookid", "title", "authors"]. | Yes | None |
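
If you generate scripts for several tables, it can help to build the JSON programmatically rather than edit it by hand. The following Python sketch assembles a script with the same structure as the reader script demo from the three required reader parameters, and rejects empty values because none of them has a default. The function name build_public_dataset_job is hypothetical and is not part of DataWorks; the code only produces JSON text that you would paste into the code editor yourself.

```python
import json


def build_public_dataset_job(datasource, table, columns, concurrent=2):
    """Return a job script (as a dict) that mirrors the reader script demo.

    Hypothetical helper for illustration only; it is not a DataWorks API.
    """
    # datasource, table, and column are all required and have no default value.
    if not datasource or not table:
        raise ValueError("datasource and table are required")
    if not columns:
        raise ValueError("column must list at least one column")

    reader = {
        "stepType": "public_dataset",  # same value for all public dataset readers
        "parameter": {
            "datasource": datasource,
            "column": list(columns),
            "table": table,
        },
        "name": "Reader",
        "category": "reader",
    }
    # Stream writer with print enabled, copied from the demo script.
    writer = {
        "stepType": "stream",
        "parameter": {"print": True},
        "name": "Writer",
        "category": "writer",
    }
    return {
        "type": "job",
        "version": "2.0",
        "steps": [reader, writer],
        "setting": {
            "errorLimit": {"record": "0"},
            "locale": "zh_CN",
            "speed": {"concurrent": concurrent, "throttle": False},
        },
    }


job = build_public_dataset_job(
    "Curated Book Dataset", "good_reads_books", ["bookid", "title", "authors"]
)
script = json.dumps(job, indent=4)
print(script)
```

The values in setting are copied from the demo script; adjust them to match your own task before use.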