Operational data store layer - MaxCompute - Alibaba Cloud Documentation Center

The operational data store (ODS) layer holds the raw, unprocessed data ingested from your source business systems. It serves as the foundation for all upper-layer analytics — the quality of your data warehouse depends directly on how well you design and manage this layer.

Data at the ODS layer is detailed, application-oriented, and frequently accessed. Although the ODS layer sits outside the formal data modeling scope of a data warehouse, it still requires careful planning.

Note Before creating tables in MaxCompute, familiarize yourself with the data type editions that MaxCompute supports.

Design ODS tables

This tutorial uses transaction data — order details, user details, and product details — to illustrate ODS layer design. Raw data is stored in two-dimensional tables across six ODS tables:

Table	Description
`s_auction`	Products listed for auction
`s_sale`	Products listed for sale
`s_users_extra`	Extended user information
`s_biz_order_delta`	Incremental product orders (daily)
`s_logistics_order_delta`	Incremental logistics orders (daily)
`s_pay_order_delta`	Incremental payment orders (daily)

Naming conventions

Keep ODS table and field names consistent with those in the source system. This makes it easier to trace data back to its origin and reduces confusion during development.

Follow these rules:

Use tags to distinguish incremental tables from full tables. In this tutorial, the _delta suffix marks incremental tables.
When two source systems use the same table name, use the source database name as a prefix or suffix to avoid conflicts.
If a field name conflicts with a SQL keyword, append _col to the field name.

Create ODS tables

The following CREATE TABLE statements create the six ODS tables used in this tutorial. For detailed syntax, see Table operations.

All tables use a date partition field (ds) and a LIFECYCLE setting that controls how long data is retained.

CREATE TABLE IF NOT EXISTS s_auction
(
    id                             STRING COMMENT 'Product ID',
    title                          STRING COMMENT 'Product name',
    gmt_modified                   STRING COMMENT 'Last modification date of the product',
    price                          DOUBLE COMMENT 'Product price in USD',
    starts                         STRING COMMENT 'Time when the product is put on the shelf',
    minimum_bid                    DOUBLE COMMENT 'Product opening bid in USD',
    duration                       STRING COMMENT 'Validity period of the product for sale, in days',
    incrementnum                   DOUBLE COMMENT 'Increment of the auction price',
    city                           STRING COMMENT 'City of the product',
    prov                           STRING COMMENT 'Province of the product',
    ends                           STRING COMMENT 'Time when the product is taken off the shelf',
    quantity                       BIGINT COMMENT 'Quantity',
    stuff_status                   BIGINT COMMENT 'Product status: 0: new; 1: idle; 2: second-hand',
    auction_status                 BIGINT COMMENT 'Product sales status: 0: normal; 1: deleted; 2: taken off the shelf; 3: never put on the shelf',
    cate_id                        BIGINT COMMENT 'Product category ID',
    cate_name                      STRING COMMENT 'Product category name',
    commodity_id                   BIGINT COMMENT 'Commodity ID',
    commodity_name                 STRING COMMENT 'Commodity name',
    umid                           STRING COMMENT 'Buyer UMID'
)
COMMENT 'ODS table of products for auction'
PARTITIONED BY (ds         STRING COMMENT 'Format: YYYYMMDD')
LIFECYCLE 400;

CREATE TABLE IF NOT EXISTS s_sale
(
    id                             STRING COMMENT 'Product ID',
    title                          STRING COMMENT 'Product name',
    gmt_modified                   STRING COMMENT 'Last modification date of the product',
    starts                         STRING COMMENT 'Time when the product is put on the shelf',
    price                          DOUBLE COMMENT 'Product price in USD',
    city                           STRING COMMENT 'City of the product',
    prov                           STRING COMMENT 'Province of the product',
    quantity                       BIGINT COMMENT 'Quantity',
    stuff_status                   BIGINT COMMENT 'Product status: 0: new; 1: idle; 2: second-hand',
    auction_status                 BIGINT COMMENT 'Product sales status: 0: normal; 1: deleted; 2: taken off the shelf; 3: never put on the shelf',
    cate_id                        BIGINT COMMENT 'Product category ID',
    cate_name                      STRING COMMENT 'Product category name',
    commodity_id                   BIGINT COMMENT 'Commodity ID',
    commodity_name                 STRING COMMENT 'Commodity name',
    umid                           STRING COMMENT 'Buyer UMID'
)
COMMENT 'ODS table of products for sale'
PARTITIONED BY (ds      STRING COMMENT 'Format: YYYYMMDD')
LIFECYCLE 400;


CREATE TABLE IF NOT EXISTS s_users_extra
(
    id                  STRING COMMENT 'User ID',
    logincount        BIGINT COMMENT 'Number of logons',
    buyer_goodnum     BIGINT COMMENT 'Number of favorable comments received as a buyer',
    seller_goodnum    BIGINT COMMENT 'Number of favorable comments received as a seller',
    level_type        BIGINT COMMENT '1: level-1 shop; 2: level-2 shop; 3: level-3 shop',
    promoted_num      BIGINT COMMENT '1: class-A service; 2: class-B service; 3: class-C service',
    gmt_create        STRING COMMENT 'Creation time',
    order_id          BIGINT COMMENT 'Order ID',
    buyer_id            BIGINT COMMENT 'Buyer ID',
    buyer_nick        STRING COMMENT 'Buyer nickname',
    buyer_star_id     BIGINT COMMENT 'Buyer star ID',
    seller_id         BIGINT COMMENT 'Seller ID',
    seller_nick       STRING COMMENT 'Seller nickname',
    seller_star_id      BIGINT COMMENT 'Seller star ID',
    shop_id             BIGINT COMMENT 'Shop ID',
    shop_name         STRING COMMENT 'Shop name'
)
COMMENT 'User information extended table'
PARTITIONED BY (ds       STRING COMMENT 'yyyymmdd')
LIFECYCLE 400;

CREATE TABLE IF NOT EXISTS s_biz_order_delta
(
    biz_order_id         STRING COMMENT 'Order ID',
    pay_order_id          STRING COMMENT 'Payment order ID',
    logistics_order_id    STRING COMMENT 'Logistics order ID',
    buyer_nick           STRING COMMENT 'Buyer nickname',
    buyer_id              STRING COMMENT 'Buyer ID',
    seller_nick          STRING COMMENT 'Seller nickname',
    seller_id             STRING COMMENT 'Seller ID',
    auction_id             STRING COMMENT 'Product ID',
    auction_title        STRING COMMENT 'Product title',
    auction_price        DOUBLE COMMENT 'Product price',
    buy_amount           BIGINT COMMENT 'Quantity of purchased products',
    buy_fee              BIGINT COMMENT 'Purchase order amount',
    pay_status           BIGINT COMMENT 'Payment status: 1: unpaid; 2: paid; 3: refunded',
    logistics_id           BIGINT COMMENT 'Logistics order ID',
    mord_cod_status      BIGINT COMMENT 'Logistics status: 0: initial state; 1: order received; 2: order receiving timed out; 3: package picked up; 4: package pickup failed; 5: package signed off; 6: package sign-off failed; 7: logistics order canceled by the user',
    status               BIGINT COMMENT 'Order status: 0: normal; 1: invisible',
    sub_biz_type         BIGINT COMMENT 'Business type: 1: auction; 2: purchase',
    end_time             STRING COMMENT 'Transaction end time',
    shop_id                BIGINT COMMENT 'Shop ID'
)
COMMENT 'Daily incremental table of successful orders'
PARTITIONED BY (ds       STRING COMMENT 'yyyymmdd')
LIFECYCLE 7200;


CREATE TABLE IF NOT EXISTS s_logistics_order_delta
(
    logistics_order_id STRING COMMENT 'Logistics order ID',
    post_fee           DOUBLE COMMENT 'Logistics fee',
    address            STRING COMMENT 'Shipping address',
    full_name          STRING COMMENT 'Recipient full name',
    mobile_phone       STRING COMMENT 'Mobile number',
    prov               STRING COMMENT 'Province',
    prov_code          STRING COMMENT 'Province ID',
    city               STRING COMMENT 'City',
    city_code          STRING COMMENT 'City ID',
    logistics_status   BIGINT COMMENT 'Logistics status:
1: not shipped
2: shipped
3: received
4: returned
5: picking',
    consign_time       STRING COMMENT 'Shipping time',
    gmt_create         STRING COMMENT 'Order creation time',
    shipping           BIGINT COMMENT 'Shipping method:
1: surface mail
2: express delivery
3: EMS',
    seller_id          STRING COMMENT 'Seller ID',
    buyer_id           STRING COMMENT 'Buyer ID'
)
COMMENT 'Daily incremental table of logistics orders'
PARTITIONED BY (ds                 STRING COMMENT 'Date')
LIFECYCLE 7200;


CREATE TABLE IF NOT EXISTS s_pay_order_delta
(
    pay_order_id     STRING COMMENT 'Payment order ID',
    total_fee        DOUBLE COMMENT 'Total amount payable (Quantity × Unit price)',
    seller_id STRING COMMENT 'Seller ID',
    buyer_id  STRING COMMENT 'Buyer ID',
    pay_status       BIGINT COMMENT 'Payment status:
1: waiting for payment
2: waiting for shipping
3: transaction succeeded',
    pay_time         STRING COMMENT 'Payment time',
    gmt_create       STRING COMMENT 'Order creation time',
    refund_fee       DOUBLE COMMENT 'Refund amount (freight included)',
    confirm_paid_fee DOUBLE COMMENT 'Total revenue of received and confirmed products'
)
COMMENT 'Daily incremental table of payment orders'
PARTITIONED BY (ds        STRING COMMENT 'Date')
LIFECYCLE 7200;

Choose a storage strategy

MaxCompute supports three storage strategies for ODS tables. Use this table to choose the right one for your data:

Strategy	How it works	Best for	Storage cost
Incremental storage	Each partition holds one day's new or changed records.	High-volume transactional data (orders, logs) where downstream jobs rarely need historical full data.	Low — stores only daily deltas.
Full storage	Each partition holds a complete snapshot of all records as of that date.	Slowly changing dimensions (SCDs) with a small total data volume, such as product categories.	Higher — duplicates unchanged records across partitions.
Zipper storage	Two timestamp fields (`start_dt`, `end_dt`) track every record change. Both fields are typically used as partition fields.	Dimensions that change over time and where you need to reconstruct the exact state on any given date.	Medium — stores change history without full duplication.

Incremental storage

Each partition stores only the records created or modified that day, using the date as the partition key.

Example: On January 1, User A visits shop B, generating log entry t1 (stored in the 20160101 partition). On January 2, User A visits shop C, generating log entry t2 (stored in the 20160102 partition).

If User A makes a purchase on January 1 (log entry t1) and returns the product on January 2, the updated record is stored in the 20160102 partition — not as an overwrite to 20160101.

To reconstruct full historical data, run a UNION across all incremental partitions. This approach is efficient when downstream applications rarely need a complete view directly from ODS.

Full storage

Each partition stores a complete copy of all records as of that date.

Example: On January 1, seller A lists products B and C (t1, t2 stored in 20160101). On January 2, product B is updated and product D is added. The 20160102 partition contains the updated t1, unchanged t2, and new t3.

This approach trades storage space for simplicity — no UNION needed, and any day's snapshot is immediately queryable. Set LIFECYCLE values to limit how many daily snapshots you retain, or storage costs will grow unbounded.

Zipper storage

Zipper storage adds start_dt and end_dt to every record to track its validity period. A record is "active" when start_dt <= <query_date> and end_dt > <query_date>.

Example:

Product	`start_dt`	`end_dt`	Seller	Status
B	20160101	20160102	A	Put on the shelf
C	20160101	30001231	A	Put on the shelf
B	20160102	30001231	A	Take off the shelf

To get the state of all products on January 1, 2016, specify the conditions start_dt <= 20160101 and end_dt > 20160101 on the timestamp fields.

Handle slowly changing dimensions (SCDs)

Use a natural key, not a surrogate key, as the primary key of a MaxCompute dimension table. There are two reasons:

Performance: MaxCompute is a distributed compute engine. Generating globally unique surrogate keys adds significant overhead, especially at scale.
ETL complexity: Surrogate keys increase extract, transform, and load (ETL) pipeline complexity and ongoing maintenance costs.

Snapshot mode for SCDs

The recommended approach for handling SCDs in MaxCompute is snapshot mode. Once per day, take a full snapshot of each dimension table. This lets you join any fact table partition to the dimension snapshot from the same date using the natural key — no surrogate keys or complex ETL merging required.

Advantages:

Simple to implement and maintain
Easy to query: specify a date and join on the natural key
Low development and maintenance cost

Trade-off: If a dimension rarely changes, daily snapshots waste storage by repeating unchanged records. Set up a data lifecycle mechanism to purge snapshots beyond your retention window.

Synchronize data

Use DataWorks Data Integration to synchronize data from source systems into MaxCompute.

Follow these rules:

Sync each source table only once. Syncing the same source table multiple times creates schema inconsistencies.
Store ODS data in partitioned tables by statistical date. This reduces storage costs and enables lifecycle policy control.
Use Data Integration's full and incremental sync. The feature supports full loads, real-time incremental writes, and automatic merging of incremental and full data into new partitions. For details, see Overview of the full and incremental synchronization feature.