All Products
Search
Document Center

Use data development to archive data

Last Updated: May 09, 2020

This topic describes how to quickly archive data based on the data development feature provided by Data Management Service (DMS).

Background

The amount of data in online tables will constantly increase as business develops. In regular business, business data is usually classified into hot data and cold data. Hot data refers to data that has recently been generated and needs to be accessed, whereas cold data refers to data that was generated long time ago. Cold data is retained online for purposes like backup and filing for traceability. If the amount of data in business tables is within control and the database is performing well, you may not need to process cold data. However, when frequent data or schema changes are required, timely and efficient separation of cold and hot data will greatly save your time and reduce performance overhead.

Before the data development feature is introduced, the conventional data archiving methods are as follows:

  • The R&D engineer writes dumping and cleanup logic and deploys the code on an application server. The code is periodically called as scheduled.
  • The database administrator (DBA) or maintenance expert writes dumping and cleanup SQL statements in a script and deploys the script on a database server by using the crontab command. The script is periodically called as scheduled.

Both the data archiving methods have the following shortcomings:

  • You must develop and maintain each business table separately.
  • You cannot manage global tasks in a timely and effective way. For example, you cannot pause tasks at important moments, such as big promotions and changes.
  • The methods are unable to provide notifications for stakeholders if tasks are not run as scheduled. As a result, a large amount of data may be generated in online tables. This will degrade service performance.
  • You cannot manage operations logs in a unified manner or trace the source.

Data development overview

  • For more information about the basic concepts and usage of data development, see Data development.
  • Archiving data online by using the data development feature has the following advantages:
    • You can configure the archiving logic for each business table on demand at any time.
    • You can view all task flows. In case of important events such as big promotions, you can easily pause tasks.
    • R&D engineers, DBAs, and maintenance experts can all perform operations in functional modules. This reduces the workload of R&D engineers.
    • This method supports unified operations and allows you to view operations logs at any time.
    • This method provides notifications for stakeholders immediately after a task fails to be run.

Procedure

  1. Log on to the DMS console.

  2. In the top navigation bar, choose Data Plans > Data Development.

  3. On the Data Development page that appears, click Add Task Flow in the upper-left corner.

  4. In the Add Task Flow dialog box that appears, set Task Flow Name and Description and click OK.

  5. Click Add Task in the middle of the page and drag the task type Single Instance SQL to the canvas.

    If the dumped data and online data are not stored on the same instance, drag Cross-instance SQL to the canvas.

  6. Configure the task.

    1. Click the task created in step 5, and then click the Content tab on the right-side configuration panel. The SQL editor of the task appears.
    2. Edit the data dumping logic on the Content tab.
      If you want one table per month, you can use the schema design and schema change features to create multiple tables at a time. For more information, see Schema design and Change the schema of a logical table. You can also use the CREATE statement to create tables in the SQL task.

    3. Click the Variable Setting tab on the configuration panel to set variables.

    4. After you write SQL statements and set variables, click SQL preview.

    5. If the entered information is correct, click Save in the lower-right corner of the configuration panel.

  7. Repeat steps 5 and 6 and create another task to clean up data in online tables.

  8. Return to the directed acyclic graph (DAG) and draw a line between the two tasks.

    Data archiving requires that data can be cleaned up only after data dumping is completed.

  9. After a task flow is created in the DAG, click the Scheduling tab on the right-side configuration panel. On this tab, set Turn on/off to on, set other parameters, and then click Save.

    A task flow can be scheduled on a monthly, weekly, or daily basis. You can select a scheduling cycle as needed.

Check the running status of a task flow

On the Data Development page, click the Operation Center tab on the left-side navigation submenu to view the running status of a task flow. In this example, the task flow is configured to be run on the nineteenth day every month. As a result, no record appears here.

If a task fails, the owner of the task flow can be notified. By default, the owner of a task flow is the creator. The DMS administrator and DBA can transfer the ownership of a task flow. For more information, see Task flow.