This topic describes how to upgrade your Dataphin version in Dataphin Manager using a downtime or semi-downtime method.
Prerequisites
You understand the scope of impact of the version upgrade. For more information, see Scope of impact of the version upgrade.
No configurations or upgrades are in progress. If an upgrade is in progress, wait for it to complete. If necessary, you can force stop the current version upgrade before you start a new one.
Background information
A semi-downtime or downtime upgrade of Dataphin involves three main steps: selecting the version and upgrade mode, performing the upgrade, and verifying the upgrade result. During the upgrade, you must enter upgrade maintenance mode. In semi-downtime mode, users can log on to Dataphin after the application upgrade is complete. In downtime mode, users cannot log on to Dataphin until the system exits upgrade maintenance mode.
If your current Dataphin instance version is earlier than V5.1.1, only downtime upgrades and zero-downtime upgrades are supported. If your current version is V5.1.1 or later, semi-downtime upgrades, downtime upgrades, and zero-downtime upgrades are supported. The following flowcharts show the complete process for semi-downtime and downtime upgrades.
Stop upgrade: You can stop the upgrade during the Force stop running tasks stage or the Applications not yet stopped stage, that is, after the system enters the downtime stage but before shutdown starts. After you stop the upgrade, the system resumes task submission and exits upgrade maintenance mode. You can choose whether to automatically rerun the forcibly stopped instances. If you do not, you can rerun them manually later in the task operations and maintenance (O&M) module.
Force stop upgrade: You can force stop the upgrade at any stage after it begins. These stages include force stopping running tasks, shutdown, pre-upgrade, database backup, application upgrade, task rerun, and data update. However, the system cannot automatically recover from a forced stop, and Dataphin might become unavailable. Therefore, before you force stop an upgrade, confirm that you have manually completed the upgrade or rollback. We recommend that only professional O&M engineers confirm and perform this operation.
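The version rule above can be summarized as a small check. The following Python snippet is a minimal sketch that assumes version strings of the form V5.1.1. The function and constant names are hypothetical and are not part of Dataphin Manager.

```python
from typing import List, Tuple

# Assumption: the first version that supports semi-downtime upgrades, per the rule above.
SEMI_DOWNTIME_MIN_VERSION = (5, 1, 1)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a version string such as "V5.1.1" into (5, 1, 1)."""
    return tuple(int(part) for part in version.lstrip("Vv").split("."))


def supported_upgrade_modes(current_version: str) -> List[str]:
    """Upgrade modes available for the current Dataphin instance version."""
    modes = ["downtime", "zero-downtime"]
    if parse_version(current_version) >= SEMI_DOWNTIME_MIN_VERSION:
        modes.insert(0, "semi-downtime")  # available from V5.1.1 onward
    return modes


print(supported_upgrade_modes("V5.0.2"))  # ['downtime', 'zero-downtime']
print(supported_upgrade_modes("V5.1.1"))  # ['semi-downtime', 'downtime', 'zero-downtime']
```

Tuple comparison handles multi-digit components correctly. For example, V5.10.0 compares as newer than V5.9.9, which plain string comparison would get wrong.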
Procedure
Step 1: Select the version and upgrade mode
Append /opsconsole/v2 to the logon URL of Dataphin to open the Dataphin Manager logon page.
On the Dataphin Manager logon page, enter your Username and Password, and then click Log on. You can obtain the username and password from Dataphin operations personnel.
On the Dataphin Manager home page, click System Configuration.
On the Upgrade History page, click Upgrade Dataphin.
On the Upgrade Dataphin page, select the version and upgrade mode. The following table describes the parameters.
Parameter
Description
Upgrade Configuration
Target version
Select the target version from the version list.
If the version list does not contain the target version, click Upload Version Configuration to upload the target version's configuration file.
After you upload the configuration file, the system validates it. If the file content is incorrect, validation fails and the system reports an error that states the reason. If the file passes validation and the system does not yet contain a configuration file for that version, continuing the upload imports the version configuration. If the file passes validation and the system already contains a configuration file for that version, continuing the upload overwrites the existing configuration.
Configuration file
Standard configuration: Click Upload File to upload a configuration file in YAML or ZIP format. You can download the configuration file after it is uploaded.
After you upload the configuration file, the system validates it. If the file is missing any configuration items, the system reports an error. Click View Details in the error message to see the list of Missing Configuration Items.
Non-standard configuration: Contact the Dataphin O&M team to obtain the required configuration file in ZIP format. Then click Upload File to upload it. After the file is uploaded, the system automatically performs the following checks:
Uses MD5 validation to check whether the standard version configuration template (product/dataphin/...) in the configuration file matches the uploaded version configuration. If they do not match, the system warns that the configuration template in the file is not compatible with the selected version. We recommend that you confirm with the Dataphin O&M team before you proceed with the upgrade.
Checks whether the overlay file and the values.yaml file in the configuration file are compatible with the standard version configuration template (product/dataphin/...). If they are not compatible, the system blocks the upgrade, reports that the configuration information in the file is not compatible with the selected version, and advises you to confirm with the Dataphin O&M team.
Checks whether the values.yaml file in the configuration file is consistent with the values.yaml file running in your current online environment. If they differ, the system warns that the configuration information in the file is not consistent with the current online configuration. If you continue the upgrade, the new configuration file is used.
Note: Unless there are special circumstances, do not use a non-standard configuration.
Upgrade mode
Select Semi-downtime upgrade or Downtime upgrade. In both modes, DataService Studio remains active during the upgrade, and synchronous calls to DataService Studio APIs remain available. We recommend the semi-downtime upgrade for all upgrade paths except those in which the first three digits of the current and target Dataphin version numbers are the same.
Semi-downtime upgrade: Greatly reduces scheduling downtime and does not require stopping running tasks.
Downtime upgrade: Pauses task scheduling during the upgrade and requires stopping running tasks.
Note: During a semi-downtime or downtime upgrade, DataService Studio also remains active and available for asynchronous calls to DataService Studio APIs related to StarRocks, MaxCompute, Databricks, or OceanBase data sources.
The system checks the compatibility between the current and target versions to determine if a zero-downtime upgrade is supported. If a zero-downtime upgrade is not supported, the downtime upgrade is selected by default.
Announcement configuration
Estimated completion time
Select the estimated time to exit upgrade maintenance mode. The default is the current time. The time format is YYYY-MM-DD hh:mm.
Contact email
The email address of a contact person available during the upgrade.
Contact phone
The phone number of a contact person available during the upgrade. This can be a landline or a mobile phone number.
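The three automatic checks for a non-standard configuration file can be modeled as follows. This is a hypothetical sketch, not the Dataphin implementation, and it makes several assumptions. The template and values.yaml contents are assumed to be available as bytes. The templates are compared by MD5 as described above, while the two values.yaml files are compared byte for byte (an assumption). The overlay compatibility result is taken as an input because the check itself is not documented here. Only the second check blocks the upgrade. The other two only produce warnings.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


def md5_hex(data: bytes) -> str:
    """Return the hex MD5 digest of a file's contents."""
    return hashlib.md5(data).hexdigest()


@dataclass
class CheckResult:
    blocked: bool = False
    warnings: List[str] = field(default_factory=list)


def check_non_standard_config(template_in_file: bytes,
                              version_template: bytes,
                              overlay_compatible: bool,
                              values_in_file: bytes,
                              values_online: bytes) -> CheckResult:
    """Hypothetical model of the three checks run after a ZIP upload."""
    result = CheckResult()
    # Check 1: MD5 of the bundled template vs. the selected version's template (warning only).
    if md5_hex(template_in_file) != md5_hex(version_template):
        result.warnings.append("Template not compatible with the selected version")
    # Check 2: incompatible overlay/values.yaml blocks the upgrade.
    if not overlay_compatible:
        result.blocked = True
    # Check 3 (assumed byte comparison): differs from the online values.yaml (warning only).
    if values_in_file != values_online:
        result.warnings.append("values.yaml differs from the online configuration; the new file will be used")
    return result


r = check_non_standard_config(b"tpl", b"tpl", True, b"a: 1", b"a: 2")
print(r.blocked, len(r.warnings))  # False 1
```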
Select the risk declaration checkbox, and then click Enter Upgrade Maintenance Mode.
After you click Enter Upgrade Maintenance Mode, the system records the time. The system then performs the following operations:
Saves the current configuration and creates a record with the status "Upgrading" in the upgrade history. You cannot start a new upgrade process at this time.
Pauses the dispatching of scheduled tasks. Task dispatching resumes when the system exits upgrade maintenance mode. During the upgrade, the switches are set as follows:
Developer/Production environment: Task Scheduling=Off, Task Execution=On
Note:
Schedule switch: Controls scheduling for auto triggered tasks. When this switch is off, new auto triggered tasks are not dispatched. Task instances that are already running continue to run. Data backfill tasks and ad-hoc queries are not affected.
Execution switch: Controls whether tasks are dispatched for resource scheduling. This switch affects data backfill tasks, auto triggered tasks, and ad-hoc queries. When this switch is off, instances that have not started are not dispatched. Task instances that are already running continue to run.
In the development environment, the schedule switch is off by default and cannot be turned on.
The frontend enters upgrade maintenance mode and no longer allows operations.
If you click Save, the system creates a record with the status Configuring in the upgrade history list. You can click Continue Upgrade in the list to edit it.
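The interaction between the schedule switch and the execution switch can be expressed as a single rule. The sketch below is a hypothetical model of the behavior described in the note above, not a Dataphin API. The task kinds are labeled "auto", "backfill", and "adhoc" for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Switches:
    schedule: bool   # affects auto triggered tasks only
    execution: bool  # affects auto triggered, data backfill, and ad-hoc tasks


def instance_proceeds(kind: str, already_running: bool, sw: Switches) -> bool:
    """Return True if an instance keeps running or is dispatched.

    kind is a hypothetical label: "auto", "backfill", or "adhoc".
    """
    if already_running:
        return True   # running instances continue regardless of either switch
    if not sw.execution:
        return False  # instances that have not started are not dispatched
    if kind == "auto":
        return sw.schedule  # new auto triggered instances also need the schedule switch
    return True       # backfill and ad-hoc runs ignore the schedule switch


# Switch settings while in upgrade maintenance mode (Scheduling=Off, Execution=On).
MAINTENANCE = Switches(schedule=False, execution=True)
print(instance_proceeds("auto", False, MAINTENANCE))      # False
print(instance_proceeds("backfill", False, MAINTENANCE))  # True
```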
Step 2: Start the upgrade
Semi-downtime upgrade
Database Backup
Using a self-managed PostgreSQL database
Click Start Backup. When the database backup status is Backing up, you can click Next.
The database backup statuses are Not backed up, Starting, Backing up, Backup complete, and Backup failed.
Using RDS or other database types
If you use an RDS database, you must perform the backup in the RDS console. Click Go to RDS Backup to open the RDS database console.
You can select the risk declaration checkbox and click Next to skip the database backup step.
Note: The database backup status does not affect the upgrade task. The upgrade can be finalized even if the database backup is not complete. If you force stop the upgrade, the database backup process continues and is not affected.
Pre-upgrade
Click Start Pre-upgrade. All applications begin the pre-upgrade process. A progress bar shows the completion percentage. The service list below shows the specific pre-upgrade status of each service. When the pre-upgrade is complete, click Next.
Upgrade application
Click Start Upgrade. In the Prompt dialog box, click OK. The system stops scheduling and starts the upgrade. Running tasks are not affected. This step is expected to take about 30 minutes. After the application upgrade step succeeds, click Exit Upgrade Maintenance Mode and Next. The system exits maintenance mode and records the time at which task submission resumed and maintenance mode ended. If the application upgrade step fails, you cannot exit upgrade maintenance mode. In this case, contact Dataphin O&M engineers.
Note: After you exit maintenance mode, the system resumes scheduling and can be accessed as usual. However, the upgrade is not complete until you finish the data update step.
The system starts the applications. The online transactional processing (OLTP) applications for DataService Studio undergo a rolling upgrade. This involves starting the new version and gradually replacing the old version. The upgrade status progress bar shows the progress and status of the startup. After the applications are started (upgraded), the process automatically proceeds to the next step.
If resources are sufficient, the new versions of the OLTP, mgmt, and Gateway applications are started before the old versions are stopped. If resources are insufficient, there might be times when no applications are available. After the mgmt application is started, the OLTP and Gateway applications for DataService Studio need to be upgraded separately.
If some applications fail to upgrade, you can click Java Thread Dump to diagnose the issue or click Restart to restart the Java process for the application. Regardless of whether the application upgrade is successful, you can click View Log in the upper-right corner of the page to view detailed logs for the application upgrade.
Data update
Click Update and Next. The system automatically runs all data update tasks that have not yet succeeded. Failed and unexecuted tasks are triggered. Tasks that are already running or have succeeded are not run again. Tasks that do not need to run (non-blocking tasks) are not included in the data update list.
You can click Run to start a task. If necessary, you can click Stop to stop a running task. Both Run and Stop support batch operations. When a task's status is Successful, Running, Stopped, or Failed, you can click Log Details to view the task's operational log.
For an Asset refresh task, you can view the Asset Refresh Details. If the metadata format changes after the upgrade, the asset metadata needs to be upgraded. The asset metadata is upgraded by triggering messages. The asset refresh details include the total number of messages, remaining messages, successful messages, and failed messages. The total number of messages is the total reported by each application in the initial stage. The number of failed messages is the number of messages that failed to be processed. If there are failed messages during the data update, the Dataphin O&M team must confirm whether they can be ignored before you continue the upgrade.
For a Metadata acquisition task, you can manually start the task and view its status and log details. The log includes the total number of workflows and the number of completed workflows. During the upgrade, existing acquisition instances are not changed and use the latest execution code. If an instance includes a scheduling change, the change takes effect the day after the upgrade is complete.
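The task selection rule for Update and Next, and the statuses for which Log Details is available, can be sketched as follows. The status names mirror the ones described above, but the enum and functions are hypothetical. The text does not say whether Stopped tasks are triggered again, so this sketch assumes they are not.

```python
from enum import Enum
from typing import Dict, List


class Status(Enum):
    NOT_RUN = "Not run"
    RUNNING = "Running"
    SUCCESSFUL = "Successful"
    FAILED = "Failed"
    STOPPED = "Stopped"


# Statuses for which Log Details is available.
LOG_STATUSES = {Status.SUCCESSFUL, Status.RUNNING, Status.STOPPED, Status.FAILED}


def tasks_to_trigger(tasks: Dict[str, Status]) -> List[str]:
    """Tasks that Update and Next starts: failed and unexecuted tasks only.

    Running and successful tasks are not run again. Not triggering
    Stopped tasks is an assumption of this sketch.
    """
    return [name for name, s in tasks.items() if s in (Status.FAILED, Status.NOT_RUN)]


def has_log_details(status: Status) -> bool:
    return status in LOG_STATUSES


tasks = {"task_a": Status.FAILED, "task_b": Status.SUCCESSFUL,
         "task_c": Status.NOT_RUN, "task_d": Status.RUNNING}
print(tasks_to_trigger(tasks))  # ['task_a', 'task_c']
```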
During the upgrade, you can click Force Stop Upgrade at any time to stop the upgrade. Note that the system cannot automatically recover from a forced stop, which might make Dataphin unavailable. Before you force stop an upgrade, you must confirm that you have manually completed the upgrade or rollback. We recommend that only professional O&M engineers confirm and perform this operation. When you force stop an upgrade, you can choose to set the upgrade status to successful or failed. In either case, the system records the time of the forced completion and adds the record to the upgrade history list.
Downtime upgrade
You can click View Log to view the service startup log.
Force stop running tasks
In the Running Tasks list, select one or more tasks to force stop, or click Force Stop All Tasks and Next to stop all tasks. Forcibly stopped tasks can be rerun after the upgrade. If a task cannot be rerun, we recommend that you wait for it to complete. Forcibly stopped tasks appear in the stopped tasks list and must be rerun after the applications are upgraded. When all tasks are forcibly stopped, the system records the completion time.
After you click Force Stop, the corresponding task is removed from the task list and added to the stopped tasks list. If a task completes or fails before being forcibly stopped, it will not be recorded in the task list.
Note: You can proceed to the next step only when the running tasks list is empty, meaning that all tasks have been forcibly stopped or have completed running. If a task fails to be forcibly stopped, an error is reported. Click Retry in the error message to try again. The task list auto-refreshes every 20 s.
You must manually rerun forcibly stopped tasks in the Rerun tasks step.
(Optional) If necessary, you can stop the upgrade from the upgrade history list.
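The completion condition for this step can be modeled as a polling loop. This is a hypothetical sketch. The callables list_running and force_stop stand in for whatever interface reports and stops tasks, and they are not Dataphin APIs. The loop mirrors the behavior described above in three ways. A failed stop is retried on the next refresh. Tasks that finish on their own leave the running list without being recorded as stopped. The step completes only when the running list is empty.

```python
import time
from typing import Callable, List


def force_stop_until_empty(list_running: Callable[[], List[str]],
                           force_stop: Callable[[str], bool],
                           refresh_seconds: float = 20.0,
                           max_rounds: int = 90) -> List[str]:
    """Force stop running tasks until none remain; return the stopped task IDs.

    force_stop (hypothetical) returns False when the stop fails; the task
    is then retried on the next refresh.
    """
    stopped: List[str] = []
    for _ in range(max_rounds):
        running = list_running()
        if not running:
            return stopped  # running list is empty, so Next is allowed
        for task_id in running:
            if force_stop(task_id):
                stopped.append(task_id)  # moved to the stopped tasks list
        time.sleep(refresh_seconds)  # the list auto-refreshes every 20 s
    raise TimeoutError("Running tasks list is still not empty")
```

The returned IDs correspond to the tasks that must be rerun in the Rerun tasks step.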
Shutdown
Click Shutdown and Next to shut down all running pods. A progress bar shows the shutdown progress.
During the upgrade, DataService Studio remains active. The pod list shows the status of all application pods except for DataService Studio.
Database Backup
Click Backup and Next to start the data backup. After the backup is complete, the process automatically proceeds to the next step.
A progress bar shows the backup progress. If a database backup fails, you can click View Log to see the failure log, or click Re-backup to back up a single database again. Clicking Re-backup resets the backup status of that database. If necessary, you can click Force Stop to stop a database that is being backed up.
Note: You can select the risk declaration checkbox to skip the database backup step.
If you use an RDS database, you must perform the backup in the RDS console. You can click Go to RDS Backup to open the RDS database console.
Upgrade application
Click Upgrade Application and Next to start the applications. The OLTP applications for DataService Studio undergo a rolling upgrade. This involves starting the new version and gradually replacing the old version. The upgrade status progress bar shows the progress and status of the startup. After the applications are started (upgraded), the process automatically proceeds to the next step.
If resources are sufficient, the new versions of the OLTP, mgmt, and Gateway applications are started before the old versions are stopped. If resources are insufficient, there might be times when no applications are available. After the mgmt application is started, the OLTP and Gateway applications for DataService Studio need to be upgraded separately.
The service list in the Upgrade application step is the same as the application list in the Shutdown step. If some applications fail to upgrade, you can click Java Thread Dump to diagnose the issue or click Restart to restart the Java process for the application. Regardless of whether the application upgrade is successful, you can click View Log in the upper-right corner of the page to view detailed logs for the application upgrade.
Rerun tasks
Click Rerun Tasks and Next to automatically rerun all tasks that have not been rerun. The list shows the instances that were forcibly stopped during this upgrade. After the tasks are rerun, the process automatically proceeds to the next step. You can also click Run to rerun a single task. If necessary, you can click Stop to stop a task from running. Run and Stop support batch operations. If your selection includes completed or failed tasks, they will be ignored, and the list will automatically refresh after the operation.
If the rerun fails to start for some or all tasks, the system reports an error. Click Retry in the error message, or click Rerun Tasks and Next, to try again.
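The two ways of starting reruns described above can be sketched as one selection function. The instance IDs and status labels ("not rerun", "completed", "failed") are hypothetical. The function only models which instances each action would start.

```python
from typing import Dict, List, Optional

# Hypothetical status labels that are ignored when they appear in a manual selection.
IGNORED_WHEN_SELECTED = {"completed", "failed"}


def instances_to_rerun(stopped: Dict[str, str],
                       selection: Optional[List[str]] = None) -> List[str]:
    """Return the instance IDs that a rerun action would start."""
    if selection is None:
        # Rerun Tasks and Next: every instance that has not been rerun yet.
        return [i for i, status in stopped.items() if status == "not rerun"]
    # Run on a selection: completed and failed instances are ignored.
    return [i for i in selection
            if i in stopped and stopped[i] not in IGNORED_WHEN_SELECTED]


stopped = {"i1": "not rerun", "i2": "completed", "i3": "failed", "i4": "not rerun"}
print(instances_to_rerun(stopped))                      # ['i1', 'i4']
print(instances_to_rerun(stopped, ["i1", "i2", "i3"]))  # ['i1']
```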
Data update
By default, Run tasks and exit upgrade maintenance mode is selected. Click Update and Next. The system automatically executes all data update tasks that were not skipped and have not run successfully. When all blocking tasks are complete, the system automatically resumes task submission and exits upgrade maintenance mode. It also records the time when task submission is resumed and maintenance mode is exited. Non-blocking tasks will continue to run. Tasks that are already running or have run successfully will not run again. Failed and unexecuted tasks will be triggered to run. Skipped tasks will be ignored and will not run.
You can also click Run to start a task, or click Skip to treat the task as non-blocking and proceed to the next step, regardless of whether it runs. If necessary, you can click Stop to stop a running task. Run, Skip, and Stop all support batch operations. When a task's status is Successful, Running, Stopped, or Failed, you can click Log Details to view the task's operational log.
For an Asset refresh task, you can view the Asset Refresh Details. If the metadata format changes after the upgrade, the asset metadata needs to be upgraded. The asset metadata is upgraded by triggering messages. The asset refresh details include the total number of messages, remaining messages, successful messages, and failed messages. The total number of messages is the total reported by each application in the initial stage. The number of failed messages is the number of messages that failed to be processed. If there are failed messages during the data update, the Dataphin O&M team must confirm whether they can be ignored before you continue the upgrade.
Metadata acquisition tasks cannot be skipped. You can manually start the task and view its status and log details. The log includes the total number of workflows and the number of completed workflows. During the upgrade, existing acquisition instances are not changed and use the latest execution code. If an instance includes a scheduling change, the change takes effect the day after the upgrade is complete.
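When Run tasks and exit upgrade maintenance mode is selected, the exit condition and the Skip restriction described above can be modeled as follows. This is a hypothetical sketch. The task names and status strings are illustrative. Skipping removes a task from the exit condition, and a task marked as not skippable, such as the Metadata acquisition task, refuses to be skipped.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UpdateTask:
    name: str
    status: str = "not run"  # hypothetical labels, e.g. "running", "successful", "failed"
    blocking: bool = True
    skippable: bool = True   # False for the Metadata acquisition task
    skipped: bool = False

    def skip(self) -> None:
        """Skip the task so that it no longer blocks exiting maintenance mode."""
        if not self.skippable:
            raise ValueError(f"{self.name} cannot be skipped")
        self.skipped = True


def can_exit_maintenance(tasks: List[UpdateTask]) -> bool:
    """True once every blocking task that was not skipped has succeeded."""
    return all(t.status == "successful"
               for t in tasks if t.blocking and not t.skipped)


tasks = [UpdateTask("task_a", "failed"),
         UpdateTask("Metadata acquisition", "successful", skippable=False),
         UpdateTask("task_c", "running", blocking=False)]
print(can_exit_maintenance(tasks))  # False
tasks[0].skip()
print(can_exit_maintenance(tasks))  # True
```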
During the upgrade, you can click Force Stop Upgrade at any time to stop the upgrade. Note that the system cannot automatically recover from a forced stop, which might make Dataphin unavailable. Before you force stop an upgrade, you must confirm that you have manually completed the upgrade or rollback. We recommend that only professional O&M engineers confirm and perform this operation. When you force stop an upgrade, you can choose to set the upgrade status to successful or failed. In either case, the system records the time of the forced completion and adds the record to the upgrade history list.
Step 3: Verify the upgrade result
All necessary operations for the system upgrade are complete, and the process now enters the verification stage. The verification page shows the Number of generated instances, Instance status comparison, and View failed instances.
You can verify the upgrade result by checking the number of instances generated, comparing instance statuses, and viewing failed instances to determine the cause of failure. After you confirm that the upgrade was successful, click Complete to finalize the Dataphin version upgrade. The system records this time as the upgrade completion time. If the upgrade result is not as expected, contact the Dataphin O&M team for confirmation and repair.
Number of generated instances: This is divided into the number of recent production instances and the 7-day average number of instances. Instances are generated at 23:00 every day.
Recent production instances: Dynamically queries and displays information about the last instance generation, including the number of instances generated and the corresponding data timestamp. If no instances were generated, the system displays To be generated. The date format is YYYY-MM-DD.
7-day average instances: The daily average number of instances for data timestamps from T-8 to T-2. A day on which no instances were generated is not counted. If no instances were generated in the last 7 days, NA is displayed.
Instance status comparison: Filters data by data timestamp (default is T-1) and tenant. A donut chart and a column chart show the instance status distribution and the number of instances over the last 7 days, respectively.
Status distribution: Shows the status distribution of the filtered instances.
7-day instance count: Shows the number of filtered instances categorized by status and the total number of instances.
View failed instances: The failed instances list shows task instances that failed today but succeeded yesterday. For each instance, the list includes the instance ID, node ID, task name, node type, scheduling type, last update time, and running status. After the upgrade, monitor this list for 30 minutes and rerun any tasks that succeeded yesterday but failed today. If a task still fails, check its operational log to determine whether the issue lies with the task itself. The list refreshes every 20 s to update task statuses.
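Two of the verification metrics above follow simple rules that can be checked by hand. The sketch below is hypothetical and makes three assumptions. T is taken to be the current date. Instance counts are keyed by data timestamp. Instances from today and yesterday are matched by node ID. The function names are illustrative, not Dataphin APIs.

```python
from datetime import date, timedelta
from typing import Dict, List, Union


def seven_day_average(counts: Dict[date, int], today: date) -> Union[float, str]:
    """Average instances for data timestamps T-8 through T-2 (T assumed to be today).

    Days with no generated instances are left out; if no day in the
    window has instances, "NA" is returned.
    """
    window = [today - timedelta(days=offset) for offset in range(2, 9)]
    daily = [counts[d] for d in window if counts.get(d, 0) > 0]
    if not daily:
        return "NA"
    return sum(daily) / len(daily)


def failed_today_succeeded_yesterday(today: Dict[str, str],
                                     yesterday: Dict[str, str]) -> List[str]:
    """Node IDs whose instance failed today but succeeded yesterday."""
    return sorted(node for node, status in today.items()
                  if status == "failed" and yesterday.get(node) == "successful")


counts = {date(2024, 6, 9): 999,   # T-1: outside the window
          date(2024, 6, 8): 100,   # T-2
          date(2024, 6, 7): 200,
          date(2024, 6, 2): 300}   # T-8
print(seven_day_average(counts, date(2024, 6, 10)))  # 200.0
print(failed_today_succeeded_yesterday({"n1": "failed", "n2": "failed"},
                                       {"n1": "successful", "n2": "failed"}))  # ['n1']
```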