Cloud Storage Gateway: What synchronization mechanisms does CSG employ to ensure data consistency with OSS?

Last Updated: Nov 07, 2024

Cloud Storage Gateway (CSG) maintains a one-to-one mapping between a gateway and the associated Object Storage Service (OSS) bucket: each file in the gateway maps to an object in the bucket. CSG allows you to access data in OSS by using Portable Operating System Interface (POSIX) standards. To keep the data view consistent between the gateway and the associated bucket, CSG uses the metadata synchronization mechanisms described in this topic. If you want to view data changes made in the OSS bucket on the gateway side, see the Reverse synchronization section of this topic.

Forward synchronization

A gateway automatically synchronizes gateway-side data changes, such as file additions, deletions, and modifications, to the associated OSS bucket. For example, after you create a file named file5.mp4 in the dir1/dir2 directory by using the gateway and data synchronization is complete, you can see an object named dir1/dir2/file5.mp4 in the associated OSS bucket. Forward synchronization requires no additional configuration.
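The file-to-object mapping in the example above can be sketched as a small path-to-key function. This is only an illustration: the mount point /mnt/csg, the optional bucket prefix, and the function name object_key_for are assumptions made for this sketch and are not part of any CSG API.

```python
from pathlib import PurePosixPath


def object_key_for(mount_point: str, file_path: str, bucket_prefix: str = "") -> str:
    """Return the object key a gateway file would map to (illustrative only).

    mount_point and bucket_prefix are hypothetical parameters for this sketch.
    """
    # Path of the file relative to where the share is mounted.
    relative = PurePosixPath(file_path).relative_to(PurePosixPath(mount_point))
    key = relative.as_posix()
    # Object keys have no leading slash; an optional prefix is joined with "/".
    prefix = bucket_prefix.strip("/")
    return f"{prefix}/{key}" if prefix else key


print(object_key_for("/mnt/csg", "/mnt/csg/dir1/dir2/file5.mp4"))
# prints: dir1/dir2/file5.mp4
```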

Reverse synchronization

Reverse synchronization synchronizes object metadata changes from the associated bucket to the gateway. These changes can come from operations performed outside the gateway, for example in the OSS console, with ossutil, or through the OSS API. For example, after you upload an object named file1.txt to a bucket that is mapped to a gateway with reverse synchronization enabled, the object appears as a file in the corresponding local directory, and you can manage it through the gateway in the same way you manage a local file. To use the gateway to manage historical objects that were created in the bucket before the gateway was mapped, or objects that are uploaded by other methods afterward, you must synchronize metadata changes from the bucket to the gateway. The following table describes the methods that you can use for reverse synchronization.

Important
  • If you want to use the gateway to manage only objects uploaded from it, rather than historical objects or those uploaded by using other methods, such as the OSS console or OSS API, we recommend that you disable reverse synchronization. This helps you avoid the cost and possible performance degradation caused by unnecessary data scans.

  • The performance of reverse synchronization from a versioning-enabled bucket may be compromised due to degraded object listing performance.

  • If you delete an object from a versioning-enabled bucket without specifying a version ID, the event notification rule for the DeleteObject or DeleteObjects event is not triggered. This is because a request to delete an object from a versioning-enabled bucket without a version ID specified does not actually delete any object version. Instead, the current version of the object is made a previous version and a delete marker is added as the current version of the object. As a result, CSG detects no deletion operation on the object.
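The delete-marker behavior described in the last note can be modeled with a toy in-memory object. This is a simplified sketch, not the OSS implementation: the class names, the version ID format, and the returned labels are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Version:
    version_id: str
    is_delete_marker: bool = False


@dataclass
class VersionedObject:
    """Toy model of one object in a versioning-enabled bucket (newest version last)."""
    versions: List[Version] = field(default_factory=list)
    _counter: int = 0

    def put(self) -> str:
        self._counter += 1
        vid = f"v{self._counter}"  # hypothetical version ID format
        self.versions.append(Version(vid))
        return vid

    def delete(self, version_id: Optional[str] = None) -> str:
        """Delete and return a label (made up for this sketch) for what happened."""
        if version_id is None:
            # No version is removed: a delete marker becomes the current version.
            self._counter += 1
            self.versions.append(Version(f"v{self._counter}", is_delete_marker=True))
            return "delete-marker-created"
        # With a version ID, that specific version is actually removed.
        self.versions = [v for v in self.versions if v.version_id != version_id]
        return "version-deleted"


obj = VersionedObject()
first = obj.put()
print(obj.delete(), len(obj.versions))       # prints: delete-marker-created 2
print(obj.delete(first), len(obj.versions))  # prints: version-deleted 1
```

In this model, only the second call removes data, which mirrors why a delete request without a version ID is not seen as a deletion.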

| Reverse synchronization | Regular reverse synchronization based on data scans | Express synchronization based on SMQ | One-time reverse synchronization per API request |
| --- | --- | --- | --- |
| Description | Synchronizes metadata changes to the gateway based on regular scans. By default, this reverse synchronization method is enabled. You can disable it. | Synchronizes metadata changes in the bucket within seconds based on Simple Message Queue (formerly MNS) messages, without requiring regular scans. This reverse synchronization method incurs SMQ fees. | Synchronizes metadata changes from the specified directory in the bucket to the gateway upon each TriggerGatewayRemoteSyncRequest request. This reverse synchronization method does not track changes in the bucket after the request is complete. |
| Scenarios | Suitable for scenarios where timely synchronization is not critical, or a certain level of latency is acceptable for directory access by using commands such as ls and stat. | Suitable for scenarios that require synchronization within seconds, high metadata efficiency, minimal operation latency, or the handling of a large number of files. | Suitable for scenarios where changes in the specified directory of the bucket or the entire bucket are expected to be synchronized to the gateway upon each request. |
| Trigger | Triggered by access to objects or directories in the bucket. Accessing objects or directories multiple times within the time interval window for reverse synchronization triggers only one reverse synchronization operation. | Triggered based on SMQ messages. | Triggered by individual TriggerGatewayRemoteSyncRequest operations. |
| Configuration method | In the CSG console, select Yes for Reverse Sync in the Advanced Settings of the share and configure the Reverse Sync Interval parameter. For more information, see In what scenarios and how do I configure reverse synchronization? | Activate Simple Message Queue (formerly MNS) queues and configure express synchronization in the CSG console. | See How do I actively trigger reverse synchronization on a specific directory? |
| Cost | Fees for OSS API operations. For more information, see API operation calling fees. | Fees for OSS API operations and Simple Message Queue (formerly MNS) resource usage. For more information, see Configure express synchronization. | Fees for OSS API operations. |
| How it works | See Regular reverse synchronization based on data scans. | See Express synchronization based on SMQ. | See One-time reverse synchronization per API request. |

Regular reverse synchronization based on data scans

By default, CSG uses scan-based regular reverse synchronization to synchronize changes from the mapped bucket to the gateway. This reverse synchronization process is triggered by access operations. When you use regular reverse synchronization, you configure a time interval for reverse synchronization at the directory level. When CSG detects that the gap between the current time and the last reverse synchronization time exceeds the time interval, it performs a reverse synchronization operation on the directory. You can configure reverse synchronization in the CSG console. For more information, see Manage shares and In what scenarios and how do I configure reverse synchronization?
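The access-triggered interval check described above can be sketched as follows. This is a minimal model under stated assumptions, not CSG's actual implementation: the class name DirectorySyncGate, the injectable clock, and the rule that a directory that has never been synchronized is always due are choices made for this illustration.

```python
import time
from typing import Callable, Dict


class DirectorySyncGate:
    """Decide, per directory, whether an access should trigger a reverse sync."""

    def __init__(self, interval_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval = interval_seconds
        self.clock = clock
        self.last_sync: Dict[str, float] = {}  # directory -> time of last sync

    def on_access(self, directory: str, sync: Callable[[str], None]) -> bool:
        """Run sync(directory) if the interval has been exceeded; return True if it ran."""
        now = self.clock()
        last = self.last_sync.get(directory)
        # Assumption: a directory with no recorded sync is treated as due.
        if last is not None and now - last <= self.interval:
            return False  # still inside the interval window: no new scan
        sync(directory)
        self.last_sync[directory] = now
        return True


# Usage with a fake clock so the behavior is deterministic.
fake_now = [0.0]
gate = DirectorySyncGate(60, clock=lambda: fake_now[0])
scanned = []
print(gate.on_access("dir1", scanned.append))  # prints: True
fake_now[0] = 30.0
print(gate.on_access("dir1", scanned.append))  # prints: False
fake_now[0] = 61.0
print(gate.on_access("dir1", scanned.append))  # prints: True
```

Repeated accesses inside the window trigger no additional scan, which matches the trigger behavior described for this method.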

Scan-based reverse synchronization is suitable for scenarios where applications are not sensitive to data changes in OSS. It provides stable synchronization performance when the amount of data involved in the synchronization is not large and a reasonable time interval for reverse synchronization is configured. However, because OSS has a flat structure, you may experience high latency when using the OSS API operation to list objects in simulated directories. If a small time interval is specified for reverse synchronization, a large number of scans are performed, which causes latency in metadata operations such as ls and affects the user experience. In this case, we recommend that you use express synchronization based on SMQ messages instead of scan-based reverse synchronization to improve performance.
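To illustrate what a simulated directory means in a flat key space, the sketch below derives one directory level from a list of object keys by using a prefix and a delimiter. It is an in-memory model only: it does not call OSS, and the function name list_directory is an assumption for this example.

```python
from typing import Iterable, List, Tuple


def list_directory(keys: Iterable[str], prefix: str = "",
                   delimiter: str = "/") -> Tuple[List[str], List[str]]:
    """Return (files, subdirectories) directly under prefix in a flat key list."""
    files: List[str] = []
    subdirs = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if not rest:
            continue  # the key equals the prefix itself (a directory placeholder)
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # A delimiter remains after the prefix, so the key lives in a subdirectory.
            subdirs.add(prefix + head + delimiter)
        else:
            files.append(key)
    return sorted(files), sorted(subdirs)


keys = ["dir1/dir2/file5.mp4", "dir1/file1.txt", "file0.txt"]
print(list_directory(keys, "dir1/"))
# prints: (['dir1/file1.txt'], ['dir1/dir2/'])
```

Every key under the prefix has to be examined to produce one directory level, which gives a rough intuition for why frequent scans of large simulated directories add latency.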

Express synchronization based on SMQ

Express synchronization based on SMQ messages combines full synchronization with incremental updates to keep metadata consistent between OSS and CSG. It avoids the scan overhead of scan-based reverse synchronization and synchronizes metadata changes within seconds. To use express synchronization, you must create an express synchronization group. After you create the group, you can view the progress of the initial full synchronization. Changes made after the full synchronization are synchronized to the gateway based on Simple Message Queue (formerly MNS) messages. If you have multiple shares mapped to the same bucket, you can add them to one express synchronization group so that incremental changes are synchronized to all shares in the group. For more information, see Configure express synchronization.

  • Full synchronization

    After express synchronization is enabled, CSG first performs a full synchronization operation so that the data on the gateway exactly matches the data in the bucket. Until the full synchronization operation is complete, you may not see all data that exists in the bucket from the gateway. We recommend that you wait until the full synchronization operation is complete before you read and write data by using the gateway.

  • Incremental update

    In an incremental update, a Simple Message Queue (formerly MNS) message is sent based on the defined OSS event notification rules to notify CSG to synchronize changes in the OSS bucket in real time. This ensures data view consistency between the bucket and the gateway. This approach avoids a large number of scans and provides fast synchronization and a better user experience when applications process numerous files.

  • Express synchronization group

    After you create an express synchronization group, a Simple Message Queue (formerly MNS) topic is automatically created for the group. All changes in the associated bucket are delivered to the topic. A topic can map to one or more queues, and each queue corresponds to a share of the gateway. Simple Message Queue (formerly MNS) delivers changes in the topic to the corresponding queues. This way, incremental changes in the bucket can be synchronized to all shares in the express synchronization group within seconds.
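The topic-to-queue fan-out and the incremental update described above can be modeled with in-memory queues. This sketch does not use the Simple Message Queue SDK. The class SyncGroupTopic, the function apply_events, and the event fields type and key are hypothetical names chosen for illustration.

```python
from collections import deque
from typing import Deque, Dict, Set


class SyncGroupTopic:
    """Toy model of one topic that fans out bucket changes to one queue per share."""

    def __init__(self) -> None:
        self.queues: Dict[str, Deque[dict]] = {}

    def add_share(self, share_name: str) -> None:
        self.queues[share_name] = deque()

    def publish(self, event: dict) -> None:
        # Every queue subscribed to the topic receives its own copy of the change.
        for queue in self.queues.values():
            queue.append(dict(event))


def apply_events(queue: Deque[dict], view: Set[str]) -> None:
    """Drain a share's queue and apply each change to that share's metadata view."""
    while queue:
        event = queue.popleft()
        if event["type"] == "put":
            view.add(event["key"])
        elif event["type"] == "delete":
            view.discard(event["key"])


topic = SyncGroupTopic()
topic.add_share("share-a")
topic.add_share("share-b")

# Each view starts from the result of the initial full synchronization.
views = {"share-a": {"file1.txt"}, "share-b": {"file1.txt"}}

topic.publish({"type": "put", "key": "dir1/file2.txt"})
topic.publish({"type": "delete", "key": "file1.txt"})

for name, queue in topic.queues.items():
    apply_events(queue, views[name])

print(views)
# prints: {'share-a': {'dir1/file2.txt'}, 'share-b': {'dir1/file2.txt'}}
```

Because each share drains its own queue, no share needs to rescan the bucket to pick up the two changes.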

One-time reverse synchronization per API request

In addition to automatically triggered metadata synchronization, you can also manually trigger synchronization on a specific directory by calling the TriggerGatewayRemoteSyncRequest operation. CSG synchronizes the metadata of the directory to the local directory to make sure that data in the local directory exactly matches the content in the OSS directory. For more information, see How do I actively trigger reverse synchronization on a specific directory?