Cloud Backup: Add a data source

Last Updated: Mar 04, 2024

Cloud Backup provides a data synchronization feature for unstructured file systems. You can synchronize data from source data sources, such as Network Attached Storage (NAS) file systems, Hadoop Distributed File System (HDFS) file systems, S3-Compatible Storage buckets, Object Storage Service (OSS) buckets, Cloud Parallel File Storage (CPFS) file systems, and OSS-Compatible Storage buckets, to destination data sources, including those on Alibaba Cloud. Before you synchronize data for the first time, you must add both the source and destination data sources. This topic describes how to add a data source in the Cloud Backup console.

Prerequisites

  • Cloud Backup is activated. You are not charged for activating Cloud Backup. The data synchronization feature is in public preview and is provided free of charge.

  • Cloud Backup is authorized, and a Cloud Backup client is installed. For more information, see Before you begin.

Procedure

  1. Log on to the Cloud Backup console.

  2. In the left-side navigation pane, choose Synchronization > Data Synchronization.

  3. In the top navigation bar, select a region.

  4. On the Data Source List tab, click Create Data Source.

  5. In the Create Data Source panel, configure the parameters and click OK.

    • Source Type: Network Attached Storage (NAS)

      1. Configure the parameters described in the following table.

        Parameter

        Description

        Source Type

        The type of the data source. Select Network Attached Storage (NAS).

        Data Source Name

        The name of the data source.

        NAS Network Address

        The network address of the NAS file system whose data is to be synchronized.

        NAS Share Path

        The directory relative to the / root directory. For example, if you enter /myshare, Cloud Backup synchronizes data from the /myshare directory.

        • For more information about how to query the shared directories of a NAS file system, see How do I query the shared directories of a NAS file system?

        • The directory can contain letters, digits, and the following special characters: ,-_=/.:\.

          Note

          The root directory of an Apsara File Storage NAS file system after mounting depends on the protocol type. In most cases, the root directory of a mounted Network File System (NFS) file system is /, whereas the root directory of a mounted Server Message Block (SMB) file system is /myshare. The mounted directory in your environment may differ, so specify the path based on how your file system is actually mounted.

        Protocol Type

        The protocol type of the NAS file system. Valid values:

        • NFS

        • SMB

        • GlusterFS

        Important
        • If you want to synchronize data from an Apsara File Storage NAS file system, configure the vers parameter in the Advanced Settings section.

        • The NFS, SMB, or GlusterFS client must be installed on the Cloud Backup client. You can run the following commands to install the NFS, SMB, or GlusterFS client:

          NFS
          -CentOS: sudo yum install nfs-utils
          -Ubuntu: sudo apt-get install nfs-common
          SMB
          -CentOS: sudo yum install cifs-utils
          -Ubuntu: sudo apt-get install cifs-utils
          -openSUSE: sudo zypper install cifs-utils
          GlusterFS
          -CentOS: sudo yum install glusterfs-client
          -Ubuntu: sudo apt-get install glusterfs-client
          -Reference: https://docs.gluster.org/en/latest/Install-Guide/Overview/
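
After you install a client, you may want to confirm that its mount helper is actually present on the machine that runs the Cloud Backup client. The following is a minimal sketch, not part of Cloud Backup. The function names are hypothetical, and the helper names (mount.nfs, mount.cifs, mount.glusterfs) are assumed to be the ones that the packages above install on your distribution.

```shell
#!/usr/bin/env bash
# Map a protocol type to the mount helper its client package usually provides.
# Assumption: the helpers are named mount.nfs, mount.cifs, and mount.glusterfs.
mount_helper_for() {
  case $1 in
    NFS)       echo mount.nfs ;;
    SMB)       echo mount.cifs ;;
    GlusterFS) echo mount.glusterfs ;;
    *)         return 1 ;;   # unknown protocol type
  esac
}

# Report whether the helper for a protocol type can be found.
# /sbin and /usr/sbin are searched too, because they are often missing
# from the PATH of non-root users. Returns 0 if found, 1 if missing,
# and 2 if the protocol type is unknown.
check_protocol_client() {
  local helper
  helper=$(mount_helper_for "$1") || { echo "unknown protocol: $1"; return 2; }
  if ( PATH="$PATH:/sbin:/usr/sbin"; command -v "$helper" ) >/dev/null 2>&1; then
    echo "$1 client found ($helper)"
  else
    echo "$1 client missing ($helper not found)"
    return 1
  fi
}
```

For example, run check_protocol_client NFS after you install nfs-utils or nfs-common. A nonzero exit status suggests that the package is not installed or that the helper is located outside the searched directories.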
      2. Optional. Click Advanced Settings and then click +Set Mount Parameters.

        The following table describes the mount parameters that you can configure.

        Parameter

        Description

        vers

        The protocol version of the file system.

        • vers=3: uses NFSv3 to mount the file system.

        • vers=4: uses NFSv4 to mount the file system.

        • vers=4.0: uses NFSv4.0 to mount the file system.

        nolock

        Specifies whether to enable file locking. If you specify nolock, file locking is disabled.

        proto

        The protocol that you want to use to mount the file system.

        rsize

        The size of each data block that a client can read from the file system.

        Recommended value: 1048576. Unit: bytes.

        wsize

        The size of each data block that a client can write to the file system.

        Recommended value: 1048576. Unit: bytes.

        hard

        Specifies that applications stop accessing the file system when it is unavailable and resume access when it becomes available again. We recommend that you enable this parameter.

        timeo

        The period of time that the NFS client waits before it retries a request. Unit: deciseconds (tenths of a second).

        Recommended value: 600 (60 seconds).

        retrans

        The number of times that the NFS client retries a failed request.

        Recommended value: 2.
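
The mount parameters in the preceding table correspond to the comma-separated option string that the mount command accepts. The following sketch joins the recommended values into that form. The join_mount_opts function is a hypothetical helper, and vers=3 and proto=tcp are illustrative assumptions, because the table does not recommend specific values for them.

```shell
#!/usr/bin/env bash
# Join mount parameters into the comma-separated form used by `mount -o`.
join_mount_opts() {
  local IFS=,
  echo "$*"
}

# Recommended values from the table above. vers=3 and proto=tcp are
# illustrative assumptions; choose the values that match your file system.
NFS_OPTS=$(join_mount_opts vers=3 nolock proto=tcp \
  rsize=1048576 wsize=1048576 hard timeo=600 retrans=2)

echo "$NFS_OPTS"

# Hypothetical manual check with the same options
# (the address and the local mount point are placeholders):
#   sudo mount -t nfs -o "$NFS_OPTS" <nas-address>:/myshare /mnt/nas-test
```

Testing the options with a manual mount before you enter them in the console can help you separate mount problems from synchronization problems.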

    • Source Type: Hadoop Distributed File System (HDFS)

      Configure the parameters described in the following table.

      Parameter

      Description

      Source Type

      The type of the data source. Select Hadoop Distributed File System (HDFS).

      Data Source Name

      The name of the HDFS data source. You can specify a name based on your business requirements. Example: back-end-hdfs.

      NameNode Network Address

      The network address of an HDFS primary server.

      NameNode serves as a primary server that is used to manage the namespaces of HDFS file systems and control access from clients to files in the file systems. For example, if the network address is 47.100.XX.XX and the port number is 9000, the data source address is 47.100.XX.XX:9000.

      NameNode Port

      The port number of the HDFS primary server. Example: 9000.

      Secondary NameNode Network Address

      The network address of a secondary HDFS node.

      A secondary HDFS node helps the primary server perform management tasks.

      Secondary NameNode Port

      The port number of the secondary HDFS node.

      HDFS Username

      The name of the HDFS user.

      Note

      Make sure that the specified HDFS user has sufficient permissions. Otherwise, you may be unable to read files when you synchronize data. We recommend that you set this parameter to hadoop or hdfs.
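
The NameNode address and port combine into a single host:port data source address, as described for NameNode Network Address. The following sketch shows that composition with a basic port check. The namenode_address function is a hypothetical helper, and 192.0.2.10 in the usage note is a documentation placeholder address.

```shell
#!/usr/bin/env bash
# Hypothetical helper: combine a NameNode host and port into host:port.
# Ports that are empty, non-numeric, longer than five digits, or outside
# the range 1-65535 are rejected.
namenode_address() {
  local host=$1 port=$2
  case $port in
    ''|*[!0-9]*|??????*) echo "invalid port: $port" >&2; return 1 ;;
  esac
  if [ "$port" -lt 1 ] || [ "$port" -gt 65535 ]; then
    echo "port out of range: $port" >&2
    return 1
  fi
  echo "$host:$port"
}

# Optional manual read check. Assumptions: the Hadoop CLI is installed and
# the cluster uses simple authentication, where HADOOP_USER_NAME selects
# the user:
#   HADOOP_USER_NAME=hdfs hdfs dfs -ls "hdfs://$(namenode_address <host> 9000)/"
```

For example, namenode_address 192.0.2.10 9000 prints 192.0.2.10:9000. Listing the root directory as the user that you plan to enter for HDFS Username is one way to check that the user can read the files to be synchronized.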

    • Source Type: Alibaba Cloud Object Storage Service (OSS)

      Configure the parameters described in the following table.

      Parameter

      Description

      Source Type

      The type of the data source. Select Alibaba Cloud Object Storage Service (OSS).

      Data Source Name

      The name of the OSS data source.

      Use HTTPS

      Specifies whether to transmit data over HTTPS. HTTPS provides higher transmission security than HTTP.

      OSS Bucket

      The name of the OSS bucket whose data you want to synchronize. Select a name from the drop-down list. Cloud Backup automatically obtains all OSS buckets in the region within your account.

      OSS Endpoint

      The endpoint of the OSS bucket. Select an endpoint from the drop-down list. For more information about the endpoints of OSS in each region, see Regions and endpoints.

      • If you synchronize data over the Internet, select the public endpoint of the bucket. For example, select oss-cn-hangzhou.aliyuncs.com for the China (Hangzhou) region.

      • If you use a virtual private cloud (VPC) to synchronize data, select the internal endpoint of the bucket. For example, select oss-cn-hangzhou-internal.aliyuncs.com for the China (Hangzhou) region.
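
The two China (Hangzhou) examples suggest a naming pattern for public and internal endpoints. The following sketch derives an endpoint from a region ID under that pattern. The oss_endpoint function is hypothetical, and the assumption that other regions follow the same naming should be checked against Regions and endpoints.

```shell
#!/usr/bin/env bash
# Sketch: derive an OSS endpoint from a region ID, following the pattern of
# the two China (Hangzhou) examples above. Assumption: other regions use the
# same naming; confirm against Regions and endpoints.
oss_endpoint() {
  local region=$1 network=${2:-public}
  case $network in
    public)   echo "oss-${region}.aliyuncs.com" ;;
    internal) echo "oss-${region}-internal.aliyuncs.com" ;;
    *)        echo "unknown network type: $network" >&2; return 1 ;;
  esac
}

oss_endpoint cn-hangzhou public     # endpoint for synchronization over the Internet
oss_endpoint cn-hangzhou internal   # endpoint for synchronization over a VPC
```

In the console, you select the endpoint from the drop-down list, so a helper like this is mainly useful for scripts that verify connectivity before you create the data source.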

    • Source Type: Cloud Parallel File Storage (CPFS)

      Configure the parameters described in the following table.

      Parameter

      Description

      Data Source Name

      The name of the CPFS data source. The name helps you quickly identify the data source. You can specify a name based on your business requirements. Example: cpfs.

      CPFS Mount Path

      The mount path of the CPFS file system. Example: /cpfs/00d0******1b-000001.

      If you have not added a Portable Operating System Interface for UNIX (POSIX) mount target or installed the CPFS-POSIX client for your CPFS file system, add a mount target and install the client.

      You can run the following commands on the CPFS cluster management node to query the CPFS cluster status and the mount path.

      • Query the CPFS cluster status

        • Command

          mmgetstate -a
        • Sample output

           Node number  Node name                            GPFS state  
          ---------------------------------------------------------------
                     1  cpfs-00d0******1b-000001-qr-001  active
                     2  cpfs-00d0******1b-000001-qr-002  active
                     3  cpfs-00d0******1b-000001-qr-003  active
                     4  iZbp******haqrZ              active
      • Query the CPFS mount path

        • Command

          df -h
        • Sample output

          Filesystem               Size  Used Avail Use% Mounted on
          devtmpfs                 3.8G     0  3.8G   0% /dev
          tmpfs                    3.8G   16K  3.8G   1% /dev/shm
          tmpfs                    3.8G  528K  3.8G   1% /run
          tmpfs                    3.8G     0  3.8G   0% /sys/fs/cgroup
          /dev/vda1                 40G  7.3G   33G  19% /
          tmpfs                    763M     0  763M   0% /run/user/0
          00d0******1b-000001  3.6T  564M  3.6T   1% /cpfs/00d0******1b-000001

        In the preceding output, /cpfs/00d0******1b-000001 is the CPFS mount path.
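
If you prefer not to scan the df output by eye, the mount path can be filtered out of it. The following is a minimal sketch. The cpfs_mount_paths function is hypothetical, and it assumes that CPFS file systems are mounted under /cpfs/, as in the sample output, and that the mount path contains no spaces.

```shell
#!/usr/bin/env bash
# Read `df` output on stdin and print every mount point under /cpfs/.
# Assumptions: CPFS file systems are mounted below /cpfs, as in the sample
# output, and mount points contain no spaces (the last field is printed).
cpfs_mount_paths() {
  awk 'NR > 1 && $NF ~ /^\/cpfs\// { print $NF }'
}

# Typical use on a node where the file system is mounted:
#   df -h | cpfs_mount_paths
```

The printed path is the value to enter for CPFS Mount Path.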

    • Source Type: OSS Compatible Storage

      Configure the parameters described in the following table.

      Parameter

      Description

      Source Type

      The type of the data source. Select OSS Compatible Storage.

      Data Source Name

      The name of the data source for OSS-Compatible Storage. You can specify a name based on your business requirements. Example: oss-bucket.

      Use HTTPS

      Specifies whether to transmit data over HTTPS. HTTPS provides higher transmission security than HTTP.

      OSS Bucket

      The name of the OSS bucket for OSS-Compatible Storage. The name is provided by a third-party storage service provider.

      OSS Endpoint

      The VPC endpoint provided by the third-party storage service provider. Obtain the VPC endpoint from the administrator for OSS-Compatible Storage.

      AccessKey ID and AccessKey Secret

      The AccessKey ID and AccessKey secret provided by the third-party storage service provider to access the VPC. Obtain the AccessKey pair from the administrator for OSS-Compatible Storage. The AccessKey pair must have full permissions to read data from OSS-Compatible Storage.

    • Source Type: S3 Compatible Storage

      Configure the parameters described in the following table.

      Parameter

      Description

      Source Type

      The type of the data source. Select S3 Compatible Storage.

      Data Source Name

      The name of the data source for S3 Compatible Storage. You can specify a name based on your business requirements. Example: awss3.

      Use HTTPS

      Specifies whether to transmit data over HTTPS. HTTPS provides higher transmission security than HTTP.

      S3 Bucket

      The name of the S3-Compatible Storage bucket.

      S3 Endpoint

      The endpoint that is used to access the bucket and perform operations on S3 objects. Example: s3.us-east-1.amazonaws.com. Obtain the endpoint from the administrator for S3-Compatible Storage.

      Access Key and Secret Key

      The security credentials for accessing S3-Compatible Storage as an Identity and Access Management (IAM) user. Obtain the AccessKey pair from the administrator for S3-Compatible Storage. The AccessKey pair must have full permissions to read data from S3-Compatible Storage.
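
Before you enter the S3 parameters, it can help to confirm that the endpoint and credentials can list the bucket. The following sketch builds the endpoint URL from the Use HTTPS choice and shows, as a comment, a read check with the AWS CLI. The endpoint_url function is hypothetical, the bucket name is a placeholder, and the check assumes that the AWS CLI is installed and that your storage service works with it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn the Use HTTPS choice and an endpoint host into a URL.
endpoint_url() {
  local use_https=$1 endpoint=$2
  case $use_https in
    yes) echo "https://$endpoint" ;;
    no)  echo "http://$endpoint" ;;
    *)   echo "use_https must be yes or no" >&2; return 1 ;;
  esac
}

endpoint_url yes s3.us-east-1.amazonaws.com

# Optional read check with the AWS CLI, if it is installed. The bucket name
# is a placeholder, and the credentials are read from the
# AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables:
#   aws s3 ls "s3://<bucket>" \
#     --endpoint-url "$(endpoint_url yes s3.us-east-1.amazonaws.com)"
```

If the listing fails with an access error, the key pair probably lacks the read permissions that data synchronization requires.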

    After you add a data source, the data source is displayed on the Data Source List tab.

Related operations

After a data source is added, you can click More in the Actions column and then select the required operation. The following table describes the available operations.

Operation

Description

Edit Data Source

Modifies the parameters of the data source.

Unregister Data Source

If you no longer need to synchronize data, you can unregister the data source. After a data source is unregistered, data is no longer synchronized.

  1. On the Sync Plan tab, delete all the synchronization plans for the data source.

  2. On the Data Source List tab, find the data source that you want to unregister and choose More > Unregister Data Source in the Actions column.

  3. In the View Client Groups panel, delete the Cloud Backup client for data synchronization.

  4. On the server where the Cloud Backup client for data synchronization is installed, uninstall the client. For more information, see How do I uninstall a Cloud Backup client?

What to do next

Create a synchronization plan