Apsara File Storage NAS: Migrate data to a NAS file system

Last Updated: Dec 11, 2023

After you configure an intermediate node for data transfer, you must use a migration tool to migrate data to an Apsara File Storage NAS (NAS) file system. This topic describes how to use a migration tool to migrate data from an on-premises storage system to a NAS file system.

Prerequisites

An Elastic Compute Service (ECS) instance is configured as the intermediate node for data transfer. For more information, see Configure an intermediate node for data transfer.

Migrate data by using an SSH File Transfer Protocol (SFTP) client

If you need to upload a small number of files at a time, we recommend that you install and use an SFTP client on your on-premises computer. This solution has the following benefits:

  • Supports multiple operating systems.

  • Provides a graphical user interface (GUI).

  1. Install an SFTP client on your on-premises computer.

    Download and install an SFTP client that supports your on-premises operating system. Multiple SFTP clients are available. In this example, FileZilla is used.

  2. Establish a connection between the SFTP client and the ECS instance.

    1. Open and configure the FileZilla client. The following table describes the parameters. Click Quickconnect to establish a connection.

      After the on-premises client and the ECS instance are connected, the on-premises file system is displayed in the left pane. The file system that is mounted on the ECS instance is displayed in the right pane.

      | Parameter | Description |
      | --- | --- |
      | Host | The public IP address of the ECS instance. Example: 192.0.2.1. |
      | Username | The username used to log on to the ECS instance. The user must have read and write permissions on the NAS file system. Example: root. Note: The default username of a Linux ECS instance is root or ecs-user. The default username of a Windows ECS instance is administrator. |
      | Password | The logon password of the ECS instance. For example, you can enter the logon password of the root user. Note: If you forget the logon password of the ECS instance, you can reset the password. For more information, see Reset the logon password of an instance. |
      | Port | The port number used by SFTP. Default value: 22. |

    2. In the Remote site field in the right pane, enter the path to the directory where the NAS file system is mounted, for example, /mnt. Press Enter to view the list of files in the NAS file system.

  3. Upload data.

    Drag and drop one or more files or directories from the left pane to the right pane to upload the data.
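If you prefer a command line to the GUI, the same upload can be sketched with the OpenSSH sftp client in batch mode. This is a minimal sketch, not part of the FileZilla procedure: make_sftp_batch is a hypothetical helper name, DirToSync is an example directory, and the sketch assumes that key-based authentication is already set up because batch mode cannot prompt for a password.

```shell
#!/bin/sh
# Hypothetical helper: print an sftp batch script that changes to the remote
# mount directory and uploads each listed local directory recursively.
make_sftp_batch() {
    remote_dir=$1
    shift
    printf 'cd "%s"\n' "$remote_dir"
    for local_dir in "$@"; do
        printf 'put -R "%s"\n' "$local_dir"
    done
}

# Show the batch commands for one example directory (assumed name: DirToSync).
make_sftp_batch /mnt DirToSync
```

To run the upload, save the output to a file, for example make_sftp_batch /mnt DirToSync > upload.batch, and then run sftp -b upload.batch root@192.0.2.1. Some older sftp versions may require the destination directory to exist before a recursive put.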

Migrate data by using the rsync command-line tool

If you need to frequently upload files or upload a large number of files, we recommend that you install the rsync command-line tool on your on-premises computer. This solution has the following benefits:

  • After a file is uploaded, the metadata of the file remains unchanged, including the owner and permission information.

  • You can synchronize incremental data.

  • You can configure crontab for your on-premises Linux or macOS system to automatically back up data to the NAS file system.

  1. Install the rsync tool.

    • Linux

      | Operating system | Command |
      | --- | --- |
      | CentOS, Red Hat | Use the yum package manager to install rsync: sudo yum install rsync |
      | Ubuntu, Debian | Use the apt package manager to install rsync: sudo apt-get install rsync |

      Note

      If you are using a different Linux distribution, use the corresponding package manager to install rsync.

    • macOS

      Download and install the Homebrew package manager, and then run the following command to install rsync:

      brew install rsync
    • Windows

      Download and install the Cygwin environment. During the installation process, you can search for and select the rsync package. Alternatively, you can download the rsync source code, and then compile and install it.

      Note

      You must enable the SSH TCP port 22 for the security group of the associated virtual private cloud (VPC).

  2. Upload data.

    Run the following command to upload a specified on-premises directory to the NAS file system in incremental synchronization mode:

    rsync -avP DirToSync/ root@192.0.2.0:/mnt/DirToSync/

    Replace the parameters in the command with the actual values. The following table describes the parameters.

    | Parameter | Description |
    | --- | --- |
    | DirToSync | The name of the on-premises directory that you want to upload. |
    | root | The user that owns the destination directory in the NAS file system. rsync logs on to the ECS instance as this user. |
    | 192.0.2.0 | The public IP address of the Linux or Windows ECS instance on which the NAS file system is mounted. |
    | /mnt | The path on the ECS instance where the NAS file system is mounted. |

    Note
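    • The source path in the rsync command must end with a forward slash (/). Otherwise, rsync creates the source directory inside the destination directory (for example, /mnt/DirToSync/DirToSync), and the source path does not match the destination path after the data is synchronized.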

    • You can also use rsync to concurrently copy and upload data to Apsara File Storage NAS. The first rsync command creates only the directory structure in the destination. Then, xargs runs one rsync process per file in parallel. Run the following command:

      threads=<Number of threads>
      src=<Source path/>
      dest=<Destination path/>
      rsync -av -f"+ */" -f"- *" "$src" "$dest" && (cd "$src" && find . -type f -print0 | xargs -0 -P"$threads" -I% rsync -av % "$dest"/%)

      In the following example, the number of threads is 10, the source path is /abc, and the destination path is /mnt1.

      threads=10
      src=/abc/
      dest=/mnt1/
      rsync -av -f"+ */" -f"- *" "$src" "$dest" && (cd "$src" && find . -type f -print0 | xargs -0 -P"$threads" -I% rsync -av % "$dest"/%)
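Because a missing trailing slash on the source path changes where rsync places the files, a small wrapper can normalize the path before the upload command is built. The following is a minimal sketch: with_trailing_slash is a hypothetical helper, not an rsync feature, and the host and paths are the example values used in this topic.

```shell
#!/bin/sh
# Hypothetical helper (not part of rsync): make sure a source path ends with
# exactly one trailing slash so that rsync copies the directory's contents.
with_trailing_slash() {
    path=$1
    # Strip any trailing slashes, then add a single one back.
    while [ "${path%/}" != "$path" ]; do
        path=${path%/}
    done
    printf '%s/\n' "$path"
}

src=$(with_trailing_slash "DirToSync")
# Print the command instead of running it; remove "echo" to start the upload.
echo rsync -avP "$src" "root@192.0.2.0:/mnt/DirToSync/"
```

The script only prints the rsync command, so you can review the normalized source path before you run the upload.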
  3. Optional. Configure scheduled upload and backup jobs.

    You can use crontab on your on-premises Linux or macOS system to run rsync commands and upload data at a regular interval.

    • Linux

      1. Establish a password-free connection between the on-premises computer and the ECS instance. For more information, see Connect to a Linux instance by using an SSH key pair.

        Run the following command to check whether the connection is successful:

        ssh -i ~/.ssh/ecs.pem root@192.0.2.0
        Note

        ~/.ssh/ecs.pem is the path of the key file on your on-premises computer.

      2. Configure crontab.

        Run the crontab -e command to open the editor and configure a scheduled upload job. The following example shows the sample settings:

        0 23 * * * rsync -av -e "ssh -i ~/.ssh/ecs.pem" ~/Documents/ root@192.0.2.0:/mnt/Documents/

        The preceding crontab configuration allows Linux operating systems to automatically upload data from the on-premises Documents directory to the NAS file system at 23:00:00 every day. You can modify the parameters in the crontab configuration based on your business requirements.

    • macOS

      1. Grant Full Disk Access to /usr/sbin/cron.

        Go to System Preferences. In the window that appears, choose Security & Privacy > Privacy > Full Disk Access, click Unlock, and then click +. In the window that appears, select Macintosh HD and press Command+Shift+. to display hidden directories. Then, select /usr/sbin/cron.

      2. Establish a password-free connection between the on-premises computer and the ECS instance. For more information, see Connect to a Linux instance by using an SSH key pair.

        Run the following command to check whether the connection is successful:

        ssh -i ~/.ssh/ecs.pem root@192.0.2.0
        Note

        ~/.ssh/ecs.pem is the path of the key file on your on-premises computer.

      3. Configure crontab.

        Run the crontab -e command to open the editor and configure a scheduled upload job. The following example shows the sample settings:

        0 23 * * * rsync -av -e "ssh -i ~/.ssh/ecs.pem" ~/Documents/ root@192.0.2.0:/mnt/Documents/

        The preceding crontab configuration allows macOS to automatically upload data from the on-premises Documents directory to the NAS file system at 23:00:00 every day. You can modify the parameters in the crontab configuration based on your business requirements.
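If you want to add the scheduled job from a script instead of the crontab -e editor, the entry can be appended to the existing crontab without creating duplicates. The following is a minimal sketch: add_cron_entry is a hypothetical helper, and the entry reuses the example key path, host, and directories from this topic.

```shell
#!/bin/sh
# Hypothetical helper: read an existing crontab on stdin and write it to stdout
# with the given entry appended, unless an identical line is already present.
add_cron_entry() {
    entry_line=$1
    existing=$(cat)
    if [ -n "$existing" ]; then
        printf '%s\n' "$existing"
    fi
    # -F: fixed string, -x: match the whole line, -q: no output.
    if ! printf '%s\n' "$existing" | grep -Fxq -- "$entry_line"; then
        printf '%s\n' "$entry_line"
    fi
}

entry='0 23 * * * rsync -av -e "ssh -i ~/.ssh/ecs.pem" ~/Documents/ root@192.0.2.0:/mnt/Documents/'

# Preview the resulting crontab when no entries exist yet.
printf '' | add_cron_entry "$entry"
```

To install the entry, pipe the current crontab through the helper: crontab -l 2>/dev/null | add_cron_entry "$entry" | crontab -. Running the pipeline again leaves the crontab unchanged because the identical line is detected.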

    Note

    If the rsync tool cannot meet your requirements, you can use the fpsync tool to migrate your data in multiple threads. For more information, see Appendix: Migrate data by using the fpsync command-line tool.

Migrate data by using the Robocopy tool

Robocopy is a directory copy command that is provided by Windows. It can mirror a directory tree to a destination so that both copies have the same file structure, without copying files that are already up to date. It also retains relevant file information, such as dates and timestamps. To migrate large amounts of data, you can install the latest version of Python on a Windows ECS instance and configure the migration.py script.

  1. Log on to the ECS instance that is used to migrate data.

  2. Migrate data.

    Run the following command to migrate data from the source file system (disk Z) to the destination file system (disk Y):

    robocopy Z:\ Y:\ /e /w:5 /z /mt:32
    Note

    Only the contents of the specified directory are migrated. The directory itself is not created in the destination.

    The following table describes the parameters. Replace the values of the parameters with the actual values.

    | Parameter | Description |
    | --- | --- |
    | /mt | Specifies the number of concurrent threads. Default value: 8. Valid values: 1 to 128. In this example, 32 threads are used for multi-thread replication. |
    | /w | Specifies the number of seconds to wait between two consecutive retries after an error. |
    | /z | Copies files in restartable mode so that an interrupted transfer can be resumed. |
    | /e | Copies all subdirectories, including empty directories. |
    | /copyall | Copies all file information: data, attributes, timestamps, access control lists (ACLs), owner information, and audit information. |

    Note

    If you want to accelerate the migration of large amounts of data, for example, hundreds of millions of small files that total more than 10 TB, you can install the latest version of Python on a Windows ECS instance. For more information, see How do I accelerate data migration to an SMB file system?

  3. Check the migration result.

    After you complete data migration, run the following command to check whether the data that is stored in the destination file system is consistent with the data that is stored in the source file system.

    ROBOCOPY Z:\ Y:\ /e /l /ns /njs /njh /ndl /fp /log:reconcile.txt

    The following table describes the parameters. Replace the values of the parameters with the actual values.

    | Parameter | Description |
    | --- | --- |
    | /e | Includes all subdirectories, including empty directories. |
    | /l | Records the differences without modifying or copying files. |
    | /fp | Includes the full paths of files in logs. This parameter is required only if the /ndl parameter is not specified. |
    | /ns | Does not include the file size in logs. |
    | /ndl | Does not include folders in logs. |
    | /njs | Does not include the job summary. |
    | /njh | Does not include the job header. |
    | /log:reconcile.txt | Writes the comparison result to the reconcile.txt log file. If the log file already exists, its content is overwritten. |

Migrate data by using IIS FTP

If you need to upload a small number of files, we recommend that you install and configure an FTP client on your on-premises computer. This solution has the following benefits:

  • Supports multiple operating systems.

  • Provides a GUI.

Configure the IIS FTP service on the ECS instance and configure the FTP client on your on-premises computer. For more information, see Set up the Windows IIS web service.

Note
  • You must enable the FTP TCP port for the security group of the associated VPC.

  • You can also configure other FTP servers and clients to upload and download data over the Internet.

  • You are not charged for the inbound traffic of an elastic IP address (EIP). However, you are charged for the outbound traffic of an EIP. Therefore, you are not charged for the traffic that is generated when you upload data to the NAS file system over the Internet. However, you are charged for the traffic that is generated when you download data from the NAS file system. For more information about the billing of EIP, see Pay-as-you-go.

Appendix: Migrate data by using the fpsync command-line tool

  1. Download and install the fpsync tool.

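    # Download and extract the fpart source code, which includes fpsync.
    wget -N https://github.com/martymac/fpart/archive/fpart-1.1.0.tar.gz -P /tmp
    tar -C /tmp/ -xvf /tmp/fpart-1.1.0.tar.gz
    cd /tmp/fpart-fpart-1.1.0
    # Install the build tools, and then build and install fpart.
    sudo yum install -y automake libtool
    autoreconf -i
    ./configure
    make
    sudo make install
    # Install GNU parallel and acknowledge its citation notice.
    sudo yum install parallel -y
    printf "will cite" | parallel --bibtex
    # Install rsync, which fpsync uses to transfer files.
    sudo yum install -y rsync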
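  2. Copy the entire directory.

    In the following example, fpsync starts 10 concurrent synchronization jobs (-n 10), and each job transfers up to 10,000 files (-f 10000):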

    fpsync -n 10 -f 10000 /data/src/ /data/dst/
    Note

    For more information about the fpsync tool, see fpsync.
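After fpsync finishes, a quick sanity check is to compare the number of files in the source and destination directories. The following is a minimal sketch that uses only find and wc: count_files and same_file_count are hypothetical helper names, and a matching count does not prove that file contents are identical. File names that contain newline characters would be miscounted.

```shell
#!/bin/sh
# Hypothetical helper: count the regular files under a directory.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Hypothetical helper: succeed only if both trees hold the same number of files.
same_file_count() {
    [ "$(count_files "$1")" = "$(count_files "$2")" ]
}

# Self-contained demonstration on temporary directories.
demo=$(mktemp -d)
mkdir -p "$demo/src/sub" "$demo/dst/sub"
echo a > "$demo/src/a.txt"
echo a > "$demo/dst/a.txt"
echo b > "$demo/src/sub/b.txt"
echo b > "$demo/dst/sub/b.txt"

if same_file_count "$demo/src" "$demo/dst"; then
    echo "file counts match: $(count_files "$demo/src")"
else
    echo "file counts differ"
fi
rm -rf "$demo"
```

For the example command in this appendix, you would call same_file_count /data/src /data/dst. For a stricter check, a dry run of rsync between the two directories lists any files that still differ.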

References

If your data center frequently reads and writes large amounts of data to and from a NAS file system, you must create a physical connection and mount the NAS file system to the data center. For more information, see Access file systems in on-premises data centers.

If you need to upload large amounts of data to a NAS file system, you can upload the data to Object Storage Service (OSS) and then migrate the data from OSS to NAS. We recommend that you use this solution if the Internet bandwidth of the ECS instance cannot meet your requirements. For more information, see Upload objects and Migrate data.

What to do next

After the data is uploaded, you can mount the NAS file system on the ECS instance or the container to which your business belongs. This allows you to share the data in the NAS file system.

For example, you can mount an NFS file system on a Linux ECS instance or mount an SMB file system on a Windows ECS instance. Then, you can access data in the NAS file system the same way you access your on-premises data. For more information, see Mount an NFS file system on a Linux ECS instance and Mount an SMB file system on a Windows ECS instance.

You can also deploy business applications on the cloud. Then, you can read and write large amounts of data to and from the NAS file system by using programs on multiple compute nodes. For example, you can Use NGINX as a proxy for Apsara File Storage NAS or Use Windows IIS to access Apsara File Storage NAS.

If you no longer upload data to or download data from an intermediate ECS instance, you can release the ECS instance. For more information, see Release an instance.