
Use nasimport to migrate data to NAS

Last Updated: Dec 28, 2017

The nasimport tool helps you to synchronize and migrate objects from your local data centers, Alibaba Cloud OSS, or third-party storage services to Alibaba Cloud NAS.

Background

Nasimport supports the following features:

  • Data sources such as ephemeral disks, OSS, Amazon S3, Baidu Object Storage, Tencent Cloud COS, Jinshan Object Storage, UPYUN, Qiniu, and HTTP links.
  • Automatic mounting of NAS.
  • Synchronization of existing data (allowing synchronization of the objects after a specified time point).
  • Automatic synchronization of incremental data.
  • Resumable data transfer.
  • Parallel data uploading and downloading.

Prerequisites

  • You must run nasimport on an ECS instance that can mount the target NAS file system. See Mount a file system for information about which ECS instances can mount a NAS file system and how to mount one.

  • You must run nasimport in a Java JDK 1.7 environment or higher. We recommend that you use the Oracle JDK.

Note: Before running the program, check the maximum number of files the process is allowed to open (run ulimit -n to view it). If the value is smaller than 10,240, raise the limit accordingly.
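The check and the fix can both be done in the shell that will later start nasimport; the 65536 value below is just an illustrative target above the 10,240 threshold:

```shell
# Show the current per-process open-file limit.
ulimit -n

# If the value is below 10240, raise the soft limit for this shell session
# before starting nasimport. Raising the hard limit may require root or an
# entry in /etc/security/limits.conf.
ulimit -n 65536
```

Because ulimit applies per shell session, run it in the same shell (or the same startup script) that launches nasimport.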

Deploy nasimport

  1. Create a synchronization working directory on the local server, and then download the nasimport toolkit to this directory.

    For example, create /root/ms as the working directory, and download the toolkit to this directory.

    1. export work_dir=/root/ms
    2. wget http://docs-aliyun.cn-hangzhou.oss.aliyun-inc.com/assets/attach/45306/cn_zh/1479113980204/nasimport_linux.tgz
    3. tar zxvf ./nasimport_linux.tgz -C "$work_dir"
  2. Edit the configuration file config/sys.properties in the working directory ($work_dir):

    1. vim $work_dir/config/sys.properties
    2. workingDir=/root/ms
    3. slaveUserName=
    4. slavePassword=
    5. privateKeyFile=
    6. slaveTaskThreadNum=60
    7. slaveMaxThroughput(KB/s)=100000000
    8. slaveAbortWhenUncatchedException=false
    9. dispatcherThreadNum=5

We recommend that you use the default configuration. However, you can edit the configuration field values as needed.

Field descriptions:

  • workingDir: The current working directory; the directory to which the toolkit is extracted.
  • slaveTaskThreadNum: The number of simultaneous worker threads that perform synchronization.
  • slaveMaxThroughput(KB/s): The maximum migration throughput.
  • slaveAbortWhenUncatchedException: Whether to stop on an uncaught error or skip it and continue. By default (false), uncaught errors do not stop the migration.
  • dispatcherThreadNum: The number of parallel threads for dispatching a job. We recommend retaining the default value for most scenarios.

Use nasimport

Nasimport command

Nasimport supports the following commands:

  • Submit a job:

    1. java -jar $work_dir/nasimport.jar -c $work_dir/config/sys.properties submit $jobConfigPath
  • Cancel a job:

    1. java -jar $work_dir/nasimport.jar -c $work_dir/config/sys.properties clean $jobName
  • View the status:

    1. java -jar $work_dir/nasimport.jar -c $work_dir/config/sys.properties stat detail
  • Retry a job:

    1. java -jar $work_dir/nasimport.jar -c $work_dir/config/sys.properties retry $jobName

Start the service

Run the following commands.

  1. cd $work_dir
  2. nohup java -Dskip_exist_file=false -jar $work_dir/nasimport.jar -c $work_dir/config/sys.properties start > $work_dir/nasimport.log 2>&1 &

Note:

  • A log file is automatically generated in the current directory where you start the service. We recommend that you start the service in the working directory ($work_dir).
  • If skip_exist_file is true when the job starts, then files in the NAS file system with the same length as the source are skipped during uploading.

Define a job

Edit the job description file local_job.cfg. Field descriptions are as follows.

  • jobName: A custom job name that uniquely identifies a job. You can submit multiple jobs with different names.
  • jobType: Set this field to import (perform a data synchronization action) or audit (only verify the global consistency of the synchronized source data and destination data).
  • isIncremental=false: Specifies whether to enable automatic incremental mode. If it is set to true, incremental data is rescanned at the interval specified by incrementalModeInterval (unit: seconds) and synchronized to the NAS.
  • incrementalModeInterval=86400: Specifies the synchronization interval, in seconds, in automatic incremental mode.
  • importSince: Specifies a time value as a Unix timestamp (in seconds). Only data later than this time value is synchronized. The default value is 0.
  • srcType: Specifies the synchronization source type. Currently, OSS, Qiniu, Baidu, KS3, Youpai, and local sources are supported.
  • srcAccessKey: If srcType is set to OSS, Qiniu, Baidu, KS3, or Youpai, this field must be the AccessKey of the data source.
  • srcSecretKey: If srcType is set to OSS, Qiniu, Baidu, KS3, or Youpai, this field must be the SecretKey of the data source.
  • srcDomain: The source endpoint.
  • srcBucket: The name of the source bucket.
  • srcPrefix: The source prefix. This field is empty by default. If srcType is set to local, enter the local directory to be synchronized as the prefix; the directory must end with a '/'. If srcType is set to OSS, Qiniu, Baidu, KS3, or Youpai, enter the prefix of the objects to be synchronized. To synchronize all objects, leave the prefix empty.
  • destType: The destination type (NAS by default).
  • destMountDir: The local mount directory of the NAS file system.
  • destMountTarget: The domain name of the NAS mount point.
  • destNeedMount=true: Determines whether the tool performs automatic mounting. The default value is true. You can also set it to false and manually mount the NAS to the destMountDir directory.
  • destPrefix: The prefix of the synchronized destination objects. The default value is empty.
  • taskObjectCountLimit: The maximum number of objects for each task. This affects the parallel execution of tasks and is usually set to the total number of objects divided by the number of download threads you have configured. If the total number of objects is unknown, retain the default value.
  • taskObjectSizeLimit: The size limit (in bytes) of the data downloaded in each task.
  • scanThreadCount: The number of threads that scan objects in parallel. This field affects object scanning efficiency.
  • maxMultiThreadScanDepth: The maximum directory depth for parallel scans. We recommend retaining the default value for most scenarios.

Note:

  • If you have enabled automatic incremental mode, the job periodically scans the latest data. The job does not end automatically.
  • If srcType is Youpai, the list object operation cannot implement checkpoints due to API limitations of UPYUN itself. Terminating the process before all list operations are completed causes re-listing of all objects.
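Putting the fields described above together, a minimal job file for synchronizing a local directory to NAS might look like the following sketch. Every value here is illustrative: the job name, source path, mount directory, and mount point domain name are placeholders that you must replace with your own.

```
jobName=nas_job
jobType=import
isIncremental=false
incrementalModeInterval=86400
importSince=0
srcType=local
srcPrefix=/data/to_migrate/
destType=nas
destMountDir=/mnt/nas
destMountTarget=example-mount-point.cn-hangzhou.nas.aliyuncs.com
destNeedMount=true
destPrefix=
```

For a local source, the srcAccessKey, srcSecretKey, srcDomain, and srcBucket fields can remain empty; note the trailing '/' required on srcPrefix.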

Submit a job

Run the following command.

  1. java -jar $work_dir/nasimport.jar -c $work_dir/config/sys.properties submit $work_dir/nas_job.cfg

Note:

  • If a job with the same name is already in progress, the submission fails.
  • To pause synchronization, stop the nasimport process. Restart the nasimport process to resume synchronization from the point at which it was paused.
  • If you want to resynchronize all objects, stop the nasimport process and then run a command to clear the current job. For example, if the current job is named nas_job (this name is specified in the file nas_job.cfg), run the following command.

    1. ps axu | grep "nasimport.jar.* start" | grep -v grep | awk '{print "kill -9 "$2}' | bash
    2. java -jar $work_dir/nasimport.jar -c $work_dir/config/sys.properties clean nas_job

View the job status

Run the following command.

  1. java -jar $work_dir/nasimport.jar -c $work_dir/config/sys.properties stat detail
  2. --------------job stats begin---------------
  3. ----------------job stat begin------------------
  4. JobName:nas_job
  5. JobState:Running
  6. PendingTasks:0
  7. RunningTasks:1
  8. SucceedTasks:0
  9. FailedTasks:0
  10. ScanFinished:true
  11. RunningTasks Progress:
  12. FD813E8B93F55E67A843DBCFA3FAF5B6_1449307162636:26378979/26378979 1/1
  13. ----------------job stat end------------------
  14. --------------job stats end---------------

The overall progress of the current job, and progress of the current task, are displayed. For example,

  • 26378979/26378979 indicates the total amount of data to be uploaded (26,378,979 bytes) and the amount of data already uploaded (26,378,979 bytes)
  • 1/1 indicates the total number of objects to be uploaded (1) and the number of objects already uploaded (1).
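The uploaded/total counter from the stat detail output can be turned into a completion percentage with a one-liner; the progress value below is a made-up example, not real job output:

```shell
# Convert an "uploaded/total" byte counter, as printed in RunningTasks
# Progress, into a completion percentage. The value is illustrative.
progress="13189489/26378979"
echo "$progress" | awk -F/ '{ printf "%.1f%%\n", 100 * $1 / $2 }'
# prints 50.0%
```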

The migration tool splits a job into multiple tasks for parallel execution. Once all tasks are completed, the job is completed. After a job is completed, the JobState changes to Successful or Failed, indicating the result of the job.

If the job fails, use the following command to view the cause of failure for each task. In the command, replace $jobName with the name of the actual job, as specified in the file local_job.cfg.

  1. cat $work_dir/master/jobs/$jobName/failed_tasks/*/audit.log

If failure is due to temporary unavailability of the source or destination data, use the following command to retry the failed task:

  1. java -jar $work_dir/nasimport.jar -c $work_dir/config/sys.properties retry $jobName

Reasons for failure

Jobs may fail if any of the following scenarios occur:

  • Job configuration errors, such as AccessKey/ID errors and insufficient permissions, occur. In this case, all tasks fail. To identify the cause, check the $work_dir/nasimport.log file.

  • The encoding of the source object name is inconsistent with the system’s default object name encoding. For example:

    • In Windows, the default object name encoding is GBK.
    • In Linux, the default object name encoding is UTF-8.

      This problem is more likely to occur when the data source is NFS.

  • A change is made to the object in the source directory during the upload process. This is indicated by a SIZE_NOT_MATCH error in audit.log. In this case, the old object is uploaded successfully, but the change is not synchronized to the NAS.

  • A source object is deleted during the upload process. In this case, downloading the object fails.

  • An error occurs in the data source, causing the download of the source data to fail.

  • Performing clean without terminating the process first may cause exceptions in program execution.
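For the encoding mismatch described above, you can test whether a file name is valid UTF-8 before migrating by round-tripping it through iconv. The sample name below embeds raw GBK bytes (written as octal escapes) purely for illustration:

```shell
# Round-trip a file name through iconv to test whether it is valid UTF-8.
# \326\320\316\304 are GBK-encoded bytes that are not valid UTF-8.
name=$(printf 'report_\326\320\316\304.txt')
if printf '%s' "$name" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
  echo "valid UTF-8"
else
  echo "not valid UTF-8; convert the name before migrating"
fi
```

If names turn out to be GBK-encoded on a Linux host, a tool such as convmv (for example, convmv -f GBK -t UTF-8) can rename them to UTF-8 before you start the job.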

Recommendations

When configuring the migration service, if the source is OSS, set srcDomain to an intranet endpoint (a domain name containing internal) so that you avoid downstream traffic fees from the OSS source and get faster migration speeds; you are charged only for the number of requests to OSS. You can find the OSS intranet domain name in the OSS console.

If your NAS is in a VPC and the data source is OSS, set srcDomain to the VPC environment domain name provided by OSS. See Regions and endpoints for more information about the corresponding VPC environment domain names of each region.
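As a sketch, the two cases might look like this in the job file. The region and the exact domain names are placeholders from the 2017-era endpoint naming scheme; always verify them against the endpoint list in the OSS console or the Regions and endpoints page:

```
# ECS in a classic network, same region as the OSS bucket:
srcDomain=oss-cn-hangzhou-internal.aliyuncs.com

# ECS in a VPC, same region as the OSS bucket:
srcDomain=vpc100-oss-cn-hangzhou.aliyuncs.com
```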
