This topic describes how to deploy ossimport in distributed mode. Distributed deployment of ossimport is supported only for Linux.
A cluster of at least two machines is deployed, with one being a master and the others being workers.
A connection over SSH is established between the master and the workers.
All workers use the same username and password.
Note: An SSH connection is established between the master and workers, or login credentials of the workers are configured in sys.properties.
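The SSH prerequisite can be checked from the master before anything is deployed. The following is a minimal sketch, not part of ossimport: the function name check_ssh and the SSH_BIN override are hypothetical, and the check assumes key-based (non-interactive) login. If you rely on worker credentials in sys.properties instead, this check does not apply.

```shell
#!/usr/bin/env bash
# Sketch: verify that the master can open a non-interactive SSH session to each worker.
# SSH_BIN is a hypothetical override so the function can be tried with a stub.
SSH_BIN="${SSH_BIN:-ssh}"

# Usage: check_ssh <user> <host> [<host> ...]
check_ssh() {
  local user="$1" host failed=0
  shift
  for host in "$@"; do
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    if "$SSH_BIN" -o BatchMode=yes -o ConnectTimeout=5 "${user}@${host}" true </dev/null; then
      echo "OK   ${host}"
    else
      echo "FAIL ${host}"
      failed=1
    fi
  done
  # Return non-zero if at least one worker could not be reached.
  return "$failed"
}
```

For example, `check_ssh "$USER" 192.168.1.6 192.168.1.7` prints one line per worker and returns a non-zero status if any worker cannot be reached. The IP addresses are placeholders.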
Download and install ossimport
Download ossimport-2.3.6.tar.gz to your local computer.
All subsequent operations are performed on the master.
Log on to the server and run the following command to create the ossimport directory:
mkdir -p $HOME/ossimport
Go to the directory of the package and run the following command to decompress the package to the specified directory:
tar -zxvf ossimport-2.3.6.tar.gz -C $HOME/ossimport
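The two commands above can be combined into one guarded step. This is a sketch under stated assumptions: install_ossimport is a hypothetical helper name, and the default target directory mirrors the $HOME/ossimport path used above.

```shell
#!/usr/bin/env bash
# Sketch: create the target directory and unpack the package into it.
# Usage: install_ossimport <package.tar.gz> [target_dir]
install_ossimport() {
  local package="$1" target="${2:-$HOME/ossimport}"
  if [ ! -f "$package" ]; then
    echo "Package not found: $package" >&2
    return 1
  fi
  # Same tar options as above, without -v to keep the output quiet.
  mkdir -p "$target" && tar -zxf "$package" -C "$target"
}
```

A call such as `install_ossimport ossimport-2.3.6.tar.gz "$HOME/ossimport"` stops with an error message if the package is not in the current directory, instead of creating an empty directory.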
The structure of the decompressed file is as follows:
ossimport
├── bin
│   ├── console.jar     # The JAR package for the Console module.
│   ├── master.jar      # The JAR package for the Master module.
│   ├── tracker.jar     # The JAR package for the Tracker module.
│   └── worker.jar      # The JAR package for the Worker module.
├── conf
│   ├── job.cfg         # The Job configuration file template.
│   ├── sys.properties  # The configuration file that contains system parameters.
│   └── workers         # The list of workers.
├── console.sh          # The command-line tool. Only Linux is supported.
├── logs                # The directory that contains logs.
└── README.md           # The file that introduces and explains ossimport. We recommend that you read this file before you use ossimport.
OSS_IMPORT_HOME: the root directory of ossimport. By default, the root directory is $HOME/ossimport, as used in the decompression command. You can specify a root directory by running export OSS_IMPORT_HOME=<dir> or by modifying the system configuration file $HOME/.bashrc. We recommend that you use the default root directory.
OSS_IMPORT_WORK_DIR: the working directory of ossimport. You can specify a working directory in conf/sys.properties. We recommend that you use $HOME/ossimport/workdir as the working directory.
Specify absolute paths for OSS_IMPORT_HOME or OSS_IMPORT_WORK_DIR, such as
The distributed deployment of ossimport has three configuration files:
conf/job.cfg: the configuration file template used to configure jobs in distributed mode. Configure the parameters based on your actual migration job.
conf/sys.properties: the configuration file that contains system operating parameters, such as the working directory and worker-related parameters.
conf/workers: the worker list.
Before you start a migration job, check the parameters in job.cfg. After a migration job is submitted, you cannot modify parameter settings in the files.
Configure and check workers before you start the service. You cannot add an item to or remove an item from the file after the service is started.
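Because these files cannot be changed once a job is submitted or the service is started, a quick check before deployment can help. The sketch below only confirms that the three files exist and are not empty. It does not validate their contents, and check_conf is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: confirm that the three configuration files exist and are non-empty.
# Usage: check_conf <ossimport_root_dir>
check_conf() {
  local home="$1" name missing=0
  for name in job.cfg sys.properties workers; do
    # -s is true only if the file exists and has a size greater than zero.
    if [ ! -s "$home/conf/$name" ]; then
      echo "Missing or empty: $home/conf/$name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_conf "$HOME/ossimport" && echo "configuration files present"` prints the confirmation only when all three files are in place.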
If you use ossimport in distributed mode to perform migration jobs, you need to perform the following steps in most cases:
Deploy the service. To do so, run the bash console.sh deploy command in the Linux terminal. This command deploys ossimport to all machines specified in the conf/workers configuration file.
Note: Ensure that the configuration files conf/job.cfg and conf/workers are properly configured before you deploy the service.
Clear jobs with the same name. If you have run a job with the same name and want to run the job again, clear the job with the same name first. If you have not run the job or you want to retry the tasks of a failed job, do not run the clean command. To clear a job with the same name, run bash console.sh clean job_name in the Linux terminal.
Submit the data migration job. You cannot submit jobs with the same name. If you have jobs with the same name, run the clean command to clear the jobs.
A configuration file is required to submit a job. You can create a job configuration file based on the conf/job.cfg template file. To submit a job, run bash console.sh submit [job_cfg_file] in the Linux terminal. The job_cfg_file parameter in the command is optional and is set to $OSS_IMPORT_HOME/conf/job.cfg by default, where $OSS_IMPORT_HOME is the directory that contains
Start the service. To do so, run bash console.sh start in the Linux terminal.
View the job status. To do so, run bash console.sh stat in the Linux terminal.
Retry failed tasks. Tasks may fail due to network issues or other reasons. When you run the retry command, only failed tasks are retried. To retry failed tasks, run bash console.sh retry [job_name] in the Linux terminal. In the command, the optional job_name parameter specifies the job whose failed tasks you want to retry. If you do not configure this parameter, failed tasks of all jobs are retried.
Stop the service. To do so, run bash console.sh stop in the Linux terminal.
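The steps above can be strung together in a small wrapper. The following is a sketch, not an official script: run_console, migrate, and the DRY_RUN switch are hypothetical names, and only the console.sh subcommands come from the steps above. By default the wrapper prints the commands instead of running them, so the sequence can be reviewed first.

```shell
#!/usr/bin/env bash
# Sketch: run the console.sh commands for one migration job in the documented order.
# DRY_RUN=1 (the default in this sketch) prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"
OSS_IMPORT_HOME="${OSS_IMPORT_HOME:-$HOME/ossimport}"

run_console() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "bash console.sh $*"
  else
    # console.sh sits in the ossimport root directory.
    (cd "$OSS_IMPORT_HOME" && bash console.sh "$@")
  fi
}

# Usage: migrate <job_cfg_file>
migrate() {
  local job_cfg="$1"
  # Stop at the first command that fails.
  run_console deploy &&
    run_console submit "$job_cfg" &&
    run_console start &&
    run_console stat
}
```

The clean, retry, and stop commands are left out of migrate on purpose, because whether to run them depends on the state of the job. They can be issued the same way, for example `run_console retry job_name` or `run_console stop`.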
If an error occurs because of incorrect parameters in a bash console.sh command, the correct command format is displayed.
We recommend that you specify absolute paths in configuration files and submitted jobs.
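A path can be checked for this property before it is written to a configuration file or passed to a command. This is a minimal sketch: is_absolute_path is a hypothetical helper, and it only tests whether the string starts with a forward slash (/).

```shell
#!/usr/bin/env bash
# Sketch: succeed only if the argument is an absolute path (starts with "/").
# A literal "~/dir" string is treated as relative because it is not expanded here.
is_absolute_path() {
  case "$1" in
    /*) return 0 ;;
    *)  return 1 ;;
  esac
}
```

For example, `is_absolute_path "$OSS_IMPORT_HOME" || echo "OSS_IMPORT_HOME is not an absolute path" >&2` warns when the variable is unset or holds a relative path.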
The job.cfg file contains job configuration items.
Important: You cannot modify the configuration items in the file after the job is submitted.
Common causes of job failures
The file in the source directory is modified during the upload process. In this case, the SIZE_NOT_MATCH error is recorded in log/audit.log, meaning that the original file is uploaded and modifications are not uploaded to OSS.
The source file is deleted during upload. This causes download failures.
The name of the file to upload does not conform to the naming rules of OSS. For example, upload fails if the name of the file to upload starts with a forward slash (/) or is left empty.
The source file fails to be downloaded.
The program exits unexpectedly and the job state is Abort. If this happens, contact our technical support team.
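The first cause in the list can be spotted by searching audit logs for the SIZE_NOT_MATCH marker. The sketch below assumes only that the marker appears as plain text in files named audit.log. The function name find_size_mismatch_logs is hypothetical, and the directory to search is passed in rather than assumed.

```shell
#!/usr/bin/env bash
# Sketch: list the audit.log files under a directory that mention SIZE_NOT_MATCH.
# Usage: find_size_mismatch_logs <directory>
find_size_mismatch_logs() {
  local root="$1"
  if [ ! -d "$root" ]; then
    echo "Not a directory: $root" >&2
    return 1
  fi
  # grep -l prints only the names of files that contain the marker.
  # grep exits non-zero when nothing matches, so that status is ignored.
  find "$root" -type f -name 'audit.log' -exec grep -l 'SIZE_NOT_MATCH' {} + || true
}
```

An empty result means that no audit log under the given directory recorded a size mismatch.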
Job status and logs
After a job is submitted, the master splits the job into tasks, the workers run the tasks, and the tracker collects the task status. After a job is completed, the structure of the workdir directory is as follows:
workdir
├── bin
│   ├── console.jar     # The JAR package for the Console module.
│   ├── master.jar      # The JAR package for the Master module.
│   ├── tracker.jar     # The JAR package for the Tracker module.
│   └── worker.jar      # The JAR package for the Worker module.
├── conf
│   ├── job.cfg         # The Job configuration file template.
│   ├── sys.properties  # The configuration file that contains system parameters.
│   └── workers         # The list of workers.
├── logs
│   ├── import.log      # Migration logs.
│   ├── master.log      # Master logs.
│   ├── tracker.log     # Tracker logs.
│   └── workers         # Worker logs.
├── master
│   ├── jobqueue        # Jobs that are not split.
│   └── jobs            # Job status.
│       └── xxtooss     # Job names.
│           ├── checkpoints    # Checkpoints generated when the master splits jobs into tasks.
│           │   └── 0
│           │       └── ED09636A6EA24A292460866AFDD7A89A.cpt
│           ├── dispatched     # Tasks that are dispatched to workers but not complete.
│           │   └── 192.168.1.6
│           ├── failed_tasks   # Failed tasks.
│           │   └── A41506C07BF1DF2A3EDB4CE31756B93F_1499348973217@192.168.1.6
│           │       ├── audit.log    # The logs of tasks. You can view the logs to identify error causes.
│           │       ├── DONE         # The mark file of successful tasks. If the task fails, the content is empty.
│           │       ├── error.list   # The list of task errors. You can view the errors in the file.
│           │       ├── STATUS       # The mark file that indicates task status. The content of this file is Failed or Completed, indicating that the task failed or succeeded.
│           │       └── TASK         # Description of the tasks.
│           ├── pending_tasks  # Tasks that are not dispatched.
│           └── succeed_tasks  # Tasks that run successfully.
│               └── A41506C07BF1DF2A3EDB4CE31756B93F_1499668462358@192.168.1.6
│                   ├── audit.log    # The logs of tasks. You can view the logs to identify error causes.
│                   ├── DONE         # The mark file of successful tasks.
│                   ├── error.list   # The list of task errors. If the tasks are successful, the error list is empty.
│                   ├── STATUS       # The mark file that indicates task status. The content of this file is Failed or Completed, indicating that the task failed or succeeded.
│                   └── TASK         # Description of the tasks.
└── worker  # Stores the status of the tasks being run by the worker. After tasks are run, they are managed by the master.
    └── jobs
        ├── local_test2
        │   └── tasks
        └── local_test_4
            └── tasks
Important:
To view the information about job running, check
To troubleshoot failed tasks, check
To view errors, check
The preceding logs are only for your reference. Do not deploy your services and applications based on them.
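For ad hoc troubleshooting, the failed_tasks directory shown in the workdir structure can be summarized from the command line. The following sketch relies only on that layout (master/jobs/<job_name>/failed_tasks/<task>/STATUS). The function name list_failed_tasks is hypothetical, and the output is meant for manual inspection, in line with the caution above.

```shell
#!/usr/bin/env bash
# Sketch: print each failed task of a job together with the content of its STATUS file.
# Usage: list_failed_tasks <workdir> <job_name>
list_failed_tasks() {
  local workdir="$1" job="$2" task_dir status
  for task_dir in "$workdir/master/jobs/$job/failed_tasks"/*/; do
    # If the glob matches nothing, the pattern itself is returned; skip it.
    [ -d "$task_dir" ] || continue
    status="unknown"
    if [ -f "${task_dir}STATUS" ]; then
      status="$(cat "${task_dir}STATUS")"
    fi
    # Output format: <task directory name><TAB><status>
    printf '%s\t%s\n' "$(basename "$task_dir")" "$status"
  done
}
```

For a job named xxtooss, `list_failed_tasks "$HOME/ossimport/workdir" xxtooss` prints one line per failed task. The audit.log and error.list files in each task directory can then be opened for details.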
Common errors and troubleshooting
For more information about common errors and troubleshooting, see FAQ.