Survey resources in Hive

This topic describes how to use the hive-scanner tool provided by Cloud Migration Hub (CMH) to survey resources in your Hive system step by step.

Prepare the environment

Before you survey resources in your Hive system, you must install the hive-scanner tool on your tool server. For more information, see Prepare for using the hive-scanner tool.

Make sure that the working directory of the tool on your tool server has the following structure:

|-hive-scanner/
    |-application.yml
    |-hms-data-scan-0.0.1-SNAPSHOT.jar
    |-start.sh
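
As a quick pre-flight check, the layout above can be verified with a short script. The following is a minimal Python sketch and is not part of the hive-scanner tool. The helper name missing_files is hypothetical, and the file names are copied from the directory listing above.

```python
from pathlib import Path

# File names copied from the directory listing above.
EXPECTED_FILES = (
    "application.yml",
    "hms-data-scan-0.0.1-SNAPSHOT.jar",
    "start.sh",
)


def missing_files(workdir):
    """Return the expected files that are absent from workdir (hypothetical helper)."""
    root = Path(workdir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Assumes the script is started from the parent of the hive-scanner directory.
    missing = missing_files("hive-scanner")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected files are present.")
```

If the JAR file on your tool server has a different version in its name, adjust EXPECTED_FILES accordingly.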

Run the tool

1. Edit the application.yml configuration file. Set the url, username, password, exportFilePath, and hiveServerIp parameters.

spring:
  datasource:
    driver-class-name: com.mysql.cj.jdbc.Driver
    url: jdbc:mysql://******:3306/db # The address of the Hive metastore database.
    username: username # The username used to log on to the database.
    password: password # The password used to log on to the database.

scan:
  exportFilePath: cmh-meta-data.json # The name of the file for storing the survey data.
  hiveServerIp: 120.77.*.* # The IP address of the Hive server.

logging:
  level:
    root: info # The log level.
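
Before you run the tool, it can help to confirm that none of the five parameters was left out or left empty. The following Python sketch is an illustration only: it is a naive line-based scan, not a YAML parser, it ignores nesting, and the function name find_unset_keys is hypothetical. It cannot tell whether a value is still a placeholder such as username.

```python
import re

# The parameters that step 1 asks you to set.
REQUIRED_KEYS = ("url", "username", "password", "exportFilePath", "hiveServerIp")


def find_unset_keys(yaml_text):
    """Return required keys that are missing or have an empty value.

    Naive scan: looks only for `key: value` lines and ignores nesting.
    """
    found = {}
    for line in yaml_text.splitlines():
        match = re.match(r"^\s*([A-Za-z][\w-]*)\s*:\s*(.*)$", line)
        if match:
            key, value = match.groups()
            # Drop a trailing "# comment" (at the start or after whitespace).
            value = re.sub(r"(^|\s+)#.*$", "", value).strip()
            found[key] = value
    return [key for key in REQUIRED_KEYS if not found.get(key)]


if __name__ == "__main__":
    # Inline sample for demonstration; the host and values are made up.
    sample = """
spring:
  datasource:
    url: jdbc:mysql://db.example.internal:3306/db # metastore address
    username: scanner
    password:
scan:
  exportFilePath: cmh-meta-data.json
"""
    print(find_unset_keys(sample))
```

To check a real file, pass the content of application.yml to find_unset_keys, for example by reading it with open("application.yml").read().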

2. Run the tool.

You can run the following command on the CLI to conduct a survey. Then, you can analyze the survey results on the tool server.

sh start.sh

The following figure shows the logs that are recorded in the hms-scan.log file after you run the tool.

Analyze data on the tool server

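After you run the hive-scanner tool, a JSON file is generated in the output directory. The file name is the value of the exportFilePath parameter, such as cmh-meta-data.json in the preceding configuration.

Open the JSON file to view the resource overview and the top object lists.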

{
    "url": "hive ip",
    "hiveVersion": "Version number of Hive",
    "hiveMetaDbStatList": [ // The resource statistics by database.
        {
            "transactionalTableNum": "Number of transactional tables",
            "externalTableNum": "Number of external tables",
            "dbName": "Database name",
            "dbSize": "Database size",
            "functionNum": "Number of functions",
            "tableNum": "Total number of tables",
            "source": "Resource name: IP address/Database name",
            "partitionTableNum": "Number of partitioned tables",
            "viewTableNum": "Number of view tables",
            "top10PartBySize": [ // The top 10 partitions by size.
                {
                    "partName": "Table name.Partition name",
                    "totalSize": "Partition size"
                }
            ],
            "top10TableBySize": [ // The top 10 tables by size.
                {
                    "tblName": "Table name",
                    "totalSize": "Table size"
                }
            ],
            "top10TableByPartNum": [ // The top 10 tables by the number of partitions.
                {
                    "tblName": "Table name",
                    "partitionNum": "Number of partitions"
                }
            ]
        }
    ]  
}
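
To review a large survey file without reading it by hand, you can condense it into one row per database. The following Python sketch assumes that the generated file is plain JSON without the // annotations used in the sample above, which standard JSON parsers reject, and that the field names match the sample. The function names load_report and summarize are hypothetical. Value types are not documented here, so the sketch passes values through unchanged.

```python
import json


def load_report(path):
    """Load the survey file written by the tool (assumed to be plain JSON)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize(report):
    """Return one summary row per database in a parsed survey report."""
    rows = []
    for db in report.get("hiveMetaDbStatList", []):
        rows.append({
            "dbName": db.get("dbName"),
            "tableNum": db.get("tableNum"),
            "dbSize": db.get("dbSize"),
            # Table names in the order in which the file lists them.
            "largestTables": [
                table.get("tblName") for table in db.get("top10TableBySize", [])
            ],
        })
    return rows
```

For example, summarize(load_report("cmh-meta-data.json")) returns a list that you can print or write to a spreadsheet.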

Upload resource information

After you verify that the data is correct on the tool server, you can upload the JSON file to CMH.
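
Part of this verification can be automated. The following Python sketch checks only that the fields shown in the sample output earlier in this topic are present. It does not judge whether the values are correct, and it does not reflect any validation that CMH itself performs. The function name find_problems is hypothetical.

```python
# Field names copied from the sample output earlier in this topic.
TOP_LEVEL_KEYS = ("url", "hiveVersion", "hiveMetaDbStatList")
DB_KEYS = (
    "dbName", "dbSize", "source", "tableNum", "functionNum",
    "transactionalTableNum", "externalTableNum",
    "partitionTableNum", "viewTableNum",
)


def find_problems(report):
    """Return a list of human-readable problems found in a parsed report."""
    problems = [
        f"missing top-level field: {key}"
        for key in TOP_LEVEL_KEYS
        if key not in report
    ]
    databases = report.get("hiveMetaDbStatList", [])
    if not isinstance(databases, list):
        problems.append("hiveMetaDbStatList is not a list")
        return problems
    for index, db in enumerate(databases):
        for key in DB_KEYS:
            if key not in db:
                problems.append(f"database #{index}: missing field {key}")
    return problems
```

An empty list means that the expected fields are present. You still need to compare the reported numbers against what you know about your Hive system.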

Log on to the CMH console. In the left-side navigation pane, choose Discovery > Research tools. Click the Offline collection tab. In the HIVE collection section, click Upload. In the HIVE collection dialog box, upload the JSON file.

After the JSON file is uploaded, you can view the import task in the Research task list section. Click the task name to view the uploaded resource information. If the resource information is correct, click Resource confirmation in the operation column to import the resource information to CMH.