DataHub: Console command-line tool

Last Updated: Nov 11, 2021

Preparations

Download the datahub_console.tar.gz package and decompress it. Enter your AccessKey pair and endpoint in the datahub.properties file in the conf directory. After the configuration is complete, run the script in the bin directory.

Note:

  • Java Development Kit (JDK) 1.8 or later is required.
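
A minimal datahub.properties might look like the following sketch. The exact property keys are assumptions for illustration; use the keys that are predefined in the file shipped in the conf directory.

# conf/datahub.properties (sketch; property names are assumptions)
# AccessKey pair of your Alibaba Cloud account or RAM user
accessId=<yourAccessKeyId>
accessKey=<yourAccessKeySecret>
# DataHub endpoint of your region
endpoint=https://dh-cn-hangzhou.aliyuncs.com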

Instructions

Basic operations

  • View all executable commands.

  • Append a command name to help to view the parameters required by that command. For example, run the help lt command to view the parameters required by the lt command.

help
  • Clear the screen.

clear
  • Exit the program.

exit
  • View the details of a command error. After a command reports an error, run the stacktrace command to view the details.

stacktrace
  • Run a multi-command script, as sketched below.

script
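
For example, you might save several console commands to a text file and run them in one pass. The following sketch assumes that the script command accepts a file path as its argument and that the file contains one console command per line; both details are assumptions, not confirmed behavior.

# commands.txt (hypothetical script file, one console command per line)
lp
lt -p test_project
gt -p test_project -t test_topic

# run the script from the console
script commands.txt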

Manage projects

  • Create a project.

cp -p test_project -c test_comment
  • Delete a project.

dp -p test_project
  • Query projects.

lp

Manage topics

  • Create a topic.

    • -m: the data type of the topic. Set this option to BLOB to create a topic of the BLOB type, or to TUPLE to create a topic of the TUPLE type.

    • -f: the fields of a topic of the TUPLE type, in the format [(fieldName,fieldType,isNull)]. Separate multiple fields with commas (,); see the multi-field sketch after the following command.

ct -p test_project -t test_topic -m TUPLE -f [(name,string,true)] -s 3 -l 3 -c test_comment
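
For a topic with more than one field, chain the (fieldName,fieldType,isNull) triples inside the brackets. The topic name and fields below are illustrative:

ct -p test_project -t multi_field_topic -m TUPLE -f [(id,bigint,true),(name,string,true),(score,double,true)] -s 3 -l 3 -c test_comment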
  • Delete a topic.

dt test_project test_topic
  • Query the information about a topic.

gt -p test_project -t test_topic
  • Export the schema of a topic as a JSON file.

gts -f filepath -p test_project -t test_topic
  • Query topics.

lt -p test_project
  • Import a JSON file to create a topic.

rtt -s 3 -l 3 -c test_comment -f filepath -p test_project -t test_topic
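
Together, gts and rtt can copy a topic's schema. The following sketch exports the schema of test_topic and recreates it under a new topic name; the file path and the new name are placeholders:

# export the schema of an existing topic to a JSON file
gts -f /tmp/test_topic.json -p test_project -t test_topic
# create a new topic from the exported schema
rtt -s 3 -l 3 -c copied_topic -f /tmp/test_topic.json -p test_project -t test_topic_copy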

Manage DataConnectors

  • Create a DataConnector to synchronize data to MaxCompute.

    • -m: the partition mode. Valid values: SYSTEM_TIME, USER_DEFINE, EVENT_TIME, and META_TIME.

    • -tr: the partition interval. Default value: 60. Unit: minutes.

    • -tf: the partition format. A value of ds indicates that data is partitioned by day. A value of ds hh indicates that data is partitioned by hour. A value of ds hh mm indicates that data is partitioned by minute.

coc -p test_project -t test_topic -m SYSTEM_TIME -e odpsEndpoint -op odpsProject -ot odpsTable -oa odpsAccessId -ok odpsAccessKey -tr 60 -c (field1,field2) -tf ds hh mm
  • Create a field for a DataConnector.

acf -p test_project -t test_topic -c connectorId -f fieldName
  • Create a DataConnector to synchronize data to ApsaraDB RDS, ApsaraDB RDS for MySQL, or AnalyticDB for MySQL.

    • -ty: the type of the DataConnector. Valid values:

      • mysql: creates a DataConnector to synchronize data to ApsaraDB RDS for MySQL.

      • ads: creates a DataConnector to synchronize data to AnalyticDB for MySQL.

    • -ht: the write mode. Valid values:

      • IGNORE

      • OVERWRITE

    • -n: the fields to be synchronized. Example: (field1,field2).

cdc -p test_project -t test_topic -h host -po 3306 -ty mysql -d mysql_database -ta mysql_table -u username -pa password -ht IGNORE -n (field1,field2)
  • Create a DataConnector to synchronize data between DataHub topics.

    • -m: the authentication method. Valid values:

      • AK: uses the AccessKey pair for authentication. You must enter the AccessKey ID and the AccessKey secret.

      • STS: uses a Security Token Service (STS) token for authentication.

cdhc -p test_project -t test_topic -sp sourceProject -st sourceTopic -m AK -i accessId -k accessKey
  • Create a DataConnector to synchronize data to Function Compute.

    • -au: the authentication method. Valid values:

      • AK: uses the AccessKey pair for authentication. You must enter the AccessKey ID and the AccessKey secret.

      • STS: uses an STS token for authentication.

    • -n: the fields to be synchronized. Example: (field1,field2).

cfc -p test_project -t test_topic -e endpoint -s service -f function -au AK -i accessId -k accessKey -n (field1,field2)
  • Create a DataConnector to synchronize data to Hologres.

    • -au: the authentication method. You can use only the AccessKey pair for authentication.

    • -m: the parsing type. Valid values:

      • Delimiter: you must specify the lineDelimiter, parseData, and columnDelimiter fields.

      • InformaticaJson: you must specify the parseData field.

chc -p test_project -t test_topic -e endpoint -cl (field1,field2) -au AK -hp holoProject -ht holoTopic -i accessId -k accessKey -m Delimiter -l 1 -b false -n (field1,field2)
  • Create a DataConnector to synchronize data to Tablestore.

    • -m: the authentication method. Default value: STS. Valid values:

      • AK: uses the AccessKey pair for authentication. You must enter the AccessKey ID and the AccessKey secret.

      • STS: uses an STS token for authentication.

    • -wm: the write mode. Valid values:

      • PUT

      • UPDATE

    • -c: the fields to be synchronized. Example: (field1,field2).

cotsc -p test_project -t test_topic -i accessId -k accessKey -it instanceId -m AK -t table -wm PUT -c (field1,field2)
  • Create a DataConnector to synchronize data to Object Storage Service (OSS).

csc -p test_project -t test_topic -b bucket -e endpoint -pr ossPrefix -tf ossTimeFormat -tr timeRange -c (f1,f2)
  • Delete a DataConnector.

    • You can specify multiple DataConnector IDs. Separate the IDs with spaces.

dc -p test_project -t test_topic -c connectorId
  • Query the details of a DataConnector.

gc -p test_project -t test_topic -c connectorId
  • Query the DataConnectors in a topic.

lc -p test_project -t test_topic
  • Restart a DataConnector.

rc -p test_project -t test_topic -c connectorId
  • Update the AccessKey pair for a DataConnector.

uca -p test_project -t test_topic -c connectorId -ty connectorType -a accessId -k accessKey
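
A typical troubleshooting pass can be composed from the commands above; connectorId is a placeholder for an ID that lc returns:

# list the DataConnectors in the topic to find the connector ID
lc -p test_project -t test_topic
# check the details and state of one DataConnector
gc -p test_project -t test_topic -c connectorId
# restart it, for example after fixing the target table
rc -p test_project -t test_topic -c connectorId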

Manage shards

  • Merge shards.

ms -p test_project -t test_topic -s shardId -a adjacentShardId
  • Split a shard.

ss -p test_project -t test_topic -s shardId
  • Query all shards in a topic.

ls -p test_project -t test_topic
  • Query the synchronization status of a shard.

gcs -p test_project -t test_topic -s shardId -c connectorId
  • Query the consumer offset of each shard.

gso -p test_project -t test_topic -s subid -i shardId
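
For example, to rebalance a topic, you can split a shard and later merge two adjacent shards. The shard IDs below are placeholders; run ls first to see the real IDs and which shards are adjacent:

# view the current shards
ls -p test_project -t test_topic
# split shard 0 into two shards
ss -p test_project -t test_topic -s 0
# merge shard 1 with its adjacent shard 2
ms -p test_project -t test_topic -s 1 -a 2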

Manage subscriptions

  • Create a subscription.

css -p test_project -t test_topic -c comment
  • Delete a subscription.

dsc -p test_project -t test_topic -s subId
  • Query subscriptions.

lss -p test_project -t test_topic

Upload and download data

  • Upload data.

    • -f: the path of the file to be uploaded. Add escape characters for a path in Windows, for example, D:\\test\\test.txt.

    • -m: the text delimiter. Commas (,) and spaces can be used as delimiters.

    • -n: the number of records to upload in each batch. Default value: 1000.

uf -f filepath -p test_project -t test_topic -m "," -n 1000

Example: Upload a CSV file

The following example shows how to use the console command-line tool to upload a CSV file to DataHub. The following shows the format of the CSV file.

0,qe614c760fuk8judu01tn5x055rpt1,true,100.1,14321111111
1,znv1py74o8ynn87k66o32ao4x875wi,true,100.1,14321111111
2,7nm0mtpgo1q0ubuljjjx9b000ybltl,true,100.1,14321111111
3,10t0n6pvonnan16279w848ukko5f6l,true,100.1,14321111111
4,0ub584kw88s6dczd0mta7itmta10jo,true,100.1,14321111111
5,1ltfpf0jt7fhvf0oy4lo8m3z62c940,true,100.1,14321111111
6,zpqsfxqy9379lmcehd7q8kftntrozb,true,100.1,14321111111
7,ce1ga9aln346xcj761c3iytshyzuxg,true,100.1,14321111111
8,k5j2id9a0ko90cykl40s6ojq6gruyi,true,100.1,14321111111
9,ns2zcx9bdip5y0aqd1tdicf7bkdmsm,true,100.1,14321111111
10,54rs9cm1xau2fk66pzyz62tf9tsse4,true,100.1,14321111111

Each line is a record to be written to DataHub, and fields are separated by commas (,). Save the CSV file as /temp/test.csv on your on-premises machine. The following table describes the schema of the DataHub topic to which the CSV file is written.

Field name    Data type
id            BIGINT
name          STRING
gender        BOOLEAN
salary        DOUBLE
my_time       TIMESTAMP
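
If the topic does not exist yet, a TUPLE topic matching this schema could be created first. The following is a sketch using the ct command described above; the shard count, lifecycle, and comment are illustrative:

ct -p test_project -t test_topic -m TUPLE -f [(id,bigint,true),(name,string,true),(gender,boolean,true),(salary,double,true),(my_time,timestamp,true)] -s 3 -l 3 -c test_comment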

Run the following command by using the console command-line tool:

uf -f /temp/test.csv -p test_project -t test_topic -m "," -n 1000
  • Download data.

    • -f: the storage path of the file to be downloaded. Add escape characters for a path in Windows, for example, D:\\test\\test.txt.

    • -ti: the offset from which you want to read data, in the format of yyyy-mm-dd hh:mm:ss.

    • -l: the number of records to read each time.

    • -g: specifies whether to read data continuously. Valid values:

      • 0: reads data only once. Consumption stops after the specified amount of data is read.

      • 1: reads data continuously.

down -p test_project -t test_topic -s shardId -d subId -f filePath -ti "1970-01-01 00:00:00" -l 100 -g 0
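
The subId comes from a subscription. The following sketch creates a subscription, looks up its ID, and then downloads 100 records from one shard; subId, the shard ID, and the file path are placeholders:

# create a subscription and note the subscription ID that is returned
css -p test_project -t test_topic -c test_comment
# look up the subId if needed
lss -p test_project -t test_topic
# read 100 records from shard 0, starting from the earliest offset
down -p test_project -t test_topic -s 0 -d subId -f /tmp/test_topic_data.txt -ti "1970-01-01 00:00:00" -l 100 -g 0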

FAQ

  • Failed to start the script: If you run the script in Windows, check whether the path of the script contains parentheses ().