Console command-line tool
Preparations
Download the datahub_console.tar.gz package and decompress the package. Enter the AccessKey pair and the endpoint in the datahub.properties file in the conf directory. After the configuration is complete, run the script in the bin directory.
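The datahub.properties file holds the endpoint and the AccessKey pair. The exact property keys come from the template shipped in the conf directory; the following lines are only an assumed sketch and should be checked against that template:
endpoint=https://dh-cn-hangzhou.aliyuncs.com
accessId=<yourAccessKeyId>
accessKey=<yourAccessKeySecret>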
Note:
Java Development Kit (JDK) 1.8 or later is required.
Instructions
Basic operations
View all executable commands.
Append a command name to help to view the parameters required by that command. For example, run the help lt command to view the parameters required by the lt command.
help
Clear the screen.
clear
Exit the program.
exit
View the details of a command error. After a command error is reported, you can run the stacktrace command to view the error details.
stacktrace
Run a multi-command script.
script
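The script file is expected to contain one console command per line, executed in order. A minimal sketch of such a file, composed only of commands documented in this topic (how the file is passed to the script command is not covered here):
cp -p test_project -c test_comment
ct -p test_project -t test_topic -m TUPLE -f [(name,string,true)] -s 3 -l 3 -c test_comment
lt -p test_project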
Manage projects
Create a project.
cp -p test_project -c test_comment
Delete a project.
dp -p test_project
Query projects.
lp
Manage topics
Create a topic.
-m: the data type of the topic. A value of BLOB indicates that a topic of the BLOB type is to be created. A value of TUPLE indicates that a topic of the TUPLE type is to be created.
The fields for a topic of the TUPLE type are in the format of [(fieldName,fieldType,isNull)]. Separate the fields with commas (,).
ct -p test_project -t test_topic -m TUPLE -f [(name,string,true)] -s 3 -l 3 -c test_comment
Delete a topic.
dt test_project test_topic
Query the information about a topic.
gt -p test_project -t test_topic
Export the schema of a topic as a JSON file.
gts -f filepath -p test_project -t test_topic
Query topics.
lt -p test_project
Import a JSON file to create a topic.
rtt -s 3 -l 3 -c test_comment -f filepath -p test_project -t test_topic
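The JSON file that gts exports and rtt imports describes the topic schema. The exact layout is defined by the tool; the following single-field shape is only an assumption for illustration and should be verified against a file exported by gts:
{"fields":[{"name":"name","type":"STRING","notnull":false}]}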
Manage DataConnectors
Create a DataConnector to synchronize data to MaxCompute.
-m: the partition mode. Valid values: SYSTEM_TIME, USER_DEFINE, EVENT_TIME, and META_TIME.
-tr: the partition interval. Default value: 60. Unit: minutes.
-tf: the partition format. A value of ds indicates that data is partitioned by day. A value of ds hh indicates that data is partitioned by hour. A value of ds hh mm indicates that data is partitioned by minute.
coc -p test_project -t test_topic -m SYSTEM_TIME -e odpsEndpoint -op odpsProject -ot odpsTable -oa odpsAccessId -ok odpsAccessKey -tr 60 -c (field1,field2) -tf ds hh mm
Create a field for a DataConnector.
acf -p test_project -t test_topic -c connectorId -f fieldName
Create a DataConnector to synchronize data to ApsaraDB RDS, ApsaraDB RDS for MySQL, or AnalyticDB for MySQL.
-ty: the type of the DataConnector. Valid values:
mysql: creates a DataConnector to synchronize data to ApsaraDB RDS for MySQL.
ads: creates a DataConnector to synchronize data to AnalyticDB for MySQL.
-ht: the write mode. Valid values:
IGNORE
OVERWRITE
-n: the fields to be synchronized. Example: (field1,field2).
cdc -p test_project -t test_topic -h host -po 3306 -ty mysql -d mysql_database -ta mysql_table -u username -pa password -ht IGNORE -n (field1,field2)
Create a DataConnector.
-m: the authentication method. Valid values:
AK: uses the AccessKey pair for authentication. You must enter the AccessKey ID and the AccessKey secret.
STS: uses a Security Token Service (STS) token for authentication.
cdhc -p test_project -t test_topic -sp sourceProject -st sourceTopic -m AK -i accessId -k accessKey
Create a DataConnector to synchronize data to Function Compute.
-au: the authentication method. Valid values:
AK: uses the AccessKey pair for authentication. You must enter the AccessKey ID and the AccessKey secret.
STS: uses an STS token for authentication.
-n: the fields to be synchronized. Example: (field1,field2).
cfc -p test_project -t test_topic -e endpoint -s service -f function -au AK -i accessId -k accessKey -n (field1,field2)
Create a DataConnector to synchronize data to Hologres.
-au: the authentication method. You can use only the AccessKey pair for authentication.
-m: the parsing type. Valid values:
Delimiter: you must specify the lineDelimiter, parseData, and columnDelimiter fields.
InformaticaJson: you must specify the parseData field.
chc -p test_project -t test_topic -e endpoint -cl (field1,field2) -au AK -hp holoProject -ht holoTopic -i accessId -k accessKey -m Delimiter -l 1 -b false -n (field1,field2)
Create a DataConnector to synchronize data to Tablestore.
-m: the authentication method. Default value: STS. Valid values:
AK: uses the AccessKey pair for authentication. You must enter the AccessKey ID and the AccessKey secret.
STS: uses an STS token for authentication.
-wm: the write mode. Valid values:
PUT
UPDATE
-c: the fields to be synchronized. Example: (field1,field2).
cotsc -p test_project -t test_topic -i accessId -k accessKey -it instanceId -m AK -t table -wm PUT -c (field1,field2)
Create a DataConnector to synchronize data to Object Storage Service (OSS).
csc -p test_project -t test_topic -b bucket -e endpoint -pr ossPrefix -tf ossTimeFormat -tr timeRange -c (f1,f2)
Delete a DataConnector.
You can specify multiple DataConnector IDs. Separate the IDs with spaces.
dc -p test_project -t test_topic -c connectorId
Query the details of a DataConnector.
gc -p test_project -t test_topic -c connectorId
Query the DataConnectors in a topic.
lc -p test_project -t test_topic
Restart a DataConnector.
rc -p test_project -t test_topic -c connectorId
Update the AccessKey pair for a DataConnector.
uca -p test_project -t test_topic -c connectorId -ty connectorType -a accessId -k accessKey
Manage shards
Merge shards.
ms -p test_project -t test_topic -s shardId -a adjacentShardId
Split a shard.
ss -p test_project -t test_topic -s shardId
Query all shards in a topic.
ls -p test_project -t topicName
Query the synchronization status of a shard.
gcs -p test_project -t test_topic -s shardId -c connectorId
Query the consumer offset of each shard.
gso -p test_project -t test_topic -s subId -i shardId
Manage subscriptions
Create a subscription.
css -p test_project -t test_topic -c comment
Delete a subscription.
dsc -p test_project -t test_topic -s subId
Query subscriptions.
lss -p test_project -t test_topic
Upload and download data
Upload data.
-f: the path of the file to be uploaded. Add escape characters for a path in Windows, for example, D:\\test\\test.txt.
-m: the text delimiter. Commas (,) and spaces can be used as delimiters.
-n: the size of the data to be uploaded each time. Default value: 1000.
uf -f filepath -p test_project -t test_topic -m "," -n 1000
Example: Upload a CSV file
The following example shows how to use the console command-line tool to upload a CSV file to DataHub. The CSV file is in the following format:
0,qe614c760fuk8judu01tn5x055rpt1,true,100.1,14321111111
1,znv1py74o8ynn87k66o32ao4x875wi,true,100.1,14321111111
2,7nm0mtpgo1q0ubuljjjx9b000ybltl,true,100.1,14321111111
3,10t0n6pvonnan16279w848ukko5f6l,true,100.1,14321111111
4,0ub584kw88s6dczd0mta7itmta10jo,true,100.1,14321111111
5,1ltfpf0jt7fhvf0oy4lo8m3z62c940,true,100.1,14321111111
6,zpqsfxqy9379lmcehd7q8kftntrozb,true,100.1,14321111111
7,ce1ga9aln346xcj761c3iytshyzuxg,true,100.1,14321111111
8,k5j2id9a0ko90cykl40s6ojq6gruyi,true,100.1,14321111111
9,ns2zcx9bdip5y0aqd1tdicf7bkdmsm,true,100.1,14321111111
10,54rs9cm1xau2fk66pzyz62tf9tsse4,true,100.1,14321111111
Each line is a record to be written to DataHub. Fields are separated by commas (,). Save the CSV file as /temp/test.csv on the on-premises computer. The following table describes the schema of the DataHub topic to which the CSV file is written.
| Field name | Data type |
|---|---|
| id | BIGINT |
| name | STRING |
| gender | BOOLEAN |
| salary | DOUBLE |
| my_time | TIMESTAMP |
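Before uploading, the target topic must already exist with a matching TUPLE schema. Based on the ct syntax shown earlier, a command along the following lines could create it (the project and topic names match the uf command below; adjust them and the shard settings to your environment):
ct -p test_project -t test_topic -m TUPLE -f [(id,bigint,true),(name,string,true),(gender,boolean,true),(salary,double,true),(my_time,timestamp,true)] -s 3 -l 3 -c test_comment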
Run the following command by using the console command-line tool:
uf -f /temp/test.csv -p test_project -t test_topic -m "," -n 1000
Download data.
-f: the storage path of the file to be downloaded. Add escape characters for a path in Windows, for example, D:\\test\\test.txt.
-ti: the offset from which you want to read data, in the format of yyyy-mm-dd hh:mm:ss.
-l: the size of the data to be read each time.
-g: specifies whether to read data all the time.
0: reads data only once. No more consumption occurs after the specified size of data is read.
1: reads data all the time.
down -p test_project -t test_topic -s shardId -d subId -f filePath -ti "1970-01-01 00:00:00" -l 100 -g 0
FAQ
Failed to start the script: If you run the script in Windows and it fails to start, check whether the path of the script contains parentheses ().