Console command-line tool
Preparations
Download the datahub_console.tar.gz package and decompress the package. Enter the AccessKey pair and the endpoint in the datahub.properties file in the conf directory. After the configuration is complete, run the script in the bin directory.
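The datahub.properties file holds the endpoint and the AccessKey pair. The exact property keys come from the template shipped in the conf directory; the following lines are only an assumed sketch and should be checked against that template:
endpoint=https://dh-cn-hangzhou.aliyuncs.com
accessId=<yourAccessKeyId>
accessKey=<yourAccessKeySecret>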
Note:
Java Development Kit (JDK) 1.8 or later is required.
Instructions
Basic operations
View all executable commands.
Append a command name to help to view the parameters required by that command. For example, run the help lt command to view the parameters required by the lt command.
help
Clear the screen.
clear
Exit the program.
exit
View the details of a command error. After a command error is reported, you can run the stacktrace command to view the error details.
stacktrace
Run a multi-command script.
script
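The script file is expected to contain one console command per line, executed in order. A minimal sketch of such a file, composed only of commands documented in this topic (how the file is passed to the script command is not covered here):
cp -p test_project -c test_comment
ct -p test_project -t test_topic -m TUPLE -f [(name,string,true)] -s 3 -l 3 -c test_comment
lt -p test_project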
Manage projects
Create a project.
cp -p test_project -c test_comment
Delete a project.
dp -p test_project
Query projects.
lp
Manage topics
Create a topic.
-m: the data type of the topic. A value of BLOB indicates that a topic of the BLOB type is to be created. A value of TUPLE indicates that a topic of the TUPLE type is to be created.
The fields for a topic of the TUPLE type are in the format of [(fieldName,fieldType,isNull)]. Separate the fields with commas (,).
ct -p test_project -t test_topic -m TUPLE -f [(name,string,true)] -s 3 -l 3 -c test_comment
Delete a topic.
dt test_project test_topic
Query the information about a topic.
gt -p test_project -t test_topic
Export the schema of a topic as a JSON file.
gts -f filepath -p test_project -t test_topic
Query topics.
lt -p test_project
Import a JSON file to create a topic.
rtt -s 3 -l 3 -c test_comment -f filepath -p test_project -t test_topic
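The JSON file that gts exports and rtt imports describes the topic schema. The exact layout is defined by the tool; the following single-field shape is only an assumption for illustration and should be verified against a file exported by gts:
{"fields":[{"name":"name","type":"STRING","notnull":false}]}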
Manage DataConnectors
Create a DataConnector to synchronize data to MaxCompute.
-m: the partition mode. Valid values: SYSTEM_TIME, USER_DEFINE, EVENT_TIME, and META_TIME.
-tr: the partition interval. Default value: 60. Unit: minutes.
-tf: the partition format. A value of ds indicates that data is partitioned by day. A value of ds hh indicates that data is partitioned by hour. A value of ds hh mm indicates that data is partitioned by minute.
coc -p test_project -t test_topic -m SYSTEM_TIME -e odpsEndpoint -op odpsProject -ot odpsTable -oa odpsAccessId -ok odpsAccessKey -tr 60 -c (field1,field2) -tf ds hh mm
Create a field for a DataConnector.
acf -p test_project -t test_topic -c connectorId -f fieldName
Create a DataConnector to synchronize data to ApsaraDB RDS, ApsaraDB RDS for MySQL, or AnalyticDB for MySQL.
-ty: the type of the DataConnector. Valid values:
mysql: creates a DataConnector to synchronize data to ApsaraDB RDS for MySQL.
ads: creates a DataConnector to synchronize data to AnalyticDB for MySQL.
-ht: the write mode. Valid values:
IGNORE
OVERWRITE
-n: the fields to be synchronized. Example: (field1,field2).
cdc -p test_project -t test_topic -h host -po 3306 -ty mysql -d mysql_database -ta mysql_table -u username -pa password -ht IGNORE -n (field1,field2)
Create a DataConnector.
-m: the authentication method. Valid values:
AK: uses the AccessKey pair for authentication. You must enter the AccessKey ID and the AccessKey secret.
STS: uses a Security Token Service (STS) token for authentication.
cdhc -p test_project -t test_topic -sp sourceProject -st sourceTopic -m AK -i accessId -k accessKey
Create a DataConnector to synchronize data to Function Compute.
-au: the authentication method. Valid values:
AK: uses the AccessKey pair for authentication. You must enter the AccessKey ID and the AccessKey secret.
STS: uses an STS token for authentication.
-n: the fields to be synchronized. Example: (field1,field2).
cfc -p test_project -t test_topic -e endpoint -s service -f function -au AK -i accessId -k accessKey -n (field1,field2)
Create a DataConnector to synchronize data to Hologres.
-au: the authentication method. You can use only the AccessKey pair for authentication.
-m: the parsing type. Valid values:
Delimiter: you must specify the lineDelimiter, parseData, and columnDelimiter fields.
InformaticaJson: you must specify the parseData field.
chc -p test_project -t test_topic -e endpoint -cl (field1,field2) -au AK -hp holoProject -ht holoTopic -i accessId -k accessKey -m Delimiter -l 1 -b false -n (field1,field2)
Create a DataConnector to synchronize data to Tablestore.
-m: the authentication method. Default value: STS. Valid values:
AK: uses the AccessKey pair for authentication. You must enter the AccessKey ID and the AccessKey secret.
STS: uses an STS token for authentication.
-wm: the write mode. Valid values:
PUT
UPDATE
-c: the fields to be synchronized. Example: (field1,field2).
cotsc -p test_project -t test_topic -i accessId -k accessKey -it instanceId -m AK -t table -wm PUT -c (field1,field2)
Create a DataConnector to synchronize data to Object Storage Service (OSS).
csc -p test_project -t test_topic -b bucket -e endpoint -pr ossPrefix -tf ossTimeFormat -tr timeRange -c (f1,f2)
Delete a DataConnector.
You can specify multiple DataConnector IDs. Separate the IDs with spaces.
dc -p test_project -t test_topic -c connectorId
Query the details of a DataConnector.
gc -p test_project -t test_topic -c connectorId
Query the DataConnectors in a topic.
lc -p test_project -t test_topic
Restart a DataConnector.
rc -p test_project -t test_topic -c connectorId
Update the AccessKey pair for a DataConnector.
uca -p test_project -t test_topic -c connectorId -ty connectorType -a accessId -k accessKey
Manage shards
Merge shards.
ms -p test_project -t test_topic -s shardId -a adjacentShardId
Split a shard.
ss -p test_project -t test_topic -s shardId
Query all shards in a topic.
ls -p test_project -t topicName
Query the synchronization status of a shard.
gcs -p test_project -t test_topic -s shardId -c connectorId
Query the consumer offset of each shard.
gso -p test_project -t test_topic -s subId -i shardId
Manage subscriptions
Create a subscription.
css -p test_project -t test_topic -c comment
Delete a subscription.
dsc -p test_project -t test_topic -s subId
Query subscriptions.
lss -p test_project -t test_topic
Upload and download data
Upload data.
-f: the path of the file to be uploaded. Add escape characters for a path in Windows, for example, D:\\test\\test.txt.
-m: the text delimiter. Commas (,) and spaces can be used as delimiters.
-n: the size of the data to be uploaded each time. Default value: 1000.
uf -f filepath -p test_project -t test_topic -m "," -n 1000
Example: Upload a CSV file
The following example shows how to use the console command-line tool to upload a CSV file to DataHub. The CSV file is in the following format:
0,qe614c760fuk8judu01tn5x055rpt1,true,100.1,14321111111
1,znv1py74o8ynn87k66o32ao4x875wi,true,100.1,14321111111
2,7nm0mtpgo1q0ubuljjjx9b000ybltl,true,100.1,14321111111
3,10t0n6pvonnan16279w848ukko5f6l,true,100.1,14321111111
4,0ub584kw88s6dczd0mta7itmta10jo,true,100.1,14321111111
5,1ltfpf0jt7fhvf0oy4lo8m3z62c940,true,100.1,14321111111
6,zpqsfxqy9379lmcehd7q8kftntrozb,true,100.1,14321111111
7,ce1ga9aln346xcj761c3iytshyzuxg,true,100.1,14321111111
8,k5j2id9a0ko90cykl40s6ojq6gruyi,true,100.1,14321111111
9,ns2zcx9bdip5y0aqd1tdicf7bkdmsm,true,100.1,14321111111
10,54rs9cm1xau2fk66pzyz62tf9tsse4,true,100.1,14321111111
Each line is a record to be written to DataHub. Fields are separated by commas (,). Save the CSV file as /temp/test.csv on the on-premises computer. The following table describes the schema of the DataHub topic to which the CSV file is written.
| Field name | Data type |
|---|---|
| id | BIGINT |
| name | STRING |
| gender | BOOLEAN |
| salary | DOUBLE |
| my_time | TIMESTAMP |
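Before uploading, the target topic must already exist with a matching TUPLE schema. Based on the ct syntax shown earlier, a command along the following lines could create it (the project and topic names match the uf command below; adjust them and the shard settings to your environment):
ct -p test_project -t test_topic -m TUPLE -f [(id,bigint,true),(name,string,true),(gender,boolean,true),(salary,double,true),(my_time,timestamp,true)] -s 3 -l 3 -c test_comment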
Run the following command by using the console command-line tool:
uf -f /temp/test.csv -p test_project -t test_topic -m "," -n 1000
Download data.
-f: the storage path of the file to be downloaded. Add escape characters for a path in Windows, for example, D:\\test\\test.txt.
-ti: the offset from which you want to read data, in the format of yyyy-mm-dd hh:mm:ss.
-l: the size of the data to be read each time.
-g: specifies whether to read data all the time.
0: reads data only once. No more consumption occurs after the specified size of data is read.
1: reads data all the time.
down -p test_project -t test_topic -s shardId -d subId -f filePath -ti "1970-01-01 00:00:00" -l 100 -g 0
FAQ
Failed to start the script: If you run the script in Windows and it fails to start, check whether the path of the script contains parentheses ().