This topic describes how to use real-time data import scenarios to test the write performance of AnalyticDB for MySQL and compare it with that of Elasticsearch. These tests can give you a full understanding of the write performance of the service.

Test services and specifications

| Service | Specifications |
| --- | --- |
| AnalyticDB for MySQL V3.0 in the Alibaba Cloud public cloud | An AnalyticDB for MySQL cluster in elastic mode for Cluster Edition with one worker node (24 cores). |
| Elasticsearch 6.7.0 | An Elasticsearch cluster of the Standard Edition with one node (24 cores). |
| AnalyticDB for MySQL V3.0 on physical servers | Three physical servers. Each server has 32 vCPUs, 128 GiB of memory, a 3.84 TB SSD, a 960 GB SSD, and twelve 8 TB HDDs. |

Test environment

| Test environment | Configuration |
| --- | --- |
| ECS | Two Elastic Compute Service (ECS) instances with 32 vCPUs, 128 GiB of memory, and 3,576 GiB of local NVMe SSD storage. |

Note: The ECS instances must be deployed in the same zone as the AnalyticDB for MySQL and Elasticsearch clusters, with sufficient bandwidth between them.

Test methods

  • Method used to write data in real time to AnalyticDB for MySQL V3.0 in the Alibaba Cloud public cloud: On the ECS instances, use a Java program to read multiple local TPC-H fragment files and import the data over JDBC from multiple threads, 2,000 rows per batch. The following SQL statement is used for the import: INSERT INTO lineitem VALUES (...).
  • Method used to write data in real time to AnalyticDB for MySQL V3.0 on physical servers: On the ECS instances, use a Java program to read multiple local TPC-H fragment files and import the data over JDBC from multiple threads, 2,000 rows per batch. The size of each row is about 350 bytes. Imported data can be queried in less than 1 second. The following SQL statement is used for the import: INSERT INTO lineitem VALUES (...).
  • Method used to write data in real time to Elasticsearch: On the ECS instances, use a Python program to read multiple local TPC-H fragment files and import the data from multiple threads, 2,000 rows per batch, by using the bulk API of the Elasticsearch Python client.
Note: Data used in the test comes from the TPC-H dataset. For more information, visit the TPC-H official website.
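The Java import program itself is not shown in this topic. As a rough illustration of the batching logic described above, the following Python sketch chunks rows into multi-row INSERT statements of 2,000 rows each. The function name build_batch_sql and the in-memory row list are hypothetical; in the actual test, each thread reads one TPC-H fragment file and executes the generated statements over a JDBC connection rather than building them in memory.

```python
def build_batch_sql(rows, batch_size=2000):
    """Yield one multi-row INSERT statement per batch of batch_size rows."""
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        # Render each row as a parenthesized value list: (v1,v2,...)
        values = ",".join(
            "(" + ",".join(repr(v) for v in row) + ")" for row in batch
        )
        yield "INSERT INTO lineitem VALUES " + values

# Example: 5,000 single-column rows split into batches of 2,000, 2,000,
# and 1,000 rows, producing three INSERT statements.
stmts = list(build_batch_sql([(i,) for i in range(5000)]))
```

In a production import, a parameterized batch insert (for example, PreparedStatement with addBatch and executeBatch in JDBC) is usually preferable to string concatenation, because it avoids SQL escaping issues.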

Test results

| Number of concurrent threads on the client | TPS (AnalyticDB for MySQL V3.0 in the Alibaba Cloud public cloud) | TPS (AnalyticDB for MySQL V3.0 on physical servers) | TPS (Elasticsearch) |
| --- | --- | --- | --- |
| 8 | 33,033 | 120,192 | 12,211 |
| 16 | 56,816 | 218,472 | 7,165 |
| 32 | 95,083 | 398,087 | 6,267 |
| 64 | 153,857 | 643,618 | 5,890 |
| 128 | 186,732 | 787,572 | 5,516 |

Table creation statements

The following statement is used to create a table in the AnalyticDB for MySQL cluster:

CREATE TABLE `lineitem` (
 `l_orderkey` bigint NOT NULL COMMENT '',
 `l_partkey` int NOT NULL COMMENT '',
 `l_suppkey` int NOT NULL COMMENT '',
 `l_linenumber` int NOT NULL COMMENT '',
 `l_quantity` decimal(15, 2) NOT NULL COMMENT '',
 `l_extendedprice` decimal(15, 2) NOT NULL COMMENT '',
 `l_discount` decimal(15, 2) NOT NULL COMMENT '',
 `l_tax` decimal(15, 2) NOT NULL COMMENT '',
 `l_returnflag` varchar NOT NULL COMMENT '',
 `l_linestatus` varchar NOT NULL COMMENT '',
 `l_shipdate` date NOT NULL COMMENT '',
 `l_commitdate` date NOT NULL COMMENT '',
 `l_receiptdate` date NOT NULL COMMENT '',
 `l_shipinstruct` varchar NOT NULL COMMENT '',
 `l_shipmode` varchar NOT NULL COMMENT '',
 `l_comment` varchar NOT NULL COMMENT '',
 PRIMARY KEY (`l_orderkey`)
) DISTRIBUTE BY HASH(`l_orderkey`) INDEX_ALL='Y';

The following statement is used to create an index named tpch in the Elasticsearch cluster:

curl -X PUT 'http://es_ip:9200/tpch' \
-H 'Content-Type: application/json' \
-d '{
    "settings": {
        "number_of_shards": 32,
        "number_of_replicas" : 2
    },
    "mappings": {
         "lineitem": { 
              "properties": {
               "L_ORDERKEY": {
                  "type": "integer"
               },
               "L_PARTKEY": {
                  "type": "integer"
               },
               "L_SUPPKEY": {
                  "type": "integer"
               },
               "L_LINENUMBER": {
                  "type": "integer"
               },
               "L_QUANTITY": {
                  "type": "double"
               },
               "L_EXTENDEDPRICE": {
                  "type": "double"
               },
               "L_DISCOUNT": {
                  "type": "double"
               },
               "L_TAX": {
                  "type": "double"
               },
               "L_RETURNFLAG": {
                  "type": "keyword"
               },
               "L_LINESTATUS": {
                  "type": "keyword"
               },
               "L_SHIPDATE": {
                  "type": "date"
               },
               "L_COMMITDATE": {
                  "type": "date"
               },
               "L_RECEIPTDATE": {
                  "type": "date"
               },
               "L_SHIPINSTRUCT": {
                  "type": "keyword"
               },
               "L_SHIPMODE": {
                  "type": "keyword"
               },
               "L_COMMENT": {
                  "type": "keyword"
               }
            }
          }
     }
}'

The following Python script is used to import data into Elasticsearch:

from threading import Thread
from elasticsearch import Elasticsearch


def func(i):
    es = Elasticsearch(hosts=[
        "es_ip:9200"
    ])
    idx = 0
    # Read one TPC-H fragment file and bulk-import it in batches of 2,000 rows.
    # Note: f.readlines(2000) would read roughly 2,000 bytes, not 2,000 rows,
    # so the file is iterated line by line and flushed every 2,000 rows instead.
    with open(r"lineitem.tbl.{}".format(i)) as f:
        actions = []
        for line in f:  # do not shadow the thread index i
            data = line.split('|')
            body = {
                'L_ORDERKEY': int(data[0]),
                'L_PARTKEY': int(data[1]),
                'L_SUPPKEY': int(data[2]),
                'L_LINENUMBER': int(data[3]),
                'L_QUANTITY': float(data[4]),
                'L_EXTENDEDPRICE': float(data[5]),
                'L_DISCOUNT': float(data[6]),
                'L_TAX': float(data[7]),
                'L_RETURNFLAG': data[8],
                'L_LINESTATUS': data[9],
                'L_SHIPDATE': data[10],
                'L_COMMITDATE': data[11],
                'L_RECEIPTDATE': data[12],
                'L_SHIPINSTRUCT': data[13],
                'L_SHIPMODE': data[14],
                'L_COMMENT': data[15]
            }
            actions.append({"index": {"_index": "tpch", "_type": "lineitem", "routing": int(data[0])}})
            actions.append(body)
            idx += 1
            if idx % 2000 == 0:  # send a full batch of 2,000 rows
                es.bulk(actions)
                actions = []
                print(idx)
        if actions:  # flush the final partial batch
            es.bulk(actions)
            print(idx)


if __name__ == '__main__':
    for i in range(0, 16):
        Thread(target=func, args=(i + 1,)).start()