If you use Data Lake Analytics (DLA) to query data from Tablestore, you can add hints to SQL statements. This allows you to optimize query performance by using one of the following methods:
DLA is a distributed computing system that shards each table in Tablestore based on specific rules. The shards do not overlap with each other. The computing layer concurrently executes these shards to improve the concurrency of computing. You can use the following parameters to shard tables:
ots-split-unit-mbparameter specifies the size of each shard. Valid values: 1 to 102400. Default value: 10.
If this parameter is set to a too small value for a table that contains large amounts of data, a large number of shards are generated. As a result, timeouts frequently occur because Tablestore cannot promptly execute these shards. This affects the stability of data writing. If the table you want to access contains more than 10 GB of data, we recommend that you set this parameter to 256. If the table contains more than 100 GB of data, we recommend that you set this parameter to a value greater than 2048.
If this parameter is set to a too large value for a table that contains very small amounts of data, DLA may not concurrently execute the shards, which causes a long latency. In this situation, decrease the value of this parameter.
ots-split-optimizeparameter specifies whether to enable the function to automatically optimize sharding. If this parameter is set to true, this function is enabled. Default value: false.
In some cases, SQL statements automatically generate a large number of shards. As a result, timeouts occur because Tablestore cannot promptly execute these shards. In addition, the shard list generated by Tablestore is mostly sequential. Adjacent shards are stored on the same host. As a result, hotspot problems may occur when DLA accesses Tablestore.
DLA allows you to prevent these problems by taking measures, such as automatically dispersing shards, merging adjacent shards, and distributing shards at random.
ots-split-size-ratioparameter specifies the proportion that is used to control the total number of shards after adjacent shards are merged. You can specify this parameter only after
ots-split-optimizeis set to true. Valid values: 0.0001‒1.0000.
If you have 100,000 shards and set the
ots-split-size-ratioparameter to 0.3, the system merges adjacent shards and reduces the total number of shards to about 30,000. The total number of shards after merging is based on the service capacity of Tablestore.
ots-insert-as-updateparameter specifies whether to replace the INSERT statement with the UPDATE statement. Valid values: true and false. Default value: false.
Tablestore SDKs provide the
RowUpdateChangeoperations to update data. This update does not change the values of the primary keys. You can call the
RowPutChangeoperation to update data by overwriting an entire row based on primary keys. You can also call the
RowUpdateChangeoperation to update data in some non-primary key columns based on primary keys. DLA does not support the UPDATE statement. You can only use the INSERT statement and hints to update the data stored in Tablestore.
ots-loose-castparameter specifies whether to use loose type casting in SQL statements, for example, whether to convert the data type LONG to DOUBLE or DOUBLE to LONG. Valid values: true and false. Default value: false.
DLA runs a strong-typed language. If the data type of a field in Tablestore is BIGINT and DLA defines the field type as VARCHAR or STRING, DLA returns an error when SQL statements are executed. In this case, you can set the
ots-loose-castparameter to true to enable loose type casting. After loose type casting is enabled, DLA uses this function to automatically convert the data type. For example,
"123"of the STRING type is converted to
123of the INTEGER type.
Use search indexes
ots-index-first parameter specifies whether to query tables by using search indexes first. If this parameter is set to auto, DLA queries tables by using search indexes first.
In most cases, DLA queries and analyzes data from the base table of Tablestore. If you enable search indexes from which you want to query data, DLA uses search indexes whenever possible.
Search indexes of DLA differ from strong consistent indexes such as the secondary indexes of MySQL in the following ways:
Search indexes of DLA are based on asynchronous replication and not strongly consistent with the base table.
Search indexes of DLA are based on inverted indexes rather than B-Tree indexes.
After search indexes are enabled, DLA compares indexes with what you query to determine whether search indexes can be used.
Control the number of concurrent requests on a single DLA node
task-concurrency parameter specifies the number of concurrent requests on a single DLA node. Valid values: integers from 1 to 32. Default value: 32.
A DLA virtual cluster (VC) has multiple nodes, and data is obtained from Tablestore in highly concurrent requests. If a large number of requests time out or are retried on Tablestore, the number of requests on DLA becomes too large. This affects the stability of the DLA VC. For security purposes, if the execution of SQL statements fails due to timeouts that occur on Tablestore, you must control the concurrency of requests on DLA, for example, set the
task-concurrency parameter to 2.