This topic describes how to use Spark SQL to perform data analysis and interactive development on the Tablestore data source.

CREATE TABLE syntax

CREATE TABLE tbName
USING tablestore
OPTIONS(propertyName=propertyValue[,propertyName=propertyValue]*);

Parameters

Parameter Description
access.key.id The AccessKey ID.
access.key.secret The AccessKey secret.
endpoint The endpoint of the Tablestore API.
table.name The name of the Tablestore data table.
instance.name The name of the Tablestore instance.
batch.update.size The number of rows written to the Tablestore data table in each batch. The default value is 0, which indicates that rows are written one at a time.

This parameter takes effect only when you write data to the table.

catalog The description of fields in the Tablestore data table, in JSON format.
The following code provides an example of the catalog configuration for the schema of a Tablestore data table named table1:
{"columns":{
  "col0":{"cf":"cf0", "col":"col0", "type":"string"},
  "col1":{"cf":"cf1", "col":"col1", "type":"boolean"},
  "col2":{"cf":"cf2", "col":"col2", "type":"double"},
  "col3":{"cf":"cf3", "col":"col3", "type":"float"},
  "col4":{"cf":"cf4", "col":"col4", "type":"int"},
  "col5":{"cf":"cf5", "col":"col5", "type":"bigint"},
  "col6":{"cf":"cf6", "col":"col6", "type":"smallint"},
  "col7":{"cf":"cf7", "col":"col7", "type":"string"},
  "col8":{"cf":"cf8", "col":"col8", "type":"tinyint"}
}}

Table schema

When you create a table that maps to a Tablestore data table, you do not need to explicitly define the fields in the CREATE TABLE statement. Spark SQL derives the schema from the catalog parameter. Example:
spark-sql> CREATE DATABASE IF NOT EXISTS default;
spark-sql> USE default;
spark-sql> DROP TABLE IF EXISTS ots_table_test;
spark-sql> CREATE TABLE ots_table_test
         > USING tablestore
         > OPTIONS(
         > endpoint="http://xxx.cn-hangzhou.vpc.ots.aliyuncs.com",
         > access.key.id="yHiu*******BG2s",
         > access.key.secret="ABctuw0M***************iKKljZy",
         > table.name="test",
         > instance.name="myInstance",
         > batch.update.size="100",
         > catalog='{"columns":{"pk":{"col":"pk","type":"string"},"data":{"col":"data","type":"string"}}}');

spark-sql> DESC ots_table_test;
pk  string  NULL
data  string  NULL
Time taken: 0.501 seconds, Fetched 2 row(s)
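
After the table is created, you can query and update the underlying Tablestore data table with standard Spark SQL statements. The following is a minimal sketch that assumes the test data table exists in the myInstance instance and the credentials in the example above are valid; the primary-key and attribute values shown are hypothetical. Because batch.update.size is set to 100, written rows are committed to Tablestore in batches of up to 100:
spark-sql> -- Write rows to the mapped Tablestore data table.
spark-sql> INSERT INTO ots_table_test VALUES ("pk1", "data1"), ("pk2", "data2");
spark-sql> -- Read rows back through the same mapping.
spark-sql> SELECT pk, data FROM ots_table_test WHERE pk = "pk1";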