DML - ApsaraDB for HBase - Alibaba Cloud Documentation Center

This topic describes the DML syntax and provides examples on how to use the DML syntax.

SELECT

Queries data.

Syntax

select_statement ::=  SELECT [ JSON | HOTDATA ] ( select_clause | '*' )
                      FROM table_name
                      [ WHERE where_clause ]
                      [ GROUP BY group_by_clause ]
                      [ ORDER BY ordering_clause ]
                      [ PER PARTITION LIMIT (integer | bind_marker) ]
                      [ LIMIT (integer | bind_marker) ]
                      [ ALLOW FILTERING ]
select_clause    ::=  selector [ AS identifier ] ( ',' selector [ AS identifier ] )
selector         ::=  column_name
                      | term
                      | CAST '(' selector AS cql_type ')'
                      | function_name '(' [ selector ( ',' selector )* ] ')'
                      | COUNT '(' '*' ')'
where_clause     ::=  relation ( AND relation )*
relation         ::=  column_name operator term
                      '(' column_name ( ',' column_name )* ')' operator tuple_literal
                      TOKEN '(' column_name ( ',' column_name )* ')' operator term
operator         ::=  '=' | '<' | '>' | '<=' | '>=' | '!=' | IN | CONTAINS | CONTAINS KEY
group_by_clause  ::=  column_name ( ',' column_name )*
ordering_clause  ::=  column_name [ ASC | DESC ] ( ',' column_name [ ASC | DESC ] )*

Note

You can execute the SELECT statement to read one or more columns of one or more rows in a table. This statement returns a result set that contains the rows that match the query conditions specified in the request. Each row contains the values that match the specified query condition. You can also apply functions such as aggregate functions to the result set.
A SELECT statement contains at least one SELECT clause and the name of a table to which the SELECT clause applies. Take note that Lindorm Cassandra Query Language (CQL) does not support JOIN clauses or subqueries. Therefore, SELECT statements can apply only to a single table. In most cases, a SELECT statement contains a WHERE clause. You can also specify other clauses to sort or refine query results. If the ALLOW FILTERING clause is specified in SELECT statements, queries that contain filters are allowed.

Examples

CREATE TABLE persioninfo  (c1 int primary key, c2 text, c3 text);
SELECT c1, c2 FROM persioninfo WHERE c1 IN (199, 200, 207);
SELECT JSON c1, c2 FROM persioninfo WHERE c1 = 199;
SELECT c1 AS name, c2 AS address FROM persioninfo;


CREATE TABLE persioninfo  (c1 int, c2 text, c3 text, PRIMARY KEY(c1,c2));
SELECT *
FROM persioninfo
WHERE c1 = 12
  AND c2 > '2011-02-03'
  AND c2 <= '2012-01-01'

SELECT COUNT (*) AS user_count FROM persioninfo;

SELECT clauses in SELECT statements

A SELECT clause specifies the columns to query and return in the result set. The SELECT clause also converts the query results in the result set before the query results are returned. A SELECT clause can consist of comma-separated selectors. A SELECT clause can also contain only a wildcard character (*) to specify all the columns that are defined in a specified table. A SELECT statement can contain the following information:

Selectors
The following types of selectors are supported:
- column_name: specifies the name of a column in the specified table. The values of the specified column are queried.
- term: specifies a term. In most cases, a term is nested within other selectors, such as functions. If you specify only a term as a selector, the query returns a result set that contains rows whose column values match the term.
- CAST '(' selector AS cql_type ')': specifies a force conversion. This selector can be used to convert a nested selector into another selector that is supported by Lindorm CQL.
- function_name '(' [ selector ( ',' selector )* ] ')': specifies a function call in which parameters themselves are selectors.
- COUNT '(' '*' ')': specifies a special call to the COUNT function. The call is represented as COUNT(*). This call is used to calculate the number of non-empty results.
WRITETIME function
SELECT clauses support the WRITETIME function. This function can be used only in SELECT clauses. It accepts only one parameter that specifies a column name. This function can be used to query the metadata that is stored in each column. The metadata contains the timestamps for the values in the specified column.
where clause
The WHERE clause specifies the rows to query. A WHERE clause consists of relations on the primary key columns or columns that have secondary indexes defined. Not all relations can be specified in a query. For example, non-equity relations are not allowed on a partition key. The IN operator is considered as an equity relation. For a given partition key, the clustering columns are used to order rows. The relations that can be specified for clustering columns are restricted to the relations that can select a contiguous set of rows. For example, execute the following statement to create the posts table:
```
CREATE TABLE posts (
    userid text,
    blog_title text,
    posted_at timestamp,
    entry_title text,
    content text,
    category int,
    PRIMARY KEY (userid, blog_title, posted_at)
)    
```
You can execute the following statement to perform a query:
```
 SELECT entry_title, content FROM posts
 WHERE userid = 'john doe'
   AND blog_title='John''s Blog'
   AND posted_at >= '2012-01-01' AND posted_at < '2012-01-31'
```
However, you cannot execute the following statement to perform a query. This is because the statement cannot query a contiguous set of rows when no secondary indexes are specified.
```
SELECT entry_title, content FROM posts
 WHERE userid = 'john doe'
   AND posted_at >= '2012-01-01' AND posted_at < '2012-01-31'
```
The IN relation can be specified only on the last column of the partition key and the last column of the primary key. You can also use tuple notation to combine clustering columns. The following statement provides an example:
```
SELECT * FROM posts
 WHERE userid = 'john doe'
   AND (blog_title, posted_at) > ('John''s Blog', '2012-01-01')
```
The preceding statement is used to query the rows whose blog_tile is John's Blog and post_at is 2012-01-01 and sort the rows into clustering order. If a row whose post_at is less than or equal to 2012-01-01 and blog_title is greater than John's Blog, the row is returned. However, the following case is excluded:
```
SELECT * FROM posts
 WHERE userid = 'john doe'
   AND blog_title > 'John''s Blog'
   AND posted_at > '2012-01-01'
```
Tuple notation can also be used for IN clauses on clustering columns. The following statement provides an example:
```
SELECT * FROM posts
 WHERE userid = 'john doe'
   AND (blog_title, posted_at) IN (('John''s Blog', '2012-01-01'), ('Extreme Chess', '2014-06-01'))
```
Grouping results
The GROUP BY clause can be used to group all the selected rows that share the same values for a set of columns into a single row. After the GROUP BY clause is specified, you can group rows only at the partition key or clustering column level. Therefore, you can specify the names of the primary key columns as the parameters for the GROUP BY clause only in the order in which the columns are specified in the primary key. If a primary key column is limited due to the equality constraint, this column cannot be specified in the GROUP BY clause.

Note If the GROUP BY clause is specified, Lindorm CQL transfers data to a server node for processing. Therefore, the process may be time-consuming if a large amount of data is processed.
Ordering results
The ORDER BY clause allows you to specify the order in which the results are returned. You can specify column names and an order as the parameters for the ORDER BY clause. To specify the ascending order, you can use the ASC keyword. To specify the descending order, you can use the DESC keyword. The default order is ascending. The possible orders are limited by the CLUSTERING ORDER clause defined on the table.
- If no CLUSTERING ORDER clause is specified, the valid orders are the order that is defined by clustering columns and the reverse of that order.
- Otherwise, the valid orders are the order that is specified by the CLUSTERING ORDER clause and the reverse of that order.
Limiting results
The LIMIT clause in a SELECT statement is used to limit the number of rows that can be returned by a query.
Allowing filtering
By default, Lindorm CQL supports only SELECT queries for which no filtering clause is specified on the server side. These SELECT queries return result sets that contain all or partial real-time records. This is because the performance of these queries is predictable. The amount of time that is required to run these queries is proportional to the number of returned results. You can use the LIMIT clause to limit the number of returned results.

The ALLOW FILTERING clause can be explicitly used to run queries that contain filters. After you specify the ALLOW FILTERING clause in a query, the performance of the query may be unpredictable, even if the query selects only a small number of records from the total data stored in a cluster. The performance of the query is based on the total data volume.
For example, you can execute the following statements to create a table that records user profiles and create a secondary index on the birth year column. The user profiles contain the username, first name, last name, birth year, and residence country or region of each user.
```
CREATE TABLE users (
    username text PRIMARY KEY,
    firstname text,
    lastname text,
    birth_year int,
    country text
)

CREATE INDEX ON users(birth_year);
```
The following two queries are valid:
```
SELECT * FROM users;
SELECT * FROM users WHERE birth_year = 1981;
```
In the preceding two queries, Lindorm CQL ensures that the query performance is proportional to the number of returned results. In particular, if no users are born in 1981, the performance of the second query does not vary based on the number of user profiles that are stored in the database. No direct relationship exists between the query performance and the number of user profiles. Due to the implementation of the secondary index, the performance of the second query is still based on the number of nodes in the cluster. The volume of the data stored in the cluster is also an indirect factor on the performance of the second query. In practice, both queries may return large result sets. However, you can always use the LIMIT clause to limit the number of returned results. The following query is rejected:
```
SELECT * FROM users WHERE birth_year = 1981 AND country = 'FR';
```
This is because Lindorm CQL may still scan large amounts of data even if the query returns only a small number of results. In most cases, this query scans all indexed rows for users who were born in 1981 even if only a few users are from France. However, you can specify the ALLOW FILTERING clause to force the system to run the query. The following query is valid:
```
SELECT * FROM users WHERE birth_year = 1981 AND country = 'FR' ALLOW FILTERING;
```
HOTDATA
You can separate hot data and cold data when you create tables. If you want to query hot data, you can specify the HOTDATA keyword in a SELECT statement. The following statement provides an example:
```
SELECT HOTDATA * FROM persioninfo;
```
If you do not specify the HOTDATA keyword for a query, the query scans both the hot and cold data. Then, the query returns the combined results.
Operations on search indexes
Lindorm CQL supports fuzzy queries, multi-dimensional queries, and sorting for columns on which search indexes are created.
- Common fuzzy matches
  You can create search indexes on columns. When you execute SELECT statements, you can implement fuzzy matching on these columns. You can use a percentage sign (%) to specify an arbitrary number of characters to perform fuzzy matches. Common fuzzy matches include prefix matches, suffix matches, contains matches, and exact matches. The following statements provide examples:
```
SELECT * FROM persioninfo WHERE c2 LIKE 'v2';// The exact match. The value of the c2 column is v2. 
SELECT * FROM kss.tb WHERE c2 LIKE '%v2' // The prefix match. The value of the c2 column is v2 or an arbitrary length of string that is suffixed with v2. 
SELECT * FROM kss.tb WHERE c2 LIKE 'v2%' // The suffix match. The value of the c2 column is v2 or an arbitrary length of string that is prefixed with v2. 
SELECT * FROM kss.tb WHERE c2 LIKE '%v2%'// The contains match. The value of the c2 column is v2 or an arbitrary length of string that contains v2. 
```
- search_query keyword
  You can specify the search_query keyword in queries. Lindorm CQL allows you to include a statement that is similar to a Solr query statement into a SELECT statement. This feature helps meet various requirements for queries. The following statement provides an example:
```
SELECT selectors
  FROM table
  WHERE (indexed_column_expression | search_query = 'search_expression') 
  [ LIMIT n ]
  [ ORDER BY column_name ] ;
```
  In a SELECT statement, the search_expression parameter is specified after the search_query keyword. For more information about the limits on the characters that can be specified in the search_expression parameter, see Local parameters in queries. You can specify the following common options after the search_query keyword:
```
{
  "q": query_expression (string), 
  "fq": filter_query_expression(s) (string_or_array_of_strings, ...),  
  "facet": facet_query_expression (object) 
  "sort": sort_expression (string), 
  "start": start_index(number),
  timeAllowed: search_time_limit_ms,  
  "TZ": zoneID),   //  Any valid zone ID in java TimeZone class                            
  "paging": "driver" (string),
  "distrib.singlePass": true|false (boolean),
  "shards.failover": true|false (boolean),    // Default: true
  "shards.tolerant": true|false (boolean),    // Default: false
  "commit": true|false (boolean),
  "route.partition": partition_routing_expression (array_of_strings), 
  "route.range": range_routing_expression (array_of_strings), 
  "query.name": query_name (string),
}
```
  The following statements provide an example on how to use search_query:
```
CREATE TABLE persioninfo (name text PRIMARY KEY, city text,id int, friend nested) WITH extensions = {'CONSISTENCY_TYPE':'strong', 'MUTABILITY':'IMMUTABLE'};
CREATE SEARCH INDEX sidx on persioninf WITH COLUMNS (id,city);
INSERT INTO persioninfo (name,city, id, friend) VALUES ('MY','hangzhou', 18, '[{"lili":18}, {"gg":20}]');
SELECT name,id FROM persioninfo  WHERE search_query = '+city:hangzhou +age:[1 TO 18]';
```
  Note Lindorm CQL does not allow you to use search_query clauses in combination with other query conditions. This is because search_query clauses already have rich semantics. For example, the select * from table where search_query = 'c1:name' And column = 'a' statement is not supported.

UPDATE

Updates data.

Syntax

update_statement ::=  UPDATE table_name
                      [ USING update_parameter ( AND update_parameter )* ]
                      SET assignment ( ',' assignment )*
                      WHERE where_clause
update_parameter ::=  ( integer | bind_marker )
assignment       ::=  simple_selection '=' term
                     | column_name '=' column_name ( '+' | '-' ) term
                     | column_name '=' list_literal '+' column_name
simple_selection ::=  column_name
                     | column_name '[' term ']'
                     | column_name '.' `field_name
condition        ::=  simple_selection operator term

Note

You can execute the UPDATE statement to update data in one or more columns for specified rows in a table. You can use the WHERE clause to specify the rows to update. The WHERE clause must contain all the columns that are specified in the primary key of the table. You can then use the SET keyword to specify non-primary key columns.
By default, the system does not check whether a row exists when you execute an UPDATE statement. This makes Lindorm CQL distinct from SQL. If the row exists, the data in the row is updated. If the row does not exist, the system creates the row. After the UPDATE statement is executed, you cannot check whether the data is updated or the row is created. Lindorm CQL allows counter columns to coexist with non-counter columns. This makes Lindorm CQL distinct from Apache Cassandra CQL.
When you execute the UPDATE statement, all updates within the same partition key are applied atomically and in isolation.
In the assignment section, you can specify an increment or decrement counter. For example, you can specify the following expression: c = c + 3. The column name that follows the equal sign (=) must be the same as the column name that precedes the equal sign (=). Take note that you can perform increment or decrement operations only on counter columns. You can update counter columns only by performing increment or decrement operations.

Examples

CREATE TABLE product (productname text PRIMARY KEY, company text, subdepartment text)
UPDATE product 
   SET company   = 'aliyun cloud',
       subdepartment = 'database nosql'
 WHERE productname = 'lindorm';

DELETE

Deletes a row or part of a row.

Syntax

delete_statement ::=  DELETE [ simple_selection ( ',' simple_selection ) ]
                      FROM table_name
                      [ USING update_parameter ( AND update_parameter )* ]
                      WHERE where_clause

Note

You can execute the DELETE statement to delete rows or columns. If you specify column names after the DELETE keyword, the system deletes only the specified columns from the rows that are specified by the WHERE clause. Otherwise, all of the specified rows are deleted.
A WHERE clause specifies the rows to be deleted. You can also specify the IN operator in a DELETE statement to delete multiple rows. You cannot delete rows that fall within a specified range.
When you execute the DELETE statement, all delete operations within the same partition key are applied atomically and in isolation.

Examples

 INSERT INTO t1 (c1,c2,c3) VALUES ( '11', '22', '33');
 DELETE c2 FROM t1 WHERE c1 IN ( '11');