Description and examples of DML syntax in Lindorm - Lindorm

This topic describes the syntax of Data Manipulation Language (DML) statements and provides examples.

SELECT

Run a query.

Syntax

select_statement ::=  SELECT [ JSON | HOTDATA ] ( select_clause | '*' )
                      FROM table_name
                      [ WHERE where_clause ]
                      [ GROUP BY group_by_clause ]
                      [ ORDER BY ordering_clause ]
                      [ PER PARTITION LIMIT (integer | bind_marker) ]
                      [ LIMIT (integer | bind_marker) ]
                      [ ALLOW FILTERING ]
select_clause    ::=  selector [ AS identifier ] ( ',' selector [ AS identifier ] )
selector         ::=  column_name
                      | term
                      | CAST '(' selector AS cql_type ')'
                      | function_name '(' [ selector ( ',' selector )* ] ')'
                      | COUNT '(' '*' ')'
where_clause     ::=  relation ( AND relation )*
relation         ::=  column_name operator term
                      '(' column_name ( ',' column_name )* ')' operator tuple_literal
                      TOKEN '(' column_name ( ',' column_name )* ')' operator term
operator         ::=  '=' | '<' | '>' | '<=' | '>=' | '!=' | IN | CONTAINS | CONTAINS KEY
group_by_clause  ::=  column_name ( ',' column_name )*
ordering_clause  ::=  column_name [ ASC | DESC ] ( ',' column_name [ ASC | DESC ] )*

Note

A SELECT statement reads one or more columns from one or more rows in a table and returns a result set of rows that match the query. Each row contains the selected values. You can also apply functions, including aggregate functions, to the results.
A SELECT statement must include at least one select clause and the name of the table to query. Note that Lindorm CQL does not currently support joins or subqueries, so SELECT statements apply only to a single table. Most SELECT statements also include a WHERE clause, and you can optionally add other clauses to sort or limit the results. If you include the ALLOW FILTERING clause, queries that require filtering are permitted.

Examples

CREATE TABLE persioninfo  (c1 int primary key, c2 text, c3 text);
SELECT c1, c2 FROM persioninfo WHERE c1 IN (199, 200, 207);
SELECT JSON c1, c2 FROM persioninfo WHERE c1 = 199;
SELECT c1 AS name, c2 AS address FROM persioninfo;


CREATE TABLE persioninfo  (c1 int , c2 text, c3 text, PRIMARY KEY(c1,c2));
SELECT *
FROM persioninfo
WHERE c1 = 12
  AND c2 > '2011-02-03'
  AND c2 <= '2012-01-01';

SELECT COUNT (*) AS user_count FROM persioninfo;

Select clause

The select clause determines which columns to query and return in the result set. It also specifies any transformations to apply to the results. The clause consists of a comma-separated list of selectors or a wildcard character (*) to select all columns defined in the table. The select clause includes the following components:

Selectors
The following types are available:
- column_name: The name of a column in the queried table, used to retrieve the column’s value.
- term: A term, typically nested within other selectors such as functions. If you select a term directly, the corresponding column in the result set contains only that term’s value for every returned row.
- CAST '(' selector AS cql_type ')': A cast operation that converts a nested selector to a compatible type.
- function_name '(' [ selector ( ',' selector )* ] ')': A function invocation where the arguments are selectors.
- COUNT '(' '*' ')': A special invocation of the COUNT function—COUNT(*)—that counts all non-empty results.
WRITETIME function
The select clause supports only one special function—the WRITETIME function—which is not allowed anywhere else. This function takes exactly one argument, which must be a column name. It retrieves the timestamp stored as metadata for that column, indicating when the column’s value was written.
WHERE clause
The WHERE clause specifies the rows to query. It consists of relations on columns that are part of the primary key (partition key or clustering columns) or have a secondary index defined on them. Not all relational queries are allowed. For example, non-equality relations on a partition key are unsupported (the IN operator is treated as an equality relation). For a given partition key, rows are sorted by the clustering columns, and relations on these columns are restricted to those that select a contiguous set of rows based on that sort order. For example, create a table named posts:
```
CREATE TABLE posts (
    userid text,
    blog_title text,
    posted_at timestamp,
    entry_title text,
    content text,
    category int,
    PRIMARY KEY (userid, blog_title, posted_at)
);
```
You can run the following query:
```
 SELECT entry_title, content FROM posts
 WHERE userid = 'john doe'
   AND blog_title='John''s Blog'
   AND posted_at >= '2012-01-01' AND posted_at < '2012-01-31';
```
However, the following query is not allowed because it does not select a contiguous set of rows. This assumes that no secondary index is set on the table.
```
SELECT entry_title, content FROM posts
 WHERE userid = 'john doe'
   AND posted_at >= '2012-01-01' AND posted_at < '2012-01-31';
```
Additionally, the IN relation is allowed only on the last column of the partition key and the last column of the primary key. You can also use tuple notation to group clustering columns. For example:
```
SELECT * FROM posts
 WHERE userid = 'john doe'
   AND (blog_title, posted_at) > ('John''s Blog', '2012-01-01');
```
This query requests all rows that come after `blog_title` = 'John''s Blog' and `posted_at` = '2012-01-01' in the clustering order. Specifically, a row is returned if its `blog_title` is greater than 'John''s Blog', even if its `posted_at` is less than or equal to '2012-01-01'. This is different from the following query:
```
SELECT * FROM posts
 WHERE userid = 'john doe'
   AND blog_title > 'John''s Blog'
   AND posted_at > '2012-01-01';
```
Tuple notation can also be used in the IN clause for clustering columns:
```
SELECT * FROM posts
 WHERE userid = 'john doe'
   AND (blog_title, posted_at) IN (('John''s Blog', '2012-01-01'), ('Extreme Chess', '2014-06-01'));
```
Grouping results
The GROUP BY option condenses all selected rows that share identical values for a specified set of columns into a single row. With GROUP BY, you can group rows only at the partition key level or the clustering column level. Therefore, GROUP BY accepts only primary key column names as parameters, in primary key order. If a primary key column is constrained by an equality condition, it does not need to appear in the GROUP BY clause.
Note
Because the CQL GROUP BY clause pulls data to a single server node for processing, the operation can be time-consuming if the data volume is large.
Ordering results
The ORDER BY clause lets you choose the order of returned results. It takes a list of column names and a sort direction as parameters (ASC for ascending, DESC for descending; unspecified defaults to ASC). The allowed sort orders are constrained by the clustering order defined on the table:
- If the table was created without a specific CLUSTERING ORDER, the allowed sort orders are the natural order of the clustering columns or its reverse.
- Otherwise, the allowed sort orders are the order specified in the CLUSTERING ORDER option or its reverse.
Limiting results
The LIMIT option of a SELECT statement restricts the number of rows that the query returns.
Allowing filtering
By default, Lindorm CQL allows only select queries that do not involve server-side data filtering. This means that all records read are guaranteed to be returned in the result set, either fully or partially. These non-filtering queries have predictable performance because they can be completed in a time proportional to the volume of data returned by the query, which you can control with LIMIT.
The ALLOW FILTERING option lets you explicitly run certain queries that require filtering. Note that a query that uses ALLOW FILTERING may cause unpredictable server-side performance issues. Even a query that selects a small number of records can perform poorly, depending on the total amount of data stored in the cluster.
For example, create the following table, which contains user profiles with their birth year (with a secondary index) and country of residence:
```
CREATE TABLE users (
    username text PRIMARY KEY,
    firstname text,
    lastname text,
    birth_year int,
    country text
);

CREATE INDEX ON users(birth_year);
```
You can run the following valid queries:
```
SELECT * FROM users;
SELECT * FROM users WHERE birth_year = 1981;
```
In both cases, Lindorm CQL guarantees that the performance of these queries is proportional to the amount of data returned. Specifically, if no users were born in 1981, the performance of the second query does not directly depend on the number of user profiles stored in the database. However, due to how secondary indexes are implemented, the query might still depend on the number of nodes in the cluster and indirectly on the amount of data stored in the cluster. In practice, both queries can return very large result sets. However, you can always control the amount of data returned by adding a LIMIT. The following query is rejected:
```
SELECT * FROM users WHERE birth_year = 1981 AND country = 'FR';
```
This is because Lindorm CQL cannot guarantee that it will not scan a large amount of data, even if the query result is small. It typically scans the index entries for all users born in 1981, even if only a few are from France. However, you can use ALLOW FILTERING to force the execution of this query. Therefore, the following query is valid:
```
SELECT * FROM users WHERE birth_year = 1981 AND country = 'FR' ALLOW FILTERING;
```
HOTDATA
In scenarios where you specify hot and cold data separation when you create a table, you can specify HOTDATA in the SELECT statement to query only hot data:
```
SELECT HOTDATA * FROM persioninfo;
```
If you do not specify this option, the query searches both hot and cold data and then merges the results.

SEARCH INDEX SELECT operation

Lindorm supports fuzzy match queries, multi-dimensional queries, and sorting for columns that have a SEARCH INDEX.

Common fuzzy matches

When you perform a SELECT operation on columns with a SEARCH INDEX, you can use fuzzy matching. The wildcard character % represents zero or more characters. Common fuzzy match types include prefix match, suffix match, prefix and suffix match, and exact match. The following examples show how to use these match types:

SELECT * FROM persioninfo WHERE c2 LIKE 'v2';// Exact match. The value in column c2 is 'v2'.
SELECT * FROM kss.tb WHERE c2 LIKE '%v2'; // Suffix match. Matches values that end with 'v2'.
SELECT * FROM kss.tb WHERE c2 LIKE 'v2%' ;// Prefix match. Matches values that start with 'v2'.
SELECT * FROM kss.tb WHERE c2 LIKE '%v2%';// Contains match. Matches values that contain 'v2'.

Specify a search_query

You can use the search_query keyword to run any native Solr query statement within a SELECT statement. This supports more complex query and retrieval needs. The following example shows the syntax:

SELECT selectors
  FROM table
  WHERE (indexed_column_expression | search_query = 'search_expression') 
  [ LIMIT n ]
  [ ORDER BY column_name ] ;

In a SELECT statement, the search_query keyword is followed by a search expression. For more information about the character restrictions for the search expression, see Local Parameters in Queries. The following options are commonly used with search_query:

{
  "q": query_expression (string), 
  "fq": filter_query_expression(s) (string_or_array_of_strings, ...),  
  "facet": facet_query_expression (object) 
  "sort": sort_expression (string), 
  "start": start_index(number),
  timeAllowed: search_time_limit_ms,  
  "TZ": zoneID),   //  Any valid zone ID in java TimeZone class                            
  "paging": "driver" (string),
  "distrib.singlePass": true|false (boolean),
  "shards.failover": true|false (boolean),    // Default: true
  "shards.tolerant": true|false (boolean),    // Default: false
  "commit": true|false (boolean),
  "route.partition": partition_routing_expression (array_of_strings), 
  "route.range": range_routing_expression (array_of_strings), 
  "query.name": query_name (string),
}

An example follows:

CREATE TABLE persioninfo (name text PRIMARY KEY, city text,id int) WITH extensions = {'CONSISTENCY_TYPE':'strong', 'MUTABILITY':'IMMUTABLE'};
CREATE SEARCH INDEX sidx on persioninfo WITH COLUMNS (id,city);
INSERT INTO persioninfo (name,city, id) VALUES ('MY','hangzhou', 18);
SELECT name,id FROM persioninfo  WHERE search_query = 'city:hangzhou';

Note

Because search_query supports rich semantics, you cannot currently combine a search_query clause with other conditions. For example, select * from table where search_query = 'c1:name' AND column = 'a' is not supported.

UPDATE

Update the data.

Syntax

update_statement ::=  UPDATE table_name
                      [ USING update_parameter ( AND update_parameter )* ]
                      SET assignment ( ',' assignment )*
                      WHERE where_clause
update_parameter ::=  ( integer | bind_marker )
assignment       ::=  simple_selection '=' term
                     | column_name '=' column_name ( '+' | '-' ) term
                     | column_name '=' list_literal '+' column_name
simple_selection ::=  column_name
                     | column_name '[' term ']'
                     | column_name '.' `field_name
condition        ::=  simple_selection operator term

Note

An UPDATE statement updates one or more columns for specified rows in a table. The WHERE clause selects the rows to update and must include all columns that form the PRIMARY KEY. Use the SET keyword to assign values to non-primary-key columns.
Unlike in standard SQL, an UPDATE statement does not check whether a row exists before the operation. If the row does not exist, a new row is created. Otherwise, the existing row is updated. Therefore, you cannot determine whether a row was created or updated. In Lindorm CQL, counter columns can coexist with non-counter columns. This behavior differs from that of Apache Cassandra CQL.
In an UPDATE statement, all updates for the same partition key are atomic and isolated.
For an assignment, use expressions like c = c + 3 to increment or decrement a counter. The column name after the equals (=) sign must match the one before it. Note that increment and decrement operations are allowed only on counter columns and are the only permitted update operations for counters.

Example

CREATE TABLE product (productname text PRIMARY KEY, company text, subdepartment text);
UPDATE product 
   SET company   = 'aliyun cloud',
       subdepartment = 'database nosql'
 WHERE productname = 'lindorm';

DELETE

You can delete a line or part of a line.

Syntax

delete_statement ::=  DELETE [ simple_selection ( ',' simple_selection ) ]
                      FROM table_name
                      [ USING update_parameter ( AND update_parameter )* ]
                      WHERE where_clause

Note

The DELETE statement removes columns or entire rows. If you specify column names immediately after the DELETE keyword, only those columns are deleted from the rows indicated by the WHERE clause. Otherwise, entire rows are deleted.
The WHERE clause specifies the rows to delete. You can use the IN operator to delete multiple rows in a single statement. Deleting a range of rows is not currently supported.
In a DELETE statement, all delete operations affecting rows with the same partition key are applied atomically and in isolation.

Example

 INSERT INTO t1 (c1,c2,c3) VALUES ( '11', '22', '33');
 DELETE c2 FROM t1 WHERE c1 IN ( '11');