Indexes play a vital role in speeding up queries. The wide table engine provided by ApsaraDB for Lindorm (Lindorm) supports secondary indexes and full-text indexes. Secondary indexes can improve query performance. Full-text indexes that are backboned by the Search Index technology are suitable for complex multidimensional queries. You can use full-text indexes if you want to perform fuzzy searches, aggregate and analyze data, sort query results, and perform paged queries.

Overview

The wide table engine provided by Lindorm allows you to query tables based on primary key columns. This makes queries efficient. If you want to query tables based on columns other than primary key columns, you can create only secondary indexes. If you want to create a secondary index, predefine a sequence for the columns to be indexed. This indicates that you cannot query data in an efficient way if the data does not follow the predefined sequence. If the number of query models increases, the number of required secondary indexes grows. This affects write throughput and increases storage costs. Lindorm uses inverted indexes that are backboned by Search Index to resolve this issue. You can create full-text indexes in the following scenarios:

  • Perform multidimensional queries. Lindorm aggregates data across columns and returns the query result in an efficient way.
  • Collect and analyze data. Lindorm provides basic capabilities for collecting and analyzing data of specified columns.
  • Sort query data. Lindorm supports the ORDER BY keyword. In this case, the returned result can be sorted based on specified columns.

Scenarios

If you want to store large volumes of data and query data based on a combination of conditions, use full-text indexes that are backboned by Search Index. For example, you can create full-text indexes for order data in logistics.

Data stream overview

Full-text indexes are backboned by Search Index. Full-text indexes are created based on the deep integration of the wide table engine and search engine. After you write data to a wide table in Lindorm, Lindorm Tunnel Service (LTS) that is built in Lindorm synchronizes data to the search engine in real time. LTS is formerly known as Big Data Hub Services (BDS). In this architecture, Lindorm allows you to independently deploy and manage the wide table engine, LTS, and the search engine. For example, if the search engine or LTS cannot meet your business needs, you can separately upgrade each module. The wide table engine, LTS, and the search engine are independently hosted on different servers. You can select servers of different specifications for each module. This ensures the system stability.

Comparison between full-text indexes and secondary indexes

Lindorm also supports secondary indexes. This allows you to query data based on columns other than primary key columns at low costs. If you want to query data from specified columns, use secondary indexes. If you want to perform complex multidimensional queries, we recommend that you use full-text indexes that are backboned by Search Index.

Data latency

A latency exists when you synchronize data in a wide table to the search engine. The following formula is used to calculate the latency:

Total latency = Latency for data synchronization + Required time for committing index data
  • Latency for data synchronization: In most cases, the latency for data synchronization is within 200 ms. You can scale out LTS to reduce the latency.
  • Required time for committing index data: The index data can be searchable only after it is committed. Default value: 15s. Minimum value: 1s. A smaller value may result in a larger size of compacted index data. This undermines query performance. Therefore, we recommend that you set a proper value based on tests. For more information, see Index data visibility.

Enable full-text indexes

After you enable the wide table engine, you can enable full-text indexes. Then, the search engine and LTS are automatically enabled for your Lindorm cluster. LTS is formerly known as BDS.

Usage notes

Full-text indexes that are backboned by Search Index provide the following benefits:
  • Full-text indexes that you create on tables are transparent to client servers. The user experience is similar to that of secondary indexes.
  • If you want to query data from a primary table, the system automatically compiles and optimizes an indexed table based on query conditions. Then, the system aggregates the data and returns the query result to your client server.
  • Full-text indexes can be created based on thousands of columns.
  • If you write data to a primary table, the data is automatically synchronized to the indexed table. The latency is within 200 ms.

You can use Cassandra Query Language (CQL) to query data based on full-text indexes. You can execute the following sample statements:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PlainTextAuthProvider;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;

import java.io.IOException;

public class Test {

    public static void main(String[] argv) throws IOException, InterruptedException {
        String[] contactPoints = new String[]{
                "The endpoint of the Lindorm cluster that you want to connect. This parameter value cannot contain the port number."
        };
        Cluster cluster = null;
        try {
            Cluster.builder().addContactPoints(contactPoints).withAuthProvider(new PlainTextAuthProvider("Username", "Password")).build();
            cluster.init();
            Session session = cluster.connect();
            // Create a keyspace. We recommend that you execute the CREATE KEYSPACE statement in lindorm-cqlsh.
            session.execute("CREATE KEYSPACE kss WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};");
            // Create a table. We recommend that you execute the CREATE TABLE statement in lindorm-cqlsh. If the Lindorm cluster has nodes that are running in multiple zones, the CONSISTENCY_TYPE and MUTABILITY parameters are required. The configuration in the following statement applies to most scenarios. If you have questions, contact technical support. If the nodes in the Lindorm cluster are deployed in a single zone, you do not need to specify CONSISTENCY_TYPE or MUTABILITY.
            session.execute("CREATE TABLE kss.tb ( cn1 text PRIMARY KEY , cn2 text , cn3 text )  WITH extensions = {'CONSISTENCY_TYPE':'strong', 'MUTABILITY':'MUTABLE_LATEST'};");

            // Create a search index. We recommend that you execute the CREATE SEARCH INDEX statement in lindorm-cqlsh. You can execute the following statement to create an index named tbidx. WITH COLUMNS (cn2, cn3) indicates that the cn2 and cn3 columns are indexed. If you use WITH COLUMNS (*) or do not add WITH COLUMNS to the statement, all the columns are indexed.
            session.execute("CREATE SEARCH INDEX tbidx ON kss.tb WITH COLUMNS (cn2, cn3);");

            // Rebuild the index. After you create the index, you must rebuild the index. You can rebuild the index in a synchronous way or use the ASYNC keyword to rebuild the index in an asynchronous way. You can rebuild the index after the index is created or before you query the indexed data.
            session.execute("REBUILD SEARCH INDEX ON kss.tb;"); //rebuild search index [ASYNC] on tablename;

            // Write data.
            session.execute("INSERT INTO kss.tb (cn1, cn2, cn3) VALUES ( 'v11', 'v12', 'v13');");
            session.execute("INSERT INTO kss.tb (cn1, cn2, cn3) VALUES ( 'v21', 'v22', 'v23');");
            session.execute("INSERT INTO kss.tb (cn1, cn2, cn3) VALUES ( 'v31', 'v32', 'v33');");

            // Query data.
            ResultSet resultSet = session.execute("select * from kss.tb");
            resultSet.forEach(System.out::println);             

            // Query data from the primary table.
            resultSet = session.execute("select * from kss.tb WHERE cn1 = 'v11'");
            resultSet.forEach(System.out::println);

            // Perform fuzzy searches.
            resultSet = session.execute("select * from kss.tb where cn2 like 'v2' ");//match
            resultSet.forEach(System.out::println);
            resultSet = session.execute("select * from kss.tb where cn2 like '%v2");// Perform a prefix fuzzy search. You can search for strings that contain a specific prefix, including variable-length strings and infinite strings. In this example, you can search for a string that contains only v2 and a string that uses the string v2 as the prefix.
            resultSet.forEach(System.out::println);
            resultSet = session.execute("select * from kss.tb where cn2 like 'v2%'");// Perform a suffix fuzzy search. You can search for strings that contain a specific suffix, including variable-length strings and infinite strings. In this example, you can search for a string that contains only v2 and a string that uses the string v2 as the suffix.
            resultSet.forEach(System.out::println);
            resultSet = session.execute("select * from kss.tb where cn2 like '%v2%'");// Perform a fuzzy search. You can search for strings that contain a specific string, including variable-length strings and infinite strings. In this example, you can search for a string that contains only v2 and a string that contains the string v2.
            resultSet.forEach(System.out::println);

            // Common operations
            resultSet = session.execute("select * from kss.tb where cn2 > 'v20' order by cn3 ");//
            resultSet.forEach(System.out::println);

            // Modify the search index. You can add or delete a column.
            session.execute("ALTER SEARCH INDEX SCHEMA ON kss.tb ADD FIELD cn1;");// You can add multiple columns at a time. For example, you can use ADD FIELD cn1, cn2, cn3 in the statement.
            session.execute("ALTER SEARCH INDEX SCHEMA ON kss.tb DROP FIELD cn1;");// You can delete multiple columns at a time. For example, you can use DROP FIELD cn1, cn2, cn3 in the statement.
            // After you modify the search index, execute the REBUILD SEARCH INDEX statement.

            session.execute("DROP SEARCH INDEX ON kss.tb");// If you want to delete the search index, execute the DROP SEARCH INDEX statement on tablename. In the statement, tablename indicates the name of the table from which the search index is deleted.

        } finally {
            if (cluster != null)
                cluster.close();
        }

    }

}
            
Note If you want to obtain more information about how to use CQL to query data based on full-text indexes, join a DingTalk group. For more information, see Expert support.

Limits

If you use CQL to query data based on full-text indexes, take note of the following limits:
  • You cannot use GROUP BY in a CQL statement.
  • If you want to perform a fuzzy search, use the LIKE operator only in conjunction with the percent sign (%). You can search for a string that starts or ends with a specific string, or search for a string that contains a specific string. You can also perform an exact match query.