When you create full-text indexes in AnalyticDB for MySQL, you can use entities and stop words to make the segmentation results more suitable to your actual business scenarios. AnalyticDB for MySQL provides custom dictionaries to apply entities and stop words.
- To create or update a custom dictionary, you must have DDL and DML permissions on the custom dictionary.
- To use a custom dictionary, you must have the SELECT permission on the custom dictionary.
A custom dictionary is a table. You can perform read and write operations on custom dictionaries in the same manner as you do on tables. Data that is written to custom dictionaries takes effect in real time.
- You cannot perform DDL changes on custom dictionaries.
- You cannot perform UPDATE or TRUNCATE operations on custom dictionaries.
- Custom dictionaries must be used together with full-text indexes.
- Before you delete a full-text index, you must delete the corresponding custom dictionary.
- You can create only a single custom dictionary for a cluster. To create multiple custom dictionaries, contact technical support.
- By default, a custom dictionary allows a maximum of 10,000 entries to be inserted. To change this upper limit, contact technical support.
Create a custom dictionary
To create a custom dictionary in AnalyticDB for MySQL, use the following syntax:
CREATE TABLE tbl_dict_name ( `value` varchar(255) NOT NULL COMMENT 'Value of an entity or stop word', `type` varchar(4) NOT NULL [DEFAULT 'main' COMMENT 'main indicates an entity, and stop indicates a stop word. Stop words are supported as of V126.96.36.199.'], PRIMARY KEY (`value`,`type`) ) COMMENT='Your dictionary table' FULLTEXT_DICT = 'Y';
- A custom dictionary can contain only two fields: value and type.
- The value and type fields are of the VARCHAR type and cannot be empty.
- The primary key of a dictionary must contain both the value and type fields.
- value: the value of the word.
- type: the type of the word, which can be main or stop. main indicates an entity, and stop indicates a stop word.
- main: identifies the entities that have specific meanings in natural language text. Entities that are added to custom dictionaries are not segmented. Examples: names of places and organizations in the general field, and brands, products, and models in the e-commerce field.
- stop: filters out unnecessary words. In some scenarios, specific words after segmentation must be filtered out. Example:
fromwords in SQL audit log scenarios.Important Stop words are supported only for minor engine version 188.8.131.52 or later. For more information about how to view the minor engine version of a cluster, see How can I view the version of an AnalyticDB for MySQL cluster?
- FULLTEXT_DICT: specifies whether the table is a dictionary table.
Update a custom dictionary
The INSERT or DELETE operation performed on a custom dictionary takes effect immediately. The newly written data of a table is segmented by using the latest words in the custom dictionary.
- Insert a stop word named
INSERT INTO `tbl_dict_name` (`value`, `type`) VALUES ('and', 'stop');
- Remove a word.
DELETE FROM `tbl_dict_name` WHERE `value` = 'and' AND `type` = 'stop';
Use a custom dictionary
To specify a custom dictionary when you create a full-text index in a table by executing the CREATE TABLE or ALTER TABLE statement, use the following syntax:
FULLTEXT INDEX idx_name(`column_name`) [ WITH ANALYZER analyzer_name ] [ WITH DICT tbl_dict_name];
- idx_name: the name of the full-text index.
- column_name: the name of the column on which to create the full-text index.
- WITH ANALYZER analyzer_name: specifies the analyzer. AnalyticDB for MySQL supports a variety of analyzers. For more information, see Analyzers for full-text indexes.
- WITH DICT tbl_dict_name: specifies the custom dictionary.
Create a full-text index named
fidx_con a VARCHAR-type column named
tbl_fulltext_demotable. Use the built-in AliNLP analyzer and the predefined custom dictionary named
ALTER TABLE `tbl_fulltext_demo` ADD FULLTEXT INDEX fidx_c(`content`) WITH ANALYZER alinlp WITH DICT `tbl_ext_dict`;