When you create full-text indexes in AnalyticDB for MySQL, you can use entities and stop words to make the segmentation results more suitable to your actual business scenarios. AnalyticDB for MySQL provides custom dictionaries to apply entities and stop words.
Prerequisites
To create or update a custom dictionary, you must have DDL and DML permissions on the custom dictionary.
To use a custom dictionary, you must have the SELECT permission on the custom dictionary.
Overview
A custom dictionary is a table. You can perform read and write operations on custom dictionaries in the same manner as you do on tables. Data that is written to custom dictionaries takes effect in real time.
Limits
You cannot perform DDL changes on custom dictionaries.
You cannot perform UPDATE or TRUNCATE operations on custom dictionaries.
Custom dictionaries must be used together with full-text indexes.
Before you delete a full-text index, you must delete the corresponding custom dictionary.
You can create only a single custom dictionary for a cluster. To create multiple custom dictionaries, contact technical support.
By default, a custom dictionary allows a maximum of 10,000 entries to be inserted. To change this upper limit, contact technical support.
Create a custom dictionary
Syntax
To create a custom dictionary in AnalyticDB for MySQL, use the following syntax:
CREATE TABLE [IF NOT EXISTS] table_name (
`value` VARCHAR NOT NULL COMMENT 'column_comment',
`type` VARCHAR NOT NULL [DEFAULT 'main|stop' COMMENT 'column_comment'],
PRIMARY KEY (`value`,`type`)
) COMMENT 'table_comment'
FULLTEXT_DICT = 'Y',
INDEX_ALL = 'Y|N'
Constraints
A custom dictionary can contain only two fields: value and type.
The value and type fields are of the VARCHAR type and cannot be empty.
The primary key of a dictionary must contain both the value and type fields.
Parameters
table_name: the name of the table. The name must be 1 to 127 characters in length and can contain letters, digits, and underscores (_). The name must start with a letter or an underscore (_).
value: the field that stores the content of the word. The field name is fixed to value.
type: the field that stores the type of the word. The field name is fixed to type. Valid values:
main (default): identifies the entities that have specific meanings in natural language text. Entities that are added to custom dictionaries are not segmented. Examples: names of places and organizations in the general field, and brands, products, and models in the e-commerce field.
stop: filters out unnecessary words. Example: the
from
words in SQL audit log scenarios.ImportantStop words are supported only for AnalyticDB for MySQL clusters of V3.1.4.24 or later.
To view and update the minor version of an AnalyticDB for MySQL cluster, log on to the AnalyticDB for MySQL console and go to the Configuration Information section of the Cluster Information page.
COMMENT: the comment of the column or table.
FULLTEXT_DICT: specifies whether the table is a dictionary table. Set the value to
Y
.INDEX_ALL: specifies whether to create indexes for all columns. Valid values:
Y
: creates indexes for all columns. If the table engine is XUANWU, the default value is Y. We recommend that you use the default value.N
: creates indexes only for the primary key. If the table engine is XUANWU_V2, the default value is N. You must explicitly set this parameter toY
. Otherwise, an error occurs.NoteFor AnalyticDB for MySQL clusters earlier than V3.2.2.0, the table engine is XUANWU.
For AnalyticDB for MySQL clusters of V3.2.2.0 or later, you can execute the
SHOW ADB_CONFIG KEY=RC_DDL_ENGINE_REWRITE_XUANWUV2;
statement to determine the engine type based on the value of theRC_DDL_ENGINE_REWRITE_XUANWUV2
parameter. If the value istrue
, the default table engine is XUANWU_V2. If the value isfalse
and you do not explicitly set the table engine to XUANWU_V2, the default table engine is XUANWU.
Example
Create a custom dictionary table named tbl_dict_name
.
CREATE TABLE tbl_dict_name (
`value` VARCHAR NOT NULL COMMENT 'value of an entity or a stop word',
`type` VARCHAR NOT NULL [DEFAULT 'main' COMMENT 'main indicates an entity, and stop indicates a stop word. Stop words are supported only as of V3.1.4.24.'],
PRIMARY KEY (`value`,`type`)
) COMMENT='your dictionary table'
FULLTEXT_DICT = 'Y',
INDEX_ALL = 'Y';
Update a custom dictionary
The INSERT or DELETE operation performed on a custom dictionary takes effect immediately. The newly written data of a table is segmented by using the latest words in the custom dictionary.
Examples
Insert a stop word named
and
.INSERT INTO `tbl_dict_name` (`value`, `type`) VALUES ('and', 'stop');
Remove a word.
DELETE FROM `tbl_dict_name` WHERE `value` = 'and' AND `type` = 'stop';
Use a custom dictionary
Syntax
To specify a custom dictionary when you create a full-text index in a table by executing the CREATE TABLE or ALTER TABLE statement, use the following syntax:
FULLTEXT INDEX idx_name(`column_name`) [ WITH ANALYZER analyzer_name ] [ WITH DICT tbl_dict_name];
Parameters
idx_name: the name of the full-text index.
column_name: the name of the column on which to create the full-text index.
WITH ANALYZER analyzer_name: the analyzer. AnalyticDB for MySQL supports a variety of analyzers. For more information, see Analyzers for full-text indexes.
WITH DICT tbl_dict_name: the custom dictionary.
Example
Create a full-text index named fidx_c
on a VARCHAR-type column named content
in the tbl_fulltext_demo
table. Use the built-in AliNLP analyzer and the predefined custom dictionary named tbl_ext_dict
.
ALTER TABLE `tbl_fulltext_demo`
ADD FULLTEXT INDEX fidx_c(`content`)
WITH ANALYZER alinlp
WITH DICT `tbl_ext_dict`;