Overview
Word analysis is a fundamental component of a search engine, and its effectiveness directly impacts search results. Because business scenarios vary, different industries and customers have unique requirements. Custom, application-level word analysis is crucial for achieving the best search performance.
The tailored retrieval feature addresses this need. OpenSearch Industry Algorithm Edition provides a rich set of industry-specific analyzers. You can use these analyzers as a base to train your own custom analyzer with simple configuration. This process requires no additional data integration. During training, the tailored retrieval model automatically extracts and adapts to your existing data.
Charges for a tailored retrieval model are based on storage capacity, computing resources, and model training. For pricing details, see the Billing Overview.
Quick start
To create and use a tailored retrieval model, follow these three steps:
-
Create and train a model.
-
Create a custom analyzer.
-
Configure the custom analyzer.
Create and train a model
-
Navigate to Search Algorithm Center > Retrieval Configuration > Tailored Retrieval Models. Select the target exclusive application and click Create.
-
Enter a Model Name, select a Model Type, a Basic Analyzer, and the Training Fields. Check the desired Normalization settings, and then click Submit.
The available basic analyzers include: Chinese - General Analysis, Chinese - E-commerce Analysis, IT Content Analysis, Industry - General Gaming Analysis, Industry - Education Q&A Search, Industry - IT Content Analysis, and Industry - General E-commerce Analysis.
For Normalization, you can select one or more of the following options: Uppercase to Lowercase, Traditional to Simplified Chinese, and Full-width to Half-width Characters. These settings apply only to queries and do not affect the original field content.
-
The model name cannot be changed after creation.
-
Training Fields supports only the short_text and text data types.
-
After you create a model, its status defaults to Unavailable. On the Tailored Retrieval Models list page, find the new model and click Train Model in the Actions column.
-
Model training typically takes one to two business days to complete.
-
You can retrain a model. Each completed training adds a new, sequentially numbered version to the Training History section on the details page.
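The three Normalization options described above can be illustrated with a minimal Python sketch. This is not the service's implementation: the function names are hypothetical, and the Traditional-to-Simplified table is a four-character sample used only for illustration. The sketch normalizes the query text only, which matches the statement that these settings do not affect the original field content.

```python
# Illustrative sample only (assumption): a real converter needs a full
# Traditional-to-Simplified dictionary, not these four characters.
T2S_SAMPLE = {"電": "电", "腦": "脑", "遊": "游", "戲": "戏"}


def to_half_width(text: str) -> str:
    """Map full-width ASCII variants and the ideographic space to half-width."""
    out = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:                 # ideographic (full-width) space
            out.append(" ")
        elif 0xFF01 <= code <= 0xFF5E:     # full-width '!' through '~'
            out.append(chr(code - 0xFEE0))
        else:
            out.append(ch)
    return "".join(out)


def normalize_query(query: str, lowercase: bool = True,
                    simplify: bool = True, half_width: bool = True) -> str:
    """Apply the selected normalization options to a query string."""
    if half_width:
        query = to_half_width(query)
    if lowercase:
        query = query.lower()
    if simplify:
        query = "".join(T2S_SAMPLE.get(ch, ch) for ch in query)
    return query
```

Each keyword argument corresponds to one checkbox, so turning an option off leaves that aspect of the query unchanged.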
Create a custom analyzer
After the tailored retrieval model is successfully trained and its status becomes Available, you can create a custom analyzer.
-
Go to the Search Algorithm Center > Analyzer Management page. Select the Text Analyzer tab and click Create.
-
Enter a name, select Tailored Model Analyzer as the analyzer type, choose the corresponding HA3 engine instance and tailored retrieval model, and click Save.
-
After the custom analyzer is created, you can use it to test word analysis and access features such as Entry Management.
Configure the custom analyzer
After the custom analyzer is created, you can apply it to an index by performing an offline change.
-
Go to Instance Management > HA3 Engine. Find the target application, go to its details page, and click Modify Offline Application.
-
On the page for configuring the index schema, find the target index, replace its analyzer with the custom analyzer configured with the tailored retrieval model, and select the model version you want to apply.
In the Analysis Method column for the index, select Analyzer Model from the drop-down menu. In the sub-menu that appears, select the target model name and its version number.
-
Complete the offline change and wait for the index to rebuild.
The status area on the Offline Application tab shows Application Initializing, indicating that an index rebuild is in progress. Wait for this process to complete.
-
After the index rebuild is complete, you can test the results on the Search Test page.
On the Search Test page, select the HA3 Engine and your application. To verify the retrieval results of the custom analyzer, enter a query in the form index_name:'keyword' in the query box (for example, vulcan_analyzer_2:'2'), set the config clause to start:0,hit:10,format:fulljson, and click Search.
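If you want to reproduce the same test outside the console, the query and config values from the step above can be assembled into a single query string. The sketch below assumes the common OpenSearch convention in which clauses are written as name=value and joined with &&. The helper name build_search_query is hypothetical, and the sketch only builds the string; it does not send a request.

```python
def build_search_query(index_name: str, keyword: str,
                       start: int = 0, hit: int = 10,
                       fmt: str = "fulljson") -> str:
    """Build a query string from a query clause and a config clause.

    Assumption: clauses are joined with '&&'. The keyword is inserted
    as-is, so it must not contain a single quote.
    """
    if "'" in keyword:
        raise ValueError("keyword must not contain a single quote")
    query_clause = f"query={index_name}:'{keyword}'"
    config_clause = f"config=start:{start},hit:{hit},format:{fmt}"
    return "&&".join([query_clause, config_clause])


if __name__ == "__main__":
    # Values taken from the Search Test example above.
    print(build_search_query("vulcan_analyzer_2", "2"))
```

Here start is the offset of the first result, hit is the number of results to return, and format selects the response format.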
Details page
Tailored retrieval models list page
The Tailored Retrieval Models list includes the Model Name, Model Type, Model Status (Available or Unavailable), Start Time of the Last Training, Status of the Latest Version, and Actions (Details, Train Model, Delete).
-
You cannot delete a tailored retrieval model that is referenced by an index.
-
If the status of the latest version is Training, the Train Model button is disabled. You can click Train Model again when the latest version is in any other status.
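The two rules above amount to simple checks on a model's state. The following sketch restates them in Python as a reading aid. The TailoredModel class and its field names are hypothetical, and Training is the only status value taken from this page.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TailoredModel:
    """Hypothetical local representation of one row in the model list."""
    name: str
    latest_version_status: str
    referenced_indexes: List[str] = field(default_factory=list)


def can_train(model: TailoredModel) -> bool:
    """Train Model is disabled only while the latest version is training."""
    return model.latest_version_status != "Training"


def can_delete(model: TailoredModel) -> bool:
    """A model that is referenced by any index cannot be deleted."""
    return not model.referenced_indexes
```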
Tailored retrieval model details page
Basic Information (read-only): Includes the Creation Time, Model Status, Start Time of the Last Training, and Status of the Latest Version.
Configuration Information (read-only): Includes the Basic Analyzer, Training Fields, and normalization settings selected during model creation.
Training History: Includes the Model Version, Configuration Information, Version Status, Training Start and End Times, and the Referenced Index. You can also test the model's performance from this section.
On the test page, enter text in the Test Text box (for example, roasted lamb chops) and click the Test button. The Analysis Result area displays the word analysis results as tags (for example, the text is split into "roasted" and "lamb chops").
You can download a comparison report of typical use cases to evaluate performance.
Limitations
-
This feature is available only for applications in Industry Algorithm Edition - Dedicated Cluster instances.
-
You can create a maximum of five tailored retrieval models per instance.
-
A tailored retrieval model created for a specific application cannot be used by other applications.
-
Currently, you can create custom analyzers only for text analysis.
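Before you create a model, you can check two of the constraints on this page yourself: the limit of five models per instance, and the restriction of Training Fields to the short_text and text data types. The sketch below is a hypothetical client-side check, not part of any SDK. It assumes you already know how many models exist and the data type of each candidate field.

```python
from typing import Dict, List

MAX_MODELS_PER_INSTANCE = 5                   # limit stated on this page
ALLOWED_FIELD_TYPES = {"short_text", "text"}  # types accepted as Training Fields


def validate_new_model(existing_model_count: int,
                       training_fields: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the checks pass.

    training_fields maps a field name to its data type.
    """
    problems = []
    if existing_model_count >= MAX_MODELS_PER_INSTANCE:
        problems.append(
            f"instance already has {existing_model_count} models "
            f"(maximum is {MAX_MODELS_PER_INSTANCE})"
        )
    for name, field_type in training_fields.items():
        if field_type not in ALLOWED_FIELD_TYPES:
            problems.append(
                f"field '{name}' has type '{field_type}', "
                "which is not supported for training"
            )
    return problems
```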