Search for test questions

Last Updated: Sep 09, 2021

Characteristics of searching for test questions in online education scenarios

  1. A question library can contain a massive number of test questions, and the number keeps growing. This puts heavy pressure on databases.

  2. Most searches occur during peak hours, so the number of concurrent searches is large. Search results may then be returned with high latency, which degrades user experience.

  3. Different stages of learning are covered, so the number of user scenarios keeps growing.

  4. Subject disciplines are classified into many categories, and data keeps growing more complex. As a result, a search query may return test questions from the wrong discipline.

  5. Powerful algorithms are required to improve search accuracy.

  6. Multimodal search capabilities are required to support image and text search.

  7. Multilingual processing capabilities are required to handle search queries in multiple languages, such as English.

Best practices of OpenSearch in the education industry

Exclusive analyzer for queries for test questions

  1. Query processing flowchart

[Figure: query processing flowchart]
2. Understanding of query semantics
The analyzer is the most basic module that affects search quality. OpenSearch provides an exclusive analyzer for test question queries. You can also upload your own query terms to create a custom analyzer.
  • Examples

Query: What is the area of the following triangle in square centimetres?
Spelling correction: What is the area of the following triangle in square centimeters?
Discipline category prediction: Mathematics
Analysis: What is the area of the following triangle in square centimeters?
Term weight analysis: 4 1 1 7 1 1 1 7 1 7 7 1
Synonym rewriting: square centimeters -> (cm ^ 2)
Text vectorization: -0.100582,-0.0540699,-0.0417337,0.0602,...

3. Category prediction

What is category prediction?

A search query usually matches multiple commodities. Category prediction calculates the relevance between the search query and the category of each commodity. If the sort expression references this relevance, a commodity whose category is more relevant to the query receives a higher sort score and therefore ranks higher.
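The effect of category relevance on ranking can be illustrated with a minimal Python sketch. It is not the OpenSearch sort expression syntax. The `Doc` class, the `sort_score` function, the weight of 0.5, and all sample scores are hypothetical and chosen only to show how a category relevance term can change the order of results.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Doc:
    doc_id: str
    category: str
    text_score: float  # hypothetical text relevance between query and document


def sort_score(doc: Doc, category_relevance: Dict[str, float], weight: float = 0.5) -> float:
    """Add a weighted category relevance term to the text relevance score."""
    return doc.text_score + weight * category_relevance.get(doc.category, 0.0)


# Hypothetical predicted relevance between one query and each category.
predicted = {"Mathematics": 0.9, "Physics": 0.2}

docs = [
    Doc("q1", "Physics", 0.70),
    Doc("q2", "Mathematics", 0.65),
    Doc("q3", "History", 0.60),
]

# Documents in the category most relevant to the query move up.
ranked = sorted(docs, key=lambda d: sort_score(d, predicted), reverse=True)
ranked_ids = [d.doc_id for d in ranked]
print(ranked_ids)
```

In this sketch, `q2` overtakes `q1` even though its text score is lower, because its category is predicted to be far more relevant to the query.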

Application of category prediction in online education scenarios

  • Predict the discipline and question type to which a test question belongs based on the image information in the query and the result of optical character recognition (OCR).

  • Predict the types of the fields such as the question description and options.


4. Term weight analysis

Description: Term weight analysis evaluates the importance of each term in a search query and quantifies that importance as a weight. OpenSearch can exclude low-importance terms from document retrieval, which increases the number of documents that are retrieved. If a query contains low-importance terms and those terms take part in retrieval, only a small number of documents may be retrieved.

Purposes: Remove low-importance terms from a query, rewrite a query, and analyze text relevance.

1. Generate training data based on user behavior.

2. Train the term weight analysis model.

  • Model: a sequence labeling model.

  • Prediction labels: 7, 4, and 1. A higher score indicates a more important term, and the more important the terms used, the more accurate the retrieved results.

  • Examples:

Query: The factors of 35 are ( ) and the multiples of 24 within 100 are ( ).
Corresponding term weight scores: 1 7 1 4 1 1 1 1 1 7 1 4 1 1 1 1 1

In this question, "factors" and "multiples" have the highest weight score, 7, so OpenSearch preferentially uses them to retrieve documents. "35" and "24" have a weight score of 4. All other elements in the question have a weight score of 1, and OpenSearch does not use them to retrieve documents.
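The selection of retrieval terms in this example can be sketched in Python as follows. The tokenization (split on spaces and parentheses, final period dropped) is an assumption made so that the 17 tokens line up with the 17 weight scores. The function name `select_retrieval_terms` and the `min_weight` threshold are hypothetical. The sketch only illustrates the idea of dropping terms with a weight of 1 and is not the OpenSearch implementation.

```python
# Tokens of the sample query, aligned one-to-one with the weight scores above.
# The tokenization itself is an assumption for illustration.
query_tokens = ["The", "factors", "of", "35", "are", "(", ")", "and", "the",
                "multiples", "of", "24", "within", "100", "are", "(", ")"]
weights = [1, 7, 1, 4, 1, 1, 1, 1, 1, 7, 1, 4, 1, 1, 1, 1, 1]


def select_retrieval_terms(tokens, weights, min_weight=4):
    """Keep terms whose weight is at least min_weight, highest weight first."""
    if len(tokens) != len(weights):
        raise ValueError("tokens and weights must have the same length")
    kept = [(t, w) for t, w in zip(tokens, weights) if w >= min_weight]
    # sorted() is stable, so terms with equal weight keep their query order.
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


terms = select_retrieval_terms(query_tokens, weights)
print(terms)
```

Only "factors", "multiples", "35", and "24" remain, with the two terms scored 7 placed first.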

5. Query rewriting

To meet different business requirements, OpenSearch lets you combine multiple interventions, such as intervention dictionaries, spelling correction, synonyms, and term weight analysis.

  • Examples

(1) The OCR feature may recognize elements that are not part of the question, and these elements interfere with query analysis. In this case, you can use term weight analysis to make sure that such elements are labeled with a low weight. This improves both retrieval and sorting.

(2) You can create an intervention dictionary for synonym configuration to expand the retrieval scope. For example, if a query contains cubic meters, you can add tons as the synonym of cubic meters.
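The synonym example in (2) can be sketched as a simple query expansion in Python. The dictionary layout and the `expand_with_synonyms` function are hypothetical and do not reflect the format of an OpenSearch intervention dictionary. The sketch assumes plain substring matching, which a real analyzer would replace with term-level matching.

```python
def expand_with_synonyms(query, synonyms):
    """Return the original query plus one variant per matching synonym."""
    variants = [query]
    for phrase, alternatives in synonyms.items():
        if phrase in query:
            for alt in alternatives:
                variants.append(query.replace(phrase, alt))
    return variants


# Hypothetical intervention dictionary, using the example from the text.
intervention_dict = {"cubic meters": ["tons"]}

variants = expand_with_synonyms(
    "How many cubic meters of water are used?", intervention_dict
)
print(variants)
```

Retrieving with every variant, rather than only the original query, is what expands the retrieval scope.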

Custom sort

OpenSearch supports rough sort and fine sort. Rough sort is the process of selecting the top N high-quality documents from all documents that are retrieved. Then, the top N high-quality documents are scored and sorted in the fine sort process. This way, you can obtain the documents that best match your requirements. To implement a finer-grained sorting effect, you can write sort expressions and use them for applications to control the sorting of search results.
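The two-stage process can be sketched in Python as follows. The function `two_stage_sort`, the sample documents, and both scoring functions are hypothetical. Real rough and fine sort expressions are configured in OpenSearch and use its own expression syntax, which is not shown here.

```python
import heapq


def two_stage_sort(docs, rough_score, fine_score, top_n):
    """Select top_n docs with a cheap score, then order them with a costlier score."""
    candidates = heapq.nlargest(top_n, docs, key=rough_score)
    return sorted(candidates, key=fine_score, reverse=True)


# Hypothetical documents: (doc_id, matched query terms, semantic similarity).
docs = [
    ("a", 3, 0.40),
    ("b", 5, 0.55),
    ("c", 4, 0.90),
    ("d", 1, 0.95),
]

result = two_stage_sort(
    docs,
    rough_score=lambda d: d[1],  # cheap signal: number of matched terms
    fine_score=lambda d: d[2],   # stands in for a more expensive score
    top_n=3,
)
result_ids = [d[0] for d in result]
print(result_ids)
```

Document `d` has the highest fine score but never reaches the fine sort stage because the rough sort filters it out. This is the trade-off of the design: the expensive score is computed for only N documents, so the rough sort expression has to be good enough not to discard relevant results.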

Effect comparison

An online education platform that offers K12 education solutions has tens of millions of users. Its question libraries contain about 80 million test questions, and the number keeps growing. The libraries consist of two parts: the platform's own question libraries and third-party question libraries. Before adopting OpenSearch, the platform implemented its photo search feature by using OCR and its own Elasticsearch-based search service. However, the platform faced problems such as low search accuracy and high search latency.

After the platform adopted OpenSearch to implement its search feature:

  1. Search accuracy increased by 5 percentage points in absolute terms.

  2. Search latency dropped from a range of 100 ms to 300 ms to a stable 50 ms.

  3. Data can be synchronized from an offline application to OpenSearch at a throughput greater than 4,000 transactions per second (TPS).

  • Sample query: "Zhang Huiyan says that the style of Song poetry in the Song dynasty is probably similar to Yuefu."

Results that are retrieved before OpenSearch is used

Results that are retrieved after OpenSearch is used

top1

Zhang Hui is a solo singer of a song and dance troupe. Her wage is RMB 5,800 per month. In June 2006, Zhang Hui participated in three performances of the troupe in Shanghai and received a reward of RMB 3,800...

top2

Zhang Huiyan's love for music comes from...

top3

Among the following documents, which one is the document that is cited in an article published by Ms. Zhang Hui in music periodicals of China?

  • Sample query: "The following figure shows different flat patterns of a geometrical body from three different views. The geometrical body consists of some identical small cubes. ___ identical small cubes are required to build the geometrical body. From left to right, the flat patterns of the geometrical body are from the front view, left-side view, and overhead view."

[Images: top1 to top3 results that are retrieved before and after OpenSearch is used]