OpenSearch:Content community

Last Updated:Nov 15, 2024

Community content typically includes user-generated content (UGC) and professionally generated content (PGC). Because keywords and content are highly diverse and word usage is not standardized, a search engine must analyze the semantics of both queries and content to identify the real search intent and return the most comprehensive and relevant results. This topic describes how to use OpenSearch Industry-specific Enhanced Edition for Content Community to improve the search experience and increase the business conversion rate in community and forum scenarios. It also compares General-purpose Edition with Industry-specific Enhanced Edition for Content Community and describes vector-based retrieval, the click-through rate (CTR) prediction model, and the effect of custom search services.

Users are the core of a community, and they join communities to consume content. Content can be images and text, audio, or video. Solutions to problems are also a type of content that users obtain from communities. High-quality content increases user activity, brings traffic and longer visits to a platform, and helps the platform attract more users and improve user stickiness. This in turn brings more business opportunities and benefits to the platform.

Search is the most effective way to directly obtain content from communities. Each community is constantly solving the following problems that affect search experience:

  • How to accurately identify the intention of searches and return the most relevant results

  • How to develop differentiated and custom content distribution methods to improve search experience for users and enhance the sense of belonging and loyalty to the community

  • How to achieve interaction and connection between different fields, vertical categories, and channels in the community during content retrieval

  • How to achieve better integration and development of non-commercial content and commercial content

This topic describes the features and challenges of searches in the content industry. This topic also describes the solutions of OpenSearch Industry-specific Enhanced Edition for Content Community and provides best practices.

Search requirements of the content industry

  • More opportunities for exposure: low zero-result rate.

  • Higher search quality: high relevance and high-quality sort.

  • More business characteristics: custom search results based on business characteristics.

  • More comprehensive supporting features: supporting features such as intelligent error correction, top searches, hints, and drop-down suggestions.

  • Lower cost of ownership: lower development, resource, and O&M costs compared with self-managed search engines.

  • Easier development and usage: short release cycle and simplified search engine development and performance optimization.

  • Directional search intention of users: comprehensive search that aggregates content from multiple channels, and more relevant search results.

For example, a forum community provides services on multiple platforms, including web pages, applications, and mini programs, and classifies its content into multiple channels. As the business grows, the traffic of comprehensive search on the homepage keeps increasing, various paid services and traffic redirection services are integrated, and operating search traffic becomes more and more important. Comprehensive search must cover multiple channels and, at the same time, return more relevant results. In addition to text relevance, commercial factors must be considered. A mature search engine includes systems such as offline modules, online modules, query identification services, and algorithm platforms, which require extensive development and algorithm tuning in addition to continuous and complex O&M. With limited manpower, a self-managed search system can hardly meet these business requirements.

Common search scenarios

  • Search for content such as blog posts, Q&A, and experience sharing

  • Discover high-quality content and hot posts

  • Redirect traffic to paid services

  • Filter content based on tags and categories

(The preceding figure is a screenshot of the Alibaba Cloud Developer Community.)

  • Hot activities and topics

  • PGC and UGC

  • Search guides such as hot searches, hints, and drop-down suggestions

  • Personalized content and real-time events

(The preceding figure is a screenshot of the Alibaba Cloud Developer Community.)

OpenSearch Industry-specific Enhanced Edition for Content Community

Architecture

Features

To address the challenges and requirements of search scenarios in the content industry, OpenSearch provides Industry-specific Enhanced Edition for Content Community, which adopts the latest algorithms. This edition provides content-specific capabilities, such as intelligent semantic understanding, vector-based retrieval, and sort algorithms, to ensure high search performance and accuracy. It addresses issues such as long search latency, which is caused by a large amount of dictionary data, high resource consumption, and a high zero-result rate. To improve search accuracy for the content industry, OpenSearch provides a vector model to implement vector-based retrieval and multi-way search. OpenSearch also provides a multimodal search solution.

1. Feature comparison

| Feature | General-purpose Edition | Industry-specific Enhanced Edition for Content Community |
| --- | --- | --- |
| One-stop configuration | After an application is created, you must create and configure query analysis rules, sort policies, and drop-down suggestion models. | This edition provides the capabilities and features that are required for common search scenarios in the content industry. It also provides templates for you to configure application schemas and index schemas with a few clicks. This way, new users can use this service with ease. |
| Query analysis | This edition provides query analysis capabilities for general purpose. The capabilities include synonym configuration, stop word filtering, spelling correction, term weight analysis, and category prediction. | This edition provides enhanced analyzers and the enhanced query analysis feature for the content industry. It is developed to satisfy the requirements of content search scenarios, creates more precise indexes, and better identifies user needs. Therefore, this edition has better performance than General-purpose Edition. |
| Sort policy | After an application is created, you must configure sort policies and perform debugging based on business scenarios. | In addition to templates for application schemas and index schemas, this edition also provides common sort expressions. The templates and expressions can satisfy the sort requirements in most search scenarios of the content industry. |
| Feature iteration | This edition regularly updates built-in dictionaries, such as the dictionaries for the analyzer and query analysis features. | This edition is constantly updated in line with the changes in words and services in the content industry. It keeps optimizing the existing features of analysis and query analysis to adapt to industrial changes. |

2. Comparison of query analysis performance

Compared with General-purpose Edition, Industry-specific Enhanced Edition provides better query analysis performance. For example, it corrects invalid entries found in General-purpose Edition and enriches the existing dictionaries with the varied word usage of the content industry.

  • Space-based analysis

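| Search query | General-purpose Edition | Industry-specific Enhanced Edition for Content Community |
| --- | --- | --- |
| 为了解压缩 (in order to decompress) | 为 了解 压缩 | 为了 解 压缩 |
| 实参与形参 (actual parameters and formal parameters) | 实 参与 形参 | 实参 与 形参 |
| 结构体重载 (struct overloading) | 结构 体重 载 | 结构体重载 |
| googlechromeframe | googlechromeframe | google chrome frame |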
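The segmentation differences above largely depend on which terms the analyzer knows. The following minimal sketch uses forward maximum matching, a classic dictionary-based segmentation technique chosen here only for illustration. It is not the algorithm that OpenSearch uses, and the two dictionaries and the function name are hypothetical.

```python
# Illustrative only: forward maximum matching with two hypothetical dictionaries.
# GENERAL_TERMS stands in for a general-purpose dictionary, DOMAIN_TERMS for one
# enriched with IT vocabulary. Neither reflects the real OpenSearch dictionaries.
GENERAL_TERMS = {"参与", "形参", "结构", "体重"}
DOMAIN_TERMS = GENERAL_TERMS | {"实参", "结构体重载"}


def forward_max_match(text, dictionary, max_len=8):
    """Split text greedily, always taking the longest dictionary term available."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a term matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback token.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


if __name__ == "__main__":
    for query in ("实参与形参", "结构体重载"):
        print(query, forward_max_match(query, GENERAL_TERMS),
              forward_max_match(query, DOMAIN_TERMS))
```

With the general dictionary, the sketch splits 实参与形参 into 实 / 参与 / 形参. After the domain term 实参 is added, the split becomes 实参 / 与 / 形参, which mirrors the behavior in the comparison.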

  • Spelling correction

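| Search query | General-purpose Edition | Industry-specific Enhanced Edition for Content Community |
| --- | --- | --- |
| 淘宝只能视觉 | 淘宝只能视觉 | 淘宝智能视觉 (Taobao intelligent vision) |
| mybatics代码生成 | mybatics代码生成 | mybatis代码生成 (MyBatis code generation) |
| 计算机网路 | 计算机网路 | 计算机网络 (computer network) |
| 微行小程序 | 微型小程序 | 微信小程序 (WeChat mini program) |
| 深度学西 | 深度学西 | 深度学习 (deep learning) |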
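As a rough illustration of why a domain vocabulary helps spelling correction, the following sketch maps a misspelled term to the closest entry in a small vocabulary by using the `difflib` module of the Python standard library. The vocabulary, the cutoff value, and the function name are assumptions made for this example. The spelling correction in OpenSearch is a built-in feature, and this sketch does not describe how it works.

```python
import difflib

# Hypothetical domain vocabulary, not an OpenSearch dictionary.
DOMAIN_VOCABULARY = ["mybatis", "mysql", "matlab", "kubernetes"]


def correct_term(term, vocabulary=DOMAIN_VOCABULARY, cutoff=0.8):
    """Return the closest vocabulary term, or the input unchanged if none is close."""
    matches = difflib.get_close_matches(term.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else term


if __name__ == "__main__":
    print(correct_term("mybatics"))  # close to a vocabulary term
    print(correct_term("kafka"))     # nothing close, returned unchanged
```

A high cutoff keeps the sketch conservative: a term is rewritten only when a vocabulary entry is very similar, which reduces the risk of rewriting a query that was already correct.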

  • Vector-based retrieval

A high-quality vector-based retrieval model, trained on the data distribution of the vertical content industry, ensures retrieval quality for long-tail search queries, including misspelled queries and queries rewritten based on synonyms.

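| Search query | Vector-based retrieval top 1 | Vector-based retrieval top 2 | Vector-based retrieval top 3 |
| --- | --- | --- | --- |
| 美国gmted2010的shuju下载 (US gmted2010 data download, where "shuju" is pinyin for "data") | gmt43相关代码、资料下载地址 (download address for gmt43 code and materials) | gmt0054-2010.pdf | gmted2010美国download地址 (gmted2010 US download address) |
| 3D游戏画面处理 (3D game graphics processing) | 3d游戏动画处理基础 (basics of 3D game animation processing) | 3d游戏动画的基础 (basics of 3D game animation) | 动画游戏处理 (animation game processing) |
| 禁用n卡 (disable the N card) | 网卡的禁止和启动 (disabling and enabling a network adapter) | 禁用网卡 (disable a network adapter) | 禁用及启用网卡 (disable and enable a network adapter) |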
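Conceptually, vector-based retrieval ranks documents by how close their vectors are to the query vector, so a query can match a title even when the two share few exact terms. The following minimal sketch shows top-k retrieval by cosine similarity. The three-dimensional vectors and document IDs are made up for illustration. In practice the vectors come from a text vectorization model, and this brute-force scan is not how a production engine searches its vector indexes.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k=3):
    """Return the k (doc_id, similarity) pairs most similar to the query vector."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical embeddings; real ones have hundreds of dimensions.
    docs = {
        "doc_a": [1.0, 0.0, 0.0],
        "doc_b": [0.8, 0.6, 0.0],
        "doc_c": [0.0, 0.0, 1.0],
    }
    for doc_id, score in top_k([0.9, 0.1, 0.0], docs, k=2):
        print(doc_id, round(score, 3))
```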

Sequence behavior-based modeling for personalized search

For example, if you search for "面试 (Interview)" and then search for "Java", the returned results differ from those returned when you search only for "Java". Personalized retrieval of this kind meets the search requirements of different users and improves the search experience.
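The idea can be sketched as a re-ranking step that boosts results related to the recent queries of a user. The following example is a deliberately simple stand-in: it counts term overlap between result titles and the query history, whereas sequence behavior-based modeling relies on a trained model. The field names, the boost value, and the sample data are hypothetical.

```python
def personalize(results, history, boost=0.5):
    """Re-rank results, boosting titles that share terms with earlier queries."""
    history_terms = {term.lower() for query in history for term in query.split()}

    def score(item):
        title_terms = set(item["title"].lower().split())
        # Each title term that also appears in the history adds a fixed boost.
        return item["score"] + boost * len(title_terms & history_terms)

    return sorted(results, key=score, reverse=True)


if __name__ == "__main__":
    # Hypothetical base relevance scores for the query "Java".
    results = [
        {"title": "Java tutorial for beginners", "score": 1.0},
        {"title": "Java interview questions", "score": 0.8},
    ]
    print([r["title"] for r in personalize(results, history=[])])
    print([r["title"] for r in personalize(results, history=["interview"])])
```

Without history, the tutorial stays first. After an earlier "interview" query, the interview-related result moves to the top.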

DeepRanking model for deep sort

The model can have up to 100 billion parameters to ensure better search performance. The costs of model training and usage are lower than the manpower, machine, and R&D costs of a self-managed search engine.

Deep retrieval model that integrates the NLP capabilities of Alibaba DAMO Academy to improve search performance and reduce the zero-result rate

Highly tailored models can be provided based on the accumulated technologies of Alibaba Group for users and data of different characteristics.

Purchase an Industry-specific Enhanced Edition instance

You can purchase an OpenSearch Industry-specific Enhanced Edition instance and get started quickly. Industry-specific Enhanced Edition lets you apply industry-specific templates with a few clicks and select features based on your business requirements. It also allows non-technical users to perform business intervention, optimization, and digital operations.

Design a schema

For more information, see JOIN operations on multiple tables.

Import data

OpenSearch allows you to import data from data sources. You can also import data to OpenSearch by calling API operations, using SDKs, or using the console. For more information, see the following topics:

  1. Configure an ApsaraDB RDS for MySQL data source

  2. Configure a MaxCompute data source

  3. Configure a PolarDB for MySQL data source

  4. Import data by using the API or SDK
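When you import data by using the API or SDK, documents are typically sent in batches. The following sketch only builds a JSON batch body and does not call any OpenSearch API or SDK. The payload shape (a list of objects with `cmd` and `fields` keys) and the field names are assumptions made for illustration, so check them against the API reference and your application schema before use.

```python
import json


def build_push_batch(documents, command="ADD"):
    """Serialize documents into a JSON batch body (assumed shape, see lead-in)."""
    # Assumption: each entry carries a command and the document fields.
    batch = [{"cmd": command, "fields": doc} for doc in documents]
    # ensure_ascii=False keeps Chinese text readable in the payload.
    return json.dumps(batch, ensure_ascii=False)


if __name__ == "__main__":
    # Hypothetical documents; field names must match your own schema.
    posts = [
        {"id": "1001", "title": "mybatis代码生成", "channel": "blog"},
        {"id": "1002", "title": "计算机网络", "channel": "qa"},
    ]
    print(build_push_batch(posts))
```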

Configure an application of Industry-specific Enhanced Edition for Content Community

To configure an application of the IT industry, specify IT as the vertical industry in the Feature Selection step. You can specify template features based on your business requirements. By default, all features are selected.

Template features are classified into Query Analysis, Sort Policy, and Other Services. Query Analysis features include features such as Term Weight Analysis Dictionary for IT Content Industry, Synonym Dictionary for IT Content Industry, and Text Vectorization. Sort Policy features include Multimodal Search, Text Relevance, and Vector Relevance. The feature in Other Services is Drop-down Suggestions.
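To show how text relevance and vector relevance might be combined in a sort policy, the following sketch blends the two scores with a weight. This is a generic weighted sum written in Python, not the OpenSearch sort expression syntax. The weight, the score ranges, and the sample candidates are assumptions.

```python
def blended_score(text_relevance, vector_relevance, text_weight=0.7):
    """Weighted sum of two relevance scores, both assumed to lie in [0, 1]."""
    return text_weight * text_relevance + (1 - text_weight) * vector_relevance


def rank(candidates, text_weight=0.7):
    """Sort (doc_id, text_relevance, vector_relevance) tuples by blended score."""
    return sorted(
        candidates,
        key=lambda c: blended_score(c[1], c[2], text_weight),
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical candidates: doc_a matches the keywords, doc_b matches the meaning.
    candidates = [("doc_a", 0.9, 0.2), ("doc_b", 0.6, 0.95)]
    print([c[0] for c in rank(candidates, text_weight=0.7)])
    print([c[0] for c in rank(candidates, text_weight=0.9)])
```

Raising the text weight favors exact keyword matches, whereas lowering it favors semantically similar content. The right balance depends on the business scenario and is usually found by testing.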

Search test

  1. To use IT vector indexes to search for results, you must select Text Vectorization and add the corresponding IT vector indexes.

  2. Perform a search test.

Custom services

If you have requirements for deep retrieval, sort optimization, or personalized search, the experts of the OpenSearch team can provide you with custom services.

Practice

The customer in this practice is a Chinese IT content community that is committed to providing full-lifecycle services, such as knowledge dissemination, online learning, and career development, for Chinese software developers. The community provides a variety of products.

Within one year of adopting OpenSearch, the community connected its services on multiple PC and mobile platforms, covering comprehensive search on the homepage and searches on channels such as blog, download, and Q&A. OpenSearch helps the community provide high-quality search services for users of its products and increase the business conversion rate by optimizing search capabilities. This increases the overall business revenue of the community.

  • Compared with the self-managed open source search service, the CTR increased by more than 80%.

  • Algorithm experts continue to help the community tune search services based on highly tailored models. Visits to exposed products increased by 16.7%, and the item CTR increased by 11.8%. The performance is still being improved.
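The improvements above are stated as relative changes. As a reminder of how such figures are usually computed, the following sketch calculates CTR and relative lift. The counts in the example are hypothetical and are not data from the community.

```python
def ctr(clicks, impressions):
    """Click-through rate: clicks divided by impressions (0.0 if no impressions)."""
    return clicks / impressions if impressions else 0.0


def relative_lift(before, after):
    """Relative change from before to after, for example 0.8 for an 80% increase."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return (after - before) / before


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    baseline = ctr(clicks=100, impressions=1000)
    improved = ctr(clicks=180, impressions=1000)
    print(f"lift: {relative_lift(baseline, improved):.1%}")
```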