Enterprise Search: A Core Application Scenario of the Elastic Stack - Alibaba Cloud Developer Community

This article is included in the Elastic Stack practical manual. You are welcome to read this book, written jointly by developers, and learn the Elastic Stack systematically: https://developer.aliyun.com/topic/elasticstack/playbook

What is enterprise search?

Enterprise search, as the name implies, is a search service that an enterprise uses or provides. An enterprise's customers can use the search service it provides to find its products and services. For example, an e-commerce company provides search so customers can find product information, and an app store provides search so users can find apps. Employees in different departments can also use the enterprise's internal search services to find internal information such as project information, code, and documents.

Characteristics of enterprise search

Enterprise search has its own characteristics because of its application scenarios. Compared with familiar Internet search engines such as Baidu and Google, enterprise search differs in the following ways:

Different data sources

Internet search engines such as Baidu and Google mainly collect data from the Internet through web crawlers. The data that enterprise search covers, by contrast, comes mainly from the enterprise itself and is supplied by its own data sources.

Different data content

The data captured by Internet search engines consists mainly of web pages, images, audio, video, documents, and other content published by websites. The data processed by enterprise search is mainly private information from inside the enterprise, such as product information, project information, internal documents, office software content, emails, and databases. Enterprise search can also include various kinds of public data.

Data update frequency varies

Web crawlers collect data passively, so new data takes time to be captured. The update frequency is uncertain because of various factors, and updates may lag. The data sources of enterprise search, by contrast, are under the enterprise's own control. Much of the data is generated automatically by the enterprise, and updates are essentially real time.

Different data integrity

Internet search cannot capture and display all data. Website lists are incomplete, some websites prohibit crawling in their robots.txt files, and laws and policies impose limits. As a result, it is normal for users to be unable to find the data they need. In enterprise search, however, the searchable data is defined in advance by the enterprise, and results should be displayed as designed. It is unacceptable for data that should be displayed to be missing from the results.

Different users and requirements

Internet search is aimed at the general public, and its search methods and results generally do not change to suit individual users or groups of users. Enterprise search is aimed at the enterprise's internal users or at the customers of a particular business. Its search methods should match users' habits as closely as possible. Its results should be complete and accurate enough to express business requirements precisely.

The controllability of search results is different

In Internet search, results do not vary by user. They are ranked and displayed by algorithms such as PageRank, so all users see essentially the same results. Enterprise search results, however, must be controlled according to user permissions. Users with different permissions see different results, and results a user is not authorized to see must never be displayed. Enterprise search results also need explicit control, for example through sorting and weighting policies, and sometimes the results themselves must be processed directly.

Elastic Enterprise Search capabilities

Elasticsearch is a distributed search engine based on Apache Lucene. It provides full-text search, multitenancy, and near-real-time capabilities, and it can be used to search many kinds of documents. Elastic aims to make its products easier to use. In line with that goal, it released Elastic App Search with Elastic Stack 7.2 and Elastic Workspace Search with Elastic Stack 7.7. It then packaged Elastic App Search, Elastic Workspace Search, and Elastic Site Search into a separate solution called Elastic Enterprise Search. Together, App Search, Workspace Search, and Site Search cover essentially all enterprise search scenarios.

App Search builds on the powerful storage and analysis capabilities of Elasticsearch. On top of them, it provides optimized APIs, intuitive dashboards, easy-to-use tunable controls, and clients that integrate quickly.

Figure: App Search system architecture

For internal office search scenarios, Workspace Search provides connection wizards and APIs for office collaboration tools, and it uses Elasticsearch to build a centralized information source. Information and documents scattered across office applications are automatically synchronized, reorganized, and customized, which removes information silos in team collaboration. Common office applications such as Salesforce, Dropbox, Google Docs, SharePoint, Jira, and Confluence come with friendly connection guides, and custom sources can connect other systems. Workspace Search lets you control permissions, configure relevance, and personalize results for each team member. This helps teams find information faster, more completely, and more effectively within a secure and controllable scope.

Figure: The built-in Workspace Search search interface

The core of Site Search is a web crawler, a set of tools that helps enterprises quickly build search for their websites. Once you enter a URL, the crawler automatically collects the content and updates it regularly. You can also manually re-index specific pages or the entire website. Site Search handles complex queries through typo tolerance, bigram matching, stemming, and synonyms. An intuitive interface lets you adjust page rankings, add or remove weights, and manage synonyms.

App Search, Workspace Search, and Site Search target different application scenarios, but all of them are enterprise search scenarios, and their supporting capabilities are shared or similar. The following sections walk through the enterprise search capabilities of Elastic Enterprise Search.

Quick Deployment

Elastic Enterprise Search supports four deployment methods:
- an Elastic Cloud instance;
- deployment on Kubernetes with Elastic Cloud on Kubernetes;
- Linux/macOS package deployment;
- Docker container image deployment.

All four are simple and fast. An Elastic Cloud instance has the lowest barrier to entry and the most features, and it offers a 14-day free trial, so it suits quickly learning the product. Linux/macOS package deployment is relatively complex and suits users who know the operating system well and want to understand installation and deployment details. If you want neither a cloud service nor a step-by-step package installation, Docker deployment is a good choice.

Unified authentication capability

Elastic App Search and Elastic Workspace Search support three modes for login authentication and role authorization:
- Standard username and password mode: an administrator manages users in the Elastic App Search or Elastic Workspace Search dashboard.
- Elasticsearch native realm mode: Elasticsearch manages and stores user information directly through its native realm.
- Elasticsearch SAML mode: Elasticsearch uses a third-party unified authentication provider to authenticate logins, and Elastic App Search and Elastic Workspace Search directly inherit the SAML configuration from Elasticsearch.

Role authorization

Whichever login authentication mode is used, Elastic Enterprise Search supports authorization by role, although the method differs slightly between modes.

In standard username and password mode, Elastic App Search authorizes users with role-based access control (RBAC). Its roles include Owner, Admin, Dev, Editor, and Analyst. Elastic Workspace Search instead groups users by data content permissions, department, and other factors, and then authorizes each group.

In Elasticsearch native realm and Elasticsearch SAML modes, both Elastic App Search and Elastic Workspace Search authorize users through role mapping. First create roles in Elasticsearch, then map them to roles in Elastic App Search or Elastic Workspace Search. Elasticsearch roles can be mapped to the Elastic App Search roles Owner, Admin, Dev, Editor, and Analyst, and to the Elastic Workspace Search roles Admin and User.

Supports different levels of content sources

Workspace Search lets you collect data from various sources and connect data through custom APIs. It also provides connection wizards for more than a dozen common office applications, such as GitHub, Jira, Confluence, Google Drive, OneDrive, SharePoint Online, Gmail, and Slack. Content sources fall into two pairs of types:
- Organization content sources are generally configured by administrators for the entire organization, while private content sources can be configured by individual users for their own use.
- A standard content source collects and stores all of the source data. A remote content source collects only part of the information and relies on the data source's own search endpoint for retrieval.

Because standard content sources collect full data, multiple users connecting to the same content source cause the data to be collected and stored several times over. This can greatly increase Elasticsearch storage usage. Remote content sources collect very little data, so the same situation has very little impact on Elasticsearch. A remote content source is searchable only if the source itself exposes a search endpoint.

The Site Search web crawler automatically collects content and updates it regularly once you enter a web address. Users can also manually re-index specific pages or the entire website.

Supports document-level permissions

Workspace Search can synchronize permissions from source documents for applications including Jira Cloud, Confluence Cloud, Google Drive, OneDrive, and SharePoint Online. For other custom content sources, you can configure document-level permissions with the _allow_permissions and _deny_permissions fields. The following document configures its permissions this way:

{
  "_allow_permissions": ["permission1"],
  "_deny_permissions": [],
  "id": 1235,
  "title": "The Meaning of Sleep",
  "body": "Rest, recharge, and connect to the Ether.",
  "url": "https://example.com",
  "created_at": "2019-06-01T12:00:00+00:00",
  "type": "list"
}
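The following minimal Python sketch shows how a search layer could use these two fields to decide whether a user may see a document. It assumes, rather than restates as documented Workspace Search behavior, that a user needs at least one permission listed in _allow_permissions and none listed in _deny_permissions. The function name can_view is hypothetical.

```python
from __future__ import annotations


def can_view(doc: dict, user_permissions: set[str]) -> bool:
    """Hypothetical check: may a user holding `user_permissions` see `doc`?

    Assumption: deny wins over allow, and at least one allow match is required.
    """
    allow = set(doc.get("_allow_permissions", []))
    deny = set(doc.get("_deny_permissions", []))
    if user_permissions & deny:  # any denied permission hides the document
        return False
    return bool(user_permissions & allow)  # otherwise require an allowed permission


doc = {
    "_allow_permissions": ["permission1"],
    "_deny_permissions": [],
    "id": 1235,
    "title": "The Meaning of Sleep",
}
print(can_view(doc, {"permission1"}))  # True
print(can_view(doc, {"permission2"}))  # False
```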

The following code shows how to assign permissions to a user:

curl -X POST \
  http://localhost:3002/api/ws/v1/sources/[CONTENT_SOURCE_ID]/permissions/[USER_NAME] \
  -H "Authorization: Bearer [ACCESS_TOKEN]" \
  -H 'Content-Type: application/json' \
  -d '{ "permissions": ["permission1"] }'
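The same call can be assembled in Python with only the standard library. This sketch builds the request from the curl example but does not send it. The base URL, source ID, user name, and token are placeholders, and build_permissions_request is a hypothetical helper name.

```python
import json
import urllib.request


def build_permissions_request(base_url, source_id, user_name, token, permissions):
    """Build (but do not send) the POST request from the curl example."""
    url = f"{base_url}/api/ws/v1/sources/{source_id}/permissions/{user_name}"
    body = json.dumps({"permissions": permissions}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values; replace them with a real source ID, user, and token.
req = build_permissions_request(
    "http://localhost:3002", "CONTENT_SOURCE_ID", "USER_NAME", "ACCESS_TOKEN", ["permission1"]
)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it to a running Workspace Search instance.
```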

Supports meta engines

App Search supports meta engines. A meta engine does not store documents itself. Instead, it combines multiple source engines so that one search covers the content of all of them.
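The following simplified Python model illustrates the idea. Each source engine returns its own scored hits, and the meta engine merges them into one ranked list while recording which engine each hit came from. This is an illustration only. It assumes scores from different engines are directly comparable, which real relevance scores need not be. The Hit and meta_search names, and the engine names, are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Hit:
    engine: str  # name of the source engine that returned the document
    doc_id: str
    score: float


def meta_search(source_results: dict[str, list[tuple[str, float]]], size: int = 10) -> list[Hit]:
    """Merge per-engine (doc_id, score) lists into one list sorted by score."""
    hits = [
        Hit(engine, doc_id, score)
        for engine, results in source_results.items()
        for doc_id, score in results
    ]
    hits.sort(key=lambda hit: hit.score, reverse=True)
    return hits[:size]


merged = meta_search({
    "parks": [("p1", 2.0), ("p2", 0.5)],
    "trails": [("t1", 1.0)],
})
print([(hit.engine, hit.doc_id) for hit in merged])
```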

Supports custom search experience

You can enter keywords in the Workspace Search search bar. Alternatively, add Workspace Search to your browser as a search engine and type keywords in the address bar, which feels like searching with Google or Baidu.

Users can easily view searchable content sources and the latest content, and can also search for content by date.

You can configure search fields, result fields, field weights, value boosts, filters, sorting, pagination, result structure, highlighting, and more.

Run the following code to return the first page of results, with one result per page:

curl -X POST http://localhost:3002/api/ws/v1/search \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "query": "denali", "page": { "size": 1, "current": 1 } }'
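The page object is 1-based: current selects the page and size sets how many results each page holds. The following small Python sketch shows which results land on a given page. It assumes a plain list of already-ranked results, and page_slice is a hypothetical helper.

```python
def page_slice(results, size, current):
    """Return the items on page `current` (1-based) when each page holds `size` items."""
    if size < 1 or current < 1:
        raise ValueError("size and current must both be >= 1")
    start = (current - 1) * size
    return results[start:start + size]


ranked = ["doc-a", "doc-b", "doc-c", "doc-d", "doc-e"]
print(page_slice(ranked, size=1, current=1))  # ['doc-a'], matching size 1, current 1
print(page_slice(ranked, size=2, current=3))  # ['doc-e']
```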

Run the following code to sort results in descending order of square_km and then in ascending order of date_established:

curl -X POST http://localhost:3002/api/ws/v1/search \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "query": "denali", "sort": [ { "square_km": "desc" }, { "date_established": "asc" } ] }'
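Sort criteria apply in order: results are ordered by square_km first, and date_established only breaks ties. The following Python sketch reproduces that ordering locally on hypothetical records. sort_parks is an illustrative name, not part of any API.

```python
def sort_parks(parks):
    """Order by square_km descending, then date_established ascending."""
    # Negating square_km gives descending order; ISO-8601 date strings
    # compare correctly as plain strings, giving ascending order.
    return sorted(parks, key=lambda p: (-p["square_km"], p["date_established"]))


parks = [  # hypothetical records, not real park data
    {"id": "a", "square_km": 10.0, "date_established": "1950-01-01"},
    {"id": "b", "square_km": 20.0, "date_established": "1990-01-01"},
    {"id": "c", "square_km": 10.0, "date_established": "1920-01-01"},
]
print([p["id"] for p in sort_parks(parks)])  # ['b', 'c', 'a']
```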

Run the following code to search the title and description fields, giving title a weight of 10:

curl -X POST http://localhost:3002/api/ws/v1/search \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "query": "denali", "search_fields": { "title": { "weight": 10 }, "description": {} } }'
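Field weights scale how much a match in each field contributes to relevance. App Search's real scoring is more sophisticated than this. The following Python sketch uses a deliberately simplified model: the weight times the number of matching terms, with an assumed default weight of 1 for fields given as {}. It only shows why a title match now outranks a description match. weighted_score is hypothetical.

```python
def weighted_score(doc, query, search_fields):
    """Simplified relevance: sum over fields of weight * count of query-term matches."""
    terms = query.lower().split()
    score = 0.0
    for field, options in search_fields.items():
        weight = options.get("weight", 1)  # assumed default weight for {}
        words = str(doc.get(field, "")).lower().split()
        score += weight * sum(words.count(term) for term in terms)
    return score


search_fields = {"title": {"weight": 10}, "description": {}}
doc1 = {"title": "Denali", "description": "A park in Alaska"}
doc2 = {"title": "Alaska parks", "description": "denali views and denali trails"}
print(weighted_score(doc1, "denali", search_fields))  # 10.0
print(weighted_score(doc2, "denali", search_fields))  # 2.0
```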

The following code boosts results based on the value of the world_heritage_site field. If the value is true, the score is multiplied by 10:

curl -X GET 'https://[instance id].ent-search.[region].[provider].cloud.es.io/api/as/v1/engines/national-parks-demo/search' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer search-soaewu2ye6uc45dr8mcd54v8' \
  -d '{ "query": "old growth", "boosts": { "world_heritage_site": [
        { "type": "value", "value": "true", "operation": "multiply", "factor": 10 } ] } }'
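A value boost with operation multiply scales a document's score by factor when the field holds the given value. The following Python sketch models just that rule. It is a simplified illustration, not App Search's scoring code, and apply_value_boost is a hypothetical name. Values are compared as lowercase strings because the example boosts on the string "true".

```python
def apply_value_boost(score, doc, field, value, factor):
    """Multiply `score` by `factor` if doc[field] equals `value` (compared as lowercase strings)."""
    if str(doc.get(field)).lower() == str(value).lower():
        return score * factor
    return score


boost = {"value": "true", "factor": 10}
heritage = {"title": "Park A", "world_heritage_site": "true"}
other = {"title": "Park B", "world_heritage_site": "false"}
print(apply_value_boost(1.5, heritage, "world_heritage_site", boost["value"], boost["factor"]))  # 15.0
print(apply_value_boost(1.5, other, "world_heritage_site", boost["value"], boost["factor"]))  # 1.5
```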

App Search supports tagging queries and then filtering query analytics by tag:

curl -X GET 'https://[instance id].ent-search.[region].[provider].cloud.es.io/api/as/v1/engines/national-parks-demo/search' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer search-soaewu2ye6uc45dr8mcd54v8' \
  -d '{ "query": "everglade", "analytics": { "tags": ["i-am-a-tag"] } }'

curl -X GET 'https://[instance id].ent-search.[region].[provider].cloud.es.io/api/as/v1/engines/national-parks-demo/analytics/queries' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer private-namt1hkv7ttsawuo452sxi6s' \
  -d '{ "filters": { "tag": "i-am-a-tag" } }'
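Tags attach a label to each logged query so that analytics can be sliced later. The following Python sketch mimics the second request on an in-memory query log: it counts only the queries that carry a given tag. The log format and the top_queries_for_tag name are hypothetical.

```python
from collections import Counter


def top_queries_for_tag(query_log, tag):
    """Count logged queries that carry `tag`, most frequent first."""
    counts = Counter(entry["query"] for entry in query_log if tag in entry.get("tags", []))
    return counts.most_common()


query_log = [  # hypothetical log entries
    {"query": "everglade", "tags": ["i-am-a-tag"]},
    {"query": "denali", "tags": []},
    {"query": "everglade", "tags": ["i-am-a-tag"]},
    {"query": "yosemite", "tags": ["i-am-a-tag"]},
]
print(top_queries_for_tag(query_log, "i-am-a-tag"))  # [('everglade', 2), ('yosemite', 1)]
```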

Supports controlling query results

App Search lets you control query results directly, as shown in the following figure. Click the star icon or drag results to re-rank them, and click the eye icon to hide a result. All of these operations can also be performed through the API; for details, see the Curations API.
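Conceptually, a curation pins promoted documents to the top of the results for a query and removes hidden ones. The following Python sketch applies that rule to a list of result IDs. It assumes promoted documents appear in the order given even if the query would not otherwise match them. apply_curation is a hypothetical helper, not part of the Curations API.

```python
def apply_curation(result_ids, promoted, hidden):
    """Put promoted IDs first (in order), then organic results, dropping hidden IDs."""
    hidden_set = set(hidden)
    head = [doc_id for doc_id in promoted if doc_id not in hidden_set]
    pinned = set(head)
    tail = [doc_id for doc_id in result_ids if doc_id not in hidden_set and doc_id not in pinned]
    return head + tail


organic = ["doc-1", "doc-2", "doc-3", "doc-4"]
print(apply_curation(organic, promoted=["doc-3"], hidden=["doc-2"]))
# ['doc-3', 'doc-1', 'doc-4']
```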

Query optimization

When you create an engine in App Search, you can choose a language, and the engine is automatically optimized for it. The optimizations include stemming, bigram matching, phrase matching, and typo tolerance. App Search also supports query suggestions and autocomplete. As users type keywords, App Search recommends keywords based on the data already in the engine, and users can get more accurate results by choosing better keywords. The following code requests suggestions for the keyword car based on the title and states fields of the documents:

curl -X POST 'https://[instance id].ent-search.[region].[provider].cloud.es.io/api/as/v1/engines/national-parks-demo/query_suggestion' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer search-7eud55t7ecdmqzcanjsc9cqu' \
  -d '{ "query": "car", "types": { "documents": { "fields": ["title", "states"] } }, "size": 3 }'

The response contains three suggested keywords: carlsbad, carlsbad caverns, and carolina.

{
  "results": {
    "documents": [
      { "suggestion": "carlsbad" },
      { "suggestion": "carlsbad caverns" },
      { "suggestion": "carolina" }
    ]
  },
  "meta": { "request_id": "914f909793379ed5af9379b4401f19be" }
}
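The following Python sketch produces prefix-based suggestions from the title and states fields of two sample documents. It is a naive stand-in, not App Search's suggestion algorithm: it suggests single words and two-word phrases that start with the typed prefix. suggest is a hypothetical name.

```python
def suggest(docs, query, fields, size=3):
    """Naive suggestions: words (and two-word phrases) starting with the query prefix."""
    prefix = query.lower()
    suggestions = []
    for doc in docs:
        for field in fields:
            values = doc.get(field, [])
            if isinstance(values, str):  # treat a single string like a one-item array
                values = [values]
            for value in values:
                words = value.lower().split()
                for i, word in enumerate(words):
                    if not word.startswith(prefix):
                        continue
                    # Offer the word itself and the phrase it starts.
                    for candidate in (word, " ".join(words[i:i + 2])):
                        if candidate not in suggestions:
                            suggestions.append(candidate)
    return suggestions[:size]


docs = [  # sample documents
    {"title": "Carlsbad Caverns", "states": ["New Mexico"]},
    {"title": "Great Smoky Mountains", "states": ["Tennessee", "North Carolina"]},
]
print(suggest(docs, "car", ["title", "states"]))  # ['carlsbad', 'carlsbad caverns', 'carolina']
```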

App Search supports synonym configuration, so a search can return the required results through synonyms. Use the following code to define summit, peak, cliff, and mountain as a synonym set:

curl -X POST 'https://[instance id].ent-search.[region].[provider].cloud.es.io/api/as/v1/engines/national-parks-demo/synonyms' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer private-xxxxxxxxxxxxxxxxxxxx' \
  -d '{ "synonyms": ["summit", "peak", "cliff", "mountain"] }'
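A synonym set lets a query for any member also match the others. The following Python sketch expands query terms against such sets. It models the concept only and does not reflect how App Search rewrites queries internally. expand_query is hypothetical.

```python
def expand_query(query, synonym_sets):
    """Return, for each query term, the sorted list of terms it should match."""
    expanded_terms = []
    for term in query.lower().split():
        expanded = {term}
        for synonym_set in synonym_sets:
            if term in synonym_set:
                expanded |= synonym_set  # a term matches every member of its set
        expanded_terms.append(sorted(expanded))
    return expanded_terms


synonym_sets = [{"summit", "peak", "cliff", "mountain"}]
print(expand_query("Peak views", synonym_sets))
# [['cliff', 'mountain', 'peak', 'summit'], ['views']]
```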

Enterprise Search lets you record, query, and analyze API logs. The logs help you analyze and optimize users' searches, including search results, search performance, and search errors. Improving the search experience continuously this way creates a positive feedback loop.

Code integration

Enterprise Search provides the Enterprise Search Python client and the Enterprise Search Ruby client, which make it easy to integrate Enterprise Search into Python and Ruby code.

Provides a search box UI

Elastic Enterprise Search provides a React search interface that connects to App Search and can be downloaded and imported directly. This saves a lot of front-end work. It is a good choice for applications that need a search box but have no special front-end requirements. With Site Search, a few lines of code on your website add a search box powered by Elasticsearch.

Summary

The business scenarios of enterprise search determine its characteristics and requirements. Building on the powerful capabilities of Elasticsearch, Elastic offers Elastic Enterprise Search, an easier-to-use enterprise search solution. It covers enterprise search scenarios end to end: from self-managed deployment to permission control, from document ingestion to query optimization, and from front-end UI to result control. The barrier to building an enterprise search system with it is very low and its usability is very good. It is nevertheless a complete, feature-rich, and relatively complex system. This article only briefly introduces its basic capabilities. Before using it in production, read the relevant documentation carefully, and research and test it in depth against your actual business needs.
