Revealing Alibaba's Knowledge Graph Technology
Application of Ali Knowledge Graph
The Alibaba ecosystem has accumulated a large amount of valuable commodity data from multiple markets. Many roles, including brand owners, industry operations, governance operations, consumers, state agencies, and logistics providers, contribute to and correct this huge commodity library. For both intellectual property protection and a better consumer shopping experience, it is important to standardize product data (unified product specifications and reliable product information) and to link it deeply with internal and external data. The Ali product knowledge graph carries out this foundational work of commodity standardization. Based on it, we can tell which listings are the same product, whether a brand is authorized, and which markets a brand's products are sold in.
Ali Knowledge Graph is centered on commodities, standard products, standard brands, standard barcodes, and standard classifications. Using entity recognition, entity linking, and semantic analysis, it integrates nine major categories of first-level ontology, such as public opinion, encyclopedia, and national industry standards. It contains tens of billions of triples and forms a huge knowledge network.
Ali Knowledge Graph combines cutting-edge NLP, semantic reasoning, and deep learning technologies to build an intelligent service system for commodities across the entire network and to serve each role in the Ali ecosystem. The commodity knowledge graph is widely used in core and innovative businesses such as search, front-end shopping guides, platform governance, intelligent question answering, and brand owner operations. It helps brand owners see global data, helps platform management and operations find problematic products, helps the industry select products based on reliable information, and improves the consumer shopping experience by matching goods with people and stores. It provides a reliable intelligent engine for new retail and internationalization.
▌Introducing machine learning algorithms to build an inference engine
We designed a framework for knowledge representation and reasoning. Knowledge graph entities, relationships, word forests (synonyms, hyponyms), vertical knowledge graphs (such as geographic location graphs and material graphs), machine learning models, and so on are all covered by one unified description.
According to the scenario, we divide reasoning into hyponymy and equivalence reasoning, inconsistency reasoning, knowledge discovery reasoning, ontology concept reasoning, and so on. For example:
1. Hyponymy and equivalence reasoning. When the parent category is retrieved, hyponymy reasoning recalls the objects of its subcategories, and equivalence reasoning (entity synonyms, variant words, products of the same model, and so on) expands the recall. For example, to protect consumers we need to intercept "food produced in a certain nuclear-contaminated area". The inference engine translates this into: find food that has an attribute item synonymous with "origin" whose attribute value is a sub-entity of the area, plus products of the same model as the food that was hit.
2. Inconsistency reasoning. In contending with problem sellers, we need to check that basic information such as brand, material, and ingredients is consistent across product titles, attributes, pictures, product qualifications, and seller qualifications. For example, the brand in the title is Nike while the brand in the attribute or tag is Nake. In the figure below, the product on the left has matching brand information in its title, attributes, and tag, so it is inferred to be consistent. The product on the right has a tag brand that does not match the product brand, so the inference engine judges it to be a problematic product.
3. Knowledge discovery reasoning. Consistency reasoning ensures that information is reliable; for example, it lets us ensure that the food ingredient lists covered by the data are correct. But consumers rarely read the complicated numbers on an ingredient list when shopping. What they care about are easily perceived knowledge points such as sugar-free and salt-free. To improve the shopping experience, knowledge discovery reasoning combines the underlying ingredient list data with national industry standards such as:
Sugar-free: carbohydrates ≤ 0.5 g/100 g (solid) or 100 mL (liquid)
Salt-free: sodium ≤ 5 mg/100 g or 100 mL
With these rules we can convert ingredient list data into knowledge points such as "sugar-free" and "salt-free", truly turning data into knowledge. A/B testing verified that such knowledge points greatly improve the consumer shopping experience in the front-end shopping guide.
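The inconsistency check and the knowledge discovery rules described above can be sketched in a few lines of Python. This is only an illustrative sketch, not Alibaba's implementation: the `Nutrition` type and the function names are hypothetical, and the only facts taken from the text are the two thresholds and the Nike/Nake example.

```python
from dataclasses import dataclass


@dataclass
class Nutrition:
    """Values per 100 g (solid) or per 100 mL (liquid) from an ingredient list."""
    carbohydrates_g: float
    sodium_mg: float


# Thresholds quoted in the text above.
SUGAR_FREE_MAX_CARBS_G = 0.5
SALT_FREE_MAX_SODIUM_MG = 5.0


def discover_knowledge_points(n: Nutrition) -> set:
    """Turn raw ingredient-list numbers into consumer-facing labels."""
    points = set()
    if n.carbohydrates_g <= SUGAR_FREE_MAX_CARBS_G:
        points.add("sugar-free")
    if n.sodium_mg <= SALT_FREE_MAX_SODIUM_MG:
        points.add("salt-free")
    return points


def brand_is_consistent(title_brand: str, attribute_brand: str, tag_brand: str) -> bool:
    """Hypothetical inconsistency check: all brand mentions must agree,
    ignoring case and surrounding whitespace."""
    brands = {b.strip().lower() for b in (title_brand, attribute_brand, tag_brand)}
    return len(brands) == 1
```

For instance, `brand_is_consistent("Nike", "Nake", "Nike")` returns `False`, which matches the problematic product in the Nike/Nake example. A production system would presumably also need entity linking to normalize brand names before comparing them.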
▌The technical framework behind the reasoning engine
First, the inference engine converts natural language into logical form through semantic parsing. Semantic parsing combines neural networks with symbolic logic execution: natural language is encoded into a distributed representation through syntactic and grammatical analysis, NER, and entity linking, and the distributed representation of the sentence is then translated into a logical expression.
When transforming distributed representations into logical expressions, the first problem is mapping representations to predicate operations. We treat predicates as actions and learn to perform symbolic operations through training, similar to how a Neural Programmer uses an attention mechanism to select appropriate operations; that is, we select the most likely predicate operations. Finally, the predicate operations are spliced into candidate logical expressions according to the parsed syntax, and the logical expressions are converted into queries. The schematic diagram of the process is shown in the figure below.
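As a rough illustration of selecting the most likely predicate operation with attention, the sketch below scores a sentence vector against one vector per predicate and applies a softmax. The vectors are hand-written toy values and the predicate names are hypothetical. In a real system both would be learned, and the article does not specify the actual model.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_predicate(sentence_vec, predicate_vecs):
    """Return (best predicate name, attention weights) via dot-product attention."""
    names = list(predicate_vecs)
    scores = [
        sum(a * b for a, b in zip(sentence_vec, predicate_vecs[name]))
        for name in names
    ]
    weights = softmax(scores)
    best = max(range(len(names)), key=weights.__getitem__)
    return names[best], dict(zip(names, weights))


# Toy, hand-written vectors (assumption); real embeddings would be trained.
PREDICATES = {
    "synonym": [1.0, 0.0, 0.0],
    "includes_subentity": [0.0, 1.0, 0.0],
    "belongs_to_product": [0.0, 0.0, 1.0],
}

best, weights = select_predicate([0.2, 2.0, 0.1], PREDICATES)
```

Here the sentence vector is closest to `includes_subentity`, so that predicate receives the largest attention weight and is chosen as the action.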
Second, logical expressions trigger the subsequent logical reasoning and graph reasoning. The design of logical expressions follows these principles: they should be close to human natural language while being easy for both machines and humans to understand; their expressive power should meet the requirements of knowledge graph data and knowledge representation; they should be easy to extend, so that new classes, entities, and relationships can be added conveniently; and they should support multiple logic languages and systems, such as Datalog and OWL. These languages and the algorithm modules behind them are pluggable, and this plug-in capability lets the inference engine describe different logical systems.
Take hyponymy and equivalence reasoning as an example: "Food produced in China", "
Described in logical expressions as:
∀x: food (x) ⊓ (∀ y: synonym (y, place of origin)) (x, (∀ z: including subentity (China, z)))
Then find products of the same model:
∀t, x: (∃ c: belongs to product (x, c) ⊓ belongs to product (t, c))
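To make the two expressions concrete, here is a minimal sketch that evaluates them over a handful of made-up triples. The data, the predicate spellings (`is_a`, `includes_subentity`, `belongs_to_product`), and the choice to treat the region itself as a valid place of origin are all assumptions for illustration. The real engine compiles logical expressions into queries rather than scanning a Python set.

```python
# Toy triples (subject, predicate, object); all data here is made up.
TRIPLES = {
    ("origin", "synonym", "place of origin"),
    ("producing area", "synonym", "place of origin"),
    ("China", "includes_subentity", "Zhejiang"),
    ("Zhejiang", "includes_subentity", "Hangzhou"),
    ("tea_1", "is_a", "food"),
    ("tea_1", "origin", "Hangzhou"),
    ("tea_1", "belongs_to_product", "longjing"),
    ("tea_2", "is_a", "food"),
    ("tea_2", "belongs_to_product", "longjing"),
    ("rice_1", "is_a", "food"),
    ("rice_1", "producing area", "Thailand"),
    ("phone_1", "origin", "Zhejiang"),
}


def objects(s, p):
    return {o for (s2, p2, o) in TRIPLES if s2 == s and p2 == p}


def subjects(p, o):
    return {s for (s, p2, o2) in TRIPLES if p2 == p and o2 == o}


def sub_entities(region):
    """The region plus everything reachable through includes_subentity."""
    seen, stack = {region}, [region]
    while stack:
        for child in objects(stack.pop(), "includes_subentity"):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def food_produced_in(region):
    """Food whose origin-like attribute points at the region or a sub-entity."""
    origin_attrs = subjects("synonym", "place of origin") | {"place of origin"}
    places = sub_entities(region)
    foods = subjects("is_a", "food")
    return {s for (s, p, o) in TRIPLES
            if s in foods and p in origin_attrs and o in places}


def same_product(items):
    """Everything that shares a product node with any of the given items."""
    products = {c for x in items for c in objects(x, "belongs_to_product")}
    return {t for c in products for t in subjects("belongs_to_product", c)}


hits = food_produced_in("China")
expanded = hits | same_product(hits)
```

In this toy graph only `tea_1` has an origin attribute inside China. The same-product step then adds `tea_2`, which carries no origin attribute at all but belongs to the same product node.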
In addition, the inference engine is used for automatic knowledge base completion, which we base on embeddings. The main idea is to add the structural information of the knowledge base to the embedding. Building on the characteristics of the Trans series of models (TransE and its variants), we take into account edges, adjacent nodes, paths, text descriptions of entities (such as product details), pictures, and other features to predict and complete new relationships.
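The article only names the "Trans series", so as an assumption the sketch below uses plain TransE, the simplest member of that family, to show how an embedding can rank candidate completions. The 2-dimensional vectors are hand-picked rather than trained, and the extra features the text mentions (paths, text descriptions, pictures) are not modeled.

```python
import math

# Toy 2-d embeddings, hand-picked for illustration (not trained).
ENTITY = {
    "tea_1": [0.0, 0.0],
    "Hangzhou": [1.0, 0.0],
    "Thailand": [0.0, 2.0],
}
RELATION = {"origin": [1.0, 0.1]}


def transe_score(h, r, t):
    """TransE distance ||h + r - t||; smaller means a more plausible triple."""
    return math.sqrt(sum(
        (a + b - c) ** 2
        for a, b, c in zip(ENTITY[h], RELATION[r], ENTITY[t])
    ))


def predict_tail(h, r, candidates):
    """Pick the candidate tail entity that makes (h, r, t) most plausible."""
    return min(candidates, key=lambda t: transe_score(h, r, t))


best_tail = predict_tail("tea_1", "origin", ["Hangzhou", "Thailand"])
```

With these toy vectors, `tea_1 + origin` lands almost exactly on `Hangzhou`, so the missing triple `(tea_1, origin, Hangzhou)` would be proposed as a completion.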
After three years of construction, Ali Knowledge Graph has formed a huge knowledge graph and massive standard data. We have also established a joint project team with the team of Professor Chen Huajun from Zhejiang University and introduced cutting-edge natural language processing, knowledge representation, and logical reasoning technologies. Ali Knowledge Graph is playing an increasingly important role in Alibaba's new retail and internationalization strategy.