Building a Navigation Map for Data Assets: In-Depth Explanation of the Data Discovery and End-to-End Analysis Capabilities of UModel

This article introduces UModel’s data discovery and end-to-end analysis capabilities, enabling unified metadata exploration, relationship mapping, and...

1. Background Information

Imagine you are standing in a vast library filled with tens of thousands of books, where the catalog of each book is scattered across different rooms and each room uses its own unique indexing system. If you want to find a book about service calls, you have to go back and forth between the APM room, Kubernetes room, and cloud resource room, and remember different search rules for each room.

This is the real dilemma faced by many enterprises in the field of observability. UModel acts like an intelligent management system built for this chaotic library, which allows you to easily explore and understand the structure of the entire knowledge graph.

1.1 What is UModel?

UModel is a graph-based method for modeling observability data, designed to address core challenges in collecting, organizing, and using that data in enterprise environments. It describes the IT world as a graph of nodes and links and, through standardized data modeling, provides unified representation, storage decoupling, and intelligent analysis of observability data.

As the foundational data modeling framework for Alibaba Cloud's observability system, UModel gives enterprises a common language for interacting with observability data. Humans, programs, and AI can all use this language to understand and analyze the data, which in turn supports full-stack observability.

Core Concepts

UModel employs fundamental graph theory concepts and uses nodes and links to form a directed graph for modeling IT systems:

● Node: The core component is a Set (data collection), which represents a collection of homogeneous entities or data, such as EntitySet (entity set), MetricSet (metric set), and LogSet (log set). It also includes the Storage type for the Set, such as Simple Log Service (SLS), Prometheus, and MySQL.

● Link: indicates the relationships between nodes, such as EntitySetLink (entity association), DataLink (data association), and StorageLink (storage association).

● Field: defines constraints and properties for Sets and Links and encompasses over 20 configuration items, including names, types, constraint rules, and analysis features.
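
The node-and-link model described above can be pictured as a small directed graph. The following Python sketch is only an illustration: the class names, the `metric_set` kind string, and the `apm.metric.service` name are assumptions made for the example and are not taken from the UModel specification.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """A node in the model graph: a Set identified by kind, domain, and name."""
    kind: str    # e.g. "entity_set"; "metric_set" below is an assumed kind string
    domain: str  # e.g. "apm"
    name: str    # e.g. "apm.service"


@dataclass(frozen=True)
class Link:
    """A directed link from one node to another."""
    kind: str    # e.g. "entity_set_link" or "data_link"
    src: Node
    dest: Node


@dataclass
class ModelGraph:
    """A tiny in-memory directed graph of nodes and links."""
    nodes: set = field(default_factory=set)
    links: list = field(default_factory=list)

    def add_link(self, link: Link) -> None:
        # Register both endpoints, then record the directed link.
        self.nodes.update((link.src, link.dest))
        self.links.append(link)

    def outgoing(self, node: Node) -> list:
        # Links whose source is the given node.
        return [link for link in self.links if link.src == node]


service = Node("entity_set", "apm", "apm.service")
metrics = Node("metric_set", "apm", "apm.metric.service")  # hypothetical MetricSet
graph = ModelGraph()
graph.add_link(Link("data_link", service, metrics))
print([link.dest.name for link in graph.outgoing(service)])
```

Keeping nodes and links as separate records mirrors the way a UModel query returns both as rows distinguished by `__type__`.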

1.2 What is a UModel Query?

A UModel query is a dedicated interface in EntityStore for querying knowledge graph metadata. Using the .umodel query syntax, it enables exploration of EntitySet definitions, EntitySetLink relationships, and the complete knowledge graph structure. This provides robust support for data modeling analysis and schema management.

Query Differentiation

The following table describes the differences between UModel queries and queries of other types.

Query type	Target data	Example
UModel query	Knowledge graph schema	EntitySet definitions and relationship type definitions
Entity query	Specific entity instances	Specific services, pods, and host instances
Topo query	Relationships between entities	Specific call relationships and deployment relationships

The UModel query operates at the metadata layer, which helps users understand the structure and definitions of data models, rather than the specific runtime data.

2. UModel Query

2.1 Data Model

Data Structure

The data returned by a UModel query has a fixed five-field structure.

Field	Type	Description	Example
`__type__`	string	Built-in system field	`link` or `node`
`kind`	string	UModel element type	`entity_set`, `entity_set_link`, `data_link`
`metadata`	string	Metadata information	Name, description, domain information, and others
`schema`	string	Schema definition	Field definitions, type constraints, and others
`spec`	string	Implementation specifications	Storage configuration, computation logic, and others

Note: metadata, schema, and spec are JSON-formatted strings. Use the json_extract_scalar function to extract values.
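
Because these three fields arrive as JSON text rather than nested objects, a client that reads query results has to parse them before use. The sketch below shows one way to do that in Python. `extract_scalar` is a hypothetical helper that only approximates `json_extract_scalar` for simple `$.a.b` paths, and the row is abbreviated from the sample data in this article.

```python
import json


def extract_scalar(json_text, path):
    """Follow a simple '$.a.b' path through a JSON string.

    Returns None when the path is absent or the value is an object or array.
    This is an approximation for illustration, not the real function.
    """
    value = json.loads(json_text)
    keys = path[2:] if path.startswith("$.") else path
    for key in keys.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    if isinstance(value, (dict, list)):
        return None
    return value


# One result row, shaped like the five-field structure described above.
row = {
    "__type__": "node",
    "kind": "entity_set",
    "metadata": '{"kind":"entity_set","domain":"synthetics","name":"synthetics.task"}',
    "schema": '{"version":"v0.1.0","url":"umodel.aliyun.com"}',
    "spec": '{"keep_alive_seconds":3600,"primary_key_fields":["task_name","task_id"]}',
}

print(extract_scalar(row["metadata"], "$.name"))
print(extract_scalar(row["schema"], "$.version"))
print(extract_scalar(row["spec"], "$.primary_key_fields"))  # an array, so None
```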

Data Examples

`__type__`	kind	metadata	schema	spec
node	`entity_set`	`{"kind":"entity_set","domain":"synthetics","name":"synthetics.task","description":{"en_us":"Synthetic Task refers to the specific endpoint or service being monitored..."}}`	`{"version":"v0.1.0","url":"umodel.aliyun.com"}`	`{"last_observed_time_field":"__time__","keep_alive_seconds":3600,"time_field":"__time__","primary_key_fields":["task_name","task_id"],"name_fields":["task_name","task_id","task_url","task_type","probe_type","task_state"]...}`
node	`entity_set`	`{"short_description":{"en_us":"A service name is used to identify and differentiate between various services.","zh_cn":"An application name is used to identify and distinguish different applications in the application performance management (APM) field."},"kind":"entity_set","domain":"apm"...}`	`{"version":"v0.1.0","url":"umodel.aliyun.com"}`	`{"last_observed_time_field":"__last_observed_time__","keep_alive_seconds":3600,"primary_key_fields":["service_id"],"name_fields":["service"],"dynamic":false,"ordered_fields":["service","source","language"]...}`

2.2 Query Syntax

Basic Query Syntax

-- Basic query format
.umodel | [SPL operations...]
-- Query with constraints
.umodel | where <condition> | limit <count>

Core Query Patterns

1. List Queries - metadata enumeration

Query all UModel data:

-- Query all UModel data (not recommended for production environments):
.umodel
-- Paginated query
.umodel | limit 0, 10

Filter by type:

-- Query all EntitySet definitions
.umodel | where kind = 'entity_set' | limit 0, 10
-- Query all EntitySetLink definitions
.umodel | where kind = 'entity_set_link' | limit 0, 10
-- Query all link types (relationship definitions)
.umodel | where __type__ = 'link' | limit 0, 10
-- Query all node types (entity definitions)
.umodel | where __type__ = 'node' | limit 0, 10

Filter by property:

-- Query the definition of an entity with a specific name
.umodel | where json_extract_scalar(metadata, '$.name') = 'acs.ecs.instance' | limit 0, 10
-- Query all definitions in a specific domain
.umodel | where json_extract_scalar(metadata, '$.domain') = 'apm' | limit 0, 10
-- Query definitions across multiple domains
.umodel | where json_extract_scalar(metadata, '$.domain') in ('acs', 'apm', 'k8s') | limit 0, 10

2. Graph Analysis - relationship exploration

UModel supports metadata-driven graph computations for analyzing relationships between EntitySets:

Basic graph query syntax:

.umodel | graph-match <path> project <output>

Concepts:

In graph queries, two fundamental graph concepts are critical:

Node type (label): represented as <domain>@<kind> in UModel metadata graph queries. Example: apm@entity_set.
Node ID: represented as __entity_id__ in UModel metadata graph queries, formatted as kind::domain::name. Example: entity_set::apm::apm.service.
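
The two formats above are plain string conventions, so they are easy to build and parse programmatically. The helper names in this Python sketch are hypothetical. Only the `<domain>@<kind>` and `kind::domain::name` layouts come from the text.

```python
def node_label(domain, kind):
    """Build a node type label in the <domain>@<kind> form."""
    return f"{domain}@{kind}"


def entity_id(kind, domain, name):
    """Build a node ID in the kind::domain::name form."""
    return "::".join((kind, domain, name))


def parse_entity_id(value):
    """Split a node ID back into (kind, domain, name).

    maxsplit=2 keeps any later '::' inside the name part.
    """
    kind, domain, name = value.split("::", 2)
    return kind, domain, name


print(node_label("apm", "entity_set"))
print(entity_id("entity_set", "apm", "apm.service"))
print(parse_entity_id("entity_set::acs::acs.ecs.instance"))
```

Note that the two formats order their parts differently: the label puts the domain first, while the ID puts the kind first.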

Path queries in graphs use ASCII characters to represent the direction of relationships.

Path expression	Direction description
`(A)-[e]->(B)` or `(A)-->(B)`	Directed relationship from A to B
`(A)<-[e]-(B)` or `(A)<--(B)`	Directed relationship from B to A
`(A)-[e]-(B)` or `(A)--(B)`	Bidirectional relationship (no direction enforced)
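
To make the three directions concrete, the following Python sketch applies them to a small list of directed edges. It illustrates the matching semantics only and says nothing about how graph-match is implemented. The edge data and the `neighbors` helper are hypothetical.

```python
# Directed edges as (source, relationship, destination); the data is made up.
edges = [
    ("apm.service", "runs_on", "k8s.pod"),
    ("k8s.pod", "runs_on", "k8s.node"),
]


def neighbors(edges, start, direction):
    """Return nodes adjacent to `start`.

    direction="out" corresponds to (start)-->(d),
    direction="in"  corresponds to (start)<--(d),
    direction="any" corresponds to (start)--(d).
    """
    result = []
    for src, _relationship, dest in edges:
        if direction in ("out", "any") and src == start:
            result.append(dest)
        if direction in ("in", "any") and dest == start:
            result.append(src)
    return result


print(neighbors(edges, "k8s.pod", "out"))
print(neighbors(edges, "k8s.pod", "in"))
print(neighbors(edges, "k8s.pod", "any"))
```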

Query EntitySet relationships:

-- Query all relationships for a specific EntitySet
.umodel
| graph-match (s:"acs@entity_set" {__entity_id__: 'entity_set::acs::acs.ecs.instance'})
              -[e]-(d)
  project s, e, d | limit 0, 10

Directional relationship queries:

-- Incoming relationships (pointing to an EntitySet):
.umodel
| graph-match (s:"acs@entity_set" {__entity_id__: 'entity_set::acs::acs.ecs.instance'})
              <--(d)
  project s, d | limit 0, 10
-- Outgoing relationships (originating from an EntitySet):
.umodel
| graph-match (s:"acs@entity_set" {__entity_id__: 'entity_set::acs::acs.ack.cluster'})
              -->(d)
  project s, d | limit 0, 10

2.3 Advanced Queries

JSON path extraction

Since UModel data is stored in JSON format, JSON functions are required for field extraction:

-- Extract basic information
.umodel
| extend
    entity_name = json_extract_scalar(metadata, '$.name'),
    entity_domain = json_extract_scalar(metadata, '$.domain'),
    entity_description = json_extract_scalar(metadata, '$.description.zh_cn')
| project entity_name, entity_domain, entity_description | limit 0, 100

Composite filtering with multiple conditions

-- Query with complex conditions
.umodel
| where kind = 'entity_set'
  and json_extract_scalar(metadata, '$.domain') in ('apm', 'k8s')
  and json_array_length(json_extract(spec, '$.fields')) > 5
| extend
    entity_name = json_extract_scalar(metadata, '$.name'),
    field_count = json_array_length(json_extract(spec, '$.fields'))
| sort field_count desc
| limit 20

Aggregate analysis

-- Count the number of EntitySets by domain
.umodel
| where kind = 'entity_set'
| extend domain = json_extract_scalar(metadata, '$.domain')
| stats entity_count = count() by domain
| sort entity_count desc
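
For readers who post-process query results in a script, the same count-by-domain aggregation can be reproduced client-side. The Python sketch below assumes the rows have already been fetched as dictionaries. The sample rows and the function name are hypothetical.

```python
import json
from collections import Counter

# Hypothetical rows, shaped like UModel query results (only two fields shown).
rows = [
    {"kind": "entity_set", "metadata": '{"domain":"apm","name":"apm.service"}'},
    {"kind": "entity_set", "metadata": '{"domain":"acs","name":"acs.ecs.instance"}'},
    {"kind": "entity_set", "metadata": '{"domain":"acs","name":"acs.ack.cluster"}'},
    {"kind": "entity_set_link", "metadata": '{"domain":"apm","name":"example.link"}'},
]


def count_entity_sets_by_domain(rows):
    """Count entity_set rows per domain, largest count first."""
    counts = Counter(
        json.loads(row["metadata"]).get("domain")
        for row in rows
        if row["kind"] == "entity_set"
    )
    return counts.most_common()


print(count_entity_sets_by_domain(rows))
```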

2.4 Performance Optimization Recommendations

Use Precise Filters

-- Before optimization: broad scope
.umodel | where json_extract_scalar(metadata, '$.name') like '%service%'
-- After optimization: precise matching
.umodel | where kind = 'entity_set'
  and json_extract_scalar(metadata, '$.domain') = 'apm'
  and json_extract_scalar(metadata, '$.name') = 'apm.service'

Pre-filtering

-- Before optimization: late filtering
.umodel
| extend name = json_extract_scalar(metadata, '$.name')
| where name = 'apm.service'
-- After optimization: pre-filtering
.umodel
| where json_extract_scalar(metadata, '$.name') = 'apm.service'
| extend name = json_extract_scalar(metadata, '$.name')

Graph Query Optimization

-- Before optimization: full graph search
.umodel | graph-match (s)-[e]-(d) project s, e, d
-- After optimization: specifying the start point
.umodel
| graph-match (s:"apm@entity_set" {__entity_id__: 'entity_set::apm::apm.service'})
              -[e]-(d)
  project s, e, d

3. Application Scenarios of UModel Queries

UModel queries can address a wide range of practical challenges and provide robust support for data modeling, schema management, and knowledge graph analysis.

3.1 Schema Exploration and Discovery

Scenario Description

In large-scale observability systems, hundreds of EntitySet definitions may be distributed across multiple domains. Users need to quickly identify what entity types are defined in the system and understand their basic information.

Application Examples

Explore all entity types:

-- List all EntitySets with their basic information
.umodel
| where kind = 'entity_set'
| extend
    entity_name = json_extract_scalar(metadata, '$.name'),
    entity_domain = json_extract_scalar(metadata, '$.domain'),
    description = json_extract_scalar(metadata, '$.description.zh_cn')
| project entity_name, entity_domain, description
| sort entity_domain, entity_name
| limit 0, 100

View by domain:

-- View all entity definitions within a specific domain, such as APM
.umodel
| where kind = 'entity_set'
  and json_extract_scalar(metadata, '$.domain') = 'apm'
| extend
    entity_name = json_extract_scalar(metadata, '$.name'),
    description = json_extract_scalar(metadata, '$.short_description.zh_cn')
| project entity_name, description
| limit 0, 50

3.2 Data Modeling and Analysis

Scenario Description

During data modeling optimization, you need to analyze information about existing EntitySets, including field complexity, primary key design, and index configuration, to identify the models that require optimization.

Application Examples

Analyze field complexity:

-- Analyze the distribution of field counts across EntitySets by domain
.umodel
| where kind = 'entity_set'
| extend
    domain = json_extract_scalar(metadata, '$.domain'),
    entity_name = json_extract_scalar(metadata, '$.name'),
    field_count = json_array_length(json_extract(spec, '$.fields'))
| stats
    avg_fields = avg(field_count),
    max_fields = max(field_count),
    min_fields = min(field_count),
    entity_count = count()
  by domain
| sort entity_count desc

Identify complex entities:

-- Find EntitySets with the highest number of fields (potential candidates for optimization)
.umodel
| where kind = 'entity_set'
| extend
    entity_name = json_extract_scalar(metadata, '$.name'),
    domain = json_extract_scalar(metadata, '$.domain'),
    field_count = json_array_length(json_extract(spec, '$.fields'))
| sort field_count desc
| limit 20

3.3 Relationship Graph Analysis

Scenario Description

Relationships between EntitySets are fundamental to building a complete knowledge graph. Graph queries let you analyze these associations and uncover dependencies and connections within the data model.

Application Examples

Query all relationships of an entity:

-- Query all relationships of a specific EntitySet, such as apm.service
.umodel
| graph-match (s:"apm@entity_set" {__entity_id__: 'entity_set::apm::apm.service'})
              -[e]-(d)
  project s, e, d
| limit 0, 50

Analyze relationship type distribution:

-- Count the occurrences of each relationship type
.umodel
| where kind = 'entity_set_link'
| extend
    link_name = json_extract_scalar(metadata, '$.name'),
    link_type = json_extract_scalar(metadata, '$.link_type')
| stats link_count = count() by link_type
| sort link_count desc

Find specific relationships:

-- Find all relationship definitions of the runs_on type
.umodel
| where kind = 'entity_set_link'
  and json_extract_scalar(metadata, '$.link_type') = 'runs_on'
| extend
    link_name = json_extract_scalar(metadata, '$.name'),
    source = json_extract_scalar(metadata, '$.source'),
    target = json_extract_scalar(metadata, '$.target')
| project link_name, source, target

3.4 Metadata Quality Check

Scenario Description

Ensure the integrity and consistency of UModel metadata by identifying issues such as missing descriptions and undefined fields.

Application Examples

Check EntitySets with missing descriptions:

-- Find EntitySets without descriptions in Chinese
.umodel
| where kind = 'entity_set'
  and (json_extract_scalar(metadata, '$.description.zh_cn') = ''
       or json_extract_scalar(metadata, '$.description.zh_cn') is null)
| extend
    entity_name = json_extract_scalar(metadata, '$.name'),
    domain = json_extract_scalar(metadata, '$.domain')
| project entity_name, domain

Verify the integrity of field definitions:

-- Identify EntitySets with no fields defined
.umodel
| where kind = 'entity_set'
  and (json_extract(spec, '$.fields') is null
       or json_array_length(json_extract(spec, '$.fields')) = 0)
| extend
    entity_name = json_extract_scalar(metadata, '$.name'),
    domain = json_extract_scalar(metadata, '$.domain')
| project entity_name, domain
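
The two checks above can also be combined into one pass over exported rows, for example as part of a periodic validation script. The Python sketch below is an assumption-laden illustration: `quality_issues`, the issue labels, and the sample rows are all hypothetical, and only the `description.zh_cn` and `fields` paths come from the queries in this section.

```python
import json


def quality_issues(row, lang="zh_cn"):
    """Return a list of issue labels for one entity_set row."""
    metadata = json.loads(row["metadata"])
    spec = json.loads(row["spec"])
    issues = []
    # A missing key, a null value, and an empty string all count as missing.
    description = metadata.get("description") or {}
    if not description.get(lang):
        issues.append("missing_description")
    # A missing, null, or empty fields array counts as no fields defined.
    if not spec.get("fields"):
        issues.append("no_fields")
    return issues


# Hypothetical rows for illustration.
complete = {
    "metadata": '{"name":"demo.ok","description":{"zh_cn":"text"}}',
    "spec": '{"fields":[{"name":"id"}]}',
}
incomplete = {
    "metadata": '{"name":"demo.bad","description":{"zh_cn":""}}',
    "spec": '{"fields":[]}',
}

print(quality_issues(complete))
print(quality_issues(incomplete))
```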

3.5 Cross-domain Association Analysis

Scenario Description

In complex observability systems, entities from different domains, such as APM, Kubernetes, and cloud resources, may have cross-domain relationships. UModel queries can be used to analyze these cross-domain association patterns and understand how domains are interconnected.

Application Examples

Find cross-domain relationships:

-- Identify EntitySetLinks that connect different domains
.umodel
| where kind = 'entity_set_link'
| extend
    link_name = json_extract_scalar(metadata, '$.name'),
    source_domain = json_extract_scalar(spec, '$.src.domain'),
    target_domain = json_extract_scalar(spec, '$.dest.domain')
| where source_domain != target_domain
| project link_name, source_domain, target_domain
| limit 0, 50
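
The cross-domain filter can be sketched in Python for rows that have already been retrieved. The `$.src.domain` and `$.dest.domain` paths follow the query above. The function name, link names, and sample rows are hypothetical.

```python
import json


def cross_domain_links(rows):
    """Return (link_name, source_domain, target_domain) for links that span domains."""
    result = []
    for row in rows:
        if row["kind"] != "entity_set_link":
            continue
        spec = json.loads(row["spec"])
        src = (spec.get("src") or {}).get("domain")
        dest = (spec.get("dest") or {}).get("domain")
        # Skip rows with a missing domain; a SQL-style != comparison
        # would not match them either.
        if src is not None and dest is not None and src != dest:
            name = json.loads(row["metadata"]).get("name")
            result.append((name, src, dest))
    return result


# Hypothetical rows for illustration.
rows = [
    {"kind": "entity_set_link",
     "metadata": '{"name":"demo.service_runs_on_pod"}',
     "spec": '{"src":{"domain":"apm"},"dest":{"domain":"k8s"}}'},
    {"kind": "entity_set_link",
     "metadata": '{"name":"demo.service_calls_service"}',
     "spec": '{"src":{"domain":"apm"},"dest":{"domain":"apm"}}'},
    {"kind": "entity_set",
     "metadata": '{"name":"apm.service"}',
     "spec": '{}'},
]

print(cross_domain_links(rows))
```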

Analyze inter-domain connectivity:

-- Count the number of relationships between domains
.umodel
| where kind = 'entity_set_link'
| extend
    source_domain = json_extract_scalar(spec, '$.src.domain'),
    target_domain = json_extract_scalar(spec, '$.dest.domain')
| stats count = count() by source_domain, target_domain
| sort count desc

3.6 Version and Evolution Analysis

Scenario Description

UModel schemas evolve as business develops. You need to track schema versioning and historical changes.

Application Examples

View schema version information:

-- View the schema versions of all EntitySets
.umodel
| where kind = 'entity_set'
| extend
    entity_name = json_extract_scalar(metadata, '$.name'),
    schema_version = json_extract_scalar(schema, '$.version'),
    schema_url = json_extract_scalar(schema, '$.url')
| project entity_name, schema_version, schema_url
| limit 0, 100

3.7 Fast Locating and Retrieval

Scenario Description

Quickly locate specific EntitySets or relationship definitions within a large volume of metadata. Both fuzzy matching and exact matching are supported.

Application Examples

Fuzzy search by name:

-- Search for EntitySets with "service" in the name
.umodel
| where kind = 'entity_set'
  and json_extract_scalar(metadata, '$.name') like '%service%'
| extend
    entity_name = json_extract_scalar(metadata, '$.name'),
    domain = json_extract_scalar(metadata, '$.domain')
| project entity_name, domain
| limit 0, 20

Exact search for a specific entity:

-- Retrieve the complete definition of a specific EntitySet by exact name
.umodel
| where json_extract_scalar(metadata, '$.name') = 'apm.service'
| limit 1

4. Summary

As the dedicated interface in EntityStore for querying knowledge graph metadata, the UModel query provides robust support for observability data modeling. You can use UModel queries for the following tasks:

Schema structure exploration: allows you to quickly understand all defined entity and relationship types within the system.
Data model analysis: enables you to deeply examine field designs, primary key configurations, complexity, and other aspects of EntitySets.
Relationship graph construction: allows you to use graph queries to analyze associations between entities and comprehend the topological structure of the knowledge graph.
Quality check: allows you to verify the integrity and consistency of metadata.
Cross-domain analysis: allows you to investigate association patterns across different domains.
Fast retrieval: enables you to rapidly locate target definitions within large volumes of metadata.

These capabilities make UModel queries an indispensable tool for data modeling analysis, schema management, and knowledge graph exploration, and a solid foundation for building and maintaining high-quality observability data models.
