Distributed applications running in production generate log data from dozens or hundreds of sources, including application containers, load balancers, database instances, network appliances, and managed cloud services. The engineering challenge is not log collection alone. It is the design of a centralised system that sustains high ingestion throughput, indexes for sub-second query response, retains data at a predictable cost, and integrates with downstream analytics and alerting without requiring bespoke pipeline code. Alibaba Cloud Log Service is structured around this principle, exposing collection, storage, query, and integration capabilities through a unified API and console.
Three ingestion paths cover most needs.
Logtail is the managed collection agent. It runs on virtual machines, container hosts, and Kubernetes nodes, parsing log files locally and forwarding records over HTTPS to the regional ingestion endpoint on port 443. The parsing side is where it earns its keep. Logtail recognises single-line text, regex with named capture groups, JSON, delimited formats (CSV and TSV), syslog, and the common access log shapes out of the box, so there's no pre-processing layer to maintain on the host. Collection rules and parser definitions live in the console and are pushed down to agents centrally, which sidesteps the usual headache of drifting config files across a fleet. JSON records get an extra benefit: every top-level key becomes a queryable field on the resulting log entry, indexable and addressable from SQL with no further transformation.
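To make two of the parsing modes concrete, the following Python sketch mimics them on the client side: a regex with named capture groups, where each group becomes a field, and JSON, where each top-level key becomes a field. It illustrates the mapping only and is not Logtail's implementation. The pattern, the field names, and the choice to re-serialise nested JSON values as strings are assumptions made for the example.

```python
import json
import re
from typing import Dict, Optional

# Illustrative pattern for a simplified access-log line; the field names
# (remote_addr, time, method, path, status, bytes) are hypothetical.
ACCESS_LOG = re.compile(
    r'(?P<remote_addr>\S+) - - \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<bytes>\d+)'
)


def parse_regex(line: str) -> Optional[Dict[str, str]]:
    """Map each named capture group to a field; return None if the line doesn't match."""
    match = ACCESS_LOG.match(line)
    return match.groupdict() if match else None


def parse_json(line: str) -> Dict[str, str]:
    """Promote every top-level JSON key to a field.

    Assumption for this sketch: non-string values (numbers, nested objects)
    are re-serialised to JSON text so every field is a string.
    """
    record = json.loads(line)
    return {
        key: value if isinstance(value, str) else json.dumps(value)
        for key, value in record.items()
    }
```

For example, `parse_regex('203.0.113.7 - - [02/May/2026:10:00:00 +0000] "GET /api/orders HTTP/1.1" 200 512')` yields fields such as `method="GET"` and `status="200"`, while a line that does not match the pattern yields `None`.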
For application-internal events that never hit disk, the producer SDKs (Java, Python, Go, .NET, and Node.js) call the PutLogs HTTP API directly. They batch locally, retry with exponential backoff, and dispatch asynchronously, keeping the application thread off the network path. The REST API remains available for lightweight cases where pulling in an SDK isn't warranted; requests are signed with HMAC-SHA256 using AccessKey credentials.
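The batch-and-retry behaviour of the producer SDKs can be sketched as follows. This is not any Alibaba Cloud SDK's code: `BatchingProducer` is a hypothetical class, and `send_batch` is a stand-in for whatever call actually issues PutLogs. For testability the sketch flushes synchronously; a production producer would run `flush` on a background thread so the caller never waits on the network.

```python
import time
from typing import Callable, Dict, List

Record = Dict[str, str]


class BatchingProducer:
    """Buffers records locally and flushes them in batches, retrying with backoff.

    `send_batch` is a hypothetical stand-in for a PutLogs call; it should raise
    on failure. This sketch treats every exception as retryable.
    """

    def __init__(
        self,
        send_batch: Callable[[List[Record]], None],
        max_batch_size: int = 512,
        max_retries: int = 5,
        base_delay_s: float = 0.1,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._send_batch = send_batch
        self._max_batch_size = max_batch_size
        self._max_retries = max_retries
        self._base_delay_s = base_delay_s
        self._sleep = sleep  # injectable so tests don't actually wait
        self._buffer: List[Record] = []

    def pending(self) -> int:
        return len(self._buffer)

    def add(self, record: Record) -> None:
        self._buffer.append(record)
        if len(self._buffer) >= self._max_batch_size:
            self.flush()

    def flush(self) -> None:
        if not self._buffer:
            return
        batch, self._buffer = self._buffer, []
        for attempt in range(self._max_retries + 1):
            try:
                self._send_batch(batch)
                return
            except Exception:
                if attempt == self._max_retries:
                    # Give up: requeue the batch so the caller can retry or persist it.
                    self._buffer = batch + self._buffer
                    raise
                # Exponential backoff: base, 2*base, 4*base, ...
                self._sleep(self._base_delay_s * (2 ** attempt))
```

Delays here grow deterministically (0.1 s, 0.2 s, 0.4 s, and so on). Adding random jitter is a common refinement that keeps many clients from retrying in lockstep after a shared outage.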
Data is organised through three nested constructs. A Project is a region-bound logical container that defines a RAM permission boundary; cross-region replication is not supported, and region choice should be governed by data residency requirements and ingestion-source proximity. A Logstore is a schema-flexible container within a Project, housing log records of a related kind, typically one Logstore per service or per log family. A Shard is the unit of read and write throughput within a Logstore, supporting 5 MiB/s ingestion and 10 MiB/s read capacity per shard.
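Because each shard has a fixed budget, an initial shard count can be estimated from peak rates. The sketch below uses the per-shard figures quoted above; the function name and the 30% headroom default are illustrative assumptions, and the estimate only holds if the partition key spreads traffic evenly across shards.

```python
import math

# Per-shard limits as stated in the text: 5 MiB/s write, 10 MiB/s read.
WRITE_MIB_PER_SHARD = 5.0
READ_MIB_PER_SHARD = 10.0


def shards_needed(peak_write_mib_s: float, peak_read_mib_s: float, headroom: float = 0.3) -> int:
    """Minimum shard count covering peak write and read rates plus headroom.

    Assumes load is spread evenly; a hot shard hits its limit long before this.
    """
    if peak_write_mib_s < 0 or peak_read_mib_s < 0 or headroom < 0:
        raise ValueError("rates and headroom must be non-negative")
    factor = 1.0 + headroom
    by_write = math.ceil(peak_write_mib_s * factor / WRITE_MIB_PER_SHARD)
    by_read = math.ceil(peak_read_mib_s * factor / READ_MIB_PER_SHARD)
    return max(1, by_write, by_read)
```

For example, a peak of 40 MiB/s written and 30 MiB/s read gives `shards_needed(40, 30) == 11`. Writes dominate (52 / 5 rounds up to 11), while reads alone would need only 4.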
Partition keys determine which shard a record routes to. An MD5 hash of the partition key value is computed at ingestion, and the record is written to the shard whose hash range contains the resulting value. Selecting a partition key with high cardinality and an even distribution, such as a service instance identifier, trace identifier, or tenant identifier, prevents hot-shard conditions that cap effective throughput well below the aggregate Logstore limit. Where no natural partition key exists, omit it: records are then distributed round-robin across all available shards.
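The routing rule fits in a few lines: hash the key with MD5, read the 128-bit digest as an integer, and find the shard whose half-open range contains it. The `ShardRange` type and the equal-width layout produced by `even_ranges` are assumptions for illustration; real shard boundaries are whatever the Logstore's split history has produced.

```python
import hashlib
from dataclasses import dataclass
from typing import List

HASH_SPACE = 1 << 128  # MD5 digests are 128-bit values


@dataclass(frozen=True)
class ShardRange:
    """Hypothetical shard descriptor covering the hash range [begin, end)."""
    shard_id: int
    begin: int  # inclusive
    end: int    # exclusive


def even_ranges(shard_count: int) -> List[ShardRange]:
    """Divide the MD5 space into contiguous, equal-width hash ranges."""
    width = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        end = HASH_SPACE if i == shard_count - 1 else (i + 1) * width
        ranges.append(ShardRange(i, i * width, end))
    return ranges


def route(partition_key: str, ranges: List[ShardRange]) -> int:
    """Return the shard whose [begin, end) range contains MD5(partition_key)."""
    digest = int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")
    for r in ranges:
        if r.begin <= digest < r.end:
            return r.shard_id
    raise ValueError("hash ranges do not cover the digest")
```

A low-cardinality key such as an environment name (`"prod"`) always hashes to the same value, so every record lands on one shard no matter how many shards exist; many distinct instance identifiers, by contrast, spread roughly evenly.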
Shards can be split to increase capacity or merged to reduce idle cost. Splits divide a hash range into two contiguous sub-ranges; existing data remains in the parent shard, which transitions to read-only, while new writes are accepted by the two child shards. The operation is non-blocking and typically completes within seconds.
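A split can be modelled as cutting one half-open hash range at a split key. In this sketch, `Shard` and `split_shard` are hypothetical names, and defaulting to the midpoint when no key is given is an assumption. The parent is marked read-only, and the two children cover exactly the parent's range with no gap or overlap.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Shard:
    """Hypothetical shard record covering the half-open hash range [begin, end)."""
    shard_id: int
    begin: int
    end: int
    read_only: bool = False


def split_shard(parent: Shard, next_id: int, split_key: Optional[int] = None) -> Tuple[Shard, Shard]:
    """Split a writable shard into two contiguous children.

    The parent keeps its existing data but stops accepting writes.
    """
    if parent.read_only:
        raise ValueError("only a writable shard can be split")
    if split_key is None:
        # Assumption: default to the midpoint of the parent's range.
        split_key = (parent.begin + parent.end) // 2
    if not parent.begin < split_key < parent.end:
        raise ValueError("split key must fall strictly inside the parent range")
    parent.read_only = True  # existing data stays here; new writes go to children
    left = Shard(next_id, parent.begin, split_key)
    right = Shard(next_id + 1, split_key, parent.end)
    return left, right
```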
Log Service exposes two index types, applied per field within a Logstore. Full-text indexing extracts tokens from log bodies for keyword search using configurable delimiters; field indexing tags individual structured fields with their data type (text, long, double, or JSON) to support typed predicates and aggregations. Index configuration is mutable post hoc, but rebuilding for historical data requires a re-indexing operation scoped to the desired time range and incurs proportional cost.
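Full-text indexing can be pictured as splitting each log body on a delimiter set and recording which records contain each token. The Python sketch below builds such an inverted index. The delimiter string, the lowercase normalisation, and the function names are assumptions for illustration, not the service's defaults.

```python
import re
from typing import Dict, Iterable, List, Set

# Assumed delimiter set for illustration; the real set is configured per Logstore.
DELIMITERS = ",'\";=()[]{}?@&<>/:\n\t\r "


def tokenize(text: str, delimiters: str = DELIMITERS, case_sensitive: bool = False) -> List[str]:
    """Split text on any run of delimiter characters, dropping empty tokens."""
    pattern = "[" + re.escape(delimiters) + "]+"
    tokens = [t for t in re.split(pattern, text) if t]
    return tokens if case_sensitive else [t.lower() for t in tokens]


def build_index(records: Iterable[str], delimiters: str = DELIMITERS) -> Dict[str, Set[int]]:
    """Inverted index: token -> set of record positions containing it."""
    index: Dict[str, Set[int]] = {}
    for doc_id, text in enumerate(records):
        for token in tokenize(text, delimiters):
            index.setdefault(token, set()).add(doc_id)
    return index
```

Because `=` is a delimiter here, `status=500` is indexed as the two tokens `status` and `500`. Removing `=` from the set would instead make `status=500` a single searchable token, which shows why delimiter choice directly shapes what keyword searches can match.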
Indexed query latency on a single Logstore is generally under one second for time ranges up to 24 hours and scales with the time range and result set size beyond that. Query syntax supports boolean operators, range predicates, wildcard matching with leading-character restrictions, and field-scoped clauses. For analytical workloads, a SQL-92 subset operates over indexed data, including standard aggregation functions, GROUP BY, ORDER BY, and inner JOIN with a single Logstore on the right side. Queries take a two-stage form: a search clause filters the indexed dataset, and a pipe operator hands the result to a SQL stage for aggregation. Placing high-selectivity predicates in the search clause before the pipe reduces the dataset scanned by the SQL stage and directly improves query latency on large Logstores.
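The benefit of a selective search clause is easy to see in a toy model of the two-stage form. The sketch filters in-memory records with a predicate, standing in for the search clause, and then groups the survivors, standing in for the SQL stage. It roughly mirrors a query shaped like `status >= 500 | SELECT host, count(*) AS errors GROUP BY host ORDER BY errors DESC LIMIT 10`; the field names are hypothetical, and the code models neither indexing nor the real query engine.

```python
from collections import Counter
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]


def run_two_stage(
    records: List[Record],
    search: Callable[[Record], bool],
    group_field: str,
    limit: int = 10,
) -> Tuple[int, List[Tuple[Any, int]]]:
    """Filter first, then aggregate only the surviving rows.

    Returns the number of rows handed to the aggregation stage and the top
    groups by count (ties broken by group value for stable output).
    """
    # Stage 1: the search clause narrows the dataset.
    survivors = [r for r in records if search(r)]
    # Stage 2: the "SQL" stage sees only the survivors.
    counts = Counter(r[group_field] for r in survivors)
    top = sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0])))[:limit]
    return len(survivors), top
```

The first return value is the quantity the text says to minimise: the tighter the search clause, the fewer rows the aggregation stage has to scan.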
Log Service integrates with several downstream targets without an intermediary pipeline. LogShipper delivers records on a scheduled cadence to Object Storage Service in Parquet, JSON, or CSV format, partitioned by time, for long-term cold storage and ad-hoc query through external table mechanisms. A parallel shipping configuration targets MaxCompute for unified batch analytics across logs and business data.
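Time partitioning means each shipped batch is written under an object prefix derived from its time window, which lets external tables prune by path. The sketch below computes such a prefix. The `%Y/%m/%d/%H/%M` layout, the five-minute window in the example, and the function names are assumptions for illustration, since the actual layout is set in the shipping configuration.

```python
from datetime import datetime, timezone


def batch_window_start(event_time: datetime, window_minutes: int) -> datetime:
    """Floor a timezone-aware timestamp to the start of its UTC shipping window."""
    if event_time.tzinfo is None:
        raise ValueError("event_time must be timezone-aware")
    if window_minutes <= 0 or 60 % window_minutes != 0:
        raise ValueError("window_minutes must divide 60")
    utc = event_time.astimezone(timezone.utc)
    minute = utc.minute - utc.minute % window_minutes
    return utc.replace(minute=minute, second=0, microsecond=0)


def object_prefix(root: str, window_start: datetime, fmt: str = "%Y/%m/%d/%H/%M") -> str:
    """Build a time-partitioned prefix such as 'logs/app/2026/05/02/10/15/'."""
    return f"{root.rstrip('/')}/{window_start.strftime(fmt)}/"
```

With a five-minute window, an event at 10:17:42 UTC on 2 May 2026 falls in the batch starting at 10:15, so `object_prefix("logs/app/", batch_window_start(t, 5))` returns `"logs/app/2026/05/02/10/15/"`.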
For real-time downstream consumption, the Consumer Library implements a checkpoint-aware consumer group model. Each consumer in a group is assigned a disjoint subset of shards, and checkpoint positions are persisted server-side per consumer group and per shard. This allows consumers to scale horizontally up to the shard count and to rebalance automatically when consumers are added or removed. Function Compute can be triggered directly on log arrival for low-throughput, event-driven processing such as alert routing, lightweight transformation, or webhook dispatch, without operating a continuously running consumer.
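The consumer group model rests on two ideas: shards are partitioned disjointly among consumers, and checkpoints are keyed by group and shard rather than by consumer, so a shard that changes hands resumes where its previous owner stopped. The sketch below illustrates both with hypothetical names (`assign_shards`, `CheckpointStore`) and integer offsets in place of the service's cursors. It is not the Consumer Library's actual assignment algorithm.

```python
from typing import Dict, List, Tuple


def assign_shards(shard_ids: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Partition shards disjointly across consumers, round-robin over sorted IDs."""
    if not consumers:
        return {}
    ordered = sorted(consumers)
    plan: Dict[str, List[int]] = {c: [] for c in ordered}
    for i, shard in enumerate(sorted(shard_ids)):
        plan[ordered[i % len(ordered)]].append(shard)
    # Consumers beyond the shard count get an empty list: they add no throughput.
    return plan


class CheckpointStore:
    """In-memory stand-in for server-side checkpoints keyed by (group, shard)."""

    def __init__(self) -> None:
        self._positions: Dict[Tuple[str, int], int] = {}

    def commit(self, group: str, shard: int, offset: int) -> None:
        self._positions[(group, shard)] = offset

    def resume(self, group: str, shard: int) -> int:
        # Keyed by group and shard, not consumer, so a new owner picks up here.
        return self._positions.get((group, shard), 0)
```

With four shards, consumers `c1` and `c2` receive `[0, 2]` and `[1, 3]`. When `c3` joins, it takes shard 2 and resumes from the offset `c1` last committed. This naive recomputation also moves shard 3 from `c2` to `c1` unnecessarily; an assignment that minimises movement on rebalance reduces reprocessing after membership changes.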
Disclaimer: The views expressed herein are for reference only and don’t necessarily represent the official views of Alibaba Cloud.