BI and data mining on data platforms

Object Storage Service (OSS) can be used as the data store for a data lake. Data Lake Analytics (DLA) builds an end-to-end big data platform on top of OSS. This platform provides data lake formation, extract, transform, and load (ETL), interactive analysis, and machine learning.

  • Data lake formation
    • Real-time data lake: Change data capture (CDC) data and messages, such as Kafka messages, can be imported into the data lake to build large datasets that support create, read, update, and delete (CRUD) operations. Changes become available within T+10 minutes.
    • One-click data warehousing: A data lake can be formed with one click, and the data becomes available within T+1 days.
    • File upload: After data is uploaded, the metadata discovery feature of DLA automatically discovers the metadata and builds the metadata system.
  • ETL: DLA provides the serverless Spark engine. This engine supports ETL jobs that cleanse raw data at the operational data store (ODS) layer and convert it into structured data at the data warehouse (DW) layer.
  • Machine learning: DLA provides the serverless Spark engine. This engine supports open source algorithm libraries.
  • Interactive analysis: DLA provides the serverless Presto engine for interactive analysis. This engine works with business intelligence (BI) tools and makes it easier for analysts to explore data.
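
The ETL step in the list above, which turns raw ODS records into structured DW rows, can be sketched in plain Python. This is an illustration of the cleansing idea only and does not use the DLA Spark engine or its API. The field names (`user_id`, `amount`, `ts`) and the functions `cleanse_record` and `ods_to_dw` are hypothetical.

```python
from datetime import datetime, timezone
from typing import Iterable, List, Optional


def cleanse_record(raw: dict) -> Optional[dict]:
    """Convert one raw ODS record into a typed DW row, or None if unusable."""
    # Hypothetical rule: a row without a user key cannot be used downstream.
    user_id = str(raw.get("user_id") or "").strip()
    if not user_id:
        return None
    try:
        amount = float(raw.get("amount", ""))
        ts = datetime.fromtimestamp(int(raw["ts"]), tz=timezone.utc)
    except (KeyError, TypeError, ValueError):
        return None  # malformed amount or timestamp
    return {
        "user_id": user_id,
        "amount": amount,
        "event_date": ts.date().isoformat(),
    }


def ods_to_dw(records: Iterable[dict]) -> List[dict]:
    """Cleanse all records and drop exact duplicates, keeping input order."""
    seen = set()
    rows = []
    for raw in records:
        row = cleanse_record(raw)
        if row is None:
            continue
        key = (row["user_id"], row["amount"], row["event_date"])
        if key in seen:
            continue
        seen.add(key)
        rows.append(row)
    return rows


if __name__ == "__main__":
    sample = [
        {"user_id": " u1 ", "amount": "12.50", "ts": "86400"},
        {"user_id": "", "amount": "1", "ts": "0"},       # missing key
        {"user_id": "u2", "amount": "abc", "ts": "0"},   # bad amount
    ]
    print(ods_to_dw(sample))
```

In a real job the same rules would typically be expressed as transformations in the Spark engine, so that they run in parallel over files stored in OSS.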

Federated query

  • Federated query: The serverless Presto engine of DLA can connect to about a dozen types of data sources and query data across them.
  • Lightweight data cleansing: The serverless Presto engine of DLA supports lightweight ETL that cleanses OSS data and writes the results into databases.
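
The two ideas above, joining tables that live in different sources and writing a cleansed result into a database, can be sketched locally. The example below is only an analogy: it uses Python's built-in `sqlite3` module, with two attached in-memory databases standing in for an OSS-backed schema and a relational database. It does not connect to DLA or Presto, and the schema names (`oss_lake`, `rds_db`), table names, and function name are hypothetical.

```python
import sqlite3


def run_federated_demo():
    """Join two stand-in sources and write an aggregate into one of them."""
    conn = sqlite3.connect(":memory:")
    # Each ATTACH creates a separate in-memory database (a stand-in source).
    conn.execute("ATTACH DATABASE ':memory:' AS oss_lake")
    conn.execute("ATTACH DATABASE ':memory:' AS rds_db")

    conn.execute(
        "CREATE TABLE oss_lake.orders (order_id INTEGER, user_id TEXT, amount REAL)"
    )
    conn.execute("CREATE TABLE rds_db.users (user_id TEXT, region TEXT)")
    conn.execute("CREATE TABLE rds_db.region_sales (region TEXT, total REAL)")

    conn.executemany(
        "INSERT INTO oss_lake.orders VALUES (?, ?, ?)",
        [(1, "u1", 10.0), (2, "u1", 5.0), (3, "u2", 7.5), (4, None, 3.0)],
    )
    conn.executemany(
        "INSERT INTO rds_db.users VALUES (?, ?)",
        [("u1", "east"), ("u2", "west")],
    )

    # One statement reads from both sources, filters out rows without a
    # user key (the "lightweight cleansing"), and writes into the database.
    conn.execute(
        """
        INSERT INTO rds_db.region_sales
        SELECT u.region, SUM(o.amount)
        FROM oss_lake.orders AS o
        JOIN rds_db.users AS u ON u.user_id = o.user_id
        WHERE o.user_id IS NOT NULL
        GROUP BY u.region
        """
    )
    conn.commit()
    rows = conn.execute(
        "SELECT region, total FROM rds_db.region_sales ORDER BY region"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run_federated_demo())
```

The point of the pattern is that a single `INSERT ... SELECT` statement spans both sources, so no intermediate export or separate loading step is needed.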