Data Lake Analytics (DLA) Ganos is a data engine for storing and computing spatio-temporal big data. It is built on the cloud-native DLA system. By using the serverless DLA service and its built-in Spark computing engine, DLA Ganos can access a variety of Alibaba Cloud storage systems, such as PolarDB, Lindorm (HBase Enhanced Edition), and Object Storage Service (OSS). DLA Ganos integrates the management and computing of multi-source heterogeneous data by using unified spatio-temporal data models and APIs. It also supports complex operations such as association analysis across heterogeneous data sources. The serverless architecture of DLA allows you to use DLA Ganos on demand, so you are charged only for the queries that you run. This architecture also enables convenient resource scaling, allows upgrades without service interruption, and significantly reduces operational costs. For more information about DLA, see What is Data Lake Analytics?
In typical scenarios, DLA Ganos performs Extract, Transform, and Load (ETL) operations to transfer and collaboratively analyze spatio-temporal data that is stored in different databases or file systems. For example, DLA Ganos loads GeoTIFF files from OSS to generate a Resilient Distributed Dataset (RDD) model, and then writes the data to a storage system such as Lindorm (HBase Enhanced Edition). DLA Ganos can also load spatio-temporal data from multiple data sources at the same time for data cleansing and conversion. After the data is analyzed and computed by using machine learning algorithms or tools, DLA Ganos writes the analysis results to a data source. A professional spatio-temporal data publishing system such as GeoServer can then publish the results as a standard Open Geospatial Consortium (OGC) service for you to query and view. The following figure shows this process.
DLA Ganos adopts the serverless architecture of DLA, so you can use it without bearing infrastructure and management costs. You do not need to maintain Spark instances separately. You only need to apply for virtual clusters, use them as required, and pay on demand. DLA Ganos is ready to use immediately after it is activated and is upgraded without service interruption. It also supports elastic scaling to ensure quality of service.
- Database-level experience
DLA Ganos provides a series of APIs based on Spark SQL, together with a large number of built-in spatio-temporal operators that are implemented as user-defined functions (UDFs). This allows you to process large amounts of spatio-temporal data by using SQL statements, in much the same way as you operate on a relational database.
- Unified modeling of spatio-temporal data
DLA Ganos provides a unified spatio-temporal data model based on Spark RDD to facilitate the modeling of various types of spatio-temporal data. DLA Ganos handles complex operations such as data loading and model conversion, so you only need to focus on the business logic.
- Heterogeneous data sources
DLA Ganos can access and analyze data from multiple heterogeneous data sources. It allows you to analyze data stored in Alibaba Cloud OSS, PolarDB, and Lindorm (HBase Enhanced Edition), and to perform association analysis between data in DLA and data in other data sources.
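The database-level experience and association analysis described above can be illustrated with a small, runnable sketch. It does not use DLA Ganos or Spark. Instead, it uses SQLite from the Python standard library as a stand-in SQL engine, so it shows only the general pattern: a spatial operator is registered as a UDF and then used in a SQL join between two tables that could originate from different data sources. The function name `st_within_bbox`, the table names, and the sample coordinates are all hypothetical and are not part of the DLA Ganos API.

```python
import sqlite3


def st_within_bbox(x, y, min_x, min_y, max_x, max_y):
    """Hypothetical spatial operator: 1 if point (x, y) is inside the box, else 0."""
    return int(min_x <= x <= max_x and min_y <= y <= max_y)


def build_demo_connection():
    """Create an in-memory database with two illustrative tables."""
    conn = sqlite3.connect(":memory:")
    # Register the Python function as a SQL function (the UDF).
    conn.create_function("st_within_bbox", 6, st_within_bbox)
    # Stand-in for point data loaded from one source (for example, files).
    conn.execute("CREATE TABLE vehicle_points (vehicle_id TEXT, x REAL, y REAL)")
    # Stand-in for region data loaded from another source (for example, a database).
    conn.execute(
        "CREATE TABLE districts "
        "(name TEXT, min_x REAL, min_y REAL, max_x REAL, max_y REAL)"
    )
    conn.executemany(
        "INSERT INTO vehicle_points VALUES (?, ?, ?)",
        [("v1", 120.10, 30.25), ("v2", 121.47, 31.23), ("v3", 116.40, 39.90)],
    )
    conn.executemany(
        "INSERT INTO districts VALUES (?, ?, ?, ?, ?)",
        [("east", 120.0, 30.0, 122.0, 32.0), ("north", 116.0, 39.0, 117.0, 40.5)],
    )
    return conn


def points_per_district(conn):
    """Join the two tables with the spatial UDF and count points per district."""
    return conn.execute(
        """
        SELECT d.name, COUNT(*)
        FROM vehicle_points AS p
        JOIN districts AS d
          ON st_within_bbox(p.x, p.y, d.min_x, d.min_y, d.max_x, d.max_y) = 1
        GROUP BY d.name
        ORDER BY d.name
        """
    ).fetchall()


if __name__ == "__main__":
    for name, count in points_per_district(build_demo_connection()):
        print(name, count)
```

In DLA Ganos, the corresponding operators are built in and run on Spark SQL, so you would not register them yourself. Refer to the DLA Ganos operator reference for the actual function names and signatures.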
Activate DLA Ganos
- Create a virtual cluster. For more information, see Virtual cluster management.
- On the Virtual Cluster Management page, find the created virtual cluster and click Details in the Actions column.
- In the Cluster Attributes section of the cluster details page, select spark_dla_ganos from the Version drop-down list to activate DLA Ganos.