Data IDE is Alibaba Cloud's platform product for big data. It provides one-stop big data development, data permission management, offline job scheduling, and other features. It is built on MaxCompute (originally ODPS), the massive-scale data computing engine developed in-house by Alibaba Cloud, and supports scenarios including offline processing, analysis, cloud data warehouse construction, and big data mining. It works out of the box, so you do not need to worry much about the cost and complexity of building and operating the underlying cluster.
Data IDE Kit introduces a new workflow-based job design and, compared with the previous version, offers the following features:
Drag-and-drop workflow interface
The Data Development module provides a rich set of visual components for job types including SQL (ODPS SQL), data synchronization, MR (ODPS MR), machine learning, and shell. Compared with open-source drag-and-drop workflow tools, it offers more convenient and flexible interaction.
Personalized data favorites and management
The Data Management module provides personalized data favorites and management. You can add data tables of interest to your favorites, manage a table's lifecycle, basic information, and owner, and view its storage, partition, output, and lineage information.
One-click job publishing across projects
Jobs can be quickly migrated and published between different projects under the same primary account. A dual-environment model lets customers simulate separate 'development' and 'production' environments, and other offline and online production models are also supported.
Visual job monitoring
The O&M Center provides a visual job monitoring and management tool that displays the overall job running status as a directed acyclic graph (DAG). Exception handling is also more convenient: operations such as “rerun”, “restore”, “suspend”, and “stop” are supported.
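To make the DAG view concrete, here is a minimal Python sketch of a workflow modeled as a dependency graph. It is an illustration only, not Data IDE's implementation or API: the job names and the helpers `run_order` and `downstream_of` are hypothetical. It shows how a valid run order follows from the dependencies, and which jobs would be affected if one job had to be rerun.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each job maps to the set of jobs it depends on.
workflow = {
    "sync_orders": set(),
    "clean_orders": {"sync_orders"},
    "daily_report": {"clean_orders"},
    "train_model": {"clean_orders"},
    "export_report": {"daily_report"},
}


def run_order(dag):
    """Return one valid order in which every job runs after its dependencies."""
    return list(TopologicalSorter(dag).static_order())


def downstream_of(dag, job_name):
    """Return job_name plus every job that depends on it, directly or indirectly."""
    affected = {job_name}
    changed = True
    while changed:
        changed = False
        for job, deps in dag.items():
            if job not in affected and deps & affected:
                affected.add(job)
                changed = True
    return affected


if __name__ == "__main__":
    print(run_order(workflow))
    print(sorted(downstream_of(workflow, "clean_orders")))
```

In this sketch, rerunning `clean_orders` would also touch `daily_report`, `train_model`, and `export_report`, which is the kind of dependency a DAG display makes visible at a glance.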
Data development usually goes through the following stages:
The figure above shows that the overall data development process includes data generation, data collection and storage, data analysis and computing, data extraction, and data presentation and sharing. All the stages framed by dotted lines can be completed on Alibaba Cloud Data IDE. Each stage is described below:
Data generation
A business system generates a large amount of structured data every day. The data is stored in the business system's database, such as MySQL, Oracle, or RDS.
Data collection and storage
Before you can use the massive data storage and processing capabilities of MaxCompute (originally ODPS) to analyze existing data, you must first synchronize the data from the different business systems to MaxCompute. The Alibaba Cloud Data IDE platform provides a data synchronization service that synchronizes the various types of data sources in the business systems to MaxCompute according to a predefined scheduling period.
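As a rough illustration of what a predefined scheduling period means, the Python sketch below lists the run times of a periodic synchronization job. Everything in it is an assumption made for illustration: the function names, the daily period, and the convention that each run loads the previous day's data are not taken from Data IDE.

```python
from datetime import datetime, timedelta


def scheduled_runs(first_run, period, count):
    """Return the first `count` run times of a job that repeats every `period`."""
    return [first_run + i * period for i in range(count)]


def business_date(run_time):
    """Date label of the data a run loads.

    Assumption: each run loads the previous day's data. This is a common
    convention for offline jobs, not something stated by the source.
    """
    return (run_time - timedelta(days=1)).strftime("%Y%m%d")


if __name__ == "__main__":
    # Hypothetical daily sync job that starts at 00:30 each day.
    for run in scheduled_runs(datetime(2024, 1, 1, 0, 30), timedelta(days=1), 3):
        print(run.isoformat(), "loads data for", business_date(run))
```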
Data analysis and processing
Once the data is in MaxCompute (originally ODPS), you can process it (ODPS SQL and ODPS MR) and analyze and mine it (data analysis and data mining) to discover its value.
Data extraction
The result data produced by analysis and processing is then synchronized back to the business systems so that business personnel can put it to use.
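The processing step is typically expressed as SQL aggregation over synchronized tables. The sketch below shows the general shape of such a query in Python, using the standard-library `sqlite3` module purely as a local stand-in so that the example runs anywhere. It does not connect to MaxCompute, the ODPS SQL dialect differs from SQLite's, and the `orders` table and its columns are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for a table synchronized from a business system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "east", 120.0), (2, "west", 80.0), (3, "east", 30.0)],
)

# Aggregate raw order rows into a per-region summary table-like result.
rows = conn.execute(
    """
    SELECT region, COUNT(*) AS order_count, SUM(amount) AS total_amount
    FROM orders
    GROUP BY region
    ORDER BY region
    """
).fetchall()

for region, order_count, total_amount in rows:
    print(region, order_count, total_amount)
```

The summary rows produced here correspond to the result data that the later stages export and present.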
Data presentation and sharing
The results of big data analysis and processing are presented and shared using reports and geographic information systems.