A cloud data warehouse is a new-generation data warehouse solution built on the cloud. Many enterprise managers want to know how to choose a cloud data warehouse that meets their enterprise's needs and which issues to consider during selection. This article draws on research reports from TDWI and Forrester to introduce the basis for selecting a cloud data warehouse.
The cloud data warehouse solution has changed the traditional methods of data platform construction. You can create and start using a data warehouse service within a few minutes without guidance from platform technical experts. Enterprise data analysts and other non-technical personnel can access and process large-scale data to quickly gain business insight. Enterprises can focus on business issues at a lower cost without worrying too much about complicated platform technologies. In addition, modern cloud data warehouse services can meet more analysis requirements, such as ETL for massive data, interactive query, machine learning, and unstructured data processing. More and more enterprises are considering using a cloud data warehouse to build their own data analysis platforms.
Forrester, an authoritative market research institution, defines a cloud data warehouse as a secure and scalable self-service data warehouse that is available on demand. The solution accelerates the data analysis process through automated deployment, management, optimization, backup, and recovery, and minimizes the requirements for technical support.
Therefore, many decision makers want to know how to choose a suitable cloud data warehouse and which key factors to consider. This document shares some best practices for using a cloud data warehouse, drawing on TDWI's research report.
In practice, data warehouses are gradually taking on more and more functions. These functions include fixed reports for managers, interactive exploration and analysis for analysts, and predictive analysis for data scientists. Different applications place different requirements on the system in terms of data access methods, computing models, and algorithm support.
An effective strategy is to treat the data warehouse as a single system that continuously supports mixed workloads, rather than one that meets only specific service needs. For example, for periodic reports, data needs to be cleaned and converted, and star or snowflake models are used to create data sets for reporting tools. Interactive queries must support parallel processing of massive data to enable low-latency data exploration. Predictive analysis must support different development languages and algorithm models, and be able to cope with iterative computing over massive data.
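As a minimal sketch of the reporting case, the following uses Python's built-in sqlite3 module as a stand-in for a warehouse engine. The table names (dim_product, fact_sales), the columns, and the rows are all hypothetical and chosen only to show how a star model joins a fact table to a dimension table to produce a report data set.

```python
import sqlite3

# In-memory database standing in for a warehouse; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        category   TEXT
    );
    CREATE TABLE fact_sales (
        sale_id    INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES dim_product(product_id),
        amount     REAL
    );
    INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
    INSERT INTO fact_sales VALUES (1, 1, 10.0), (2, 1, 15.0), (3, 2, 40.0);
    """
)

# The report data set: fact rows aggregated by a dimension attribute.
report = conn.execute(
    """
    SELECT d.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_id = f.product_id
    GROUP BY d.category
    ORDER BY d.category
    """
).fetchall()

for category, total in report:
    print(category, total)
```

A snowflake model would further split dim_product into normalized sub-dimensions; the join pattern stays the same.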
Cloud-based data warehouses meet such requirements. With the flexibility of services provided by cloud computing, users can focus more on analysis and results, rather than building systems.
In addition, projects deployed on the cloud usually require flexibility and agility, for example, self-service analysis within a short period of time, or even a prototype analysis system built to quickly verify a service concept. For such projects, a cloud-based data warehouse is especially beneficial, because you do not have to design, develop, and deploy platforms and data management frameworks. The solution can also reduce startup costs, accelerate analysis, and reduce or even eliminate maintenance costs.
Before an investment in a data warehouse yields benefits, we must consider its costs. However, most practitioners in the data industry are not aware of the components of the total cost over the lifecycle of a data warehouse system, which may include:
Different organizations have different tolerance for different types of costs. For mature services, organizations may be willing to invest in infrastructure and can predict that the benefits are higher than the startup costs. Small or startup enterprises may not have enough budget for regular costs and want to earn profits in a short period of time.
In this case, a cost model is needed to determine when it is necessary and valuable to use the cloud data warehouse. In some cases, an agile cloud data warehouse solution can shorten the time-to-market for your services and bring in business revenue earlier. The increased revenue may offset or even exceed the system investment.
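A minimal sketch of such a cost model follows. It assumes only two hypothetical cost types per option, a one-time startup cost and a recurring monthly cost, and ignores revenue effects. The function names and all figures are illustrative and are not taken from the TDWI or Forrester reports.

```python
def cumulative_cost(startup_cost, monthly_cost, months):
    """Total spend after the given number of months."""
    return startup_cost + monthly_cost * months


def break_even_month(on_prem, cloud, horizon_months=120):
    """Return the first month in which the cloud option's cumulative cost
    exceeds the on-premises option's, or None if that never happens
    within the horizon. Each option is a (startup_cost, monthly_cost) tuple.
    """
    for month in range(1, horizon_months + 1):
        if cumulative_cost(*cloud, month) > cumulative_cost(*on_prem, month):
            return month
    return None


# Hypothetical figures for illustration only.
on_prem = (500_000, 5_000)  # high startup cost, lower running cost
cloud = (10_000, 15_000)    # low startup cost, higher running cost

month = break_even_month(on_prem, cloud)
print(month)
```

With these made-up numbers, the cloud option stays at or below the on-premises cost through month 49 and becomes more expensive from month 50. A fuller model would also credit the revenue gained from a shorter time-to-market, which moves the break-even point later.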
Cloud-based data warehouses greatly simplify deployment. First, service vendors prepare the infrastructure and software in advance, so users do not have to worry about the complex underlying technical work. Second, users benefit from the tools that the service provider supplies for the entire data processing workflow, including data access, analysis, conversion, loading, reporting, and query. These tools and demos can simplify data development. Third, cloud data warehouse vendors provide value-added services by integrating rich functions, such as data management, visualization tools, and predictive analysis.
After eliminating tasks at the underlying infrastructure level, users can focus on data analysis. The standard data development and deployment processes include at least the following tasks:
Fortunately, cloud data warehouse service providers support these requirements. For example, they provide data integration tools for data access, support data processing and conversion through ETL tools or within the data warehouse itself, and provide job scheduling tools to orchestrate and periodically run data processing logic. Therefore, deploying cloud-based BI and analytics projects using standard processes greatly improves the flexibility of processing and the accessibility of analysis results.
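To illustrate what orchestrating data processing logic involves, here is a minimal sketch that orders tasks by their dependencies using Python's standard graphlib module (Python 3.9 or later). The task names are hypothetical, and a real scheduling service would add periodic triggers, retries, and monitoring on top of this ordering.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each task maps to the set of tasks it depends on.
pipeline = {
    "ingest_sources": set(),
    "clean_and_convert": {"ingest_sources"},
    "load_warehouse": {"clean_and_convert"},
    "refresh_reports": {"load_warehouse"},
}


def run_order(dependencies):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


order = run_order(pipeline)
for task in order:
    print(task)
```

Declaring dependencies rather than a fixed sequence lets a scheduler detect cycles and run independent tasks in parallel.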
Traditional business intelligence analysis is mature, but some cloud data warehouse vendors are rapidly integrating advanced analysis functions, including but not limited to:
In the past, you might have needed a separate advanced analysis and computing platform for these functions. Now, however, these functions are supported in the new data warehouse. For example:
Therefore, you need to find a cloud data warehouse service that supports richer computing functions to meet the current data analysis needs. In addition, service providers can constantly innovate designs to meet the needs of different users.
One risk of a hosted application is that the provider relies on deploying it in a virtualized environment. This may reduce customers' overall operating costs. However, the application may be redeployed on different infrastructure at any time and may share that infrastructure with other applications, whose execution may affect its performance.
For most organizations, if data users cannot get analysis results quickly, the adoption of data services and the success of the project will suffer. If your organization requires predictable performance, specify the performance requirements and acceptable levels, and evaluate the vendor's methods for guaranteeing or improving performance. You should ask the following questions:
Confirm with the service vendor that your performance requirements can be met.
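One way to make a performance requirement concrete is to state it as a latency percentile target and check measured query times against it. The sketch below assumes a hypothetical target (95% of queries within 2 seconds) and made-up latency samples, and uses the nearest-rank percentile definition.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_target(latencies_s, pct, target_s):
    """True if the pct-th percentile latency is within the target."""
    return percentile(latencies_s, pct) <= target_s


# Hypothetical query latencies in seconds, e.g. from a trial workload.
latencies = [0.8, 1.1, 0.9, 1.4, 1.0, 0.7, 5.2, 1.2, 0.95, 1.3]

print(percentile(latencies, 50), meets_target(latencies, 50, 2.0))
print(percentile(latencies, 95), meets_target(latencies, 95, 2.0))
```

In this made-up sample, a single slow query is enough to miss the 95th-percentile target even though the median is well within it, which is why averages alone are a weak basis for a performance agreement.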
If you are considering cloud-based BI and analysis, ensure that you can easily move the data to be analyzed into the cloud environment. Note the complexity of integrating various types of data sources. These types include flat file data, data in relational databases accessed through SQL, data managed in newer NoSQL environments, geospatial data, and multi-source heterogeneous data such as HDFS files on Hadoop.
To actively manage data source access and data integration, consider the following factors:
As data from different sources keeps growing, users need more sophisticated and efficient data integration solutions. When selecting a cloud data warehouse, look for a data integration service that offers data checking and discovery, compression, transmission, data preparation, and efficient data loading.
Another risk of using hosted or cloud-based data warehouses is data security. Ensuring access security and data protection is at risk for two reasons. First, in some cases, the multi-tenant architecture allows multiple customers' applications to run in the same environment, which creates a risk of data leaking across application boundaries. Second, storage on a virtual platform can be distributed across multiple physical machines, which may make users worry that residual data left behind during migration could be captured by another application.
Obviously, your enterprise must assess security and data privacy protection needs, and make sure that vendors can meet these needs. Cloud-based data warehouse vendors may provide the following methods:
The suggestions listed above help you determine whether a cloud data warehouse is suitable for your organization. Once you decide to move your data warehouse and BI applications to the cloud, make sure to select a suitable service vendor. In short, the criteria described here for evaluating cloud data warehouse services focus mainly on how cloud data warehouse products and services can help improve your BI and analysis projects, including:
Reduce the overall cost of development and operation
Once you have chosen a vendor, we recommend that you establish a good working relationship with that trusted cloud data warehouse vendor. This is very important for three reasons:
Cloud data warehouse vendors can organize their implementation experience to align with customers' short-, medium-, and long-term strategies.