A Detailed Guide to Databricks Data Insight: An Enterprise-Grade, Fully Managed Spark Big Data Analytics Platform, with Case Studies - Alibaba Cloud Developer Community

Open-Source Big Data Community & Alibaba Cloud EMR live broadcast series, Phase 4

Subject: Databricks Data Insight, an enterprise-grade, fully managed Spark big data analytics platform, with case studies

Lecturer: Brown Ze, technical expert at Alibaba Cloud and head of the open platform and ecosystem enterprise team, Computing Platform Division

Content framework:

  • Databricks Data Insight
  • Features
  • Typical scenarios
  • Customer cases
  • Demo

Live replay: scan the QR code at the bottom of the article to join the DingTalk group and watch the replay.

1. Introduction to Databricks Data Insight

1. Databricks company profile
2. What is the Alibaba Cloud Databricks Data Insight product?

01\ Databricks Company Profile

(1) The company founded by the creators of Apache Spark, the largest code contributor to Spark, and the commercial company behind the Spark technology ecosystem.

Databricks was founded in 2013 by the team that created Apache Spark at the AMPLab, University of California, Berkeley.

(2) Core products and technologies that lead and promote the Spark open-source ecosystem

Apache Spark, Delta Lake, Koalas, MLflow, and the One Lakehouse Platform

(3) Company positioning
Market position

02\ Databricks company valuation and financing history

(Source: Databricks official website)
① Series F in October 2019, valued at $6.2 billion
② Series G in early February 2021, valued at $28 billion
  • In this round, three cloud service providers (AWS, GCP, and Microsoft Azure) and Salesforce all participated, which shows how much importance cloud vendors attach to the development of Databricks.
  • Listing expectations: an IPO was planned for 2021. According to various predictions, Databricks could be valued at US$35 billion, or even as high as US$50 billion, when it goes public.

03\ A high-quality Spark big data analytics platform built by Databricks and Alibaba Cloud

  • Databricks is the commercial company behind Apache Spark, founded by Spark's original creators, and a US technology unicorn.
  • It has more than 5,000 customers and 450 partners worldwide, and strong brand awareness.
  • In 2020, Gartner placed it in the Leaders quadrant of its Magic Quadrant for Data Science and Machine Learning (DSML) Platforms.

04\ Databricks + Alibaba Cloud = Databricks Data Insight

Core products:
Product engines and services:
  • 100% compatible with open-source Spark, with performance optimized through joint research and development by Alibaba Cloud and Databricks
  • Commercial SLA guarantees and 24/7 support from Databricks experts

Core components of DDI product capabilities

Key product information and advantages

2. Features of DDI

1. Overall architecture
2. Engine capabilities
3. Performance
4. Functions
5. Cost

01\ Alibaba Cloud Databricks Data Insight (DDI) architecture

02\ Engine: enterprise-grade performance optimization that improves compute engine efficiency and data read/write efficiency

Enterprise-grade high performance, stability, and reliability

03\ Enterprise Databricks Runtime vs. community open-source Spark

04\ Compute-storage separation architecture: HDFS vs. OSS cost comparison

05\ Accelerating OSS access with JindoFS to optimize data access performance

06\ Interactive analysis with Notebook, gathering data

Optimized Apache Zeppelin

07\ Data development: job submission and workflow scheduling

  • Supports JAR package submission and job scheduling.
  • Supports Spark, Spark Streaming, and Notebook jobs.
  • Supports mixed scheduling of workflows that combine different job types.
  • Supports scheduling, O&M, audit logs, and version control.
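
The mixed scheduling of job types described above amounts to running jobs in an order that respects their dependencies. The following is a minimal sketch of that idea using only the Python standard library (Python 3.9 or later). It is not the DDI scheduler or its API: the workflow, job names, and job-type labels are hypothetical and exist only for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: job name -> (job type, upstream job names).
# The job types mirror the ones listed above; the names are made up.
WORKFLOW = {
    "import_logs": ("spark", []),
    "monitor_events": ("spark_streaming", []),
    "aggregate_daily": ("spark", ["import_logs"]),
    "explore_results": ("notebook", ["aggregate_daily"]),
    "export_report": ("spark", ["aggregate_daily", "monitor_events"]),
}


def execution_order(workflow):
    """Return job names in an order where every job follows its upstream jobs."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {name: set(deps) for name, (_job_type, deps) in workflow.items()}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for name in execution_order(WORKFLOW):
        job_type = WORKFLOW[name][0]
        print(f"{job_type:>16}  {name}")
```

A real scheduler would also handle retries, timing, and parallel execution of independent jobs; the sketch only shows how a DAG of mixed job types resolves into a valid run order.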

08\ Rich data source support

09\ Metadata management

Three metadata options

3. Typical scenarios

1. Customer pain points and how DDI solves them
2. From the Lambda architecture to a unified batch and stream architecture
3. Evolution of the Lakehouse architecture
4. Combining DDI with Alibaba Cloud products

01\ Common pain points of open-source big data platform customers

02\ Databricks Data Insight helps customers improve production efficiency in four major scenarios

03\ Delta Lake: project background and the problems it solves

04\ Big data development enters the Lakehouse era

05\ Using DDI to build a unified batch and stream data warehouse that simplifies complex architectures

06\ Combining DDI with Alibaba Cloud products

07\ Typical architecture of Databricks Data Insight

Deep integration of DDI with Alibaba Cloud products (typical scenarios)

Data acquisition
Data ETL
BI report data analysis and interactive analysis
  • Supports ad hoc queries, Notebook visual analysis, and seamless integration with multiple BI analysis tools.
AI data exploration
Connecting upstream and downstream systems

4. Customer cases in typical scenarios

1. Jizhi Technology (STEPONE): migrating a self-built platform to the cloud
2. A leading industrial manufacturing company: data analysis

Customer case 01: Jizhi Technology (STEPONE) migrates to the cloud (Databricks)

This architecture shows how Databricks Data Insight is used to solve big data computing problems:

Customer cost-benefit analysis

  • Fully managed Spark clusters require no O&M, which saves labor costs (one O&M engineer plus one big data engineer, and performance tuning).
  • Compared with the self-built cluster, resources are three times greater. Combined with the estimated threefold advantage of Databricks Runtime over open-source Spark, overall performance improves about ninefold.
  • Notebook interactive analysis and DAG workflow scheduling improve the data development and analysis experience.
  • A unified technical solution with compute-storage separation: OSS storage reduces the customer's storage costs and paves the way for future data lake and multi-engine computing architectures.
  • Delta Lake solves the customer's incremental data update problem.
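
The ninefold figure above follows from multiplying the two threefold factors. The short sketch below makes that arithmetic explicit. It assumes the two gains are independent and that performance scales linearly with resources, which is a simplification; the 90-minute baseline is a hypothetical number used only to show the calculation.

```python
def overall_speedup(resource_factor: float, engine_factor: float) -> float:
    """Combine two speedups, assuming they are independent and multiply."""
    return resource_factor * engine_factor


def estimated_runtime(baseline_minutes: float,
                      resource_factor: float,
                      engine_factor: float) -> float:
    """Estimate the new runtime of a job under the combined speedup."""
    return baseline_minutes / overall_speedup(resource_factor, engine_factor)


if __name__ == "__main__":
    # 3x the resources and an estimated 3x from the runtime, as in the case above.
    speedup = overall_speedup(3, 3)
    # Hypothetical 90-minute job on the self-built cluster.
    minutes = estimated_runtime(90, 3, 3)
    print(f"combined speedup: {speedup}x, estimated runtime: {minutes} minutes")
```

In practice, gains rarely combine this cleanly, so the result is best read as an upper-bound estimate rather than a measured benchmark.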

Customer case 02: a leading air conditioning company in industrial manufacturing, big data analysis solution architecture

  • Data collection/storage: receives real-time streaming data and bulk data from external cloud storage.
  • Data ETL: continuously and efficiently processes incremental data, supports data rollback and deletion, and provides ACID transaction guarantees.
  • BI data analysis and interactive analysis: supports ad hoc queries, Notebook visual analysis, and seamless integration with multiple BI analysis tools.
  • Data science: supports machine learning and deep learning.
  • Upstream and downstream connections: for example, the upstream connects to Kafka, OSS, and EMR HDFS, and the downstream connects to Elasticsearch, RDS, and OSS.
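
The Data ETL item above mentions incremental processing, deletion, and rollback. In Delta Lake these are commonly expressed as MERGE INTO, DELETE FROM, and RESTORE statements. The sketch below only builds the SQL text, so it runs without a cluster. The table names, key column, predicate, and version number are hypothetical, and whether RESTORE is available depends on the Delta Lake or Databricks Runtime version in use, which the article does not state.

```python
def merge_sql(target: str, source: str, key: str) -> str:
    """Build a Delta Lake MERGE statement that upserts `source` into `target`."""
    return (
        f"MERGE INTO {target} AS t USING {source} AS s "
        f"ON t.{key} = s.{key} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )


def delete_sql(target: str, predicate: str) -> str:
    """Build a DELETE statement that removes rows matching `predicate`."""
    return f"DELETE FROM {target} WHERE {predicate}"


def restore_sql(target: str, version: int) -> str:
    """Build a RESTORE statement that rolls the table back to an earlier version."""
    return f"RESTORE TABLE {target} TO VERSION AS OF {int(version)}"


if __name__ == "__main__":
    # Hypothetical table and column names, for illustration only.
    print(merge_sql("orders", "orders_updates", "order_id"))
    print(delete_sql("orders", "status = 'cancelled'"))
    print(restore_sql("orders", 3))
```

With a Spark session that has Delta Lake enabled, each generated statement could be passed to `spark.sql(...)`. The statement builders do no escaping, so they should only be given trusted identifiers.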

For more content, scan the DingTalk group QR code at the bottom of the article and join the group to watch the live replay.

For more information about Databricks Data Insight, visit the following link or read the full product details page:

https://www.aliyun.com/product/bigdata/spark (a first-purchase trial is currently available for RMB 599. You are welcome to try it out!)

Alibaba's open-source big data technology team has established the Apache Spark China technology community. It regularly publishes case studies and live broadcasts by technical experts, with a focus purely on Spark. You are welcome to follow the public account!

