×
Community Blog Deciphering Data to Uncover Hidden Insights – Data Modeling

Deciphering Data to Uncover Hidden Insights – Data Modeling

In this article series, we will be exploring data analytics for businesses using Alibaba Cloud QuickBI and sample data from banking and financial services.

By Ranjith Udayakumar, Alibaba Cloud Tech Share Author. Tech Share is Alibaba Cloud's incentive program to encourage the sharing of technical knowledge and best practices within the cloud community.

This multi-part article talks about how to collect data, wrangle the data, ingest the data, model the data, and visualize the data from three viewpoints (conceptual, practical, and best practice).

In the first article of this series, we have seen how to understand data conceptually through an example from the Banking, Financial services and Insurance (BFSI) domain. In the second article, we have learned how to wrangle the data (i.e. cleaning the data) according to your business scenario to Alibaba Cloud Quick BI.

In this article, I will share how to model the data according to your business scenario in Quick BI. We will be using Quick BI for this tutorial, so please ensure that you have registered for an Alibaba Cloud account. If you haven't, sign up for a free account through this link.

Model the Data (Conceptual)

A data model for one line of business is hardly appropriate for another line of business. Often times, the needs of different businesses, or even departments, do not closely align together. Because of this, understanding and analyzing how an organization should collect, update, and store data becomes a critical problem.

In the previous articles, we have seen how to collect, cleanse, and organize our data. However, we're still missing an essential part of big data analysis – data modeling. Data modeling has become a critical skill for the business analysts who are involved with discovering, analyzing, and specifying changes to how software systems create and maintain information. To better understand data modeling, let's quickly recap the concepts of OLTP and OLAP.

What Is OLTP?

OLTP stands for Online Transactional Processing and is designed to serve as a persistent state store for front-end applications. They can quickly look up transactional procedures like INSERT, UPDATE, or DELETE.

Some common tasks asked of OLTP systems include:

  1. Finding the age of the current user when given a user ID.
  2. Update the billing addresses for a set of clients.
  3. Delete the users who are not accessing the page more than 1 year.

These types of problems require a system that can look up and update one or more columns within one or many rows. The strength of OLTPs is that they support fast writes. A typical workload for OLTP is both frequent reads and writes, but the reads tend to be more of looking up a specific value rather than scanning all values to compute an aggregate.

Common OLTP systems are:

  1. MySQL
  2. PostgreSQL

Alibaba Cloud provides the ApsaraDB for RDS service for OLTP systems, including both MySQL and PostgreSQL databases.

What Is OLAP?

In contrast to an OLTP database, an OLAP database is designed to process large datasets quickly to answer questions about data.

Common use cases for an OLAP database are:

  1. What's the average cost spent on maintenance?
  2. What's the user's overall active hours?

An OLAP database is optimized for scanning and performing computations across many rows of data for one or multiple columns. Since OLAP is optimized for analyzing data, basic transactional procedures like writes or updates tend to be done in infrequent batches, typically once a day or an hour. OLAP shines when it comes to reads and analytical calculations like aggregation. Several well-known OLAP systems are:

  1. AnalyticDB
  2. Teradata

Alibaba Cloud provides OLAP services with HybridDB and DataWorks.

What is Data Modeling

The term data model refers to two very different but related concepts: a description of data structure and the way data are organized.

Types of Data Modeling

  1. Conceptual Data Models: High-level, static business structures and concepts.
  2. Logical Data Models (LDMs): Entity types, data attributes and relationships between entities.
  3. Physical Data Models (PDMs): The internal schema database design.

Data Modeling in Business Intelligence

Data modeling is the key to success in Business Intelligence (BI). It is paramount that the process is business-centered. Data modeling in BI is a way of mapping out all the different data sources and how these sources of data will fit together and flow into one another. We need to get the data from the places where a software or application stores the information into a centralized place (Data Warehouse) for querying and analyzing the data to derive business values.

What Is Data Warehousing

Data warehousing is the process of constructing and using a data warehouse. A data warehouse is constructed by integrating data from multiple heterogeneous sources that support analytical reporting, structured and/or ad hoc queries, and decision making.

Facts: A fact table consists of facts of a business process. A fact table record captures a measurement or a metric.

Dimensions: A dimension table contains the textual descriptor of the business. The fields of dimension table are designed to satisfy these two important requirements:

  1. Query constraining / grouping / filtering.
  2. Report labeling

For example, if we were to examine "Sales Revenue by Product", we would most probably see this structure: Sales Revenue (Fact) and Product (Dimension).

Schemas in Data Warehousing

Schema is a logical description of the entire database. It includes the name and description of records of all record types including all associated data-items and aggregates. Like a database, a data warehouse also requires maintaining a schema. A database uses relational model, while a data warehouse uses Star and Snowflake schema.

Star Schema: In Star Schema, the dimension is represented with only one-dimension table. Each dimension table is connected to only one Fact Table in star like structure.

1

Snowflake Schema: Unlike Star schema, the dimensions table in a snowflake schema are normalized. The normalization splits up the data into additional tables.

2

Model the Data (Practical)

Before entering the practical session of modelling the data, we need to know about the process flow of Quick BI.

The basic work flow of Quick BI is as follows:

  1. Create a data source.
  2. Create a dataset.
  3. Create a workbook, worksheet, or dashboard.
  4. Building a data portal.
  5. Subscribe for timely reports.

Note: Quick BI supports OLAP Data modelling, semantics such as dimensions (dates and locations), measures, star schema, and snowflake schema. Allows you to define calculated fields. The product allows you to edit the existing SQL script of the data source to customize your dimensions and measures.

UseCase-1: ATM Analytics

We will look at our first use case, ATM Dataset. We already ingested the data to the Quick BI. We are going to model the data and for that you need to save the data as dataset.

Procedure for creating data set from file:

  1. Select the Cube Icon
  2. Enter the name and specify the location
  3. Click Ok

    3

  4. Dataset is created

Modeling the data:

We are going to model the data by editing the dataset accordingly to meet the requirements of business scenario, such as joining the two tables or creating a column.

  1. Select the Dataset
  2. Click the Edit Icon

    4

  3. The dataset management page is displayed.

    5


Now we are going to model the data. I will explain about various things like changing the data type, creating a calculated column, and creating hierarchy.

Changing the Dimension or Measure Name:

  1. Click the Gear Icon
  2. Click Edit to Change the Name and to add notes

    6

    7

  3. Click Ok
  4. Name is Changed

Basic Functions:

  1. Click the Gear Icon

    8

  2. Duplicate Dimension or Measure (It will create a copy of the column)
  3. Hide in Analysis (It will hide the column in report interface)
  4. Cancel all hidden (It will remove all the hidden columns and make it available for reporting)
  5. Delete (It will delete the column)

Creating a Calculated Column (Dimension)

  1. Click the Gear Icon -> Create a Calculated Column or + icon

    9

  2. Search the function on the right-side pane for reference
  3. Name the Calculated column
  4. Write the expression (you can add column by simply double clicking it in the left side pane)
  5. Click ok

Joining the dataset:

Let's look at the ATM_Master Details

  1. Ingest the ATM Master Details data into Quick BI
  2. Click the Join Icon
  3. Select the Dataset Field From ATM
  4. Select the Other Dataset ATM_Master Details
  5. Select the Join on Field
  6. Click Ok

    10

    11


Changing the Dimension Type:
  1. Click the Gear Icon
  2. Select the Change Dimension Type
  3. Select Location
  4. Click the Country

    12

    13


Creating a Calculated Column (Measure):
  1. Click the Gear Icon -> Create a Calculated Column or + icon

    14

  2. Search the function on the right-side pane for reference
  3. Name the Calculated column
  4. Write the expression (you can add column by simply double clicking it in the left side pane)
  5. Click ok

    15


Creating the Hierarchy:
  1. Click the Gear Icon
  2. Select Create Level

    16

  3. Hierarchy Level is Created
  4. To add other column in hierarchy, Select the column Gear icon
  5. Click to Move to

    17

  6. Column is added to Hierarchy.

Changing the Aggregation Type:

  1. Click the Gear Icon
  2. Select Aggregations
  3. Select Any one from the list (e.g. Average)

    18


Changing the Number Format:
  1. Click the Gear Icon
  2. Select Number Format
  3. Select Any one from the list (e.g. #,###0)

    19


UseCase-2: Customer 360

We can apply the same concepts to our next use case, Customer360. The steps are similar, we just need to replace the ATM data to perform all the basic tasks for data modelling. With this understanding I think we can model the data to our business scenario.

Model the Data (Best Practices)


  1. Validate the data types.
  2. Create Calculated columns whenever necessary.
  3. Change the Locations as Location Type in QuickBI otherwise it will not detect it.
  4. Numbers should be in integer.
  5. Format string as per the business scenario.
  6. Format the date as per the business scenario.
  7. Star Schema is preferable than snowflake schema.
  8. Use custom SQL to model the data.

Conclusion

If you followed the steps correctly, you should have successfully modeled your data in Quick BI as per the demands of your businesses. In my opinion, this is the toughest part in the process of deciphering data to uncover hidden insights. We will explore how we can visualize our data and narrate data stories in future articles of this multi-part series.

0 0 0
Share on

Alibaba Clouder

2,605 posts | 747 followers

You may also like

Comments