Hot Data vs. Cold Data: Why It Matters?

In this article, we will introduce the concepts of hot and cold data, and will discuss the importance of dealing with them appropriately when it comes to storage and migration.

By Afzaal Ahmad Zeeshan, Alibaba Cloud Community Blog author and Alibaba Cloud MVP.

Thankfully, the days are long gone when you had to spend heavily to keep data on a data server even when you rarely used it. With rapid technical improvement in almost every field, the methods of storing, accessing and transferring information have changed, and so have the ways of collecting, storing and protecting data. In this article, I will first cover the basic concepts of multi-temperature data storage (is that what we call it from now on?), and then steer the discussion towards Alibaba Cloud Object Storage Service (OSS) and how its storage classes cater to the requirements of hot and cold data without incurring excessive cost, because it is imperative to an organization's bottom line that it does not pay for more than it consumes.

One thing I notice is that the terms "hot" and "cold" are often misunderstood. To better understand the term "hot", think of it as a viral or trending subject: everybody wants to visit it again and again.

And yes, that brings us to the term "cold", which is basically just something that is not hot!

Classification of Data: Hot vs. Warm vs. Cold

To begin with, let's talk about how data and its storage have been categorized, metaphorically, by access priority. Data that is accessed most frequently, and is therefore stored closest to where it is accessed (for example on solid-state or flash drives, near the CPU), is called hot data. Data that is accessed comparatively less often is termed warm data, and data that is rarely or never accessed and is placed on the slowest storage medium is termed cold data.

Some real-world examples are:

  1. Product catalogue, online transactions and video streaming.
  2. Weekly, monthly or yearly reports and newsletters, stats charts etc.
  3. Old projects and receipts for financial audits or anything else which is of some value but will not be accessed frequently.

These are examples of hot, warm and cold data respectively. In general usage, however, warm data is often lumped in with either hot or cold data, depending on how it is used. I will show that this is not the case on Alibaba Cloud, which provides three tiers of data storage and helps us expand the concept of hot vs. warm vs. cold data.

The major cloud storage vendors have tailored their storage plans to the hot or cold nature of data; Alibaba Cloud does so with Object Storage Service (OSS). Other cloud vendors such as Microsoft Azure, AWS and Google Cloud Platform have their own services, under their own names, that support these features.

Hot data requires premium storage and cloud resources because it is resource-intensive and in high demand. Moreover, business-oriented organizations cannot tolerate delays in answering their users' queries, so this data needs to be in the hottest tier, on solid-state drives, to sustain a high transaction rate.

Similarly, cloud storage also offers cheap (old, magnetic) hard disk drives that save a lot on cost but carry a performance penalty. Customers can purchase this service to have their data backed up and stored. In my previous article, I discussed backups and storage media for database backups; if you remember, we looked at Alibaba Cloud OSS as a platform for data storage. If we only want to keep our data backups for a couple of weeks, then the coldest storage class might not be a good option; see the "Cold Storage, Gotcha!" section at the bottom of this post.
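As a rough illustration of the three temperatures, the sketch below labels data by how long it has gone without being accessed. The thresholds (7 and 90 days) and the function name `temperature` are assumptions made up for this example; they are not defined by Alibaba Cloud or any other vendor.

```python
from datetime import date

# Hypothetical thresholds, chosen only for illustration.
HOT_MAX_IDLE_DAYS = 7     # accessed within the last week
WARM_MAX_IDLE_DAYS = 90   # accessed within the last quarter


def temperature(last_accessed: date, today: date) -> str:
    """Label data as hot, warm or cold from the days since its last access."""
    idle_days = (today - last_accessed).days
    if idle_days <= HOT_MAX_IDLE_DAYS:
        return "hot"
    if idle_days <= WARM_MAX_IDLE_DAYS:
        return "warm"
    return "cold"


today = date(2024, 6, 30)
examples = {
    "product catalogue": date(2024, 6, 29),
    "monthly report": date(2024, 5, 31),
    "old project receipts": date(2022, 1, 15),
}
for name, last_accessed in examples.items():
    print(name, "->", temperature(last_accessed, today))
```

In practice the cut-off points depend on your own access patterns and on what each storage tier costs, so treat the numbers as placeholders.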

Alibaba Cloud Object Storage Service

Almost every new software application needs to write and read data. Every business application, small or large, requires data, whether to render user information or to make a critical financial transaction. Even if we set aside the hot vs. cold categories, the data itself still takes different forms, such as:

  1. Blob (unstructured) data
  2. JSON documents
  3. Multimedia files

As the application becomes stable, the data requirements grow rapidly: complex views and queries, data size, backup strategies, scalability, consistency and latency. Over time, these expanding requirements have pushed cloud storage providers to take over, as teams want to focus on core business growth rather than on managing data and its issues, such as replication, high availability, caching and pricing.

Of the several data services provided by Alibaba Cloud, OSS (Object Storage Service) is one of the most highly recommended. It offers a wide range of features for dealing with your data: free data upload, reliable storage, region-based backups, and cost-effective migration. The best part? You can always configure how your data is stored and how you are charged for it.

My home page for Alibaba Cloud OSS shows three buckets. I use one of them to store data that does not have to be accessed too frequently. Another bucket contains most of the content that I host on the cloud. Finally, I have a static website, developed using React, that is hosted on Alibaba Cloud OSS.

All three buckets have their own configurations; their pricing models and their replication or high-availability settings differ, and each was chosen for that bucket's use case.

When you create a bucket, you can choose a hot storage class, and the bucket is then created with resources that provide the highest performance on access. Similarly, during creation you can mark the bucket as an Archive storage bucket, and it will be configured accordingly.

Alibaba Cloud OSS also provides a basic explanation of the storage class you are choosing, which can help you decide between the storage classes.

OSS Classification Catering to Cold and Hot Data Scenarios

OSS offers three main storage classes, designed to meet the varying requirements of cold and hot data:

Standard

The Standard storage class promises almost zero delay and high reliability, with 99.9999999999% (twelve 9s) data durability. It is a highly performant storage plan, recommended for frequently accessed data, that is, for hot-data scenarios such as online banking applications and online picture and video editing or sharing platforms. More importantly, you should review the key features of OSS Standard storage and check whether your hot data requirements match them.

  • Standard storage is highly recommended for storing large volumes of frequently accessed data: social networking platforms, media processing, and big data reporting and analytics.
  • The Standard class supports various data redundancy mechanisms. To maximize availability, it stores data on multiple devices, in the same or in different regions.
  • Moreover, OSS provides a zone-redundant storage mechanism that keeps multiple copies of data across different zones within the same region, so that data stays available if hardware fails or crashes in one zone.
  • There is no minimum billable object size, and there is no minimum storage period for which you are bound to keep data there.

Infrequent Access

As the name implies, Infrequent Access (IA) is designed for comparatively less frequently accessed data, such as long-lived data records, and can therefore be considered a partial match for warm data storage. The IA plan is cheaper than the Standard one. It provides real-time data access, but data retrieval is charged according to the size of the data retrieved. Also, in the IA plan the minimum billable object size is 64 KB, meaning that you pay for 64 KB even if the object's actual size is smaller, which is not the case in the Standard plan.
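To see what the 64 KB minimum means for small objects, here is a minimal sketch of the billable-size arithmetic. The function name `billable_bytes` is hypothetical, and the sketch only models the size that is billed, not actual prices.

```python
KB = 1024

# Minimum billable object size for the IA class, as described above.
IA_MIN_BILLABLE_BYTES = 64 * KB


def billable_bytes(object_size: int, min_billable: int = 0) -> int:
    """Return the size an object is billed at, given a per-object minimum."""
    return max(object_size, min_billable)


small_object = 10 * KB
object_count = 1000

standard_total = object_count * billable_bytes(small_object)
ia_total = object_count * billable_bytes(small_object, IA_MIN_BILLABLE_BYTES)

print(standard_total // KB, "KB billed in Standard")  # 10000 KB
print(ia_total // KB, "KB billed in IA")              # 64000 KB
```

A thousand 10 KB objects are therefore billed as 64,000 KB in IA against 10,000 KB in Standard, so a lower unit price does not automatically make IA cheaper when the objects are tiny.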

Archive

To meet the requirements of cold data, which is unlikely to be accessed for days or weeks at a time, OSS offers Archive storage. In practice, some delay should be expected, because data must be restored before it can be read. There are a few conditions on storing data in Archive storage:

  • The minimum storage period for objects in the Archive class is two months.
  • The minimum billable size of an archived object is 64 KB, which means that smaller objects are charged as if they were 64 KB.
  • The highly efficient Image Processing service (IMG) is supported, but about one minute of restore time is required before an archived object can be used.

Storage Migration from Cold to Hot and in Reverse

You cannot change the storage class of a bucket once the bucket has been created. Objects, however, can be converted among all three storage classes as the data's requirements change (hot to cold or the contrary), and this is a potent feature of OSS. The transition is part of OSS lifecycle management. Currently, it supports the following conversions automatically:

From Standard to Infrequent Access or Archive. This is the most common conversion: data must be frequently accessible for a period and then needs to move to cold storage until, say, the same event comes around next year.

From Infrequent Access to Archive. This covers a slightly different situation, in which data accessibility changes from infrequent access to no access for quite a long period. Note that after the conversion, pricing is calculated according to the new storage class.

Check out the normal flow of this conversion here.
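The automatic conversions listed above can be modelled as a small lookup table. This is only a sketch of the rules as this article describes them; the class labels and the function names `can_convert` and `validate_path` are made up for the example and are not OSS SDK identifiers.

```python
# Automatic conversions this article lists for OSS lifecycle management.
# The labels are plain strings for this sketch, not SDK constants.
ALLOWED_CONVERSIONS = {
    "Standard": {"IA", "Archive"},
    "IA": {"Archive"},
    "Archive": set(),
}


def can_convert(source: str, target: str) -> bool:
    """True if a lifecycle rule may move an object from source to target."""
    return target in ALLOWED_CONVERSIONS.get(source, set())


def validate_path(path: list) -> bool:
    """Check every consecutive step in a planned sequence of storage classes."""
    return all(can_convert(a, b) for a, b in zip(path, path[1:]))


print(can_convert("Standard", "Archive"))            # True
print(can_convert("Archive", "Standard"))            # False
print(validate_path(["Standard", "IA", "Archive"]))  # True
```

A check like this is handy before you write lifecycle rules, because it catches a plan that tries to move data from a colder class back to a hotter one automatically.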

Cold Storage, Gotcha!

Cold storage classes often come with a minimum storage duration. On the documentation page, you can find the minimum number of days an object must be stored before you can delete it without an extra charge. For the Infrequent Access (IA) class the minimum is 30 days, and for the Archive class it is 60 days. If you delete objects before then, you will be charged a fee.

One more thing to note is that there is no caching for your data in the cold storage classes. Each request has to fetch the object as if it were being accessed for the first time, which is a huge performance loss.

Finally, you cannot change the storage class once the bucket has been created. Fear not, you can always copy the data from one bucket to another; but remember, there might be an early deletion penalty.
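A quick way to check whether a deletion would fall inside the minimum storage period is sketched below. It only counts the days short of the minimum; how the fee itself is computed is not covered here, and the function name `days_short_of_minimum` is an assumption for this example.

```python
# Minimum storage periods in days. Standard has no minimum.
MIN_STORAGE_DAYS = {"Standard": 0, "IA": 30, "Archive": 60}


def days_short_of_minimum(storage_class: str, days_stored: int) -> int:
    """Days still missing from the minimum period if the object is deleted now."""
    return max(0, MIN_STORAGE_DAYS[storage_class] - days_stored)


# A backup kept for only two weeks, as in the backup scenario earlier.
for storage_class in ("Standard", "IA", "Archive"):
    shortfall = days_short_of_minimum(storage_class, days_stored=14)
    print(storage_class, "->", shortfall, "days short")
```

For a two-week backup this reports 0 days for Standard, 16 for IA and 46 for Archive, which is why the coldest class is a poor fit for short-lived backups.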
