Hadoop, an open-source big data technology, turns ten this year. In this first decade of big data, Hadoop has made big data one of the most promising technologies. The trend has not only shaped the direction of information technology but has also become a hot topic in business.
There are two reasons for this. On the one hand, with the spread of the Internet, cloud computing, and smart mobile devices, the user bases of large Internet companies such as Google, Facebook, and Twitter have grown explosively. To cope with users on a global scale, these well-known Internet companies invested in big data technology one after another, making big data a benchmark of cutting-edge technology and, almost overnight, the hottest thing in the industry.
On the other hand, these Internet companies not only adopt open-source big data technologies such as Hadoop, but also hire software experts to build big data technologies that meet their own needs, and then open-source the code. Doing so attracts more experts to join the development and also gives back to the developer community. This has kept the development of open-source big data technology in full swing: there are now more than 100 open-source projects related to Hadoop, forming a huge Hadoop ecosystem.
How will big data develop in its second decade? In the middle of this year, I attended Strata & Hadoop World, a well-known technology conference in the big data field. Because the conference carries Hadoop in its name, I had expected its main topics to focus on open-source big data technology in the Hadoop ecosystem. However, in the keynote speeches over the two days, international giants such as Google and Microsoft and major Chinese Internet companies such as Baidu, Alibaba, Ant Financial, and Xiaomi were all talking about artificial intelligence (AI), machine learning (ML), and deep learning (DL).
A large share of the afternoon sessions on those two days still covered topics such as real-time big data analysis and stream computing. But the main event of the conference, the keynotes in which technology companies usually show their strength, was uniformly about AI, ML, DL, and related technologies.
In the eyes of these big data technology leaders, artificial intelligence, machine learning, and deep learning are clearly the next step for big data, and the battleground for its second decade.
Does that mean big data no longer matters in its second decade? In fact, these technologies remain important, but development over the next decade will be AI-oriented big data. We can see this in what is happening at AMPLab, the UC Berkeley lab that gave birth to Spark and Mesos.
In the second half of big data's first golden decade, Spark, which became popular with its micro-batch stream computing technology, led the trend. Yet AMPLab at UC Berkeley, the birthplace of Spark, will shut down at the end of 2016. AMPLab is led by two professors with software startup experience. Over its current six-year program, it has driven many open-source research projects, the best known of which are the distributed resource management system Mesos, the stream computing platform Spark, and the distributed in-memory storage system Alluxio (formerly known as Tachyon), each of which holds a leading position in its field.
Given such important research contributions, why is AMPLab closing? By tradition, a Berkeley lab usually devotes five to six years of research to solving one important problem. AMPLab has now completed its mission for this phase: creating open-source big data analysis technology. Next, the team will use the newly established RISELab to tackle the next problem in big data.
What is the new problem? The lab's name offers a clue: Real-time Intelligent Secure Execution. In the previous phase, AMPLab progressed from batch processing of big data to big data analysis. In the next phase, RISELab needs to solve the problem of real-time data processing and develop applications for real-time decision-making. Its goal is a new generation of big data technology with response times 100 times faster than Spark's and 1,000 times its throughput, combined with online machine learning and more automated algorithms, so that decisions can be made in real time from live data without sacrificing efficiency, while ensuring data encryption, security, and privacy.
No one knows whether RISELab can achieve its goals over the next six years. What is certain is that the next step for big data is a new generation of AI-oriented big data.