Alluxio is an open source data orchestration technology for data analytics and AI workloads in the cloud. It sits between data-driven applications and storage systems and brings data from the storage tier closer to the applications that consume it. This simplifies data access and lets applications connect to multiple storage systems through a common interface.

Background information

In the big data ecosystem, Alluxio sits between data-driven applications or frameworks, such as Apache Spark, Presto, TensorFlow, Apache Hive, and Apache Flink, and persistent storage systems, such as Hadoop Distributed File System (HDFS) and Alibaba Cloud Object Storage Service (OSS). Alluxio provides a unified client API and a global namespace, so upper-layer computing applications and frameworks can access data in any of these persistent storage systems in the same way.
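The idea of a global namespace can be illustrated with a small sketch. The code below is a toy model, not the Alluxio client API: the class ToyNamespace, its mount and resolve methods, and the example URIs (hdfs://namenode:8020/warehouse, oss://example-bucket/datasets) are all hypothetical names chosen for illustration. It only shows how one logical path tree might map onto several backing storage systems by matching the longest mount point.

```python
# Illustrative only: a toy mount table, NOT the Alluxio client API.
# All names and URIs below are hypothetical.
from dataclasses import dataclass, field


@dataclass
class ToyNamespace:
    """Maps paths in one logical namespace to URIs in backing stores."""

    mounts: dict = field(default_factory=dict)

    def mount(self, mount_point: str, backing_uri: str) -> None:
        # Normalise so that "/oss/" and "/oss" are treated alike.
        key = mount_point.rstrip("/") or "/"
        self.mounts[key] = backing_uri.rstrip("/")

    def resolve(self, path: str) -> str:
        # Collect every mount point that covers the path.
        candidates = [
            m for m in self.mounts
            if path == m or path.startswith(m.rstrip("/") + "/")
        ]
        if not candidates:
            raise KeyError(f"no mount covers {path}")
        # The longest matching mount point wins.
        best = max(candidates, key=len)
        suffix = path if best == "/" else path[len(best):]
        return self.mounts[best] + suffix


if __name__ == "__main__":
    ns = ToyNamespace()
    ns.mount("/", "hdfs://namenode:8020/warehouse")
    ns.mount("/oss", "oss://example-bucket/datasets")

    # The application uses one path scheme for both storage systems.
    print(ns.resolve("/tables/orders"))
    # hdfs://namenode:8020/warehouse/tables/orders
    print(ns.resolve("/oss/images/cat.jpg"))
    # oss://example-bucket/datasets/images/cat.jpg
```

In this sketch the application only ever sees paths such as /tables/orders or /oss/images/cat.jpg, and the mapping to HDFS or OSS stays in one place. A real deployment would add caching, metadata handling, and the actual I/O, none of which is modelled here.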

Benefits

  • Provides memory-speed I/O throughput by caching data close to compute, and reduces the cost of running data-driven applications that scale elastically.
  • Simplifies access to data in cloud storage and object storage systems.
  • Simplifies data management. Alluxio provides a single point of access to multiple data sources.
  • Supports easy application deployment.

For more information about Alluxio, see the official Alluxio documentation.