Accelerating Data Analytics on Ceph Object Storage with Alluxio

Understand the benefits Alluxio brings to analytics on object storage:
- Derive timely insights from data with memory-speed access
- Enable data sharing between applications without sacrificing performance
- Reduce costs with efficient memory utilization

Accelerating On-Demand Data Analytics with Alluxio

This whitepaper consists of two parts. The first is a high-level overview of the advantages of using Alluxio as a core technology with on-demand clusters. The second, intended for engineers, provides a detailed step-by-step guide to deploying an on-demand cluster with Alluxio, along with instructions for running a sample workload on that cluster. By the end of the paper, you will understand how to deploy this architecture and the value Alluxio brings to the stack.

Case Studies

Accelerate Access to Remote Storage

As the largest Chinese-language Internet search provider, Baidu has extensive experience pushing its production data-serving systems to their limits. In this case study, Shaoshan Liu, a senior architect at Baidu, shares his experience running Alluxio in production and explains how the technology led to dramatic performance gains. With Alluxio, batch queries become interactive queries. This lets Baidu discover insights interactively, increasing productivity tenfold and improving the customer experience.

Share Data Across Jobs at Memory Speed

Barclays data scientist Gianmario Spacagna and Harry Powell, Head of Advanced Analytics, describe how they iteratively process raw data directly from the central data warehouse in Spark, and why Alluxio is their key enabling technology.

Manage Data Across Different Storage Systems

At Qunar, we have been running Alluxio in production for over nine months, achieving an average 15x speedup and a 300x speedup at peak service times. In addition, Alluxio’s unified namespace lets different applications and frameworks easily access our data across different storage systems.
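To make the idea of a unified namespace more concrete, here is a minimal Python sketch of a mount table that maps logical paths onto URIs in different underlying storage systems and resolves a path by its longest matching mount point. The class MountTable, its methods, and the example URIs are hypothetical illustrations; this is a toy model under that assumption, not Alluxio's actual API or implementation.

```python
from typing import Dict, Optional


class MountTable:
    """Toy unified namespace (hypothetical): logical mount points -> under-store URIs."""

    def __init__(self) -> None:
        self._mounts: Dict[str, str] = {}

    def mount(self, logical_path: str, ufs_uri: str) -> None:
        # Normalize trailing slashes so "/s3/" and "/s3" name the same mount point.
        self._mounts[logical_path.rstrip("/") or "/"] = ufs_uri.rstrip("/")

    def resolve(self, path: str) -> str:
        """Translate a logical path to an under-store URI via the longest matching mount."""
        best: Optional[str] = None
        for mount_point in self._mounts:
            # Match whole path components only: "/s3" covers "/s3/x" but not "/s3x".
            prefix = mount_point.rstrip("/") + "/"
            if path == mount_point or path.startswith(prefix):
                if best is None or len(mount_point) > len(best):
                    best = mount_point
        if best is None:
            raise KeyError(f"no mount point covers {path!r}")
        suffix = path[len(best):].lstrip("/")
        base = self._mounts[best]
        return f"{base}/{suffix}" if suffix else base


if __name__ == "__main__":
    table = MountTable()
    table.mount("/hdfs", "hdfs://namenode:8020/data")       # hypothetical HDFS location
    table.mount("/s3", "s3://example-bucket/warehouse")     # hypothetical object store bucket
    print(table.resolve("/s3/logs/part-0000"))  # s3://example-bucket/warehouse/logs/part-0000
    print(table.resolve("/hdfs/users"))         # hdfs://namenode:8020/data/users
```

Because every application addresses data through the same logical paths, moving a dataset between storage systems only changes the mount target, not the paths the applications use.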

Scalable Genomics Data Processing Pipeline with Alluxio, Mesos, and Minio

By leveraging Alluxio, Mesos, Minio, and Spark, we have created an end-to-end data processing solution that is performant, scalable, and cost-optimized. Alluxio serves as the unified storage layer that connects disparate storage systems and delivers memory-speed performance. Minio is mounted as Alluxio's under store; it keeps cold (infrequently accessed) data and syncs data to AWS S3. Apache Spark serves as the compute engine.
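The tiering idea, with hot data served from memory and cold data left in an under store such as Minio, can be sketched as a read-through cache. The Python below is a toy model under stated assumptions: the under store is a plain dictionary standing in for an object store, the memory tier has a fixed byte budget, and least-recently-used (LRU) eviction is an assumed policy. The class CachingTier and its methods are hypothetical and do not reflect Alluxio's API or its actual eviction logic.

```python
from collections import OrderedDict
from typing import Dict, List


class CachingTier:
    """Toy memory tier in front of a slower under store (hypothetical; not Alluxio's API)."""

    def __init__(self, under_store: Dict[str, bytes], capacity_bytes: int) -> None:
        self.under_store = under_store        # stands in for the cold object store
        self.capacity_bytes = capacity_bytes  # assumed fixed memory budget
        self._cache: "OrderedDict[str, bytes]" = OrderedDict()  # oldest entry first
        self._used = 0
        self.hits = 0
        self.misses = 0

    def read(self, key: str) -> bytes:
        if key in self._cache:
            self.hits += 1
            self._cache.move_to_end(key)      # mark as most recently used
            return self._cache[key]
        self.misses += 1
        data = self.under_store[key]          # slow path: fetch from the under store
        if len(data) <= self.capacity_bytes:  # objects larger than the tier bypass it
            self._cache[key] = data
            self._used += len(data)
            while self._used > self.capacity_bytes:
                _, evicted = self._cache.popitem(last=False)  # evict least recently used
                self._used -= len(evicted)
        return data

    def cached_keys(self) -> List[str]:
        return list(self._cache)


if __name__ == "__main__":
    store = {"a": b"x" * 40, "b": b"y" * 40, "c": b"z" * 40}
    tier = CachingTier(store, capacity_bytes=100)
    for key in ["a", "b", "a", "c"]:
        tier.read(key)
    print(tier.cached_keys())       # ['a', 'c']
    print(tier.hits, tier.misses)   # 1 3
```

In this example, the 100-byte tier can hold only two of the three 40-byte objects, so reading "c" evicts "b", the object used least recently; "a" stays in memory because it was read again just before.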