Open Source Alluxio 1.5.0 Release Highlights

Adit Madan, Andrew Audibert, Bin Fan, Jiri Simsa | Jul 5th, 2017

Open source Alluxio 1.5.0 has been released with a large number of new features and improvements. Alluxio allows any application to access data from any storage system transparently and at memory speed. Interoperability with other technologies in the ecosystem is an important step for enabling this, and in the 1.5.0 release, we have improved the accessibility of Alluxio in several key ways.

  • Alluxio Docker Integration
  • Alluxio Golang Client
  • Alluxio on Ceph using S3A
  • Mount Specific Configuration Properties

Alluxio Docker Integration

Alluxio 1.5.0 adds documentation and scripts to make it easy to run Alluxio inside Docker containers. Alluxio configuration parameters can be passed using -e arguments, and logs are written to stdout so that they show up in the output of docker logs. The example below illustrates how to run dockerized Alluxio on top of HDFS.

cd alluxio-1.5.0/integration/docker
docker build -t alluxio .

docker run -d --net=host \
           -e ALLUXIO_UNDERFS_ADDRESS=hdfs://HdfsMaster:9000/ \
           alluxio master

docker run -d --net=host --shm-size=10GB \
           -e ALLUXIO_MASTER_HOSTNAME=AlluxioMaster \
           -e ALLUXIO_WORKER_MEMORY_SIZE=10GB \
           -e ALLUXIO_UNDERFS_ADDRESS=hdfs://HdfsMaster:9000/ \
           alluxio worker

See the docs for a step-by-step tutorial on running Dockerized Alluxio on an EC2 instance.
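The environment variables passed with -e above look like upper-case, underscore-separated forms of Alluxio property names. The sketch below illustrates that correspondence. The conversion rule (lower-case the name and replace underscores with dots) is an assumption made for illustration, and toPropertyKey is a hypothetical helper, not part of Alluxio or its Docker scripts.

```go
package main

import (
	"fmt"
	"strings"
)

// toPropertyKey converts an environment variable name such as
// ALLUXIO_WORKER_MEMORY_SIZE into a dotted, lower-case property key.
// Assumption: this naming rule is illustrative only; consult the Docker
// documentation for how the container actually interprets each variable.
func toPropertyKey(envName string) string {
	return strings.ToLower(strings.ReplaceAll(envName, "_", "."))
}

func main() {
	// The three variables used in the docker run commands above.
	names := []string{
		"ALLUXIO_MASTER_HOSTNAME",
		"ALLUXIO_WORKER_MEMORY_SIZE",
		"ALLUXIO_UNDERFS_ADDRESS",
	}
	for _, name := range names {
		fmt.Printf("%s -> %s\n", name, toPropertyKey(name))
	}
}
```

Under this assumed rule, ALLUXIO_WORKER_MEMORY_SIZE corresponds to alluxio.worker.memory.size, which is a convenient way to guess which property a given -e argument is meant to set.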

Alluxio Golang Client

Previously, Alluxio introduced a proxy process, which by default runs alongside every Alluxio master and worker and provides a REST API equivalent to Alluxio’s native file system API. In version 1.5.0, Alluxio introduces a Go client for interacting with Alluxio based on the REST API. This client is available in its own repository in order to facilitate its import through the “go get” mechanism.

Besides providing a mechanism for communicating with Alluxio from Go environments, the client implementation also serves as an example of how straightforward it is to create a language binding for Alluxio based on the REST API.

Note that communicating with Alluxio through the REST API requires extra network hops and/or memory copies and is therefore expected to be slower than the native Java client. On the other hand, any improvements to the native Java client benefit all REST API based clients, meaning the Go client and any other client developed against the REST API will always have the latest features.

The example below illustrates how to interact with Alluxio using a Go program:

package main

import (
    "fmt"
    "log"

    alluxio "github.com/Alluxio/alluxio-go"
    "github.com/Alluxio/alluxio-go/option"
)

func main() {
    fs := alluxio.NewClient(<proxy-host>, <proxy-port>, <timeout>)
    ok, err := fs.Exists(<path>, &option.Exists{})
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("path %v exists: %v\n", <path>, ok)
}
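To give a sense of what a REST-based language binding involves, here is a minimal sketch that performs the same existence check using only the Go standard library. The endpoint layout (/api/v1/paths/<path>/exists), the use of POST with an empty JSON options body, the JSON boolean response, and the proxy address localhost:39999 are all assumptions made for illustration; check the REST API documentation of your Alluxio version before relying on them. The helper names existsURL and exists are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"strings"
	"time"
)

// existsURL builds the proxy URL for an existence check on alluxioPath.
// Assumption: the proxy exposes path operations under /api/v1/paths/.
func existsURL(host string, port int, alluxioPath string) string {
	u := url.URL{
		Scheme: "http",
		Host:   fmt.Sprintf("%s:%d", host, port),
		Path:   "/api/v1/paths/" + strings.TrimPrefix(alluxioPath, "/") + "/exists",
	}
	return u.String()
}

// exists posts an empty options object to the proxy and decodes the
// response body as a JSON boolean (an assumed response format).
func exists(client *http.Client, host string, port int, alluxioPath string) (bool, error) {
	resp, err := client.Post(existsURL(host, port, alluxioPath), "application/json", strings.NewReader("{}"))
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return false, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	var ok bool
	if err := json.NewDecoder(resp.Body).Decode(&ok); err != nil {
		return false, err
	}
	return ok, nil
}

func main() {
	client := &http.Client{Timeout: 10 * time.Second}
	// Hypothetical proxy address and path; adjust for a real deployment.
	ok, err := exists(client, "localhost", 39999, "/data/file.txt")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println("exists:", ok)
}
```

Every call in this sketch goes through the proxy over HTTP, which is where the extra network hop mentioned above comes from.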

Alluxio on Ceph using S3A

In 1.5.0, Alluxio can connect to Ceph as an under storage using the S3A connector. The S3A connector provides significant functionality and performance improvements over the Swift connector.

As shown in the graph below, the S3A connector demonstrates up to 3x gains in read performance when reading one-gigabyte files.

[Figure: S3A vs Swift Connector]

Mount Specific Configuration Properties

One major benefit of using Alluxio is the ability to unify different under storage systems (e.g., S3, HDFS, GCS) into one Alluxio namespace, each under a separate mount point, similar to how devices are mounted on local file systems. As of version 1.5.0, Alluxio supports setting (potentially different) configuration properties for each mount point, in addition to respecting the global configuration setting for this type of under storage system. After configuring and mounting different under storage systems, accessing these systems is completely transparent to Alluxio file system applications. As a result, Alluxio helps system admins hide complexity and makes storage easier to manage.

For example, suppose a user, Alice, has multiple S3 buckets on AWS and wants to access the data stored across them. Previously, Alice could only mount into Alluxio those S3 buckets that shared the same system-wide authentication key. Now she can mount each bucket individually using separate authentication keys:

$ bin/alluxio fs mount /mnt1 s3a://alice-bucket1/ --option aws.accessKeyId=<accessKey1> --option aws.secretKey=<secretKey1>
$ bin/alluxio fs mount /mnt2 s3a://alice-bucket2/ --option aws.accessKeyId=<accessKey2> --option aws.secretKey=<secretKey2>

After this, any authenticated Alluxio user can access /mnt1 and /mnt2 freely, without even noticing they are from two different buckets and accessed using different authentication keys. Thus Alice can share her Alluxio deployment with Bob to access her buckets without giving Bob any bucket permissions or distributing her keys to Bob.
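The relationship between mount-specific and global properties can be pictured with a small model. The sketch below is a conceptual illustration only, not Alluxio's implementation: the mount type, the resolve function, the longest-prefix matching rule, the key values, and the third bucket (alice-bucket3) are all hypothetical. It assumes that a property set on a mount point takes precedence over the global value, and that the global value is used when the mount point sets nothing.

```go
package main

import (
	"fmt"
	"strings"
)

// mount is a hypothetical record of one mount point and its own properties.
type mount struct {
	alluxioPath string
	ufsURI      string
	props       map[string]string
}

// resolve returns the value of key for the given Alluxio path. It picks the
// mount point with the longest matching path prefix, prefers a property set
// on that mount, and otherwise falls back to the global configuration.
func resolve(mounts []mount, global map[string]string, path, key string) (string, bool) {
	var best *mount
	for i := range mounts {
		m := &mounts[i]
		prefix := strings.TrimSuffix(m.alluxioPath, "/") + "/"
		if path == m.alluxioPath || strings.HasPrefix(path, prefix) {
			if best == nil || len(m.alluxioPath) > len(best.alluxioPath) {
				best = m
			}
		}
	}
	if best != nil {
		if v, ok := best.props[key]; ok {
			return v, true
		}
	}
	v, ok := global[key]
	return v, ok
}

func main() {
	global := map[string]string{"aws.accessKeyId": "global-key"}
	mounts := []mount{
		{"/mnt1", "s3a://alice-bucket1/", map[string]string{"aws.accessKeyId": "key-1"}},
		{"/mnt2", "s3a://alice-bucket2/", map[string]string{"aws.accessKeyId": "key-2"}},
		{"/mnt3", "s3a://alice-bucket3/", nil}, // no mount-specific properties
	}
	for _, p := range []string{"/mnt1/a.csv", "/mnt2/b.csv", "/mnt3/c.csv"} {
		v, _ := resolve(mounts, global, p, "aws.accessKeyId")
		fmt.Printf("%s -> %s\n", p, v)
	}
}
```

In this model, paths under /mnt1 and /mnt2 resolve to their own keys, while /mnt3 falls back to the global key. The application reading the files never sees which key was used.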

And Many More!

This blog post highlights only a few of the new features and improvements in Alluxio 1.5.0. For a more comprehensive list, check out the release notes.

You can easily get started with Alluxio open source or community edition today by following the quick start guide.