Getting Started

For those who are new to Alluxio, this guide is a good place to start. For additional installation methods, see our documentation on installing Alluxio with Alluxio Manager.

Introduction

We will install Alluxio locally via Alluxio Manager, which will also be hosted locally. Once Alluxio and Alluxio Manager are installed, we will run through some basic cluster operations. The steps are:

  1. Verify Prerequisites.
  2. Install Alluxio Manager.
  3. Install Alluxio locally.
  4. Perform basic tasks via Alluxio Shell.
  5. Mount a public Amazon S3 bucket in Alluxio.
  6. Accelerate data access.
  7. Stop Alluxio.

Prerequisites

Alluxio components have specific requirements which you must meet before proceeding.

Install Alluxio Manager

The easiest way to get started is through Alluxio Manager, a web app that enables you to manage Alluxio clusters. It offers a convenient, user-friendly way of deploying Alluxio across specified nodes without having to install manually, update existing automation scripts or recipes, or rely on any third-party tools.

Alluxio Manager doesn’t replace any of the functionality provided by your IaaS or private cloud management console. It does not create, launch, or shut down compute instances. Instead, it is concerned with installing, starting, and stopping the Alluxio components running on those instances.

  1. Download Alluxio Manager for your operating system. You’ll need to log in or create an account if you don’t already have one.
  2. For Linux/OS X, make the downloaded binary executable; no modification is necessary on Windows.
    $ chmod 755 ./alluxio-manager
    
  3. You should have received an email with a license file at the time of download. Place it in the same directory as the Alluxio Manager binary.
  4. Execute the provided binary from a terminal window to open the manager in your default browser.
    $ ./alluxio-manager -license-file=alluxio-manager-license.json
    
  5. Log in with the default user credentials: admin / admin.

[Screenshot: Alluxio Manager login page]
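
Before launching the manager, it can help to confirm that the binary is executable and the license file sits next to it. The following is a minimal sketch, not part of Alluxio Manager: `manager_preflight` is a hypothetical helper name, and the file names are simply the ones used in the steps above.

```shell
# Check that the manager binary and license file sit side by side.
# Hypothetical helper; file names are the ones used in this guide.
manager_preflight() {
  dir="${1:-.}"
  if [ ! -x "$dir/alluxio-manager" ]; then
    echo "alluxio-manager is missing or not executable in $dir"
    return 1
  fi
  if [ ! -f "$dir/alluxio-manager-license.json" ]; then
    echo "alluxio-manager-license.json not found in $dir"
    return 1
  fi
  echo "ok"
}
```

One way to use it is `manager_preflight . && ./alluxio-manager -license-file=alluxio-manager-license.json`, so the manager only starts when both files are in place.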

Install Alluxio Locally

Alluxio creates a distributed filesystem across one or more machines that constitute your Alluxio cluster. For this introduction, we’ll install Alluxio locally, on the same machine hosting the manager. The Alluxio components will all be installed on your one machine, and the filesystem will be ‘distributed’ across local storage only.

  1. Create a cluster. Select ‘cluster’ from the main menu, then click ‘+ cluster’.
  2. Initial configuration. Choose a name for your cluster followed by ‘local’ for the cluster type.
  3. Host configuration. Alluxio Manager will attempt to ssh to the specified hostnames before advancing to the next step. Enter your username, and to keep things simple, use password as the authentication method.
  4. Alluxio configuration. Select ‘community’ for the edition and keep the defaults for the remaining sections. Note that an ‘alluxio’ directory will be made under your home directory.
  5. Host check. This step makes sure that all hosts meet the prerequisites needed for successful installation.
  6. Agent installation. If everything goes well, the Alluxio agent will be installed and running.
  7. Alluxio installation. If everything goes well, the Alluxio services will be installed and running.
  8. Next steps. A success message will be displayed indicating that Alluxio has been installed across your cluster and all services are running. If you receive an error message, see the Alluxio Manager troubleshooting documentation. To verify at the terminal, run: ps aux | grep -v grep | grep alluxio.
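
The terminal check from the last step can be wrapped in a small helper. This is only a sketch: `count_alluxio_procs` and `alluxio_status` are hypothetical names, and the approach assumes that Alluxio processes show the string `alluxio` in their command line, as the grep above implies.

```shell
# Count lines on stdin that mention "alluxio", ignoring grep itself.
# Hypothetical helper; expects `ps aux`-style output on stdin.
count_alluxio_procs() {
  grep -v grep | grep -c alluxio
}

# Summarize whether any Alluxio process appears in the listing on stdin.
alluxio_status() {
  # grep -c exits non-zero when the count is 0, so tolerate that here.
  n=$(count_alluxio_procs || true)
  if [ "$n" -gt 0 ]; then
    echo "running, matching processes: $n"
  else
    echo "not running"
  fi
}
```

Run it as `ps aux | alluxio_status`. Reading from stdin keeps the helper easy to try against saved process listings.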

Using the Alluxio Shell

Now that Alluxio is running, we can examine the Alluxio filesystem from the command line with the Alluxio shell. In this section we’ll cover basic file system operations including how to copy files into Alluxio and persist them to under storage.

  1. Change directory to the Alluxio install directory.
    $ cd ~/alluxio
    
  2. You can invoke the Alluxio shell with the following command, which will list all of the available command-line operations.
    $ ./bin/alluxio fs
    
  3. Let’s list all the files in Alluxio with ls.
    $ ./bin/alluxio fs ls /
    
  4. Unfortunately, we don’t have any files in Alluxio. We can solve that by copying a file into Alluxio using copyFromLocal.
    $ ./bin/alluxio fs copyFromLocal conf/alluxio-site.properties.template /alluxio-site.properties.template
    Copied conf/alluxio-site.properties.template to /alluxio-site.properties.template
    
  5. After copying the template file, we should be able to see it in Alluxio. List the files in Alluxio again with ls. The output shows the file that exists in Alluxio, along with other useful information such as the size of the file, the date it was created, and the in-memory status of the file.
    $ ./bin/alluxio fs ls /
    -rw-r--r--     ubuntu         ubuntu         1.2KB  10-11-2016 15:21:03:764  In Memory      /alluxio-site.properties.template
    
  6. You can also view the contents of the file using the cat command.
    $ ./bin/alluxio fs cat /alluxio-site.properties.template
    ...
    
  7. With the default configuration, Alluxio uses the local file system as its UnderFileSystem (UFS). The default path for the UFS is ./under-storage. We can see what’s in the UFS as follows:
    $ ls ./under-storage/
    
  8. The directory doesn’t exist! By default, Alluxio will write data only into Alluxio space, not to the UFS. We can tell Alluxio to persist the file from Alluxio space to the UFS using the shell command persist.
    $ ./bin/alluxio fs persist /alluxio-site.properties.template
    persisted file /alluxio-site.properties.template with size 1193
    
  9. Now, if we examine the UFS again, the file should appear.
    $ ls ./under-storage
    alluxio-site.properties.template
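
The copy and persist steps above can be chained so that a file is written to Alluxio and to the UFS in one go. The sketch below assumes it is run from the Alluxio install directory; `copy_and_persist` and the `ALLUXIO_BIN` variable are hypothetical names introduced for illustration, while `copyFromLocal` and `persist` are the shell commands shown in the steps above.

```shell
# Path to the Alluxio launcher; defaults to the location used in this guide.
ALLUXIO_BIN="${ALLUXIO_BIN:-./bin/alluxio}"

# Copy a local file into Alluxio, then persist it to the under storage.
# Hypothetical helper, not part of Alluxio. Stops if the copy fails.
copy_and_persist() {
  local_path="$1"
  alluxio_path="$2"
  "$ALLUXIO_BIN" fs copyFromLocal "$local_path" "$alluxio_path" &&
    "$ALLUXIO_BIN" fs persist "$alluxio_path"
}
```

For example, `copy_and_persist conf/alluxio-site.properties.template /alluxio-site.properties.template` repeats steps 4 and 8 in a single call.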
    

Exploring the Web UI

Alluxio has a user-friendly web interface that lets users monitor and manage the system. The master and workers each serve their own web UI. The default port for the web interface is 19999 for the master and 30000 for the workers.

If we browse the Alluxio file system in the master’s web UI, we can see the template file we copied earlier, as well as other useful information. Notice that the ‘persistence state’ column shows the file is persisted.
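
To check from a terminal that a web UI is reachable, something like the following sketch may help. It assumes the default ports quoted above and that curl is installed; `web_ui_url` and `web_ui_status` are hypothetical helper names.

```shell
# Build the web UI URL for a role, using the default ports quoted above.
web_ui_url() {
  role="$1"
  host="${2:-localhost}"
  case "$role" in
    master) port=19999 ;;
    worker) port=30000 ;;
    *) echo "unknown role: $role" >&2; return 1 ;;
  esac
  echo "http://$host:$port"
}

# Print the HTTP status code returned by a web UI (requires curl).
web_ui_status() {
  url=$(web_ui_url "$@") || return 1
  curl -s -o /dev/null -w '%{http_code}\n' "$url"
}
```

`web_ui_status master` queries the master UI on localhost, and `web_ui_status worker` does the same for a worker. If the ports were changed in your configuration, adjust the values in the case statement.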

Mount a Storage System

Alluxio unifies access to different storage systems with the unified namespace feature, which enables users to mount different storage systems into the Alluxio namespace and access the files across those systems seamlessly.

  1. Create a directory in Alluxio to store your mount points.
    $ ./bin/alluxio fs mkdir /mnt
    Successfully created directory /mnt
    
  2. Mount an existing sample S3 bucket to Alluxio. We have provided a sample S3 bucket for you to use in this guide.
    $ ./bin/alluxio fs mount -readonly alluxio://localhost:19998/mnt/s3 s3a://alluxio-quick-start/data
    Mounted s3a://alluxio-quick-start/data at alluxio://localhost:19998/mnt/s3
    
  3. Now the S3 bucket is mounted into the Alluxio namespace. We can list the files from S3, through the Alluxio namespace using the familiar ls shell command.
    $ ./bin/alluxio fs ls /mnt/s3
    -r--------     <owner>           <group>           87.86KB   10-11-2016 15:26:29:902  Not In Memory  /mnt/s3/sample_tweets_100k.csv
    -r--------     <owner>           <group>           933.21KB  10-11-2016 15:26:30:143  Not In Memory  /mnt/s3/sample_tweets_1m.csv
    -r--------     <owner>           <group>           149.77MB  10-11-2016 15:26:30:377  Not In Memory  /mnt/s3/sample_tweets_150m.csv
    
  4. With Alluxio’s unified namespace, you can interact with data from different storage systems seamlessly. For example, with the ls shell command, you can recursively list all the files that exist under a directory. The following output shows all the files under the root of the Alluxio file system, from all of the mounted storage systems. The alluxio-site.properties.template file is in your local file system, while the files under /mnt/s3/ are in S3.
    $ ./bin/alluxio fs ls -R /
    -rw-r--r--     ubuntu         ubuntu         1.2KB   10-11-2016 15:21:03:764  In Memory      /alluxio-site.properties.template
    drwxr-xr-x     ubuntu         ubuntu         1.00B     10-11-2016 15:25:56:913  Directory      /mnt
    dr-x------     <owner>        <group>        4.00B     10-11-2016 15:26:18:536  Directory      /mnt/s3
    -r--------     <owner>        <group>        87.86KB   10-11-2016 15:26:29:902  Not In Memory  /mnt/s3/sample_tweets_100k.csv
    -r--------     <owner>        <group>        933.21KB  10-11-2016 15:26:30:143  Not In Memory  /mnt/s3/sample_tweets_1m.csv
    -r--------     <owner>        <group>        149.77MB  10-11-2016 15:26:30:377  Not In Memory  /mnt/s3/sample_tweets_150m.csv
    
  5. You can see the newly mounted files and directories in the Alluxio web UI as well.
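
Because the listing is plain text, it can be filtered with standard tools. The sketch below pulls out the paths of files that are not yet in memory. It assumes the listing format shown above, with the path as the last field and no spaces in paths; `not_in_memory` is a hypothetical helper name.

```shell
# Print the paths of files that the listing marks as "Not In Memory".
# Reads `alluxio fs ls` output on stdin; assumes the path is the last
# whitespace-separated field and starts with "/".
not_in_memory() {
  awk '/Not In Memory/ && $NF ~ /^\// { print $NF }'
}
```

For example, `./bin/alluxio fs ls -R / | not_in_memory` would list only the files under /mnt/s3 in the output above, since the local template file is already in memory.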

Accelerating Data Access

Alluxio uses memory to accelerate data access. This exercise is designed so you can experience this acceleration firsthand.

First, let’s take a look at the status of a file in Alluxio, mounted from S3.

$ ./bin/alluxio fs ls /mnt/s3/sample_tweets_150m.csv
-r--------     <owner>           <group>           149.77MB  10-11-2016 15:26:30:377  Not In Memory  /mnt/s3/sample_tweets_150m.csv

The output shows that the file is not in memory. This file is a sample of tweets. Let’s see how many tweets mention the word ‘kitten’.

$ time ./bin/alluxio fs cat /mnt/s3/sample_tweets_150m.csv | grep -c kitten
889

real	0m22.857s
user	0m7.557s
sys	0m1.181s

Now, let’s see how many tweets mention the word ‘puppy’.

$ time ./bin/alluxio fs cat /mnt/s3/sample_tweets_150m.csv | grep -c puppy
1553

real	0m25.998s
user	0m6.828s
sys	0m1.048s

As you can see, each command takes more than 20 seconds to access the data. Alluxio can accelerate access by storing the data in memory. However, the cat shell command does not cache data in Alluxio memory. A separate shell command, load, tells Alluxio to store the data in memory.

$ ./bin/alluxio fs load /mnt/s3/sample_tweets_150m.csv

After loading the file, check the status with the ls command. The output shows that the file is now in memory, so reading it should be much faster.

$ ./bin/alluxio fs ls /mnt/s3/sample_tweets_150m.csv
-r--------     <owner>           <group>           149.77MB  10-11-2016 15:26:30:377  In Memory      /mnt/s3/sample_tweets_150m.csv
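
The check-then-load sequence above can be sketched as a single helper that loads a file only when the listing reports it as not in memory. It assumes the ls output contains the text "Not In Memory" for uncached files, as in the listings above; `load_if_cold` and `ALLUXIO_BIN` are hypothetical names.

```shell
# Path to the Alluxio launcher; defaults to the location used in this guide.
ALLUXIO_BIN="${ALLUXIO_BIN:-./bin/alluxio}"

# Load a file into Alluxio memory only if ls reports it as "Not In Memory".
# Hypothetical helper, not part of Alluxio.
load_if_cold() {
  path="$1"
  if "$ALLUXIO_BIN" fs ls "$path" | grep -q 'Not In Memory'; then
    "$ALLUXIO_BIN" fs load "$path"
  else
    echo "$path is already in memory"
  fi
}
```

Calling `load_if_cold /mnt/s3/sample_tweets_150m.csv` a second time should skip the load, since the file is then reported as in memory.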

Let’s again count the number of tweets with the word ‘puppy’.

$ time ./bin/alluxio fs cat /mnt/s3/sample_tweets_150m.csv | grep -c puppy
1553

real	0m1.917s
user	0m2.306s
sys	0m0.243s

As you can see, reading the file was much faster, taking about two seconds. And since the data is in Alluxio memory, you can read the file again just as quickly. Let’s observe this by counting how many tweets mention the word ‘bunny’.

$ time ./bin/alluxio fs cat /mnt/s3/sample_tweets_150m.csv | grep -c bunny
907

real	0m1.983s
user	0m2.362s
sys	0m0.240s
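
To put a number on the difference, the sketch below divides the ‘puppy’ read time before loading by the read time after loading, using the real values from the runs above. `speedup` is a hypothetical helper, and the timings are from this one sample run, so your results will differ.

```shell
# Ratio of two wall-clock times in seconds, rounded to one decimal place.
speedup() {
  awk -v before="$1" -v after="$2" 'BEGIN { printf "%.1f\n", before / after }'
}

# 25.998 s before loading versus 1.917 s after loading.
speedup 25.998 1.917   # prints 13.6
```

In this run, the read from memory was roughly 13 to 14 times faster than the read through S3.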

Stop Your Cluster

Alluxio can be stopped and started at the cluster level. Stopping means that all Alluxio services on all nodes, in this case your local computer, will be stopped. All data will remain available after the cluster is restarted, so long as none of the nodes in the cluster were rebooted in the meantime.

  1. From the dropdown in the top navigation bar, select a cluster.
  2. From the more menu on the overview tab, select ‘stop’.

Next Steps

Congratulations on successfully installing Alluxio on your local computer using Alluxio Manager and performing some basic operations!

There are several next steps available. You can learn more about the various key features of Alluxio. You can also deploy Alluxio on a cluster, transparently mount storage systems with the Alluxio unified namespace, or configure your applications to work with the Alluxio file system API.

Need Help?

Contact Support