Logging Data Part 3: Getting to the Show

We’ve briefly mentioned our implementation of Infiniband in both of the previous Logging Data posts without giving a thorough explanation of its function and capabilities within our architecture. In this latest installment, we’ll be doing just that, along with discussing our corresponding Hadoop framework.

For those who may not be familiar, Infiniband is a network interconnect technology. It differs from 1 Gb/s or 10 Gb/s Ethernet in that it can operate at speeds of up to 56 Gb/s.

In our case, we deploy the technology to interconnect our separate Gluster and Hadoop clusters to give us a hefty 40 Gb/s data transfer rate, known as QDR (quad data rate) in the Infiniband world. This provides a tremendous boost to the rate at which we can access our data.

Following the implementation, Gluster sustained aggregate read speeds in excess of 18 GB/s in our disk burn tests over Infiniband. The burn test was performed with IOzone in cluster mode across 48 client nodes. That kind of speed is a huge boon for subsequent data access by our data scientists for tracking and testing purposes.
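
For reference, an IOzone cluster-mode run of this shape looks roughly like the following; the client list, file size, and record size here are illustrative rather than our exact invocation.

    # clients.list uses IOzone's machine-file format, one client node per line:
    #   <hostname>  <working directory on that host>  <path to the iozone binary>
    export RSH=ssh    # have IOzone reach the client nodes over ssh rather than rsh
    # Throughput mode across 48 clients: sequential write (-i 0) then read (-i 1),
    # an 8 GB file per client with 1 MB records; -C prints per-child transfer totals.
    iozone -+m clients.list -t 48 -s 8g -r 1m -i 0 -i 1 -C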

Additionally, Infiniband allows Gluster to utilize RDMA (Remote Direct Memory Access) for data transfer, albeit in an experimental way. RDMA is a key technology behind Infiniband’s incredibly low latency: where a conventional TCP/IP transfer copies data several times on its way between application and wire, RDMA is effectively a single-copy process, a big step up. For those not wanting to abandon TCP/IP, fear not: Infiniband supports it as well, via a layer known as IPoIB (IP over Infiniband). In fact, this support is so stable that we use IPoIB for all our Gluster communications.
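
To make the transport story concrete, here is roughly what the choice looks like on the Gluster side; the volume, brick, and host names are made up for illustration.

    # A volume can be created to serve TCP (for us, running over IPoIB), RDMA, or both
    gluster volume create logvol transport tcp,rdma \
        gluster01-ib:/bricks/brick1 gluster02-ib:/bricks/brick1
    gluster volume start logvol

    # Native-client mounts use the TCP transport by default; over IPoIB, that is what we rely on
    mount -t glusterfs gluster01-ib:/logvol /mnt/gluster
    # The experimental RDMA path would instead be selected along the lines of:
    # mount -t glusterfs -o transport=rdma gluster01-ib:/logvol /mnt/gluster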

In order to realize many of the benefits of Infiniband within our Hadoop infrastructure, we had to custom compile the Infiniband software stack, OFED, for use on our Hadoop nodes. The reason? Debian 7.0 “Wheezy” only carries OFED 1.4.2 in its stable repositories, while OFED 3.5 was the latest release at the time of our Gluster deployment. As such, our Hadoop nodes had to be upgraded to Wheezy, whose 3.2 kernel is the first to include the Infiniband kernel modules that the OFED 3.5 release no longer builds for us.
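
We won’t reproduce the whole build recipe here, but once the custom-compiled userspace stack and the kernel’s Infiniband modules are in place, a quick sanity check looks something like this (interface and module names follow the usual conventions and may differ per setup):

    # Kernel side: the 3.2 kernel ships the Infiniband modules that OFED 3.5 no longer builds
    modprobe ib_ipoib        # IP-over-Infiniband module
    lsmod | grep '^ib_'      # confirm ib_core, ib_ipoib, and friends are loaded

    # Userspace side: diagnostics from the custom-compiled OFED stack
    ibstat                   # port state and link rate
    ibv_devinfo              # verbs-level view of the HCA
    ip addr show ib0         # the IPoIB interface our Gluster traffic runs over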

Now that we’ve covered the vast majority of the data transfer aspects, let’s move to our Hadoop architecture. At a high level:

  • We run Hadoop (version 1.1.2; aka MapReduce 1)

    • MapReduce framework

    • 36 dedicated nodes

      • 24 CPU cores each

      • 64 GB RAM each

These dedicated nodes, combined with Infiniband connectivity, allow for every node to access every piece of data within Gluster at any time. Our specific implementation removes arbitrary Hadoop node “hot spots” and provides for independent scaling of data warehouse and compute resources.
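
In practice this just means every Hadoop node mounts the same Gluster volume; here is a sketch of the relevant /etc/fstab line, with illustrative host, volume, and mount-point names:

    # /etc/fstab on each Hadoop compute node
    gluster01-ib:/logvol  /mnt/gluster  glusterfs  defaults,_netdev  0  0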

Getting Hadoop talking with Gluster required heavy modification to the Gluster shim RedHat provides. On top of modifying file metadata retrieval, we added:

  • Proper access control

  • Proper user ownership of resultant job files via inheritance

By modifying the methods of metadata retrieval used by the original Gluster shim, we essentially end up lying to Hadoop. Since we know what our Gluster cluster looks like, we can tell Hadoop explicitly instead of having the shim constantly pulling this information from disk for every file Hadoop wants to use. This change drastically sped up our Hadoop jobs when compared to the original shim.

Given the heavy modifications we performed, one might ask what the benefits of Gluster in a Hadoop setting are; they’re substantial:

  • We are able to shut down the namenode and datanode daemons that are required by HDFS

  • Any new Hadoop compute nodes we add are immediately useful to the cluster, as they can see all data in Gluster the moment they join (see the sketch after this list)

    • Compare this to HDFS, which requires rebalancing data, or copying temporary input data from another compute node, before a new node will perform any job tasks
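
Bringing a new compute node online is correspondingly simple. A rough sketch, assuming a stock Hadoop 1.1.2 tarball install under an illustrative /opt/hadoop-1.1.2 path:

    # On the new node: mount the Gluster volume, then join the MapReduce cluster
    mount /mnt/gluster                                    # fstab entry as shown earlier
    /opt/hadoop-1.1.2/bin/hadoop-daemon.sh start tasktracker
    # There is no HDFS datanode to start and no rebalancing to wait for;
    # the node can read input splits from Gluster for its very first task.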

Overall, this kind of architecture serves our needs very well, although like any respectable operations team, we’re always working on improvements, such as:

  • Doubling our Gluster deployment

  • Building new tools that interface with Gluster

  • Adapting old tools to the quirks of Gluster

    • Namely, avoiding stat() calls whenever possible, as these are slow within the context of Gluster (see the short example below)
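
As a trivial illustration of why this matters (the path is made up): anything that forces a per-file stat(), such as a long listing, translates into extra network round trips on a FUSE-mounted Gluster volume, while a plain readdir()-style listing does not.

    # Long listing: lstat()s every entry, which is expensive on Gluster
    ls -l /mnt/gluster/logs/somedir/
    # Plain listing: typically served from readdir() alone, far cheaper
    ls /mnt/gluster/logs/somedir/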

Shown again for some context, here’s a view of our entire analytical data pipeline:

The next installment of this series will focus on some of the tools we’ve created that streamline data access for our data solutions team:

Logging Data Part 4: New Tools for the Trade

Stay tuned!
