The Great Wall of Data: Chitika Visualizes 10+ TB with Minecraft

As Chitika displays hundreds of millions of ads every day, a tremendous amount of data needs to be stored to fulfill business requirements. In our case, that figure amounts to roughly 1 to 1.2 TB per day. The Chitika Data Infrastructure and Engineering teams each have several Minecraft aficionados among them, and our recent side project visualizes roughly 10 terabytes of this data as large towers of 8-bit 3D-rendered blocks. We call it, aptly, the Great Wall of Data.

If you’ve had the fortune (or, some would say, misfortune) of watching the 1995 “cyberpunk thriller” Hackers, you probably remember the film’s representation of accessing data storage. For the uninitiated, this involved the characters’ terminals showing them flying around towers of fluorescent code, rendered in brilliant 1995 CGI animation.

The Chitika Data Infrastructure and Engineering teams each have several Minecraft aficionados among them, and our recent side project takes that Hackers concept, makes it actually functional, and builds the whole thing out of 8-bit 3D-rendered blocks. We call it, aptly, the Great Wall of Data.

A Background

Operating an ad network requires the ability not just to serve ads based on a series of conditions, but also to track revenues and the results of experiments run over the network on a daily basis. As we display hundreds of millions of ads every day, a tremendous amount of data needs to be stored to fulfill these requirements. In our case, that figure amounts to roughly 1 to 1.2 TB per day. After compression, which reduces the space used by about 80%, the daily total comes to around 200GB.
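For anyone who wants to check the arithmetic, here is a minimal sketch. The function name and the decimal conversion (1 TB = 1,000 GB) are our assumptions for illustration; the 80% figure is the approximate reduction quoted above.

```python
def compressed_size_gb(raw_tb: float, reduction: float = 0.80) -> float:
    """Estimate the compressed size in GB of raw_tb terabytes of raw data.

    reduction is the fraction of space saved by compression (0.80 = 80%).
    Assumes decimal units, i.e. 1 TB = 1000 GB.
    """
    gb_per_tb = 1000
    return raw_tb * gb_per_tb * (1.0 - reduction)


# The low and high ends of the daily raw volume quoted in the post.
for raw_tb in (1.0, 1.2):
    size = compressed_size_gb(raw_tb)
    print(f"{raw_tb:.1f} TB raw -> about {size:.0f} GB compressed")
```

With these inputs the estimate lands between roughly 200 and 240 GB per day, which is in line with the figure quoted above.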

At this size and level of complexity, when something goes wrong, investigating exactly where and how some piece of data collection has gone awry can be tricky at the outset. So we thought of a better way to at least get an initial assessment: let’s visualize every piece of data we catalogue, so we know not just what went wrong, but where and when it did, in terms of server and data center location, business sector, time, and date.

OK, so it’s a little easier said than done, but when in doubt – Minecraft.

A Key:

  • Each block = one interval of data from one server – these load in six-minute chunks
  • Block color = a particular data center
  • Each column = one server of one business unit
  • Each row of blocks in a column = one six-minute interval of data across servers
  • Each row of obsidian (black stone) = an hour marker
  • Each row of columns = one day of data
  • Location of a column = the business unit it belongs to
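To make the key concrete, here is a rough sketch of how one six-minute interval from one server could be turned into a block position. This is not the code behind the Great Wall: the names (`Block`, `row_index`, `place_block`), the data center palette, the column spacing, and the choice to put each obsidian marker row at the start of its hour are all assumptions for illustration. How these indices map onto actual world axes is left open.

```python
from dataclasses import dataclass
from datetime import date, datetime

INTERVAL_MINUTES = 6                          # from the key: six-minute chunks
INTERVALS_PER_HOUR = 60 // INTERVAL_MINUTES   # 10 data rows per hour
ROWS_PER_HOUR = INTERVALS_PER_HOUR + 1        # plus one obsidian marker row

# Hypothetical palette: one block type per data center.
DATA_CENTER_BLOCKS = {"dc-east": "red_wool", "dc-west": "blue_wool"}


@dataclass(frozen=True)
class Block:
    column: int    # which server column (grouped by business unit)
    row: int       # position along the column (time of day)
    day: int       # which row of columns (days since day_zero)
    material: str  # block type, chosen by data center


def row_index(ts: datetime) -> int:
    """Row within a day's column for the interval containing ts.

    Assumed layout: each hour starts with one marker row, followed by
    its ten data rows, so hour h occupies rows h*11 .. h*11 + 10.
    """
    return ts.hour * ROWS_PER_HOUR + 1 + ts.minute // INTERVAL_MINUTES


def place_block(ts: datetime, day_zero: date, unit_index: int,
                server_index: int, data_center: str,
                servers_per_unit: int = 8, gap: int = 2) -> Block:
    """Map one (server, interval) pair to a block position and material."""
    # Columns of the same business unit sit together, with a gap between units.
    column = unit_index * (servers_per_unit + gap) + server_index
    day = (ts.date() - day_zero).days
    return Block(column, row_index(ts), day, DATA_CENTER_BLOCKS[data_center])


if __name__ == "__main__":
    block = place_block(datetime(2014, 5, 2, 1, 7), date(2014, 5, 1),
                        unit_index=1, server_index=3, data_center="dc-west")
    print(block)
```

Under these assumptions, a missing interval shows up as a hole at a predictable position in its column, which is what makes gaps easy to spot by eye.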

We built the environment using a series of different tools and techniques:

After using the final version of the Great Wall for only a couple of weeks, it’s already proving its worth. We’ve noticed small missing or misplaced blocks of data in certain columns, which pointed to data issues we were able to fix quickly. Under normal circumstances, these anomalies might have gone undetected until they actually caused a problem. Now, we can physically seek them out.

Plus, we can legitimately say Minecraft is integral to our duties at work. Win.

Take a Tour

We currently run Overviewer to update the web tiles every six minutes. This gives us a web-based, near real-time view into the game world. Have a peek at:

http://greatwall.chitika.com/

Also, thanks to the plethora of mods available to Bukkit-based servers, we can safely allow access to the actual game world. Feel free to come fly around our data.

Simply point your vanilla Minecraft 1.7.9 instance to:

greatwall.chitika.com:25585

Once in game, you can issue the /fly command to give yourself creative mode flight.

Enjoy – and share your thoughts on the world by tweeting @Chitika!
