Why we’re using HBase: Part 2

The first part of this article is about our success with the technologies we have chosen. Here are some more arguments (by no means exhaustive :P) about why we think HBase is the best fit for our team. We are trying to explain our train of thought, so other people can at least ask the questions that we did, even if they don’t reach the same conclusion.

How we work

We usually develop against trunk code (for both Hadoop and HBase) using a mirror of the Apache Git repositories. We don’t confine ourselves to released versions, because we implement fixes and there are always new features we need or want to evaluate. We test a wide range of conditions and find all sorts of problems, from HBase or HDFS corruption to data loss. Usually we report them, fix them and move on. Our latest headache from working with unreleased versions was HDFS-909, which corrupts the NameNode “edits” file by losing a byte. We were comfortable enough with the system to manually fix the “edits” binary file in a hex editor so we could bring the cluster back online quickly, and then track down the actual cause by analyzing the code. It wasn’t a critical situation per se, but this kind of “training” and deep familiarity with the code gives us a certain level of trust in our ability to handle real situations.

It’s great to see that critical bugs are getting harder and harder to find these days. However, we still brutalize our clusters and take every precaution when it comes to data integrity [1].

Why we’re using HBase: Part 1

Our team builds infrastructure services for many clients across Adobe. We have services ranging from commenting and tagging to structured data storage and processing. We need to make sure that data is safe and always available; the services have to work fast regardless of the data volume.

This article is about how we got started using HBase and where we are now. More in-depth reasoning can be found in the second part of the article.

Lucky shot

If someone had asked me a couple of days ago why or how we chose HBase, I would have answered in a blink that it was about reliability, performance, costs, etc. (a bit brainwashed after answering “correctly” and “objectively” too many times). However, as the subject has become rather popular lately [1], I reflected more deeply on the “how” and the “why”.

The truth is that, in the beginning, we were attracted to working with bleeding-edge technology, and it was fun. What motivated us was a projection of the success we were hoping to have. We all knew the stories about Google File System, Bigtable and Gmail, and what made them possible. I guess we wanted a piece of that, and Hadoop and HBase were a logical step toward it.

