Archive for May, 2010
Deploying and configuring Hadoop and HBase across clusters is a complex task. In this article I will show what we do to make it easier, and share the deployment recipes that we use.
Before going into how we do things, here is the list of tools that we use and that I will mention in this article. I will try to link each tool-specific term, but you can always refer to the tool's home page for further reference.
- Hudson – a great CI server, which we use to build Hadoop, HBase, ZooKeeper and more
- The Hudson Promoted Builds Plug-in – allows defining operations that run after the build has finished, manually or automatically
- Puppet – configuration management tool
We don’t have a dedicated operations team to which we can hand off a list of instructions describing how we want our machines to look. The operations team helping us just makes sure the servers are in the rack, networked and powered up; once we have a set of IPs (usually from IPMI cards), we’re good to go ourselves. We are our own devops team, so we try to automate as much as possible, and the tools above help a lot.