What is Hadoop?

Posted on September 7, 2014


I’ve been doing a bit of reading lately and have noticed a lot of mentions of Hadoop, an Apache project. I wanted to learn what it was all about, especially since it seems to be gaining a lot of momentum.

According to Apache:

The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.


The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.

To keep it simple: Hadoop is a fully (100%) open-source project for processing large data sets on clusters of computers.

The project was originally created in 2005 by Mike Cafarella and Doug Cutting, who named it Hadoop after his son’s toy elephant. Pretty clever! They created the project in response to the explosion of data across the web, knowing that existing technology could not handle that much data. Because the project is free and open source, it’s popular with companies looking to store and process large amounts of data, which also suggests there will be plenty of new Hadoop-related jobs in the near future!

It’s probably not a bad idea to start learning Hadoop if you’re looking for a new career 😉

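Hadoop consists of the following modules:

  • Hadoop Common: the shared utilities that support the other Hadoop modules
  • Hadoop MapReduce: a YARN-based system for parallel processing of large data sets
  • Hadoop Distributed File System (HDFS): a distributed file system that provides high-throughput access to application data
  • Hadoop YARN: a framework for job scheduling and cluster resource management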
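To get a feel for the "simple programming models" Apache mentions, here is a minimal sketch of the classic MapReduce word-count example. This is plain Python that simulates the map, shuffle, and reduce steps on one machine; it is not Hadoop's actual Java API, and the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, chosen just for illustration. On a real cluster, Hadoop would run many mappers and reducers in parallel across machines and handle the grouping step for you.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper (hypothetical name): emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, which the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer (hypothetical name): sum the counts for a single word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # On a cluster, each mapper would read its own chunk of the input on a
    # different machine; in this local simulation they all run in one loop.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    # Prints {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
    print(word_count(docs))
```

The appeal of the model is that the mapper and reducer are tiny, independent functions, which is what lets a framework like Hadoop spread the work across many computers.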

While I can’t say that I will be using Hadoop anytime soon, I was really curious as to why it was causing such a big commotion on Twitter recently. Seems like everyone I follow is talking about it.

If you’re interested in checking out Hadoop, you can download it from the official Apache Hadoop site (hadoop.apache.org).

Learn more about clusters here.
