Big data and open source go hand in hand

An introduction to big data from

Image by: Cory Doctorow. Modified. CC BY-SA 2.0.

Big data. It has certainly been a buzzword in recent years, but what is it really, and how are organizations leveraging open source tools to turn raw data into actionable insights?

A core piece of our mission is to keep you informed about trends and technologies where open source is making a difference. To help with that, we've created a new resource page that brings you up to speed on big data and on some of the open source tools that businesses, governments, and organizations of all types use to make sense of huge quantities of bits and bytes.

If you've been wondering what big data is, how you can make use of it, and how it's changing the way we look at the world by surfacing information that was never before available, we're here to help. In addition to bringing some sense to big data, we also look at:

  • How is open source making big data discoveries possible?
  • What is the MapReduce algorithm, and how does it make distributed computing possible?
  • What is Apache Hadoop, and how has it become the mainstay of many data scientists' processing needs?
  • What is Apache Spark, the new kid on the block, and how does it fit into the big picture of data processing?

We hope you'll check it out. If you find our resource helpful, please feel free to share it with your friends, family, and colleagues. And if you've got a big data question, let us know so we can continue to improve and build out this resource.


Big Data Queen

Big data is surely a big deal. When considering a big data strategy, I think it's worth mentioning HPCC Systems from LexisNexis. Designed by data scientists, HPCC Systems is an open source data-intensive supercomputing platform to process and solve Big Data analytical problems and can help companies derive actionable insights from their data.

HPCC Systems provides proven solutions for what are now called Big Data problems, and has been doing so for more than a decade. Its main advantages over the alternatives are real-time delivery of data queries and the extremely powerful ECL programming model. More info at
