Using Spark DataFrames for large-scale data science

Spark's new DataFrame API is inspired by data frames in R and Python (Pandas), but designed from the ground up to support modern big data and data...

Why OpenStack is different from other open source projects

How does OpenStack differ from other large, popular open source projects and how do these differences affect the way the project is growing and...

The three open source projects that transformed Hadoop

Initially, Hadoop implementation required skilled teams of engineers and data scientists, making Hadoop too costly and cumbersome for many...

World record set for 100 TB sort by open source and public cloud team

How Databricks set a new world record for sorting 100 terabytes (TB) of data, or 1 trillion 100-byte records, in 23 minutes with open source software Apache...

Top 10 open source projects of 2014

Annual list of top 10 open source projects covered on Opensource.com in 2014. From cloud computing to containers to project management, this year's...

Why change is hard for any open source community

From the Apache Quill series: A recap of a lightning talk at ApacheCon Budapest about how the Apache Foundation has always done things a certain way, by...

Thank you for a record year (and looking ahead to 2015)

2014 has been a record year for Opensource.com, and we couldn't have done it without you: our readers, writers, community moderators, editors,...

An introduction to Apache Hadoop for big data

Introduction to Apache Hadoop, an open source software framework for storage and large-scale processing of datasets on clusters of commodity...