In this week's edition of our open source news roundup, we take a look at helping earthquake victims in Nepal, OpenMRS and the fight against Ebola, Apple's Siri to leverage Apache Mesos, and more.
Open source news roundup: April 25 - May 1, 2015
Apache Spark is an open source cluster computing framework. In contrast to Hadoop’s two-stage disk-based MapReduce paradigm, Spark’s in-memory primitives provide performance up to 100 times faster for certain applications.
Chris Mattmann is a frequent speaker at ApacheCon North America and has a wealth of experience in software design and the construction of large-scale data-intensive systems. His work has influenced a broad set of communities, ranging from helping NASA unlock data from its next generation of earth...
ApacheCon is coming up, and within that massive conference there will be a glimmering gem: a forum dedicated to Spark. The Spark Forum will have speakers from the Hive project, the Pig project, and the Sqoop project. Plus, there will be two talks about Spark Streaming: one will be introductory, and the other...
University of Southern California postdoctoral fellow and NASA/JPL researcher Annie Bryant Burgess explains how her PhD is related to her involvement in open source, and tells us what Apache Tika has to do with studying polar data.
Spark's new DataFrame API is inspired by data frames in R and Python (Pandas), but designed from the ground up to support modern big data and data science applications.
How does OpenStack differ from other large, popular open source projects and how do these differences affect the way the project is growing and maturing?
Initially, Hadoop implementation required skilled teams of engineers and data scientists, making Hadoop too costly and cumbersome for many organizations. Now, thanks to a number of open source projects, big data analytics with Hadoop has become much more affordable and mainstream. Here's a look at...
The Opensource.com Weekly Top 5: the best and brightest burning star articles from this week: January 26 - 30
How Databricks set a new world record by sorting 100 terabytes (TB) of data, or 1 trillion 100-byte records, in 23 minutes using the open source software Apache Spark and the public cloud infrastructure EC2.