Apache Spark

Apache Spark is an open source cluster computing framework that is frequently used in big data processing. 
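For readers new to the project, here is a minimal sketch of what a Spark job looks like in PySpark: a distributed word count over a text dataset. The input and output paths are placeholders, not a real dataset.

```python
# Minimal PySpark sketch: count words across a large text dataset.
# The HDFS paths below are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("word-count").getOrCreate()

lines = spark.read.text("hdfs:///data/corpus/*.txt")   # distributed read

counts = (
    lines.rdd
    .flatMap(lambda row: row.value.split())   # split each line into words
    .map(lambda word: (word, 1))              # pair each word with a count of 1
    .reduceByKey(lambda a, b: a + b)          # sum counts per word across the cluster
)

counts.saveAsTextFile("hdfs:///out/word-counts")
spark.stop()
```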

A case study using NASA logs to show how Spark can be leveraged for analyzing data at scale.
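The case study itself isn't reproduced here, but the general approach looks something like the sketch below: parse server-style log lines with PySpark and aggregate them across the cluster. The log format, regular expression, and paths are assumptions for illustration, not the case study's actual code.

```python
# Sketch of log analysis at scale with PySpark, assuming logs in
# Common Log Format; the input path is a placeholder.
import re
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("log-analysis").getOrCreate()

LOG_PATTERN = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\S+)')

def parse(line):
    m = LOG_PATTERN.match(line)
    if not m:
        return None                         # skip malformed lines
    host, ts, method, path, status, _size = m.groups()
    return (host, path, int(status))

records = (
    spark.sparkContext.textFile("hdfs:///data/logs/*")
    .map(parse)
    .filter(lambda r: r is not None)
)

# Ten most requested paths across the whole dataset
top_paths = (
    records.map(lambda r: (r[1], 1))
    .reduceByKey(lambda a, b: a + b)
    .takeOrdered(10, key=lambda kv: -kv[1])
)
for path, hits in top_paths:
    print(path, hits)

spark.stop()
```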

As the Apache Software Foundation turns 20, let's celebrate by recognizing 20 influential and up-and-coming Apache projects.

Dani and Jon will give a three-hour tutorial at OSCON this year called Becoming friends with...

Apache Spark is an open source cluster computing framework. In contrast to Hadoop’s two-stage disk-...
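The contrast comes down to where intermediate data lives. A rough sketch, with placeholder paths and a trivial computation, of how Spark keeps a dataset cached in memory so repeated passes don't reread it from disk between stages:

```python
# Sketch of Spark's in-memory model: cache a dataset once and reuse it
# across iterations instead of rereading from disk between stages.
# The path and the loop body are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("in-memory-iteration").getOrCreate()
sc = spark.sparkContext

points = (
    sc.textFile("hdfs:///data/points.csv")
      .map(lambda line: [float(x) for x in line.split(",")])
      .cache()                              # keep parsed records in cluster memory
)

total = 0.0
for _ in range(10):                         # each pass reuses the cached RDD
    total = points.map(lambda p: sum(p)).sum()

print(total)
spark.stop()
```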

ApacheCon is coming up, and within that massive conference there will be a glimmering gem: a forum...

Spark's new DataFrame API is inspired by data frames in R and Python (Pandas), but designed from...
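As a rough illustration of the style, the sketch below uses the DataFrame API from PySpark; the file path and column names are made up for the example.

```python
# Sketch of the DataFrame API: Pandas/R-like operations that Spark
# plans and executes in parallel. Path and columns are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dataframe-demo").getOrCreate()

df = spark.read.csv("hdfs:///data/flights.csv", header=True, inferSchema=True)

summary = (
    df.filter(F.col("distance") > 500)           # declarative filter
      .groupBy("carrier")                        # distributed group-by
      .agg(F.avg("delay").alias("avg_delay"))    # aggregate per group
      .orderBy(F.desc("avg_delay"))
)
summary.show(10)

spark.stop()
```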

How Databricks set a new world record for sorting 100 terabytes (TB) of data, or 1 trillion 100-...