20 innovative Apache projects

As the Apache Software Foundation turns 20, let's celebrate by recognizing 20 influential and up-and-coming Apache projects.

As the world's largest and one of the most influential open source foundations, the Apache Software Foundation (ASF) is home to more than 350 community-led projects and initiatives. The ASF's 731 individual members and more than 7,000 committers are global, diverse, and community-driven.

The ASF was founded on March 26, 1999, and to celebrate its 20th anniversary, applaud its all-volunteer community for their Herculean efforts, and thank the billions of users who make the projects under the ASF umbrella successful, we've assembled the following list of 20 ubiquitous or up-and-coming Apache projects.

1. Apache HTTP Server: Web/servers

Apache HTTP Server, the most popular open source HTTP server on the planet, shot to fame just 13 months after its inception in 1995. It remains prevalent today because it provides a secure, efficient, and extensible server that delivers HTTP services, according to the latest HTTP standards, for modern operating systems, including Unix, Microsoft Windows, and macOS.

The Apache HTTP Server played a key role in the early growth of the World Wide Web; its adoption, which rapidly outpaced that of all other web servers combined, was also instrumental in the wide proliferation of e-commerce sites and solutions. The Apache HTTP Server project was the ASF's flagship project at its launch, and its open, community-driven, meritocratic development process, known as the "Apache Way," has been emulated by all subsequent Apache projects.

2. Apache Incubator: Innovation

Apache Incubator is the ASF's nexus for innovation, serving as the entry path for projects and codebases hoping to become part of the ASF's official efforts. All code donations from external organizations and existing projects go through the incubation process to ensure they comply with the ASF's legal standards and develop diverse communities that adhere to the ASF's guiding principles.

Incubation is required of newly accepted projects until their infrastructure, communications, and decision-making process have stabilized in a manner consistent with other successful ASF projects. While incubation is neither a reflection of the completeness or stability of the code nor an indication of whether the project has been fully endorsed by the ASF, its rigorous process of mentoring projects and their communities according to the Apache Way has graduated nearly 200 projects in the Incubator's 16-year history. Today, 51 "podlings" are undergoing development in the Apache Incubator across an array of categories, including annotation, artificial intelligence, big data, cryptography, data science/storage/visualization, development environments, edge computing, Internet of Things (IoT), email, Java EE, libraries, machine learning, and serverless computing.

3. Apache Kafka: Big data

The Apache footprint as the foundation of the big data ecosystem continues to grow with 50 active projects, from Accumulo to Hadoop to ZooKeeper, and two dozen more in the Apache Incubator. Apache Kafka's highly performant, distributed, fault-tolerant, real-time publish-subscribe messaging platform powers big data solutions at Airbnb, LinkedIn, MailChimp, Netflix, the New York Times, Oracle, PayPal, Pinterest, Spotify, Twitter, Uber, Wikimedia Foundation, and countless other businesses.

4. Apache Maven: Build management

Spinning out of the Apache Turbine servlet framework project in 2004, Apache Maven has risen to the top as the hugely popular build automation tool that helps Java developers build and release software. Stable, flexible, and feature-rich, Maven streamlines continuous builds, integration, testing, and delivery processes with an impressive central repository and robust plugin ecosystem, making it the go-to choice for developers who want to easily manage a project's build, reporting, and documentation.

5. Apache CloudStack: Cloud

Apache CloudStack is super-quick to deploy, well documented, and easy to run in production; one of its biggest draws is that it "just works." CloudStack powers some of the industry's most visible clouds, from global hosting providers to telcos to the Fortune 100's top 5% and more. Its community is cohesive, agile, and focused, leveraging 11 years of cloud success to enable users to rapidly and affordably build fully featured clouds.

6. Apache cTAKES: Content

Developed in real-world use at the Mayo Clinic in 2006, cTAKES was created by a team of physicians, computer scientists, and software engineers seeking a natural language processing system for extracting information from the clinical free text of electronic medical records. Today, Apache cTAKES is an integral part of the Mayo Clinic's electronic medical records and has processed more than 80 million clinical notes. Apache cTAKES is a growing standard for clinical data management infrastructure across hospitals and academic institutions, including Boston Children's Hospital, Cincinnati Children's Hospital, Massachusetts Institute of Technology, University of Colorado Boulder, University of Pittsburgh, and University of California San Diego, as well as companies such as Wired Informatics.

7. Apache Ignite: Data management

Apache Ignite is used for transactional, analytical, and streaming workloads at petabyte scale for the likes of American Airlines, ING, Yahoo Japan, and countless others on-premises, on cloud platforms, or in hybrid environments. Apache Ignite's in-memory data fabric provides an in-memory data grid, compute grid, streaming, and acceleration solutions across the Apache big data ecosystem, including Apache Cassandra, Apache Hadoop, Apache Spark, and more.

8. Apache CouchDB: Database

Thousands of organizations, such as the BBC, GrubHub, and the Large Hadron Collider, use Apache CouchDB for seamless data flow between every imaginable computing environment, from globally distributed server clusters to mobile devices to web browsers. Its Couch Replication Protocol allows you to store, retrieve, and replicate data safely on-premises or in the cloud with very high performance and reliability. Apache CouchDB does all the heavy lifting so you can sit back and relax.

9. Apache Edgent (incubating): Edge computing

The boom of IoT—with personal assistants, smartphones, smart homes, connected cars, Industry 4.0, and beyond—is producing an ever-growing amount of data streaming from millions of systems, sensors, equipment, vehicles, and more. The demand for reliable, efficient real-time data has driven the need for the "empowered edge," where data collection and analysis are optimized by moving away from centralized sources towards the edges of the networks where much of the data originates. Companies like IBM and SAP are leveraging Apache Edgent to accelerate analytics at the edge across the IoT ecosystem. Apache Edgent can be used in conjunction with many Apache data analytics solutions such as Apache Flink, Apache Kafka, Apache Samza, Apache Spark, Apache Storm, and more.

10. Apache OFBiz: Enterprise resource planning

Whereas most ASF projects are about running or creating infrastructure, the foundation recognizes the importance of running and handling a business. Apache OFBiz is a comprehensive suite of business applications to help manage everything from accounting and CRM through warehousing and inventory control. The Java-based framework provides the power and the flexibility to serve as the core of B2B and B2C business management and is easily expandable and customizable. Apache OFBiz is a complete ERP solution—flexible, free, and fully open source—and services users from United Airlines to Cabi.

11. Apache Spatial Information System (SIS): Geospatial

The US National Oceanic and Atmospheric Administration, Vietnamese National Space Center, and numerous spatial agencies, governments, and others rely on Apache SIS to create intelligent, standards-based, interoperable geospatial applications. The Apache SIS toolkit handles spatial data, location awareness, and geospatial data representation and provides a unified metadata model for file formats used for real-time smart city visualization, geospatial dataset discovery, state-of-the-art location-enabled emergency management, earth observation, and information modeling for extraterrestrial bodies such as Mars and asteroids.

12. Apache Syncope: Identity management

Apache Syncope manages digital identity data in enterprise applications and environments to handle user information such as username, password, first name, last name, email address, etc. Identity management involves user attributes, roles, resources, and entitlements that control who has access to what data, when, how, and why. Apache Syncope users include the Italian Army, the University of Helsinki, University of Milan, and the Swiss SWITCH university network.

13. Apache PLC4X (incubating): IoT

Connectivity and integration across many Industrial IoT edge gateways are often impossible with closed-source, proprietary legacy systems that have incompatible protocols. Apache PLC4X provides a universal protocol adapter for creating Industrial IoT applications through a set of libraries that allow unified access to any type of industrial programmable logic controller (PLC) using a variety of protocols with a shared API. In addition, the project is planning modular integrations with Apache IoT projects, including Apache Brooklyn, Apache Camel, Apache Edgent, Apache Kafka, Apache Mynewt, and Apache NiFi.

14. Apache Commons: Libraries

With 42% or more of Apache projects written in Java (that's 62+ million lines of code!), it's both helpful and necessary to have a set of stable, reusable open source Java software components available to all Apache projects and external users. Apache Commons provides a suite of dozens of stable, reusable, easily deployed Java components and a workspace for Commons contributors to collaborate on the development of new components.

15. Apache Spark: Machine learning

Big data is growing exponentially each year, accelerated by industries such as agriculture, big business, fintech, healthcare, IoT, manufacturing, mobile advertising, and more. Apache Spark's unified analytics engine for processing and analyzing large-scale data helps data scientists apply machine learning insights and an array of libraries to improve responsiveness and produce more accurate results. Apache Spark can run some workloads up to 100x faster than Hadoop MapReduce. It runs on Apache Hadoop, Apache Mesos, and Kubernetes, as well as standalone or in the cloud, and it can access diverse data sources, including Apache Cassandra, Apache Hadoop HDFS, Apache HBase, Apache Hive, and hundreds of others.

16. Apache Cordova: Mobile

Apache Cordova is the popular developer tool used to easily build cross-platform, cross-device mobile apps using a "write-once-run-anywhere" solution, which enables developers to create a single app that appears the same across multiple mobile device platforms. Apache Cordova acts as an extensible container and serves as the base that most mobile application development tools and frameworks are built upon, including mobile development platforms and commercial software products by BlackBerry, Google, IBM, Intel, Microsoft, Oracle, Salesforce, and many others.

17. Apache Tomcat: Java/servers

Starting off as the Apache JServ project, designed to allow Java "servlets" to run in a web environment, Tomcat grew to become a full-fledged, comprehensive Java application server and was the de facto reference implementation for the Java specifications. Since 2005, Apache Tomcat has formed the foundation of numerous Java-based web infrastructures, such as those of eBay, E-Trade, Walmart, and the Weather Channel.

18. Apache Lucene Solr: Search

Adobe, AOL, Apple, AT&T, Bank of America, Bloomberg, Cisco, Disney, E-Trade, Ford, The Guardian, the Department of Homeland Security, Instagram, MTV Networks, NASA Planetary Data System, Netflix, SourceForge, Verizon, Walmart, Whitehouse.gov, Zappos, and countless others turn to Apache Lucene Solr to quickly and reliably index and search multiple sites and enterprise data such as documents and email. Popular features include near-real-time indexing, automated failover and recovery, rich document parsing and indexing, user-extensible caching, design for high-volume traffic, and much more.

19. Apache Wicket: Web framework

Many followers prize the Apache Wicket component-based web application framework for its "plain old Java object" (POJO) data model and markup/logic separation not common in most frameworks. Developers have been using Apache Wicket since 2004 to quickly create powerful, reusable components using object-oriented methodology with Java and HTML. Wicket powers thousands of applications and sites for governments, stores, universities, cities, banks, email providers, and more, including Apress, DHL, SAP, Vodafone, and Xbox.com.

20. Apache Daffodil (incubating): XML

Governments handle massive amounts of complex and legacy data across security boundaries every day. For such data to be consumed, it must be inspected for correctness and sanitized of malicious data. While traditional inspection methods are often proprietary, incomplete, and poorly maintained, Apache Daffodil streamlines the process with an open source implementation of the Data Format Description Language (DFDL) specification, which fully describes a wide array of complex and legacy file formats down to the bit level. Daffodil can parse data to XML or JSON to allow for validation, sanitization, and transformation, and it can also serialize, or "unparse," the result back to the original file format, effectively mitigating a large variety of common vulnerabilities.

Looking to the future

The Apache Software Foundation is a leader in community-driven open source software and continues to innovate with dozens of new projects and their communities. Apache projects are managing exabytes of data, executing teraflops of operations, and storing billions of objects in virtually every industry. Apache software is an integral part of nearly every end-user computing device, from laptops to tablets to phones. The commercially friendly and permissive Apache License v2.0 has become an open source industry standard.

As the demand for quality open source software continues to grow, the collective Apache community will continue to rise to the challenge of solving today's problems and imagining tomorrow's opportunities through the Apache Way of open development.

About the author

Sally Khudairi - Sally Khudairi is Vice President of Marketing & Publicity at The Apache Software Foundation (ASF) where, in 2002, she was elected its first female and non-technical Member. Over her 25-year career in the Web, Khudairi has been lauded as a dynamic communications strategist and expert in next-generation innovations, and has played an integral role in building campaigns for some of the industry’s most prominent standards and organizations. Prior to launching the ASF in 1999, Khudairi was...

About the author

Jim Jagielski - Jim is a well known and acknowledged expert and visionary in Open Source, an accomplished coder, and frequent engaging presenter on all things Open, Web and Cloud related. As a developer, he’s made substantial code contributions to just about every core technology behind the Internet and Web and in 2012 was awarded the O’Reilly Open Source Award and in 2015 received the Innovation Luminary Award from the EU. He is likely best known as one of the developers and co-founders of the Apache Software...