Open source matters in data analytics: Here's why

Open source is critical in data analysis while providing long-term benefits for the users, community members, and business.
5 readers like this.
metrics and data shown on a computer screen

It's been a little over a year since I wrote my article on, introducing the Cube community. As I worked with our community members and other vendors, I've become more convinced of the benefits of open source in data analytics. I also think it's good to remind ourselves periodically why open source matters and how it provides long-term benefits for everyone.

Benefits of open source for users and customers

One of the first things I heard from the Cube community was that they often received better support in chat from other community members than they did with proprietary software and a paid support plan. Across many open source communities, I find people who are motivated to help other (especially new) community members and see it as a way of giving back to the community.

You don't need permission to participate in open source communities. A good open source community isn't for developers only, and people feel there's a culture of trust and feel comfortable enough to have open discussions on chat platforms, forums, and issue trackers. This is especially important for non-developers, such as data engineers or analysts in the data analytics space.

Of course, with open source software, there's the ability to see and contribute directly to the codebase to fix bugs or add new features. Using an example from the Cube community, GraphQL support was one of our highlights last year, and our community members contributed to this feature.

There are plenty of benefits to an active community. Even in cases where the vendor cannot release a fix in a timely manner, you can still make the changes yourself and own the runtime while you wait for an "official" fix. Community members and users also don't like being locked in to a vendor's whims, and there's no pressure to upgrade when using open source software. 

Open source communities leave many "bread crumbs" in different tools like GitLab, GitHub, Codeberg, YouTube, and so on, making it a lot easier to gauge not just the volume of activities but also the level of community engagement and culture. So even before trying out the software, you can get a good sense of the community's health (and, by extension, the company) before deciding if this is a technology you want to invest in.

[ Related read How we track the community health of our open source project ]

Benefits of open source for the company

There's no better way to lower the barrier to adoption of your software than being open source. Early on, this helps grow adoption among the technical audience. Early adopters then often become some of your most loyal fans for years to come.

Early adopters are also catalysts for speeding up your development. Their feedback on your product and feature requests (for instance on your issue trackers) will provide insight into real-world use cases. In addition, many of the open source enthusiasts participate in co-development efforts (for example, on your repositories) for new features or bug fixes. Needless to say, this is precious for companies in the early days when there is a shortage of resources in development and product teams.

As you tend to your community, you will help it grow and diversify. Increased diversity isn't just in demographics or geography. You want users from new industries, or users with different job titles. Using the Cube community as an example, I mostly talked to application developers a year ago, but now I’m meeting with more people that are data consumers or users.

The collaborative culture in good open source communities lowers the barrier to entry not just for developers but also for others who want to ask questions, share their ideas or make other non-technical contributions. You get better access to diverse perspectives as your company and community grows.

Being open source makes it easy to collaborate with other vendors and communities, not just with individual community members. For example, if you want to work with another vendor on a database driver or integration, it's a lot simpler when you can just collaborate across open source repositories.

Community matters

All these benefits lead to lowering the barriers to entry for using your software and collaboration. The open source model will not only help individual software or companies, but it can help accelerate the growth of our entire ecosystem and the industry. I hope to see more open source companies and communities in the data analytics space and for all of us to continue this journey.

What to read next
User profile image.
Ray is a Community Manager at PingCAP where he is helping to grow the TiDB community. Prior to PingCAP, Ray managed open source communities at Cube Dev, GitLab and the Linux Foundation. He has over 15 years of experience in the high-tech industry in roles ranging from software engineer, product manager, program manager, and team lead at companies such as EDS, Intel, and Medallia.

1 Comment

The article properly targets the usage of data in open source.

Creative Commons LicenseThis work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.