10 ways big data and data science impacted the world in 2020 | Opensource.com

Learn how open source data science languages, libraries, and tools are helping us understand our world better by reviewing 2020's top 10 data science articles on Opensource.com.


Big data is one of many domains where open source shines. From open source alternatives to Google Analytics to new features in MySQL, 2020 brought several ways for open source enthusiasts to learn big data skills.

Get up to speed on how open source data science languages, libraries, and tools help us understand our world better by reviewing the top 10 data science articles published on Opensource.com in 2020.

Once upon a time, Matplotlib was the lone way to make plots in Python. In recent years, Python's status as data science's de facto language changed that. We have a plethora of ways to plot data using Python today.

In this article, Shaun Taylor-Morgan walks through seven ways to plot data in Python. Don't worry if you're a Matplotlib user: It's covered, along with Seaborn, Plotly, and Bokeh. You'll find code and charts for each plotting library, plus some newcomers to the Python plotting field: Altair, Pygal, and pandas.

Transparent, open source alternative to Google Analytics

Many websites use Google Analytics to track their activity metrics. Its status as the de facto tool leads some to wonder if open source options exist. In this overview of Plausible Analytics, Marko Saric proves they do.

If you want to compare Google Analytics against open source options, you will find Marko's article helpful. It's especially great if you're a website admin trying to comply with new data collection regulations, such as GDPR.

If you want to learn more about Plausible, you'll find links to Plausible's code and roadmap on GitHub in Marko's article.

5 MySQL features you need to know

After MySQL 8.0 came out in April 2018, MySQL moved to a quarterly release cycle for new features. Despite the more frequent releases, many users don't know about new MySQL features that could save them hours.

In this March 2020 article, Dave Stokes shares five features that were new to MySQL. They include dual passwords, new shells, and better SQL support. But keep in mind that these updates are now close to a year old: There's a lot more to discover in MySQL since then!

Using C and C++ for data science

Did you know that C and C++ are both strong options for data science projects? They're especially good choices to run data science programs on the command line.

In this article, Cristiano L. Fontana uses C99 and C++11 to write a program that uses Anscombe's quartet dataset. The step-by-step instructions include reading data from a CSV file, interpolating data, and plotting results to an image file.

Using Python to visualize COVID-19 projections

The COVID-19 pandemic put data analysis in the spotlight. In this article, Anurag Gupta shows how to use Python to project COVID-19 cases and deaths across India.

Anurag walks through downloading and parsing data, selecting and plotting data for India, and creating an animated horizontal bar graph. If you're interested in the complete script, you'll find a link at the end of his article.

How I use Python to map the global spread of COVID-19

If you want to track the spread of COVID-19 globally, you can use Python, pandas, and Plotly to do it. In this article, Anurag Gupta explains how you can use them to clean and visualize raw data.

Using screenshots to help, Anurag shares how to load data into a pandas DataFrame; clean and modify the DataFrame; and visualize the spread in Plotly. The complete code yields a gorgeous graph, and the article ends with a link to download and run it.

3 ways to use PostgreSQL commands

In this follow-up to his article on getting started with PostgreSQL, Greg Pittman shares how he uses PostgreSQL commands to keep his grocery shopping list updated.

Whether you want to do per-item entry or bring order to complex tables, Greg explains how to create the commands you need. He also shows how to output your lists once you're ready to print them.

No matter how long your shopping list is, PostgreSQL commands, especially the WHERE clause, can bring ease to your life beyond programming.
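To make the idea of filtering a list with WHERE concrete, here is a minimal sketch. It rests on several assumptions: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so it runs without a database server, and the table name, column names, and rows are hypothetical rather than taken from Greg's article.

```python
import sqlite3

# SQLite (in memory) stands in for PostgreSQL here; the table, columns,
# and rows are hypothetical and not taken from the article.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE groceries (item TEXT, category TEXT, needed INTEGER)")
conn.executemany(
    "INSERT INTO groceries (item, category, needed) VALUES (?, ?, ?)",
    [
        ("milk", "dairy", 1),
        ("cheddar", "dairy", 0),
        ("apples", "produce", 1),
        ("rice", "pantry", 0),
    ],
)

# WHERE keeps only the rows that match the condition: items still to buy.
rows = conn.execute(
    "SELECT item, category FROM groceries WHERE needed = 1 ORDER BY item"
).fetchall()
shopping_list = [item for item, _ in rows]
print(shopping_list)  # ['apples', 'milk']
conn.close()
```

The same SELECT statement should work unchanged in PostgreSQL, since it uses only standard SQL.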

Using Python and GNU Octave to plot data

Python is data science's language du jour, but how can you use it for specific tasks? In this article, Cristiano Fontana shares how to write a program in Python and GNU Octave.

Cristiano walks through each step in both Python and GNU Octave: reading data from a CSV file, interpolating the data with a straight line, and plotting the result to an image file.
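The first two of those steps can be sketched with only the Python standard library. This is not Cristiano's code: it assumes "interpolating with a straight line" means an ordinary least-squares fit, it embeds the first Anscombe's quartet dataset as CSV text instead of reading a file from disk, and it leaves out the plotting step. The function names are hypothetical.

```python
import csv
import io

# First of the four Anscombe's quartet datasets, embedded as CSV text so the
# sketch runs without an external file.
CSV_TEXT = """x,y
10,8.04
8,6.95
13,7.58
9,8.81
11,8.33
14,9.96
6,7.24
4,4.26
12,10.84
7,4.82
5,5.68
"""


def read_points(stream):
    """Parse (x, y) float pairs from a CSV stream with 'x' and 'y' columns."""
    reader = csv.DictReader(stream)
    return [(float(row["x"]), float(row["y"])) for row in reader]


def fit_line(points):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


points = read_points(io.StringIO(CSV_TEXT))
slope, intercept = fit_line(points)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
# prints "slope = 0.50, intercept = 3.00"
```

To read a real file, pass an open file object to read_points in place of the io.StringIO wrapper.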

Fast data modeling with JavaScript

Want a way to model data in a few minutes? In this article, Szymon shares how to do it using fewer than 15 lines of JavaScript code.

It really is that simple: You merely need to create a class and use the defaultsDeep function in the Lodash JavaScript library. Szymon shows this process using screenshots and code samples.

This approach keeps your data in one place, avoids code repetition, and is fully customizable. If you want to try out the code in this article, Szymon links to it in CodeSandbox at the end.
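For readers who work mainly in Python, the same idea can be sketched as a rough analogue. This is neither Szymon's JavaScript nor Lodash itself: defaults_deep below is a hypothetical helper that fills in missing keys recursively for nested dictionaries only, and the User class and its default values are invented for illustration.

```python
import copy


def defaults_deep(data, defaults):
    """Return a copy of `data` with missing keys filled from `defaults`, recursively."""
    result = copy.deepcopy(data)
    for key, default in defaults.items():
        if key not in result:
            # Key is absent: take the default (copied so defaults stay untouched).
            result[key] = copy.deepcopy(default)
        elif isinstance(result[key], dict) and isinstance(default, dict):
            # Both sides are nested mappings: fill in the gaps one level down.
            result[key] = defaults_deep(result[key], default)
    return result


class User:
    """Hypothetical model: one place that defines the shape of the data."""

    DEFAULTS = {"name": "", "address": {"city": "", "country": "unknown"}}

    def __init__(self, data):
        self.fields = defaults_deep(data, self.DEFAULTS)


user = User({"name": "Ada", "address": {"city": "London"}})
print(user.fields)
# {'name': 'Ada', 'address': {'city': 'London', 'country': 'unknown'}}
```

Values supplied by the caller always win, and the class-level defaults are never mutated, so every instance starts from the same shape.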

How to process real-time data with Apache tools

We process so much data today that storing it for later analysis might soon be impossible. Teams that handle failure prediction and other context-sensitive data need to get this information in real time, before it hits a database. Luckily, you can do this with Apache tools.

In this article, Simon Crosby explains how Apache Spark—a unified analytics engine—can process large datasets in real time at scale. For instance, "Spark Streaming breaks data into mini-batches that are each independently analyzed by a Spark model or some other system," he writes.
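The mini-batch idea in that quote can be illustrated in plain Python. This sketch does not use Spark or its API: mini_batches and analyze are hypothetical helpers, and the sensor readings are made-up values standing in for a live event stream.

```python
from itertools import islice
from statistics import mean


def mini_batches(stream, size):
    """Yield consecutive lists of up to `size` items from any iterable."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def analyze(batch):
    """Hypothetical per-batch analysis: summarize one batch of readings."""
    return {"count": len(batch), "mean": mean(batch), "max": max(batch)}


# Made-up sensor readings standing in for a live event stream.
readings = [20, 22, 21, 35, 36, 40, 19, 18]

# Each batch is analyzed on its own, without waiting for the full stream.
summaries = [analyze(batch) for batch in mini_batches(readings, 3)]
for summary in summaries:
    print(summary)
```

Because mini_batches accepts any iterable, the same loop would work on a generator that yields events as they arrive, so results are available per batch instead of after everything has been stored.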

If Spark isn't the right fit, Simon presents other open source options. Flink, Beam, and Samza, along with Apache-licensed SwimOS and Hazelcast, are just a few of your choices.

What do you want to know?

What would you like to know about big data and data science? Please share your suggestions for article topics in the comments. And if you have something interesting to share about data science, please consider writing an article for Opensource.com.

About the author

Lauren Maffeo - Lauren Maffeo has reported on and worked within the global technology sector. She started her career as a freelance journalist covering tech trends for The Guardian and The Next Web from London. Today, she works as a service designer for Steampunk, a human-centered design firm delivering civic tech solutions for government agencies. Prior to Steampunk, Lauren was an associate principal analyst at Gartner, where she covered the impact of emerging tech like AI and blockchain on small and midsize...