How to become a data scientist

Data scientists are in high demand. This guide tells you what you need to know for a career in data science—and how to learn it.

Opensource.com

Once upon a time, I wanted to be an evolutionary biologist. To make a long story short, I had a change of heart and dropped out of my PhD program to pursue a career in computer science. I'm now a senior software engineer at Red Hat, where I work on a variety of machine learning and data science projects (you can read more about my journey on my blog). Not long after I joined Red Hat, many people, including three different University of Chicago grad students, asked me about transitioning to a career in data science, so I started looking into it.

The awesome thing about jumping into data science now is that everything (from the software, to the learning materials, to the discussion) is extremely open, so there's never been a better time to be an autodidact. In case it helps others considering a career in data science, here's what I've learned about making the leap.

Open discussion

As a warmup, I recommend the following links for background information on data science:

In general, members of the data science community are quite open about sharing their diverse experiences and backgrounds, which can be super helpful when you're choosing what particular flavor of data science to pursue.

Open experience

If you're serious about pursuing a career in data science, getting experience is more important than anything else. I know this advice rings true for many other fields, but because data science requires such a high level of mathematical and statistical maturity, it can be somewhat difficult to signal to potential employers that you know how to effectively apply these sophisticated techniques without relevant work experience.

If you're a student, your top priority should be landing an internship. It will make the eventual full-time job search much easier. Unfortunately, internships are also the least "open" part of pursuing a data science career because they're usually only available to students. However, there are plenty of other open opportunities for gaining experience. For example, you can try out open competitions, like those on Kaggle.

There's also open source software development. Contributing to open source projects and/or putting your personal projects on GitHub (here's mine) is a great way of demonstrating your data science expertise. You can also consider pro bono ("open heart?") work. Have a favorite local restaurant? Ask its management if they'd be interested in a free data science consultation. (I know someone who actually did this!)

Finally, be sure to create a LinkedIn account and keep it updated (here's mine). LinkedIn has become an extremely valuable tool for recruiters, so it's important to be discoverable there.

Open education

Next, my favorite part: open education. Over the past few years, there has been a really exciting trend toward massive open online courses (aka MOOCs), which are basically full courses (including homework and exams) offered by top institutions and firms (e.g., Stanford, Harvard, Google) on a wide variety of topics. Many companies and websites offer MOOCs; some of my favorites are Coursera, edX, Udacity, Saylor, and Khan Academy.

For guidance on which courses to take, I've put together a detailed data science curriculum and published my own full course history. Some subjects you'll definitely want to cover include:

Open source software

Finally, the part most readers of Opensource.com will be familiar with: open source software. Open source software abounds in data science, but, just like Linux, the code being free and open does not mean it's inferior to its proprietary counterparts. In fact, the open source solutions are typically the best in class.

Important open source software for data scientists to know includes:

Get started

These guidelines should get you off on the right foot in your pursuit of a career in data science. If you know about any other helpful data science resources, be sure to share them in the comments.

Michael is currently an Artificial Intelligence Center of Excellence Fellow with the USDA Agricultural Research Service. Previously, Michael earned a Ph.D. in Machine Learning from Auburn University, and he was once a Machine Learning Engineer at Red Hat. You can learn more about him on his website.

1 Comment

That's a familiar career path: doctoral student in animal behavior to geospatial epidemiologist in my case. I'm older than you, and now I seem to have evolved through un-natural selection into a bureaucrat who can't help writing code when my hard-working employees are too busy to prevent me.

This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.