Data science, machine learning, artificial intelligence, and deep neural nets are all hot topics these days (and key terms that might help this post with some SEO, unless the AI sees through my attempts). Below I've shared several of the resources I've used regularly while working on data science projects over the last few years. I don't read many books, so the fact that I've included even one shows how important it is.
There are enough resources here to get even the most novice engineer started on a path toward data science mastery, in this new age where data science skills will be needed at every level. There are tools for doing the work, a class taught by a renowned Stanford professor, websites with tutorials that give you real-life experience, and a site dedicated to making the latest research freely available so you can learn more if you want.
Enjoy the journey!
Book
Weapons of Math Destruction by Cathy O’Neil
If you want to be able to trust your AI outputs, then you need to read this book. It explains some of the different avenues by which bias can infiltrate your data and algorithms and what you can do about it.
Online course
Andrew Ng's free machine learning class on Coursera
This course makes it easy to get started in machine learning with very little prior knowledge. Andrew is an excellent instructor, and his explanations make complex concepts much easier to understand.
Tools
Dataset Search by Google (beta)
If you want to search across a lot of public datasets, including what's on Kaggle, then you need to check out this beta project from Google. It supports much of the advanced search syntax you're already used to from Google Search, like restricting results to a specific site. This is where I go when I need a dataset.
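To make the site-restricted search a bit more concrete, here is a minimal sketch that builds a search link with a `site:` filter. The base URL is my assumption about where the tool currently lives (it has moved at least once), and `build_dataset_query` is just a hypothetical helper name, not anything Google provides.

```python
from urllib.parse import urlencode

# Assumed address of Google Dataset Search; verify it before relying on it.
DATASET_SEARCH_URL = "https://datasetsearch.research.google.com/search"


def build_dataset_query(terms, site=None):
    """Return a search URL for the given terms, optionally limited to one site."""
    query = terms.strip()
    if site:
        # Same "site:" operator you would type into a regular Google search.
        query = f"{query} site:{site}"
    return f"{DATASET_SEARCH_URL}?{urlencode({'query': query})}"


print(build_dataset_query("housing prices", site="kaggle.com"))
```

Paste the printed link into a browser to see the results. Typing `housing prices site:kaggle.com` straight into the search box does the same thing.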
Colaboratory, a free Jupyter Notebook
This tool provides a hosted Jupyter notebook that lets you collaborate with others, much like other Google apps. If you're short on cash or just want a tool that's available from any internet-connected computer, then this will help you a lot. I use it almost exclusively, simply because it saves me from managing local dependencies.
Videos
Andrej Karpathy's Stanford class videos on YouTube
Recommended by Kartik Subbarao
These are great. Andrej gives you an intuitive understanding of neural networks in a way that fits how programmers think about things. He's also got some great blog posts on the subject.
Websites
Arxiv.org
This is a site everyone should have bookmarked if they're interested in data science. Much of the latest research is posted here first, so researchers can claim "first" on their findings before the papers are officially published. Data science is moving so fast that it's important to stay current in order to use the most effective and efficient algorithms.
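If you'd rather pull the newest papers with a script than refresh the site, arXiv has a public query API that returns an Atom feed. The sketch below is one way to list recent titles. The `cs.LG` (machine learning) category and the helper names are my own choices for illustration, so swap in whatever fits your interests.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"  # XML namespace used by Atom feeds


def build_arxiv_url(category="cs.LG", max_results=5):
    """Build a query URL for the newest submissions in one arXiv category."""
    params = {
        "search_query": f"cat:{category}",
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"


def parse_titles(atom_xml):
    """Extract paper titles from an Atom feed, collapsing line breaks and extra spaces."""
    root = ET.fromstring(atom_xml)
    return [
        " ".join(entry.findtext(f"{ATOM}title", default="").split())
        for entry in root.iter(f"{ATOM}entry")
    ]


def latest_titles(category="cs.LG", max_results=5):
    """Fetch the feed over the network and return the titles it contains."""
    with urlopen(build_arxiv_url(category, max_results), timeout=30) as resp:
        return parse_titles(resp.read())
```

Calling `latest_titles()` makes a network request, so run it sparingly and keep `max_results` small.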
Kdnuggets.com
Don't let this site's appearance fool you: it has a ton of high-quality content. It also republishes articles from other sites with the author's permission, which often helps highlight articles that wouldn't otherwise get as much traffic. This is one of the best websites for data science content.
Kaggle.com
Anyone in data science will know this website. It has a lot of datasets available, mostly centered on data science competitions and projects. It's a great way to learn and begin working with some of the many public datasets. They have some project templates to help you get started and learn how all of this data science stuff works.
Towardsdatascience.com
This whole site has been an excellent resource for me. They constantly have great content covering both practical and theoretical topics in data science.