You don't have to be a data scientist to be fascinated by the world of machine learning, but a few travel guides might help you navigate the vast universe that also includes big data, artificial intelligence, and deep learning, along with a large dose of statistics and analytics. ("Deep learning" and "machine learning" are often used interchangeably; for a quick primer on how the terms differ, read Nvidia's blog post, What's the Difference Between Artificial Intelligence, Machine Learning, and Deep Learning?)
In this article, I'll look at three of the most popular machine learning libraries for Python.
Released nearly a decade ago and primarily developed by a machine learning group at Université de Montréal, Theano is one of the most-used CPU and GPU mathematical compilers in the machine learning community. A 2016 paper, Theano: A Python framework for fast computation of mathematical expressions, provides a thorough overview of the library. "Several software packages have been developed to build on the strengths of Theano, with a higher-level user interface, more suitable for certain goals," the paper explains. "Lasagne and Keras have been developed with the goal of making it easier to express the architecture of deep learning models and training algorithms as mathematical expressions to be evaluated by Theano. Another example is PyMC3, a probabilistic programming framework that uses Theano to derive expressions for gradients automatically, and to generate C code for fast execution." (Keras runs on top of both TensorFlow and Theano; Lasagne is built on Theano only.)
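To make the idea of "mathematical expressions" with automatically derived gradients more concrete, here is a minimal sketch in plain Python. It is not Theano's API, and it does no compilation or GPU work. The class names (Expr, Var, Const, Add, Mul) and the wrap helper are hypothetical, invented for this illustration. It only shows the general pattern: build an expression first, derive a gradient expression from it, and evaluate both later.

```python
# Toy symbolic expressions in plain Python. This is NOT Theano's API;
# every class and helper below is hypothetical and for illustration only.


class Expr:
    """Base class: arithmetic operators build new expression nodes."""

    def __add__(self, other):
        return Add(self, wrap(other))

    def __radd__(self, other):
        return Add(wrap(other), self)

    def __mul__(self, other):
        return Mul(self, wrap(other))

    def __rmul__(self, other):
        return Mul(wrap(other), self)


class Const(Expr):
    def __init__(self, value):
        self.value = value

    def evaluate(self, env):
        return self.value

    def grad(self, var):
        # The derivative of a constant is zero.
        return Const(0.0)


class Var(Expr):
    def __init__(self, name):
        self.name = name

    def evaluate(self, env):
        return env[self.name]

    def grad(self, var):
        # d(var)/d(var) is 1; any other variable is treated as constant.
        return Const(1.0 if self is var else 0.0)


class Add(Expr):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self, env):
        return self.left.evaluate(env) + self.right.evaluate(env)

    def grad(self, var):
        # Sum rule: (f + g)' = f' + g'
        return Add(self.left.grad(var), self.right.grad(var))


class Mul(Expr):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self, env):
        return self.left.evaluate(env) * self.right.evaluate(env)

    def grad(self, var):
        # Product rule: (f * g)' = f' * g + f * g'
        return Add(Mul(self.left.grad(var), self.right),
                   Mul(self.left, self.right.grad(var)))


def wrap(value):
    """Turn plain numbers into Const nodes; pass expressions through."""
    return value if isinstance(value, Expr) else Const(float(value))


x = Var("x")
y = x * x + 3 * x        # builds an expression; nothing is computed yet
dy_dx = y.grad(x)        # a new expression equivalent to 2x + 3

print(y.evaluate({"x": 4.0}))      # 28.0
print(dy_dx.evaluate({"x": 4.0}))  # 11.0
```

A real library goes much further than this sketch, for example by simplifying the derived expression and generating optimized code for it, which is the part the paper describes as generating C code for fast execution.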
Theano has more than 25,000 commits and almost 300 contributors on GitHub, and has been forked nearly 2,000 times.
For a video tutorial, watch the PyCon Singapore 2015 talk by Martin Andrews, Machine Learning: Going Deeper with Python and Theano.
TensorFlow, an open source library for numerical computing using data flow graphs, is a newcomer to the world of open source, but this Google-led project already has almost 15,000 commits and more than 600 contributors on GitHub, and nearly 12,000 stars on its models repository.
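If "data flow graph" is an unfamiliar term, the following plain-Python sketch may help. It is not TensorFlow's API. The Node class and the placeholder, constant, add, multiply, and run functions are hypothetical names chosen for this illustration. The point is only the two-step pattern: first describe a computation as nodes connected by edges, then feed in values and let them flow through the graph.

```python
# Toy data flow graph in plain Python. This is NOT TensorFlow's API;
# all names here are hypothetical and exist only for illustration.
import operator


class Node:
    """One graph node: an operation plus the nodes that feed into it."""

    def __init__(self, op, inputs=(), name=None):
        self.op = op                  # callable, or None for a placeholder
        self.inputs = tuple(inputs)   # incoming edges
        self.name = name              # used to look up placeholder values


def placeholder(name):
    """A node whose value is supplied when the graph is run."""
    return Node(op=None, name=name)


def constant(value):
    """A node that always produces the same value."""
    return Node(op=lambda: value)


def add(a, b):
    return Node(operator.add, (a, b))


def multiply(a, b):
    return Node(operator.mul, (a, b))


def run(node, feed, cache=None):
    """Evaluate a node by pulling values along its incoming edges.

    Each node is computed at most once per run, even if several
    other nodes depend on it.
    """
    cache = {} if cache is None else cache
    if node not in cache:
        if node.op is None:
            cache[node] = feed[node.name]
        else:
            args = [run(dep, feed, cache) for dep in node.inputs]
            cache[node] = node.op(*args)
    return cache[node]


# Step 1: describe the computation. No arithmetic happens here.
a = placeholder("a")
b = placeholder("b")
scaled = multiply(add(a, b), constant(10))

# Step 2: feed in values and run the graph.
print(run(scaled, {"a": 2, "b": 3}))  # 50
```

Separating the description of a computation from its execution is what allows a library to analyze the whole graph before running it, for instance to decide where each operation should execute.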
In the first Open Source Yearbook, TensorFlow was picked as a project to fork in 2016. In the most recent Open Source Yearbook, TensorFlow made several appearances. We included the project on our list of top open source projects to watch in 2017. We also learned about TensorFlow-based project Magenta in an article by Josh Simmons, A tour of Google's 2016 open source releases. Simmons says Magenta is an effort to advance the state of the art in machine intelligence for music and art generation, and to build a collaborative community of artists, coders, and machine-learning researchers. Rachel Roumeliotis also refers to TensorFlow in a list of languages powering AI as part of her Hot programming trends of 2016 roundup.
TensorFlow 1.0 rolled out in mid-February. "In just its first year, TensorFlow has helped researchers, engineers, artists, students, and many others make progress with everything from language translation to early detection of skin cancer and preventing blindness in diabetics," the Google Developers Blog announcement says.
To learn more about TensorFlow, read the DZone series TensorFlow on the Edge, or watch the live stream recording from TensorFlow Dev Summit 2017.
Built on NumPy, SciPy, and Matplotlib, scikit-learn (pronounced sy-kit learn) is used by Spotify engineers for music recommendations, at OkCupid to help evaluate and improve their matchmaking system, and during the exploration phase of new product development at Birchbox.
Scikit-learn has almost 22,000 commits and 800 contributors on GitHub.
For a free tutorial, read An introduction to machine learning with scikit-learn on the project's website, or watch Sebastian Raschka's PyData Chicago 2016 talk, Learning scikit-learn: An introduction to Machine Learning in Python.