What do Google Assistant, Amazon Alexa, Microsoft Cortana, Apple Siri, and Facebook Messenger's M have in common? All of these personal assistants are powered by machine learning and artificial intelligence (AI). According to Glassdoor and Upwork.com, machine learning is an in-demand skill for 2017 and will be for many years.
How will you develop skills to become the next machine learning engineer?
Get started in machine learning
Udacity, Coursera, edX, and other resources offer MOOCs (Massive Open Online Courses), and Amazon.com lists more than 5,000 books on machine learning. Many of these titles can help you get started and gradually bring you to an advanced level with hands-on knowledge. One standout is Hands-On Machine Learning with Scikit-Learn and TensorFlow by Aurélien Géron, a former Google engineer who led the YouTube video classification team. It is a practical, instructional book for hands-on machine learning and deep learning (a class of machine learning algorithms).
The book assumes that you know close to nothing about machine learning; however, the book also assumes that you have some Python programming experience and a reasonable understanding of college-level math, including linear algebra, probability, and statistics. Machine learning projects typically use Python, R, Java/Scala, and C/C++ as programming languages. Rather than having us implement our own toy versions of a machine learning algorithm, Géron uses production-ready Python frameworks scikit-learn and TensorFlow. Both are open source projects with active development.
Why scikit-learn and TensorFlow?
The scikit-learn project started as a Google Summer of Code project by David Cournapeau in 2007. Scikit-learn focuses on bringing machine learning to non-specialists using a general-purpose, high-level language like Python. According to scikit-learn's testimonials page, Spotify uses scikit-learn machine learning packages for music recommendations, and Evernote uses it for the classification of notes. It provides good-quality, easy-to-use implementations of basic machine learning algorithms, including regression, classification, clustering, and more. Scikit-learn is a good entry point to learn machine learning, and it is the second highest starred machine learning library on GitHub.
TensorFlow is the second project Géron evaluates. Starting in 2011, Google Brain built DistBelief, a machine learning system based on deep learning neural networks. TensorFlow is Google Brain's second-generation machine learning system, and it was released as open source software in 2015. Google uses TensorFlow neural network models for voice recognition, and Snapchat uses it for image recognition. It provides advanced machine learning algorithms for neural networks and deep learning, and it's the most starred machine learning library on GitHub.
An end-to-end machine learning project
The book's core strength is its focus on an end-to-end machine learning project. A typical machine learning project involves eight steps:
- Look at the big picture
- Get the data
- Visualize the data to gain insights
- Prepare the data for machine learning algorithms
- Select a model and train it
- Fine-tune the model
- Present the solution
- Launch, monitor, and maintain the machine learning system
Sample project: housing prices prediction
The book uses California public census data to predict district-wide housing prices. As a first step, the book helps frame the problem: this is a supervised learning problem with multivariate regression. Depending on the accuracy required, the next step is to select a performance measure. For example, a root mean square error (RMSE) of $50,000 means that about 68% of the system's price predictions fall within $50,000 of the actual value, while about 95% fall within $100,000 of the actual value.
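To make the performance measure concrete, here is a minimal sketch of an RMSE calculation. It is not code from the book: the helper names `rmse` and `share_within` are hypothetical, and the prices are made-up numbers with normally distributed errors, which is the assumption behind the 68% and 95% rule of thumb.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def share_within(y_true, y_pred, tolerance):
    """Fraction of predictions whose absolute error is at most `tolerance`."""
    errors = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return float(np.mean(errors <= tolerance))


# Made-up district prices and predictions (illustrative only, not census data).
# The prediction errors are drawn from a normal distribution with a
# standard deviation of $50,000.
rng = np.random.default_rng(42)
actual = rng.uniform(100_000, 500_000, size=10_000)
predicted = actual + rng.normal(0, 50_000, size=10_000)

error = rmse(actual, predicted)
print(f"RMSE: ${error:,.0f}")
print(f"Within $50,000 of actual:  {share_within(actual, predicted, 50_000):.0%}")
print(f"Within $100,000 of actual: {share_within(actual, predicted, 100_000):.0%}")
```

If the errors in a real project are far from normally distributed, the two percentages can differ noticeably from 68% and 95%, so it may be worth checking them directly rather than inferring them from the RMSE.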
Scikit-learn provides tools to get, explore, and clean the data and to prepare it for machine learning algorithms. Once this is done, the next step is to select and train a model. One of the big challenges in a machine learning project is choosing the model that gives the most accurate predictions. An experienced machine learning professional typically shortlists three or four models and selects one based on the accuracy of its results.
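As a rough illustration of the preparation step, the sketch below fills in missing values and scales features with a scikit-learn `Pipeline`. It is not the book's code: the data is randomly generated, and the choice of median imputation followed by standardization is an assumption made for the example.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Made-up numeric features on very different scales (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(loc=[5.0, 30.0, 1000.0], scale=[2.0, 10.0, 400.0], size=(200, 3))
X[rng.random(X.shape) < 0.05] = np.nan  # blank out roughly 5% of the values
y = rng.normal(size=200)

# Hold out a test set before any statistics are learned from the data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fill missing values with the column median, then standardize each column.
prep = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Medians, means, and standard deviations are learned from the training
# set only and then reused unchanged on the test set.
X_train_prepared = prep.fit_transform(X_train)
X_test_prepared = prep.transform(X_test)

print(X_train_prepared.shape, X_test_prepared.shape)
```

Fitting the pipeline on the training set alone keeps information from the test set out of the preparation step, which makes the later accuracy comparison more trustworthy.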
For the housing price prediction, Géron chooses linear regression, decision tree, and random forest models. He finds that random forest performs better than the other two models for this problem.
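A comparison like this can be sketched with scikit-learn's cross-validation helper. The example below is an assumption-laden illustration, not the book's code: it uses a small synthetic dataset instead of the California census data, default hyperparameters apart from the tree count, and five-fold cross-validated RMSE as the yardstick. Which model comes out ahead depends on the data, so no result is claimed here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data with some non-linear structure (illustrative only).
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(300, 4))
y = (
    3.0 * X[:, 0]
    + 5.0 * np.sin(X[:, 1])
    + X[:, 2] * X[:, 3]
    + rng.normal(0, 1.0, size=300)
)

models = {
    "linear regression": LinearRegression(),
    "decision tree": DecisionTreeRegressor(random_state=0),
    "random forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# scikit-learn scorers follow "higher is better", so mean squared error is
# reported negated; flip the sign and take the square root to get RMSE.
results = {}
for name, model in models.items():
    neg_mse = cross_val_score(model, X, y, scoring="neg_mean_squared_error", cv=5)
    results[name] = float(np.sqrt(-neg_mse).mean())

# Print the models from lowest to highest average RMSE across the folds.
for name, score in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name}: mean RMSE {score:.2f}")
```

Scoring on held-out folds rather than on the training data matters most for the decision tree, which can fit its training set almost perfectly and still generalize poorly.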
The book covers many more machine learning algorithms and their relevance and usage, and it includes code examples that cover supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning. Your machine learning problem could be segmenting customers and finding the best marketing strategy for each group, detecting which transactions are likely to be fraudulent, or predicting next year's revenue.
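To give a flavor of the unsupervised case, here is a minimal sketch of customer segmentation with k-means clustering in scikit-learn. Everything in it is assumed for illustration: the two features (visits per month and average spend), the three made-up customer groups, and the choice of three clusters.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Made-up customers: columns are visits per month and average spend in dollars.
rng = np.random.default_rng(7)
occasional = rng.normal(loc=[2.0, 20.0], scale=[0.5, 5.0], size=(50, 2))
regular = rng.normal(loc=[10.0, 60.0], scale=[1.0, 5.0], size=(50, 2))
big_spenders = rng.normal(loc=[20.0, 200.0], scale=[1.0, 10.0], size=(50, 2))
customers = np.vstack([occasional, regular, big_spenders])

# k-means is distance-based, so put both features on a comparable scale first.
scaled = StandardScaler().fit_transform(customers)

# No labels are supplied; the algorithm groups similar customers on its own.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(scaled)

# Summarize each segment in the original units.
for cluster in range(3):
    members = customers[labels == cluster]
    visits, spend = members.mean(axis=0)
    print(f"segment {cluster}: {len(members)} customers, "
          f"{visits:.1f} visits/month, ${spend:.0f} average spend")
```

In a real project the number of segments is rarely known in advance, so it would usually be chosen by trying several values and comparing the resulting clusters.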
Read: Hands-On Machine Learning with Scikit-Learn and TensorFlow