Big Data. Deep Learning. Data Science. Artificial Intelligence.
It seems like a day doesn’t go by when we’re not bombarded with these buzzwords. But what’s with all the hype? And how can you use it in your own business?
At its simplest, machine learning is the process of optimizing mathematical equations: an algorithm adjusts the parameters of a model until its predictions fit the data as closely as possible. There are several kinds of machine learning, each with a different purpose. Two of the most popular are supervised and unsupervised learning. We’ll go through how they work below:
- Supervised Learning – supervised learning uses labeled examples of known data to predict future outcomes. For example, if you kept track of the weather conditions on each game day and whether your favorite sports team actually played, you could learn from those patterns over time and use the weather forecast to predict whether an upcoming game will be rained out. The “supervised” part means that you have to supply the system with “answers” that you already know. That is, you already know when your team did and didn’t play, and you know what the weather was on those days. The computer reads through this information iteratively, finds patterns in it, and uses them to make predictions. Another application of supervised learning is predicting whether borrowers will default on their loans.
- Unsupervised Learning – unsupervised learning refers to a type of machine learning where you don’t necessarily know in advance what “answer” you’re looking for. Unlike our “will my sports game get rained out” example, unsupervised learning is better suited to exploratory work such as clustering. Clustering groups things that are similar or connected, so you could feed it a collection of Twitter posts and have it tell you what people are most commonly talking about. Two algorithms that apply unsupervised learning are K-Means and Latent Dirichlet Allocation (LDA).
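To make the difference concrete, here is a minimal sketch using scikit-learn. The weather numbers, labels, and 2-D points are made up for illustration, and logistic regression and K-Means are just one reasonable choice of algorithm for each kind of learning.

```python
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Supervised: every past game day comes with a known "answer".
# Each row is [chance_of_rain_percent, wind_speed_mph] (hypothetical data),
# and the label records whether the game was rained out (1) or played (0).
past_weather = [[10, 5], [20, 8], [15, 3], [30, 10],
                [80, 12], [90, 20], [75, 15], [95, 18]]
rained_out = [0, 0, 0, 0, 1, 1, 1, 1]

classifier = LogisticRegression(max_iter=1000)
classifier.fit(past_weather, rained_out)

# Predict for two upcoming forecasts: one stormy, one clear.
forecast = [[85, 14], [12, 4]]
predictions = classifier.predict(forecast)
print("Rained out?", predictions)

# Unsupervised: no labels at all. K-Means only sees the points and
# groups the ones that lie close together.
points = [[1.0, 1.1], [0.9, 1.0], [1.1, 0.9],
          [8.0, 8.1], [7.9, 8.0], [8.1, 7.9]]
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(points)
print("Cluster for each point:", labels)
```

Note that the classifier needs both `past_weather` and `rained_out` to learn, while K-Means receives only `points`. The cluster numbers it assigns are arbitrary; what matters is which points end up grouped together.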
Deep learning, despite the hype, is simply the application of multi-layered artificial neural networks to machine learning problems. It’s called “deep” learning because the neural networks stack many layers, each building on the output of the one before it, instead of relying on a single layer. For example, a deep learning algorithm trained to recognize faces in photos might first learn to recognize the shape of eyes, then noses, then mouths, and then the spatial relationship of them all together. Rather than trying to recognize the whole face at once, it breaks the problem down into component parts to get a better understanding.
Deep learning has been in the news a lot lately. You may remember the trippy image generation project called DeepDream that Google released in 2015. Also noteworthy was AlphaGo’s triumph over a professional Go player, also using deep learning. Before this, no computer program had beaten a professional human player at full-size Go without a handicap, so this marked a new milestone in artificial intelligence.
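The layered structure can be sketched in a few lines of NumPy. This is an illustrative forward pass only: the layer sizes are arbitrary, the weights are random rather than trained, and the 64 input values stand in for a tiny hypothetical 8x8 image.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    """Non-linearity applied between layers."""
    return np.maximum(0.0, x)

def forward(x, layers):
    """Pass input x through each (weights, bias) layer in turn."""
    activation = x
    # Hidden layers: each one transforms the output of the previous layer.
    for weights, bias in layers[:-1]:
        activation = relu(activation @ weights + bias)
    # Output layer followed by softmax, which turns scores into probabilities.
    weights, bias = layers[-1]
    logits = activation @ weights + bias
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

# 64 inputs -> 32 -> 16 -> 2 outputs (for example "face" vs. "not a face").
layer_sizes = [64, 32, 16, 2]
layers = [
    (rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

image = rng.random((1, 64))  # one flattened, made-up image
probabilities = forward(image, layers)
print(probabilities.shape)
```

Training would adjust every `weights` and `bias` array so that the final probabilities match known labels. Frameworks such as TensorFlow automate that step.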
Image credit: Deep Dreamscope
One of the best things about Python is the fact that there are so many libraries available. Since anyone can create a Python package and submit it to PyPI (Python Package Index), there are packages out there for just about everything you can think of. Machine learning and deep learning are no exception.
In fact, Python is one of the most popular languages for data scientists due to its ease of use and wealth of scientific packages. Many Python developers, especially in the data space, like to use Jupyter Notebooks because they make it possible to iterate on and refine code and models without rerunning the entire program each time.
scikit-learn is the frontrunner and longtime favorite of data scientists. It has been around for years and has whole books devoted to it. If you want a wealth of machine learning algorithms and customizations, scikit-learn likely has what you need. However, if you’re looking for something that’s more heavily stats-focused, you may want to go with StatsModels instead.
Caffe is a fast, open deep learning framework written in C++ with a Python interface. Developed by an AI research team at UC Berkeley, it performs well in image processing scenarios and is used by large companies such as Facebook, Microsoft, Pinterest, and more.
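Part of scikit-learn’s appeal is that its algorithms share the same interface, so swapping one for another takes a single line. The sketch below assumes scikit-learn is installed and uses its bundled iris dataset; the two models and their settings are arbitrary picks for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# A small flower-measurement dataset that ships with scikit-learn.
X, y = load_iris(return_X_y=True)

# Two very different algorithms behind the same estimator interface.
models = {
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    "random forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

scores = {}
for name, model in models.items():
    # 5-fold cross-validation: train on four fifths, test on the rest, repeat.
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean accuracy {scores[name]:.2f}")
```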
TensorFlow made waves in the machine learning community as Google’s open source deep learning offering. It currently stands as the most prominent deep learning framework in the space, with many developers participating. TensorFlow works well with object recognition and speech recognition tasks.
Theano is a Python library for fast numerical computation. Many developers run it on GPUs for data-intensive operations. It also supports symbolic differentiation, so it can calculate derivatives of functions with many variables. In fact, according to the Theano project, its GPU-optimized code can even outperform C code running on a CPU. If you’re crunching some serious data, Theano could be your go-to.
A better question would be: who’s not using machine learning in their business? And if not, why not?
The possibilities of data analytics at scale have been realized across industries, from healthcare to finance to oil and gas. Here are some notable firms betting on machine learning:
- Google — Google uses machine learning across their company, from Google Translate to helping you categorize your photos to self-driving car research. Teams at Google also develop TensorFlow, a leading deep learning framework.
- Facebook — Facebook makes heavy use of machine learning in the ad space. By looking at your interests, pages you visit, and things you ‘like’, Facebook gets a very good idea of who you are as a person and what kind of things you may be interested in buying. It uses this information to show you advertisements and posts in your newsfeed. Facebook also uses machine learning to recognize faces in your photos and help you tag them.
- Netflix — Netflix uses the movies you watch, rate, and search for to create customized recommendations. One machine learning technique for product recommendations that both Netflix and Amazon employ is called collaborative filtering, which recommends items based on the tastes of users with similar histories. From 2006 to 2009, Netflix also ran a contest called the Netflix Prize, which offered $1 million for a substantially more accurate recommendation algorithm.
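To give a feel for collaborative filtering, here is a minimal user-based sketch in NumPy. It is not how Netflix or Amazon actually implement recommendations; the ratings matrix is invented, and cosine similarity with a weighted average is just one simple variant of the technique.

```python
import numpy as np

# Rows are users, columns are movies; 0 means "not rated yet".
# All ratings are hypothetical.
ratings = np.array([
    [5, 4, 0, 1],   # user 0 has not rated movie 2
    [4, 5, 5, 1],   # user 1 has similar taste to user 0
    [1, 1, 2, 5],   # user 2 has very different taste
], dtype=float)

def predict_rating(ratings, user, item):
    """Predict a rating as a similarity-weighted average of other users' ratings."""
    # Cosine similarity between the target user and every user.
    norms = np.linalg.norm(ratings, axis=1)
    similarities = ratings @ ratings[user] / (norms * norms[user])
    # Only consider other users who have actually rated this item.
    mask = ratings[:, item] > 0
    mask[user] = False
    if not mask.any():
        return 0.0  # nobody else has rated it, so there is nothing to go on
    weights = similarities[mask]
    return float(weights @ ratings[mask, item] / weights.sum())

predicted = predict_rating(ratings, user=0, item=2)
print(f"Predicted rating for user 0 on movie 2: {predicted:.2f}")
```

Because user 0’s ratings resemble user 1’s far more than user 2’s, the prediction lands much closer to user 1’s rating of 5 than to user 2’s rating of 2.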
- Python is a general-purpose language, which means it can be used in a variety of scenarios and has a wealth of packages available for just about any purpose.
- Python is easy to learn and read.
- Developers can use Jupyter Notebooks to iteratively build their code and test it as they go.
- There’s no industry-standard IDE for Python like there is for R. Still, many good options exist.
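- In most cases, pure Python code runs noticeably slower than equivalent C/C++, although numerical libraries such as NumPy offload the heavy computation to compiled code.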
- The wealth of options in Python can be both a pro and a con. There are lots of choices, but it may take more digging and research to find what you need. In addition, setting up separate packages can be complicated if you’re a novice programmer.
The era of Big Data is here, and it’s not going away. You have learned a little more about the different types of machine learning, deep learning, and the major technologies that companies are using. Next time you have a data-intensive problem to solve, look no further than Python!
Author: Al Nelson