About the Author:

What machine learning is today, and what it could be soon

February 18th, 2019

If AI is a broad umbrella that includes the likes of sci-fi movies, the development of robots, and all sorts of technology that fuels legacy companies and startups, then machine learning is one of the metal ribs (perhaps the strongest) that holds the AI umbrella up and open.

So, what is machine learning offering us today? And what could it offer us soon? Let’s explore the potential for ML technologies.

Intro to machine learning

Machine learning is the process of machines sorting through large amounts of data, looking for patterns that can’t be seen by the human eye. Though the theory has been around for decades, applying machine learning requires two major components: machines that can handle the necessary amount of processing, plus a lot (a lot!) of gathered, cleaned data.

Thanks to cloud computing, we finally have both. With cloud computing, we can speed through data processing. With cloud storage, we can collect huge amounts of data to actually sort through. Before all this, machines had to be explicitly programmed to accomplish a specific task. Now, however, computers can learn to find patterns, and perhaps act on them, without such programming. The more data, the more precise machine learning can be.
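As a toy illustration of that idea (mine, not from the article): rather than being explicitly programmed with a rule, a program can recover the rule from examples, here by fitting a slope to a handful of data points.

```python
# Minimal illustration: "learning" a linear pattern from data points
# rather than being explicitly programmed with the rule.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]  # hidden rule: y = 2x

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares slope: how y varies with x across the examples
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)

print(slope)  # the program has recovered the rule from examples: 2.0
```

More data points mean a more confident estimate, which is the same reason machine learning gets more precise as datasets grow.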

Current examples of machine learning

Unless you are a complete Luddite, machine learning has already entered the folds of your life. Choosing a Netflix title based on prompted recommendations? Browsing similar titles for your Kindle based on the book you just finished? These recommendations are actually tailor-made for you. (In the recent past, they relied on an elementary version of “if you liked x, you may like y”, culled from a list that was put together manually.)

Today, companies have developed proprietary algorithms on which machine learning models train, looking for patterns in your data combined with the data of millions of other customers. This is why your Netflix queue may be chock full of action flicks and superhero movies while your partner’s leans heavily on crime drama and period pieces.

But machine learning is doing more than just serving up entertainment. Credit companies and banks are getting more sophisticated with credit scores. Traditionally, credit companies relied on a long-established pattern of credit history, debt and loan amounts, and timely payments. This meant if you weren’t able to pay off a loan from over a decade ago, even if you’re all paid up now, your credit score likely still reflects that story. This made it very difficult to change your credit score over time – in fact, time often felt like the only way to improve your credit score.

Now, however, machine learning is changing how credit bureaus like Equifax determine your score. Instead of looking at your past payments, data from the very near past – like, the last few months – can actually better predict what you may do in the future. Data analysis from machine learning means that history doesn’t decide; data can predict your credit-worthiness based on current trends.

What the future holds for machine learning

Machine learning is just getting started. When we think of the future of machine learning, one example we often hear about is those elusive self-driving cars, also known as autonomous vehicles.

In this case, machine learning is able to understand how to respond to particular traffic situations based on reviewing millions of examples: videos of car crashes compared to accident-free traffic, how human-driven cars respond to traffic signs or signals, and watching how, where, and when pedestrians cross streets.

Machine learning is beginning to affect how we see images and videos – computers are using neural networks to cull thousands of images from the internet to fill in blanks in your own pictures.

Take, for instance, the photo you snapped on your holiday in London. You have a perfect shot of Big Ben, except for a pesky pedestrian sneaking by along a wall. You are able to remove the person from your image, but you may wonder how to fill the space on the wall that walker left behind. Adobe Photoshop and other image editors rely on an almost-standard API to cull other images of walls (that specific wall, perhaps, as well as other walls that look similar) and blend them so the filled-in area looks natural and organic.


This type of machine learning is advancing rapidly and it could soon be as easy as an app on our phones. Imagine how this can affect the veracity of a video – is the person actually doing what the video shows?

Problems with machine learning

We are at a pivotal point where we can see a lot of potential for machine learning, but we can also see a lot of potential problems. Solutions are harder to grasp as the technology forges forward.

The future of machine learning is inevitable; the question is when. Predictions indicate that nearly every kind of AI will include machine learning, no matter the size or use. Plus, as cloud computing grows and the world amasses ever more data, machines will be able to learn continuously, on near-limitless data, instead of on specific data sets. Once a machine is connected to the internet, it has a constant stream of emerging information and content to learn from.

This future comes with challenges. First, hardware vendors will necessarily have to make their computers and servers stronger and speedier to cope with these increased demands.

As for experts in AI, it seems there will be a steep and sudden shortage of professionals who can cope with what AI will be able to do. Beyond the private and pricey walls of Amazon, Google, Apple, Uber, and Facebook, most small- and medium-sized businesses (SMBs) actually aren’t stepping more than a toe or two into the world of machine learning. While this is due in part to a lack of money or resources, the lack of expert knowledge is actually the biggest reason that SMBs aren’t deeper into ML. But, as ML technologies normalize, they’ll cost less and become a lot more accessible. If your company doesn’t have experts who know how you could be using ML to help your business, you’re missing out.

On a global level, machine learning provides some cause for concern. There’s the idea that we’ll all be replaced in our jobs by specific machines or robots – which may or may not come to fruition.

More immediate and troubling, however, is the idea that images can be faked. This trick is certainly impressive for an amateur photographer, but it raises an important question: how much longer can we truly believe everything that we see? Perhaps “seeing is believing” has a limited window as a standard truthbearer in our society.



Reaching the Cloud: Is Everything Serverless?

February 18th, 2019

As it goes in technology, as soon as we all adopt a new term, there will assuredly be another one ready to take its place. As we embrace cloud technology, migrating functions and software for organization, AI potential, timeliness, and flexibility, we are now encountering yet another buzzword: serverless.

Serverless and the cloud may sound similar, both floating off in some distant place, existing beyond your company’s cool server room. But are the cloud and serverless the same? Not quite. This article explores how serverless technology relates to the cloud, as well as, and more importantly, whether you have to adopt a serverless culture.

What is serverless?

Serverless is shorthand for two terms: serverless architecture and serverless computing.

Once we get past the name, serverless is a way of building and deploying software and apps on cloud computers. For all your developers and engineers who are tired of coping with server and infrastructure issues because they’d rather be coding, serverless could well be the answer.

Serverless architecture is the foundation of serverless computing. Generally, three types of software services can function well on serverless architecture: function-as-a-service (FaaS), backend-as-a-service (BaaS), and databases.

Serverless code, then, relies on serverless architecture to develop stand-alone apps or microservices without provisioning servers, as is required in traditional (server-necessary) coding. Of course, serverless coding can also be used in tandem with traditional coding. An app or software that runs on serverless code is triggered by events and its overall execution is managed by the cloud provider. Pricing varies but is generally based on the number of executions (as opposed to a pre-purchased compute capacity that other cloud services you use may rely on).
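As a rough sketch of what “triggered by events” means in practice: a serverless function is usually just a function the platform invokes once per event. The handler name and event shape below follow AWS Lambda’s Python convention; the payload field is hypothetical.

```python
import json

# Minimal sketch of a FaaS-style handler (AWS Lambda convention).
# The cloud provider invokes this once per triggering event; you are
# billed per execution rather than for an always-on server.
def lambda_handler(event, context):
    # 'event' carries the trigger payload (e.g., an HTTP request body)
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": "Hello, {}!".format(name)}),
    }
```

There is no server to provision here: the provider allocates resources, runs the function, and tears it down when the event has been handled.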

As for the name itself: calling something “serverless” is a bit of a misnomer, because serverless anything isn’t possible. Serverless software and apps still rely on a server; it’s just not one that you maintain in-house. Instead, your cloud provider, such as Google, AWS, Azure, or IBM, acts as your server and your server manager, allocating your machine resources.

The cloud vs. serverless

While the cloud and serverless are certainly related, there’s a simpler reason why we are hearing about serverless technologies ad nauseam: cloud leaders like AWS, Google, Azure, and IBM are investing heavily in serverless (and that’s a ton of money, to be sure).

Just as these companies spearheaded a global effort to convince companies their apps and data can perform and store better in the cloud, they are now encouraging serverless coding and serverless architecture so that you continue to use their cloud services.

Serverless benefits

Is everything serverless? Will everything be serverless soon? In short, no and no.

The longer answer is that serverless architecture and serverless computing are good for simple applications. In serverless coding, your cloud provider takes care of the server-side infrastructure, freeing up your developers to focus on your business goals.

Your developers may already be working on serverless code – or they want to be. That’s because it frees them from the headache of maintaining infrastructure. They can dispense with annoying things like provisioning a server, ensuring its functionality, creating test environments, and maintaining server uptime, which means they are focused primarily on actual developing.

As long as the functionality is appropriate, serverless can provide the following benefits:

  • Efficient use of resources
  • Rapid testing and deployment, as multiple environments are a breeze to set up
  • Reduced cost (server maintenance, team support, etc.)
  • Focus on coding – may result in increased productivity around business goals
  • Familiar programming languages and environment
  • Increased scalability

Traditional code isn’t going anywhere (yet)

While focusing on your core business is always a good goal, the reality is that serverless isn’t a silver bullet for your coding or your infrastructure.

Depending on your business, it’s likely that some products and apps require more complex functions. For these, serverless may be the wrong move. Traditional coding still offers many benefits, despite still requiring fixed resources that require provisioning, states, and human maintenance. Networking is easier because everything lives within your usual environment. And, let’s face it: unless you’re a brand-new startup, you probably already have the servers and tech staff to support traditional coding and architecture.

Computationally, serverless has strict limits. Most cloud providers price serverless options based on time: how many seconds or minutes does an execution take? Unfortunately, the more complex your execution, the more likely you’ll go past the maximum time allowed, which hovers around 300 seconds (five minutes). With a traditional environment, however, there is no timeout limit. Your servers are dedicated to your executions, no matter how long they take or how many external databases they have to reference. Serverless timeouts, by contrast, can make activities like testing and external calls harder or impossible to accomplish.

From a business perspective, you have to decide what you value more: paying only for what you use (caveat emptor), with decreased opex costs; or control, if you are skeptical of the trust and security risks that come with using a third party. Plus, not all developers work the same. While some devs want to use cutting-edge technology that allows them to focus on front-end logic, others prefer the control and holistic access that traditional architecture and coding provide.


When Technology Moves Faster Than Training, Bad Things Happen

February 18th, 2019

Technology is changing how we design training, and it should. Unfortunately, many instructional designers are not producing the learning programs and products that today’s technical talent needs. Not because they don’t want to, but because many companies don’t support their efforts to advance their work technologically or financially.

That’s a mistake. Technology has already changed learning design. Those who don’t acknowledge this appropriately are doing their organizations – and their technical talent – a disservice.

Bob Mosher, chief learning evangelist for Apply Synergies, a learning and performance solutions company, said we can now embed technology in training in ways we never could before. E-learning, for instance, has been around in some form or another, but it always sat in an LMS or outside of the technology or subject matter it was created to support. That’s no longer the case.

“Now I don’t have to leave the CRM or ERP software, or cognitively leave my workflow,” Mosher explained. “I get pop ups, pushes, hints, lessons when I need them, while I’m staring at what I’m doing. These things guide me through steps; they take over my machine, they watch me perform and tell me when and where I go wrong. Technology has allowed us to make all of those things more adaptive.”

Of course, not all learning design affected by technology is adaptive, but before adaptive learning came on the scene, training was more pull than push, which can be problematic. If you don’t know what you don’t know, you may proceed blindly thinking, “oh, I’m doing great,” when you’re really not. Mosher said adaptive learning technologies that monitor learner behavior and quiz and train based on an individual’s answers and tactics can be extremely powerful.

But – there’s almost always a but – many instructional designers are struggling with this because they’re more familiar with event-based training design. Designing training for the workflow is a very different animal.

The Classroom Is Now a Learning Lab

“It’s funny, for years we’ve been talking about personalized learning, but we’ve misunderstood it, thinking we have to design the personalized experience for every learner,” Mosher said. “But how do I design something personalized for you? I can give you the building blocks, but in the end, no one can personalize better than the learners themselves.”

In other words, new and emerging technologies are brilliant because they enable learners to customize the learning experience and adapt it to the work they do every day. But it’s one thing to have these authoring technologies and environments; it’s something else for an instructional designer to make the necessary shift and use them well.

Further, learning leaders will have to use the classroom differently, leveraging the different tools at their disposal appropriately. “If I know I have this embedded technology in IT, that these pop ups are going to guide people through, say, filling out a CRM, why spend an hour of class teaching them those things? I can skip that,” Mosher said. “Then my class becomes more about trying those things out.”

That means learning strategies that promote peer learning, labs and experiential learning move to the forefront, with adaptive training technology as the perfect complement. Antiquated and frankly ineffective technical training methods filled with clicking, learning by repetition through menus, and procedural drilling should be retired post haste in favor of context-rich learning fare.

Then instructors can move beyond the sage-on-the-stage role, and act as knowledge resources and performance support partners, while developers and engineers write code and metaphorically get their hands dirty. “If I have tools that help me with the procedures when I’m not in class, in labs I can do scenarios, problem solving, use cases, have people bounce ideas and help me troubleshoot when I screw up,” Mosher said. “I’m not taking a lesson to memorize menus.”

Learning Leaders, Act Now

Learning leaders who want to adapt to technology changes in training design must first secure an appropriate budget. Basically, you can’t use cool technology for training unless you actually buy said cool technology. Budget must be allocated for experimentation, and instructional designers must have the time and latitude to upgrade their skills as well, because workflow learning is a new way of looking at design.

“Everyone wants agile instructional design, but they want to do it the old way,” Mosher said. “You’re not going to get apples from oranges. Leadership has to loosen the rope a little bit so instructional designers (IDs) can change from the old way of designing to the new way.

“IT’s been agile for how long now? Yet we still ask IDs to design in a waterfall, ADDIE methodology. That’s four versions behind. Leadership has to understand that to get to the next platform, there’s always a learning curve. There’s an investment that you don’t get a return on right away – that’s what an investment is.”

For learning leaders who want to get caught up quickly and efficiently, Mosher said it can be advantageous to use a vendor. They’re often on target with the latest instructional design approaches and have made the most up to date training technology investments. But leadership must communicate with instructional designers to avoid resistance.

“Good vendors aren’t trying to put anybody out of a job, or call your baby ugly,” he explained. “It’s more like, look. You’ve done great work and will continue to do great work, but you’re behind. You deserve to be caught up.”

The relationship should be a partnership where vendor and client work closely together. “Right,” Mosher said. “If you choose the right vendor.”


Working with ElasticSearch

January 26th, 2018

The Working with ElasticSearch training course teaches architects, developers, and administrators the skills and knowledge needed to use Elasticsearch as a data index or data store, with Kibana as the front end and programmatic access through the Elasticsearch Application Program Interfaces (APIs) using Python.

The ElasticSearch training course begins by examining how to install, configure, and run Elasticsearch and Kibana. With the foundation laid, the course then examines how to configure Elasticsearch data mappings and simple data loading. Next, querying Elasticsearch using Kibana is discussed. Day two begins with a deeper dive into how Elasticsearch indexes and searches data, and how it provides clustering and fault tolerance. Next, the configuration of data indexing and analysis is reviewed. Finally, the various major Elasticsearch APIs are explored and exercised.

The Working with ElasticSearch course assumes some familiarity with Python (limited), Extensible Markup Language (XML), JavaScript Object Notation (JSON), and command line tools.


Cleaning Dirty Data with Pandas & Python

August 10th, 2017

Pandas is a popular Python library used for data science and analysis. Used in conjunction with other data science toolsets like SciPy, NumPy, and Matplotlib, a modeler can create end-to-end analytic workflows to solve business problems.

While you can do a lot of really powerful things with Python and data analysis, your analysis is only ever as good as your dataset. And many datasets have missing, malformed, or erroneous data. It’s often unavoidable: anything from incomplete reporting to technical glitches can cause “dirty” data.

Thankfully, Pandas provides a robust library of functions to help you clean up, sort through, and make sense of your datasets, no matter what state they’re in. For our example, we’re going to use a dataset of 5,000 movies scraped from IMDB. It contains information on the actors, directors, budget, and gross, as well as the IMDB rating and release year. In practice, you’ll be using much larger datasets consisting of potentially millions of rows, but this is a good sample dataset to start with.

Unfortunately, some of the fields in this dataset aren’t filled in and some of them have default values such as 0 or NaN (Not a Number).


No good. Let’s go through some Pandas hacks you can use to clean up your dirty data.

Getting started

To get started with Pandas, first you will need to have it installed. You can do so by running:

$ pip install pandas

Then we need to load the data we downloaded into Pandas. You can do this with a few Python commands:

import pandas as pd

data = pd.read_csv('movie_metadata.csv')

Make sure you have your movie dataset in the same folder as you’re running the Python script. If you have it stored elsewhere, you’ll need to change the read_csv parameter to point to the file’s location.

Look at your data

To check out the basic structure of the data we just read in, you can use the head() command to print out the first five rows. That should give you a general idea of the structure of the dataset.
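For instance, a minimal sketch (using a tiny stand-in DataFrame; the real tutorial dataset has roughly 5,000 rows):

```python
import pandas as pd

# A tiny stand-in frame with two of the dataset's columns
data = pd.DataFrame({
    "movie_title": ["A", "B", "C", "D", "E", "F"],
    "duration": [100, 120, 95, 150, 88, 130],
})

print(data.head())  # prints only the first five of the six rows
```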


When we look at the dataset either in Pandas or in a more traditional program like Excel, we can start to note down the problems, and then we’ll come up with solutions to fix those problems.

Pandas has some selection methods which you can use to slice and dice the dataset based on your queries. Let’s go through some quick examples before moving on:

  • Look at some basic stats for the 'imdb_score' column: data.imdb_score.describe()
  • Select a column: data['movie_title']
  • Select the first 10 rows of a column: data['duration'][:10]
  • Select multiple columns: data[['budget','gross']]
  • Select all movies over two hours long: data[data['duration'] > 120]
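The selections above can be sketched end to end on a toy frame (the values are made up for illustration; the column names match the movie dataset):

```python
import pandas as pd

# Toy frame mirroring a few columns of the movie dataset
data = pd.DataFrame({
    "movie_title": ["A", "B", "C"],
    "duration": [90, 150, 130],
    "budget": [1e6, 2e6, 3e6],
    "gross": [2e6, 1e6, 5e6],
    "imdb_score": [6.5, 7.8, 8.1],
})

print(data.imdb_score.describe())           # summary stats for one column
titles = data["movie_title"]                # select a single column
first_two = data["duration"][:2]            # slice the first rows of a column
money = data[["budget", "gross"]]           # select multiple columns
long_movies = data[data["duration"] > 120]  # boolean filtering by duration
```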
Deal with missing data

One of the most common problems is missing data. This could be because it was never filled out properly, the data wasn’t available, or there was a computing error. Whatever the reason, if we leave the blank values in there, it will cause errors in analysis later on. There are a couple of ways to deal with missing data:

  • Add in a default value for the missing data
  • Get rid of (delete) the rows that have missing data
  • Get rid of (delete) the columns that have a high incidence of missing data

We’ll go through each of those in turn.

Add default values

First of all, we should probably get rid of all those nasty NaN values. But what to put in their place? Well, this is where you’re going to have to eyeball the data a little bit. For our example, let’s look at the ‘country’ column. It’s straightforward enough, but some of the movies don’t have a country provided, so the data shows up as NaN. In this case, we probably don’t want to assume the country, so we can replace it with an empty string or some other default value.

data.country = data.country.fillna('')

This replaces the NaN entries in the ‘country’ column with the empty string, but we could just as easily tell it to replace with a default name such as “None Given”. You can find more information on fillna() in the Pandas documentation.

With numerical data like the duration of the movie, a calculation like taking the mean duration can help us even the dataset out. It’s not a great measure, but it’s an estimate of what the duration could be based on the other data. That way we don’t have crazy numbers like 0 or NaN throwing off our analysis.

data.duration = data.duration.fillna(data.duration.mean())

Remove incomplete rows

Let’s say we want to get rid of any rows that have a missing value. It’s a pretty aggressive technique, but there may be a use case where that’s exactly what you want to do.

Dropping all rows with any NA values is easy:

data.dropna()
Of course, we can also drop rows that have all NA values:

data.dropna(how='all')
We can also put a limitation on how many non-null values need to be in a row in order to keep it (in this example, the data needs to have at least 5 non-null values):

data.dropna(thresh=5)
Let’s say for instance that we don’t want to include any movie that doesn’t have information on when the movie came out:

data.dropna(subset=['title_year'])
The subset parameter allows you to choose which columns you want to look at. You can also pass it a list of column names here.

Deal with error-prone columns

We can apply the same kind of criteria to our columns. We just need to use the parameter axis=1 in our code. That means to operate on columns, not rows. (We could have used axis=0 in our row examples, but it is 0 by default if you don’t enter anything.)

Drop the columns that are all NA values:

data.dropna(axis=1, how='all')

Drop all columns with any NA values:

data.dropna(axis=1, how='any')

The same threshold and subset parameters from above apply as well. For more information and examples, visit the Pandas documentation.

Normalize data types

Sometimes, especially when you’re reading in a CSV with a bunch of numbers, some of the numbers will read in as strings instead of numeric values, or vice versa. Here’s a way you can fix that and normalize your data types:

data = pd.read_csv('movie_metadata.csv', dtype={'duration': int})

This tells Pandas that the column ‘duration’ needs to be an integer value. Similarly, if we want the release year to be a string and not a number, we can do the same kind of thing:

data = pd.read_csv('movie_metadata.csv', dtype={'title_year': str})

Keep in mind that this re-reads the CSV from disk, so make sure you either normalize your data types first or dump your intermediary results to a file before doing so.
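If re-reading the file is inconvenient, an alternative sketch is to convert types in place on the existing DataFrame with astype() (toy values for illustration):

```python
import pandas as pd

# Toy frame: 'duration' read in as strings, 'title_year' as numbers
data = pd.DataFrame({"duration": ["100", "120", "95"],
                     "title_year": [1999, 2005, 2012]})

# Convert in place, without touching the CSV on disk:
# strings become integers, numbers become strings
data["duration"] = data["duration"].astype(int)
data["title_year"] = data["title_year"].astype(str)
```

Note that astype() raises an error if a value can't be converted, so clean out missing entries in the column first.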

Change casing

Columns with user-provided data are ripe for corruption. People make typos, leave their caps lock on (or off), and add extra spaces where they shouldn’t.

To change all our movie titles to uppercase:

data['movie_title'] = data['movie_title'].str.upper()
Similarly, to get rid of trailing whitespace:

data['movie_title'] = data['movie_title'].str.strip()
We won’t be able to cover correcting spelling mistakes in this tutorial, but you can read up on fuzzy matching for more information.

Rename columns

Finally, if your data was generated by a computer program, it probably has some computer-generated column names, too. Those can be hard to read and understand while working, so if you want to rename a column to something more user-friendly, you can do it like this:

data.rename(columns = {'title_year':'release_date', 'movie_facebook_likes':'facebook_likes'})

Here we’ve renamed ‘title_year’ to ‘release_date’ and ‘movie_facebook_likes’ to simply ‘facebook_likes’. Since this is not an in-place operation, you’ll need to save the DataFrame by assigning it to a variable.

data = data.rename(columns = {'title_year':'release_date', 'movie_facebook_likes':'facebook_likes'})

Save your results

When you’re done cleaning your data, you may want to export it back into CSV format for further processing in another program. This is easy to do in Pandas:

data.to_csv('cleanfile.csv', encoding='utf-8')

More resources

Of course, this is only the tip of the iceberg. With variations in user environments, languages, and user input, there are many ways that a potential dataset may be dirty or corrupted. At this point you should have learned some of the most common ways to clean your dataset with Pandas and Python.

For more on Pandas and data cleaning, the official Pandas documentation is the best place to continue.


Building a Serverless Chatbot w/ AWS, Zappa, Telegram, and api.ai

August 2nd, 2017

If you’ve ever had to set up and maintain a web server before, you know the hassle of keeping it up-to-date, installing security patches, renewing SSL certificates, dealing with downtime, rebooting when things go wrong, rotating logs and all of the other ‘ops’ that come along with managing your own infrastructure. Even if you haven’t had to manage a web server before, you probably want to avoid all of these things.

For those who want to focus on building and running code, serverless computing provides fully-managed infrastructure that takes care of all of the nitty-gritty operations automatically.

In this tutorial, we’ll show you how to build a chatbot which performs currency conversions. We’ll make the chatbot available to the world via AWS Lambda, meaning you can write the code, hit deploy, and never worry about maintenance again. Our bot’s brain will be powered by api.ai, a natural language understanding platform owned by Google.


In this post we’ll walk you through building a Telegram Bot. We’ll write the bot in Python, wrap it with Flask, and use Zappa to host it on AWS Lambda. We’ll add works-out-of-the-box AI to our bot by using api.ai.

By the end of this post, you’ll have a fully-functioning Chatbot that will respond to Natural Language queries. You’ll be able to invite anyone in the world to chat with your bot and easily edit your bot’s “brain” to suit your needs.

Before We Begin

To follow along with this tutorial, you’ll have to have a valid phone number and credit card (we’ll be staying within the free usage limits of all services we use, so you won’t be charged). Specifically, you’ll need:

  • …to sign up with Amazon Web Services. The signup process can be a bit long, and requires a valid credit card. AWS offers a million free Lambda requests per month, and our usage will stay within this free limit.
  • …to sign up with api.ai. Another lengthy sign-up process, as it requires integration with the Google Cloud Platform. You’ll be guided through this process when you sign up with api.ai. Usage is currently free.
  • …to sign up with Telegram, a chat platform similar to the more popular WhatsApp. You’ll need to download one of their apps (for Android, iPhone, Windows Phone, Windows, MacOS, or Linux) in order to register, but once you have an account you can also use it from web.telegram.org. You’ll also need a valid phone number. Telegram is completely free.
  • …basic knowledge of Python and a working Python environment (that is, you should be able to run Python code and install new Python packages). Preferably, you should have used Python virtual environments before, but you should be able to keep up even if you haven’t. All our code examples use Python 3, but most things should be Python 2 compatible.

If you’re aiming to learn how to use the various services covered in this tutorial, we suggest you follow along step by step, creating each component as it’s needed. If you’re impatient and want to get a functioning chatbot set up as fast as possible, you can clone the GitHub repository with all the code presented here and use that as a starting point.

Building an Echo Bot

When learning a new programming language, the first program you write is one which outputs the string “Hello, World!” When learning to build chatbots, the first bot you build is one that repeats everything you say to it.

Achieving this proves that your bot is able to accept and respond to user input. After that, it’s simple enough to add the logic to make your bot do something more interesting.

Getting a Token for Our New Bot

The first thing you need is a bot token from Telegram. You can get this by talking to the @BotFather bot through the Telegram platform.

In your Telegram app, open a chat with the official @BotFather Chatbot, and send the command /newbot. Answer the questions about what you’ll use for your new bot’s name and username, and you’ll be given a unique token similar to 14438024:AAGI6Kh8ew4wUf9-vbqtb3S4sIM7nDlcXj3. We’ll use this token to prove ownership of our new bot, which allows us to send and receive messages through the Bot.

We can now control our new bot via Telegram’s HTTP API. We’ll be using Python to make calls to this API.
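Telegram Bot API calls are plain HTTPS requests to URLs of the form https://api.telegram.org/bot&lt;token&gt;/&lt;methodName&gt;. As a minimal sketch (no request is actually sent here, and the token is the placeholder from above, not a real one), here is how such a URL is built; the getMe method returns your bot’s details and is a handy way to check that a token works:

```python
# Sketch: building a Telegram Bot API method URL. The token below is the
# placeholder from this tutorial, not a real one.
token = "14438024:AAGI6Kh8ew4wUf9-vbqtb3S4sIM7nDlcXj3"

def api_url(method):
    """Build the URL for a Telegram Bot API method call."""
    return "https://api.telegram.org/bot{}/{}".format(token, method)

print(api_url("getMe"))
```

You could then fetch this URL with requests.get(api_url("getMe")) to confirm that your real token is valid.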

Writing the First Code for Our New Bot

Create a new directory called currencybot to house the code for our bot’s logic, and create three Python files in this directory named config.py, currencybot.py, and bot_server.py. The structure of your project should be as follows:

currencybot/
├── bot_server.py
├── config.py
└── currencybot.py

In config.py we need a single line of code defining the bot token, as follows (substitute the token you received from BotFather):

bot_token = "14438024:AAGI6Kh8ew4wUf9-vbqtb3S4sIM7nDlcXj3"

In currencybot.py we need to put the logic for our bot, which revolves around receiving a message, handling the message, and sending a message. That is, our bot receives a message from some user, works out how to respond to this message, and then sends the response. For now, because we are building an echo bot, the handling logic will simply return any input passed to it back again.

Add the following code to currencybot.py:

import requests
import config

# The main URL for the Telegram API with our bot's token
BASE_URL = "https://api.telegram.org/bot{}".format(config.bot_token)

def receive_message(msg):
    """Receive a raw message from Telegram"""
    try:
        message = str(msg["message"]["text"])
        chat_id = msg["message"]["chat"]["id"]
        return message, chat_id
    except Exception as e:
        print(e)
        return (None, None)

def handle_message(message):
    """Calculate a response to the message"""
    return message

def send_message(message, chat_id):
    """Send a message to the Telegram chat defined by chat_id"""
    data = {"text": message.encode("utf8"), "chat_id": chat_id}
    url = BASE_URL + "/sendMessage"
    try:
        response = requests.post(url, data).content
    except Exception as e:
        print(e)

def run(msg):
    """Receive a message, handle it, and send a response"""
    try:
        message, chat_id = receive_message(msg)
        response = handle_message(message)
        send_message(response, chat_id)
    except Exception as e:
        print(e)

Finally, bot_server.py is a thin wrapper for our bot that will allow it to receive messages via HTTP. Here we’ll run a basic Flask application. When our bot receives new messages, Telegram will send these via HTTP to our Flask app, which will pass them on to the code we wrote above. In bot_server.py, add the following code:

from flask import Flask
from flask import request
from currencybot import run

app = Flask(__name__)

@app.route("/", methods=["GET", "POST"])
def receive():
    try:
        run(request.get_json())
        return ""
    except Exception as e:
        print(e)
        return ""

This is a minimal Flask app that imports the main run() function from our currencybot script. It uses Flask’s request object (distinct from the requests library we used earlier, though the names are similar enough to be confusing) to grab the POST data from an HTTP request and parse it as JSON. We pass the parsed data along to our bot, which can extract the text of the message and respond to it.
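Telegram delivers each incoming message as an “Update” JSON object. A heavily trimmed sample (the field values below are made up for illustration) shows the structure that the extraction in receive_message() relies on:

```python
# A trimmed-down Telegram update, roughly the shape our Flask app receives
# as POST data (the values here are made up for illustration).
sample_update = {
    "update_id": 1,
    "message": {
        "message_id": 2,
        "chat": {"id": 123456789, "type": "private"},
        "text": "Hello, bot!",
    },
}

# The same extraction receive_message() performs:
text = str(sample_update["message"]["text"])
chat_id = sample_update["message"]["chat"]["id"]
print(text, chat_id)  # -> Hello, bot! 123456789
```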

Deploying Our Echo Bot

We’re now ready to deploy our bot onto AWS Lambda so that it can receive messages from the outside world.

We’ll be using the Python library Zappa to deploy our bot, and Zappa will interact directly with our Amazon Web Services account. In order to do this, you’ll need to set up command line access for your AWS account as described here: https://aws.amazon.com/blogs/security/a-new-and-standardized-way-to-manage-credentials-in-the-aws-sdks/.

Zappa needs to be installed inside a Python virtual environment. Depending on your operating system and Python environment, there are different ways of creating and activating one. You can read more about how to set one up here. If you’re using MacOS or Linux and have used Python before, you should be able to create one by running the following command:

virtualenv ~/currencybotenv

You should see output similar to the following:

~/git/currencybot g$ virtualenv ~/currencybotenv
Using base prefix '/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6'
New python executable in /Users/g/currencybotenv/bin/python3.6
Also creating executable in /Users/g/currencybotenv/bin/python
Installing setuptools, pip, wheel...done.

The result is a clean Python environment, which is important so that Zappa knows exactly which dependencies to install on AWS Lambda. We’ll install the few dependencies we need for our bot (including Zappa) inside this environment.

Activate the environment by running:

source ~/currencybotenv/bin/activate

You should see your Terminal’s prompt change to indicate that you’re now working inside that environment. Mine looks like this:

(currencybotenv) ~/git/currencybot g$

Now we need to install the dependencies for our bot using pip. Run:

pip install zappa requests flask

At this point, we need to initialize our Zappa project. We can do this by running:

zappa init

This will begin an interactive process of setting up options with Zappa. You can accept all of the defaults by pressing Enter at each prompt. Zappa should figure out that your Flask application is inside bot_server.py and prompt to use bot_server.app as your app’s function.

You’ve now initialized the project and Zappa has created a zappa_settings.json file in your project directory. Next, deploy your bot to Lambda by running the following command (assuming you kept the default environment name of ‘dev’):

zappa deploy dev

This will package up your bot and all of its dependencies, and put them in an AWS S3 bucket, from which it can be run via AWS Lambda. If everything went well, Zappa will print out the URL where your bot is hosted. It should look something like https://l19rl52bvj.execute-api.eu-west-1.amazonaws.com/dev. Copy this URL because you’ll need to instruct Telegram to post any messages sent to our bot to this endpoint.

In your web browser, change the settings of your Telegram bot by using the Telegram API and your bot’s token. To set the URL to which Telegram should send messages to your bot, call the setWebhook method by building a URL that looks like the following, but with your bot’s token and your AWS Lambda URL instead of the placeholders:

https://api.telegram.org/bot&lt;your-bot-token&gt;/setWebhook?url=&lt;your-lambda-url&gt;

For example, your URL should look something like this:

https://api.telegram.org/bot14438024:AAGI6Kh8ew4wUf9-vbqtb3S4sIM7nDlcXj3/setWebhook?url=https://l19rl52bvj.execute-api.eu-west-1.amazonaws.com/dev

Note that the string bot must appear directly before the token.

Testing Our Echo Bot

Visit your bot in the Telegram client by navigating to t.me/&lt;your-bot’s-username&gt;. You can find a link to your bot in the last message sent by BotFather when you created the bot. Open a chat with your bot in the Telegram client and press the /start button.

Now you can send your bot messages and you should receive the same message as a reply.


If you don’t, it’s likely that there’s a bug in your code. You can run zappa tail dev in your Terminal to view the output of your bot’s code, including any error messages.

Teaching Our Bot About Currencies

You’ll probably get bored of chatting to your echo bot pretty quickly. To make it more useful, we’ll teach it how to send us currency conversions.

Add the following two functions to the currencybot.py file. These functions allow us to use the Fixer API to get today’s exchange rates and do some basic calculations.

def get_rate(frm, to):
    """Get the raw conversion rate between two currencies"""
    url = "http://api.fixer.io/latest?base={}&symbols={}".format(frm, to)
    try:
        response = requests.get(url)
        js = response.json()
        rates = js['rates']
        return rates.popitem()[1]
    except Exception as e:
        return 0

def get_conversion(quantity=1, frm="USD", to="GBP"):
    rate = get_rate(frm.upper(), to.upper())
    to_amount = quantity * rate
    return "{} {} = {} {}".format(quantity, frm, to_amount, to)
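
The arithmetic in get_conversion() can be sketched without a live call to the Fixer API by swapping in a hard-coded rate table (the 0.75 rate below is a made-up example value, not a real exchange rate):

```python
# Sketch of the conversion arithmetic with a stubbed-out rate lookup
# instead of a live Fixer API call. The 0.75 rate is a made-up example.
def get_rate_stub(frm, to):
    rates = {("USD", "GBP"): 0.75}
    return rates.get((frm, to), 0)

def get_conversion(quantity=1, frm="USD", to="GBP"):
    rate = get_rate_stub(frm.upper(), to.upper())
    to_amount = quantity * rate
    return "{} {} = {} {}".format(quantity, frm, to_amount, to)

print(get_conversion(5, "USD", "GBP"))  # -> 5 USD = 3.75 GBP
```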

We’ll now expect the user to send currency conversion queries for our bot to compute. For example, if a user sends “5 USD GBP” we should respond with a calculation of how many British Pounds are equivalent to 5 US Dollars. We need to change our handle_message() function to split the message into appropriate parts and pass them to our get_conversion() function. Update handle_message() in currencybot.py to look like this:

def handle_message(message):
    """Calculate a response to a message"""
    try:
        qty, frm, to = message.split(" ")[:3]
        qty = int(qty)
        response = get_conversion(qty, frm, to)
    except Exception as e:
        response = "I couldn't parse that"
    return response

This function now parses messages that match the required format into the three parts. If the message doesn’t match what we were expecting, we inform the user that we couldn’t deal with their input.
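The parsing step amounts to splitting on spaces and taking the first three parts; anything that doesn’t fit raises an exception, which handle_message() turns into the fallback reply:

```python
# How the "<quantity> <from> <to>" format is parsed: split on spaces
# and take the first three parts.
message = "5 USD GBP"
qty, frm, to = message.split(" ")[:3]
qty = int(qty)
print(qty, frm, to)  # -> 5 USD GBP

# A malformed message raises an exception, which handle_message()
# turns into the "I couldn't parse that" reply:
try:
    a, b, c = "hello there".split(" ")[:3]
except ValueError:
    print("I couldn't parse that")
```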

Save the code and update the bot by running the following command (make sure you are still within your Python virtual environment, in your project directory).

zappa update dev

Testing Our Currency Converter Bot

After the update has completed, you’ll be able to chat with your bot and get currency conversions. You can see an example of the bot converting US Dollars to South African Rands and US Dollars to British Pounds below:


Adding AI to Our Bot

Our bot is more useful now, but it’s not exactly smart. Users have to remember the correct input format and any slight deviations will result in the “I couldn’t parse that” error. We want our bot to be able to respond to natural language queries, such as “How much is 5 dollars in pounds?” or “Convert 3 USD to pounds”. There are an infinite number of ways that users might ask these questions, and extracting the three pieces of information (the quantity, from-currency, and to-currency) is a non-trivial task.

This is where Artificial Intelligence and Machine Learning can help us out. Instead of writing rules to account for each variation of the same question, machine learning lets us learn patterns from existing examples. Using machine learning, we can teach a program to extract the pieces of information that we want by ‘teaching’ it with a number of existing examples. Luckily, someone else has already done this for us, so we don’t need to start from scratch.

Create an account with api.ai, and go through their setup process. Once you get to the main screen, select the “Prebuilt Agents” tab, as shown below


Select the “Currency Converter” agent from the list of options, and choose a Google Cloud Project (or create a new one) to host this agent. Now you can test your agent by typing in a query in the top right-hand corner of the page, as indicated below:


Hit the “Copy Curl” link, which will copy a URL with the parameters you need to programmatically make the same request you just made manually through the web page. It should have copied a string that looks similar to the following into your clipboard.

curl 'https://api.api.ai/api/query?v=20150910&query=convert%201%20usd%20to%20zar&lang=en&sessionId=fed2f39e-6c38-4d42-aa97-0a2076de5c6b&timezone=2017-07-15T18:12:03+0200' -H 'Authorization:Bearer a5f2cc620de338048334f68aaa1219ff'

The important part is the Authorization argument, which we’ll need to make the same request from our Python code. Copy the whole token, including the word Bearer, into your config.py file, which should now look similar to the following:

bot_token = "14438024:AAGI6Kh8ew4wUf9-vbqtb3S4sIM7nDlcXj3"

apiai_bearer = "Bearer a5f2cc620de338048334f68aaa1219ff"

Add the following line to the top of your currencybot.py file:

from datetime import datetime

And add a parse_conversion_query() function below in the same file, as follows:

def parse_conversion_query(query):
    url_template = "https://api.api.ai/api/query?v=20150910&query={}&lang=en&sessionId={}"
    url = url_template.format(query, datetime.now())
    headers = {"Authorization":  config.apiai_bearer}
    response = requests.get(url, headers=headers)
    js = response.json()
    currency_to = js['result']['parameters']['currency-to']
    currency_from = js['result']['parameters']['currency-from']
    amount = js['result']['parameters']['amount']
    return amount, currency_from, currency_to

This reconstructs, in Python, the cURL command that we copied from the api.ai site. Note that the v=20150910 in the url_template is fixed and should not be updated to the current date; it selects the current version of the api.ai API. We omit the optional timezone argument but use datetime.now() as a unique sessionId.
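A heavily trimmed sketch of the JSON structure that parse_conversion_query() expects back from api.ai looks like this (the values are illustrative, not a real response):

```python
# Sketch of the api.ai response structure that parse_conversion_query()
# navigates. Heavily trimmed; the values are illustrative, not real.
sample_response = {
    "result": {
        "parameters": {
            "amount": 5,
            "currency-from": "USD",
            "currency-to": "GBP",
        }
    }
}

params = sample_response["result"]["parameters"]
amount = params["amount"]
currency_from = params["currency-from"]
currency_to = params["currency-to"]
print(amount, currency_from, currency_to)  # -> 5 USD GBP
```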

Now we can pass a natural language query to the api.ai API (if you think that’s difficult to say, just look at the url_template which contains api.api.ai/api/!) It will work out what the user wants in terms of quantity, from-currency and to-currency, and return structured JSON for our bot to parse. Remember that api.ai doesn’t do the actual conversion–its only role is to extract the components we need from a natural language query, so we’ll pass these pieces to the fixer.io API as before. Update the handle_message() function to use our new NLU parser. It should look as follows:

def handle_message(message):
    """Calculate a response to a message"""
    try:
        qty, frm, to = parse_conversion_query(message)
        qty = int(qty)
        response = get_conversion(qty, frm, to)
    except Exception as e:
        response = "I couldn't parse that"
    return response

Make sure you’ve saved all your files, and update your deployment again with:

zappa update dev

Testing Our Bot’s AI

Now our bot should be able to convert between currencies based on Natural Language queries such as “How much is 3 usd in Indian Rupees”.


If this doesn’t work, run zappa tail dev again to look at the error log and figure out what went wrong.

Our bot is by no means perfect, and you should easily be able to find queries that break it and cause unexpected responses, but it can handle a lot more than the strict input format we started with! If you want to teach it to handle queries in specific formats, you can use the api.ai web page to improve your bot’s understanding and pattern recognition.


Serverless computing and Chatbots are both growing in popularity, and in this tutorial you learned how to use both of them.

We showed you how to set up a Telegram Chatbot, make it accessible to the world, and plug in a prebuilt brain.

You can now easily do the same using the other pre-built agents offered by api.ai, or start building your own. You can also look at the other Bot APIs offered by Facebook Messenger, Skype, and many similar platforms to make your Bots accessible to a wider audience.

About the Author:

Python 2 vs. Python 3 Explained in Simple Terms

July 13th, 2017

Python is a high level, versatile, object-oriented programming language. Python is simple and easy to learn while also being powerful and highly effective. These advantages make it suitable for programmers of all backgrounds, and Python has become one of the most widely used languages across a variety of fields.

Python differs from most other programming languages in that two incompatible versions, Python 2 and Python 3, are both widely used. This article presents a brief overview of a few of the differences between Python 2 and Python 3 and is primarily aimed at a less-technical audience.

Python 2 (aka Python 2.x)

The second version of Python, Python 2.0, arrived in 2000. Python 2.0 introduced many new features that improved upon the previous version; notably, it included support for Unicode and added garbage collection for better memory management. The release also brought changes in the way the language itself was developed: the development process became more open and included input from the community.

Python 2.7 is the latest (and final) Python 2 release. One feature included in this version is the Ordered Dictionary. The Ordered Dictionary enables the user to create dictionaries in an ordered manner, i.e., they remember the order in which their elements are inserted, and therefore it is possible to print the elements in that order. Another feature of Python 2.x is set literals. Previously, one had to create a set from another type, such as a list, resulting in slower and more cumbersome code.
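Both features carried forward into Python 3, so they can be demonstrated there directly:

```python
from collections import OrderedDict

# OrderedDict (introduced in Python 2.7) remembers insertion order,
# so the elements can be retrieved in the order they were added.
d = OrderedDict()
d["first"] = 1
d["second"] = 2
d["third"] = 3
print(list(d.keys()))  # -> ['first', 'second', 'third']

# Set literals: build a set directly instead of converting from a list.
s = {1, 2, 3}
print(s == set([1, 2, 3]))  # -> True
```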

While these are some prominent features that were included with Python 2.7, there are other features in this release. For instance, Input/Output modules, which are used to write to text files in Python, are faster than before. All the aforementioned features are also present in Python 3.1 and later versions.

Python 3 (aka Python 3.x)

Even though Python 2.x had matured considerably, many issues remained. The print statement was complicated to use and did not behave like other Python functions, resulting in more code in comparison to other programming languages. In addition, Python strings were not Unicode by default, which meant that programmers needed to invoke functions to convert strings to Unicode (and back) when manipulating non-ASCII characters (i.e., characters outside the basic English alphabet, such as accented letters).

Python 3, which was launched in 2008, was created to solve these problems and bring Python into the modern world. Nine years in, let’s consider how the adoption of Python 3 (which is currently at version 3.6) has fared against the latest Python 2.x release.

The most notable change in Python 3 is that print is now a function rather than a statement, as it was in Python 2. This was perhaps the most radical change in the entire Python 3.0 release, and as a result, it ruffled the most feathers: users are now required to write print() instead of print, and programmers naturally object to having to type two additional characters and learn a new syntax. To be fair, the print() function is more versatile than the old statement, making tasks such as writing to external text files much cleaner, and there are other advantages to it now being a function.
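A few of those advantages in Python 3:

```python
import io

# Because print is a function, it takes keyword arguments like sep and end,
# which the old print statement could not do.
print("a", "b", "c", sep="-")   # -> a-b-c
print("no newline", end="")
print(" ...continued")

# Writing to a file-like object via the file= argument (an in-memory
# buffer here, but any open file works the same way):
buf = io.StringIO()
print("logged line", file=buf)
print(buf.getvalue())
```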

You might think that print becoming a function is a small change and having to type two more characters is not a big issue. But it is one of multiple changes that make Python 3 incompatible with Python 2. The problem of compatibility becomes complicated by the fact that organizations and developers may in fact have large amounts of Python 2 code that needs to be converted to Python 3.

Python 3.6 adds to these changes by allowing optional underscores in numeric literals for better readability (e.g., 1_000_000 vs. 1000000), and in addition extends Python’s functionality for multitasking. (Note that the new features which appear in each successive version of Python 3 are not “backported” to Python 2.7, and as a result, Python 3 will continue to diverge from Python 2 in terms of functionality.)
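The underscores are purely cosmetic and ignored by the parser:

```python
# Python 3.6 numeric-literal underscores: same value, easier to read.
million = 1_000_000
print(million == 1000000)  # -> True

# They work in other bases too:
print(0xFF_FF)  # -> 65535
```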

Should You Care?

It depends. If you are a professional developer who already works with Python, you should consider moving to Python 3 if you haven’t already. In order to make the transition easier, Python 3 includes a tool called 2to3 which is used to transform Python 2 code to Python 3. 2to3 will prove helpful to organizations which are already invested in Python 2.x, as it will help them convert their Python 2 code base to Python 3 as smoothly as possible.

If you are just starting out with Python, your best strategy is to embrace Python 3, although you should be aware that it is incompatible with Python 2, as you may encounter Python 2 code on websites such as stackoverflow and perhaps at your current (or future) workplace.


The overall decision in 2017 of whether to use Python 3 or Python 2 depends on the intended use. Python 2.7 will be supported until 2020 with the latest packages. According to py3readiness.org, which tracks how many popular libraries are compatible with Python 3, 345 out of the 360 most popular libraries support Python 3. This number will continue to grow as support for 2.7 winds down. While Python 2.7 is sufficient for now, Python 3 is definitely the future of the language and is here to stay.

Takeaway: Python 2 is still widely used. Python 3 introduced several features that were not backward compatible with Python 2. It took a while for some popular libraries to support Python 3, but most major libraries now support Python 3, and support for Python 2 will eventually be phased out. Python 2 is still here in 2017 but is gradually on the way out.

About the Author:

An Overview of Python Web Development Options

July 6th, 2017

Python is a powerful language that supports many of the largest sites on the web. There are several prominent Python web development frameworks, each with their own use cases and features.

In fact, you probably visit a website powered by Python every day. Heard of Reddit? Instagram? Yelp? They all use Python.

It can be a little overwhelming to hear all the jargon being thrown around (Django? Flask? Pyramid?), so we’re going to break things down step-by-step here. Who’s using Python for web development, why, and what are the options out there as a developer?


Django

Django is the most robust and full-featured of the pack. It has also been around the longest. Its motto is “The web framework for perfectionists with deadlines.” Because of this, the framework is very pragmatic and structured, but it can be quite opinionated at times. If you’re doing something that fits into its “way” of doing things, great. But if you have a more off-the-wall project, it may be more difficult to work around Django’s design constraints.

While it interfaces with databases quite well, it can be a lot of overhead if you just want to build a small project.

Here is a selection of popular sites that use Django:


Flask

If you’re starting out and Django seems a little too complicated, look no further than Flask. It bills itself as a “microframework,” and you can set up a running web server with it in fewer than 10 lines of code. It’s lightweight, fast, and very customizable.

However, extra libraries or configuration may be needed for more complex sites on Flask. That’s the downside of having creative freedom within the framework. It doesn’t enforce standards like Django, which can be both a pro and a con depending on your use case.

If you’re just looking for a small web server or a personal web site, Flask is a good option.

Sites that use Flask include:


Pyramid

Pyramid seeks to bridge the gap between “megaframeworks” like Django and “microframeworks” like Flask. Its motto is “start small, finish big, stay finished.” This means that you can get a web service up and running easily (similar to Flask), but Pyramid provides more resources and libraries to support scaling your site as well.

Companies using Pyramid include:

Static Site Generators

Static site generators are the new kids on the web development block. Instead of describing your website in a programming language you may or may not fully understand, static site generators allow you to write posts in (more or less) plain text. Many static site generators let you write in Markdown, which is basically just plain text with a little extra seasoning for formatting text and links. They then use a rendering engine to make your text appear on the web page in a structured and styled form. These sites are even more lightweight than Flask, which means there’s very little if any overhead to learn and set up.

The word “static” here means that the site has no database behind it. Features like registering new users and dynamic code execution are therefore not possible within this model. Some people see this as a perk rather than a limitation: many of the most nefarious web security vulnerabilities come from leaving a database exposed, and if there’s no database to begin with, many of those vulnerabilities simply do not exist for your site.

If you’re looking for something that you can update easily and doesn’t have all the security worries of something like WordPress, give a static site generator a try.

Some of the most popular static site generators for Python include:

Pros and Cons of Python for Web Development

Because Python is a pretty simple and intuitive language to pick up, it’s accessible to coders and non-coders alike. The thriving Python development community ensures there’s a wealth of packages available to help you program just about anything you can imagine. Put a couple libraries together with one of the frameworks mentioned above, and the possibilities are limitless.

That being said, it’s not perfect. Here are some situations where you might not want to use Python:

  • Mobile development
  • Memory-intensive calculations
  • Performance-critical applications

Despite the shortcomings, Python is a strong choice for web developers old and new.

About the Author:

Intro to the Serverless Framework: Building an API

June 14th, 2017

Cloud computing services have revolutionized how software systems are developed and deployed. One growing trend in this area is the rise in popularity of serverless architecture. In the past, serverless described an application architecture that relied heavily on 3rd-party services that manage server-side logic and state, typically referred to as Backend-as-a-Service or BaaS. Today, however, the term refers to server-side logic that runs in stateless, event-triggered, ephemeral compute containers managed by a 3rd party, commonly called Function-as-a-Service or FaaS.

AWS Lambda is widely seen as the pioneer of the serverless space but all of the major cloud players now have competing products in the space. Frameworks like Serverless, Apex, and Chalice are built on top of the various serverless platforms in order to extend their functionality and make serverless products/platforms easier to work with.

The serverless style of architecture comes with a variety of benefits, namely:

  • Easier operational management, as the platform separates the application from the infrastructure it runs on.
  • Quicker innovation, because that separation allows for a focus on application logic rather than concerns stemming from systems engineering of the infrastructure.
  • Reduced operations costs as you only pay for the time and resources needed to execute a function.

Compared to a traditional server-side setup, these benefits are easiest to understand in the context of the development life cycle. When deploying a new feature or bug fix, the whole backend or service where that code appears must temporarily go down for the update to be applied. Any system downtime can result in the loss of data and a poor user experience. With redundancy and the right deployment configurations, this can be mitigated; however, the upkeep of such a setup incurs costs in server resources, its own development and maintenance, and dedicated personnel time.

With serverless architecture, developers can apply updates piecemeal with none of the risks of downtime as each function is an independent resource. This encourages a modular style of writing code that is recommended as a best practice for development and testing. As an independent resource, the code is run only when called, meaning there is no cost for idly running.

In this article, we will be using the Serverless Framework, an open-source application framework, to build serverless architectures on AWS Lambda and other cloud based services. We are going to build a secure API for a ToDo application and write the server side functions to run on Lambda. Many tutorials for front end tools and frameworks use the ToDo application for teaching their basic concepts. We want to consider what the setup of a backend for such an application could look like to process server-side logic such as storing and accessing data.


Please make sure that you have Node.js installed on your computer so you can follow along. Following the directions in the Serverless documentation, install the command line tool serverless. Please note that at the time of writing there is a known issue with Node.js version 8.0.

To get an idea of the basic structure of a Serverless application, use the command line tool to create an empty project:

eddie:serverless$ serverless create --template aws-nodejs --path serverless-demo
Serverless: Generating boilerplate...
Serverless: Generating boilerplate in "/Users/ekollar/Development/serverless/serverless-demo"
Serverless: Successfully generated boilerplate for template: "aws-nodejs"

Looking at the directory structure, we can see that the boilerplate includes just two files:

eddie:serverless$ tree serverless-demo
serverless-demo
├── handler.js
└── serverless.yml

Inside handler.js we see the code to be managed and executed in Lambda:

'use strict';

module.exports.hello = (event, context, callback) => {
  const response = {
    statusCode: 200,
    body: JSON.stringify({
      message: 'Go Serverless v1.0! Your function executed successfully!',
      input: event,
    }),
  };

  callback(null, response);

  // Use this code if you don't use the http event with the LAMBDA-PROXY integration
  // callback(null, { message: 'Go Serverless v1.0! Your function executed successfully!', event });
};

A look at the configuration file serverless.yml shows several generated comment lines describing the options for various cloud services. Below are only the uncommented lines that configure this demo project:

service: serverless-demo

provider:
  name: aws
  runtime: nodejs6.10

functions:
  hello:
    handler: handler.hello
The service section gives the name of the project; the provider section contains the configuration options for the cloud service provider; and the functions section contains configuration for the functions the service exposes: their names, what code they run, and what events can trigger them.

Our next step is to go inside the project directory, where we’ll use the command line tool to deploy this function on AWS.

ekollar:serverless-demo$ serverless deploy -v
Serverless: Packaging service...
Serverless: Creating Stack...
Serverless: Checking Stack create progress...
CloudFormation - CREATE_IN_PROGRESS - AWS::CloudFormation::Stack - serverless-demo-dev
CloudFormation - CREATE_IN_PROGRESS - AWS::S3::Bucket - ServerlessDeploymentBucket
CloudFormation - CREATE_IN_PROGRESS - AWS::S3::Bucket - ServerlessDeploymentBucket
CloudFormation - CREATE_COMPLETE - AWS::S3::Bucket - ServerlessDeploymentBucket
CloudFormation - CREATE_COMPLETE - AWS::CloudFormation::Stack - serverless-demo-dev
Serverless: Stack create finished...
Serverless: Uploading CloudFormation file to S3...
Serverless: Uploading artifacts...
Serverless: Uploading service .zip file to S3 (409 B)...
Serverless: Validating template...
Serverless: Updating Stack...
Serverless: Checking Stack update progress...
CloudFormation - UPDATE_IN_PROGRESS - AWS::CloudFormation::Stack - serverless-demo-dev
CloudFormation - CREATE_IN_PROGRESS - AWS::Logs::LogGroup - HelloLogGroup
CloudFormation - CREATE_IN_PROGRESS - AWS::Logs::LogGroup - HelloLogGroup
CloudFormation - CREATE_COMPLETE - AWS::Logs::LogGroup - HelloLogGroup
CloudFormation - CREATE_IN_PROGRESS - AWS::IAM::Role - IamRoleLambdaExecution
CloudFormation - CREATE_IN_PROGRESS - AWS::IAM::Role - IamRoleLambdaExecution
CloudFormation - CREATE_COMPLETE - AWS::IAM::Role - IamRoleLambdaExecution
CloudFormation - CREATE_IN_PROGRESS - AWS::Lambda::Function - HelloLambdaFunction
CloudFormation - CREATE_IN_PROGRESS - AWS::Lambda::Function - HelloLambdaFunction
CloudFormation - CREATE_COMPLETE - AWS::Lambda::Function - HelloLambdaFunction
CloudFormation - CREATE_IN_PROGRESS - AWS::Lambda::Version - HelloLambdaVersionLLztSdO2tYQbTC7ic22ZpdDWkh9zLOvbnQsXl4gZ0
CloudFormation - CREATE_IN_PROGRESS - AWS::Lambda::Version - HelloLambdaVersionLLztSdO2tYQbTC7ic22ZpdDWkh9zLOvbnQsXl4gZ0
CloudFormation - CREATE_COMPLETE - AWS::Lambda::Version - HelloLambdaVersionLLztSdO2tYQbTC7ic22ZpdDWkh9zLOvbnQsXl4gZ0
CloudFormation - UPDATE_COMPLETE_CLEANUP_IN_PROGRESS - AWS::CloudFormation::Stack - api-service-dev
CloudFormation - UPDATE_COMPLETE - AWS::CloudFormation::Stack - serverless-demo-dev
Serverless: Stack update finished...
Service Information
service: serverless-demo
stage: dev
region: us-east-1
api keys:
  None
functions:
  hello: serverless-demo-dev-hello
Stack Outputs
HelloLambdaFunctionQualifiedArn: arn:aws:lambda:us-east-1:743238559645:function:serverless-demo-dev-hello:1
ServerlessDeploymentBucketName: serverless-demo-dev-serverlessdeploymentbucket-zi9rpv2yn3uc

Diving into this output, we learn a few things about how a serverless deployment is configured on AWS infrastructure. Three services are being used here: CloudFormation, S3, and Lambda. CloudFormation is a platform that allows users to create and manage a collection of related AWS resources as a single stack. S3 is short for Simple Storage Service, an object store with a web interface for storing and retrieving data. This is where the packaged code will reside, in a designated bucket named serverless-demo-dev-serverlessdeploymentbucket-zi9rpv2yn3uc.
The Service Information section should look familiar: it echoes the configuration from the serverless.yml file, with some additional information. The keys stage, region, and api keys are default configurations that can be set in that YAML file. stage defines the staging environment the code will be deployed to, region defines which geographical region of the AWS infrastructure the code will reside in, and api keys lists the names of keys used to securely call our Lambda functions. We will set up an API key in a later step.
The last bit of information is the ARN (Amazon Resource Name) for the Lambda function which helps to uniquely identify resources in AWS.
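ARNs follow a fixed colon-separated layout (arn:partition:service:region:account-id:resource). As a quick illustration, here is the ARN from the output above pulled apart in a few lines of Python:

```python
# ARNs have the form arn:partition:service:region:account-id:resource
arn = "arn:aws:lambda:us-east-1:743238559645:function:serverless-demo-dev-hello:1"

# split on the first five colons; the resource part may itself contain colons
parts = arn.split(":", 5)
partition, service, region, account, resource = parts[1], parts[2], parts[3], parts[4], parts[5]

print(service)   # lambda
print(region)    # us-east-1
print(resource)  # function:serverless-demo-dev-hello:1
```

The trailing :1 in the resource part is the function version that was published during the deploy (the AWS::Lambda::Version resource in the CloudFormation events above).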
To see what is returned from a call to this ARN we can run the command line Serverless tool to call the function directly:

eddie:serverless-demo$ serverless invoke --function hello --log
{
    "statusCode": 200,
    "body": "{\"message\":\"Go Serverless v1.0! Your function executed successfully!\",\"input\":{}}"
}
START RequestId: c5453e4c-4e02-11e7-af1d-fb217ce93c95 Version: $LATEST
END RequestId: c5453e4c-4e02-11e7-af1d-fb217ce93c95
REPORT RequestId: c5453e4c-4e02-11e7-af1d-fb217ce93c95 Duration: 1.70 ms Billed Duration: 100 ms Memory Size: 1024 MB Max Memory Used: 19 MB

The JSON data is what is returned to the requester of the Lambda function; in our example, it is an HTTP-style response. Next, we'll create an HTTP endpoint configuration for this function so that an application external to AWS can call it.
In our serverless.yml file we will add an event for the function:

functions:
  hello:
    handler: handler.hello
    events:
      - http:
          path: sayhello
          method: get

After this change we need to deploy again:
eddie:serverless-demo$ serverless deploy -v
In the output you can see a number of provisioning and configuration steps taking place that we won't cover in detail. You will notice that a new service is now being used: API Gateway. As the name implies, this service allows for the configuration and use of APIs.

 GET - https://utg4c2yny6.execute-api.us-east-1.amazonaws.com/dev/sayhello

Making a request to this endpoint will give you the full response, including the same message as the direct call to the function.

curl https://utg4c2yny6.execute-api.us-east-1.amazonaws.com/dev/sayhello

As we know, it's not good practice to have insecure endpoints, so we are going to add configuration to generate an API key and secure our call to sayhello. Here is what the full revised serverless.yml file will look like:

service: serverless-demo

provider:
  name: aws
  runtime: nodejs6.10
  stage: dev
  region: us-east-1
  apiKeys:
    - secret

functions:
  hello:
    handler: handler.hello
    events:
      - http:
          path: sayhello
          method: get
          private: true

Our next deploy will update the configuration and in the Service Information we will see the API key generated by AWS:
api keys:
secret: Dt7CiOXofX3TeRvxZxOfe11RVwRZVeSp7OhNXsIv
If we try running the curl command again, we now get an error message, because the request no longer includes a valid key:
At the time of writing, the Serverless team is implementing automation to associate API keys with endpoints. For now, let me walk you through creating a Usage Plan for your endpoint; this is the configuration that defines throttling and quota limits for each API key.
Log into your AWS console and navigate to the page for API Gateway. Select Usage Plans in the left side menu. When you click on the Create button a form will popup. Below you can see my configurations. Feel free to adjust them as needed:
[Screenshot: the Create Usage Plan form]
Next we add the associated API stage, which in our case will be serverless-demo-dev:
[Screenshot: adding the associated API stage]
We’ve already generated an API key through the serverless command line tool earlier, but in this step of the wizard we will look it up and associate it with the Usage Plan:
[Screenshot: associating the API key with the Usage Plan]
When you’re done you will see the configuration page for the new Usage Plan:
[Screenshot: the configuration page for the new Usage Plan]
To test that our key does in fact work, we can now add it as a header to the call:

curl https://utg4c2yny6.execute-api.us-east-1.amazonaws.com/dev/sayhello --header "x-api-key: Dt7CiOXofX3TeRvxZxOfe11RVwRZVeSp7OhNXsIv"

You should receive an HTTP response similar to when the endpoint was insecure.
Now we're ready to mock up the endpoints for a To Do application. We are interested in providing basic CRUD (Create, Read, Update, Delete) functionality, which we'll stub out by updating the handler.js file:

'use strict';

module.exports.create = (event, context, callback) => {
  const timestamp = new Date().getTime();
  const data = JSON.parse(event.body);

  // validation
  if (typeof data.text !== 'string') {
    console.error('Validation Failed');
    callback(new Error('Couldn\'t create the todo item.'));
    return;
  }

  const response = {
    statusCode: 200,
    body: JSON.stringify({
      message: 'Create todo',
      input: event,
      data: data,
    }),
  };
  callback(null, response);
};

module.exports.get = (event, context, callback) => {
  const id = event.pathParameters.id;
  const response = {
    statusCode: 200,
    body: JSON.stringify({
      message: 'Get ToDo: ' + id,
      input: event,
    }),
  };
  callback(null, response);
};

module.exports.list = (event, context, callback) => {
  const response = {
    statusCode: 200,
    body: JSON.stringify({
      message: 'List all ToDos',
      input: event,
    }),
  };
  callback(null, response);
};

module.exports.update = (event, context, callback) => {
  const timestamp = new Date().getTime();
  const data = JSON.parse(event.body);

  // validation
  if (typeof data.text !== 'string' || typeof data.checked !== 'boolean') {
    console.error('Validation Failed');
    callback(new Error('Couldn\'t update the todo item.'));
    return;
  }

  const response = {
    statusCode: 200,
    body: JSON.stringify({
      message: 'Update ToDo',
      input: event,
    }),
  };
  callback(null, response);
};

module.exports.delete = (event, context, callback) => {
  const id = event.pathParameters.id;
  const response = {
    statusCode: 200,
    body: JSON.stringify({
      message: 'Delete ToDo',
      input: event,
    }),
  };
  callback(null, response);
};

Updating the functions section of the YAML file:

functions:
  create:
    handler: handler.create
    events:
      - http:
          path: todos
          method: post
          cors: true
          private: true
  list:
    handler: handler.list
    events:
      - http:
          path: todos
          method: get
          cors: true
          private: true
  get:
    handler: handler.get
    events:
      - http:
          path: todos/{id}
          method: get
          cors: true
          private: true
  update:
    handler: handler.update
    events:
      - http:
          path: todos/{id}
          method: put
          cors: true
          private: true
  delete:
    handler: handler.delete
    events:
      - http:
          path: todos/{id}
          method: delete
          cors: true
          private: true

With this deploy we now have a fully mocked-up API:

endpoints:
  POST - https://utg4c2yny6.execute-api.us-east-1.amazonaws.com/dev/todos
  GET - https://utg4c2yny6.execute-api.us-east-1.amazonaws.com/dev/todos
  GET - https://utg4c2yny6.execute-api.us-east-1.amazonaws.com/dev/todos/{id}
  PUT - https://utg4c2yny6.execute-api.us-east-1.amazonaws.com/dev/todos/{id}
  DELETE - https://utg4c2yny6.execute-api.us-east-1.amazonaws.com/dev/todos/{id}
functions:
  create: serverless-demo-dev-create
  list: serverless-demo-dev-list
  get: serverless-demo-dev-get
  update: serverless-demo-dev-update
  delete: serverless-demo-dev-delete

Congratulations! We've successfully gone through the basics of creating an API hosted on AWS using the Serverless command line tool, and you now know a little about the cloud services used to architect this backend. The next step is to add persistent storage for our To Do application.
If you would like the full code from this project, please visit the GitHub repository.


Practical Neural Networks with Keras: Classifying Yelp Reviews

June 1st, 2017

Keras is a high-level deep learning library that makes it easy to build Neural Networks in a few lines of Python. In this post, we'll use Keras to train a text classifier. We'll use a subset of the Yelp Challenge Dataset, which contains over 4 million Yelp reviews, and we'll train our classifier to discriminate between positive and negative reviews. Then we'll compare the Neural Network classifier to a Support Vector Machine (SVM) on the same dataset, and show that even though Neural Networks are breaking records in most machine learning benchmarks, the humbler SVM is still a great solution for many problems.

Once we’re done with the classification tasks, we’ll show how to package the trained model so that we can use it for more practical purposes.


This post is aimed at people who want to learn about neural networks, machine learning, and text classification. It will help if you have used Python before, but we’ll explain all of the code in detail, so you should be able to keep up if you’re new to Python as well. We won’t be covering any of the mathematics or theory behind the deep learning concepts presented, so you’ll be able to follow even without any background in machine learning. You should have heard, and have some high-level understanding, of terms such as “Neural Network”, “Machine Learning”, “Classification” and “Accuracy”. If you’ve used SSH, or at least run commands in a shell before, some of the setup steps will be much easier.

If you want to follow along with the examples in this post, you'll need an account with Amazon Web Services, as we'll be using their Spot Instance GPU-compute machines for training. You can use your own machine, or a machine from any other cloud provider that offers GPU-compute virtual private servers, but then you'll need to install and configure the GPU drivers, CUDA, cuDNN, and the relevant Python machine learning libraries yourself.

In our example, we’ll be using the AWS Deep Learning AMI, which has all of the above pre-installed and ready to use.


In this post, we will:

  • Set up an AWS Spot Instance (pre-configured with a Tesla GPU, CUDA, cuDNN, and most modern machine learning libraries)
  • Load and parse the Yelp reviews in a Jupyter Notebook
  • Train and evaluate a simple Recurrent Neural Network with Long Short-Term Memory (LSTM-RNN) using Keras
  • Improve our model by adding a Convolutional Neural Network (CNN) layer
  • Compare the performance of the Neural Network classifier to a simpler SVM classifier
  • Show how to package all of our models for practical use
Setting up an AWS Spot Instance

Because Neural Networks need a lot of computational power to train, and greatly benefit from being run on GPUs, we’ll be running all the code in this tutorial on a Virtual Private Server (VPS) through Amazon Web Services (AWS).

AWS offers cloud machines with high processing power, lots of RAM and modern GPUs. They auction off extra capacity by the hour through “Spot Instances” (https://aws.amazon.com/ec2/spot/) which are specifically designed for short-lived instances. Because we’ll only need the instance for a couple of hours at most, spot instances are ideal for us. The EC2 p2.xlarge instances that we’ll be using usually cost around $1 per hour, while the same machine using Spot Pricing is usually around $0.20 per hour (depending on current demand).

If you don’t have an AWS account, create one at https://aws.amazon.com. You’ll need to go through a fairly long sign-up process, and have a valid credit card, but once you have an account, launching machines in the cloud can be done in a few clicks.

Once you have an account, log in to the AWS console at https://console.aws.amazon.com and click on the link to “EC2”, under the “Compute” category. The first thing we need to do is to pick a region into which our instance will be launched. Pick one of “US East (N. Virginia)”, “US West (Oregon)” or “EU (Ireland)” (whichever one is closest to you) as this is where the Deep Learning pre-configured machine images are available. You can select the region from the dropdown in the top right corner of the page. Once you’ve chosen your region, click on the Spot Instances link in the left column, and hit “Request Spot Instances”.


We can accept most of the defaults in the resulting page, but we need to choose:

  • The AMI (Amazon Machine Image): We’ll pick an image with everything we need already installed and configured.
  • The Instance Type: EC2 instances come in different shapes and sizes. We’ll pick one optimized for GPU-compute tasks, specifically the p2.xlarge instance.

The easiest way to find the correct AMI is by its unique ID. Press “Search for AMI”, select “Community AMIs” from the dropdown, and paste the relevant AMI ID from here https://aws.amazon.com/marketplace/pp/B06VSPXKDX into the search box. For example, I am using the eu-west-1 (Ireland) region, so the AMI-ID is ami-c5afaaa3. Hit the select button once you’ve found the AMI, and close the pop-up, shown in the image below.

[Screenshot: selecting the Deep Learning AMI from the Community AMIs search]

To choose the instance type, first delete the default selection by pressing the x circled below, then press the Select button.

In the pop-up, choose “GPU Compute” from the “Instance Type” dropdown, and select p2.xlarge. You can have a look at the current spot price, or press “Pricing History” for a more detailed view. It’s good to check that there haven’t been any price spikes in the past few days. Finally, press “Select” to close the window.


Under the Availability Zone section, tick all three options, as we don’t care which zone our instance is in; AWS will automatically fire up an instance in the currently cheapest zone. Press “Next” to get to the second page of options. By “Instance store”, tick the “Attach at launch” box, so that our disk will be ready for use the moment our instance boots.

Under “Set keypair and role”, choose a key pair. If you don’t have one yet, press “Create new key pair”, which will generate a public-private key pair. You’ll need to follow the instructions at https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html if you’re not used to working with SSH and key pairs. In particular, it’s important to change the permissions on your new private key before you can use it, by running the following command, substituting my-key-pair.pem with the full path and name of your private key:

chmod 400 my-key-pair.pem

The last thing to configure is a security group. Large AWS machines present juicy targets to attackers who want to use them for botnets or other nefarious purposes, so by default they don’t allow any incoming traffic. As we’ll want to connect to our instance via SSH, we need to configure the firewall appropriately. Select “create new security group”, and press “Create Security Group” in the next window. You’ll see a pop-up similar to the one below. Fill in the top three fields and add an incoming rule to allow SSH traffic from your IP address.


Name the new group “allow-ssh”, and add “Allow us to SSH” as a description. Also, change the “VPC” dropdown from “No VPC” to the only other option, which is the VPC you’ll be launching the instance into. Hit the “Add Rule” button on the “Inbound” tab, and choose to allow SSH traffic under “Type”, and choose “My IP” under “Source”. This will automatically whitelist your current IP address, and allow you to connect to your instance. Click “Create”, and close the Security Groups tab to return to the Request Instance page where we were before.

Hit the review button at the bottom right of the page and check that all the details are correct, then press the blue “Launch” button. Your request should be fulfilled within a few seconds, and you’ll see your instance boot up in the console under “Instances”. (Hit the Refresh button if you don’t see your new instance.) Copy the public IP address, which you might need to scroll to the right to see, to the clipboard.


If the instance status in the “Spot Requests” panel gets stuck in “pending”, it might be because your AWS account hasn’t yet been verified, or because your Spot Instance limit is set to zero. You can see your limits by clicking on “Limits” in the left-hand panel, and looking for “Spot instance requests”. If your account didn’t get properly verified, you’ll be unable to launch instances. In this case, check that you’ve completed all the steps at https://aws.amazon.com/premiumsupport/knowledge-center/create-and-activate-aws-account/ and contact AWS Support if necessary.

Connecting to our EC2 Instance

Now we can connect to our instance via SSH. If you’re using Mac or Linux, you can SSH by opening a terminal and typing the following command, replacing <public-ip> with the public IP address that you copied above, and /path/to/your/private-key.pem with the full path to your private key file.

ssh -i /path/to/your/private-key.pem -L 8888:localhost:8888 ubuntu@<public-ip>

The -i flag specifies the identity file: it cryptographically proves to AWS that we are who we say we are, because we have the private key file associated with our instance. The -L flag sets up an SSH tunnel so that we can access a Jupyter Notebook running on our instance as if it were running on our local machine.

If you see an error when trying to connect via SSH, you can run the command again and add a -vvv flag which will show you verbose connection information. Depending on how your account is configured to use VPCs, you may need to use the public DNS address instead of the public IP address. See https://stackoverflow.com/questions/20941704/ec2-instance-has-no-public-dns for some common issues and solutions regarding VPC settings.

If you’re on Windows, you can’t use the ssh command by default, but you can work around this through any of the following options:

  • If you’re on Windows 10, you can use the Windows Subsystem for Linux (WSL). Read about how to set it up and use it here: https://msdn.microsoft.com/commandline/wsl/about
  • If you’re on an older version of Windows, you can use PuTTy (if you prefer having a graphical user interface), or Git Tools for Windows (if you prefer the command line). You can get PuTTy and instructions on how to use it at https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html or Git for Windows here https://git-for-windows.github.io/. If you install Git for Windows, it’s important to select the “Use Git and optional Unix tools” option in the last step of the installer. You’ll then be able to use SSH directly from the Windows command prompt.
[Screenshot: the “Use Git and optional Unix tools” option in the Git for Windows installer]
Running a Jupyter Notebook server

After connecting to the instance, we’ll need to run a couple of commands. First, we need to upgrade Keras to version 2, which comes with many API improvements, but which breaks compatibility in a number of ways with older Keras releases. Second, we’ll run a Jupyter Notebook server inside a tmux session so that we can easily run Python code on our powerful AWS machine.

On the server (in the same window that you connected to the server with SSH), run the following commands:

pip3 install keras --upgrade --user

Now open a tmux session by running:

tmux
This creates a virtual session in the terminal. We need this in case our SSH connection breaks for any reason (e.g. if your WiFi disconnects). Usually, breaking the connection to the server automatically halts any programs that are currently running in the foreground. We don’t want this to happen while we are a few hours into the training of our neural network! Running a tmux session will allow us to resume our session after reconnecting to the server if necessary, leaving all our code running in the background.

Inside the tmux session, run:

jupyter notebook

You should see output similar to the following. Open the URL that you see in a web browser on your local machine to open the notebook interface, from which you can easily run snippets of Python code. Code run in the notebook will be executed on the server, and Keras code will automatically run in massively parallel batches on the GPU.


Hit ctrl + b and tap the d key to “detach” the tmux session — now the Jupyter session is still running, but it won’t die if the connection is interrupted. You can type tmux a -t 0 (for “tmux attach session 0”) to attach the session again if you need to stop the server or view any of the output.

Loading and parsing the Yelp dataset

Now download the Yelp dataset from https://www.yelp.com/dataset_challenge. You’ll need to fill in a form and agree to only use the data for academic purposes. The data is compressed as a .tgz file, which you can transfer to the AWS instance by running the following command on your local machine (after you have downloaded the dataset from Yelp):

scp -i ~/path/to/your/private/key.pem yelp_dataset_challenge_round9.tgz ubuntu@<public-ip>:~

Once again, substitute the path to your private key file and the public IP address of your VPS as appropriate. Note that the command ends with :~. The colon separates the SSH connection string of your instance from the path that you want to upload the file to, and the ~ represents the home directory of your VPS.

Now, on the server untar the dataset by running tar -xvf yelp_dataset_challenge_round9.tgz. This should extract a number of large JSON files to your home directory on the server.

At this point, we’re ready to start running our Python code. Create a notebook on your local machine by firing up your web browser and visiting the URL that was displayed after you ran the jupyter notebook command. It should look similar to http://localhost:8888?token=<yourlongtoken>. For this to work, you need to be connected to your VPS via SSH with the -L 8888:localhost:8888 flag as specified earlier. In your browser, click the “new” button in the top right, and choose to create a Python 3 notebook.


If you haven’t used Jupyter notebook before, you’ll love it! You can easily run snippets of Python code in so-called “cells” of the notebook. All variables are available across all cells, so you never have to re-run earlier bits of code just to reload the same data. Create a new cell and run the following code. (You can insert new cells by using the “Insert -> Insert Cell Below” menu at the top, and you can run the code in the current cell by hitting “Cell -> Run Cells”. A useful shortcut is Shift+Enter, which will run the current cell and insert a new one below).

from collections import Counter
from datetime import datetime

import json

from keras.layers import Embedding, LSTM, Dense, Conv1D, MaxPooling1D, Dropout, Activation
from keras.models import Sequential
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

import numpy as np

The above code imports a bunch of libraries for us that we’ll be using later on. The first three are standard Python imports, while Keras and Numpy are third-party libraries that come installed with the Deep Learning AMI that we are using.

To load the reviews from disk, run the following in the next cell:


# Load the reviews and parse JSON
t1 = datetime.now()
with open("yelp_academic_dataset_review.json") as f:
    reviews = f.read().strip().split("\n")
reviews = [json.loads(review) for review in reviews]
print(datetime.now() - t1)

The reviews are structured as JSON objects, one per line. This code loads all the reviews, parses them into JSON, and stores them in a list called reviews. We also print out how long this took; on the AWS machine, it should run in under a minute.
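To make the one-JSON-object-per-line format concrete, here is the same parsing logic applied to a couple of made-up review lines (the field names stars and text match the real dataset; the values are invented):

```python
import json

# hypothetical lines in the same one-JSON-object-per-line format as the Yelp file
lines = [
    '{"stars": 5, "text": "Great tacos!"}',
    '{"stars": 1, "text": "Cold food, slow service."}',
]
reviews_sample = [json.loads(line) for line in lines]

print(reviews_sample[0]["stars"])  # 5
print(reviews_sample[1]["text"])   # Cold food, slow service.
```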

Each review in the Yelp dataset contains the text of the review and the associated star rating, left by the reviewer. Our task is to teach a classifier to differentiate between positive and negative reviews, looking only at the review text itself.

It is very important to have a balanced dataset for supervised learning. This means that we want the same number of positive and negative reviews when we train our neural network to tell the difference between them. If we have more positive reviews than negative reviews, the network will learn that most reviews are positive, and adjust its predictions accordingly. We’ll therefore take a sample of the Yelp reviews which contains the same number of positive (four or five-star) and negative (one, two, or three-star) reviews.

# Get a balanced sample of positive and negative reviews
texts = [review['text'] for review in reviews]

# Convert our 5 classes into 2 (negative or positive)
binstars = [0 if review['stars'] <= 3 else 1 for review in reviews]

balanced_texts = []
balanced_labels = []
limit = 100000  # Change this to grow/shrink the dataset
neg_pos_counts = [0, 0]
for i in range(len(texts)):
    polarity = binstars[i]
    if neg_pos_counts[polarity] < limit:
        balanced_texts.append(texts[i])
        balanced_labels.append(binstars[i])
        neg_pos_counts[polarity] += 1

This gets 100,000 positive and 100,000 negative reviews. Feel free to use a higher or lower number, depending on your time constraints. With this dataset size, we’ll need a couple of hours to train each of the two different neural network models below. If you use less data, you’ll probably get significantly worse accuracy, as neural networks usually need a lot of data to train well; more data will result in longer training times.

We can verify that our new dataset is balanced by using a Python Counter. In a new cell, run:

Counter(balanced_labels)
# >>> Counter({0: 100000, 1: 100000})
Tokenizing the texts

Machines understand numbers better than words. In order to train our neural network with our texts, we first need to split each text into words and represent each word by a number. Luckily, Keras has a preprocessing module that can handle all of this for us.

First, we’ll have a look at how Keras’ tokenization and sequence padding works on some toy data, in order to work out what’s going on under the hood. Then, we’ll apply the tokenization to our Yelp reviews.

Keras represents each word as a number, with the most common word in a given dataset represented as 1, the second most common as 2, and so on. This is useful because we often want to ignore rare words: the neural network usually cannot learn much from these, and they only add to the processing time. If we have our data tokenized with the more common words having lower numbers, we can easily train on only the N most common words in our dataset, and adjust N as necessary (for larger datasets, we would want a larger N, as even comparatively rare words will appear often enough to be useful).
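This convention is easy to mimic without Keras. A minimal sketch, using a toy corpus (not the Yelp data), of frequency-ranked indexing and of dropping every word whose index is not below some cutoff N:

```python
from collections import Counter

# toy corpus for illustration
words = "is is a common word so is the the is common".split()
counts = Counter(words)

# the most common word gets index 1, the next most common 2, and so on
word_index = {w: i + 1 for i, (w, _) in enumerate(counts.most_common())}

n = 5  # cutoff: keep only tokens with index below n, like Keras' num_words
tokens = [word_index[w] for w in words if word_index[w] < n]

print(word_index["is"])  # 1
print(tokens)
```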

Tokenization in Keras is a two-step process. First, we need to calculate the word frequencies for our dataset (to find the most common words and assign these low numbers). Then we can transform our text into numerical tokens. The calculation of the word frequencies is referred to as ‘fitting’ the tokenizer, and Keras calls the numerical representations of our texts ‘sequences’.

Run the following code in a new cell:

tokenizer = Tokenizer(num_words=5)
toytexts = ["Is is a common word", "So is the", "the is common", "discombobulation is not common"]
tokenizer.fit_on_texts(toytexts)
sequences = tokenizer.texts_to_sequences(toytexts)
  • In line one, we create a tokenizer and say that it should ignore all except the five most-common words (in practice, we’ll use a much higher number).
  • In line three, we tell the tokenizer to calculate the frequency of each word in our toy dataset.
  • In line four, we convert all of our texts to lists of integers

We can have a look at the sequences to see how the tokenization works.

print(sequences)
# >>> [[1, 1, 4, 2], [1, 3], [3, 1, 2], [1, 2]]

We can see that each text is represented by a list of integers. The first text is 1, 1, 4, 2. By looking at the other sequences, we can infer that 1 represents the word “is”, 4 represents “a”, and 2 represents “common”. We can take a look at the tokenizer word_index, which stores the word-to-token mapping, to confirm this:

tokenizer.word_index
which outputs:

{'a': 4,
 'common': 2,
 'discombobulation': 7,
 'is': 1,
 'not': 8,
 'so': 6,
 'the': 3,
 'word': 5}

Rare words, such as “discombobulation”, did not make the cut of most common words, and are therefore omitted from the sequences. You can see the last text is represented only by [1, 2] even though it originally contained four words, because two of its words were not common enough to be kept.

Finally, we’ll want to “pad” our sequences. Our neural network can train more efficiently if all of the training examples are the same size, so we want each of our texts to contain the same number of words. Keras has the pad_sequences function to do this, which will pad with leading zeros to make all the texts the same length as the longest one:

padded_sequences = pad_sequences(sequences)

padded_sequences
Which outputs:

array([[1, 1, 4, 2],
       [0, 0, 1, 3],
       [0, 3, 1, 2],
       [0, 0, 1, 2]], dtype=int32)

The last text has now been transformed from [1, 2] to [0, 0, 1, 2] in order to make it as long as the longest text (the first one).

Now that we’ve seen how tokenization works, we can create the real tokenized sequences from the Yelp reviews. Run the following code in a new cell.

tokenizer = Tokenizer(num_words=20000)
tokenizer.fit_on_texts(balanced_texts)
sequences = tokenizer.texts_to_sequences(balanced_texts)
data = pad_sequences(sequences, maxlen=300)

This might take a while to run. Here, we use the most common 20000 words instead of 5. The only other difference is that we pass maxlen=300 when we pad the sequences. This means that as well as padding the very short texts with zeros, we’ll also truncate the very long ones. All of our texts will then be represented by 300 numbers.
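Under the hood, pad_sequences defaults to ‘pre’ padding and ‘pre’ truncating: short sequences get leading zeros, and long ones keep only their last maxlen tokens. A rough pure-Python sketch of that behaviour:

```python
def pad(seq, maxlen):
    # pre-truncate: keep only the last maxlen tokens
    seq = seq[-maxlen:]
    # pre-pad: add leading zeros until the sequence is maxlen long
    return [0] * (maxlen - len(seq)) + seq

print(pad([1, 2], 4))           # [0, 0, 1, 2]
print(pad([5, 6, 7, 8, 9], 4))  # [6, 7, 8, 9]
```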

Building a neural network

There are different ways of building a neural network. One of the more complicated architectures, which is known to perform very well on text data, is the Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM). RNNs are designed to learn from sequences of data, where there is some kind of time dependency. For example, they are used for time-series analysis, where each data point has some relation to those immediately before and after. By extension, they work very well for language data, where each word is related to those before and after it in a sentence.

The maths behind RNNs gets a bit hairy, and even more so when we add the concept of LSTMs, which allow the neural network to pay more attention to certain parts of a sequence, and to largely ignore words which aren’t as useful. In spite of the internal complications, with Keras we can set up one of these networks in a few lines of code[1].

model = Sequential()
model.add(Embedding(20000, 128, input_length=300))
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
  • In line one, we create an empty Sequential model. We’ll use this to add several “layers” to our network. Each layer will do something slightly different in our case, so we’ll explain these separately below.
  • In line two, we add an Embedding layer. This layer lets the network expand each token to a larger vector, allowing the network to represent words in a meaningful way. We pass 20000 as the first argument, which is the size of our vocabulary (remember, we told the tokenizer to only use the 20 000 most common words earlier), and 128 as the second, which means that each token is expanded to a vector of size 128. We give it an input_length of 300, which is the length of each of our sequences.
  • In line three, we add an LSTM layer. The first argument, 128, is the number of units in the LSTM layer (here we match the size of our word embeddings, the second argument from the Embedding layer). We add a 20% chance of dropout with the next two arguments. Dropout is a slightly counter-intuitive concept: on every update, we randomly ignore 20% of the layer's input and recurrent connections. This makes it more difficult for the neural network to memorise specific patterns, which results in a more robust network, as the rules the network learns are more generalisable.
  • In line four, we add a Dense layer. This is the simplest kind of neural network layer, where every neuron is connected to every input. This layer has an output size of 1, and with the sigmoid activation it outputs a value between 0 and 1, which we can read as the probability that a review is positive. We will train the network to push this output towards 1 for positive reviews and 0 for negative ones.
  • In line five, we compile the model. This prepares the model to be run on the backend graph library (in our case, TensorFlow). We use loss='binary_crossentropy' because we only have two classes (1 and 0). We use the adam optimizer, which is a relatively modern learning strategy that works well in a number of different scenarios, and we specify that we are interested in the accuracy metric (how many positive/negative predictions our neural network gets correct).
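The dropout idea from the list above can be sketched in a few lines of NumPy. This is an illustration of the concept only, not Keras's internal implementation; note that surviving values are scaled up so the expected total stays the same (so-called "inverted" dropout):

```python
import numpy as np

def dropout(activations, rate=0.2, rng=None):
    """Zero out roughly a `rate` fraction of activations at random,
    and scale the survivors up to keep the expected sum unchanged."""
    if rng is None:
        rng = np.random.default_rng(42)  # fixed seed, for reproducibility
    mask = rng.random(activations.shape) >= rate  # keep ~80% of units
    return activations * mask / (1.0 - rate)

activations = np.ones(1000)
dropped = dropout(activations)
print(int((dropped == 0).sum()))  # roughly 200 of the 1000 units are zeroed
```

Because a different random 20% disappears on every update, no single unit can be relied on, which forces the network to spread what it learns across many units.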
Training our neural network

Now we are ready to train or “fit” the network. This can be done in a single line of code. As this is where the actual learning takes place, you’ll need a significant amount of time to run this step (approximately two hours on the Amazon GPU machine).

model.fit(data, np.array(balanced_labels), validation_split=0.5, epochs=3)

Here, we pass in our padded, tokenized texts as the first argument, and the labels as the second argument. We use validation_split=0.5 to tell our neural network that it should take half of the data to learn from, and that it should test itself on the other half. This means it will take half the reviews, along with their labels, and try to find patterns in the tokens that represent positive or negative labels. It will then try to predict the answers for the other half, without looking at the labels, and compare these predictions to the real labels, to see how good the patterns it learned are.

It’s important to have validation data when training a neural network. The network is powerful enough to find patterns even in random noise, so by seeing that it’s able to get the correct answers on ‘new’ data (data that it didn’t look at during the training stage), we can verify that the patterns it is learning are actually useful to us, and not overly-specific to the data we trained it on.

The last argument we pass, epochs=3, means that the neural network should run through all of the available training data three times.

You should (slowly) see output similar to the below. The lower the loss, the better, as this number indicates the errors that the network is making while learning. The acc number should be high, as this represents the accuracy on the training data. After each epoch completes, you'll also see the val_loss and val_acc numbers appear, which represent the loss and accuracy on the held-out validation data.

Train on 100000 samples, validate on 100000 samples

Epoch 1/3

100000/100000 [==============================] - 1780s - loss: 0.3974 - acc: 0.8237 - val_loss: 0.4305 - val_acc: 0.8158

Epoch 2/3

100000/100000 [==============================] - 1764s - loss: 0.2953 - acc: 0.8758 - val_loss: 0.3167 - val_acc: 0.8745

Epoch 3/3

100000/100000 [==============================] - 1754s - loss: 0.2305 - acc: 0.9057 - val_loss: 0.3296 - val_acc: 0.8589

We can see that for the first two epochs, the acc and val_acc numbers are similar. This is good. It means that the rules that the network learns on the training data generalize well to the unseen validation data. After two epochs, our network can predict whether a review is positive or negative correctly 87.5 percent of the time!

After the third epoch, the network is starting to memorize the training examples, with rules that are too specific. We can see that it gets 90% accuracy on the training data, but only 85.8% on the held-out validation data. This means that our network has “overfitted”, and we’d want to retrain it (by running the “compile” and “fit” steps again) for only two epochs instead of three. (We could also have told it to stop training automatically when it started overfitting by using an “Early Stopping” callback. We don’t show how to do this here, but you can read how to implement it at https://keras.io/callbacks/#earlystopping).
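The logic behind early stopping is simple enough to sketch in plain Python (the val_loss values below are taken from our training run; the real Keras callback tracks these numbers for you):

```python
def train_with_early_stopping(val_losses, patience=1):
    """Return the epoch at which training should stop: when val_loss
    hasn't improved for more than `patience` epochs, mirroring the idea
    behind keras.callbacks.EarlyStopping."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, val_loss in enumerate(val_losses, start=1):
        if val_loss < best_loss:
            best_loss = val_loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement > patience:
                return epoch  # stop training here
    return len(val_losses)

# with our val_loss numbers, a patience of 0 stops training after epoch 3,
# as soon as val_loss rises
print(train_with_early_stopping([0.4305, 0.3167, 0.3296], patience=0))
```

The patience parameter exists because val_loss is noisy: a single bad epoch doesn't always mean the network has started overfitting.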

Adding more layers

Our simple model worked well enough, but it was slow to train. One way to speed up the training time is to improve our network architecture and add a “Convolutional” layer. Convolutional Neural Networks (CNNs) come from image processing. They pass a “filter” over the data, and calculate a higher-level representation. They have been shown to work surprisingly well for text, even though they have none of the sequence processing ability of LSTMs. They are also faster, as the different filters can be calculated independently of each other. LSTMs by contrast are hard to parallelise, as each calculation depends on many previous ones.

By adding a CNN layer before the LSTM, we allow the LSTM to see sequences of chunks instead of sequences of words. For example, the CNN might learn the chunk “I loved this” as a single concept and “friendly guest house” as another concept. The LSTM stacked on top of the CNN could then see the sequence [“I loved this”, “friendly guest house”] (a “sequence” of two items) and learn whether this was positive or negative, instead of having to learn the longer and more difficult sequence of the six independent items [“I”, “loved”, “this”, “friendly”, “guest”, “house”].
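The chunking described above can be sketched with a toy example: a filter of width three slides over the sequence, seeing one window of neighbouring tokens at a time (a real Conv1D filter computes a weighted sum over the embedding vectors in each window, rather than looking at raw words):

```python
def sliding_windows(tokens, width=3):
    """Return every contiguous chunk of `width` tokens: the windows
    that a width-3 convolutional filter would slide over."""
    return [tokens[i:i + width] for i in range(len(tokens) - width + 1)]

tokens = ["I", "loved", "this", "friendly", "guest", "house"]
for window in sliding_windows(tokens):
    print(window)
```

Each filter learns to respond strongly to certain kinds of windows, so the layer's output is a shorter sequence of "chunk detections" rather than raw words.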

Add the definition of the new model in a new cell of the notebook:

# if not already imported: from keras.layers import Dropout, Conv1D, MaxPooling1D
model = Sequential()
model.add(Embedding(20000, 128, input_length=300))
model.add(Dropout(0.2))
model.add(Conv1D(64, 5, activation='relu'))
model.add(MaxPooling1D(pool_size=4))
model.add(LSTM(128))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(data, np.array(balanced_labels), validation_split=0.5, epochs=3)

Here we have a slightly different arrangement of layers. We add a Dropout layer directly after the Embedding layer. Following this, we add a convolutional layer, which passes a filter over the text to learn specific chunks or windows. After this, we have a MaxPooling layer, which downsamples the convolved representations by keeping only the strongest signal in each window. If you're interested in learning more about how this works (and seeing some GIFs which clarify the concept), have a look at http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/.

The rest of our model is the same as before. Training time should be significantly faster (about half an hour for all three epochs) and accuracy is similar. You should see output similar to the following.

Train on 100000 samples, validate on 100000 samples

Epoch 1/3

100000/100000 [==============================] - 465s - loss: 0.3309 - acc: 0.8561 - val_loss: 0.3181 - val_acc: 0.8640

Epoch 2/3

100000/100000 [==============================] - 462s - loss: 0.2325 - acc: 0.9048 - val_loss: 0.3335 - val_acc: 0.8549

Epoch 3/3

100000/100000 [==============================] - 457s - loss: 0.1666 - acc: 0.9349 - val_loss: 0.3833 - val_acc: 0.8570

Although the highest accuracy looks slightly worse, the network gets there much faster. We can see that the network is already overfitting after one epoch. Adding more Dropout layers after the CNN and LSTM layers might improve our network still more, but this is left as an exercise for the reader.

Comparing our results to a Support Vector Machine classifier

Neural Networks are great tools for many tasks. Sentiment analysis is quite straightforward though, and similar results can be achieved with much simpler algorithms. As a comparison, we’ll build a Support Vector Machine classifier using scikit-learn, another high-level machine learning Python library that allows us to create complex models in a few lines of code.

Run the following in a new cell:

from datetime import datetime

from sklearn.svm import LinearSVC
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score

t1 = datetime.now()
vectorizer = TfidfVectorizer(ngram_range=(1,2), min_df=3)
classifier = LinearSVC()
Xs = vectorizer.fit_transform(balanced_texts)
print(Xs.shape)
print(datetime.now() - t1)

score = cross_val_score(classifier, Xs, balanced_labels, cv=2, n_jobs=-1)
print(datetime.now() - t1)
print(score)
print(sum(score) / len(score))

Now, instead of converting each word to a single number and learning an Embedding layer, we use term frequency-inverse document frequency (TF-IDF) vectorisation. Using this scheme, we ignore the order of the words completely and represent each review as a large sparse vector, with each cell representing a specific word and how often it appears in that review. Words that appear in many reviews are down-weighted, so rare words are given a higher importance than common ones (and we ignore all words that aren't seen in at least three different reviews).

  • We set up the vectorizer with ngram_range=(1,2), which means we'll consider all words on their own but also look at all pairs of words. This is useful because we don't have a concept of word order anymore, so treating pairs of words as single tokens allows the classifier to learn that a pair such as "not good" is usually negative, even though "good" is positive. We also set min_df to 3, which means that we'll ignore words that aren't seen at least three times (in three different reviews).
  • We set up a support vector machine classifier with a linear kernel, as this has been shown to work well for text classification tasks.
  • We convert our reviews into vectors by calling fit_transform, which actually does two things: first it "fits" the vectorizer, similarly to how we fitted our Tokenizer for Keras, to calculate the vocabulary across all reviews. Then it "transforms" each review into a large sparse vector.
  • Finally, we get a cross-validated score for our classifier with cross_val_score. Because we pass cv=2, we train the classifier twice: similarly to our validation_split for Keras, we first train on half the reviews and check our score on the other half, and then vice versa. We set n_jobs=-1 to say that the classifier should use all available CPU cores; here it will use two, running each of the cross-validation splits on a separate core.
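As a quick aside, we can see what the vectorizer produces by running it on a toy corpus (we drop min_df here because the corpus is so small):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

toy_reviews = ["not good at all", "very good food", "good service"]
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
Xs = vectorizer.fit_transform(toy_reviews)

# each review becomes one sparse row, and the vocabulary contains
# bigrams such as "not good" as well as the single words
print(Xs.shape)
print("not good" in vectorizer.vocabulary_)
```

The real run on the full dataset produces the same kind of sparse matrix, just with vastly more rows and columns.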

You should see output similar to the following.


(200000, 774090)


[ 0.86372 0.88085]


We can see that the SVM is significantly faster. We need about a minute and a half to vectorize the reviews, transforming each of the 200 000 reviews into a vector with 774 090 features. It takes another minute and a half to train the classifier on 100 000 reviews and predict on the other 100 000.

The bracketed numbers show the accuracy for each cross-validation run, and the final number printed is the average of both runs. For our task, the SVM actually performs slightly better than the neural network in terms of accuracy, and it does so in significantly less time.

However, note that we have more flexibility with the neural network, and we could probably do better if we spent more time tuning our model. Because neural networks are not as well understood as SVMs, it can be difficult to find a good model for your data. Also note that if you want to train on even more data (we were using only a small sample of the Yelp reviews), you may well find that the neural network starts outperforming the SVM. Conversely, for smaller datasets, the SVM is much better than the neural network — try running all of the above code again for 2 000 reviews instead of 200 000 and you’ll see that the neural network really battles to find meaningful patterns if you limit the training examples. The SVM, on the other hand, will perform well even for much smaller datasets.

Packaging our models for later use

Being able to predict the sentiment of this review set is not very useful on its own — after all, we have all the labels at our disposal already. However, now that we’ve trained some models, we can easily use them on new, unlabeled data. For example, you could download thousands of news reports before an election and use our model to see whether mainly positive or mainly negative things are being said about key political figures, and how that sentiment is changing over time.

When saving our models for later use, it's important not to forget the tokenization. When we converted our raw texts into tokens, we fitted the tokenizer first. In the case of the neural network, the tokenizer learned which were the most frequent words in our dataset; for our SVM, we fitted a TF-IDF vectorizer. If we want to classify more texts, we must use the same tokenizers without refitting them. A new dataset would have a different most-common word, but our neural network learned specific things about the word representations from the Yelp dataset, so it's important that token 1 still refers to the same word for new data.
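The danger of refitting can be shown with a toy word-to-index mapping (a simplified stand-in for the Keras Tokenizer, which likewise gives lower indices to more frequent words):

```python
from collections import Counter

def fit_tokenizer(texts):
    """Assign index 1 to the most common word, 2 to the next, and so on,
    in the same spirit as the Keras Tokenizer."""
    counts = Counter(word for text in texts for word in text.split())
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}

old_index = fit_tokenizer(["good food good service", "bad food"])
new_index = fit_tokenizer(["terrible room terrible stay", "good stay"])

# "good" is token 1 in the old mapping, but refitting on new data
# gives the same word a completely different index
print(old_index["good"], new_index["good"])
```

A model trained on the old mapping would interpret token 1 in the refitted data as a different word entirely, so we must always load and reuse the original tokenizer.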

For the Keras model, we can save the tokenizer and the trained model as follows (make sure that you have h5py installed with pip3 install h5py first if you’re not using the pre-configured AWS instance).

import pickle

# save the tokenizer and the model
with open("keras_tokenizer.pickle", "wb") as f:
   pickle.dump(tokenizer, f)

model.save("yelp_sentiment_model.hdf5")

If we want to predict whether some new piece of text is positive or negative, we can load our model and get a prediction with:

from keras.models import load_model
from keras.preprocessing.sequence import pad_sequences
import pickle

# load the tokenizer and the model
with open("keras_tokenizer.pickle", "rb") as f:
   tokenizer = pickle.load(f)

model = load_model("yelp_sentiment_model.hdf5")

# replace with the data you want to classify
newtexts = ["Your new data", "More new data"]

# note that we shouldn't call "fit" on the tokenizer again
sequences = tokenizer.texts_to_sequences(newtexts)
data = pad_sequences(sequences, maxlen=300)

# get predictions for each of your new texts
predictions = model.predict(data)

To package the SVM model, we similarly need to save both the vectorizer and the classifier. Here we can use joblib, bundled with scikit-learn, which is more efficient than pickle for objects that contain large NumPy arrays.

from sklearn.externals import joblib

joblib.dump(vectorizer, "tfidf_vectorizer.pickle")
joblib.dump(classifier, "svm_classifier.pickle")

And to get predictions on new data, we can load our model from disk with:

from sklearn.externals import joblib

vectorizer = joblib.load("tfidf_vectorizer.pickle")
classifier = joblib.load("svm_classifier.pickle")

# replace with the data you want to classify
newtexts = ["Your new data", "More new data"]

# note that we should call "transform" here instead of the "fit_transform" from earlier
Xs = vectorizer.transform(newtexts)

# get predictions for each of your new texts
predictions = classifier.predict(Xs)

We took an introductory look at using Keras for text classification and compared our results to a simpler SVM. You now know:

  • How to set up a pre-configured AWS spot instance for machine learning
  • How to preprocess raw text data for use with Keras neural networks
  • How to experiment with building your own deep learning models for text classification
  • That an SVM is still often a good choice, in spite of advances in neural networks
  • How to package your trained models, and use them in later tasks.

If you have any questions or comments, feel free to reach out to the author on Twitter or GitHub.

  1. Note that the architecture that we use for our neural networks closely follows the official Keras examples which can be found at https://github.com/fchollet/keras/tree/master/examples