
Here Are the Two Things That Developers Value More Than Compensation When Choosing a Job

December 13th, 2019

We have no shortage of sayings about the primacy of money. “Money talks”. “Show me the money”. “Put your money where your mouth is”. But we sometimes overestimate money’s importance relative to many other things. This can be especially true when it comes to working and choosing which job to take.

When it comes to why software developers choose one job over another, many people would assume that comp would be the deal-breaker. Show me the money, right?

That’s why it might be surprising to discover that it’s not. At least, that’s what HackerRank’s 2019 survey of 70,000+ software developers found.

Both professional growth and learning, and work-life balance, rank as more important to software developers than competitive compensation. Why?

Developers tend to be extremely curious and independent. They are constantly learning, self-teaching, and re-tooling. They also want to be able to show up to work when they want and occasionally work remotely.

To attract and keep their software developer talent, companies need to recognize learning and development (L&D) as a competitive advantage. Investing in professional growth, learning, and training matters more than simply trying to outbid your competitors for talent.

Curious to learn more?

Our 2020 Developer Learning Survey Report is chock full of insights like the one found in this blog. We surveyed 800+ developers and combined our results with developer survey results from organizations like Stack Overflow and HackerRank.

You can download the full survey report here.


Adding Authentication in AWS with Amplify

October 4th, 2019

If you’re familiar with using AWS for user authentication, DynamoDB, AppSync and other services in your app or website, you’ll love Amplify.

Amplify is a command-line interface that takes a few shortcuts, avoids the clicking and navigation, and makes a few wise decisions for you. Granted, you can customize things as you wish. And you can always go straight to the AWS console and make changes. But most of the time, Amplify will do what you want faster and more easily.

The first thing you need to do is install Amplify. You’ll need an AWS account and then run a few commands. You can see the details here: https://aws-amplify.github.io/docs/

Initialize a Project

Once it’s installed, you’re ready to go. I’ll use an iOS app but you can use basically the same steps for Android or a web app.

Navigate in the terminal to your project. From there, you’ll register your app with AWS using this command:

amplify init

The first thing it asks you to do is pick an editor.

I usually use Vim because it launches in the same terminal window.

Then it asks you to verify the type of project you’re working on. It can usually detect this correctly based on the files.

Then you need to pick a profile (or create one on AWS, which it helps you with).

Then it does its magic… and prints out a lot of lines. You can watch as it creates everything it needs. It will create a bucket in S3 for the deployment files, IAM roles as needed for running and accessing various pieces, and a CloudFormation stack to manage it all.

The best part is you really don’t have to care! 🙂 Of course it’s always good to know what’s going on. I highly recommend going to each of the places listed in the AWS console to see what’s created.

Once the project is all set up, you’re ready to add features from AWS.

Help

If you run just “amplify” you get some basic help.

The key things you’ll tend to do are these:

amplify add <category> – This is how you add various services. If you add api, you’re adding AppSync (and possibly more, like DynamoDB). If you add auth, that’s authentication using Cognito. Storage is S3, and so on.

Amplify does such a good job of walking you through each one that, while knowledge of each service is great, it might not be necessary. Again, however, I highly recommend you understand what’s going on in the background.

I’d suggest using Amplify as a powerful tool to do what you already know about. I do not recommend using Amplify as a way to avoid learning the functionality of AWS.

Add Authentication

So let’s add a feature via Amplify to our app. We’ll use the command:

amplify add auth

One funny thing about amplify is that you can add a category with “amplify add <category>” or “amplify <category> add.” It’s like you can tell amplify “add this category” or you can tell a category to be added. Try not to let it bug you.

The first question you’ll be asked is whether you want to use the default configuration.

I like the default configuration. If you want to know more, select “I want to learn more,” which displays more detail.

Again, I recommend learning about Cognito to understand the details. For this tutorial, we’ll go with the default.

It will set up the configuration for authentication to use a username, email, and password for new accounts.

Amplify does this locally (and explains so at the end of the execution). So the configuration and everything else is set up in files under the directory of your project.

It also mentions how you can push the change to AWS with the push command. To get it to AWS, you run:

amplify push

It will have you verify the changes before continuing.

In some cases there are more questions to answer. Typically the default answer is a good one (e.g., in Y/n, the capital letter is the default) and you can just hit Enter on the keyboard.

Pushing to AWS can take a few minutes. Many lines will print out that look similar to when you created the project. Hopefully it ends with “All resources are updated in the cloud” and a satisfying green checkmark. 🙂

Cognito

Once it’s pushed to the server you can view the details at: https://console.aws.amazon.com/cognito/users

Of course, you won’t have any users yet.

And locally you’ll have a new file that’s very important. It’s named awsconfiguration.json, and it’s in the same directory your project is in.

This configuration file holds the details of the setup you created on AWS. It’s the file that you’ll include in your project (in this case Xcode) for access to the services and features.

As the extension implies, it’s a JSON file:
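
Here is a trimmed, illustrative sketch of its shape (the exact keys depend on your SDK version and the categories you’ve added, and your generated file will contain real pool and client IDs rather than these placeholders):

{
    "Version": "1.0",
    "CognitoUserPool": {
        "Default": {
            "PoolId": "us-east-1_XXXXXXXXX",
            "AppClientId": "xxxxxxxxxxxxxxxxxxxxxxxxxx",
            "Region": "us-east-1"
        }
    }
}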

It lists the authentication-related details that the AWS SDK (installed via CocoaPods on iOS) will use to access AWS. As other categories are added via amplify, more items will be added to the awsconfiguration.json file.

If you include the file in your project where it is, you won’t need to update the file in your project as you add more services.

App Code

To add authentication to your code, you can visit the AWS documents per platform:

https://aws-amplify.github.io/docs/ios/authentication#initialization

https://aws-amplify.github.io/docs/android/authentication#initialization

https://aws-amplify.github.io/docs/js/authentication#configure-your-app

On the left side you’ll see a listing of other categories you can add similarly to your project.

For iOS, once you install the CocoaPods, checking whether the user is signed in (along with reading some other useful properties) is pretty easy, and showing the login/create account UI is similarly straightforward; the platform documentation linked above includes the snippets for both.

Conclusion

Hopefully this removes some of the mystery of Amplify. I recommend trying it out and seeing how well it works for you. I’ve gotten pretty comfortable with it to the point that I don’t load up much of AWS to verify what it’s doing anymore.

Another great command in amplify is one that cleans everything up (for the given project). That way you can play around with it and remove it all easily. 🙂

amplify delete

 


What machine learning is today, and what it could be soon

February 18th, 2019

If AI is a broad umbrella that includes the likes of sci-fi movies, the development of robots, and all sorts of technology that fuels legacy companies and startups, then machine learning is one of the metal ribs (perhaps the strongest) that holds the AI umbrella up and open.

So, what is machine learning offering us today? And what could it offer us soon? Let’s explore the potential for ML technologies.

Intro to machine learning

Machine learning is the process of machines sorting through large amounts of data, looking for patterns that can’t be seen by the human eye. Though the theory is decades old, the application of machine learning requires two major components: machines that can handle the amount of processing necessary, plus a lot (a lot!) of gathered, cleaned data.

Thanks to cloud computing, we finally have both. With cloud computing, we can speed through data processing. With cloud storage, we can collect huge amounts of data to actually sort through. Before all this, machines had to be explicitly programmed to accomplish a specific task. Now, however, computers can learn to find patterns, and perhaps act on them, without such programming. The more data, the more precise machine learning can be.

Current examples of machine learning

Unless you are a complete Luddite, machine learning has already worked its way into many folds of your life. Choosing a Netflix title based on prompted recommendations? Browsing similar titles for your Kindle based on the book you just finished? These recommendations are actually tailor-made for you. (In the recent past, they relied on an elementary version of “if you liked x, you may like y”, culled from a list that was put together manually.)

Today, companies have developed proprietary algorithms that machine learning models train on, looking for patterns in your data combined with the data of millions of other customers. This is why your Netflix may be chock full of action flicks and superhero movies while your partner’s queue leans heavily on crime drama and period pieces.

But machine learning is doing more than just serving up entertainment. Credit companies and banks are getting more sophisticated with credit scores. Traditionally, credit companies relied on a long-established pattern of credit history, debt and loan amounts, and timely payments. This meant that if you failed to pay off a loan over a decade ago, even if you’re all paid up now, your credit score likely still reflects that story. This made it very difficult to change your credit score over time – in fact, time often felt like the only way to improve it.

Now, however, machine learning is changing how credit bureaus like Equifax determine your score. Instead of looking at your past payments, data from the very near past – like, the last few months – can actually better predict what you may do in the future. Data analysis from machine learning means that history doesn’t decide; data can predict your credit-worthiness based on current trends.

What the future holds for machine learning

Machine learning is just getting started. When we think of the future for machine learning, one example we often hear about is the elusive self-driving car, also known as the autonomous vehicle.

In this case, machine learning is able to understand how to respond to particular traffic situations based on reviewing millions of examples: videos of car crashes compared to accident-free traffic, how human-driven cars respond to traffic signs or signals, and watching how, where, and when pedestrians cross streets.

Machine learning is beginning to affect how we see images and videos – computers are using neural networks to cull thousands of images from the internet to fill in blanks in your own pictures.

Take, for instance, the photo you snapped on your holiday in London. You have a perfect shot of Big Ben, except for a pesky pedestrian sneaking by along a wall. You are able to remove the person from your image, but you may wonder how to fill the space on the wall that walker left behind. Adobe Photoshop and other image editors can cull other images of walls (that specific wall, perhaps, as well as other walls that look similar) and blend them so that the fill looks natural and organic.

 

This type of machine learning is advancing rapidly and it could soon be as easy as an app on our phones. Imagine how this can affect the veracity of a video – is the person actually doing what the video shows?

Problems with machine learning

We are at a pivotal point where we can see a lot of potential for machine learning, but we can also see a lot of potential problems. Solutions are harder to grasp as the technology forges forward.

The future of machine learning is inevitable; the question is when. Predictions indicate that nearly every kind of AI will include machine learning, no matter the size or use. Plus, as cloud computing grows and the world amasses ever more data, machines will be able to learn continuously, on limitless data, instead of on specific data sets. Once connected to the internet, there is a constant stream of emerging information and content.

This future comes with challenges. First, hardware vendors will have to make their computers and servers stronger and speedier to cope with these increased demands.

As for experts in AI, it seems there will be a steep and sudden shortage of professionals who can cope with what AI will be able to do. Outside the private and pricey walls of Amazon, Google, Apple, Uber, and Facebook, most small- and medium-sized businesses (SMBs) aren’t stepping more than a toe or two into the world of machine learning. While this is due in part to a lack of money or resources, the lack of expert knowledge is actually the biggest reason that SMBs aren’t deeper into ML. But, as ML technologies normalize, they’ll cost less and become a lot more accessible. If your company doesn’t have experts who know how you could be using ML to help your business, you’re missing out.

On a global level, machine learning provides some cause for concern. There’s the idea that we’ll all be replaced in our jobs by specific machines or robots – which may or may not come to fruition.

More immediate, and more troubling, however, is the idea that imaging can be faked. This trick is certainly impressive for an amateur photographer, but it raises an important question: how much longer can we truly believe everything that we see? Perhaps “seeing is believing” has a limited window as a standard truthbearer in our society.

 


Reaching the Cloud: Is Everything Serverless?

February 18th, 2019

As it goes in technology, as soon as we all adopt a new term, there will assuredly be another one ready to take its place. As we embrace cloud technology, migrating functions and software for organization, AI potential, timeliness, and flexibility, we are now encountering yet another buzzword: serverless.

Serverless and the cloud may sound similar, both floating off in some distant place, existing beyond your company’s cool server room. But are the cloud and serverless the same? Not quite. This article explores how serverless technology relates to the cloud and, more importantly, whether you need to adopt a serverless culture.

What is serverless?

Serverless is shorthand for two terms: serverless architecture and serverless computing.

Once we get past the name, serverless is a way of building and deploying software and apps on cloud computers. For all your developers and engineers who are tired of coping with server and infrastructure issues because they’d rather be coding, serverless could well be the answer.

Serverless architecture is the foundation of serverless computing. Generally, three types of software services can function well on serverless architecture: function-as-a-service (FaaS), backend-as-a-service (BaaS), and databases.

Serverless code, then, relies on serverless architecture to develop stand-alone apps or microservices without provisioning servers, as is required in traditional (server-necessary) coding. Of course, serverless coding can also be used in tandem with traditional coding. An app or software that runs on serverless code is triggered by events and its overall execution is managed by the cloud provider. Pricing varies but is generally based on the number of executions (as opposed to a pre-purchased compute capacity that other cloud services you use may rely on).
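
To make this concrete, here is a minimal sketch of serverless code as a Python AWS Lambda function (lambda_handler is the AWS naming convention; the event shape here assumes an API Gateway trigger with a JSON body):

import json

def lambda_handler(event, context):
    # The cloud provider invokes this once per event; we never provision
    # or manage the server it runs on.
    body = json.loads(event.get("body") or "{}")
    name = body.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": "Hello, {}!".format(name)})
    }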

As for the name itself: calling something “serverless” is a bit of a misnomer because serverless anything isn’t possible. Serverless software and apps still rely on a server, it’s just not one that you maintain in-house. Instead, your cloud provider, such as Google, AWS, Azure, or IBM, acts as your server and your server manager, allocating your machine resources.

The cloud vs. serverless

While the cloud and serverless are certainly related, there’s a simpler reason why we are hearing about serverless technologies ad nauseam: cloud leaders like AWS, Google, Azure, and IBM are investing heavily in serverless (and that’s a ton of money, to be sure).

Just as these companies spearheaded a global effort to convince companies their apps and data can perform and store better in the cloud, they are now encouraging serverless coding and serverless architecture so that you continue to use their cloud services.

Serverless benefits

Is everything serverless? Will everything be serverless soon? In short, no and no.

The longer answer is that serverless architecture and serverless computing are good for simple applications. In serverless coding, your cloud provider takes care of the server-side infrastructure, freeing up your developers to focus on your business goals.

Your developers may already be working on serverless code – or they want to be. That’s because it frees them from the headache of maintaining infrastructure. They can dispense with annoying things like provisioning a server, ensuring its functionality, creating test environments, and maintaining server uptime, which means they are focused primarily on actual developing.

As long as the functionality is appropriate, serverless can provide the following benefits:

  • Efficient use of resources
  • Rapid testing and deployment, as multiple environments are a breeze to set up
  • Reduced cost (server maintenance, team support, etc.)
  • Focus on coding – may result in increased productivity around business goals
  • Familiar programming languages and environment
  • Increased scalability

Traditional code isn’t going anywhere (yet)

While focusing on your core business is always a good goal, the reality is that serverless isn’t a silver bullet for your coding or your infrastructure.

Depending on your business, it’s likely that some products and apps require more complex functions. For these, serverless may be the wrong move. Traditional coding still offers many benefits, despite requiring fixed resources, provisioning, state management, and human maintenance. Networking is easier because everything lives within your usual environment. And, let’s face it: unless you’re a brand-new startup, you probably already have the servers and tech staff to support traditional coding and architecture.

Computationally, serverless has strict limits. Most cloud providers price serverless options based on time: how many seconds or minutes does an execution take? Unfortunately, the more complex your execution, the more likely you’ll go past the maximum time allowed, which hovers around 300 seconds (five minutes). With a traditional environment, however, there is no timeout limit. Your servers are dedicated to your executions, no matter how long they take or how many external databases they have to reference. Timeouts can make activities like testing and long-running external calls harder or impossible to accomplish in a serverless environment.

From a business perspective, you have to decide what you value more. Is it only paying for what you use (caveat emptor), with decreased opex costs? Or is control paramount, because you are skeptical of the trust and security risks that come with using a third party? Plus, not all developers work the same. While some devs want to use cutting-edge technology that allows them to focus on front-end logic, others prefer the control and holistic access that traditional architecture and coding provide.


When Technology Moves Faster Than Training, Bad Things Happen

February 18th, 2019

Technology is changing how we design training, and it should. Unfortunately, many instructional designers are not producing the learning programs and products that today’s technical talent needs. Not because they don’t want to, but because many companies don’t support their efforts to advance their work technologically or financially.

That’s a mistake. Technology has already changed learning design. Those who don’t acknowledge this appropriately are doing their organizations – and their technical talent – a disservice.

Bob Mosher, chief learning evangelist for Apply Synergies, a learning and performance solutions company, said we can now embed technology in training in ways we never could before. E-learning, for instance, has been around in some form or another, but it always sat in an LMS or outside of the technology or subject matter it was created to support. That’s no longer the case.

“Now I don’t have to leave the CRM or ERP software, or cognitively leave my workflow,” Mosher explained. “I get pop ups, pushes, hints, lessons when I need them, while I’m staring at what I’m doing. These things guide me through steps; they take over my machine, they watch me perform and tell me when and where I go wrong. Technology has allowed us to make all of those things more adaptive.”

Of course, not all learning design affected by technology is adaptive, but before adaptive learning came on the scene, training was more pull than push, which can be problematic. If you don’t know what you don’t know, you may proceed blindly thinking, “oh, I’m doing great,” when you’re really not. Mosher said adaptive learning technologies, which monitor learner behavior and quiz and train based on an individual’s answers and tactics, can be extremely powerful.

But – there’s almost always a but – many instructional designers are struggling with this because they’re more familiar with event-based training design. Designing training for the workflow is a very different animal.

The Classroom Is Now a Learning Lab

“It’s funny, for years we’ve been talking about personalized learning, but we’ve misunderstood it thinking we have to design the personalized experience for every learner,” Mosher said. “But how do I design something personalized for you? I can give you the building blocks, but in the end, no one can personalize better than the learners themselves. Designing training for the workflow is a very different animal.”

In other words, new and emerging technologies are brilliant because they enable learners to customize the learning experience and adapt it to the work they do every day. But it’s one thing to have these authoring technologies and environments; it’s something else for an instructional designer to make the necessary shift and use them well.

Further, learning leaders will have to use the classroom differently, leveraging the different tools at their disposal appropriately. “If I know I have this embedded technology in IT, that these pop ups are going to guide people through, say, filling out a CRM, why spend an hour of class teaching them those things? I can skip that,” Mosher said. “Then my class becomes more about trying those things out.”

That means learning strategies that promote peer learning, labs and experiential learning move to the forefront, with adaptive training technology as the perfect complement. Antiquated and frankly ineffective technical training methods filled with clicking, learning by repetition through menus, and procedural drilling should be retired post haste in favor of context-rich learning fare.

Then instructors can move beyond the sage-on-the-stage role, and act as knowledge resources and performance support partners, while developers and engineers write code and metaphorically get their hands dirty. “If I have tools that help me with the procedures when I’m not in class, in labs I can do scenarios, problem solving, use cases, have people bounce ideas and help me troubleshoot when I screw up,” Mosher said. “I’m not taking a lesson to memorize menus.”

Learning Leaders, Act Now

Learning leaders who want to adapt to technology changes in training design must first secure an appropriate budget. Basically, you can’t use cool technology for training unless you actually buy said cool technology. Budget must be allocated and experimentation supported, and instructional designers have to have the time and latitude to upgrade their skills as well, because workflow learning is a new way of looking at design.

“Everyone wants agile instructional design, but they want to do it the old way,” Mosher said. “You’re not going to get apples from oranges. Leadership has to loosen the rope a little bit so instructional designers (IDs) can change from the old way of designing to the new way.

“IT’s been agile for how long now? Yet we still ask IDs to design in a waterfall, ADDIE methodology. That’s four versions behind. Leadership has to understand that to get to the next platform, there’s always a learning curve. There’s an investment that you don’t get a return on right away – that’s what an investment is.”

For learning leaders who want to get caught up quickly and efficiently, Mosher said it can be advantageous to use a vendor. Vendors are often on target with the latest instructional design approaches and have made the most up-to-date training technology investments. But leadership must communicate with instructional designers to avoid resistance.

“Good vendors aren’t trying to put anybody out of a job, or call your baby ugly,” he explained. “It’s more like, look. You’ve done great work and will continue to do great work, but you’re behind. You deserve to be caught up.”

The relationship should be a partnership where vendor and client work closely together. “Right,” Mosher said. “If you choose the right vendor.”


Working with ElasticSearch

January 26th, 2018

The Working with ElasticSearch training course teaches architects, developers, and administrators the skills and knowledge needed to use Elasticsearch as a data index or data store, with Kibana as the front end and programmatic access through its Application Programming Interfaces (APIs) using Python.

The ElasticSearch training course begins by examining how to install, configure, and run Elasticsearch and Kibana. With the foundation laid, the course then examines how to configure Elasticsearch data mappings and simple data loading. Next, querying Elasticsearch using Kibana is discussed. Day two begins with a deeper dive into how Elasticsearch indexes and searches data, and how it provides clustering and fault tolerance. Next, the configuration of data indexing and analysis is reviewed. Finally, the various major Elasticsearch APIs are explored and exercised.

The Working with ElasticSearch course assumes some familiarity with Python (limited), Extensible Markup Language (XML), JavaScript Object Notation (JSON), and command line tools.
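
As a small taste of the programmatic access the course covers, here is an illustrative sketch using the official elasticsearch Python client (argument names vary between client versions; this follows the older body= style, and the index and document are made up for the example):

from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

# Index a document, then fetch it back by ID
es.index(index="movies", doc_type="movie", id=1,
         body={"title": "The Matrix", "year": 1999})
print(es.get(index="movies", doc_type="movie", id=1)["_source"])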


How to Build a Simple Chat App with React Native and Firebase

November 30th, 2017

Nick is a mobile developer with wide experience in building iOS and Android applications at RubyGarage. He enjoys researching tech topics and sharing his knowledge with other developers.

In mobile development, cross-platform applications are appreciated for their short development cycle, low cost, and quick time to market in comparison with native apps.

One popular framework that enables developers to build hybrid mobile apps is React Native. React Native was created by Facebook developers during a hackathon back in 2013. Since then, the framework has become a core technology for numerous mobile applications including Instagram, Skype, Tesla, Airbnb, and Walmart.


Cleaning Dirty Data with Pandas & Python

August 10th, 2017

Pandas is a popular Python library used for data science and analysis. Used in conjunction with other data science toolsets like SciPy, NumPy, and Matplotlib, a modeler can create end-to-end analytic workflows to solve business problems.

While you can do a lot of really powerful things with Python and data analysis, your analysis is only ever as good as your dataset. And many datasets have missing, malformed, or erroneous data. It’s often unavoidable; anything from incomplete reporting to technical glitches can cause “dirty” data.

Thankfully, Pandas provides a robust library of functions to help you clean up, sort through, and make sense of your datasets, no matter what state they’re in. For our example, we’re going to use a dataset of 5,000 movies scraped from IMDB. It contains information on the actors, directors, budget, and gross, as well as the IMDB rating and release year. In practice, you’ll be using much larger datasets consisting of potentially millions of rows, but this is a good sample dataset to start with.

Unfortunately, some of the fields in this dataset aren’t filled in and some of them have default values such as 0 or NaN (Not a Number).

No good. Let’s go through some Pandas hacks you can use to clean up your dirty data.

Getting started

To get started with Pandas, first you will need to have it installed. You can do so by running:

$ pip install pandas

Then we need to load the data we downloaded into Pandas. You can do this with a few Python commands:

import pandas as pd

data = pd.read_csv('movie_metadata.csv')

Make sure you have your movie dataset in the same folder as you’re running the Python script. If you have it stored elsewhere, you’ll need to change the read_csv parameter to point to the file’s location.

Look at your data

To check out the basic structure of the data we just read in, you can use the head() command to print out the first five rows. That should give you a general idea of the structure of the dataset.

data.head()

When we look at the dataset either in Pandas or in a more traditional program like Excel, we can start to note down the problems, and then we’ll come up with solutions to fix those problems.
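
Before fixing anything, it helps to quantify the problems. Here is a quick sketch using standard Pandas calls:

# Count the missing (NaN) values in each column
print(data.isnull().sum())

# Show column names, types, and non-null counts in one summary
data.info()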

Pandas has some selection methods which you can use to slice and dice the dataset based on your queries. Let’s go through some quick examples before moving on:

  • Look at some basic stats for the ‘imdb_score’ column: data.imdb_score.describe()
  • Select a column: data['movie_title']
  • Select the first 10 rows of a column: data['duration'][:10]
  • Select multiple columns: data[['budget','gross']]
  • Select all movies over two hours long: data[data['duration'] > 120]

Deal with missing data

One of the most common problems is missing data. This could be because it was never filled out properly, the data wasn’t available, or there was a computing error. Whatever the reason, if we leave the blank values in there, it will cause errors in analysis later on. There are a couple of ways to deal with missing data:

  • Add in a default value for the missing data
  • Get rid of (delete) the rows that have missing data
  • Get rid of (delete) the columns that have a high incidence of missing data

We’ll go through each of those in turn.

Add default values

First of all, we should probably get rid of all those nasty NaN values. But what to put in its place? Well, this is where you’re going to have to eyeball the data a little bit. For our example, let’s look at the ‘country’ column. It’s straightforward enough, but some of the movies don’t have a country provided so the data shows up as NaN. In this case, we probably don’t want to assume the country, so we can replace it with an empty string or some other default value.

data.country = data.country.fillna('')

This replaces the NaN entries in the ‘country’ column with the empty string, but we could just as easily tell it to replace with a default name such as “None Given”. You can find more information on fillna() in the Pandas documentation.

With numerical data like the duration of the movie, a calculation like taking the mean duration can help us even the dataset out. It’s not a great measure, but it’s an estimate of what the duration could be based on the other data. That way we don’t have crazy numbers like 0 or NaN throwing off our analysis.

data.duration = data.duration.fillna(data.duration.mean())

Remove incomplete rows

Let’s say we want to get rid of any rows that have a missing value. It’s a pretty aggressive technique, but there may be a use case where that’s exactly what you want to do.

Dropping all rows with any NA values is easy (note that, like most Pandas operations, dropna() returns a new DataFrame rather than modifying the data in place):

data.dropna()

Of course, we can also drop rows that have all NA values:

data.dropna(how='all')

We can also put a limitation on how many non-null values need to be in a row in order to keep it (in this example, the data needs to have at least 5 non-null values):

data.dropna(thresh=5)

Let’s say for instance that we don’t want to include any movie that doesn’t have information on when the movie came out:

data.dropna(subset=['title_year'])

The subset parameter allows you to choose which columns you want to look at. You can also pass it a list of column names here.

Deal with error-prone columns

We can apply the same kind of criteria to our columns. We just need to use the parameter axis=1 in our code. That means to operate on columns, not rows. (We could have used axis=0 in our row examples, but it is 0 by default if you don’t enter anything.)

Drop the columns that are all NA values:

data.dropna(axis=1, how='all')

Drop all columns with any NA values:

data.dropna(axis=1, how='any')

The same threshold and subset parameters from above apply as well. For more information and examples, visit the Pandas documentation.

Normalize data types

Sometimes, especially when you’re reading in a CSV with a bunch of numbers, some of the numbers will read in as strings instead of numeric values, or vice versa. Here’s a way you can fix that and normalize your data types:

data = pd.read_csv('movie_metadata.csv', dtype={'duration': int})

This tells Pandas that the column ‘duration’ needs to be an integer value. Similarly, if we want the release year to be a string and not a number, we can do the same kind of thing:

data = pd.read_csv('movie_metadata.csv', dtype={'title_year': str})

Keep in mind that this reads the CSV from disk again, so make sure you either normalize your data types first or dump your intermediary results to a file before doing so.

Change casing

Columns with user-provided data are ripe for corruption. People make typos, leave their caps lock on (or off), and add extra spaces where they shouldn’t.

To change all our movie titles to uppercase:

data['movie_title'].str.upper()

Similarly, to get rid of trailing whitespace:

data['movie_title'].str.strip()

We won’t be able to cover correcting spelling mistakes in this tutorial, but you can read up on fuzzy matching for more information.

Rename columns

Finally, if your data was generated by a computer program, it probably has some computer-generated column names, too. Those can be hard to read and understand while working, so if you want to rename a column to something more user-friendly, you can do it like this:

data.rename(columns = {'title_year':'release_date', 'movie_facebook_likes':'facebook_likes'})

Here we’ve renamed ‘title_year’ to ‘release_date’ and ‘movie_facebook_likes’ to simply ‘facebook_likes’. Since this is not an in-place operation, you’ll need to save the DataFrame by assigning it to a variable.

data = data.rename(columns = {'title_year':'release_date', 'movie_facebook_likes':'facebook_likes'})

Save your results

When you’re done cleaning your data, you may want to export it back into CSV format for further processing in another program. This is easy to do in Pandas:

data.to_csv('cleanfile.csv', encoding='utf-8')

More resources

Of course, this is only the tip of the iceberg. With variations in user environments, languages, and user input, there are many ways that a potential dataset may be dirty or corrupted. At this point you should have learned some of the most common ways to clean your dataset with Pandas and Python.

For more on Pandas and data cleaning, the official Pandas documentation is the best place to continue.


Building a Serverless Chatbot w/ AWS, Zappa, Telegram, and api.ai

August 2nd, 2017

If you’ve ever had to set up and maintain a web server before, you know the hassle of keeping it up-to-date, installing security patches, renewing SSL certificates, dealing with downtime, rebooting when things go wrong, rotating logs and all of the other ‘ops’ that come along with managing your own infrastructure. Even if you haven’t had to manage a web server before, you probably want to avoid all of these things.

For those who want to focus on building and running code, serverless computing provides fully-managed infrastructure that takes care of all of the nitty-gritty operations automatically.

In this tutorial, we’ll show you how to build a chatbot which performs currency conversions. We’ll make the chatbot available to the world via AWS Lambda, meaning you can write the code, hit deploy, and never worry about maintenance again. Our bot’s brain will be powered by api.ai, a natural language understanding platform owned by Google.

Overview

In this post we’ll walk you through building a Telegram Bot. We’ll write the bot in Python, wrap it with Flask, and use Zappa to host it on AWS Lambda. We’ll add works-out-of-the-box AI to our bot by using api.ai.

By the end of this post, you’ll have a fully-functioning Chatbot that will respond to Natural Language queries. You’ll be able to invite anyone in the world to chat with your bot and easily edit your bot’s “brain” to suit your needs.

Before We Begin

To follow along with this tutorial, you’ll have to have a valid phone number and credit card (we’ll be staying within the free usage limits of all services we use, so you won’t be charged). Specifically, you’ll need:

  • …to sign up with Amazon Web Services. The signup process can be a bit long, and requires a valid credit card. AWS offers a million free Lambda requests per month, and our usage will stay within this free limit.
  • …to sign up with api.ai. Another lengthy sign-up process, as it requires integration with the Google Cloud Platform. You’ll be guided through this process when you sign up with api.ai. Usage is currently free.
  • …to sign up with Telegram, a chat platform similar to the more popular WhatsApp. You’ll need to download one of their apps (for Android, iPhone, Windows Phone, Windows, MacOS, or Linux) in order to register, but once you have an account you can also use it from web.telegram.org. You’ll also need a valid phone number. Telegram is completely free.
  • …basic knowledge of Python and a working Python environment (that is, you should be able to run Python code and install new Python packages). Preferably, you should have used Python virtual environments before, but you should be able to keep up even if you haven’t. All our code examples use Python 3, but most things should be Python 2 compatible.

If you’re aiming to learn how to use the various services covered in this tutorial, we suggest you follow along step by step, creating each component as it’s needed. If you’re impatient and want to get a functioning chatbot set up as fast as possible, you can clone the GitHub repository with all the code presented here and use that as a starting point.

Building an Echo Bot

When learning a new programming language, the first program you write is one which outputs the string “Hello, World!” When learning to build chatbots, the first bot you build is one that repeats everything you say to it.

Achieving this proves that your bot is able to accept and respond to user input. After that, it’s simple enough to add the logic to make your bot do something more interesting.

Getting a Token for Our New Bot

The first thing you need is a bot token from Telegram. You can get this by talking to the @BotFather bot through the Telegram platform.

In your Telegram app, open a chat with the official @BotFather Chatbot, and send the command /newbot. Answer the questions about what you’ll use for your new bot’s name and username, and you’ll be given a unique token similar to 14438024:AAGI6Kh8ew4wUf9-vbqtb3S4sIM7nDlcXj3. We’ll use this token to prove ownership of our new bot, which allows us to send and receive messages through the Bot.

We can now control our new bot via Telegram’s HTTP API. We’ll be using Python to make calls to this API.
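
As a quick sanity check that your token works, you can call the API’s getMe method from Python (substitute your own token):

import requests

token = "14438024:AAGI6Kh8ew4wUf9-vbqtb3S4sIM7nDlcXj3"  # your token from BotFather
resp = requests.get("https://api.telegram.org/bot{}/getMe".format(token))
print(resp.json())  # should include your bot's username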

Writing the First Code for Our New Bot

Create a new directory called currencybot to house the code we need for our bot’s logic, and create three Python files in this directory named config.py, currencybot.py, and bot_server.py. The structure of your project should be as follows:

currencybot/
bot_server.py
config.py
currencybot.py

In config.py we need a single line of code defining the bot token, as follows (substitute the token you received from BotFather):

bot_token = "14438024:AAGI6Kh8ew4wUf9-vbqtb3S4sIM7nDlcXj3"

In currencybot.py we need to put the logic for our bot, which revolves around receiving a message, handling the message, and sending a message. That is, our bot receives a message from some user, works out how to respond to this message, and then sends the response. For now, because we are building an echo bot, the handling logic will simply return any input passed to it back again.

Add the following code to currencybot.py:

import requests
import config

# The main URL for the Telegram API with our bot's token
BASE_URL = "https://api.telegram.org/bot{}".format(config.bot_token)

def receive_message(msg):
    """Receive a raw message from Telegram"""
    try:
        message = str(msg["message"]["text"])
        chat_id = msg["message"]["chat"]["id"]
        return message, chat_id
    except Exception as e:
        print(e)
        return (None, None)
 
def handle_message(message):
    """Calculate a response to the message"""
    return message
 
def send_message(message, chat_id):
    """Send a message to the Telegram chat defined by chat_id"""
    data = {"text": message.encode("utf8"), "chat_id": chat_id}
    url = BASE_URL + "/sendMessage"
    try:
        response = requests.post(url, data).content
    except Exception as e:
        print(e)
        
def run(message):
    """Receive a message, handle it, and send a response"""
    try:
        message, chat_id = receive_message(message)
        response = handle_message(message)
        send_message(response, chat_id)
    except Exception as e:
        print(e)

Finally, bot_server.py is a thin wrapper for our bot that will allow it to receive messages via HTTP. Here we’ll run a basic Flask application. When our bot receives new messages, Telegram will send these via HTTP to our Flask app, which will pass them on to the code we wrote above. In bot_server.py, add the following code:

from flask import Flask
from flask import request
from currencybot import run

app = Flask(__name__)

@app.route("/", methods=["GET", "POST"])
def receive():
    try:
        run(request.json)
        return ""
    except Exception as e:
        print(e)
        return ""

This is a minimal Flask app that imports the main run() function from our currencybot script. It uses Flask’s request module (distinct from the requests library we used earlier, though the names are similar enough to be confusing) to grab the POST data from an HTTP request as parsed JSON. We pass the JSON along to our bot, which can extract the text of the message and respond to it.

Deploying Our Echo Bot

We’re now ready to deploy our bot onto AWS Lambda so that it can receive messages from the outside world.

We’ll be using the Python library Zappa to deploy our bot, and Zappa will interact directly with our Amazon Web Services account. In order to do this, you’ll need to set up command line access for your AWS account as described here: https://aws.amazon.com/blogs/security/a-new-and-standardized-way-to-manage-credentials-in-the-aws-sdks/.

To use Zappa, it needs to be installed inside a Python virtual environment. Depending on your operating system and Python environment, there are different ways of creating and activating a virtual environment. You can read more about how to set one up here. If you’re using MacOS or Linux and have used Python before, you should be able to create one by running the following command.

virtualenv ~/currencybotenv

You should see output similar to the following:

~/git/currencybot g$ virtualenv ~/currencybotenv

Using base prefix '/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6'

New python executable in /Users/g/currencybotenv/bin/python3.6

Also creating executable in /Users/g/currencybotenv/bin/python

Installing setuptools, pip, wheel...done.

The result is that a clean Python environment has been created, which is important so Zappa will know exactly what dependencies to install on AWS Lambda. We’ll install the few dependencies we need for our bot (including Zappa) inside this environment.

Activate the environment by running:

source ~/currencybotenv/bin/activate

You should see your Terminal’s prompt change to indicate that you’re now working inside that environment. Mine looks like this:

(currencybotenv) ~/git/currencybot g$

Now we need to install the dependencies for our bot using pip. Run:

pip install zappa requests flask

At this point, we need to initialize our Zappa project. We can do this by running:

zappa init

This will begin an interactive process of setting up options with Zappa. You can accept all of the defaults by pressing Enter at each prompt. Zappa should figure out that your Flask application is inside bot_server.py and prompt to use bot_server.app as your app’s function.

You’ve now initialized the project and Zappa has created a zappa_settings.json file in your project directory. Next, deploy your bot to Lambda by running the following command (assuming you kept the default environment name of ‘dev’):

zappa deploy dev

This will package up your bot and all of its dependencies, and put them in an AWS S3 bucket, from which it can be run via AWS Lambda. If everything went well, Zappa will print out the URL where your bot is hosted. It should look something like https://l19rl52bvj.execute-api.eu-west-1.amazonaws.com/dev. Copy this URL because you’ll need to instruct Telegram to post any messages sent to our bot to this endpoint.

In your web browser, change the setting of your Telegram bot by using the Telegram API and your bot’s token. To set the URL to which Telegram should send messages to your bot, build a URL that looks like the following, but with your bot’s token and your AWS Lambda URL instead of the placeholders.

https://api.telegram.org/bot<your-bot-token>/setWebhook?url=<your-zappa-url>

For example, your URL should look something like this:

https://api.telegram.org/bot14438024:AAGI6Kh8ew4wUf9-vbqtb3S4sIM7nDlcXj3/setWebhook?url=https://l19rl52bvj.execute-api.eu-west-1.amazonaws.com/dev

Note that the string bot must appear directly before the token.
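
If you’d rather not paste the URL into a browser, the same call can be made from Python; here is a small sketch reusing our config module (the Lambda URL below is the example one; use the URL Zappa printed for you):

import requests
import config

zappa_url = "https://l19rl52bvj.execute-api.eu-west-1.amazonaws.com/dev"  # yours will differ
resp = requests.get(
    "https://api.telegram.org/bot{}/setWebhook".format(config.bot_token),
    params={"url": zappa_url},
)
print(resp.json())  # {"ok": true, ...} on success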

Testing Our Echo Bot

Visit your bot in the Telegram client by navigating to t.me/<your-bot’s-username>. You can find a link to your bot in the last message sent by BotFather when you created the bot. Open up a Chat with your bot in the Telegram client and press the /start button.

Now you can send your bot messages and you should receive the same message as a reply.

[Screenshot: the echo bot repeating each message back in Telegram]

If you don’t, it’s likely that there’s a bug in your code. You can run zappa tail dev in your Terminal to view the output of your bot’s code, including any error messages.

Teaching Our Bot About Currencies

You’ll probably get bored of chatting to your echo bot pretty quickly. To make it more useful, we’ll teach it how to send us currency conversions.

Add the following two functions to the currencybot.py file. These functions allow us to use the Fixer API to get today’s exchange rates and do some basic calculations.

def get_rate(frm, to):
    """Get the raw conversion rate between two currencies"""
    url = "http://api.fixer.io/latest?base={}&symbols={}".format(frm, to)
    try:
        response = requests.get(url)
        js = response.json()
        rates = js['rates']
        return rates.popitem()[1]
    except Exception as e:
        print(e)
        return 0

def get_conversion(quantity=1, frm="USD", to="GBP"):
    rate = get_rate(frm.upper(), to.upper())
    to_amount = quantity * rate
    return "{} {} = {} {}".format(quantity, frm, to_amount, to)

We’ll now expect the user to send currency conversion queries for our bot to compute. For example, if a user sends “5 USD GBP” we should respond with a calculation of how many British Pounds are equivalent to 5 US Dollars. We need to change our handle_message() function to split the message into appropriate parts and pass them to our get_conversion() function. Update handle_message() in currencybot.py to look like this:

def handle_message(message):
    """Calculate a response to a message"""
    try:
        qty, frm, to = message.split(" ")[:3]
        qty = int(qty)
        response = get_conversion(qty, frm, to)
    except Exception as e:
        print(e)
        response = "I couldn't parse that"
    return response

This function now parses messages that match the required format into the three parts. If the message doesn’t match what we were expecting, we inform the user that we couldn’t deal with their input.

Save the code and update the bot by running the following command (make sure you are still within your Python virtual environment, in your project directory).

zappa update dev

Testing Our Currency Converter Bot

After the update has completed, you’ll be able to chat with your bot and get currency conversions. You can see an example of the bot converting US Dollars to South African Rands and US Dollars to British Pounds below:

[Screenshot: the bot converting US Dollars to South African Rands and British Pounds]

Adding AI to Our Bot

Our bot is more useful now, but it’s not exactly smart. Users have to remember the correct input format and any slight deviations will result in the “I couldn’t parse that” error. We want our bot to be able to respond to natural language queries, such as “How much is 5 dollars in pounds?” or “Convert 3 USD to pounds”. There are an infinite number of ways that users might ask these questions, and extracting the three pieces of information (the quantity, from-currency, and to-currency) is a non-trivial task.

This is where Artificial Intelligence and Machine Learning can help us out. Instead of writing rules to account for each variation of the same question, machine learning lets a program learn patterns by ‘teaching’ it with a number of existing examples, so that it can extract the pieces of information that we want. Luckily, someone else has already done this for us, so we don’t need to start from scratch.

Create an account with api.ai, and go through their setup process. Once you get to the main screen, select the “Prebuilt Agents” tab, as shown below.

[Screenshot: the “Prebuilt Agents” tab in the api.ai console]

Select the “Currency Converter” agent from the list of options, and choose a Google Cloud Project (or create a new one) to host this agent. Now you can test your agent by typing in a query in the top right-hand corner of the page, as indicated below:

[Screenshot: testing the Currency Converter agent from the api.ai console]

Hit the “Copy Curl” link, which will copy a URL with the parameters you need to programmatically make the same request you just made manually through the web page. It should have copied a string that looks similar to the following into your clipboard.

curl 'https://api.api.ai/api/query?v=20150910&query=convert%201%20usd%20to%20zar&lang=en&sessionId=fed2f39e-6c38-4d42-aa97-0a2076de5c6b&timezone=2017-07-15T18:12:03+0200' -H 'Authorization:Bearer a5f2cc620de338048334f68aaa1219ff'

The important part is the Authorization argument, which we’ll need to make the same request from our Python code. Copy the whole token, including Bearer, into your config.py file, which should now look similar to the following:

bot_token = "14438024:AAGI6Kh8ew4wUf9-vbqtb3S4sIM7nDlcXj3"

apiai_bearer = "Bearer a5f2cc620de338048334f68aaa1219ff"

Add the following line to the top of your currencybot.py file:

from datetime import datetime

And add a parse_conversion_query() function below in the same file, as follows:

def parse_conversion_query(query):
    url_template = "https://api.api.ai/api/query?v=20150910&query={}&lang=en&sessionId={}"
    url = url_template.format(query, datetime.now())
    headers = {"Authorization":  config.apiai_bearer}
    response = requests.get(url, headers=headers)
    js = response.json()
    currency_to = js['result']['parameters']['currency-to']
    currency_from = js['result']['parameters']['currency-from']
    amount = js['result']['parameters']['amount']
    return amount, currency_from, currency_to

This reconstructs, for Python, the cURL command that we copied from the api.ai site. Note that the v=20150910 in the url_template is fixed and should not be updated to the current date; it selects the current version of the api.ai API. We omit the optional timezone argument but use datetime.now() as a unique sessionId.

Now we can pass a natural language query to the api.ai API (if you think that’s difficult to say, just look at the url_template which contains api.api.ai/api/!) It will work out what the user wants in terms of quantity, from-currency and to-currency, and return structured JSON for our bot to parse. Remember that api.ai doesn’t do the actual conversion–its only role is to extract the components we need from a natural language query, so we’ll pass these pieces to the fixer.io API as before. Update the handle_message() function to use our new NLU parser. It should look as follows:

def handle_message(message):
    """Calculate a response to a message"""
    try:
        qty, frm, to = parse_conversion_query(message)
        qty = int(qty)
        response = get_conversion(qty, frm, to)
    except Exception as e:
        print(e)
        response = "I couldn't parse that"
    return response

Make sure you’ve saved all your files, and update your deployment again with:

zappa update dev

Testing Our Bot’s AI

Now our bot should be able to convert between currencies based on Natural Language queries such as “How much is 3 usd in Indian Rupees”.

[Screenshot: the bot answering a natural language conversion query]

If this doesn’t work, run zappa tail dev again to look at the error log and figure out what went wrong.

Our bot is by no means perfect, and you should easily be able to find queries that break it and cause unexpected responses, but it can handle a lot more than the strict input format we started with! If you want to teach it to handle queries in specific formats, you can use the api.ai web page to improve your bot’s understanding and pattern recognition.

Conclusion

Serverless computing and Chatbots are both growing in popularity, and in this tutorial you learned how to use both of them.

We showed you how to set up a Telegram Chatbot, make it accessible to the world, and plug in a prebuilt brain.

You can now easily do the same using the other pre-built agents offered by api.ai, or start building your own. You can also look at the other Bot APIs offered by Facebook Messenger, Skype, and many similar platforms to make your Bots accessible to a wider audience.


Python 2 vs. Python 3 Explained in Simple Terms

July 13th, 2017

Python is a high-level, versatile, object-oriented programming language. Python is simple and easy to learn while also being powerful and highly effective. These advantages make it suitable for programmers of all backgrounds, and Python has become one of the most widely used languages across a variety of fields.

Python differs from most other programming languages in that two incompatible versions, Python 2 and Python 3, are both widely used. This article presents a brief overview of a few of the differences between Python 2 and Python 3 and is primarily aimed at a less-technical audience.

Python 2 (aka Python 2.x)

The second version of Python, Python 2.0, arrived in 2000. Upon its launch, it introduced many new features that improved upon the previous version. Notably, it included support for Unicode and added garbage collection for better memory management. The Python project also introduced changes in the way the language itself was developed; the development process became more open and included input from the community.

Python 2.7 is the latest (and final) Python 2 release. One feature included in this version is the Ordered Dictionary. The Ordered Dictionary enables the user to create dictionaries in an ordered manner, i.e., they remember the order in which their elements are inserted, and therefore it is possible to print the elements in that order. Another feature of Python 2.x is set literals. Previously, one had to create a set from another type, such as a list, resulting in slower and more cumbersome code.

While these are some prominent features that were included with Python 2.7, there are other features in this release. For instance, Input/Output modules, which are used to write to text files in Python, are faster than before. All the aforementioned features are also present in Python 3.1 and later versions.

Python 3 (aka Python 3.x)

Even though Python 2.x had matured considerably, many issues remained. The print statement was complicated to use and did not behave like Python functions, resulting in more code in comparison to other programming languages. In addition, Python strings were not Unicode by default, which meant that programmers needed to invoke functions to convert strings to Unicode (and back) when manipulating non-ASCII characters (i.e., characters outside the basic English character set).

Python 3, which was launched in 2008, was created to solve these problems and bring Python into the modern world. Nine years in, let’s consider how the adoption of Python 3 (which is currently at version 3.6) has fared against the latest Python 2.x release.

The most notable change in Python 3 is that print is now a function rather than a statement, as it was in Python 2. Since print is now a function, it is more versatile than it was in Python 2. This was perhaps the most radical change in the entire Python 3.0 release, and as a result, it ruffled the most feathers. Users are now required to write print() instead of print, and programmers naturally object to having to type two additional characters and learn a new syntax. To be fair, the print() function is now able to write to external text files, something which was not possible before, and there are other advantages of it now being a function.
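
A minimal illustration of the difference:

# Python 2: print is a statement
# print "Hello, World!"

# Python 3: print is a function
print("Hello, World!")

# As a function, print can also write directly to an open file
with open("log.txt", "w") as f:
    print("Hello, World!", file=f)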

You might think that print becoming a function is a small change and having to type two more characters is not a big issue. But it is one of multiple changes that make Python 3 incompatible with Python 2. The problem of compatibility becomes complicated by the fact that organizations and developers may in fact have large amounts of Python 2 code that needs to be converted to Python 3.

Python 3.6 adds to these changes by allowing optional underscores in numeric literals for better readability (e.g., 1_000_000 vs. 1000000), and in addition extends Python’s functionality for multitasking. (Note that the new features which appear in each successive version of Python 3 are not “backported” to Python 2.7, and as a result, Python 3 will continue to diverge from Python 2 in terms of functionality.)
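
For example, in Python 3.6:

# Underscores in numeric literals are ignored by the interpreter;
# they exist purely for readability
budget = 1_000_000
assert budget == 1000000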

Should You Care?

It depends. If you are a professional developer who already works with Python, you should consider moving to Python 3 if you haven’t already. In order to make the transition easier, Python 3 includes a tool called 2to3 which is used to transform Python 2 code to Python 3. 2to3 will prove helpful to organizations which are already invested in Python 2.x, as it will help them convert their Python 2 code base to Python 3 as smoothly as possible.
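
For example, given a file hello.py containing the Python 2 statement print "Hello, World!", running

2to3 -w hello.py

rewrites the file in place so that the line becomes print("Hello, World!"). (Without the -w flag, 2to3 just prints the proposed changes as a diff.)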

If you are just starting out with Python, your best strategy is to embrace Python 3, although you should be aware that it is incompatible with Python 2, as you may encounter Python 2 code on websites such as Stack Overflow and perhaps at your current (or future) workplace.

Conclusion

Whether to use Python 3 or Python 2 in 2017 depends on the intended use. Python 2.7 will be supported until 2020 with the latest packages. According to py3readiness.org, which measures how many popular libraries are compatible with Python 3, 345 out of 360 libraries support Python 3. This number will continue to grow as support for 2.7 winds down. While Python 2.7 is sufficient for now, Python 3 is definitely the future of the language and is here to stay.

Takeaway: Python 2 is still widely used. Python 3 introduced several features that were not backward compatible with Python 2. It took a while for some popular libraries to support Python 3, but most major libraries now support Python 3, and support for Python 2 will eventually be phased out. Python 2 is still here in 2017 but is gradually on the way out.