The Python Ecosystem of 2017
An overview of the major libraries, frameworks, and uses.
Python in 2017
Python is a high-level, versatile, object-oriented programming language. Python is useful and powerful while also being readable and easy to learn. This makes it suitable for programmers of all backgrounds and is likely the reason Python is one of the most widely used programming languages (as of 2017).
The Python ecosystem of libraries, frameworks, and tools is enormous and growing. Python is used for web scraping, data analysis, web development, internet of things development (IoT), machine learning, DevOps, general scientific computing, and many other computing and scripting uses.
At DevelopIntelligence, Python is one of our favorite languages to work with and teach. Each year, more of our clients request Python training courses for their teams. For this reason, we created this short report to explain why, where, and how Python is used in 2017. We hope this helps teams like yours better appreciate this incredible language and ecosystem. If you’re ever interested in discussing Python training needs for your team or organization, don’t hesitate to contact us.
Strong standard library and plethora of third party modules
Python’s standard library (plain old vanilla Python) is large, powerful, and utilitarian. For everything the standard library doesn’t cover (or do well at), there are thousands of third-party modules and libraries. Awesome Python is a great place to get a sense for what these libraries/modules can be used for. Many of these libraries/frameworks are mature and have been battle-tested for 5-10+ years. This makes Python an attractive choice for startups and established enterprise companies alike.
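As a small illustration of how far the standard library alone can go, the sketch below counts word frequencies and prints them as JSON using only built-in modules. The sample text and the `word_counts` helper are assumptions invented for this example.

```python
import json
import re
from collections import Counter


def word_counts(text, top=3):
    """Return the `top` most common lowercase words in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top)


# Hypothetical sample text, invented for illustration.
sample = "Simple is better than complex. Complex is better than complicated."
counts = word_counts(sample)
print(json.dumps(dict(counts), indent=2))
```

No third-party installation is needed here: `re`, `collections`, and `json` all ship with Python.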
Compared with other programming languages, Python code is often simple and even elegant to read. The Zen of Python (a set of principles that influence the language) starts like this:
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
This simple, explicit, and beautiful code makes Python easier and more enjoyable to work with. Many developers adore Python and pine for the chance to work with it.
There are a number of factors that make Python productive. The aforementioned simplicity, large standard library, and module ecosystem certainly help. Python also benefits from a large community of developers, tutorials, and other resources. This makes it easier to build features and get help on tricky bugs.
Python’s dynamic typing and syntax often let developers write application features with fewer lines of code than other languages would require. It’s easy to prototype quickly with Python (and use tests to catch runtime errors and bugs).
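To make the point about brevity and testing concrete, here is a minimal sketch. The `total` function and its inputs are hypothetical. The idea is that one short function works for any values that support addition, while a few assertions catch the kind of mistake that a compiler would flag in a statically typed language.

```python
from fractions import Fraction


def total(values, start=0):
    """Sum any values that support `+`; no type declarations needed."""
    result = start
    for value in values:
        result = result + value
    return result


# The same function handles ints, floats, and Fractions.
assert total([1, 2, 3]) == 6
assert total([0.5, 1.5]) == 2.0
assert total([Fraction(1, 3), Fraction(2, 3)]) == 1

# A type mix-up only surfaces at runtime, which is why tests matter.
try:
    total([1, "2"])
except TypeError as error:
    print("caught at runtime:", error)
```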
Internet of Things (IoT) potential
Python is the standard language for the Raspberry Pi computer (it’s in the name). The Raspberry Pi is a small computer used by millions of hobbyists to program things like smart mirrors, phones, and home automation devices. IoT is one of the fastest growing parts of the technology world and is poised to revolutionize industries like agriculture, retail, asset tracking, and more.
Python is one of the most popular languages used in machine and deep learning. These two terms are sometimes used interchangeably but they are subtly different. Before jumping into Python machine/deep learning libraries, let’s first define what these terms mean.
Machine learning (ML) is a subset of Artificial Intelligence (AI) and is a process where a machine ‘learns’ to make predictions by recognizing patterns in large datasets.
Deep learning is a subset of machine learning. Deep learning expands on ML by applying artificial neural networks with multiple layers and parameters to massive datasets. This article on Nvidia’s blog does a great job of explaining this in more depth. This friendly video introduction on machine learning, deep learning, and neural nets also does a great job of explaining these topics.
One prominent example of deep learning is AlphaGo, the Go-playing program from Google DeepMind. In October 2015 (in a match announced in January 2016), AlphaGo beat the reigning three-time European Go champion Fan Hui. Its developers trained it to “discover new strategies for itself, by playing thousands of games between its neural networks, and adjusting the connections using a trial-and-error process.”
Let’s take a look at some examples of machine/deep learning frameworks and libraries used with Python.
Scikit-learn is a popular machine learning Python library built on NumPy, SciPy, and Matplotlib to assist with data mining and data analysis. These tools are open source, commercially usable, and flexible for many different contexts.
Scikit-learn can be used to help you classify which category an object belongs to. This is useful in things like spam detection or image recognition. Another important use case of scikit-learn is clustering, which is used for tasks like segmenting customers into different groups.
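The classification idea can be sketched without any library at all. The toy example below is not scikit-learn's API. It is a hand-written nearest-centroid classifier on made-up two-feature data, meant only to show what learning a category from examples involves. In practice, scikit-learn provides tested and optimized implementations of many such algorithms.

```python
import math
from collections import defaultdict


def fit_centroids(samples, labels):
    """Average the feature vectors of each label to get one centroid per class."""
    grouped = defaultdict(list)
    for features, label in zip(samples, labels):
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest to `features`."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


# Made-up training data: (number of links, number of ALL-CAPS words) per email.
emails = [(0, 1), (1, 0), (8, 9), (9, 7)]
labels = ["ham", "ham", "spam", "spam"]

centroids = fit_centroids(emails, labels)
print(predict(centroids, (7, 8)))  # spam
print(predict(centroids, (1, 1)))  # ham
```

Note that `math.dist` requires Python 3.8 or later.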
For more on all of the features, as well as information on algorithms used with each and examples, visit scikit-learn’s website.
Caffe is a deep learning framework developed by Berkeley AI Research (BAIR) along with other community contributors. Caffe is free, open source, elegant, and especially useful for categorizing images.
Caffe has a large following and community in industry and academia. For more documentation, tutorials, and examples, visit Caffe’s website.
TensorFlow is a deep learning library built for numerical computation using data flow graphs. These graphs use the neural networks of deep learning to help you express and analyze patterns in large datasets.
Before you can understand this deep learning library, it’s important to first grasp tensors and graphs. Put simply, tensors are multidimensional data arrays, and a TensorFlow graph describes a computation: its nodes represent mathematical operations, and its edges represent the tensors that flow from one operation to the next.
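The graph idea can be illustrated with a toy example in plain Python. This is not TensorFlow's API; the `Node` class and the helper functions are hypothetical names invented for this sketch. Each node is an operation, and the lists passed from one node to the next play the role of tensors flowing along the edges.

```python
class Node:
    """One operation in a dataflow graph; `inputs` are the incoming edges."""

    def __init__(self, operation, *inputs):
        self.operation = operation
        self.inputs = inputs

    def evaluate(self):
        # Evaluate upstream nodes first, then apply this node's operation.
        values = [node.evaluate() for node in self.inputs]
        return self.operation(*values)


def constant(value):
    """A source node that simply emits a fixed value."""
    return Node(lambda: value)


def add(left, right):
    """A node that adds two equal-length vectors element by element."""
    return Node(lambda a, b: [x + y for x, y in zip(a, b)], left, right)


def scale(node, factor):
    """A node that multiplies every element of its input by `factor`."""
    return Node(lambda a: [x * factor for x in a], node)


# Build the graph first; nothing is computed yet.
a = constant([1.0, 2.0, 3.0])
b = constant([10.0, 20.0, 30.0])
graph = scale(add(a, b), 2.0)

# Running the graph pushes the data through the operations.
print(graph.evaluate())  # [22.0, 44.0, 66.0]
```

Separating graph construction from execution is the key design point: the whole computation is described up front, which leaves room for a real framework to optimize it or run it on different hardware.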
TensorFlow is especially useful for deep learning but claims to be general enough to have a wide variety of different applications as well. TensorFlow is used by a number of different companies, including Dropbox, eBay, and Snapchat.
Theano is a deep learning library created by MILA lab at the University of Montreal that helps you more simply evaluate large mathematical expressions, including those that involve multi-dimensional arrays.
Theano is great for those beginning to understand deep learning because it’s possible to design layers and a neural network structure, then see what’s happening as your code executes. Theano also models your computations as a graph so you can understand the communication that’s taking place.
One of the most important reasons to use Theano, though, is to speed up your neural network.
It accomplishes this by working together with NumPy, the package mentioned previously, as well as by utilizing a strong graphics processing unit (GPU). With this, Theano can quickly perform very large calculations.
In use since 2007, Theano has a wide range of possibilities, from its use in the classroom to powering many large-scale computationally intensive scientific investigations. For more information, visit their GitHub, tutorial page, and Developer Start Guide.
Natural Language Processing
The goal of Natural Language Processing (NLP) is to help computers process actual (and ambiguous) human use of language and derive meaning from it in the way humans do. Human language isn’t logical like programming languages, so getting computers to understand it is quite hard. NLP (a component of artificial intelligence) seeks to understand speech, including slang, abbreviations, and different linguistic structures. There are many mature Python libraries/frameworks used in NLP.
The Natural Language Toolkit (NLTK) is a suite of NLP libraries and tools, often used in academia. NLTK also includes over 50 corpora and lexical resources, which are large datasets of real human language use that play well with machine learning algorithms.
NLTK can be used for NLP tasks like classifying, tokenizing, tagging, parsing, and reasoning. NLTK could be used, for example, to parse a paragraph for nouns, verbs, and other parts of speech. Or it could be used to separate spam or junk mail from real emails.
To learn more about this complex tool suite, check out the guide that the creators of NLTK wrote, Natural Language Processing with Python.
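To give a feel for two of these tasks, tokenizing and tagging, here is a deliberately tiny sketch in plain Python. It does not use NLTK. The `LEXICON` table, `tokenize`, and `tag` are hypothetical names invented for illustration, whereas real taggers are trained on large annotated corpora.

```python
import re

# Hypothetical mini-lexicon, invented for this sketch.
LEXICON = {
    "the": "DET", "a": "DET",
    "dog": "NOUN", "ball": "NOUN",
    "chased": "VERB", "ran": "VERB",
}


def tokenize(text):
    """Split text into lowercase word tokens and punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def tag(tokens):
    """Pair each token with a part of speech, or 'UNK' if it is unknown."""
    return [(token, LEXICON.get(token, "UNK")) for token in tokens]


tokens = tokenize("The dog chased a ball.")
print(tokens)  # ['the', 'dog', 'chased', 'a', 'ball', '.']
print(tag(tokens))
```

A lookup table like this breaks down quickly on ambiguous words, which is exactly the gap that a toolkit with trained models and corpora is meant to fill.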
Gensim is a library that helps with analyzing plain-text documents for their semantic structure, which makes it possible to retrieve other documents that are semantically similar. Put more simply, Gensim is useful for trying to figure out what a piece of text is really about and if/how it’s related to other pieces of text.
Referred to as having “scalable statistical semantics,” Gensim is open source and has been in use for over four years. It was initially created to simply return a list of articles similar to any given article, with the name Gensim being a shortened form of that purpose: “generate similar.”
It has since become a very robust and efficient way to analyze the semantic structure of plain text, and it can quickly index documents by their semantic representation in order to retrieve similar ones.
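The idea of retrieving similar documents can be sketched with plain word counts and cosine similarity. This is not Gensim's API, and Gensim's models go well beyond raw word counts; the documents and function names below are made up for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count how often each lowercase word appears in `text`."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(left, right):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    shared = set(left) & set(right)
    dot = sum(left[word] * right[word] for word in shared)
    norm = math.sqrt(sum(c * c for c in left.values())) * math.sqrt(
        sum(c * c for c in right.values())
    )
    return dot / norm if norm else 0.0


# Made-up documents for illustration.
documents = [
    "Python is great for data analysis",
    "Web development with Django and Flask",
    "Raspberry Pi projects for the home",
]
query = bag_of_words("analysis of data in Python")

# Rank the documents from most to least similar to the query.
ranked = sorted(
    documents,
    key=lambda doc: cosine_similarity(query, bag_of_words(doc)),
    reverse=True,
)
print(ranked[0])  # Python is great for data analysis
```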
Web development includes creating, deploying, and managing web sites and applications. Python has a number of strong libraries/frameworks that are widely used in web development.
Django is an open source web application development framework. Django’s main goal is to help simplify web development by taking care of a lot of the common application setup tasks and patterns. This covers out of the box tasks like user authentication, site maps, and content administration. This allows developers to focus on building the unique parts of their application.
Django has been in production use for well over a decade and is one of the most mature Python frameworks. Numerous large companies/products like Bitbucket, Instagram, and Dropbox use Django in production.
Flask is a very popular minimalistic web application framework. Flask is explicit, terse, and simple to get started with.
This ‘microframework’ doesn’t offer the same amount of features as Django, but it does come with a built-in development server/debugger, integrated unit testing support, and support for secure cookies. It’s also Unicode based, extensively documented, and uses Jinja2 templating. For more information regarding Flask, take a look at their website.
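Both Django and Flask can be served through WSGI, Python's standard interface between web servers and applications. The sketch below uses only plain Python to show the request-to-response cycle that these frameworks build on. It is not Flask or Django code, and the `app` function, the `call` helper, and the routes are hypothetical.

```python
def app(environ, start_response):
    """A minimal WSGI application with two hypothetical routes."""
    path = environ.get("PATH_INFO", "/")
    if path == "/":
        status, body = "200 OK", b"Hello from Python!"
    elif path == "/health":
        status, body = "200 OK", b"ok"
    else:
        status, body = "404 Not Found", b"Not found"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]


def call(path):
    """Invoke the app directly, the way a WSGI server would."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app({"PATH_INFO": path}, start_response))
    return captured["status"], body.decode("utf-8")


print(call("/"))         # ('200 OK', 'Hello from Python!')
print(call("/missing"))  # ('404 Not Found', 'Not found')
```

To try it in a browser, the standard library's `wsgiref.simple_server.make_server` can serve `app` locally. Frameworks like Flask and Django add routing, templating, and much more on top of this basic contract.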
Python’s strong standard library and ecosystem of libraries and frameworks make it very popular for general scientific computing. Python is widely used by mathematicians, statisticians, and scientists.
NumPy has been mentioned multiple times in this article. It has risen to become one of the most popular Python science libraries and just secured a round of grant funding. NumPy’s multidimensional array can perform very large calculations much more easily and efficiently than using the Python standard data types.
SciPy contains many different packages and modules to assist in mathematics and scientific computing. It’s difficult to state a single use case for SciPy considering that it contains so many different useful packages (including NumPy).
Some of the important packages include:
- SciPy library – one of the core packages of the SciPy stack. It provides routines for scientific computing, including numerical integration and optimization.
- Matplotlib – a 2D plotting library that can be used in Python scripts, the Python and IPython shell, web application servers, and more.
- IPython – an interactive console that runs your code like the Python shell, but gives you even more features, like support for data visualizations.
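Numerical integration, one of the tasks mentioned in the list above, can be sketched in a few lines of plain Python. The trapezoidal rule below is a simple textbook method written by hand for illustration. It is not SciPy's implementation, and the `trapezoid` function name is an assumption of this sketch.

```python
import math


def trapezoid(function, start, stop, steps=1000):
    """Approximate the integral of `function` over [start, stop]."""
    width = (stop - start) / steps
    # Endpoints count half; interior points count fully.
    total = 0.5 * (function(start) + function(stop))
    for i in range(1, steps):
        total += function(start + i * width)
    return total * width


# The integral of sin(x) from 0 to pi is exactly 2.
approx = trapezoid(math.sin, 0.0, math.pi)
print(round(approx, 4))  # 2.0
```

A dedicated library earns its keep on harder problems, where step sizes must adapt to the function and error estimates matter.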
There are many Python modules and libraries for plotting and visualizing data. This section will explore some of the more popular ones.
StatsModels is a Python package that provides tools for working with different statistical models and performing statistical tests. StatsModels is useful for replacing R functionality in Python and is built on NumPy and SciPy.
Matplotlib is a Python 2D plotting library included within the SciPy ecosystem. As its name suggests, Matplotlib is a library that includes many options for plotting mathematical expressions in various formats. It generates plots, power spectra, histograms, error charts, bar charts, scatterplots, and more, without much code.
Python is a great general-purpose language but is not especially strong at data analysis on its own. However, with libraries like Pandas, cleaning, processing and analyzing large datasets becomes easy and ‘R-like’. Pandas is the most popular Python data analysis library by a long shot.
Pandas (a name derived from “panel data” that is also a play on “Python data analysis”) is an open source data analysis library. Pandas is very fast and used by researchers who have large amounts of data that they need to clean, analyze, and organize. Pandas provides flexible and efficient data structures like DataFrame and Series that make working with tabular data, Excel files, and CSVs far easier in Python.
As we originally addressed in our guide DevOps Simplified for Non-Technical People, Ansible and SaltStack are both configuration management tools written in Python. Whether you have 10 servers or 10,000, configuration managers are helpful for things like provisioning environments and deploying applications.
Ansible was built in Python and works to simplify your team’s life through automation. Ansible believes that “complexity kills productivity,” so they work to make automation and configuration management simpler. Ansible helps simplify app deployment, workflow orchestration, the app lifecycle, and of course, configuration management.
For more information on Ansible, visit their website.
SaltStack was also developed in Python and accomplishes the same major DevOps goals as Ansible, with some important differences. First, while the core Salt project is open source, SaltStack also sells a commercial enterprise edition. Additionally, it is not agentless by default, but instead installs agents it calls ‘minions’ on the managed servers, which communicate with a central master; because of this, it is able to operate more quickly.
SaltStack is a great configuration manager if you have a large infrastructure because it automatically detects if one machine is configured in a different way from the others and alerts you or fixes it, depending on your settings.
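The drift-detection idea can be sketched as follows. This is not SaltStack's API or state format; the host names, settings, and the `find_drift` helper are all hypothetical. The sketch compares each machine's reported settings against a desired state and lists what differs.

```python
def find_drift(desired, reported):
    """Map each host to the settings that differ from the desired state."""
    drift = {}
    for host, settings in reported.items():
        # Record (actual, expected) for every setting that does not match.
        differences = {
            key: (settings.get(key), expected)
            for key, expected in desired.items()
            if settings.get(key) != expected
        }
        if differences:
            drift[host] = differences
    return drift


# Hypothetical desired state and per-host reports.
desired = {"nginx": "1.12", "timezone": "UTC"}
reported = {
    "web-01": {"nginx": "1.12", "timezone": "UTC"},
    "web-02": {"nginx": "1.10", "timezone": "UTC"},
    "web-03": {"nginx": "1.12"},
}

for host, differences in find_drift(desired, reported).items():
    for key, (actual, expected) in differences.items():
        print(f"{host}: {key} is {actual!r}, expected {expected!r}")
```

A real configuration manager would go one step further and either alert on the drift or apply the change that brings the machine back in line.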
It also provides continuous code integration and deployment, deploying the code as soon as it’s ready for production, as well as full-stack application orchestration, helping you manage complicated applications.
SaltStack is also good for scalability if your company needs to potentially expand its number of machines and minions. It’s able to operate on your choice of cloud providers, from AWS to Azure to OpenStack, and can even manage a heterogeneous computing environment.
SaltStack goes in-depth about their software, as well as solutions and community information, on their website.
The Python ecosystem of 2017 is mature, robust, and constantly growing. As you can see, Python can be used for a lot!
Even with all of the variety of Python uses shown here, there are even more that weren’t mentioned. For example, Python can be used for artistic software, home automation, and more. We would need another couple of reports to cover all the Python internet of things use cases.
With its ease of use, large talent pool, usability/productivity, and open source status, Python is a great general purpose programming language that can be used for numerous different projects, from powering Instagram, to deep learning, data wrangling, automation, and more.
If your team or organization could ever use help exploring Python more, DevelopIntelligence offers Python courses and learning solutions for teams and organizations around the world.
We leave you with a speech made by Python’s inventor Guido van Rossum about the present and future of Python.