I’ve been enjoying a free little book titled Python for Informatics: Exploring Information. For the most part, it’s a gracefully and clearly written text that was adapted from another excellent free textbook called How to Think Like a Computer Scientist: Learning with Python.
In case you’re not familiar with it, let’s back up and explain a bit about Python. I’m not a bona fide programmer, but I have been (on and off) learning the open-source computer language for a few years now. Python’s well known for being an elegant, user-friendly, all-purpose language that has become a staple in the data science and predictive analytics world. That’s a bit strange because other languages, such as the groovy statistical language R, seem like a more natural fit for that kind of work. However, Python has benefited from having a ton of smart people working on its open-source libraries and packages, and they’ve created (among other things) the Python Data Analysis Library, better known as pandas.
Python also benefits from being widely taught as an introductory programming language in universities. Therefore, many people trained in the natural and social sciences automatically turn to Python when they want to use a programming language to analyze data.
How to Think Like a Computer Scientist is one of the better places to start for the non-programmer. And, since that book was published under the terms of the GNU Free Documentation License, it's widely distributed on the Web. In fact, one of my favorite Python websites is the interactive version of the book.
The author of Python for Informatics, Charles Severance, adapted the How to Think book for his own purposes: that is, to teach his students how to use Python for data analysis. I've been amazed at how well the first ten chapters of the book serve both as an introduction to general programming (as befits its origin) and as a useful specialized text focused on informatics or, to be more specific, text analysis. Although I come to it with the advantage of knowing the basics of Python, I think I'd grasp most of the first ten chapters even if I were a true neophyte.
In the first ten chapters, Severance uses a minimum of jargon (which is the bane of so many computer books) and keeps his coding examples as simple and readable as possible. You learn the concepts as you need them, always with the goal of gleaning more sophisticated ways of analyzing text.
Severance takes to heart the wisdom of the Zen of Python: “Sparse is better than dense. Readability counts.”
Starting in Chapter 11, Severance delves into the more arcane subject of regular expressions, and the going gets tougher for the beginner. Of course, that's inevitable in the area of text analytics. Ultimately, all text-analysis roads lead to "regex," as these expressions are commonly known (in Python, the standard-library module that handles them is simply called "re").
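For readers who have never seen one, here's a minimal sketch of what a regex looks like in Python's `re` module. It isn't taken from the book; the sample lines and addresses are made up for illustration. It keeps only lines that begin with "From:" and then pulls out anything shaped like an email address.

```python
import re

# Hypothetical sample lines, loosely modeled on email headers
lines = [
    "From: alice@example.com Sat Jan  5 09:14:16 2008",
    "Subject: weekly report",
    "From: bob@example.org Fri Jan  4 18:10:48 2008",
]

addresses = []
for line in lines:
    # "^From:" matches only when "From:" appears at the start of the line
    if re.search(r"^From:", line):
        # "\S+@\S+" grabs a run of non-space characters around an "@"
        addresses.extend(re.findall(r"\S+@\S+", line))

print(addresses)  # ['alice@example.com', 'bob@example.org']
```

Two short patterns do work that would otherwise take a fair amount of string slicing, which is a big part of why regex is unavoidable in text analysis.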
The later chapters are slower going but cover a range of fascinating and useful topics: how to glean information over HTTP, scrape and spider the Web, tap into the Twitter API, work with Structured Query Language (SQL), visualize online data, and more. The beginner programmer will probably find it a bit daunting but can still push through to the end having learned a ton about the basics of text and Web analytics. And, for the person who already has a basic understanding of Python in general and regex in particular, the learning curve isn't nearly as steep. I've found it well worth the trip.
Prof. Severance is in the process of writing another version of the book geared toward Python 3, since Python for Informatics was written for Python 2. The new book will be welcome because official support for Python 2 is scheduled to end in 2020.
Overall, I recommend Python for Informatics to anyone who has an interest in learning Python in general and text-focused data analysis in particular.