Moving to a new home

After many years of not blogging, I’ve decided to get back into the habit of writing about technology, now that I am offering my services as a freelance consultant in Technical Leadership and Software Team Acceleration.

I’ve created a new home for my endeavours: https://carles.barrobes.com/pro/ – check it out!

APItoPy – a Pythonic way to access HTTP APIs

How apitopy came to be

I recently had the need to create a Python client for an HTTP API (to be more specific, for Sprint.ly).

There was an existing implementation, based on urllib2, which did not work for us (it was failing due to authentication problems). I knew better and was considering rewriting the client using requests.

At the same time, for a personal project, I was working on a tool that will eventually need to talk to multiple APIs. So I thought that, by taking advantage of some nice dynamic Python features, I could write a quick universal HTTP API client that still looked like accessing Python objects to the casual user.

This is roughly how I wanted the Python API to look (these examples are made-up and not based on any service I know of):

api = Api('http://api.example.com', ...)
data = api.people[24].projects(status='cancelled')
print(data[0].name)

This should generate a GET request to http://api.example.com/people/24/projects?status=cancelled and print the name of the first cancelled project.

An hour of hacking later, and thanks to Python’s great __getattr__ and __getitem__, I had a working client that looked like a bespoke implementation to the casual eye. And it worked better than the bespoke client I was trying to use in the first place. The whole library is a little over 100 lines of code (not counting tests).

This is an example usage accessing the Sprint.ly API:

from apitopy import Api

sprintly = Api('https://sprint.ly/api/', (USER, TOKEN),
                verify_ssl_cert=False, suffix='.json')
# initialise the API. Sprint.ly does not honor content negotiation,
# you must add the ".json" suffix to API requests

product = sprintly.products[9122]
# generates an endpoint https://sprint.ly/api/products/9122
# but doesn't perform any HTTP request yet

items = product.items(assigned_to=2122, status='in-progress')
# HTTP GET https://sprint.ly/api/products/9122/items.json?assigned_to=2122&status=in-progress
# Returns a list of parsed JSON objects

for item in items:
    print(u"#{number:<4} {type:8} {status:12} {title:40}".format(
        **item
    ))

Some Internals

The key to implementing access to arbitrary attributes in a Python class is to override __getattr__. __getattr__ gets called when the interpreter does not find an attribute in the usual places, meaning it is the fallback method before an AttributeError is raised.

To implement custom access to items (var[index]) you must implement __getitem__.
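
A minimal sketch of that fallback behaviour, using a made-up Recorder class (it is not part of apitopy and exists only for illustration). It shows that attributes found the usual way never reach __getattr__, while unknown names and subscripts are routed to the two special methods:

```python
class Recorder(object):
    """Hypothetical class, for illustration only."""

    def __init__(self):
        self.name = 'recorder'   # a regular attribute, found the usual way
        self.lookups = []        # names that fell through to __getattr__

    def __getattr__(self, attr):
        # Only reached when normal attribute lookup has already failed.
        self.lookups.append(attr)
        return 'dynamic:' + attr

    def __getitem__(self, item):
        # Called for the obj[item] syntax; item can be any object.
        return 'item:' + str(item)


r = Recorder()
print(r.name)       # regular attribute, __getattr__ is not involved
print(r.projects)   # no such attribute, so __getattr__ answers
print(r[24])        # subscript access goes through __getitem__
print(r.lookups)    # only 'projects' was recorded
```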

Our implementation of EndPoint makes use of __getattr__ and __getitem__, by making them behave the same. This allows for using the [] notation when you want to access a part of the URL that will vary in your code (and you want to populate from a variable) or that contains characters that are not valid as a Python identifier (e.g. all numbers). I find that numbers, which typically represent object identifiers, fit the [] notation better anyway in this case.

class EndPoint(object):
    """
    A potential end point of an API, where we can get JSON data from.

    An instance of `EndPoint` is a callable that upon invocation performs
    a GET request. Any kwargs passed in to the call will be used to build
    a query string.
    """

    def __init__(self, api, path, suffix=''):
        self.api = api
        self.path = path
        self.suffix = suffix

    def __getitem__(self, item):
        return EndPoint(self.api,
                        '/'.join([self.path, str(item)]),
                        self.suffix)

    def __getattr__(self, attr):
        return self[attr]

    def __call__(self, **kwargs):
        extra = ''
        if kwargs:
            url_args = ['{0}={1}'.format(k, v) for k, v in kwargs.items()]
            extra = '?' + '&'.join(url_args)
        return self.GET("{0}{1}{2}".format(self.path, self.suffix, extra))

    def GET(self, url):
        response = self.api.GET(url)
        return dotify(response.json())
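
The listing above joins the query string by hand, so values containing spaces or "&" would not be escaped. Here is a minimal offline sketch of the same chaining idea that only builds the URL and escapes the parameters with the standard library's urlencode. UrlBuilder is a hypothetical name, not apitopy's code, and sorting the parameters is an assumption made here to keep the output stable:

```python
try:
    from urllib.parse import urlencode   # Python 3
except ImportError:
    from urllib import urlencode         # Python 2


class UrlBuilder(object):
    """Hypothetical stand-in for EndPoint that only builds URLs."""

    def __init__(self, path, suffix=''):
        self.path = path
        self.suffix = suffix

    def __getitem__(self, item):
        # Each access returns a new builder one path segment deeper.
        return UrlBuilder('/'.join([self.path, str(item)]), self.suffix)

    def __getattr__(self, attr):
        # Attribute access behaves exactly like item access.
        return self[attr]

    def __call__(self, **kwargs):
        # sorted() keeps parameter order stable; urlencode escapes the values.
        query = urlencode(sorted(kwargs.items()))
        extra = '?' + query if query else ''
        return '{0}{1}{2}'.format(self.path, self.suffix, extra)


api = UrlBuilder('http://api.example.com')
print(api.people[24].projects(status='cancelled'))
print(api.search(q='open items', owner='a&b'))
```

No request is made here; calling the builder simply returns the URL that a GET would be sent to.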

Get it, use it, and send your feedback

You can find the whole library on GitHub.

It’s on PyPI, so installation is easy:

pip install apitopy

Further Work

I’m especially interested in extending apitopy to support other HTTP verbs (at least POST, PUT, DELETE) and am still thinking about the best syntax for that. What would you like that code to look like, from the point of view of the client? Suggestions are welcome.

Another open point is how to make autocompletion work in a shell like IPython. That would be a killer feature, but it would need to rely on properly hyperlinked, discoverable APIs. I’m all for hearing your opinions on this as well.
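
One possible direction, sketched under assumptions: interactive completion generally builds on dir(), and dir() consults an object's __dir__ method, so an endpoint that knows the names of its children could advertise them there. Node and its links argument are hypothetical; in a real client the names would have to come from a hyperlinked API response, and here they are hard-coded:

```python
class Node(object):
    """Hypothetical endpoint that knows the names of its children."""

    def __init__(self, path, links=()):
        self.path = path
        # Assumed to be discovered from the API; hard-coded in this sketch.
        self.links = set(links)

    def __getattr__(self, attr):
        # Any unknown attribute becomes a deeper endpoint.
        return Node(self.path + '/' + attr)

    def __dir__(self):
        # dir() calls this method; add the discovered child names
        # to the names the object already has.
        names = set(dir(type(self))) | set(self.__dict__) | self.links
        return sorted(names)


root = Node('http://api.example.com', links=['people', 'projects'])
print([name for name in dir(root) if not name.startswith('_')])
print(root.people.path)
```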

Pretenders – fake servers for testing

Finally, we released pretenders to the general public, hooray! This is a project I have been developing with my friend and now ex-colleague Alex Couper. It has been a very interesting piece of work, and I am really glad it is out for other test-minded people to enjoy.

Pretenders are fake servers for testing purposes. They are used to mock external servers your code interacts with (such as HTTP APIs or SMTP servers). Mocking is done at the protocol level, making these appropriate for integration test environments, and usable to test applications written in any language (not just Python, which is the language we used to write pretenders).

As a starter, here are the slides for the lightning talk I gave at PyCon UK 2012:

Pretenders is an open source project. You will find the source code on GitHub, and the documentation on Read the Docs. As it is fresh out of the oven, it has some rough edges, mostly around documentation. Feedback and contributions are welcome.

Example usage

In order to use pretenders in your tests you will have to start a main server we call boss. The boss will spin off various fakes (pretenders) on demand, and assign them a free port from a configured range. The following examples assume a running pretenders boss on localhost at port 8000.

This is a taste of how you would write a test using pretenders to mock an external HTTP API your code depends on:

from pretenders.client.http import HTTPMock

# Assume a running boss server at localhost:8000
# Initialise the mock client and clear all responses
mock = HTTPMock('localhost', 8000)

# For GET requests to /hello reply with a body of 'Hello'
mock.when('GET /hello').reply('Hello')

# For the next POST or PUT to /somewhere, simulate a BAD REQUEST status code
mock.when('(POST|PUT) /somewhere').reply(status=400)

# For the next request (any method, any URL) respond with some JSON data
mock.reply('{"temperature": 23}', headers={'Content-Type': 'application/json'})

# Point your app to the pretender's URL, and exercise it
set_service_url(mock.pretend_access_point)  # how you do this is app-specific

# ... run stuff

# Verify requests your code made
r = mock.get_request(0)
assert_equal(r.method, 'GET')
assert_equal(r.url, '/weather?city=barcelona')

Similarly, a test for an application that sends e-mails, by mocking the SMTP server:

from pretenders.client.smtp import SMTPMock

# Create a mock smtp service
smtp_mock = SMTPMock('localhost', 8000)

# Get the port number that this is faking on and assign as appropriate to the
# system being tested (how you do this will again depend on your application)
set_smtp_host_and_port("localhost", smtp_mock.pretend_port)

# ...run functionality that should cause an email to be sent

# Check that an email was sent
email_message = smtp_mock.get_email(0)
assert_equals(email_message['Subject'], "Thank you for your order")
assert_equals(email_message['From'], "foo@bar.com")
assert_equals(email_message['To'], "customer@address.com")
assert_true("Your order will be with you" in email_message.content)

Using Git submodules for unified build scripts

I have recently worked on unifying the build scripts for many of our software components at Glasses Direct.

In each component's repository, we typically have two files: build.sh (for developer builds, relying on an existing virtualenv) and jenkins-build.sh (for Jenkins builds, where the Jenkins job re-creates the virtualenv if it does not exist or if the requirements files have changed since the last build; we rely on MD5 checksums for that).

We already had sort-of-standardised scripts, but they were replicated in many repos, which defeated the point of having a single source for them. I decided to give git submodules a try, as a way to make code from one repo (our build scripts) easily accessible from others.

Adding the submodule

Our build scripts are (far from ideally) in a specific branch, buildtools, of a repo called libraries.

Here is how we add that submodule:

git submodule add git@github.com:glassesdirect/libraries.git

If we run git status we will see that git has added a .gitmodules file and a libraries directory.

Now we want to point to the desired branch:

cd libraries
git checkout buildtools
cd ..
git add libraries  # for the branch change

Once that is done, we can commit the changes.

The resulting scripts

Now the build scripts in each repo can look as simple as… build.sh:

#!/bin/bash
source libraries/buildtools/build-functions.sh
do_build lab_export_service

And jenkins-build.sh:

#!/bin/bash
source libraries/buildtools/build-functions.sh
do_jenkins_build "$@"

The bulk of our build scripts is now in libraries/buildtools/build-functions.sh. There we can have the basic building blocks as bash functions, such as:

function check_if_requirements_changed() {
   # Verify whether any of the pip requirements files has changed.
   # This may save us the creation of a virtual environment if all is equal.

   MD5FILE=".requirements.md5"
   newMD5=$(md5sum *requirements.txt | md5sum)
   # The checksum file does not exist on a first build; that counts as a change.
   oldMD5=$(cat "$MD5FILE" 2>/dev/null)

   if [ "${oldMD5}" == "${newMD5}" ]
   then
      REQS_CHANGED=0
   else
      echo "Requirements files changed."
      REQS_CHANGED=1
   fi
}

etc.
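
For readers who prefer Python, the same check could be sketched roughly as follows. This is not our build code. The function names are made up, the digest is computed differently from the md5sum pipeline (so the stored values are not interchangeable), and storing the new digest right after the comparison is an assumption, since that step is not shown in the excerpt above:

```python
import glob
import hashlib
import os


def requirements_digest(directory='.'):
    """Return one MD5 hex digest covering every *requirements.txt file."""
    digest = hashlib.md5()
    pattern = os.path.join(directory, '*requirements.txt')
    for path in sorted(glob.glob(pattern)):
        # Include the file name so that renaming a file counts as a change.
        digest.update(os.path.basename(path).encode('utf-8'))
        with open(path, 'rb') as handle:
            digest.update(handle.read())
    return digest.hexdigest()


def requirements_changed(directory='.', md5_file='.requirements.md5'):
    """Compare against the stored digest, then store the new one."""
    stored_path = os.path.join(directory, md5_file)
    new = requirements_digest(directory)
    old = None
    if os.path.exists(stored_path):
        with open(stored_path) as handle:
            old = handle.read().strip()
    with open(stored_path, 'w') as handle:
        handle.write(new)
    return old != new
```

A build script would call requirements_changed() once per build and re-create the virtualenv only when it returns True.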

Running PyPy in a virtualenv

This is a quick guide to running your Python/Django project on PyPy, the fast JIT-based Python interpreter, and optionally benchmarking it against CPython.

Install PyPy

Follow the clearly detailed instructions in: http://pypy.readthedocs.org

For example, for 32-bit Linux I used:

$ wget https://bitbucket.org/pypy/pypy/downloads/pypy-1.8-linux.tar.bz2
$ tar xvf pypy-1.8-linux.tar.bz2

This will create a folder named pypy-1.8, which matches Python 2.7.2:

$ ./pypy-1.8/bin/pypy --version
Python 2.7.2 (0e28b379d8b3, Feb 09 2012, 19:41:19)
[PyPy 1.8.0 with GCC 4.4.3]

Create your virtual environment

Install distribute and pip for pypy:

NOTE: These instructions no longer work, because
http://python-distribute.org is now a domain for sale.

$ # wget http://python-distribute.org/distribute_setup.py  ## OBSOLETE
$ wget https://raw.github.com/pypa/pip/master/contrib/get-pip.py
$ ./pypy-1.8/bin/pypy distribute_setup.py
$ ./pypy-1.8/bin/pypy get-pip.py

Make sure you have at least virtualenv 1.6, else it won’t work with PyPy:

$ virtualenv --version
1.6.4

Install virtualenvwrapper, and create a new virtual environment for your project (called projectx here):

$ ./pypy-1.8/bin/pip install virtualenvwrapper
$ mkvirtualenv --no-site-packages --distribute --python=/path/to/pypy-1.8/bin/pypy projectx-pypy
$ python --version
Python 2.7.2 (0e28b379d8b3, Feb 09 2012, 19:41:19)
[PyPy 1.8.0 with GCC 4.4.3]

Now we have a virtual environment named projectx-pypy, based on PyPy, that we can use just like any other virtualenv.

Set up your project

Install your project’s requirements as usual:

$ workon projectx-pypy
$ pip install -r requirements.txt -r test-requirements.txt

Benchmark if desired

Now I can easily compare timings between CPython and PyPy. I created another identical virtualenv based on CPython (2.7.2) and ran the same tests (one test case replicated 100000 times using nose test generators, just to have something time-consuming and CPU-intensive).
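
For illustration, a nose test generator of that kind could look roughly like this. The check itself is a made-up stand-in, not the project's real test. nose runs each yielded (callable, argument) tuple as a separate test:

```python
def check_square(n):
    # Stand-in check; the real test exercised project code.
    assert n * n == n ** 2


def test_many_times():
    # nose treats a generator test function as a factory: every yielded
    # (callable, argument) tuple is collected and run as its own test.
    for n in range(100000):
        yield check_square, n
```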

PyPy:

$ workon projectx-pypy
$ nosetests
...
----------------------------------------------------------------------
Ran 100000 tests in 8.624s

CPython:

$ workon projectx
$ nosetests
...
----------------------------------------------------------------------
Ran 100000 tests in 38.180s

Of course this is not a representative sample, just a simple test. In this one case, PyPy takes approximately 23% of the execution time of CPython (about 4.4 times faster). Not bad, huh?
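
For reference, the ratio between the two timings reported above can be worked out directly:

```python
pypy_seconds = 8.624
cpython_seconds = 38.180

ratio = pypy_seconds / cpython_seconds     # fraction of CPython's time
speedup = cpython_seconds / pypy_seconds   # how many times faster

print('PyPy needs {0:.1%} of the CPython time'.format(ratio))
print('That is a {0:.1f}x speedup'.format(speedup))
```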

The PyPy people have more comprehensive benchmarks in their speed center.

So give PyPy a try, and share your results.