Distributing Python Packages
Over the past weeks, you’ve been writing Python code and putting it into repositories.
At the same time, you’ve been installing code created by other people using pip.
Wouldn’t it be nice to be able to install your own code?
Or, for that matter, to share code you’ve written with others so that they can install it?
This brings us to the ideas of packages and distribution.
Python Packages
The code we’ve written so far has been pretty simple in structure.
We’ve limited ourselves to single Python files (modules) that provide symbols.
We’ve seen how we can make modules runnable by adding __main__ blocks, and how we can use these blocks to protect code we do not want to execute when we import one of our modules.
We’ve also seen how we can import symbols from the modules we write into a Python interpreter, or even into other Python modules we write.
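As a quick reminder, here is a minimal sketch of such a module. The file name greetings.py and the greet function are hypothetical, made up only for illustration:

```python
"""greetings.py: a hypothetical module used only for illustration."""


def greet(name):
    """Return a friendly greeting for ``name``."""
    return "Hello, {}!".format(name)


if __name__ == "__main__":
    # This block runs only when the file is executed directly
    # (python greetings.py), not when another module does `import greetings`.
    print(greet("world"))
```

Running python greetings.py prints the greeting, while import greetings only defines greet and leaves the printing alone.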
But what happens when our code gets more complex?
What happens when we have a number of modules, all of which are related?
We create packages.
Anatomy of a Package
A package is any folder that contains a file with the special name __init__.py:
portlets/
├── __init__.py
├── ad.py
├── collection.py
├── events.py
├── support.py
├── thisweekshighlights.py
└── yellowpurple.py
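The presence of a file called __init__.py in the directory called “portlets” above turns the directory into a Python package.
(Python 3.3 and later can also treat a directory without __init__.py as a “namespace package”, but that is a special case we won’t use here.)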
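The __init__.py rule is simple enough to check in code. The sketch below uses a hypothetical helper, is_regular_package (not part of any library), that looks for the __init__.py marker, and tries it on a scratch portlets directory created with tempfile:

```python
import os
import tempfile


def is_regular_package(path):
    """Return True if ``path`` is a directory that contains an __init__.py file."""
    return os.path.isdir(path) and os.path.isfile(os.path.join(path, "__init__.py"))


# Try it on a throwaway "portlets" directory.
root = tempfile.mkdtemp()
portlets = os.path.join(root, "portlets")
os.mkdir(portlets)
before = is_regular_package(portlets)  # False: no __init__.py yet

# Create an empty __init__.py, much like `touch portlets/__init__.py`.
open(os.path.join(portlets, "__init__.py"), "w").close()
after = is_regular_package(portlets)   # True: the marker file now exists

print(before, after)
```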
Your First Package
Let’s create a “package” here in class.
Fire up a terminal and create a folder called mypackage. In it, create
three files. When you’re done, your directory structure should look like this:
mypackage/
├── __init__.py
├── module1.py
└── module2.py
In each of the two files whose names start with module, add the following function
(substitute the name of the Python file for <modulename>):
def whoami():
    print("I am <modulename>")
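If you would rather script this setup than create the files by hand, here is one possible sketch using only the standard library. The helper name create_mypackage is our own invention, and the example builds the layout in a scratch directory so it won’t clobber anything; passing "." instead would create mypackage in your current directory:

```python
import os
import tempfile

# Source written into each module; {name} is filled in per file.
MODULE_TEMPLATE = '''def whoami():
    print("I am {name}")
'''


def create_mypackage(parent):
    """Create parent/mypackage with __init__.py, module1.py and module2.py."""
    pkg_dir = os.path.join(parent, "mypackage")
    os.mkdir(pkg_dir)
    # An empty __init__.py is enough to mark the directory as a package.
    open(os.path.join(pkg_dir, "__init__.py"), "w").close()
    for name in ("module1", "module2"):
        with open(os.path.join(pkg_dir, name + ".py"), "w") as f:
            f.write(MODULE_TEMPLATE.format(name=name))
    return pkg_dir


pkg_dir = create_mypackage(tempfile.mkdtemp())
print(sorted(os.listdir(pkg_dir)))  # ['__init__.py', 'module1.py', 'module2.py']
```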
Now, once you’ve saved all that, fire up a Python interpreter in the directory
that contains the mypackage directory:
>>> import mypackage
>>> dir()
['__builtins__', '__doc__', '__name__', '__package__', 'mypackage']
>>> dir(mypackage)
['__builtins__', '__doc__', '__file__', '__name__', '__package__', '__path__']
>>> import mypackage.module1
>>> dir(mypackage.module1)
['__builtins__', '__doc__', '__file__', '__name__', '__package__', 'whoami']
>>> mypackage.module1.whoami()
I am module1
>>> import mypackage.module2
>>> mypackage.module2.whoami()
I am module2
Once imported, the modules in the package, and the functions they define, are available to us through the package namespace.
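Note that import mypackage on its own does not load the modules inside it; each submodule becomes an attribute of the package only once it has been imported. The Python 3 sketch below demonstrates this with importlib on a throwaway copy of mypackage. Unlike the version above, its whoami returns a string instead of printing it, so the result is easy to check:

```python
import importlib
import os
import sys
import tempfile

# Recreate a tiny version of mypackage in a scratch directory.
parent = tempfile.mkdtemp()
pkg_dir = os.path.join(parent, "mypackage")
os.mkdir(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "module1.py"), "w") as f:
    f.write('def whoami():\n    return "I am module1"\n')

sys.path.insert(0, parent)      # make the scratch directory importable
importlib.invalidate_caches()   # ensure the freshly written files are seen

mypackage = importlib.import_module("mypackage")
before = hasattr(mypackage, "module1")  # False: the submodule is not loaded yet

importlib.import_module("mypackage.module1")
after = hasattr(mypackage, "module1")   # True: importing it binds it on the package

print(before, after)
print(mypackage.module1.whoami())
```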
Turn Data-Structures Into a Package
You’ve been building a collection of data-structure implementations in Python.
Let’s use that repository as a gateway to learning Python packaging and distribution.
Our first step is to turn the collection of Python modules we have into a package.
At the moment, your data-structures repository should look something like this:
heffalump:packaging cewing$ cd data-structures
heffalump:data-structures cewing$ tree .
.
├── README.md
├── hasher.py
├── linked_list.py
├── test_linked_list.py
├── ll_stack.py
├── queue.py
├── test_queue.py
├── requirements.txt
└── test_stack.py
To turn this collection of modules into a Python package, all we have to do
is add an __init__.py file to the repository.
However, it’s worth thinking for a moment about the larger issue of
distributing our package before we do so.
When you want to share your code with others, there are two basic ways to do so:
- Point them to your repository and say “go to it”.
- Provide them with a distribution of your code that they can install.
Clearly, the second option is the friendlier of the two.
But when you do this, it’s not great to have your program code and your tests mashed together in one big blob.
So let’s fix that for our package by separating the two. We’ll create a pair
of packages in our repository, one to hold the program code and one to
hold the tests. Then we’ll move all the program code into the first and all the
tests into the second:
heffalump:data-structures cewing$ mkdir data_structures
heffalump:data-structures cewing$ mkdir tests
heffalump:data-structures cewing$ git add data_structures tests
...
heffalump:data-structures cewing$ git commit -m "converting to package"
...
heffalump:data-structures cewing$ git mv test*.py tests/
heffalump:data-structures cewing$ git mv *.py data_structures/
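The order of those two git mv commands matters: once test*.py has been moved into tests/, the *.py glob in the second command can no longer match any test files. The sketch below imitates the two globs with Python’s fnmatch module on the file names from the earlier listing. It only computes which files each command would pick up; it does not move anything:

```python
import fnmatch

# File names from the repository listing above.
files = [
    "README.md", "hasher.py", "linked_list.py", "test_linked_list.py",
    "ll_stack.py", "queue.py", "test_queue.py", "requirements.txt",
    "test_stack.py",
]

# First command: git mv test*.py tests/
to_tests = fnmatch.filter(files, "test*.py")
remaining = [f for f in files if f not in to_tests]

# Second command: git mv *.py data_structures/
# The tests are already gone, so *.py now matches only program code.
to_package = fnmatch.filter(remaining, "*.py")

print(to_tests)    # ['test_linked_list.py', 'test_queue.py', 'test_stack.py']
print(to_package)  # ['hasher.py', 'linked_list.py', 'll_stack.py', 'queue.py']
```

Running the commands in the opposite order would sweep the test modules into data_structures along with everything else.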
Your directory structure should look something like this:
heffalump:data-structures cewing$ tree .
.
├── README.md
├── data_structures
│   ├── hasher.py
│   ├── linked_list.py
│   ├── ll_stack.py
│   └── queue.py
├── requirements.txt
└── tests
    ├── test_linked_list.py
    ├── test_queue.py
    └── test_stack.py
Then, add an __init__.py
file to each of our new sub-directories, turning
them into Python packages
:
heffalump:data-structures cewing$ touch data_structures/__init__.py tests/__init__.py
heffalump:data-structures cewing$ tree .
.
├── README.md
├── data_structures
│   ├── __init__.py
│   ├── hasher.py
│   ├── linked_list.py
│   ├── ll_stack.py
│   └── queue.py
├── requirements.txt
└── tests
    ├── __init__.py
    ├── test_linked_list.py
    ├── test_queue.py
    └── test_stack.py
So now we have two packages, but we’ve created a new problem: the
code in our tests no longer has the correct imports to reach the program code it
is designed to test. Try it:
heffalump:data-structures cewing$ py.test
...
__________________ ERROR collecting tests/test_linked_list.py __________________
tests/test_linked_list.py:1: in <module>
> import linked_list as LL
E ImportError: No module named linked_list
...
heffalump:data-structures cewing$
The answer to our problem is to turn this set of packages into a distribution.
Python Distributions
Packaging and distribution in Python is a contentious issue. Debates rage over the right way. More than one strong-hearted developer has been broken on the rocks of trying to establish a standard that works.
Luckily, the Python Packaging Authority now maintains the Python Packaging User Guide, which sets out a standard that points to the future.
You can simply follow it to ensure that you are doing “the right thing™”.
Distutils and Setuptools
A distribution is defined by the packaging glossary as:
A Python distribution is a versioned archive file that contains Python
packages, modules, and other resource files that are used to distribute a
Release. The distribution file is what an end-user will download from the
internet and install.
Distribution of Python packages was first established via a standard library module called distutils. However, as packaging needs grew more complex, the limitations of that code led to the creation of a new library that extends it: setuptools. (distutils itself has since been deprecated and was removed from the standard library in Python 3.12.)
Both of these libraries are built around a file called setup.py,
which is responsible for establishing a set of metadata about a distribution
and the code it contains.
This file contains two main Python statements: an import statement that pulls
the setup function into the module namespace, and a call to that function,
which builds the package metadata.
Creating a Distribution
Let’s add such a file to our project code base:
heffalump:data-structures cewing$ touch setup.py
heffalump:data-structures cewing$ ls -1
README.md
requirements.txt
tests
data_structures
setup.py
Then open the setup.py file in your editor and add the following code:
from setuptools import setup
long_description = """
This is a package that provides some basic data structures implemented in
Python.
"""
Continue adding to the same setup.py file:
setup(
    name="data-structures",
    version="0.1-dev",
    description="Basic Data Structures",
    long_description=long_description,
    # The project URL.
    url='http://github.com/<yourname>/data-structures',
    # Author details
    author='<Your Name>',
    author_email='<your.email@domain.com>',
    # Choose your license
    # and remember to include the license text in a 'docs' directory.
    # license='MIT',
    # Only the program code is distributed; the tests package is left out.
    packages=['data_structures'],
    install_requires=['setuptools'],
)
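The setup() call above lists its packages by hand, which is simple and explicit. If a project grows more packages, setuptools also offers find_packages(), which discovers them for you. The sketch below is an alternative to, not part of, the setup.py shown here: it rebuilds the two-package layout in a scratch directory and shows that excluding the tests leaves only data_structures:

```python
import os
import tempfile

from setuptools import find_packages

# Recreate the two-package layout in a scratch directory.
root = tempfile.mkdtemp()
for pkg in ("data_structures", "tests"):
    os.mkdir(os.path.join(root, pkg))
    open(os.path.join(root, pkg, "__init__.py"), "w").close()

# Find every package under root, but keep the tests out of the distribution.
found = find_packages(where=root, exclude=["tests", "tests.*"])
print(found)  # ['data_structures']
```

In setup.py itself you would write packages=find_packages(exclude=["tests", "tests.*"]), since setup.py runs from the project root.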
Installing
This turns our data-structures repository into an installable distribution.
The distribution will provide one Python package, called data_structures.
Let’s install our package into a virtualenv and try it out.
First, we build the virtualenv and install pytest:
heffalump:data-structures cewing$ mkvirtualenv dsenv
New python executable in dsenv/bin/python
Installing setuptools, pip...done.
[dsenv]
heffalump:data-structures cewing$ pip install pytest
...
[dsenv]
heffalump:data-structures cewing$
Then we execute the setup.py script with the install subcommand:
[dsenv]
heffalump:data-structures cewing$ python setup.py install
running install
...
Using /Users/cewing/virtualenvs/dsenv/lib/python2.7/site-packages
Finished processing dependencies for data-structures==0.1-dev
[dsenv]
heffalump:data-structures cewing$
Finally, we can fire up Python and import from our installed package:
>>> import data_structures
>>> from data_structures.linked_list import LinkedList
>>>
However, we still can’t run our tests:
[dsenv]
heffalump:data-structures cewing$ py.test
...
__________________ ERROR collecting tests/test_linked_list.py __________________
tests/test_linked_list.py:1: in <module>
> import linked_list as LL
E ImportError: No module named linked_list
...
[dsenv]
heffalump:data-structures cewing$
The reason for this has to do with the form of our import. We are trying to
import the linked_list.py module as if it were a top-level module, but it
isn’t. It’s actually a submodule in the namespace of the data_structures
package, so it has to be imported as data_structures.linked_list.
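The difference between the two import forms can be reproduced in a few lines. This Python 3 sketch builds a stand-in data_structures package in a scratch directory, with a placeholder LinkedList class rather than the real implementation, and tries both imports with importlib:

```python
import importlib
import os
import sys
import tempfile

# Hypothetical stand-in for the repository layout: data_structures/linked_list.py
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "data_structures")
os.mkdir(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "linked_list.py"), "w") as f:
    f.write("class LinkedList(object):\n    pass\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

# The old test import treats linked_list as a top-level module.
try:
    importlib.import_module("linked_list")
    top_level_found = True
except ImportError:
    top_level_found = False

# The corrected import names the package that contains the module.
LL = importlib.import_module("data_structures.linked_list")

print(top_level_found)         # False
print(LL.LinkedList.__name__)  # LinkedList
```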
We can fix that in our test_linked_list.py file. Edit that file so that we
import our symbols from the right place (and make the same kind of change to
the imports in test_queue.py and test_stack.py):
from data_structures import linked_list as LL


def test_init_LinkedList():
    """Assert that a LinkedList object is instantiated."""
    linked_list = LL.LinkedList()
    assert isinstance(linked_list, LL.LinkedList)
And now, our tests work:
heffalump:data-structures cewing$ py.test
============================= test session starts ==============================
platform darwin -- Python 2.7.5 -- py-1.4.20 -- pytest-2.5.2
collected 12 items
tests/test_linked_list.py ............
...
===================== 0 failed, 12 passed in 0.15 seconds ======================
Developing
Installing a package with python setup.py install is great when you are
working with a finished product.
But your code here is not finished.
Installing makes copies of your code in
$VIRTUAL_ENV/lib/python2.x/site-packages,
so if you make changes to the code in your project directory, they won’t be picked up until you install again.
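If you are not sure where that directory is for the interpreter you are using, the standard library’s sysconfig module can tell you. This is a small sketch; the exact path it prints will vary with your platform, Python version, and whether a virtualenv is active:

```python
import sysconfig

# "purelib" is the directory where pure-Python packages are installed for
# the interpreter running this code; inside an activated virtualenv it is
# normally that environment's site-packages directory.
purelib = sysconfig.get_paths()["purelib"]
print(purelib)
```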
Setuptools provides a fix for this by letting you install packages in
“development mode”.
To do so, simply run the setup.py file with a different subcommand,
develop:
[dsenv]
$ python setup.py develop
This creates a link to your code in site-packages, but leaves the code
in place.
You can keep working and be sure that what you are importing is always the latest version.
(With current tools, pip install -e . performs the same kind of editable install.)
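To confirm which copy of a package Python will import, you can ask the import system where it would load the module from. The where_is helper below is our own hypothetical name, and the sketch assumes Python 3.4 or later for importlib.util.find_spec:

```python
import importlib.util
import os


def where_is(module_name):
    """Return the directory ``module_name`` would be imported from, or None.

    None means the module can't be found, or it has no location on disk
    (for example, modules built into the interpreter).
    """
    spec = importlib.util.find_spec(module_name)
    if spec is None or not spec.has_location:
        return None
    return os.path.dirname(spec.origin)


print(where_is("data_structures"))
```

After python setup.py develop, where_is("data_structures") should point at the data_structures folder in your checkout; after python setup.py install, it should point inside site-packages; if the package isn’t importable at all, it returns None.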