Python Packaging

Neil Shephard
18 September 2023 13:00

Python packaging is in a constant state of flux. The official Python Packaging User Guide from the Python Packaging Authority (PyPA) is probably the best resource to read, but things change, and often quickly. The focus here is on PyPA's Setuptools using pyproject.toml, which works with Python >= 3.7, but you may wish to consider other packages such as Poetry or PDM, which offer some advantages but come with additional frameworks to learn.

A few examples of Python packages that I have packaged are listed below; most have also been released to PyPI.

Package Structure

You should place your code within a Git version-controlled directory for your project. It is then normal to organise files in a hierarchy, with the Python code in a sub-directory of the same name as the package (known as a "flat" structure) and tests under a tests directory. It is possible to have more than one directory containing code, but for now I'm sticking to the flat structure.

.
    ├── ./build
    ├── ./dist
    ├── ./pyproject.toml
    ├── ./my_package
    ├── ./my_package/__init__.py
    ├── ./my_package/module_a.py
    ├── ./my_package/module_b.py
    ├── ./my_package/something/module_c.py
    └── ./tests
        ├── ./tests/conftest.py
        ├── ./tests/resources
        ├── ./tests/test_module_a.py
        ├── ./tests/test_module_b.py
        └── ./tests/something/test_module_c.py

__init__.py

In older versions of Python (<3.3) an __init__.py was required in every directory and sub-directory that was to be a module/sub-module. In more recent versions of Python (>=3.3) they are not essential, as Python uses namespace packages, but in most cases it's simpler to include such a file in the top level of your package directory. __init__.py files can be completely empty or they can contain code that is used throughout your package, such as setting up a logger.
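
For example, a minimal __init__.py for the layout above might look like the following sketch (the NullHandler pattern is just one common approach for libraries, not something packaging requires):

"""Top-level module for my_package."""
import logging

# Attach a do-nothing handler so importing the library never complains about
# missing logging configuration; applications add their own handlers/levels.
logger = logging.getLogger(__name__)
logger.addHandler(logging.NullHandler())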

Configuration pyproject.toml

Package configuration has been, and is, in a state of flux. There was originally setup.py, which was then complemented and gradually replaced by setup.cfg. The new method being standardised is pyproject.toml which, with a little tweaking and a judicious choice of packages, can handle everything.

Setuptools is shifting towards using pyproject.toml and, whilst this support is still under development, it is already highly functional. pyproject.toml is written in Tom's Obvious Minimal Language (TOML) and isn't too dissimilar in structure to setup.cfg.

A useful reference for writing your configuration in pyproject.toml is Configuring setuptools using pyproject.toml files. It is based around PEP 621 (Storing project metadata in pyproject.toml).

A bare-bones pyproject.toml file should reside in the top level of your directory and contain the following (NB this includes the minimum versions and the setuptools_scm extension for dynamically setting the package version)…

build-system

[build-system]
requires = ["setuptools>=65.6.3", "setuptools_scm[tools]>=6.2", "wheel"]
build-backend = "setuptools.build_meta"

project

This is the main body of the project description detailing name, authors, description, readme, license, keywords, classifiers, dependencies and version amongst other things.

license : The type of license you have chosen to apply to your package. For guidance see Choose an Open Source License.

readme : The README of your package, which may be in Markdown or reStructuredText.

dynamic : Sets the components of your package which are set dynamically. In this example we only set the version dynamically, using setuptools_scm.

dependencies : Those that are required for running the code. They should not include packages that are required for development (e.g. black, flake8, ruff, pre-commit, pylint etc.), nor those required for testing (e.g. pytest, pytest-regtest, pytest-cov etc.) or documentation (e.g. Sphinx, numpydoc, sphinx_markdown_tables, sphinx-autodoc-typehints, sphinxcontrib-mermaid etc.), as these are defined in a separate section.

version : Note that the version is not explicitly stated; rather it is declared as being dynamic.

[project]
name = "my_package"
authors = [
  {name = "Author 1", email="author1@somewhere.com"},
  {name = "Author 2", email="author2@somewhere.com"},
  {name = "Author 3", email="author3@somewhere.com"},
]
description = "A package that does some magic!"
license = "GNU GPLv3 only"
readme = "README.md"
dynamic = ["version"]
dependencies = [
  "numpy",
  "pandas",
  "tqdm",
]

All other sections are considered subsections, either of project or of tool, and are defined under their own heading of the form [project|tool].<package>[.<options>].

project.urls

These are important as they define where people can find the Source, Documentation and Bug_Tracker amongst other things. There may be more fields that can be configured here but I've not used them yet. Substitute these to reflect where your package is hosted, your username and the package name.

[project.urls]
Source = "https://gitlab.com/username/my_package"
Bug_Tracker = "https://gitlab.com/username/my_package/issues"
Documentation = "https://username.gitlab.com/my_package"

project.optional-dependencies

This is where you list dependencies that are not required for running the package but are required for different aspects such as development, documentation, publishing to PyPI, additional Notebooks and so forth; the options are limitless.

[project.optional-dependencies]
dev = [
  "black",
  "flake8",
  "Flake8-pyproject",
  "pre-commit",
  "pylint",
  "ruff",
]
docs = [
  "Sphinx",
  "myst-parser",
  "numpydoc",
  "pydata_sphinx_theme",
  "sphinx-autodoc-typehints",
  "sphinx_markdown_tables",
  "sphinxcontrib-mermaid",
]
pypi = [
  "build",
  "pytest-runner",
  "setuptools-lint",
  "setuptools_scm",
  "twine",
  "wheel"
]
test = [
  "pytest",
  "pytest-cov",
  "pytest-mpl",
]
notebooks = [
  "ipython",
  "ipywidgets",
  "jupyter_contrib_nbextensions",
  "jupyterthemes",
]
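
When developing locally these extras can be installed alongside the package itself, for example with an editable install of the dev and test groups defined above:

pip install -e ".[dev,test]"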

project.scripts (Entry Points)

Entry points, or scripts, are a neat method of providing a simple command line interface to your package: they link a command name directly to a specific function in one of your modules.

These are defined under the project.scripts section.

[project.scripts]
tcx2gpx = "tcx2gpx:process"
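
Here the name on the left (tcx2gpx) becomes the command users type, and the value on the right is the module:function it calls. A minimal sketch of such a target function (the argparse handling below is purely illustrative and not taken from the real tcx2gpx package) might be:

"""tcx2gpx/__init__.py (or wherever the entry point points)."""
import argparse

def process() -> None:
    """Entry point: parse command line arguments and run the conversion."""
    parser = argparse.ArgumentParser(description="Convert a TCX file to GPX.")
    parser.add_argument("tcx_file", help="Path to the TCX file to convert.")
    args = parser.parse_args()
    print(f"Converting {args.tcx_file}")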

tool

tool.setuptools

setuptools is perhaps the most common package for building and configuring Python packages and is the one covered here. Its configuration is multi-level, depending on which component you are configuring.

tool.setuptools.packages.find

This uses the find utility to search for packages to include; based on my understanding it looks for directories containing an __init__.py and includes them (see the note above about these no longer being required in every directory). Typically you would want to exclude tests/ from the package you are making, as most users won't need to run the test suite (if they do they would clone the source repository).

[tool.setuptools.packages.find]
where = ["."]
include = ["my_package*"]
exclude = ["tests*"]

tool.setuptools.package-data

This allows additional, non-.py files to be included. They are listed on a per-package basis: each key is a package name and each value is an array (in TOML parlance; a list in Python terms) of glob patterns.

[tool.setuptools.package-data]
my_package = ["*.yaml", "*.json"]
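
Once bundled, such files can be read at run time regardless of where the package is installed, for example with importlib.resources (Python >= 3.9; the default_config.yaml filename below is purely illustrative):

from importlib.resources import files

# Read a YAML file that was included via package-data
config_text = files("my_package").joinpath("default_config.yaml").read_text()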

tool.pytest

[tool.pytest.ini_options]
minversion = "7.0"
addopts = "--cov --mpl"
testpaths = [
    "tests",
]
filterwarnings = [
    "ignore::DeprecationWarning",
    "ignore::UserWarning"
]
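
With the test extra installed the suite can then be run by simply invoking pytest, which picks up these options automatically:

pip install -e ".[test]"
pytest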

tool.black

[tool.black]
line-length = 120
target-version = ["py38", "py39", "py310", "py311"]
exclude = '''

(
  /(
      \.eggs         # exclude a few common directories in the
    | \.git          # root of the project
    | \.venv
  )/
)
'''

tool.flake8

The developers of Flake8 will not be supporting pyproject.toml for configuration. This is a shame, but a workaround is available in the form of Flake8-pyproject. Make sure to add this to your dev dependencies so that it is installed when people use pre-commit.

[tool.flake8]
ignore = ['E231', 'E241']
per-file-ignores = [
    '__init__.py:F401',
]
max-line-length = 120
count = true

tool.setuptools_scm

setuptools_scm is a simple to use extension to setuptools that dynamically sets the package version based on git tags and the version control data. It is important to note that by default setuptools_scm will attempt to bump the version of the release. The following configuration forces the use of the current git tag. See below for more on versioning.

[tool.setuptools_scm]
write_to = "topostats/_version.py"
version_scheme = "post-release"
local_scheme = "no-local-version"
git_describe_command = "git describe --tags"

tool.ruff

ruff is a Python linter written in Rust, which makes it very fast. It aims to replace tools such as flake8 and pylint (and, increasingly, black) and can auto-correct many issues if configured to do so. A GitHub Action is also available. I'd recommend checking it out.

[tool.ruff]
fixable = ["A", "B", "C", "D", "E", "F", "R", "S", "W", "U"]
unfixable = []
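
Assuming ruff is installed (it is listed in the dev extra above), the whole project can be checked, and safe fixes applied, with:

ruff check --fix .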

Versioning

Typically the version is defined in the __version__ variable in the top-level __init__.py, or as a value in [metadata] of setup.cfg or [project] of pyproject.toml. This has some downsides: you have to remember to update the string manually when you are ready for a release, and it doesn't tie in with using Git tags to mark versions of your commits.

It is worth taking a moment to read about and understand Semantic Versioning, which you are likely to use when tagging versions of your software to work with setuptools_scm.
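
For example, if the next release were to be v0.1.2 (an arbitrary number chosen for illustration), creating and pushing an annotated tag is all that is required for setuptools_scm to derive the new version:

git tag -a v0.1.2 -m "Release v0.1.2"
git push --tags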

Setuptools-scm

setuptools_scm is simpler to set up and use than versioneer as it relies solely on configuration via pyproject.toml rather than depending on the now-deprecated setup.py.

As shown above you should have set the minimum versions "setuptools>=65.6.3" and "setuptools_scm[toml]>=6.2" in build-system, declared dynamic = ["version"] under project and set write_to = "pkg/_version.py" under tool.setuptools_scm (NB substitute pkg for your package directory, whether it's src or the package name).

[build-system]
requires = ["setuptools>=65.6.3", "setuptools_scm[toml]>=6.2"]

[project]
dynamic = ["version"]

[tool.setuptools_scm]
write_to "pkg/_version.py"
version_scheme = "post-release"
local_scheme = "no-local-version"
git_describe_command = "git describe --tags"
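
If you also want the version available as my_package.__version__ at run time, one simple approach (a sketch, not the only option) is to look it up from the installed metadata in the top-level __init__.py:

from importlib.metadata import version

# Expose the dynamically generated version at run time
__version__ = version("my_package")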

Including Version in Sphinx Documentation

If you have Sphinx documentation you can add the following to docs/conf.py

from importlib.metadata import version

# The full version, including any suffixes such as .post1.dev4
release = version("my_package")
# The short X.Y version
version = ".".join(release.split(".")[:2])

Building your Package

Generate Distribution Archive

In your package directory you can create a distribution of your package with the latest versions of setuptools and wheel. To do this, make sure the build-system table of your pyproject.toml (shown again below) is in place, then run the commands that follow in your virtual environment. The documentation for this is at Building and Distributing Packages with Setuptools.

[build-system]
requires = [
  "setuptools >= 65.6.3",
  "wheel",
]
build-backend = "setuptools.build_meta"

The package can now be built locally with…

python -m pip install --upgrade build setuptools wheel
python -m build --no-isolation

…and the resulting package will be generated in the dist/ directory.

Publishing to PyPI

Before pushing the package to the main PyPI server it is prudent to test things out on TestPyPI first. You must first generate an API token from your account settings page. It needs a name, and the scope should be "Entire account (all projects)". This token will be shown only once, so do not navigate away from the page until you have copied it.

You use twine to upload the package and should create a .pypirc file in the root of the package directory that contains your API token and the username __token__ (take care not to commit this file to version control). For the TestPyPI server it has the following format.

[testpypi]
  username = __token__
  password = pypi-dfkjh9384hdszfkjnkjahkjfhd3YAJKSHE0089asdf0lkjsjJLLS_-0942358JKHDKjhkljna39o854yurlaoisdvnzli8yw459872jkhlkjsdfkjhasdfadsfasdf

Once this is in place you are ready to use twine to upload the package using the configuration file you have just created.

twine upload --config-file ./.pypirc --repository testpypi dist/*

Testing Download

After having uploaded your package to the TestPyPI server you should create a clean virtual environment and try installing the package from where you have just uploaded it. You can do this using pip with the --index-url and --extra-index-url options; the former installs your package from TestPyPI, the latter installs dependencies from PyPI.

pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ your-package

Once installed you can try running the code, scripts or notebooks associated with the package as you would normally.

Repeat for PyPI

Once you are happy this is working you can repeat the process on the main PyPI server. You can add the token that you generate to ./.pypirc under a separate section.

[testpypi]
  username = __token__
  password = pypi-dfkjh9384hdszfkjnkjahkjfhd3YAJKSHE0089asdf0lkjsjJLLS_-0942358JKHDKjhkljna39o854yurlaoisdvnzli8yw459872jkhlkjsdfkjhdfJZZZZZF
[pypi]
  username = __token__
  password = pypi-dfkjh9384hdszfkjnkjahkjfhd3YAJKSHE0089asdf0lkjsjJLLS_-0942358JKHDKjhkljna39o854yurlaoisdvnzli8yw459872jkhlkjsdfkjhdfJZZZZZF
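
The upload command is then the same as before, but pointing at the pypi section:

twine upload --config-file ./.pypirc --repository pypi dist/*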

GitHub Action

Manually uploading is somewhat time consuming and tedious. Fortunately, with setuptools_scm in place and tokens generated, we can automate the process of building and uploading packages to PyPI using the GitHub Action gh-action-pypi-publish (read more about GitHub Actions). You will have already generated a PyPI token (and similarly one for TestPyPI) and these can be stored on the project's GitHub repository under Settings > Secrets > Actions with the names PYPI_API_TOKEN and TEST_PYPI_API_TOKEN respectively. You can then add the following GitHub Action under .github/workflows/pypi.yaml.

name: Publish package to PyPI

on:
  push:
    tags:
      - v*

jobs:
  build-release:
    runs-on: ubuntu-latest
    name: Publish package to PyPI
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0
      - name: Setup Python
        uses: actions/setup-python@v4.3.0
        with:
          python-version: 3.9
          cache: 'pip'
      - name: Installing the package
        run: |
          pip3 install .
          pip3 install .[pypi]
      - name: Build package
        run: |
          python -m build --no-isolation
      - name: Publish package to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
        with:
          user: __token__
          password: ${{ secrets.PYPI_API_TOKEN }}

Releasing via GitHub

With setuptools_scm in place and a GitHub Action set up and configured, it is now possible to make a release to PyPI via GitHub Releases.

  1. Go to the Releases page (it's linked from the right-hand side of the front page).
  2. Draft a New release.
  3. Create a new tag using semantic versioning and select “Create new tag v#.#.# on publish”.
  4. Click the “Generate Release Notes” button; this adds the titles of all Pull Requests since the last release. I'll often remove these but leave the link to the ChangeLog that is generated for the release.
  5. Write your release notes.
  6. Select “Set as latest release”.
  7. Select “Create a discussion for this release” and select “Announcements”.
  8. Click on “Publish Release”.

Packaging Frameworks

There are some frameworks that are meant to ease the pain of this process. I'm yet to test these for two reasons: firstly, I wanted to understand what is going on rather than have a framework do it for me; secondly, each is an additional framework to learn.

PDM

PDM (Python package and Dependency Manager) handles all stages of setting up and creating a package and managing its dependencies. In essence it's a tool for interactively generating the configuration files described above. I've not yet used it.

Poetry

Poetry is another package for managing packaging and dependencies. Again, I've not yet used it.
