23rd of July, 2020: Following this writeup, I ran across Pipenv. This is a newer, and perhaps better, way to maintain Python environments. Unlike virtualenv, where we generate a requirements.txt file as an afterthought when needed, Pipenv uses a Pipfile (a list of packages, much like a requirements.txt file) to generate the environment, which means you have a portable descriptor of your environment at all times. The separation between first-order and higher-order dependencies is also much more clearly delineated in Pipenv, as is the separation between development and production dependencies. Read more about it at the Pipenv home page. Thank you Ken Reitz and all the contributors.

Inspired by my friend’s solution to my problem. Check out his blog post for a more detailed implementation of the idea for Mac/Linux users using virtualenv.

The Context

Let’s start with some context. I do my work on two computers: my laptop and my desktop at work. I keep things synchronised by using GitHub as an intermediary. I also use Python virtual environments (using venv from the standard library) to keep the different sub-projects in my PhD isolated and to keep me out of dependency hell. Instead of committing gigabytes of virtual envs to GitHub, I just commit the requirements.txt file with the package details and use pip to generate or update the virtual envs if and when I need them.

One of the things that kept this setup from being seamless was that I had to manually regenerate the requirements.txt file before every commit to make sure it was up to date with whatever changes I had made to the virtual env locally. I was talking to my friend Janith about this, and he came up with the following setup to automatically generate the requirements.txt file whenever pip is run. Check his blog post for a more comprehensive breakdown of the inner workings of pip and virtual environments. His post is primarily aimed at Mac/Linux users using virtualenv, whereas this one uses venv on Windows.

I am using Python version 3.6 and pip version 20. The root path for the virtual environment I’ll be using is /.venv, created using python -m venv .venv on Windows.

The Fix

For those unaware, venv creates a copy of your system’s Python installation. This includes its own copy of pip, the Python package manager. On Windows, when you run pip it invokes .venv\Scripts\pip.exe, so we have to dig a bit deeper to get to a place where we can inject our code. pip.exe seems to run .venv\Lib\site-packages\pip\_internal\cli\main.py. It looks something like this.

"""Primary application entrypoint.
"""
from __future__ import absolute_import

import locale
import logging
import os
import sys

from pip._internal.cli.autocompletion import autocomplete
from pip._internal.cli.main_parser import parse_command
from pip._internal.commands import create_command
from pip._internal.exceptions import PipError
from pip._internal.utils import deprecation
from pip._internal.utils.typing import MYPY_CHECK_RUNNING

if MYPY_CHECK_RUNNING:
    from typing import List, Optional

logger = logging.getLogger(__name__)

def main(args=None):
    # type: (Optional[List[str]]) -> int
    if args is None:
        args = sys.argv[1:]

    # Configure our deprecation warnings to be sent through loggers
    deprecation.install_warning_logger()

    autocomplete()

    try:
        cmd_name, cmd_args = parse_command(args)
    except PipError as exc:
        sys.stderr.write("ERROR: {}".format(exc))
        sys.stderr.write(os.linesep)
        sys.exit(1)

    # Needed for locale.getpreferredencoding(False) to work
    # in pip._internal.utils.encoding.auto_decode
    try:
        locale.setlocale(locale.LC_ALL, '')
    except locale.Error as e:
        # setlocale can apparently crash if locale are uninitialized
        logger.debug("Ignoring error %s when setting locale", e)
    command = create_command(cmd_name, isolated=("--isolated" in cmd_args))
    return command.main(cmd_args)

Let’s try messing with it by changing the end of the file to look like this…

    ...
    print("Look ma! I'm in pip!")
    return command.main(cmd_args)

… and running pip install badpackage gets you something like this…

autogenreq/op1.png

Now that we know the flow of the code, let’s get to work. We are going to inject our own bit of code into the file to generate requirements.txt once pip has finished its own work successfully.

We are going to replace the last line of that file, return command.main(cmd_args), with the following code. It goes inside main(), so it keeps the same indentation as the line it replaces.

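    #1 Keep pip's exit status instead of returning it straight away.
    main_cmd_status = command.main(cmd_args)

    #2 Only act after a successful install or uninstall.
    if main_cmd_status == 0 and cmd_name in ('install', 'uninstall'):
        #3 Walk up from pip\_internal to the directory that contains .venv.
        req_file_name = "requirements.txt"
        req_file_path = os.path.dirname(os.path.dirname(__file__))
        for i in range(5):
            req_file_path = os.path.split(req_file_path)[0]
        req_file_path = os.path.join(req_file_path, req_file_name)

        #4 Capture anything printed to stdout in an in-memory buffer.
        req_file_data = io.StringIO()
        with redirect_stdout(req_file_data):
            #5 Run pip freeze through pip's internal API.
            freeze_command = create_command('freeze')
            freeze_cmd_status = freeze_command.main([])

        #6 Write the captured freeze output to the requirements file.
        with open(req_file_path, 'w') as req_file:
            req_file.write(req_file_data.getvalue())

    #7 Hand the original exit status back to the caller.
    return main_cmd_status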

We also need two extra imports, so add the following lines to the top of the file.

import io
from contextlib import redirect_stdout

Now when you run pip install or pip uninstall, it will automatically generate the requirements.txt file. Let’s go through how it works. I’ve numbered the code so you can refer back to the exact lines that do a particular thing.

  1. We stop command.main from returning immediately by changing return command.main(cmd_args) to main_cmd_status = command.main(cmd_args).
  2. Then we check whether pip was called to install or uninstall something and whether that command ended successfully. That way we only run after successful pip install or pip uninstall calls, and not for something like pip list.
  3. If the above condition is true, we set the requirements.txt file path to be in the parent of .venv (i.e. the directory where you ran python -m venv .venv).
  4. We redirect stdout so we can save the output from pip freeze to a file without using a subprocess call.
  5. Then we invoke the pip freeze command by using pip’s internal API.
  6. We take the captured output from pip freeze and write it to the requirements file.
  7. Finally, we return the result of the original command so we don’t break anything upstream.
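
Steps 3, 4 and 6 can be tried outside pip. The following is a minimal sketch, not part of the patch itself: requirements_path, capture_stdout and fake_freeze are hypothetical names, and the example path and package pins are made up. It assumes the Windows venv layout described above and uses PureWindowsPath so that it behaves the same on any operating system.

```python
import io
from contextlib import redirect_stdout
from pathlib import PureWindowsPath


def requirements_path(main_py, file_name="requirements.txt"):
    # main_py is assumed to end in .venv/Lib/site-packages/pip/_internal/cli/main.py
    # (written with backslashes on Windows).
    # parents[0] is cli, [1] _internal, [2] pip, [3] site-packages,
    # [4] Lib, [5] .venv, and [6] the directory that contains .venv.
    return PureWindowsPath(main_py).parents[6] / file_name


def capture_stdout(func):
    # Call func() and return everything it printed as a single string.
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        func()
    return buffer.getvalue()


def fake_freeze():
    # Hypothetical stand-in for pip freeze, which prints one pin per line.
    print("requests==2.24.0")
    print("numpy==1.19.0")


if __name__ == "__main__":
    main_py = r"C:\work\phd\.venv\Lib\site-packages\pip\_internal\cli\main.py"
    print(requirements_path(main_py))
    print(capture_stdout(fake_freeze), end="")
```

Indexing parents[6] does the same job as the two dirname calls followed by five split calls in the patch; the patch sticks to os.path because main.py already imports os.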

And that’s it! When you run pip install successfully, pip will create an updated requirements.txt file in the directory that contains the virtual environment.

However, on Windows, at the moment, you have to run the pip install command twice for this to happen. Because the exact functionality of pip is hidden in pip.exe, I don’t know what it does after running the user’s command without disassembling it. It seems that whatever happens afterwards is responsible for updating the package list. This means that, until .venv\Lib\site-packages\pip\_internal\cli\main.py exits, any changes made to the environment from inside the script won’t propagate out. For example, if I run pip list from inside that file after successfully installing a package, it shows the old package list. I’m not entirely sure how to get around this at the moment. However, if the package is already installed, the second invocation of pip install should detect that and only update the requirements file.
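
One untested possibility, offered only as a guess: if the stale list comes from pip reading the installed packages once per process, then running pip freeze in a fresh interpreter might see the new package on the first run. The sketch below shows that idea using only the standard library. run_and_capture and fresh_freeze are hypothetical helper names, and it does use a subprocess, which the patch above avoids.

```python
import subprocess
import sys


def run_and_capture(command):
    # Run command in a new process and return its standard output as text.
    completed = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # text mode; also available on Python 3.6
        check=True,               # raise CalledProcessError on a non-zero exit
    )
    return completed.stdout


def fresh_freeze():
    # Assumption: a new interpreter re-reads site-packages and so sees the
    # package the parent process just installed. Not verified on Windows.
    return run_and_capture([sys.executable, "-m", "pip", "freeze"])
```

If this works, the string returned by fresh_freeze() would take the place of req_file_data.getvalue() in step 6. The child process runs freeze, which is neither install nor uninstall, so the check in step 2 should keep the patched main.py from calling itself again.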

You can check Janith’s blog post for something that works on Mac/Linux without needing to run the command twice.

As you might expect, this would stop working the next time you update pip, since it’ll overwrite all the files.