Debugging a thread-related test failure

Recently I’ve been working on adding support for using nose to run the CommCare HQ test suite. Since CommCare HQ is a Django project, I’m using django-nose. This has been a tedious process of running the tests repeatedly and debugging each test failure encountered along the way. Typically on running the tests, I would see output like this:

$ ./manage.py test --settings=testsettings -x
nosetests --with-fixture-bundling --logging-clear-handlers -x --verbosity=1
...(dot for each passing test here)
Creating test database for alias 'default'...
......E
======================================================================
ERROR: some error message related to a specific test
----------------------------------------------------------------------
Traceback (most recent call last):
  ...
SomeError: message

----------------------------------------------------------------------
Ran 953 tests in 99.163s

FAILED (SKIP=17, errors=1)

First let’s explore the options I’m using on the command line:

  • --settings=testsettings tells Django to use a custom settings file that, among other things has this line: TEST_RUNNER = 'django_nose.NoseTestSuiteRunner'.
  • -x (also --stop) tells nose to stop on the first test failure or error.

At some point I had all the normal unit test cases passing and I had moved on to get doctests passing with the --with-doctest option. Instead of an error message related to a specific test I was seeing this output:

$ ./manage.py test --settings=testsettings -x --with-doctest
nosetests --with-fixture-bundling --logging-clear-handlers -x --with-doctest --verbosity=1
.........................................................................
E
======================================================================
ERROR: test suite for <corehq.util.nose.HqdbContext object at 0x108ee4990>
----------------------------------------------------------------------
Traceback (most recent call last):
  File ".../virtualenvs/hq/lib/python2.7/site-packages/nose/suite.py", line 208, in run
    self.setUp()
  File ".../virtualenvs/hq/lib/python2.7/site-packages/nose/suite.py", line 295, in setUp
    self.setupContext(context)
  File ".../virtualenvs/hq/lib/python2.7/site-packages/nose/suite.py", line 314, in setupContext
    try_run(context, names)
  File ".../virtualenvs/hq/lib/python2.7/site-packages/nose/util.py", line 469, in try_run
    return func()
  File ".../commcare-hq/corehq/util/nose.py", line 186, in setup
    super(HqdbContext, self).setup()
  File ".../django-nose/django_nose/plugin.py", line 392, in setup
    self.old_names = self.runner.setup_databases()
  File ".../django-nose/django_nose/runner.py", line 512, in setup_databases
    connection.close()
  File ".../virtualenvs/hq/lib/python2.7/site-packages/django/db/backends/__init__.py", line 196, in close
    self.validate_thread_sharing()
  File ".../virtualenvs/hq/lib/python2.7/site-packages/django/db/backends/__init__.py", line 515, in validate_thread_sharing
    % (self.alias, self._thread_ident, thread.get_ident()))
DatabaseError: DatabaseWrapper objects created in a thread can only be used in that same thread. The object with alias 'default' was created in thread id 140735298687360 and this is thread id 4443189968.
----------------------------------------------------------------------
Ran 861 tests in 60.614s

FAILED (SKIP=17, errors=1)

That’s a little confusing. Let me explain. In the process of setting up the database test context django-nose closes any existing database connections (there’s a smell there, but we’ll leave that for another day). Django was hollering about connection.close() being called from a different thread.

I exhausted options like looking for uses of threading in the codebase to see if we had anything weird going on there, and thought about the difference between running tests --with-doctest, and without that option. Nothing came to mind. I wasn’t using any options to make the test runner use multiple threads, and simply running doctests should not do that either, I thought. I added debugging statements to various places in the django and nose code to find out where in the call stack it was switching threads. The debugging statements looked something like this:

import sys, threading, traceback

# print where we are in the code
sys.__stdout__.write("".join(traceback.format_stack()[-1]))

# print a list of all threads and the current thread
sys.__stdout__.write("%s\n%s\n" % (threading.enumerate(), threading.current_thread())

At first the output looked like this (running on the main thread):

  File ".../virtualenvs/hq/lib/python2.7/site-packages/django/db/backends/__init__.py", line 36, in __init__
    sys.__stdout__.write("".join(traceback.format_stack()[-1]))
[<_MainThread(MainThread, started 140735298687360)>, <ConnectionReaper(Thread-1, started daemon 4533391360)>]
<_MainThread(MainThread, started 140735298687360)>

Later, once the thread had changed it looked like this:

  File "/Users/dimagi/code/virtualenvs/hq/lib/python2.7/site-packages/django/db/backends/__init__.py", line 200, in close
    sys.__stdout__.write("".join(traceback.format_stack()[-1]))
[<_MainThread(MainThread, started 140735298687360)>, <ConnectionReaper(Thread-1, started daemon 4384862208)>, <_DummyThread(Dummy-2, started daemon 4367295184)>]
<_DummyThread(Dummy-2, started daemon 4367295184)>

This process of walking the call stack to the place where the thread changed narrowed the problem down to the test discovery code. A light-bulb came on: the issue was caused by a module import. I knew the doctest runner discovers tests by importing every module in our code base and searching for docstrings containing interactive Python sessions (doctests). My guess was that some code running on module import was causing the interpreter to switch to a different thread.

I decided to write an import hook that would check the current thread and exit immediately on thread change. Here’s the code:

import sys
import traceback
import threading

def print_imports_until_thread_change():
    main = threading.current_thread()
    sys.__stdout__.write("setting up import hook on %s\n" % main)

    class InfoImporter(object):

        def find_module(self, name, path=None):
            thread = threading.current_thread()
            sys.__stdout__.write("%s %s\n" % (thread, name))
            if thread is not main:
                sys.exit()
            return None

    # Register the import hook. See https://www.python.org/dev/peps/pep-0302/
    sys.meta_path.append(InfoImporter())

print_imports_until_thread_change()

The output looked something like this:

...snip hundreds of lines...
<_MainThread(MainThread, started 140735298687360)> corehq.pillows.django_countries
<_MainThread(MainThread, started 140735298687360)> corehq.pillows.celery
<_MainThread(MainThread, started 140735298687360)> corehq.pillows.phonelog
<_MainThread(MainThread, started 140735298687360)> corehq.preindex.management.commands.django
<_MainThread(MainThread, started 140735298687360)> corehq.preindex.management.commands.couchdbkit
<_MainThread(MainThread, started 140735298687360)> corehq.preindex.management.commands.corehq
<_MainThread(MainThread, started 140735298687360)> corehq.preindex.management.commands.optparse
<_MainThread(MainThread, started 140735298687360)> corehq.preindex.management.commands.dimagi
<_MainThread(MainThread, started 140735298687360)> corehq.preindex.management.commands.gevent
<_MainThread(MainThread, started 140735298687360)> gevent.monkey
<_MainThread(MainThread, started 140735298687360)> gevent.thread
<_MainThread(MainThread, started 140735298687360)> gevent.local
<_MainThread(MainThread, started 140735298687360)> gevent.weakref
<_MainThread(MainThread, started 140735298687360)> gevent.copy
<_MainThread(MainThread, started 140735298687360)> gevent.threading
<_DummyThread(Dummy-2, started daemon 4435526352)> _threading_local

Now we’re on to something, and suddenly this is making a lot more sense. The gevent.threading module is importing _threading_local which causes the thread change, and the corehq.preindex.management.commands.gevent module is importing gevent. So we need to exclude corehq.preindex.management.commands from test discovery. After a few more iterations of this process to find other modules that import gevent, I had all tests passing --with-doctest. Success!

 

Thread photo by Tony Hisgett is licensed under CC BY 4.0


 

Start building your own app with CommCare! Click below to get started.

Try CommCare for free!

Share

Tags

Similar Articles

The World's Most Powerful Mobile Data Collection Platform

Start a FREE 30-day CommCare trial today. No credit card required.

Get Started

Learn More

Get the latest news delivered
straight to your inbox