Debugging a thread-related test failure

Recently I’ve been working on adding support for using nose to run the CommCare HQ test suite. Since CommCare HQ is a Django project, I’m using django-nose. This has been a tedious process of running the tests repeatedly and debugging each test failure encountered along the way. Typically on running the tests, I would see output like this:
$ ./manage.py test --settings=testsettings -x nosetests --with-fixture-bundling --logging-clear-handlers -x --verbosity=1 ...(dot for each passing test here) Creating test database for alias 'default'... ......E ====================================================================== ERROR: some error message related to a specific test ---------------------------------------------------------------------- Traceback (most recent call last): ... SomeError: message ---------------------------------------------------------------------- Ran 953 tests in 99.163s FAILED (SKIP=17, errors=1)
First let’s explore the options I’m using on the command line:
--settings=testsettings
tells Django to use a custom settings file that, among other things has this line:TEST_RUNNER = 'django_nose.NoseTestSuiteRunner'
.-x
(also--stop
) tells nose to stop on the first test failure or error.
At some point I had all the normal unit test cases passing and I had moved on to get doctests passing with the --with-doctest
option. Instead of an error message related to a specific test I was seeing this output:
$ ./manage.py test --settings=testsettings -x --with-doctest nosetests --with-fixture-bundling --logging-clear-handlers -x --with-doctest --verbosity=1 ......................................................................... E ====================================================================== ERROR: test suite for <corehq.util.nose.HqdbContext object at 0x108ee4990> ---------------------------------------------------------------------- Traceback (most recent call last): File ".../virtualenvs/hq/lib/python2.7/site-packages/nose/suite.py", line 208, in run self.setUp() File ".../virtualenvs/hq/lib/python2.7/site-packages/nose/suite.py", line 295, in setUp self.setupContext(context) File ".../virtualenvs/hq/lib/python2.7/site-packages/nose/suite.py", line 314, in setupContext try_run(context, names) File ".../virtualenvs/hq/lib/python2.7/site-packages/nose/util.py", line 469, in try_run return func() File ".../commcare-hq/corehq/util/nose.py", line 186, in setup super(HqdbContext, self).setup() File ".../django-nose/django_nose/plugin.py", line 392, in setup self.old_names = self.runner.setup_databases() File ".../django-nose/django_nose/runner.py", line 512, in setup_databases connection.close() File ".../virtualenvs/hq/lib/python2.7/site-packages/django/db/backends/__init__.py", line 196, in close self.validate_thread_sharing() File ".../virtualenvs/hq/lib/python2.7/site-packages/django/db/backends/__init__.py", line 515, in validate_thread_sharing % (self.alias, self._thread_ident, thread.get_ident())) DatabaseError: DatabaseWrapper objects created in a thread can only be used in that same thread. The object with alias 'default' was created in thread id 140735298687360 and this is thread id 4443189968. ---------------------------------------------------------------------- Ran 861 tests in 60.614s FAILED (SKIP=17, errors=1)
That’s a little confusing. Let me explain. In the process of setting up the database test context django-nose closes any existing database connections (there’s a smell there, but we’ll leave that for another day). Django was hollering about connection.close()
being called from a different thread.
I exhausted options like looking for uses of threading in the codebase to see if we had anything weird going on there, and thought about the difference between running tests --with-doctest,
and without that option. Nothing came to mind. I wasn’t using any options to make the test runner use multiple threads, and simply running doctests should not do that either, I thought. I added debugging statements to various places in the django and nose code to find out where in the call stack it was switching threads. The debugging statements looked something like this:
import sys, threading, traceback # print where we are in the code sys.__stdout__.write("".join(traceback.format_stack()[-1])) # print a list of all threads and the current thread sys.__stdout__.write("%s\n%s\n" % (threading.enumerate(), threading.current_thread())
At first the output looked like this (running on the main thread):
File ".../virtualenvs/hq/lib/python2.7/site-packages/django/db/backends/__init__.py", line 36, in __init__ sys.__stdout__.write("".join(traceback.format_stack()[-1])) [<_MainThread(MainThread, started 140735298687360)>, <ConnectionReaper(Thread-1, started daemon 4533391360)>] <_MainThread(MainThread, started 140735298687360)>
Later, once the thread had changed it looked like this:
File "/Users/dimagi/code/virtualenvs/hq/lib/python2.7/site-packages/django/db/backends/__init__.py", line 200, in close sys.__stdout__.write("".join(traceback.format_stack()[-1])) [<_MainThread(MainThread, started 140735298687360)>, <ConnectionReaper(Thread-1, started daemon 4384862208)>, <_DummyThread(Dummy-2, started daemon 4367295184)>] <_DummyThread(Dummy-2, started daemon 4367295184)>
This process of walking the call stack to the place where the thread changed narrowed the problem down to the test discovery code. A light-bulb came on: the issue was caused by a module import. I knew the doctest runner discovers tests by importing every module in our code base and searching for docstrings containing interactive Python sessions (doctests). My guess was that some code running on module import was causing the interpreter to switch to a different thread.
I decided to write an import hook that would check the current thread and exit immediately on thread change. Here’s the code:
import sys import traceback import threading def print_imports_until_thread_change(): main = threading.current_thread() sys.__stdout__.write("setting up import hook on %s\n" % main) class InfoImporter(object): def find_module(self, name, path=None): thread = threading.current_thread() sys.__stdout__.write("%s %s\n" % (thread, name)) if thread is not main: sys.exit() return None # Register the import hook. See https://www.python.org/dev/peps/pep-0302/ sys.meta_path.append(InfoImporter()) print_imports_until_thread_change()
The output looked something like this:
...snip hundreds of lines... <_MainThread(MainThread, started 140735298687360)> corehq.pillows.django_countries <_MainThread(MainThread, started 140735298687360)> corehq.pillows.celery <_MainThread(MainThread, started 140735298687360)> corehq.pillows.phonelog <_MainThread(MainThread, started 140735298687360)> corehq.preindex.management.commands.django <_MainThread(MainThread, started 140735298687360)> corehq.preindex.management.commands.couchdbkit <_MainThread(MainThread, started 140735298687360)> corehq.preindex.management.commands.corehq <_MainThread(MainThread, started 140735298687360)> corehq.preindex.management.commands.optparse <_MainThread(MainThread, started 140735298687360)> corehq.preindex.management.commands.dimagi <_MainThread(MainThread, started 140735298687360)> corehq.preindex.management.commands.gevent <_MainThread(MainThread, started 140735298687360)> gevent.monkey <_MainThread(MainThread, started 140735298687360)> gevent.thread <_MainThread(MainThread, started 140735298687360)> gevent.local <_MainThread(MainThread, started 140735298687360)> gevent.weakref <_MainThread(MainThread, started 140735298687360)> gevent.copy <_MainThread(MainThread, started 140735298687360)> gevent.threading <_DummyThread(Dummy-2, started daemon 4435526352)> _threading_local
Now we’re on to something, and suddenly this is making a lot more sense. The gevent.threading
module is importing _threading_local
which causes the thread change, and the corehq.preindex.management.commands.gevent
module is importing gevent. So we need to exclude corehq.preindex.management.commands
from test discovery. After a few more iterations of this process to find other modules that import gevent, I had all tests passing --with-doctest
. Success!
Thread photo by Tony Hisgett is licensed under CC BY 4.0
Start building your own app with CommCare! Click below to get started.
Share
Tags
Similar Articles
Another day, another Zero Day: What all Digital Development organizations should take away from recent IT security news
Even if you don’t work as a software developer, you probably heard about recent, high profile security issues that had IT Admins and developers frantically patching servers over the holidays and again more recently. Dimagi's CTO shares what these recent issues mean for Digital Development organizations.
Technology
January 28, 2022
Join the fight to support critical open source infrastructure
Open Source tools are a critical piece of global infrastructure, and need champions for long term investment
Technology
March 17, 2020
Two big lessons that Iowa and Geneva can teach us about technology in digital development
Last week brought two high profile technology failures into the global spotlight. Although these two mishaps may seem quite different at first glance, they both highlight challenges that are inherent in providing software in the public sector (regardless of locale) and illustrate cautionary lessons worth discussing for practitioners in Digital Development. The Iowa Caucus Debacle
Technology
February 7, 2020