Python Quit Unexpectedly Dialog - Details

Some time ago I upgraded my work laptop from Mojave to Catalina. As a lot of you probably know, Catalina only supports 64-bit applications, so a lot of apps and libraries stopped working, complaining about mismatched dylibs. As a veteran Linux user, I had seen something similar many, many times with mismatched .so files, and knew the standard approach was to just reinstall everything: with an OS upgrade, the standard macOS libraries have probably changed versions or ABIs, causing linkage errors and other inconsistencies. Typically that meant running brew upgrade and stepping out for a coffee.

Python still complains

A few Python things remained persistently problematic, which is why I kept these scripts handy to blow away various parts of the environment:

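#!/bin/bash
# Global reset: clear pip's HTTP and wheel caches so nothing built against the old OS is reused
rm -rf ~/Library/Caches/pip/http ~/Library/Caches/pip/wheels
# Remove stale bytecode caches from the global Python 3.7 site-packages (safe with spaces in paths)
find /usr/local/lib/python3.7/site-packages -name __pycache__ -type d -prune -exec rm -rf {} +

#!/bin/bash
# Per-project reset: wipe build artifacts, tool caches and the virtualenv
rm -rf dist/ *.egg-info/ build/ venv/ .pytest_cache/ .mypy_cache/
find . -name __pycache__ -type d -prune -exec rm -rf {} +

# Recreate the virtualenv ("source" is a bash builtin, hence bash rather than sh) and reinstall
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt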

Despite all that, I remained flummoxed: one service I was working on locally, which used snowflake-connector-python, still quit unexpectedly with Abort trap: 6.

What is Abort trap: 6?

On macOS, “Abort trap” is the system’s description of SIGABRT, signal 6 (hence the “6”): the message appears when a process is killed by that signal, typically because it called abort() from the C standard library. A library may call abort() instead of shutting down normally for various reasons; it is more controlled than a segfault, but it usually happens deep inside library or system internals, out of the developer’s control. Inside the Linux kernel, for instance, the equivalent situation results in a kernel panic.
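A minimal way to see this relationship from Python, assuming a POSIX system, is to run a child interpreter that calls os.abort() (Python’s wrapper around the C abort()) and check how it died:

```python
import signal
import subprocess
import sys

# Start a fresh interpreter that calls os.abort(), i.e. the C library's abort()
proc = subprocess.run([sys.executable, "-c", "import os; os.abort()"])

# On POSIX, a negative returncode -N means the child was killed by signal N
killed_by = signal.Signals(-proc.returncode)
print(proc.returncode, killed_by.name)
```

On Linux and macOS this should print -6 SIGABRT; on macOS the child’s abort may also bring up the crash dialog described below.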

Getting Detailed Information

Oddly enough, when an application calls abort() on macOS, a very helpful dialog titled “$program quit unexpectedly” pops up, and it’s only through there that you can see the detailed error messages, even if the application was launched from the command line.
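If you would rather not click through the dialog every time, a minimal sketch for grabbing the latest report from disk is below. It assumes that, as on Catalina, user-level crash reports are saved under ~/Library/Logs/DiagnosticReports as <Process>_<timestamp>_<host>.crash files (newer macOS releases use an .ips format instead); latest_crash_report is a hypothetical helper, not a macOS API:

```python
from pathlib import Path
from typing import Optional

# Assumption: crash reports are stored here as <Process>_<timestamp>_<host>.crash
DIAGNOSTIC_REPORTS = Path.home() / "Library" / "Logs" / "DiagnosticReports"


def latest_crash_report(process: str = "Python",
                        directory: Path = DIAGNOSTIC_REPORTS) -> Optional[Path]:
    """Return the newest crash report for `process`, or None if there is none."""
    if not directory.is_dir():
        return None
    reports = list(directory.glob(f"{process}_*.crash"))
    if not reports:
        return None
    # Pick the most recently modified report
    return max(reports, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    report = latest_crash_report()
    print(report.read_text() if report else "No Python crash reports found")
```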

Python Quit Unexpectedly Dialog

Either way, clicking “Report” let me dig into the details and see what the problem was.

Crashed Thread:        2

Exception Type:        EXC_CRASH (SIGABRT)
Exception Codes:       0x0000000000000000, 0x0000000000000000
Exception Note:        EXC_CORPSE_NOTIFY

Application Specific Information:
/usr/lib/libcrypto.dylib
abort() called
Invalid dylib load. Clients should not load the unversioned libcrypto dylib as it does not have a stable ABI.
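The part that matters here is the “Application Specific Information” section. Assuming the report layout shown above (a header line followed by entries up to the next blank line), a small hypothetical helper can pull just that section out of a saved report:

```python
import sys
from typing import List

HEADER = "Application Specific Information:"


def app_specific_info(report_text: str) -> List[str]:
    """Return the lines of the 'Application Specific Information' section."""
    lines = [line.rstrip() for line in report_text.splitlines()]
    if HEADER not in lines:
        return []  # this report has no such section
    section = []
    for line in lines[lines.index(HEADER) + 1:]:
        if not line:
            break  # sections in the report are separated by blank lines
        section.append(line)
    return section


if __name__ == "__main__":
    # Usage: python app_info.py <report.crash>
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            print("\n".join(app_specific_info(f.read())))
```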

Huh. That’s odd.

Initial Googling

It turns out I wasn’t the only one hitting this error. Despite the proper fix having been pointed out, some folks had posted a workaround that overwrote the library with a symlink to a pinned version. I decided that was pretty tacky and not a good long-term solution: since there was no consistent version to link to, people would inevitably run into the problem again.

I went to report this as a bug to Snowflake, but discovered that someone had already beaten me to it:

Screenshot of snowflake connector issue on Github

Someone traced the likely cause to the oscrypto library (originally split off from asn1crypto) crashing while loading OpenSSL, but that’s where the trail ended: no one had figured out a workaround.

By this point I had realized there was no easy, good fix, and the crash was actively blocking my work, so I decided I might as well figure out how to fix it once and for all.

Tracing the source of the bug

The first step, of course, was to identify the offending line of code in the dependencies that triggered the crash. I use PyCharm as my IDE at work, and two of its great features are click-through navigation to original code definitions and a powerful graphical debugger that is very similar in usability to IntelliJ’s.

However, the debugger isn’t much use if I can’t figure out where to set breakpoints.

I tried tracing through from snowflake.connector.connect(), but soon got bogged down, since it runs through the requests library as well. I stepped away for lunch and coffee, and when I came back I realized, based on the hint above, that the problem was probably in oscrypto. So I looked for where oscrypto was referenced in the Snowflake library and added two breakpoints: one where it was imported, and one where the function was being used.

First breakpoint

Second breakpoint

Cool, let’s run the debugger and see if we can dig further.

GIF/Video of crash after clicking through one breakpoint

Whoa, that’s weird! It crashed on the import?

Digging into the import

After verifying that it was that one import line (by adding a third breakpoint and seeing if the next import succeeded at all), I decided to dig in further to see how that import was causing problems.
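Breakpoints worked fine here, but a debugger-free way to confirm that a particular import is what kills the interpreter is to attempt each candidate import in a fresh subprocess and look at the exit status. This is a sketch rather than what I did at the time; probe_import is a hypothetical helper, and the module names in the loop are placeholders:

```python
import subprocess
import sys


def probe_import(module: str) -> int:
    """Import `module` in a fresh interpreter and return its exit status.

    0 means the import succeeded, 1 means it raised an exception, and a
    negative value -N means the interpreter was killed by signal N
    (-6, SIGABRT, for a crash like the one above).
    """
    proc = subprocess.run([sys.executable, "-c", f"import {module}"],
                          capture_output=True)
    return proc.returncode


if __name__ == "__main__":
    # Placeholder candidates; substitute the imports you want to bisect
    for name in ["json", "requests", "oscrypto.asymmetric"]:
        print(name, probe_import(name))
```

Running each import in its own process means a hard crash only takes down the probe, not the script doing the probing.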

GIF/Video of setting breakpoints

I then stumbled upon the code that determines where libcrypto lives on macOS and loads it, in a file called _libcrypto_cffi.py:

libcrypto_path = _backend_config().get('libcrypto_path')
if libcrypto_path is None:
    libcrypto_path = find_library('crypto')
if not libcrypto_path:
    raise LibraryNotFoundError('The library libcrypto could not be found')

try:
    vffi = FFI()
    vffi.cdef("const char *SSLeay_version(int type);")
    version_string = vffi.string(vffi.dlopen(libcrypto_path).SSLeay_version(0)).decode('utf-8')
except (AttributeError):
    vffi = FFI()
    vffi.cdef("const char *OpenSSL_version(int type);")
    version_string = vffi.string(vffi.dlopen(libcrypto_path).OpenSSL_version(0)).decode('utf-8')

is_libressl = 'LibreSSL' in version_string

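Aha! On Catalina, find_library('crypto') resolves to the unversioned /usr/lib/libcrypto.dylib, the very library named in the crash report, and the dlopen() calls above load it. This might be where we’re running into problems! Let’s try the debugger and see if we can isolate the crash: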

GIF/Video of isolating the crash

Bingo! We isolated the location of the problem!
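In hindsight, the standard library’s faulthandler module could probably have pointed at the crashing import without a debugger: once enabled (with faulthandler.enable() or by running python -X faulthandler), it dumps the Python traceback of every thread when the process receives a fatal signal such as SIGABRT. I didn’t use it for this investigation; the sketch below only demonstrates the mechanism, with os.abort() standing in for the real crash and load_native_library as a made-up function name:

```python
import subprocess
import sys

# The child enables faulthandler, then aborts from inside a nested call
# (os.abort() stands in for a C library calling abort())
CHILD = """
import faulthandler, os
faulthandler.enable()

def load_native_library():
    os.abort()

load_native_library()
"""

proc = subprocess.run([sys.executable, "-c", CHILD],
                      capture_output=True, text=True)
# faulthandler writes "Fatal Python error: Aborted" plus a traceback to stderr
print(proc.stderr)
```

For a real service, running it as python -X faulthandler your_service.py gives the same traceback on stderr without changing any code.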

Filing and fixing the bug

Screenshot of oscrypto issue on Github

Now that I had found the issue, I filed it with the oscrypto project on GitHub. But filing a bug wasn’t going to get it fixed overnight, so I decided to see whether I could make a fix myself.

Since this seemed to be a Catalina-specific issue, and Apple recommended that we pin to specific versions of the libcrypto dylibs, it seemed like the obvious fix would be to do just that. So I proposed the following change:

# if we are on catalina, we want to strongly version libcrypto since unversioned libcrypto has a non-stable ABI
if sys.platform == 'darwin' and platform.mac_ver()[0].startswith('10.15') and \
        libcrypto_path.endswith('libcrypto.dylib'):
    # libcrypto.42.dylib is in libressl-2.6, which has an OpenSSL 1.0.1-compatible API
    libcrypto_path = libcrypto_path.replace('libcrypto.dylib', 'libcrypto.42.dylib')

and

# if we are on catalina, we want to strongly version libssl since unversioned libssl has a non-stable ABI
if sys.platform == 'darwin' and platform.mac_ver()[0].startswith('10.15') and libssl_path.endswith('libssl.dylib'):
    # libssl.44.dylib is in libressl-2.6, which has an OpenSSL 1.0.1-compatible API
    libssl_path = libssl_path.replace('libssl.dylib', 'libssl.44.dylib')
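To unit-test the pinning rule without a Catalina machine, one option is to lift the same conditions into a pure function that takes the platform and macOS version as arguments. pin_dylib and PINNED are hypothetical names, not part of oscrypto; the sketch simply mirrors the two checks above:

```python
# Versioned names from the LibreSSL 2.6 build shipped with Catalina
PINNED = {
    "libcrypto.dylib": "libcrypto.42.dylib",
    "libssl.dylib": "libssl.44.dylib",
}


def pin_dylib(path: str, system: str, mac_version: str) -> str:
    """Swap an unversioned system dylib for its pinned, versioned sibling on Catalina.

    Same conditions as the patch: only on darwin, only on 10.15.x, and only
    when the path ends with one of the unversioned names.
    """
    if system != "darwin" or not mac_version.startswith("10.15"):
        return path
    for unversioned, versioned in PINNED.items():
        if path.endswith(unversioned):
            return path.replace(unversioned, versioned)
    return path
```

In the library itself this would be called as pin_dylib(libcrypto_path, sys.platform, platform.mac_ver()[0]), while tests can pass any combination of platform and version.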

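Why 42 for libcrypto but 44 for libssl? As per this discussion, checking which LibreSSL release each versioned dylib in /usr/lib was built from shows that both come from libressl-2.6: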

$ for i in libcrypto.*.dylib; do echo -n "$i:"; strings "$i" | grep libressl | head -n1 | cut -d'/' -f9; echo; done
...
libcrypto.42.dylib:libressl-2.6
...
$ for i in libssl.*.dylib; do echo -n "$i:"; strings "$i" | grep libressl | head -n1 | cut -d'/' -f9; echo; done
...
libssl.44.dylib:libressl-2.6
...
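The shell loop relies on cut -f9, which depends on how deep the embedded build path happens to be. A hedged Python alternative is to search each binary for the first libressl-X.Y string directly; libressl_version is a hypothetical helper, and it may report a more specific version (such as a patch release) than the path component the shell loop prints:

```python
import re
import sys
from pathlib import Path
from typing import Optional

# Matches e.g. b"libressl-2.6" inside a build path embedded in the binary
LIBRESSL_RE = re.compile(rb"libressl-\d+(?:\.\d+)*")


def libressl_version(data: bytes) -> Optional[str]:
    """Return the first libressl-X.Y string found in a binary, or None."""
    match = LIBRESSL_RE.search(data)
    return match.group().decode("ascii") if match else None


if __name__ == "__main__":
    # Default to /usr/lib, where the crash report located libcrypto.dylib
    directory = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("/usr/lib")
    for pattern in ("libcrypto.*.dylib", "libssl.*.dylib"):
        for dylib in sorted(directory.glob(pattern)):
            print(f"{dylib.name}: {libressl_version(dylib.read_bytes())}")
```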

We didn’t have a problem with libssl yet, but better safe than sorry!

Testing the fix

This was a little tricky, because the code lived in a library pulled in as a dependency. To test it, I cloned the repo, added my fix, and installed oscrypto from that local checkout, naming it with an #egg= fragment:

pip install file:///Users/shh/Development/github/oscrypto#egg=oscrypto
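When testing a locally installed dependency like this, it is worth confirming which copy the interpreter actually picks up (the virtualenv’s, rather than an older one in the global /usr/local/lib/python3.7/site-packages). importlib.util.find_spec answers that for a top-level package without executing it, which matters when importing it is exactly what crashes. A minimal sketch, with installed_from as a hypothetical helper name:

```python
import importlib.util
from typing import Optional


def installed_from(module: str) -> Optional[str]:
    """Return the file a top-level module would be loaded from, without importing it."""
    spec = importlib.util.find_spec(module)
    return spec.origin if spec else None


if __name__ == "__main__":
    # Expect a path inside the active virtualenv's site-packages
    print(installed_from("oscrypto"))
```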

Running the debugger again confirmed that the fix works!

Merge and release

As of this writing, I am happy to say that the fix was merged shortly after I made the PR and released in v1.1.1 of oscrypto! Future users should no longer run into this problem :).
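To make sure a project actually picks up the fix, it can pin oscrypto>=1.1.1 in requirements.txt, or check at runtime. A minimal runtime check is sketched below, assuming Python 3.8+ for importlib.metadata (on 3.7, as in the scripts above, the importlib_metadata backport would be needed). has_catalina_fix and parse_release are hypothetical helpers, and the naive version parsing stands in for packaging.version if that package is available:

```python
from importlib.metadata import PackageNotFoundError, version

FIXED_IN = (1, 1, 1)  # first oscrypto release with the Catalina fix


def parse_release(ver: str) -> tuple:
    """Turn '1.2.0' into (1, 2, 0); stops at the first non-numeric part (e.g. '1rc1')."""
    parts = []
    for piece in ver.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def has_catalina_fix(ver: str) -> bool:
    return parse_release(ver) >= FIXED_IN


if __name__ == "__main__":
    try:
        installed = version("oscrypto")
        print(installed, "OK" if has_catalina_fix(installed) else "upgrade oscrypto")
    except PackageNotFoundError:
        print("oscrypto is not installed")
```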

Thank you for reading!