Add context to your error messages!

Something went wrong, please try again later.

Aug 19, 2024

Many of us have, at one time or another, seen an error message like this:

Something went wrong, please try again later.

If the error doesn't go away, which is frustrating enough on its own, the user has to call support… All of us know how daunting that experience can be. And even after a long wait and some troubleshooting, support might just suggest reinstalling your operating system or returning or replacing the device.

Not everyone knows or cares how the "bits and bytes" work. For many people, such an error message might, with luck, mean the error will be fixed… somehow… on its own…

Also, one might see error messages like this:

"Failed to make an RPC request to "http://backend.example.com" (Status Code: 500)".
"Failed to parse JSON message on line 347".
"Server returns \r\r where \r\n is expected".

That kind of error message does not bring much value either, even to a skilled software engineer, because the system might be too complex to quickly jump to the right conclusion.

It seems like we're losing some value there… So what does this have to do with context in error messages?! Context is exactly what these error messages lack!

Consider the following Python function, which is supposed to fetch and show user information based on the user id and the given HTTP client:

def show_user_name(session):
    try:
        user_name = fetch_user_name(session.client, session.user_id)
        show('User Name: {}'.format(user_name))
    except Exception:
        show('Error: Something went wrong, please try again later.')

The function prints either the user name or an error message without any extra information about the kind of error that occurred. The current solution is built in such a way that it's hard to add any meaningful information coming from the fetch_user_name function. But we can still improve the call site:

import logging

logger = logging.getLogger(__name__)

def show_user_name(session):
    try:
        user_name = fetch_user_name(session.client, session.user_id)
        show('User Name: {}'.format(user_name))
    except Exception as e:
        logger.exception(
            'while fetching user name for user id {}.'.format(session.user_id)
        )

        show('Error: {} while fetching user name for {}'.format(repr(e), session.user_id))

As you can see, we added a few statements here and there, and here are the improvements:

The error message contains the user id and the kind of exception that occurred while fetching the information. If developers receive a ticket, they already have much more context to address the problem quicker, with fewer round trips.
The logger statement adds very useful context for developers: the stack trace! It might contain a lot more detail on its own, but it is less useful to end users.
Logging gives developers a useful tool to solve problems proactively instead of waiting for user feedback or a support ticket. Because errors can be addressed earlier, the cost of software development is lower.
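
To see what the logger statement actually records, here is a minimal, self-contained sketch. The in-memory stream, the logger name "demo", and the always-failing failing_fetch function are assumptions made for illustration; the only point is that logger.exception appends the current traceback to the message.

```python
import io
import logging

# Hypothetical setup: send log records to an in-memory buffer so we can inspect them.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
logger = logging.getLogger("demo")
logger.addHandler(handler)
logger.setLevel(logging.ERROR)
logger.propagate = False  # keep the demo output out of the root logger


def failing_fetch(user_id):
    # Stand-in for fetch_user_name; it always fails.
    raise ValueError("backend returned status 500")


def show_user_name(user_id):
    try:
        failing_fetch(user_id)
    except Exception:
        # logger.exception logs at ERROR level and appends the current traceback.
        logger.exception("while fetching user name for user id %s", user_id)


show_user_name(42)
log_output = stream.getvalue()
print(log_output)
```

The printed record starts with the message that names the user id and is followed by the full traceback, which is the part developers need and end users usually do not.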

Now, let's take a look at the implementation of the function.

import json


def fetch_user_name(client, user_id):
    response = client.request(
        "/api/user/{}".format(user_id)
    )

    serialized_response = json.loads(response)

    return serialized_response['preferred_name']

At first sight, there is nothing wrong with it. But does it give enough information about an error? It is hard to say:

What kinds of exceptions does client.request raise?
What do we do in case of failed JSON parsing?
What if the field is not present in the response?
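
A quick experiment hints at the answers to the last two questions. The sketch below runs hypothetical payloads through the same two steps (parse the JSON, read the field) and captures what the caller would see. The helper names and the sample strings are made up for illustration.

```python
import json


def extract_preferred_name(raw_response):
    # Same two steps as fetch_user_name, minus the HTTP call.
    return json.loads(raw_response)["preferred_name"]


def describe_failure(raw_response):
    # Return the bare exception as the caller would see it, or None on success.
    try:
        extract_preferred_name(raw_response)
    except Exception as e:
        return repr(e)
    return None


# Hypothetical payloads: an HTML error page and a JSON body without the field.
not_json = describe_failure("<html>Bad Gateway</html>")
missing_field = describe_failure('{"id": 7}')

print(not_json)
print(missing_field)
```

The first case surfaces as a json.JSONDecodeError and the second as a KeyError. Neither says which user or which request was involved.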

The code might be doing an absolutely correct and expected thing (take a look at the function's unit tests to make sure), but there is also an alternative view:

def fetch_user_name(client, user_id):
    try:
        response = client.request(
            "/api/user/{}".format(user_id)
        )
    except http_client.Error as e:
        raise Error(
            'Failed to fetch user info for client {} and user id {}.'.format(
                repr(client), repr(user_id)
            )
        ) from e

    try:
        serialized_response = json.loads(response)
    except json.JSONDecodeError as e:
        raise Error(
            'Failed to parse JSON response: {}, for client {}, user id {}'.format(
                repr(response),
                repr(client),
                repr(user_id)
            )
        ) from e

    try:
        return serialized_response['preferred_name']
    except KeyError as e:
        raise Error(
            """
            Failed to retrieve 'preferred_name' from the response.
            Available Keys {}.
            Response: {}.
            User id: {}.
            """.format(
                serialized_response.keys(),
                repr(response),
                repr(user_id)
            )
        ) from e

Now developers and clients have much more information available in the context:

from e allows Python to "chain" exceptions and keeps the stack trace history. When such an exception is logged, all the information about the original exception is there. See the Python docs.
Each handler is explicit and gives a clue that these situations might happen and that the developers know about them, precisely because they handle them explicitly.
The information passed to the function is stored in the error message, which makes the issue easier to understand.
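
The snippets above use an Error class without defining it. A minimal sketch, assuming Error is simply an application-specific exception type (both the class and the parse_user helper are hypothetical), shows how from e keeps the original exception reachable through the __cause__ attribute:

```python
import json


class Error(Exception):
    """Hypothetical application-level error; the article does not define it."""


def parse_user(raw_response, user_id):
    try:
        return json.loads(raw_response)
    except json.JSONDecodeError as e:
        # "from e" stores the original exception in __cause__.
        raise Error(
            "Failed to parse JSON response {!r} for user id {!r}".format(
                raw_response, user_id
            )
        ) from e


try:
    parse_user("not json", 42)
except Error as err:
    caught = err

print(repr(caught))
print(repr(caught.__cause__))
```

The outer Error carries the inputs of the function, while __cause__ still holds the original json.JSONDecodeError with its own details.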

The resulting function is a great improvement in providing context, but it has a couple of problems:

The function body is longer. This does not seem like a big issue, given that the cognitive load is about the same.
The information is duplicated between the call site and the function. If you own both, you can always eliminate the duplicates, but when you don't, it is arguably better to log more rather than less.
The function "records" the same information multiple times.

Some of these problems can be addressed if you know enough about who uses the function and how. As an example:

def fetch_user_name(client, user_id):
    try:
        return _fetch_user_name_internal(client, user_id)
    except Error as e:
        raise Error(
            '{} while fetching user name for user id {} and client {}.'.format(
                repr(e),
                repr(user_id),
                repr(client)
            )
        ) from e

def _fetch_user_name_internal(client, user_id):
    try:
        response = client.request(
            "/api/user/{}".format(user_id)
        )
    except http_client.Error as e:
        raise Error(
            '{} while making api call to /api/user/<user_id>'.format(repr(e))
        ) from e

    try:
        serialized_response = json.loads(response)
    except json.JSONDecodeError as e:
        raise Error(
            '{} while serializing response {} into JSON'.format(repr(e), response)
        ) from e

    try:
        return serialized_response['preferred_name']
    except KeyError as e:
        raise Error(
            '{} while getting "preferred_name" from response: {}'.format(
                repr(e),
                repr(response)
            )
        ) from e

Now we have fewer duplicates of the information and the functions are smaller.
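
Because each layer now adds only its own piece of context, the full picture lives in the chain of exceptions rather than in any single message. As a hedged sketch (the describe_chain helper, the Error class, and the two failing functions are hypothetical and not part of the code above), the chain can be flattened into one line for a log entry or a support ticket:

```python
class Error(Exception):
    """Hypothetical application-level error."""


def describe_chain(exc):
    # Follow __cause__ links (set by "raise ... from e"), outermost first.
    parts = []
    while exc is not None:
        parts.append("{}: {}".format(type(exc).__name__, exc))
        exc = exc.__cause__
    return " <- caused by: ".join(parts)


def inner():
    response = {}  # stand-in for a parsed response without the field
    try:
        return response["preferred_name"]
    except KeyError as e:
        raise Error('while getting "preferred_name" from response') from e


def outer(user_id):
    try:
        return inner()
    except Error as e:
        raise Error(
            "while fetching user name for user id {}".format(user_id)
        ) from e


try:
    outer(42)
except Error as e:
    summary = describe_chain(e)

print(summary)
```

Each link contributes one fact (the user id, the missing field, the original KeyError), so nothing needs to be repeated in the individual messages.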

As a result of these improvements, we have much more information when errors occur. The user has something to act on or to share with support, and developers can find the information in the logs more easily and even proactively fix the errors for some users.

