Monday, March 7, 2011

On the matter of exceptions

There was a post recently on the Visual C++ blog about exceptions in which some of the comments were interesting. The discussion eventually came down to four things:
  • Exceptions vs. return values
  • Throwing objects vs. throwing fundamental types
  • Exception hierarchies vs. embedding values
  • Lack of a finally construct

Exceptions vs. return values: a database lookup

[update 9-mar-2010, thanks to G: The following is a simplified example of a database query to illustrate the different error handling strategies. A real program would have to handle many more different types of errors. However, I consider them variations on the same theme: logic errors, runtime errors and exceptional paths.

Additionally, programming is a job like others and therefore has constraints that usually have a higher priority, such as a delivery date or existing coding standards. However, paying attention to error handling will make it easier in the long term to maintain a codebase.]

A function get_item() searches a database for the item corresponding to a given id. It may encounter three problems:
  1. the backend finds the SQL statement to be malformed;
  2. the backend reports an error during the query (such as a broken connection); or
  3. the item is not found in the database.
 These three errors are on three different levels:
  1. this is a logic error that is preventable. It is a result of a programming error;
  2. this is a runtime error that is not preventable. It is a result of the environment in which the program is running being in a bad state;
  3. this is a runtime error that may or may not be preventable, depending on the context.
Error handling for these errors mostly depends on the application (especially the third one), although they happen frequently enough so that we can make some assumptions.


Let's try three different ways of dealing with these errors. I am assuming that the database backend uses return values for error checking, which simplifies the examples. I am also using the following functions:

std::string make_query(primary_key id)
{
  std::ostringstream oss;
  oss << "select * from items where id=" << id;
 
  return oss.str();
}

int make_item(resultset& rs, item& i)
{
  if (rs.empty())
    return item_not_found;
  
  // somehow take the values from the resultset and set the item
  
  return 0;  
}

The various types (resultset, primary_key, db_statement, etc.) are not important. Just assume they are defined somewhere.

Asserts

While asserting is not always the right choice, it is important for logic errors. When a precondition or a postcondition is not respected, it is because the content of a function is not coded correctly. In this case, it is helpful to inform the programmer immediately.

item get_item(primary_key id)
{
  int r = 0;
  
  db_statement s;
  r = db_create_statement(make_query(id), &s);
  assert(r != db_bad_query);
  
  resultset rs;
  r = db_execute(s, &rs);
  assert(r != db_not_connected);
  
  item i;
  r = make_item(rs, i);
  assert(r != item_not_found);
  
  return i;
}

void f(primary_key id)
{
  item i = get_item(id);
  // use i
}

Asserting for the first error is the correct choice. On some platforms, assert() will break directly into the debugger, allowing the programmer to verify what make_query() returned and to fix the SQL error.

The two subsequent asserts are wrong. Terminating the program because the connection to the database failed or because the given item was not found is a bad solution, especially if the latter was because the user searched for a non-existing item.

However, the code that uses get_item() is simple. All the error checking is done in get_item(), which means that f() is clean because it knows that if get_item() returns, there were no problems. Since execution stopped in get_item() in case of problems, no error checking whatsoever needs to be done in f().

Return values

Return values allow for fine-grained error checking. By returning discrete values, the programmer can know exactly what went wrong. Dealing with return values, however, is a pain. It is often hard to know exactly which values can be returned in what context. The same value can be returned from two different functions in two different circumstances. An additional problem is that they prevent a function from naturally returning the value it is computing, forcing the user to pass a reference to the return value (as make_item() does). This has two problems:
  1.  It forces an object to be default-constructed first and assigned later (the so-called "two-phase construction") instead of being constructed with initial values. This has two implications:
    • The class must have a meaningful default constructor even though a default-constructed object might not have a valid use. It allows users of the class to default-construct objects in situations where this might be invalid. This can be offset by making the default constructor private and making the class a friend of the contexts where default construction is valid. However, this adds coupling and complexity.
    • It may be inefficient for an object to be first default-constructed and then assigned to. A classic example is a string object for which the default constructor might allocate some memory for, let's say, 20 characters, only to find itself re-allocating memory during a subsequent assignment of 50 characters.
  2. It prevents the user from using a const object. There are two ways around this:
    • Create a const object using the default constructor and const_cast it. This is undefined behavior as per §7.1.5.1(4): [...] any attempt to modify a const object during its lifetime results in undefined behavior.
    • Create a non-const object and copy-construct a const object with it:

      item temp_i;
      make_item(rs, temp_i);
                  
      const item i = temp_i;

      This is a mess: it requires a default constructor and a copy constructor, it introduces an unneeded name into this scope, it might be inefficient and it adds complexity to code that has no reason to be complex.
Let's look at the same get_item() with return values. As a way of isolating the code using get_item() and the implementation details of the database backend, get_item() will return its own bad_query and bad_db errors.

int get_item(primary_key id, item& i)
{
  int r = 0;
  
  db_statement s;
  r = db_create_statement(make_query(id), &s);
  if (r == db_bad_query)
    return bad_query;
  
  resultset rs;
  r = db_execute(s, &rs);
  if (r == db_not_connected)
    return bad_db;
  
  r = make_item(rs, i);
  if (r == item_not_found)
    return item_not_found;
  
  return all_okay;
}

void f(primary_key id)
{
  item i;
  int r = get_item(id, i);
  
  if (r == bad_query)
    std::cout << "bad query\n";
  else if (r == bad_db)
    std::cout << "bad db\n";
  else if (r == item_not_found)
    std::cout << "item not found\n";
  else if (r == all_okay)
  {
    // use i
  }
  else
  {
    // ??
  }
}

Here, f() knows exactly what went wrong in get_item() and can deal with the various error states. It might assert in response to bad_query, attempt to reconnect to the database for bad_db and display a user-friendly message for an item_not_found.

However, this has three major problems:
  1. The error checking code is four times larger than the normal execution code. It makes the function harder to understand because the normal path is mixed with the exceptional path. It also relies on the function to be documented properly and not to return an unexpected value.
  2. The error handling part mixes the different levels of errors. It seems to deal similarly with logic errors, environmental errors and runtime errors, although the steps needed to recover from each are likely to be drastically different.
  3. It duplicates the error handling in all functions calling get_item(). Every function that deals with get_item() needs to check its return value to decide what to do. Although it is possible to define deal_with_get_item() that would handle common cases and return false for errors it does not know about, this adds a layer of complexity and makes the exceptional path harder to follow.

    It is also possible to have f() propagate errors such as bad_db upwards, but this forces all the other functions in between to propagate errors correctly. This is error-prone and may in fact be impossible if one of these functions is part of a library.
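
As an illustration of the deal_with_get_item() helper mentioned in the third point, here is a minimal sketch. The get_item_error enumeration and its values are assumptions made for the example; the original code only names the constants. The helper handles the errors that every caller would treat the same way and reports whether it recognized the value.

```cpp
#include <iostream>

// Hypothetical error codes, mirroring the values returned by get_item().
enum get_item_error
{
  all_okay = 0,
  bad_query,
  bad_db,
  item_not_found
};

// Handles the errors that are common to every caller of get_item().
// Returns true if the error was dealt with here and false if the caller
// still has to look at the value itself.
bool deal_with_get_item(int r)
{
  switch (r)
  {
    case bad_query:
      std::cout << "bad query\n";
      return true;

    case bad_db:
      std::cout << "bad db\n";
      return true;

    default:
      // all_okay, item_not_found and unknown values are left to the caller
      return false;
  }
}
```

Every caller still has to invoke the helper and then check what it left unhandled, which is the extra layer of complexity described above.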

Exceptions

Dealing with an exception is even more verbose than with a return value. You need to add an exception class (I'm ignoring for now throwing fundamental types), a try construct and one or more catch constructs for each exception that might be thrown. Depending on your coding style, it may also indent the code one more level.

However, for normal execution, it has the advantage of not using up the return value of the function, which allows the function to return the value it computes naturally.

class bad_query {};
class bad_db {};
class item_not_found {};

item get_item(primary_key id)
{
  int r = 0;
  
  db_statement s;
  r = db_create_statement(make_query(id), &s);
  if (r == db_bad_query)
    throw bad_query();
  
  resultset rs;
  r = db_execute(s, &rs);
  if (r == db_not_connected)
    throw bad_db();
  
  item i;
  r = make_item(rs, i);
  if (r != 0)  // make_item() returns 0 on success
    throw item_not_found();
  
  return i;
}

void f(primary_key id)
{
  try
  {
    item i = get_item(id);
    // use i
  }
  catch(bad_query&)
  {
    std::cout << "bad query\n";
  }
  catch(bad_db&)
  {
    std::cout << "bad db\n";
  }
  catch(item_not_found&)
  {
    std::cout << "item not found\n";
  }
}

Error checking here takes 15 lines, whereas normal execution needs 2. The advantage is that, as for the assert case, the error-free case is simple. It also allows f() to safely ignore some exceptions so they can be propagated higher up, even if the functions in between know nothing of that exception (or of exceptions at all). However, this is more than offset by the verbosity of error handling and does not solve some of the problems with return values, namely the mixing of error levels and the duplication of error handling code.

Blending error managing constructs

Let's recap:
  1. Assertions make error handling as simple as it gets: nothing. Their major drawback is of course unconditional program termination.
  2. Return values prevent a function from returning the computed value and add more complexity to error handling, but allow for dealing with every specific case. This in turn duplicates error handling code in every function.
  3. Exceptions, like assertions, make the normal execution path simple but at the cost of making error handling a lot more verbose. They propagate errors automatically and allow returning the computed value, but cannot easily return an error value such as item_not_found.
Before continuing, let's try to come up with the requirements of error handling:
  1. The normal execution path needs to be free of error handling.
  2. Logic errors should terminate the program, allowing the programmer to easily fix them.
  3. Errors of the same level should be handled in a single place.
  4. Error handling should not impose any additional requirements.
Obviously, a mix of the different error handling scenarios is needed. By using asserts, return values and exceptions at the same time, all these requirements can be met.

Let's get back to get_item().

If the SQL statement is malformed, an assertion is needed because this is a logic error. It is a bug that needs to be fixed. Passing this error on to the calling function makes no sense, because nothing can be done. This is an error state and should not impact the calling code.

If the database connection is broken, an exception needs to be thrown. This way, execution can be unrolled until a function can deal with it. This function is usually at a higher level and can 1) reconnect to the database and 2) either retry the operation or bail out. This is also an error state and should not impact the calling code.

If the item is not found, either return values or exceptions can be used, depending on what it means. If the item is assumed to exist, this is also an error state which should not impact the calling code. Therefore, it should be an exception. Like connection issues, this may need to be dealt with at a higher level. If the item may or may not exist, this is part of the normal execution path and should use a return value.

The problem of return values

Not every function can return an invalid value naturally. If get_item() returns an item, how can it return an invalid item?

There are several ways of dealing with this, assuming that returning an invalid item is part of the normal execution path (if it is not, throwing an exception is more appropriate). They all involve some kind of flag in the object signaling a bad value. At its simplest, this is a boolean flag. It might also be possible to use an existing member variable as a flag.

In the case of the item class, it might have the id of the item that came from the database. Since most databases do not use 0 as a valid primary key, setting the id to 0 might be a way of signaling a bad item. Therefore, checking for item::id() == 0 might do the trick. However, this is relying on an implementation detail. Another way would be to call item::is_bad(), which would do the same check internally, but might be modified to use an additional boolean flag for databases that may use 0 as a valid primary key.

There remains the problem of creating such an item. Creating an invalid item object might rely on a special constructor. It is good practice to hide this constructor so that users do not inadvertently create such an object. However, this requires functions such as get_item() to be friends.
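
A minimal sketch of the is_bad() idea, assuming the primary key is a plain int (the real primary_key type is not shown in this article) and that 0 is never a valid key:

```cpp
// Hypothetical item class; only the members needed for the example.
class item
{
public:
  explicit item(int id)
    : id_(id)
  {
  }

  int id() const
  {
    return id_;
  }

  // Callers ask the question instead of testing id() == 0 themselves, so
  // the check can later be changed to use a separate flag without touching
  // the calling code.
  bool is_bad() const
  {
    return id_ == 0;
  }

private:
  int id_;
};
```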

An elegant solution is a static member function which returns a bad item object. This:
  1. gives a reliable way of representing an invalid item, since the details of what makes an invalid item are encapsulated.
  2. avoids making public any way of constructing such an item.
  3. eliminates superfluous member functions that deal with error checking if the class has operator==() and operator!=() defined for it.
Let's see what an item class could look like:

class item
{
public:
  // returns a bad item
  static item bad();

private:
  // constructs a bad item
  item();
};

bool operator==(const item& left, const item& right);
bool operator!=(const item& left, const item& right);

void f(const item& i)
{
  if (i == item::bad())
    std::cout << "bad item\n";
}

This scheme needs to be used with caution. Classes that have no invalid state are easier to use and maintain. However, being able to return a "bad" object can make error handling simpler, especially when bad objects are part of the normal execution path.
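
The class above only shows declarations. One possible way of filling them in is sketched below, assuming an item made of an int id and a name (both hypothetical) and a separate boolean flag rather than a reserved id. This is only one of several reasonable implementations.

```cpp
#include <string>

class item
{
public:
  // constructs a valid item
  item(int id, const std::string& name)
    : id_(id), name_(name), bad_(false)
  {
  }

  // returns a bad item
  static item bad()
  {
    return item();
  }

  friend bool operator==(const item& left, const item& right);

private:
  // constructs a bad item; only reachable through bad()
  item()
    : id_(0), bad_(true)
  {
  }

  int id_;
  std::string name_;
  bool bad_;
};

bool operator==(const item& left, const item& right)
{
  // a bad item is only equal to another bad item, whatever the other
  // members contain
  if (left.bad_ || right.bad_)
    return left.bad_ == right.bad_;

  return left.id_ == right.id_ && left.name_ == right.name_;
}

bool operator!=(const item& left, const item& right)
{
  return !(left == right);
}
```

With this, i == item::bad() is true only for items created by item::bad(), and the calling code never sees how a bad item is represented.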

Putting this together

Let's revisit make_item() and get_item() so they can use the various error checking constructs mentioned so far.

Note that the error handling strategy used by the database backend in this example matters not. Whether it uses exceptions or return values, it can be wrapped so that it behaves the same way.

This also assumes that not finding an item is not an error.

item make_item(resultset& rs)
{ 
  // nothing was found, return a bad item
  if (rs.empty())
    return item::bad();
  
  // somehow take the values from the resultset and set the item, preferably
  // using a constructor
  return item( ... );
}

item get_item(primary_key id)
{
  int r = 0;
  
  db_statement s;
  r = db_create_statement(make_query(id), &s);
  assert(r != db_bad_query);
  
  resultset rs;
  r = db_execute(s, &rs);
  if (r == db_not_connected)
    throw bad_db();
  
  return make_item(rs);
}

Here, get_item() is simpler than before because make_item() handles the case where a resultset is empty. It also asserts for bad queries and throws if the database connection is broken.

Let's look at the calling code:

void f(primary_key id)
{
  try
  {
    item i = get_item(id);

    if (i == item::bad())
      std::cout << "item not found\n";
  }
  catch(bad_db&)
  {
    std::cout << "bad db\n";
  }
}

This cleanly separates the three levels of errors: logic errors are asserted immediately in get_item(), runtime errors are handled in a catch and normal execution (including a non-existing item) is free of error checking.

In this case, the verbosity of the try/catch is still present, but disappears when multiple functions may fail:

void bar(primary_key id)
{
  try
  {
    f(id);
    g(id);
    h(id);
  }
  catch(bad_db&)
  {
    std::cout << "bad db\n";
  }
}

In this case, f() becomes:

void f(primary_key id)
{
  item i = get_item(id);

  if (i == item::bad())
    std::cout << "item not found\n";
}
 

Objects vs. fundamental types

Basically anything that has an accessible copy constructor and destructor can be thrown. This includes fundamental types such as ints and pointers.

Catching integral types

The catch construct was designed with catching by type in mind, not by value. Therefore, it is easy to have two handlers for two different types, but impossible to have two handlers for two different values of an int. In this case, switching on the int would be needed:

try
{
  f();
}
catch(int i)
{
  switch(i)
  {
    case item_not_found:
      // ...
      break;

    case bad_db:
      // ...
      break;
  }
}

This is basically treating an exception as a return value, but with the added complexity of a catch handler.

Handlers were also designed so that unhandled exceptions are automatically passed to a higher level. If one of the integral values is not handled here, it is silently ignored. To emulate this automatic behavior, a default case needs to be added to rethrow the exception.

This is messy and there is no reason for it, since catching by type eliminates the problems.
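
To make the mess concrete, here is a sketch of what the default case would look like. The error values and the thrower() function are hypothetical and only stand in for code that throws ints.

```cpp
#include <iostream>

// Hypothetical error values thrown as ints.
enum { item_not_found = 1, bad_db = 2, disk_full = 3 };

// Hypothetical function that throws the given value unless it is 0.
void thrower(int error)
{
  if (error != 0)
    throw error;
}

// Handles item_not_found and bad_db and returns the value it handled (0 if
// nothing was thrown). Any other value is rethrown to the caller.
int g(int error)
{
  try
  {
    thrower(error);
  }
  catch(int i)
  {
    switch(i)
    {
      case item_not_found:
        std::cout << "item not found\n";
        break;

      case bad_db:
        std::cout << "bad db\n";
        break;

      default:
        // not handled here: pass the same exception on to the next handler
        throw;
    }

    return i;
  }

  return 0;
}
```

Forgetting the default case compiles without complaint and silently swallows every value that is not listed.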

Catching pointers

Throwing a pointer to an object makes it possible to have a catch handler for each pointer type:

class item_not_found {};
class bad_db {};

try
{
  f();
}
catch(item_not_found*)
{
}
catch(bad_db*)
{
}

The problem here is what to do with the pointer. If the throwing code allocated the object on the heap, the catch handler needs to delete it. However, if the object is static, it must not be deleted. If the object was local, this is undefined behavior.

Catching objects

Throwing an object fixes all the previous problems. This is not particularly surprising since the try/catch construct was designed for throwing objects.

Although objects may be caught by value, they are usually caught by reference. Apart from the (usually negligible) performance hit of copying an object, catching by value will slice the object.
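
Slicing only matters when a derived exception is caught through its base class. The following sketch uses two hypothetical classes, network_error and host_not_found, to show the difference between the two kinds of handlers. Some compilers warn about the by-value handler, which is the point of the example.

```cpp
#include <string>

// Hypothetical base and derived exception classes.
class network_error
{
public:
  virtual ~network_error() {}
  virtual std::string name() const { return "network_error"; }
};

class host_not_found : public network_error
{
public:
  virtual std::string name() const { return "host_not_found"; }
};

std::string caught_by_value()
{
  std::string result;

  try
  {
    throw host_not_found();
  }
  catch(network_error e)   // copies the base part only: the object is sliced
  {
    result = e.name();
  }

  return result;
}

std::string caught_by_reference()
{
  std::string result;

  try
  {
    throw host_not_found();
  }
  catch(network_error& e)  // refers to the thrown object itself
  {
    result = e.name();
  }

  return result;
}
```

Here caught_by_value() yields "network_error" because the handler works on a copy of the base part, while caught_by_reference() yields "host_not_found".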


Hierarchies vs. embedded values

C++ defines a hierarchy of exceptions, all derived from std::exception. Some exceptions derive from std::logic_error or std::runtime_error. I find such a hierarchy to have little to no value in a real world scenario.

Some frameworks also define exception types that embed specific values, such as:

class network_exception
{
public:
  int specific_error() const;
};

where specific_error() could return bad_socket or host_not_found. Some frameworks also embed some sort of string object with a more descriptive message.

Hierarchies

Exception hierarchies look tempting but don't solve any real problems. In the case of the standard hierarchy, I cannot imagine a scenario where one part of the code deals in a different manner with std::length_error and std::out_of_range, whereas another part deals with the base class std::logic_error only.

An exception represents a generic problem within a module that can be dealt with in the same way regardless of the source of failure. That is, if an application both accesses a web service and a server-based database, both should throw a network_exception in case of a problem when accessing the resources. Code that deals with the web service knows it needs to reconnect to the web site; code that deals with the database knows it needs to reconnect to the database.

In a scenario where both the web service and database are used together, the failure of one might affect the other, in which case the distinction about what part caused the failure is irrelevant.

If a distinction needs to be made between a web_service_network_exception and a database_network_exception, then these two types need to exist. It might be tempting then to create a network_exception base class, but this does not solve any problem: if the network_exception can be used by itself, the derived classes do not need to exist. If the derived exceptions need to be handled separately, the base class is not needed.

Therefore, if specific exception types are needed, it is because a single class does not convey enough information to handle the error correctly. If it does, specific types are not needed.

Finally, a case might be made for a single base class (such as std::exception) for all the possible exceptions. But again, this does not help in any way with handling errors. A std::exception says nothing about the kind of error and the handler cannot take any action. The only reason for such a base class would be to either:
  1. prevent anything from being thrown from a destructor or a function such as a thread entry point;
  2. display an error message from an unhandled exception if a virtual what()-like function exists in the hierarchy.
The first case is easily handled with a catch(...) construct. It is also safer since it might be possible for an exception not derived from the base class to be thrown.
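
A sketch of the first case, assuming a hypothetical connection class whose cleanup function may throw a type that does not derive from std::exception. A catch(std::exception&) handler would let disk_error escape from the destructor; catch(...) does not.

```cpp
// Hypothetical exception that is not part of any hierarchy.
class disk_error {};

// Hypothetical cleanup function that may throw.
void flush_buffers(bool fail)
{
  if (fail)
    throw disk_error();
}

class connection
{
public:
  explicit connection(bool fail_on_close)
    : fail_on_close_(fail_on_close)
  {
  }

  ~connection()
  {
    try
    {
      flush_buffers(fail_on_close_);
    }
    catch(...)
    {
      // nothing may leave a destructor, whatever its type; a real program
      // would probably log the failure here
    }
  }

private:
  bool fail_on_close_;
};

// Creates and destroys a connection; returns true if nothing escaped from
// the destructor.
bool close_connection(bool fail_on_close)
{
  {
    connection c(fail_on_close);
  }

  return true;
}
```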

The second case is dubious at best. As mentioned below, embedding strings in an exception is not a good idea nor does it play nice with internationalization and therefore should not be used to convey any meaningful information to the user. If more specific information is needed from the source of the error, logging it is better. If a message needs to be shown to the user, return values are better. The thread in which the exception was thrown might not have access to the user interface, in which case storing a return value for later use is the only solution.

Therefore, I cannot find any scenario in which a hierarchy is needed. My exceptions do not derive from a base class.

Embedding values

As mentioned previously, throwing integral values is messy because catch handlers cannot tell the difference between one int and another. A handler needs to switch on the value to take the appropriate action. Embedding an integral value in an exception amounts to the same thing.

Embedding a string in an exception is tempting. It allows a simplified form of logging where a deep function call may throw a generic exception with a customized error message. That message can then be displayed to the user or added to a log file.

Using any kind of string class that allocates memory to store the message is dangerous since the allocation might fail. If this happens while the exception is being constructed, a different exception might be thrown (such as std::bad_alloc for std::string). If this happens inside the copy constructor (invoked when the temporary exception object is initialized from the operand of the throw expression), std::terminate() is called.

An integral identifier for the string could probably work, allowing the handler to look up the appropriate error message for it. However, this assumes that the throwing code has all the necessary information to construct a meaningful error message. For example, a problem arising from a call to ::recv() deep inside the network module might not know the reason for which it was called. Displaying "cannot receive from socket" to the user is not helpful, nor is "The connection to the web service is broken" if the socket call originated from the database module.

One way of dealing with memory allocation for strings is to define a virtual what() that returns a static string. This is the approach taken by most implementations of std::exception. However, this needs a specific exception type for every possible error message unless the exception keeps a pointer to the static string it was created from, which would scatter these strings all over the code. It also doesn't solve the problem of not having enough information at the point where the exception is thrown nor of localizing the string in different languages.
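
A sketch of the pointer-to-static-string variant, using a hypothetical module_error class. Nothing is allocated when the exception is constructed or copied, but the message has to live somewhere with static storage, which is what scatters such strings around the code.

```cpp
// The message has static storage and outlives any exception that points to it.
static const char* const recv_failed = "cannot receive from socket";

// Hypothetical exception that keeps a pointer to a static string instead of
// owning a copy of the message.
class module_error
{
public:
  explicit module_error(const char* message)
    : message_(message)
  {
  }

  const char* what() const
  {
    return message_;
  }

private:
  const char* message_;  // not owned, must point to a static string
};

// Throws and catches a module_error and returns the message it carried.
const char* describe_failure()
{
  const char* message = "";

  try
  {
    throw module_error(recv_failed);
  }
  catch(module_error& e)
  {
    // safe to keep after the handler: the string is static
    message = e.what();
  }

  return message;
}
```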

Therefore, I find that there is little to no advantage in bundling any sort of value with an exception. Exceptions are often thrown from a place where the context is unknown. In this case, the specific error (such as "::recv() failed") should be logged instead of propagated.


The finally construct

The finally construct is used to execute code regardless of whether an exception was thrown or not:

void f()
{
  acquire_resource();

  try
  {
    use_resource();
  }
  catch(exception&)
  {
    show_error();
  }
  finally
  {
    release_resource();
  }
}

This is a variation on the C-style error handling:

void f()
{
  acquire_resource();
  
  if (!use_resource())
    show_error();

  release_resource();
}

The finally construct implies that some code should be executed both in the normal and the exceptional paths. This is a fallacy. No code can be present in both situations. If this seems to be the case, the design is broken.

In the previous example, finally is needed because two execution paths are mixed: resource management and resource usage. If these two paths are separated, then it becomes apparent that the normal execution for the former is acquire/release, while the latter is use. An error in the use path has no relation with the acquire/release one.

This translates directly into C++ with RAII, resource acquisition is initialization. By moving the resource management to a class, the two execution paths are cleanly separated:

class resource
{
public:
  resource()
  {
    acquire_resource();
  }

  ~resource()
  {
    release_resource();
  }
};

void f()
{
  try
  {
    resource r;
    use_resource();
  }
  catch(exception&)
  {
    show_error();
  }
}

There are two common complaints about RAII:
  • When using a library that does not implement RAII (such as a C library), this requires wrapping every kind of resource into a class.
  • The "resource" might not be a resource at all. It might be one or more member variables that need to be reset. Wrapping this into a class is too verbose.
I do not find the first point to be particularly problematic in real life. A common example is the Windows API, where COM objects require only a Release() call, while some GDI types require different deletion functions, such as DeleteObject() or DeleteDC(). Wrapping the resources used in a single program is not an issue nor is it time-consuming. I find the excuse of "writing correct code takes too much time" to be somewhat weak.
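
To give an idea of how little code such a wrapper needs, here is a sketch around a made-up C-style API. c_open(), c_close() and the open_handles counter are hypothetical and only stand in for a real library.

```cpp
// Hypothetical C-style API standing in for a library without RAII.
struct c_handle
{
  int id;
};

static int open_handles = 0;

c_handle* c_open()
{
  ++open_handles;
  return new c_handle();
}

void c_close(c_handle* h)
{
  --open_handles;
  delete h;
}

// Thin wrapper: acquires in the constructor, releases in the destructor.
class handle
{
public:
  handle() : h_(c_open()) {}
  ~handle() { c_close(h_); }

  c_handle* get() const { return h_; }

private:
  // not copyable, otherwise the same handle would be closed twice
  handle(const handle&);
  handle& operator=(const handle&);

  c_handle* h_;
};

class use_failed {};

// Uses a handle, optionally failing, and returns the number of handles that
// are still open afterwards.
int use_and_count(bool fail)
{
  try
  {
    handle h;

    if (fail)
      throw use_failed();
  }
  catch(use_failed&)
  {
    // h has already been released by the time the handler runs
  }

  return open_handles;
}
```

Whether the use fails or not, use_and_count() reports no open handle left.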

The second point is understandable. However, I find code that needs to change the internal state in such a way that recovery is complicated often benefits from refactoring. If this is not the case, then this is usually a sign that a particular code segment needs to remember a current state, do operations and then restore the original state. Moving the state-management code to a separate class is usually better, especially if this scheme is used in more than one function.
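
A sketch of such a state-management class, assuming a hypothetical canvas structure whose members are temporarily changed while drawing. The saver remembers the state on construction and restores it on destruction, whether or not an exception is thrown.

```cpp
// Hypothetical object whose state is modified temporarily.
struct canvas
{
  int pen_width;
  int color;
};

// Remembers the state of a canvas and restores it when going out of scope.
class canvas_state_saver
{
public:
  explicit canvas_state_saver(canvas& c)
    : canvas_(c), saved_(c)
  {
  }

  ~canvas_state_saver()
  {
    canvas_ = saved_;
  }

private:
  // not copyable: two savers restoring the same state make no sense
  canvas_state_saver(const canvas_state_saver&);
  canvas_state_saver& operator=(const canvas_state_saver&);

  canvas& canvas_;
  canvas saved_;
};

class draw_failed {};

void draw_highlight(canvas& c, bool fail)
{
  canvas_state_saver saver(c);

  c.pen_width = 5;
  c.color = 0xff0000;

  if (fail)
    throw draw_failed();

  // draw something; the original state comes back when saver is destroyed
}
```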

Finally, there is the degenerate case of needing to set a single flag at the beginning of a function and reset it at the end, regardless of exceptions. This is the only situation in which a finally construct may be useful since other constructs may be too verbose for such a simple operation. However, a simple generic class does the trick:

template <class T>
class setter
{
public:
  setter(T& t, const T& initial, const T& rollback)
    : t_(t),
      rollback_(rollback)
  {
    t_ = initial;
  }

  ~setter()
  {
    t_ = rollback_;
  }

private:
  T& t_;
  T rollback_;
};

void f(bool& executing)
{
  setter<bool> s(executing, true, false);
  // do something
}

Conclusion

Assertions, return values and exceptions provide a way of signaling logic errors (coding mistakes), runtime errors (part of the normal execution flow) and exceptional cases (that break the normal execution flow). Using them together allows the normal execution path to be free of error-checking while still being able to distinguish between the different outcomes of an operation. Exceptional cases are propagated until a function knows how to deal with them.

Creating a hierarchy of exceptions does not solve any real problem: catching the base class does not give enough information about the error, while catching the derived classes eliminates the need for a base class. Having a single base class for all exceptions is also irrelevant: a catch(...) is better to swallow any exception and expecting a member function to return meaningful information is invalid in most cases. The point where the exception was thrown might not know enough about the context. Embedding values such as an integer to distinguish between different types of errors is invalid for the same reason.

Embedding a memory-allocating string is dangerous because it might throw an exception of its own. Static strings might work but are difficult to internationalize.

The finally construct is usually better implemented with a simple RAII class or a class that can remember and reset a current state if this state is too complex.

As is usual for all types of advice, there is no "never" or "always", except for "one size never fits all" and "always be skeptical". YMMV. These are personal conclusions based on my experience.

[update 12-mar-2010: following a discussion on c.l.c++.m, I am adding several points:
  • A specific example in favor of hierarchies was given where a derived class represents a more specific problem than the base class, such as a retriable_network_exception derived from network_exception. If one module never retries (regardless of whether it is possible or not) while another module always retries (if possible), then the former catches a network_exception while the latter catches both.

  • I've said that using embedded values in exceptions is rarely useful (even for debugging) since the point of failure might not know enough to construct a good error message. I still stand by that statement for messages that are displayed to a user. However, if a kind of stack tracing can be added to an exception (à la Java), then it does make for a useful debugging tool. It does not eliminate the need for logging (some errors might happen for which no exceptions are thrown), but it might be a good complement.

    The main problems with this are that:
    • not all (few?) exceptions require a stack trace; most are used merely to signal some kind of unsurprising failure, such as getting disconnected from a server-based database.

    • a good stack trace is highly platform-dependent. GNU libc has several backtracing functions that can be used but these are not standard nor portable. Rolling your own using __LINE__ and __FILE__ is not very friendly (these are the only two standard context information macros, others like __FUNCTION__ are platform-specific) or easy (you'd need try/catch constructs in every function, some of them not necessarily being under your control).

    • you might still need logging for non-exceptional cases (these are useful too!)

  • I've said that embedding an integral value representing a string "could probably work, allowing the handler to lookup the appropriate error message for it". This is not possible if the exception is thrown from a library.

  • Some people use an assert while debugging and throw an exception in production. I find this dangerous since the behavior of the program is radically different depending on compilation settings. Others do the opposite for unit testing, since the exception can be handled to verify that a particular code segment reports an error correctly. Asserting in this case makes automated unit testing more difficult.

  • There are cases where a program runs a series of independent jobs that might fail for a variety of reasons. If this program needs to continue running whether a job fails or not (possibly restarting the failing ones), then asserting from the job is not the way to go. However, from the point of view of the system, I'd consider these errors runtime errors, not logic errors, and therefore would need to use exceptions instead of assertions.

  • Finally, I mention the standard assert() macro several times in this article as a way of aborting a program and, ideally, breaking into a debugger or generating a core dump. The specific way used to abort the program is irrelevant. As a matter of fact, I usually roll my own assert-like function to force the program to break immediately. On Windows, for example, assert() pops up a window, which may end up giving a stack overflow (or other problems) because the message pump is still running. I prefer using DebugBreak() instead.
]
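
The retriable_network_exception example from the first point of the update can be sketched as follows. The failure_kind enumeration and the send functions are hypothetical; only the two exception names come from the discussion.

```cpp
#include <string>

class network_exception {};
class retriable_network_exception : public network_exception {};

// Hypothetical way of selecting which failure send_request() simulates.
enum failure_kind { no_failure, temporary_failure, permanent_failure };

void send_request(failure_kind failure)
{
  if (failure == temporary_failure)
    throw retriable_network_exception();

  if (failure == permanent_failure)
    throw network_exception();
}

// Module that never retries: the base class is all it needs to catch.
std::string send_once(failure_kind failure)
{
  std::string outcome = "sent";

  try
  {
    send_request(failure);
  }
  catch(network_exception&)
  {
    outcome = "gave up";
  }

  return outcome;
}

// Module that retries when possible: the derived class is caught first.
std::string send_with_retry(failure_kind failure)
{
  std::string outcome = "sent";

  try
  {
    send_request(failure);
  }
  catch(retriable_network_exception&)
  {
    outcome = "retrying";
  }
  catch(network_exception&)
  {
    outcome = "gave up";
  }

  return outcome;
}
```

The order of the handlers matters in send_with_retry(): if the base class came first, it would also catch the retriable exception.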
