(Apologies to readers that are not heavily steeped in Star Wars trivia.)
As with the old joke about duct tape, CPP is like The Force: it has a light side and a dark side, and it holds the universe (application) together. The vast majority of large Fortran applications rely on some type of conditional compilation for specifying various configuration aspects. Some, such as GISS ModelE, rely so heavily on conditional compilation that it can be said to be part of the overall software architecture. Conditional compilation, usually via some sort of source-code preprocessing, can yield a wide variety of benefits for scientific models, but there are also significant negative consequences that developers should be aware of. As I have emphasized for many other topics in this series, scientific developers are often so immersed in existing practice that we fail to see the alternatives and, more importantly, the consequences of our implicit choice to continue such practices.
By far the most common form of conditional compilation in the Fortran community is the use of the C preprocessor (CPP), which provides nested if-else blocks that effectively mask entire blocks of source code from the compiler. However, other forms of conditional compilation are also in use, such as m4 and sed, as well as various Fortran-specific preprocessors such as COCO and Forpedo, which are largely unused in our community, presumably for historical reasons. A more subtle form of conditional compilation can appear in build scripts (e.g. a Makefile), where some mechanism controls just which files are and are not compiled. GISS ModelE uses a combination of CPP and Makefile controls to support a very large variety of configuration options.
Note that while CPP also provides macro expansion capabilities which have their own upsides and downsides, this article is only addressing the use of CPP for conditional compilation.
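As a minimal sketch of what such nested if-else masking looks like in a Fortran source file, consider the following. The macro names USE_TRACERS and TRACERS_WATER are hypothetical and chosen only for illustration, and the file is assumed to be run through the preprocessor (for example by giving it a .F90 suffix or by using a compiler option such as gfortran's -cpp).

```fortran
! Sketch only: USE_TRACERS and TRACERS_WATER are hypothetical macro names.
! Build with preprocessing enabled, e.g. -DUSE_TRACERS on the compile line.
program cpp_demo
  implicit none
  integer :: n_fields

  n_fields = 3                 ! always seen by the compiler
#ifdef USE_TRACERS
  n_fields = n_fields + 1      ! seen only when USE_TRACERS is defined
#ifdef TRACERS_WATER
  n_fields = n_fields + 2      ! nested: requires both macros
#endif
#else
  print *, 'tracers disabled at compile time'
#endif
  print *, 'n_fields =', n_fields
end program cpp_demo
```

Whichever branch is not selected is removed before the compiler proper ever parses the file, so a syntax error in that branch goes unnoticed in that build.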
Rationale for conditional compilation
Because so many groups have independently decided to incorporate conditional compilation in their software, we are safe in concluding that compelling reasons exist for this choice or at least once did exist. Before moving on to the negative consequences, we should remind ourselves of what these positive motivations are. Here I briefly summarize the rationale for the most frequent uses.
Although Fortran 90 is itself very portable, many software applications must interface with external resources that may vary from platform to platform. For instance, many compiler vendors provide optimized FFT and other numerical procedures that have unique interfaces. Fortunately, the growth of standardized interfaces (e.g. LAPACK, MPI) and publicly available portable packages (e.g. FFTW, ESMF) has significantly eroded the need to customize Fortran applications to run on a given platform. Nonetheless, when external interfaces are nonstandard, conditional compilation is a very defensible mechanism for minimizing complexity in an application. Such issues can generally be confined to well-documented, heavily isolated portions of the software.
Most scientific models are intended to be usable for a variety of scenarios, some of which are significantly more computationally expensive than others. When the differences between the scenarios can be expressed at a high level in the software, a simple run-time conditional (i.e. a vanilla Fortran "if" statement) can be very effective. However, when the differences unavoidably appear in inner loops of the implementation, run-time conditionals may incur an unacceptable performance overhead. In these instances, conditional compilation provides much of the same flexibility as the run-time conditional, but with far superior consequences for performance.
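The inner-loop case might look like the sketch below. The macro USE_MOIST_CORRECTION, the routine name and the arithmetic are all hypothetical; the point is only that the optional statement costs nothing in builds where the macro is undefined, because the compiler never sees it.

```fortran
! Sketch only: USE_MOIST_CORRECTION and the "physics" are made up.
! Requires preprocessing (e.g. a .F90 suffix or gfortran -cpp).
module relax_mod
  implicit none
contains
  subroutine relax(t, q)
    real, intent(inout) :: t(:)
    real, intent(in)    :: q(:)
    integer :: i
    do i = 1, size(t)
      t(i) = 0.5 * t(i)
#ifdef USE_MOIST_CORRECTION
      ! Present in the loop body only for builds that request it.
      t(i) = t(i) + 0.1 * q(i)
#endif
    end do
  end subroutine relax
end module relax_mod

program relax_demo
  use relax_mod
  implicit none
  real :: t(4), q(4)
  t = 280.0
  q = 1.0
  call relax(t, q)
  print *, t
end program relax_demo
```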
It is worth noting that compilation performance itself can also be improved by means of conditional compilation.
As with performance, some model scenarios may use far more memory than others. In the extreme, there may be combinations of scenarios which use mutually exclusive data structures such that no one scenario needs as much memory as the superset of all possible combinations. Conditional compilation can be used to reduce the memory footprint of an application when it is running a scenario that requires less memory than would otherwise be required to support all scenarios. Although this use of conditional compilation was quite defensible in early versions of Fortran, this practice must now be weighed against using dynamic memory allocation which was introduced in Fortran 90.
Mutually Exclusive Implementation Alternatives
Quite commonly a model may provide alternative implementations for a given piece of functionality and no sensible meaning can be made of using more than one implementation in a given run. For instance an atmospheric model might have 2 or more different dynamical cores that can be used, but only one dynamical core can be used for a given execution of the model. With conditional compilation, all of the alternatives can be implemented with identical interfaces and hide this complexity from other portions of the model. Modern capabilities such as frameworks (ESMF) and object-oriented language features can provide similar reductions in complexity and must be considered when choosing to use conditional compilation for this purpose.
Toggle for debugging/diagnostics
When an application breaks or produces unexpected results, developers often activate machinery which produces additional diagnostic data to help resolve the issue. These extra data would be prohibitively expensive/distracting for an ordinary run, and are deactivated with conditional compilation. This use of conditional compilation is often expected to be temporary and/or only for the primary developer of that section of the code.
As with many good things, conditional compilation can be just fine in moderation. The first bit of CPP that crept into a given model was probably very beneficial and paved the way to additional uses. At some threshold, however, the indirect costs began to compare with the presumed benefits, but these costs were not immediately recognized. It is unsurprising that most teams overshoot a reasonable balance, and turning back the clock is no trivial matter. First let us examine some of the undesirable consequences of conditional compilation in isolation.
Limited code coverage
The primary problem with conditional compilation is the effective volume of "dead code" not seen by the compiler. There are numerous aspects of this fairly obvious consequence. Perhaps most important is the increased difficulty in making correct changes while extending/maintaining an application. If our change violates code that is not being compiled, significant errors will go undetected until someone builds a configuration that uses those blocks of code. Of course code that is not executed also has such risks, so conditional compilation is technically only exacerbating the situation. Developers can choose to live life on the edge and hope that induced problems in dead code are minor, easily fixed, and not traced back to themselves. Alternatively, they can be more cautious at the expense of compiling and running multiple configurations of the model to ensure all blocks of code are compiled and produce expected results. The severity of this problem rises sharply with the volume of "unused" code and especially with the number of independent configurations required to cover all of the source code.
In an ideal world, we could compile all of the source code in one single executable and use different run-time options to ensure that all of the model is tested. This might still be a complex process for full system tests, but it would be much faster than the potentially exponential number of compilations. And with a healthy set of unit-tests, testing all of the components could ultimately be simple and fast. But so long as conditional compilation is heavily used it is very difficult to approach this ideal in a systematic fashion.
The occasional short CPP conditional block does not generally impair understanding of a section of code. However, when a block becomes large (spanning more than one screen length) and/or the conditionals become nested, code can become every bit as difficult to follow as traditional spaghetti code. In extreme cases, it can take some serious concentration even to determine whether a given line of code is compiled for a given set of compilation settings. This problem is in principle not much more severe than the analogous problems of long procedures and deeply nested conditionals. However, CPP conditionals are conventionally not indented, so they do not even provide the usual visual cues that help us follow an algorithm.
DRY (Don't Repeat Yourself)
Although code duplication is not a direct consequence of using CPP, many developers seem especially prone to this problem when working with dense CPP conditionals. Duplicated logic, whether conventional Fortran or CPP, is an unnecessary maintenance burden and typically makes code harder to follow than if the duplicated logic is properly encapsulated.
If we want to reduce our reliance upon conditional compilation, are there steps we can take? Certainly. Although complete elimination would be very difficult and unwarranted, persistent attention to this problem can bring a model back to sensible levels of conditional compilation. Many of the same techniques that can be used to reduce complexity in standard Fortran can be applied to preprocessors as well. How any specific usage should be addressed largely hinges upon its underlying rationale.
As mentioned above, there is sometimes no other choice than to use conditional compilation to deal with different computing environments. But even then, proper engineering of interfaces should enable restricting conditional compilation to very isolated (and well documented!) sections of the code. Developers should look for every opportunity to use standardized interfaces and portable 3rd party libraries. When such are not available, developers should attempt to isolate the nonstandard functionality in as few interfaces as possible. Conditional compilation will then be restricted to just those interfaces. If those interfaces are then implemented for each environment in a separate set of files, the makefile itself can be used to cleanly manage the conditional compilation and preprocessors disappear.
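One way to picture this isolation is a small wrapper module that is the only place in the application where the environment-specific choice appears. The sketch below wraps a wall-clock timer; the module name timer_mod, the function wall_time and the macro USE_MPI are hypothetical, while MPI_Wtime and system_clock are the standard MPI and Fortran routines.

```fortran
! Sketch only: timer_mod, wall_time and USE_MPI are hypothetical names.
! The rest of the application calls wall_time() and never sees CPP.
module timer_mod
#ifdef USE_MPI
  use mpi
#endif
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  private
  public :: wall_time
contains
  function wall_time() result(t)
    double precision :: t
#ifdef USE_MPI
    ! Assumes the application has already called MPI_Init.
    t = MPI_Wtime()
#else
    integer(int64) :: count, rate
    call system_clock(count, rate)
    t = dble(count) / dble(rate)
#endif
  end function wall_time
end module timer_mod

program timer_demo
  ! Exercises the default (non-MPI) build.
  use timer_mod, only: wall_time
  implicit none
  double precision :: t0
  t0 = wall_time()
  print *, 'elapsed seconds:', wall_time() - t0
end program timer_demo
```

If the two branches were instead placed in two separate files that each define timer_mod, the build script could simply choose which file to compile, and no preprocessor directives would remain in the source.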
Note that we are generally not concerned about code-coverage when talking about portability.
When performance matters, there may well not be a good alternative to conditional compilation. Certainly developers should check to be certain that the performance really is impacted by other options rather than just assuming that the performance consequences are unacceptable. One strategy to reduce conditional compilation used for performance purposes is to pull the conditional up to a higher level by introducing duplication. E.g. a triply-nested loop with a CPP conditional inside could be written as two triply nested loops with the conditional controlling which loop nest is activated. At some level, the performance cost of switching to use of a run-time conditional (i.e. a standard Fortran "if-else" block) becomes negligible and the CPP can be eliminated. This approach works best when there are really only two configurations in the critical section of code, and the duplicated section is relatively short. Developers will need to weigh the costs of maintaining duplicate code sections against the advantages of simpler logic and better code coverage.
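A sketch of this hoisting strategy is shown below, with a made-up update formula and hypothetical names (update, use_moist). The run-time test is evaluated once per call rather than once per grid point, at the cost of a duplicated loop nest, and both variants are always compiled.

```fortran
! Sketch only: routine name, flag name and arithmetic are hypothetical.
module update_mod
  implicit none
contains
  subroutine update(t, q, use_moist)
    real, intent(inout) :: t(:,:,:)
    real, intent(in)    :: q(:,:,:)
    logical, intent(in) :: use_moist
    integer :: i, j, k

    if (use_moist) then
      ! Variant with the optional term.
      do k = 1, size(t, 3)
        do j = 1, size(t, 2)
          do i = 1, size(t, 1)
            t(i,j,k) = 0.5 * t(i,j,k) + 0.1 * q(i,j,k)
          end do
        end do
      end do
    else
      ! Duplicated loop nest without the optional term.
      do k = 1, size(t, 3)
        do j = 1, size(t, 2)
          do i = 1, size(t, 1)
            t(i,j,k) = 0.5 * t(i,j,k)
          end do
        end do
      end do
    end if
  end subroutine update
end module update_mod

program update_demo
  use update_mod
  implicit none
  real :: t(4,3,2), q(4,3,2)
  t = 280.0
  q = 1.0
  call update(t, q, use_moist=.true.)
  call update(t, q, use_moist=.false.)
  print *, 'max t =', maxval(t)
end program update_demo
```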
Of course the large memories on modern computers often make a mockery of concerns from earlier times and the rationale for conditional compilation to conserve memory may evaporate on its own. If not, propagating dynamic allocation throughout an entire model is the recommended approach, though it may require a significant effort. In our community, domain decomposition for parallel computing has largely already forced the use of dynamic allocation anyway.
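The dynamic-allocation alternative can be sketched as follows. The array name, the grid dimensions and the variable n_tracers are hypothetical; in a real model the count would presumably be read from a run-time configuration file rather than hard-coded.

```fortran
! Sketch only: names and sizes are hypothetical.
program alloc_demo
  implicit none
  integer, parameter :: im = 72, jm = 46
  integer :: n_tracers
  real, allocatable :: tracers(:,:,:)

  n_tracers = 4   ! stand-in for a value read at run time

  ! Storage is requested only when the scenario needs it, so a single
  ! executable serves both the small and the large configurations.
  if (n_tracers > 0) then
    allocate(tracers(im, jm, n_tracers))
    tracers = 0.0
  end if

  if (allocated(tracers)) then
    print *, 'tracer storage (elements):', size(tracers)
  else
    print *, 'no tracer storage allocated'
  end if
end program alloc_demo
```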
At this time a number of mechanisms are available to developers to avoid the use of conditional compilation to manage configurations. One challenge may be to deal with namespace collisions when multiple options use the same name for analogous procedures and/or modules. If the configurations can be encapsulated behind a small number of well-defined interfaces then run-time conditionals in the driver can be quite effective. If the interfaces are more complex, then frameworks (e.g. ESMF) can enable seamless run-time configuration. And, of course, the emergence of object-oriented features in Fortran 2003 now allows a simple mechanism to hide configuration logic from other portions of the model.
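As one illustration of the object-oriented route, the sketch below hides two alternative implementations behind an abstract type and selects between them at run time. All names (dynamics_t, core_a_t, core_b_t, new_dynamics) are hypothetical, and the step routines are placeholders that only print a message.

```fortran
! Sketch only: all type and procedure names are hypothetical.
module dynamics_mod
  implicit none
  private
  public :: dynamics_t, new_dynamics

  type, abstract :: dynamics_t
  contains
    procedure(step_iface), deferred :: step
  end type dynamics_t

  abstract interface
    subroutine step_iface(this, dt)
      import :: dynamics_t
      class(dynamics_t), intent(inout) :: this
      real, intent(in) :: dt
    end subroutine step_iface
  end interface

  type, extends(dynamics_t) :: core_a_t
  contains
    procedure :: step => step_a
  end type core_a_t

  type, extends(dynamics_t) :: core_b_t
  contains
    procedure :: step => step_b
  end type core_b_t

contains

  ! The only place that knows which implementations exist.
  subroutine new_dynamics(name, dyn)
    character(len=*), intent(in) :: name
    class(dynamics_t), allocatable, intent(out) :: dyn
    select case (name)
    case ('a')
      allocate(core_a_t :: dyn)
    case default
      allocate(core_b_t :: dyn)
    end select
  end subroutine new_dynamics

  subroutine step_a(this, dt)
    class(core_a_t), intent(inout) :: this
    real, intent(in) :: dt
    print *, 'core A step, dt =', dt
  end subroutine step_a

  subroutine step_b(this, dt)
    class(core_b_t), intent(inout) :: this
    real, intent(in) :: dt
    print *, 'core B step, dt =', dt
  end subroutine step_b
end module dynamics_mod

program driver
  use dynamics_mod
  implicit none
  class(dynamics_t), allocatable :: dyn
  call new_dynamics('a', dyn)   ! choice could come from a run-time option
  call dyn%step(1800.0)
end program driver
```

Both cores are compiled into every executable, so a change that breaks either one is caught at build time, and their private procedure names cannot collide with the rest of the model.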
For debugging and diagnostic toggles, conditional compilation should at the very least be supplemented by moving the diagnostic logic into a subroutine. CPP can then be used to control whether that subroutine has any content. Even then, though, developers should consider using a run-time switch instead, so that the diagnostic logic is always compiled and is therefore maintained when the data structures it references are modified by other developers.
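Both options can be sketched side by side. The names here (diag_mod, check_field_cpp, check_field_runtime, diag_enabled, DEBUG_DIAGNOSTICS) are hypothetical, and the diagnostic itself is a trivial min/max report.

```fortran
! Sketch only: all names, including DEBUG_DIAGNOSTICS, are hypothetical.
module diag_mod
  implicit none
  logical :: diag_enabled = .false.   ! run-time switch
contains

  ! Option 1: the call sites stay clean; CPP empties the body.
  subroutine check_field_cpp(name, field)
    character(len=*), intent(in) :: name
    real, intent(in) :: field(:)
#ifdef DEBUG_DIAGNOSTICS
    print *, trim(name), ' min/max:', minval(field), maxval(field)
#endif
  end subroutine check_field_cpp

  ! Option 2: the body is always compiled and switched at run time.
  subroutine check_field_runtime(name, field)
    character(len=*), intent(in) :: name
    real, intent(in) :: field(:)
    if (.not. diag_enabled) return
    print *, trim(name), ' min/max:', minval(field), maxval(field)
  end subroutine check_field_runtime
end module diag_mod

program diag_demo
  use diag_mod
  implicit none
  real :: t(5)
  t = [271.0, 280.5, 290.0, 265.2, 300.1]
  call check_field_cpp('t', t)       ! silent unless built with the macro
  diag_enabled = .true.
  call check_field_runtime('t', t)   ! prints because the switch is on
end program diag_demo
```

In the second form, a developer who changes the type or shape of the field gets a compile error in the diagnostic immediately, rather than months later when someone next defines the macro.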
Although conditional compilation is here to stay, I believe that a concerted effort to limit its use is an essential element of a long-term strategy to produce maintainable, well-tested software.