Porting to the Solaris OS

Learn about issues and tips for porting C and C++ applications to the Solaris OS.

This article discusses porting C and C++ applications to the Solaris OS. While it generally addresses porting from any OS/hardware platform to the Solaris OS (on either SPARC or x86), the most commonly expected cases are ports from Linux/x86 to Solaris/x86 and from Solaris/SPARC to Solaris/x86.

Porting an application deals with differences caused by a change in these factors:

When porting from Linux or other Unix platforms, the interfaces will be nearly the same, although there are still some minor differences to accommodate.

Before actually porting any source code, the first step is to locate the correct version of all of the necessary third-party utilities and libraries needed to build the application. Common examples are:

Many of the open-source utilities used on other platforms are already integrated in Solaris in /bin or available in /sfw. Many others are available in source or executable form, for example, from these sources:

The build environment needs to be modified for the different location of include files, libraries, and user-level commands between the platforms. For Solaris, $PATH should likely include:

For historical reasons, the Solaris OS also provides include files and runtime libraries compatible with SunOS 4.X, in /usr/ucbinclude and /usr/ucblib. New ports to the Solaris OS should avoid these ucb functions and use the normal system routines instead.

Other small interface differences will be obvious during compiling and linking, which makes them comparatively easy to find (and fix). A few examples are:

Other differences, however, might not be obvious until execution time. For example, by default the Solaris OS does not restart interrupted system calls. To have interrupted calls restarted, register the signal handler with sigaction rather than signal, and set the SA_RESTART flag, as shown here:

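struct sigaction act;

act.sa_handler = signal_handler_function;
sigemptyset(&act.sa_mask);      /* block no additional signals in the handler */
act.sa_flags = SA_RESTART;      /* restart system calls interrupted by signo */
if (sigaction(signo, &act, NULL) == -1)
    perror("sigaction");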

Another difference, from Linux for example, is that the various implementations of malloc on Solaris do not return heap space to the OS upon a call to free (until process termination). If an application really needs to immediately return that allocated space, it should use mmap and munmap instead.

Any code using /proc needs to be changed because Linux implements /proc structures as text files, whereas Solaris implements binary files. Porting this code usually means finding the appropriate data in the Solaris /proc structures and then changing the access to read binary data rather than parsing text data.
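The following sketch shows the general shape of a binary /proc read. The helper names read_binary_record and print_self_psinfo are hypothetical. The Solaris-only part assumes the psinfo_t structure from <procfs.h> and the /proc/self/psinfo file described in the proc(4) man page; check the field names against the headers on your release.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read exactly len bytes of binary data from path
 * into buf. Returns 0 on success, -1 on any error or short file. */
static int read_binary_record(const char *path, void *buf, size_t len)
{
    size_t done = 0;
    int fd = open(path, O_RDONLY);

    if (fd == -1)
        return -1;
    while (done < len) {
        ssize_t n = read(fd, (char *)buf + done, len - done);
        if (n <= 0)
            break;              /* error or premature end of file */
        done += (size_t)n;
    }
    close(fd);
    return done == len ? 0 : -1;
}

#if defined(__sun)
#include <procfs.h>             /* psinfo_t (Solaris only) */

/* Hypothetical helper: print the pid and command name of this process
 * from the binary psinfo structure, with no text parsing involved. */
static int print_self_psinfo(void)
{
    psinfo_t info;

    if (read_binary_record("/proc/self/psinfo", &info, sizeof info) != 0)
        return -1;
    printf("pid %ld runs %s\n", (long)info.pr_pid, info.pr_fname);
    return 0;
}
#endif
```

The Linux equivalent would open a text file such as /proc/self/stat and parse its fields, which is the code that has to be replaced during the port.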

While not strictly a difference between platforms, it's not uncommon for a platform port to uncover latent bugs in the application, for example a bad pointer or a buffer overrun that corrupts the heap. These problems might require manual investigation using a debugger. However, the Solaris OS also provides libumem, an alternative heap implementation that offers optional runtime consistency checks. For a more complete description and an example of using libumem to uncover a heap problem, see Identifying Memory Management Bugs Within Applications using the libumem Library.

Before considering the details of compiler differences, you must first determine which compilers to use. For C, you can mix object code from compilers from different vendors. However, C++ compilers do not implement identical ABIs, so the entire application should be built with the same brand of C++ compiler. For example, if your application depends on a library built with GNU C++, you should build your application with a GNU C++ compiler as well.

For C++, changing to a new compiler can sometimes be as difficult as porting to a new OS. Thus, if you are porting from Linux/GNU, you should carefully consider whether to use the GNU compiler on the Solaris OS as well. Similarly, the Sun Studio compilers are supported on Linux, so it would also be reasonable to use Sun Studio tools on both platforms. If you choose to port from GNU to Sun Studio (for multithreaded debugging, performance analysis, and Sun support), a reasonable strategy is to port in two steps, first using GNU compilers and then migrating to Sun Studio compilers, rather than tackling both changes at once.

If you are migrating to the Sun Studio compilers, the first visible difference is that they accept different command-line options. This is a minor technical hurdle, but it might require some parameterization of the existing makefiles. Refer to the Compiler Option appendix in the C User's Guide and in the C++ User's Guide for an explanation of the compiler switches in the Sun Studio compilers.

If the application has been previously ported to some alternate operating system, hardware platform, or compiler, then it probably already contains #ifdefs to isolate the port-specific code sequences. This makes porting to the Solaris OS easier for a couple of reasons. First, code that runs on multiple platforms has already been generalized away from platform-specific dependencies. Second, the specific areas that have been previously #ifdef'd have already been identified as the areas most in need of inspection for the Solaris port.

For the common case where an application has been released on Solaris/SPARC and Linux/x86, porting to Solaris/x86 may mostly be an exercise in selecting the correct set of #ifdefs from the existing sequences.

One issue, however, is that existing code might have been written assuming that both sun and solaris imply SPARC and that linux implies x86. Therefore an existing conditional code sequence might look like this:

#ifdef sun
... big-endian sequence
#elif defined(linux)
... little-endian sequence
#endif

Obviously, however, this would be incorrect for Solaris/x86. Instead, there should be distinct discriminants for both hardware architecture and operating system. The most robust solution is for your build environment to define these symbols, because they can be made to work with any compiler on any platform. As a shortcut in the common cases, you could take advantage of symbols defined by both the GNU and Sun Studio compilers. Both sets of compilers provide a common definition of the symbol sun when on Solaris (as compared to linux for Linux). Both define __sparc for the SPARC architecture. And both sets of compilers define the symbols __i386 and __x86_64__ for the 32-bit and 64-bit x86 platforms, respectively.

#ifdef __sparc
... big-endian sequence
#elif defined(__i386) || defined(__x86_64__)
... little-endian sequence
#endif
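A slightly fuller sketch keeps the operating system and architecture tests separate, so that Solaris/x86 picks up the Solaris OS code and the x86 byte order. The PORT_* macro names and the two accessor functions are hypothetical. The __sun and __linux__ spellings are assumptions beyond the symbols named above, so verify them against your compiler's list of predefined macros.

```c
/* Operating-system discriminant (hypothetical PORT_* names). */
#if defined(__sun) || defined(sun)
#define PORT_OS "solaris"
#elif defined(__linux__) || defined(linux)
#define PORT_OS "linux"
#else
#define PORT_OS "unknown"
#endif

/* Architecture discriminant, decided independently of the OS. */
#if defined(__sparc)
#define PORT_ARCH "sparc"
#define PORT_BIG_ENDIAN 1
#elif defined(__i386) || defined(__x86_64__)
#define PORT_ARCH "x86"
#define PORT_BIG_ENDIAN 0
#else
#define PORT_ARCH "unknown"     /* byte order must be supplied by the build */
#endif

/* Accessors so that ordinary code can report the selected configuration. */
static const char *port_os(void)   { return PORT_OS; }
static const char *port_arch(void) { return PORT_ARCH; }
```

Code that depends on byte order then tests PORT_BIG_ENDIAN, and code that depends on system interfaces tests the OS symbol, with neither test implying the other.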

Current C++ compilers recognize slightly different definitions for the C++ language. They all strive to conform to the ISO C++ standard, but most compilers implement some extensions and the standard allows some features to be implementation defined. When these particular features are used in the source (either purposefully or inadvertently), they must be dealt with during porting.

Later releases of Sun Studio compilers incorporate many of the GNU extensions, so using a newer version of Sun Studio will reduce the number of porting issues. However, some issues must be found and fixed manually.

Most of these language differences cause compiler diagnostics where the error/warning message explains the problem. The code can then be rewritten to match the well-defined areas of the ISO standard. Sometimes, however, the problem is sufficiently subtle that it requires research to understand what is wrong or how to fix it. For those so inclined, a copy of the C++ standard is the definitive source for investigating the language rules. However, an easier and often effective technique is to use an Internet search engine to find discussions about the error message. Finally, if you're not sure if the compiler is correctly accepting or rejecting a particular language construct, you can ask questions at the Sun Developer Network Tools Forum.

When porting C++ using Sun Studio tools, you must decide whether to use the default C++ Standard Library or the newer stlport4 library. The newer library provides better standards conformance and often better performance, but might require source changes (thus adding to your porting effort). For a discussion of the nuances of using and packaging the STLport library, see Using and Redistributing Sun Studio Libraries in an Application.

The most obvious hardware issue is the different instruction set architecture, which requires the conversion of any hand-generated assembly code. While there's no shortcut for removing all of it, there are several important cases where it might make sense to replace, rather than port, the assembly code:

Aside from the work needed to port assembly code to a new ISA, a change in toolset can also cause porting issues. Both Sun Studio and GNU compilers support the single-string asm() statement as specified for ANSI C/C++, and both support separate assembler source files. However, for embedded assembler code using high-level language expressions, the compilers support very different mechanisms.

GNU provides extensions to the asm() statement that specify dataflow and C/C++ parameters for each embedded assembly instruction. These statements are inserted directly into the C or C++ source code where the assembly instruction should be inlined. The Sun Studio compilers provide an equally powerful, though less convenient, inline assembly expansion mechanism. It uses assembly templates that are placed in separate .il files and passed to the compiler along with any source file that references the assembly template.
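To make the GNU style concrete, the sketch below wraps a single x86 add instruction in an extended asm() statement and falls back to plain C elsewhere. The function name add_ints is hypothetical, and the example is deliberately trivial. A Sun Studio port would move the instruction into an inline template file instead, which is not shown here.

```c
/* Hypothetical example: add two integers with one x86 instruction.
 * The operand lists tell the GNU compiler that a is both read and
 * written ("+r"), that b is only read ("r"), and that the condition
 * codes are modified ("cc"). */
static int add_ints(int a, int b)
{
#if defined(__GNUC__) && (defined(__i386) || defined(__x86_64__))
    __asm__ ("addl %1, %0" : "+r" (a) : "r" (b) : "cc");
    return a;
#else
    /* Portable fallback for other compilers and architectures. */
    return a + b;
#endif
}
```

Keeping a C fallback next to each assembly sequence also gives the port a reference implementation to test against.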

Another ISA issue comes from the functional differences in the graphics and vector capabilities of the underlying architecture (for example, SSE on x86 platforms or VIS on SPARC platforms). Generally, these differences should be hidden within libraries (for example, Sun Performance Library).

Finally, when porting from SPARC to x86 platforms, numerical results might differ because the x86 floating-point unit can carry intermediate results in 80-bit precision. To minimize these differences (and generally speed up computations), use -xarch=sse2 so that floating-point arithmetic uses the SSE2 registers and instructions, which operate on standard 32-bit and 64-bit values.
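One way to see which evaluation mode a particular build uses is the C99 macro FLT_EVAL_METHOD from <float.h>. The sketch below assumes a C99-conforming compiler and header. The function name float_eval_method is hypothetical.

```c
#include <float.h>

/* Hypothetical helper: report how the compiler evaluates floating-point
 * expressions, as described by C99:
 *    0  operations are evaluated in the precision of their type
 *    1  float and double operations are evaluated as double
 *    2  all operations are evaluated as long double (for example,
 *       80-bit x87 arithmetic)
 *   -1  indeterminable, or the macro is not provided */
static int float_eval_method(void)
{
#ifdef FLT_EVAL_METHOD
    return (int)FLT_EVAL_METHOD;
#else
    return -1;
#endif
}
```

If the value changes between the SPARC build and the x86 build, small differences in the low-order digits of computed results are to be expected.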

Byte order or endian-ness refers to the ordering of individual bytes within multibyte data. This can be a porting issue when data is reinterpreted as a different type within a single process (for example, if an array of char is interpreted as type long) or between multiple processes when the memory image of a multibyte datatype is written directly to a file or socket.

The two byte orderings are big-endian (for example, SPARC and PowerPC platforms, where the most significant byte of an integer is stored at the lowest address) and little-endian (for example, x86/x64 platforms, where the least significant byte of an integer is stored at the lowest address). This issue is usually solved by changing the application to always use a consistent byte ordering for shared multibyte data.
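A minimal sketch of a consistent byte ordering follows. It always stores a 32-bit value in big-endian order by using shifts, so the same bytes are produced on SPARC and x86. The function names put_u32_be and get_u32_be are hypothetical, and the choice of big-endian as the shared format is an assumption for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: store v into out[0..3], most significant byte first.
 * Shifts operate on the value, not on memory, so the result does not
 * depend on the byte order of the host. */
static void put_u32_be(unsigned char *out, uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Hypothetical helper: rebuild the value from four big-endian bytes. */
static uint32_t get_u32_be(const unsigned char *in)
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

Data written to a file or socket through helpers like these can be read back on either architecture, in contrast to writing the memory image of the integer directly.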

Another processor-related issue is data alignment, where multibyte data needs to be stored on some minimally aligned address. As a general rule, SPARC platforms require data alignment to equal data size: two-byte types must be on a two-byte boundary, and four-byte types must be on a four-byte boundary. On other processors like the x86/x64, the hardware handles misaligned data, but there might be a performance penalty.

The compilers automatically arrange for data to be appropriately aligned, by adding padding within structs and before static data locations, and by the initial location of the stack and heap. However, alignment problems can be introduced via casting, for example, by taking the address of an arbitrary element of a char array and interpreting it as the address of an element of type double.

The highest performance and most general solution is to change the logic of the application to maintain correct alignment. However, on SPARC platforms it might be reasonable (at least during the initial stages of the port) to have the Sun Studio compilers convert multibyte loads and stores into a sequence of smaller accesses via the -xmemalign option. For example, use -xmemalign=2i to generate loads/stores no larger than two-byte (with the assumption of no more than two-byte alignment).
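When the source change is small, one common way to maintain correct alignment is to copy the bytes into a properly aligned variable instead of casting the pointer. The sketch below is one such approach, not the only one. The function name load_double is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: read a double that starts at an arbitrary byte
 * offset. Casting bytes to (const double *) and dereferencing it could
 * fault on SPARC when the address is not suitably aligned. memcpy has
 * no alignment requirement, and value itself is aligned by the compiler. */
static double load_double(const unsigned char *bytes)
{
    double value;

    memcpy(&value, bytes, sizeof value);
    return value;
}
```

Compilers typically turn a fixed-size memcpy like this into a few load and store instructions, so the cost is usually small on platforms that tolerate misaligned access.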