Debugging Numerical Programs

This chapter describes some tips and tricks for debugging numerical programs which use GSL.

Using gdb

Any errors reported by the library are passed to the function gsl_error(). By running your programs under gdb and setting a breakpoint in this function you can automatically catch any library errors.

You can add a breakpoint for every session by putting

break gsl_error

into your .gdbinit file in the directory where your program is started.

If the breakpoint catches an error then you can use a backtrace (bt) to see the call-tree, and the arguments which possibly caused the error. By moving up into the calling function you can investigate the values of variables at that point. Here is an example from the program fft/test_trap, which contains the following line:

status = gsl_fft_complex_wavetable_alloc (0, &complex_wavetable);

The function gsl_fft_complex_wavetable_alloc() takes the length of an FFT as its first argument. When this line is executed an error will be generated because the length of an FFT is not allowed to be zero.

To debug this problem we start gdb, using the file .gdbinit to define a breakpoint in gsl_error():

$ gdb test_trap

GDB is free software and you are welcome to distribute copies
of it under certain conditions; type "show copying" to see
the conditions.  There is absolutely no warranty for GDB;
type "show warranty" for details.  GDB 4.16 (i586-debian-linux),
Copyright 1996 Free Software Foundation, Inc.

Breakpoint 1 at 0x8050b1e: file error.c, line 14.

When we run the program this breakpoint catches the error and shows the reason for it:

(gdb) run
Starting program: test_trap

Breakpoint 1, gsl_error (reason=0x8052b0d
    "length n must be positive integer",
    file=0x8052b04 "c_init.c", line=108, gsl_errno=1)
    at error.c:14
14        if (gsl_error_handler)

The first argument of gsl_error() is always a string describing the error. Now we can look at the backtrace to see what caused the problem:

(gdb) bt
#0  gsl_error (reason=0x8052b0d
    "length n must be positive integer",
    file=0x8052b04 "c_init.c", line=108, gsl_errno=1)
    at error.c:14
#1  0x8049376 in gsl_fft_complex_wavetable_alloc (n=0,
    wavetable=0xbffff778) at c_init.c:108
#2  0x8048a00 in main (argc=1, argv=0xbffff9bc)
    at test_trap.c:94
#3  0x80488be in ___crt_dummy__ ()

We can see that the error was generated in the function gsl_fft_complex_wavetable_alloc() when it was called with an argument of n = 0. The original call came from line 94 in the file test_trap.c.

By moving up to the level of the original call we can find the line that caused the error:

(gdb) up
#1  0x8049376 in gsl_fft_complex_wavetable_alloc (n=0,
    wavetable=0xbffff778) at c_init.c:108
108   GSL_ERROR ("length n must be positive integer", GSL_EDOM);
(gdb) up
#2  0x8048a00 in main (argc=1, argv=0xbffff9bc)
  at test_trap.c:94
94    status = gsl_fft_complex_wavetable_alloc (0,
        &complex_wavetable);

Thus we have found the line that caused the problem. From this point we could also print out the values of other variables such as complex_wavetable.
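The walk-through above works because every library error passes through a single function. The following is a minimal, self-contained sketch of that pattern and not GSL's actual implementation: the names demo_error(), DEMO_ERROR, demo_table_alloc() and the error codes are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical error codes for this sketch (not GSL's values). */
enum { DEMO_SUCCESS = 0, DEMO_EDOM = 1 };

/* Number of errors reported so far; easy to inspect from the debugger. */
static int demo_error_count = 0;

/* Every error in the sketch passes through this one function, so a
   single breakpoint here ("break demo_error") stops on any of them. */
void
demo_error (const char *reason, const char *file, int line, int err)
{
  assert (reason != NULL && file != NULL);
  demo_error_count++;
  fprintf (stderr, "%s:%d: error %d: %s\n", file, line, err, reason);
}

/* Report an error through demo_error() and return its code. */
#define DEMO_ERROR(reason, err) \
  do { demo_error ((reason), __FILE__, __LINE__, (err)); return (err); } while (0)

/* Hypothetical routine that rejects a zero length, like the FFT
   example in the text. */
int
demo_table_alloc (size_t n)
{
  if (n == 0)
    DEMO_ERROR ("length n must be positive integer", DEMO_EDOM);
  return DEMO_SUCCESS;
}
```

Calling demo_table_alloc (0) from a program run under gdb with a breakpoint on demo_error() would stop inside the reporting function, and bt would then show the caller and its arguments in the same way as the session above.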

Examining floating point registers

The contents of floating point registers can be examined using the command info float (on supported platforms):

(gdb) info float
     st0: 0xc4018b895aa17a945000  Valid Normal -7.838871e+308
     st1: 0x3ff9ea3f50e4d7275000  Valid Normal 0.0285946
     st2: 0x3fe790c64ce27dad4800  Valid Normal 6.7415931e-08
     st3: 0x3ffaa3ef0df6607d7800  Spec  Normal 0.0400229
     st4: 0x3c028000000000000000  Valid Normal 4.4501477e-308
     st5: 0x3ffef5412c22219d9000  Zero  Normal 0.9580257
     st6: 0x3fff8000000000000000  Valid Normal 1
     st7: 0xc4028b65a1f6d243c800  Valid Normal -1.566206e+309
   fctrl: 0x0272 53 bit; NEAR; mask DENOR UNDER LOS;
   fstat: 0xb9ba flags 0001; top 7; excep DENOR OVERF UNDER LOS
    ftag: 0x3fff
     fip: 0x08048b5c
     fcs: 0x051a0023
  fopoff: 0x08086820
  fopsel: 0x002b

Individual registers can be examined using the variables $reg, where reg is the register name:

(gdb) p $st1
$1 = 0.02859464454261210347719

Handling floating point exceptions

It is possible to stop the program whenever a SIGFPE floating point exception occurs. This can be useful for finding the cause of an unexpected infinity or NaN. The current handler settings can be shown with the command info signal SIGFPE:

(gdb) info signal SIGFPE
Signal  Stop  Print  Pass to program Description
SIGFPE  Yes   Yes    Yes             Arithmetic exception

Unless the program uses a signal handler the default setting should be changed so that SIGFPE is not passed to the program, as this would cause it to exit. The command handle SIGFPE stop nopass prevents this:

(gdb) handle SIGFPE stop nopass
Signal  Stop  Print  Pass to program Description
SIGFPE  Yes   Yes    No              Arithmetic exception

Depending on the platform it may be necessary to instruct the kernel to generate signals for floating point exceptions. For programs using GSL this can be achieved using the GSL_IEEE_MODE environment variable in conjunction with the function gsl_ieee_env_setup() as described in the chapter on IEEE floating-point arithmetic:

(gdb) set env GSL_IEEE_MODE=double-precision
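On platforms where trapping SIGFPE is not practical, one complementary approach is to test intermediate results in the program itself. The sketch below is an assumption about how that might be done, not something GSL provides: the helper first_nonfinite() is hypothetical and relies only on the C99 isfinite() macro from math.h.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Return the index of the first element that is NaN or infinite,
   or -1 if every element is finite. */
long
first_nonfinite (const double *v, size_t n)
{
  size_t i;
  assert (v != NULL || n == 0);
  for (i = 0; i < n; i++)
    {
      if (!isfinite (v[i]))
        return (long) i;
    }
  return -1;
}
```

Calling such a check after each stage of a computation narrows down where an infinity or NaN first appears, and a gdb breakpoint placed on the line that returns the index then gives a backtrace at that stage.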

GCC warning options for numerical programs

Writing reliable numerical programs in C requires great care. The following GCC warning options are recommended when compiling numerical programs:

gcc -ansi -pedantic -Werror -Wall -W
  -Wmissing-prototypes -Wstrict-prototypes
  -Wconversion -Wshadow -Wpointer-arith
  -Wcast-qual -Wcast-align
  -Wwrite-strings -Wnested-externs
  -fshort-enums -fno-common -Dinline= -g -O2

For details of each option consult the manual Using and Porting GCC. The following table gives a brief explanation of what types of errors these options catch.

-ansi -pedantic

Use ANSI C, and reject any non-ANSI extensions. These flags help in writing portable programs that will compile on other systems.

-Werror

Consider warnings to be errors, so that compilation stops. This prevents warnings from scrolling off the top of the screen and being lost. You won’t be able to compile the program until it is completely warning-free.

-Wall

This turns on a set of warnings for common programming problems. You need -Wall, but it is not enough on its own.

-O2

Turn on optimization. The warnings for uninitialized variables in -Wall rely on the optimizer to analyze the code. If there is no optimization then these warnings aren’t generated.

-W

This turns on some extra warnings not included in -Wall, such as missing return values and comparisons between signed and unsigned integers.

-Wmissing-prototypes -Wstrict-prototypes

Warn if there are any missing or inconsistent prototypes. Without prototypes it is harder to detect problems with incorrect arguments.

-Wconversion

The main use of this option is to warn about conversions from signed to unsigned integers. For example, unsigned int x = -1. If you need to perform such a conversion you can use an explicit cast.
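As an illustration of the conversion described above, the following sketch shows what the implicit conversion produces and one possible way to make the conversion explicit and checked. The function names are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* What "unsigned int x = -1" yields: the value wraps modulo
   UINT_MAX + 1, so the result is UINT_MAX. */
unsigned int
wrapped_minus_one (void)
{
  return (unsigned int) -1;
}

/* Convert a signed int to unsigned int only when the value is
   representable.  Returns 1 and stores the result on success,
   0 if v is negative (in which case *out is left unchanged). */
int
int_to_uint_checked (int v, unsigned int *out)
{
  assert (out != NULL);
  if (v < 0)
    return 0;
  *out = (unsigned int) v;      /* explicit cast documents the intent */
  return 1;
}
```

The explicit cast silences the warning, so it is worth pairing it with a range check like the one above when the sign of the value is not already known.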

-Wshadow

This warns whenever a local variable shadows another local variable. If two variables have the same name then it is a potential source of confusion.
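The kind of confusion this option guards against can be seen in a small sketch. The first function below deliberately contains the pitfall and the second shows one way to avoid it. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Pitfall: the inner "total" shadows the outer one, so the outer
   variable is never updated and the function always returns 0.
   Compiling with -Wshadow flags the inner declaration. */
double
sum_shadowed (const double *v, size_t n)
{
  double total = 0.0;
  size_t i;
  assert (v != NULL || n == 0);
  for (i = 0; i < n; i++)
    {
      double total = 0.0;       /* shadows the outer total */
      total += v[i];            /* updates only the inner copy */
    }
  return total;
}

/* Fix: keep a single accumulator. */
double
sum_fixed (const double *v, size_t n)
{
  double total = 0.0;
  size_t i;
  assert (v != NULL || n == 0);
  for (i = 0; i < n; i++)
    total += v[i];
  return total;
}
```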

-Wpointer-arith -Wcast-qual -Wcast-align

These options warn if you try to do pointer arithmetic for types which don’t have a size, such as void, if you cast away a const qualifier from a pointer, or if you cast a pointer to a type with a stricter alignment requirement, which can result in a misaligned access.

-Wwrite-strings

This option gives string constants a const qualifier so that it will be a compile-time error to attempt to overwrite them.
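When a string constant does need to be modified, a common approach is to copy it into a writable buffer first and change the copy. The following sketch assumes a hypothetical helper, capitalized_copy(), written only to illustrate the idea.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copy the read-only string src into the writable buffer dst of the
   given size and capitalize the first character of the copy.
   Returns 0 on success, -1 if the buffer is too small (dst is then
   left unchanged). */
int
capitalized_copy (char *dst, size_t size, const char *src)
{
  size_t len;
  assert (dst != NULL && src != NULL);
  len = strlen (src);
  if (len + 1 > size)
    return -1;
  memcpy (dst, src, len + 1);   /* includes the terminating null */
  if (dst[0] != '\0')
    dst[0] = (char) toupper ((unsigned char) dst[0]);
  return 0;
}
```

Because src is declared const char *, passing a string constant such as "gsl" compiles cleanly under -Wwrite-strings, while any attempt to write through src would be diagnosed.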

-fshort-enums

This option makes the type of enum as short as possible. Normally this makes an enum different from an int. Consequently any attempts to assign a pointer-to-int to a pointer-to-enum will generate a cast-alignment warning.

-fno-common

This option prevents global variables being simultaneously defined in different object files (you get an error at link time). Such a variable should be defined in one file and referred to in other files with an extern declaration.

-Wnested-externs

This warns if an extern declaration is encountered within a function.

-Dinline=

The inline keyword is not part of ANSI C. Thus if you want to use -ansi with a program which uses inline functions you can use this preprocessor definition to remove the inline keywords.

-g

It always makes sense to put debugging symbols in the executable so that you can debug it using gdb. The only effect of debugging symbols is to increase the size of the file, and you can use the strip command to remove them later if necessary.

References and Further Reading

The following books are essential reading for anyone writing and debugging numerical programs with gcc and gdb.

  • R.M. Stallman, Using and Porting GNU CC, Free Software Foundation, ISBN 1882114388

  • R.M. Stallman, R.H. Pesch, Debugging with GDB: The GNU Source-Level Debugger, Free Software Foundation, ISBN 1882114779

For a tutorial introduction to the GNU C Compiler and related programs, see