Continuing on from my last post about fixing some issues related to MySQL and Python on Solaris, I came across another issue this morning which also necessitated digging out a copy of good ole gdb. The basic issue was that CVSGraph segfaulted every time I attempted to generate a revision graph. This was reproducible every time, no matter what the input. Turning the verbosity to the maximum allowed level did not produce anything useful.
The first thing I did was download a copy of gdb and run the usual configure/make/make install. I initially thought that if I could get cvsgraph to produce a core dump on exit, I would be able to examine it within gdb and get some clues about the cause. However, I couldn't get CVSGraph to dump core automatically, even after setting the core policy using coreadm, as shown here.
UPDATE: I found the missing piece of the puzzle: the maximum core dump size had not been set via ulimit. Setting this enabled automatic core dumping.
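For reference, the same limit that ulimit adjusts in the shell can also be raised from inside a C program. This is only a sketch of that alternative, not what I did here: the helper name raise_core_limit is made up for illustration, and it assumes the POSIX getrlimit()/setrlimit() calls with RLIMIT_CORE are available on your platform.

```c
#include <stdio.h>
#include <sys/resource.h>

/*
 * Hypothetical helper (not part of CVSGraph): raise the soft limit on
 * core file size to whatever the hard limit allows.
 * Returns 0 on success, -1 on failure.
 */
int raise_core_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }

    /* An unprivileged process may raise its soft limit up to the hard limit. */
    rl.rlim_cur = rl.rlim_max;

    if (setrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("setrlimit");
        return -1;
    }
    return 0;
}
```

Note that if the hard limit itself is zero, this succeeds but core files will still not be written.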
The next step was to actually load cvsgraph into gdb and run a test session inside the debugger. After a couple of runs, I had isolated the problem to a specific routine. I set a breakpoint and ran through the test case again:
First, I set up the command line arguments:
(gdb) set args -c /usr/local/viewcvs-1.0-dev/cvsgraph.conf -r /usr/cvsroot cobra/build.xml,v
Then set a breakpoint at the relevant location:
(gdb) break cvsgraph.c:1092
Breakpoint 1 at 0x1311c: file cvsgraph.c, line 1092.
Then kick off the target program:
(gdb) run
Starting program: /root/cvsgraph-1.5.1/cvsgraph -c /usr/local/viewcvs-1.0-dev/cvsgraph.conf -r /usr/cvsroot cobra/build.xml,v
Once gdb hits the breakpoint, it stops and waits for instructions:
Breakpoint 1, expand_string (s=0x50d51 "d", rcs=0x54450, r=0x545b8, rev=0x50770, prev=0x0, tag=0x0) at cvsgraph.c:1092
1092 t = mktime(&tm);
I manually stepped forward a couple of times until I hit the problem:
(gdb) n
1094 if(env)
(gdb)
1095 setenv("TZ", env, 1);
(gdb)
Program received signal SIGSEGV, Segmentation fault.
0xff3a0510 in memcpy () from /usr/platform/SUNW,Sun-Fire-V240/lib/libc_psr.so.1
Now we can do a stack backtrace to see where we were at the time:
(gdb) bt
#0 0xff3a0510 in memcpy () from /usr/platform/SUNW,Sun-Fire-V240/lib/libc_psr.so.1
#1 0x0001ed6c in setenv (name=0x1f4e0 "TZ", value=0xffbfffac "GB", replace=1) at ../../../libiberty/setenv.c:156
#2 0x00013140 in expand_string (s=0x50d51 "d", rcs=0x54450, r=0x545b8, rev=0x50770, prev=0x0, tag=0x0) at cvsgraph.c:1095
#3 0x00016a28 in make_layout (rcs=0x54450) at cvsgraph.c:2937
#4 0x00019ecc in main (argc=327696, argv=0x50010) at cvsgraph.c:3879
So now we have isolated the problem to setenv(), which in this build comes from the GNU libiberty library (see frame #1 of the backtrace). I exited gdb and wrote a simple test case based on what CVSGraph was doing at the point in question, and found that the problem can be reproduced easily by calling putenv() to create an environment variable and then immediately calling setenv() to reset its value. This may be due to a bug in the libiberty putenv implementation.
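A test case along those lines might look roughly like the following. This is a reconstruction rather than the exact file I used: the function name putenv_then_setenv and the initial value "TZ=UTC" are made up for illustration, and the "GB" value is taken from the backtrace. On a system with a healthy libc it should simply return 0; the crash only shows up when the program is linked against the affected setenv().

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* putenv() keeps the pointer it is given, so this buffer must outlive the call. */
static char tz_entry[] = "TZ=UTC";

/*
 * Hypothetical reproduction: create TZ with putenv(), then immediately
 * overwrite it with setenv(), the sequence that crashed in memcpy().
 * Returns 0 if TZ ends up holding new_tz, -1 otherwise.
 */
int putenv_then_setenv(const char *new_tz)
{
    const char *now;

    if (putenv(tz_entry) != 0)
        return -1;

    /* With the affected library, this is the call that segfaulted. */
    if (setenv("TZ", new_tz, 1) != 0)
        return -1;

    now = getenv("TZ");
    return (now != NULL && strcmp(now, new_tz) == 0) ? 0 : -1;
}
```

Calling putenv_then_setenv("GB") from a small main() and running it under gdb is enough to check whether a given system is affected.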
In reality, we don't need the call to putenv() here. It's redundant, as setenv() will allocate space for the new variable if necessary. So I simply commented out the offending line, rebuilt CVSGraph, and now we have the (very useful, IMHO) graphical branching display from ViewCVS.