Debugging with the PTrace(1) Utility
Jim Blakey
Introduction
Ok, we've all been there. You arrive at a client site, expected to
solve strange and wonderful problems, some of which have been lingering
unsolved for years. Usually there are no tools to use, no debugger,
maybe even no original source code. What do you do?
Well, that's why we get paid the big bucks, right?
In this whitepaper I will discuss two little known tools that will allow
you access into the inner workings of processes on unix systems.
These tools are the ptrace system calls and the /proc filesystem.
Did you
ever wonder how debuggers work? How they gain complete control
over the process they are debugging? Well, they all use some combination
of ptrace and the /proc filesystem. The following examples will give you
the basics to write your own quick and dirty 'debugger' application,
specifically targeted to solving the problem at hand.
Ptrace is basically a kernel hook into the task dispatch logic.
It provides a mechanism by which you can write a program that can
attach to a target process, breakpoint it, single step it, and
have access to its entire address space, stack, and registers.
With this, you will be able have your program trace execution of a target
process, monitor 'watch points', and check for various program conditions,
even when the client is too cheap to spring for a license for the debugger
package.
The /proc filesystem is a pseudo-filesystem that is maintained by
the kernel. It provides a direct interface to many kernel data structures,
as well as most process data structures. It also will allow you read (and
sometimes write) access to a process virtual address space.
Ptrace and the /proc file system exist on most Unix systems. Most
debuggers are based on calls to ptrace. The exception to this is Solaris.
Whereas ptrace exists on Solaris, it is poorly implemented and has
terrible documentation. For some unknown reason, Sun decided that they
would implement debugging through extension of the /proc filesystem.
I'll cover both approaches here.
This is not a tutorial. Although the man pages for ptrace are usually poorly
written (they are written for debugger writers, who usually know this stuff
already), they provide a good reference point. I'll provide some interesting
examples of code, and some basic descriptions of how it works. The rest is
up to you. Again, that's why we get paid the big bucks. But the hope here
is that this will provide you with some tools that will help you gain
information to solve those 'unsolveable' problems.
ptrace(1) And How To Use It
Example Problem 1: The mysterious infinite loop
Assume we have a client application that goes into an infinite loop.
We have no further information, and we have no access to any debuggers.
All we know is that the application locks up tight, the system bogs
down, and the CPU usage pegs out near 100%. Through the 'ps' command,
we can identify which process is ringing up the CPU time, but that's
about all the info we have.
What we would really like to do is to be able to stop the process
and see exactly where the program counter is, and what its stack context
looks like. With a debugger this is trivial. Without one, it is a challenge.
Once we have the address we're executing, then we can use a link map (or
the 'nm' utility) to find the name of the routine being executed. A little
more math, and we get the exact offset into the routine. Finally, if we have
source, we can narrow it down to the lines.
The following program uses the ptrace system calls to attach to the
supplied target PID. The act of attaching effectively breakpoints the
target process. Since we know the target is in an infinite loop, we'll
fetch the current Instruction Pointer (IP) and Stack Pointer (SP) and
then single step the process. As we single step the process, we'll record
each new IP and SP, until we come back to the original point we started
from. This will give us one complete iteration of the offending infinite
loop.
/* ******************************************************************
** pt.c
**
** This program is an example of how to use the ptrace(2) feature
** of unix. ptrace provides a means by which a 'debug' process may
** observe and control the execution of a target process. It provides
** mechanisms to examine and change the target core image, registers
** and flow of execution.
**
** This example will attach to a currently running process and
** put it in single step mode (x86 supports this). From then on,
** each instruction the target executes will cause a breakpoint trap
** in the debugging process (received through the wait(2) call). The
** debugging process will read the target's current instruction and
** stack pointers and write them to stdout
*
** For this example, we're expecting the target process to be in an
** infinite loop. We want to trace exactly one iteration of this
** loop for later analysis.
**
** To run this program, invoke it with the PID of the target process
** as the first argument. Output is to a stdout
**
** LINUX x86 SPECIFIC VERSION. Other Unix systems will be similar
**
** jdblakey@innovative-as.com
**
** ******************************************************************
*/
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <wait.h>
#include <sys/ptrace.h>
#include <sys/reg.h>
#include <sys/user.h>
#include <sys/signal.h>
#define M_OFFSETOF(STRUCT, ELEMENT) \
(unsigned int) &((STRUCT *)NULL)->ELEMENT;
#define D_LINUXNONUSRCONTEXT 0x40000000
int main (int argc, char *argv[])
{
int Tpid, stat, res;
int signo;
int ip, sp;
int ipoffs, spoffs;
int initialSP = -1;
int initialIP = -1;
struct user u_area;
/*
** This program is started with the PID of the target process.
*/
if (argv[1] == NULL) {
printf("Need pid of traced process\n");
printf("Usage: pt pid \n");
exit(1);
}
Tpid = strtoul(argv[1], NULL, 10);
printf("Tracing pid %d \n",Tpid );
/*
** Get the offset into the user area of the IP and SP registers. We'll
** need this later.
*/
ipoffs = M_OFFSETOF(struct user, regs.eip);
spoffs = M_OFFSETOF(struct user, regs.esp);
/*
** Attach to the process. This will cause the target process to become
** the child of this process. The target will be sent a SIGSTOP. call
** wait(2) after this to detect the child state change. We're expecting
** the new child state to be STOPPED
*/
printf("Attaching to process %d\n",Tpid);
if ((ptrace(PTRACE_ATTACH, Tpid, 0, 0)) != 0) {;
printf("Attach result %d\n",res);
}
res = waitpid(Tpid, &stat, WUNTRACED);
if ((res != Tpid) || !(WIFSTOPPED(stat)) ) {
printf("Unexpected wait result res %d stat %x\n",res,stat);
exit(1);
}
printf("Wait result stat %x pid %d\n",stat, res);
stat = 0;
signo = 0;
/*
** Loop now, tracing the child
*/
while (1) {
/*
** Ok, now we will continue the child, but set the single step bit in
** the psw. This will cause the child to exeute just one instruction and
** trap us again. The wait(2) catches the trap.
*/
if ((res = ptrace(PTRACE_SINGLESTEP, Tpid, 0, signo)) < 0) {
perror("Ptrace singlestep error");
exit(1);
}
res = wait(&stat);
/*
** The previous call to wait(2) returned the child's signal number.
** If this is a SIGTRAP, then we set it to zero (this does not get
** passed on to the child when we PTRACE_CONT or PTRACE_SINGLESTEP
** it). If it is the SIGHUP, then PTRACE_CONT the child and we
** can exit.
*/
if ((signo = WSTOPSIG(stat)) == SIGTRAP) {
signo = 0;
}
if ((signo == SIGHUP) || (signo == SIGINT)) {
ptrace(PTRACE_CONT, Tpid, 0, signo);
printf("Child took a SIGHUP or SIGINT. We are done\n");
break;
}
/*
** Fetch the current IP and SP from the child's user area. Log them.
*/
ip = ptrace(PTRACE_PEEKUSER, Tpid, ipoffs, 0);
sp = ptrace(PTRACE_PEEKUSER, Tpid, spoffs, 0);
/*
** Checkto see where we are in the process. Using the ldd(1) utility, I
** dumped the list of shared libraries that were required by this process.
** This showed:
**
** libc.so.6 => /lib/i686/libc.so.6 (0x40030000)
** /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
**
** So basically, we can assume that any execuable address pointed to by
** the IP that is *over* 0x40000000 is either in ld.so, libc.so, or in
** some sort of kernel state. We really don't care about these addresses
** so we'll skip 'em. Also, nm(1) showed that all the local symbols
** we would be interested in start in the 0x08000000 range.
*/
if (ip & D_LINUXNONUSRCONTEXT) {
continue;
}
if (initialIP == -1) {
initialIP = ip;
initialSP = sp;
printf("---- Starting LOOP IP %x SP %x ---- \n",
initialIP, initialSP);
} else {
if ((ip == initialIP) && (sp == initialSP)) {
ptrace(PTRACE_CONT, Tpid, 0, signo);
printf("----- LOOP COMPLETE -----\n");
break;
}
}
printf("Stat %x IP %x SP %x Last signal %d\n",stat, ip, sp,
signo);
/*
** If we're back to where we started tracing the loop, then exit
*/
}
printf("Debugging complete\n");
sleep(5);
return(0);
}
The nm(1) utility was run against the offending process's executable
image. nm(1) reads symbols from the ELF image and prints
out starting addresses of any statically linked area. Dymanically resolved
symbols are also printed out, but obviously with no address information.
Since all local routines are statically linked, you can see the names
of each routine in the process.
Other useful utilities along this line are objdump(1) and
readelf(1). The objdump(1) utility provides much
more useful information than the nm(1) utility, dumping all ELF sections,
as well as providing disassembly listings of all code sections. Objdump(1)
comes as part of the GCC toolset, but may not be available on all systems.
The nm(1) utility is, and that is why I used it in this example.
results of the 'nm' command. Note that the target process was *not*
compiled/linked with the -g (debugging) option.
080484e8 T CalcIteration
08048558 T SubFunc1
08049664 ? _DYNAMIC
08049638 ? _GLOBAL_OFFSET_TABLE_
080485e4 R _IO_stdin_used
0804962c ? __CTOR_END__
08049628 ? __CTOR_LIST__
08049634 ? __DTOR_END__
08049630 ? __DTOR_LIST__
08049624 ? __EH_FRAME_BEGIN__
SNIP (for space... we all know
what the output from nm looks like)
080485c0 t gcc2_compiled.
080484c0 t gcc2_compiled.
080484b0 t init_dummy
080485b0 t init_dummy
080484c0 T main
0804972c b object.2
0804961c d p.0
U printf@@GLIBC_2.0
U sleep@@GLIBC_2.0
Ok, a few minor things to note. The main function starts at
0x080484c0 and runs for (at most) 0x126c bytes. There are a bunch of
obvious library calls, and two suspicously named functions,
CalcIteration, which starts at 0x080484e8 and runs for 0x6d bytes,
and SubFunc1, which starts at 0x08048558 and runs for 0x110c
bytes.
Now, the output from our little 'debugger' program
Tracing pid 2469
Attaching to process 2469
Wait result stat 137f pid 2469
---- Starting LOOP IP 8048568 SP bffff900 ----
Stat 57f IP 8048568 SP bffff900 Last signal 0
Stat 57f IP 804856b SP bffff910 Last signal 0
Stat 57f IP 804856e SP bffff910 Last signal 0
Stat 57f IP 8048571 SP bffff910 Last signal 0
Stat 57f IP 8048573 SP bffff910 Last signal 0
Stat 57f IP 8048575 SP bffff910 Last signal 0
Stat 57f IP 8048577 SP bffff910 Last signal 0
Stat 57f IP 8048578 SP bffff91c Last signal 0
Stat 57f IP 804852a SP bffff920 Last signal 0
Stat 57f IP 804852d SP bffff930 Last signal 0
Stat 57f IP 804852f SP bffff930 Last signal 0
Stat 57f IP 8048532 SP bffff930 Last signal 0
Stat 57f IP 8048534 SP bffff930 Last signal 0
Stat 57f IP 8048537 SP bffff92c Last signal 0
Stat 57f IP 804853a SP bffff928 Last signal 0
Stat 57f IP 804853d SP bffff924 Last signal 0
Stat 57f IP 8048542 SP bffff920 Last signal 0
Stat 57f IP 8048390 SP bffff91c Last signal 0
Stat 57f IP 8048547 SP bffff920 Last signal 0
Stat 57f IP 804854a SP bffff930 Last signal 0
Stat 57f IP 804854d SP bffff930 Last signal 0
Stat 57f IP 804854f SP bffff930 Last signal 0
Stat 57f IP 804850c SP bffff930 Last signal 0
Stat 57f IP 8048510 SP bffff930 Last signal 0
Stat 57f IP 804851c SP bffff930 Last signal 0
Stat 57f IP 804851f SP bffff928 Last signal 0
Stat 57f IP 8048522 SP bffff924 Last signal 0
Stat 57f IP 8048525 SP bffff920 Last signal 0
Stat 57f IP 8048558 SP bffff91c Last signal 0
Stat 57f IP 8048559 SP bffff918 Last signal 0
Stat 57f IP 804855b SP bffff918 Last signal 0
Stat 57f IP 804855e SP bffff910 Last signal 0
Stat 57f IP 8048561 SP bffff904 Last signal 0
Stat 57f IP 8048563 SP bffff900 Last signal 0
Stat 57f IP 8048370 SP bffff8fc Last signal 0
----- LOOP COMPLETE -----
Debugging complete
Ok, this should be one complete iteration of our infinite loop.
Our loop starts at 0x08048568, which we can calculate to be at
SubFunc() + 0x10 bytes. From there, we progress normally for
8 instructions, until we jump to 0x0804852a, which we know
to be in CalcIteration() + 0x42 bytes. From there, we progress
for 9 more instructions till we hit 0x08048390. nm(1) does not
show this, but dumpobj() would have shown that this is part of
the dynamic relocation jump table, where a call to printf()
is resolved. So we can infer that the 0x08048390 is a printf()
statement.
Proceeding after the printf() jump we're still in CalcIteration().
We stay in that for the next 10 instructions, till we hit
0x08048558, the starting address of SubFunc(), which we execute
for 5 instructions. The last address, 0x08048370 is again part
of the dynamic relocation jump table, which resolves to a
call to sleep().
So, what have we found out. Well, without being able to
see the source code, and without access to a debugger, we
know our infinite loop bounces between the routines CalcIteration()
and SubFunc1(), and includes one call to printf() and one call to
sleep(). Enough to solve the problem? Maybe, maybe not, but it
sure is a lot more information than we had when we started.
Example Problem 2: The munged memory location
Consider the problem where we know that some memory address is getting
corrupted, but we have no idea when or where it is getting overwritten.
We've all at times wished for a way to implement a 'watch point', or
a way to monitor a known memory location to see when it changes, and
exactly what code we were executing when it changed. Some systems now
allow this through debuggers, others don't. But we're assuming that we
have no access to debuggers, right?
The following example is a program that implements a watchpoint
on a memory location using ptrace(2) (assuming that the target system does
not have an intrinsic watchpoint functionality... more on this further down).
It will tell you the exact address of the instruction that changed the memory
location supplied.
/* ******************************************************************
** watchpoint.c
**
** Implements a watchpoint on a supplied memory location. This will
** let you know when this address gets overwritten.
**
** usage:
**
** watchpoing PID 0xaddress
**
** LINUX x86 SPECIFIC VERSION. Other Unix systems will be similar
**
** jdblakey@innovative-as.com
**
** ******************************************************************
*/
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <wait.h>
#include <sys/ptrace.h>
#include <sys/reg.h>
#include <sys/user.h>
#include <sys/signal.h>
#define M_OFFSETOF(STRUCT, ELEMENT) \
(unsigned int) &((STRUCT *)NULL)->ELEMENT;
int main (int argc, char *argv[])
{
int Tpid, stat, res;
int signo;
int ip, sp;
int ipoffs, spoffs;
struct user u_area;
unsigned int memcontents = 0, startcontents = 0, watchaddr = 0;
/*
** This program is started with the PID of the target process and
** the watched address
*/
if ((argv[1] == NULL) || (argv[2] == NULL)) {
printf("Need pid of traced process\n");
printf("Usage: pt pid 0xwatchaddress\n");
exit(1);
}
Tpid = strtoul(argv[1], NULL, 10);
watchaddr = strtoul(argv[2], NULL, 16);
printf("Tracing pid %d. checking for change to %x \n",Tpid ,watchaddr);
/*
** Get the offset into the user area of the IP and SP registers. We'll
** need this later.
*/
ipoffs = M_OFFSETOF(struct user, regs.eip);
spoffs = M_OFFSETOF(struct user, regs.esp);
/*
** Attach to the process. This will cause the target process to become
** the child of this process. The target will be sent a SIGSTOP. call
** wait(2) after this to detect the child state change. We're expecting
** the new child state to be STOPPED
*/
printf("Attaching to process %d\n",Tpid);
if ((ptrace(PTRACE_ATTACH, Tpid, 0, 0)) != 0) {;
printf("Attach result %d\n",res);
}
res = waitpid(Tpid, &stat, WUNTRACED);
if ((res != Tpid) || !(WIFSTOPPED(stat)) ) {
printf("Unexpected wait result res %d stat %x\n",res,stat);
exit(1);
}
printf("Wait result stat %x pid %d\n",stat, res);
stat = 0;
signo = 0;
/*
** Get the starting value at the requested watch location. The PTRACE_PEEKEXT
** option allows you to reach into the tartet process address space, using
** its relocation maps, and read/change values. Nice, huh?
*/
startcontents = ptrace(PTRACE_PEEKTEXT, Tpid, watchaddr, 0);
printf("Starting value at %x is %x\n",watchaddr, startcontents);
/*
** Loop now, tracing the child
*/
while (1) {
/*
** Ok, now we will continue the child, but set the single step bit in
** the psw. This will cause the child to exeute just one instruction and
** trap us again. The wait(2) catches the trap.
*/
if ((res = ptrace(PTRACE_SINGLESTEP, Tpid, 0, signo)) < 0) {
perror("Ptrace singlestep error");
exit(1);
}
res = wait(&stat);
/*
** The previous call to wait(2) returned the child's signal number.
** If this is a SIGTRAP, then we set it to zero (this does not get
** passed on to the child when we PTRACE_CONT or PTRACE_SINGLESTEP
** it). If it is the SIGHUP, then PTRACE_CONT the child and we
** can exit.
*/
if ((signo = WSTOPSIG(stat)) == SIGTRAP) {
signo = 0;
}
if ((signo == SIGHUP) || (signo == SIGINT)) {
ptrace(PTRACE_CONT, Tpid, 0, signo);
printf("Child took a SIGHUP or SIGINT. We are done\n");
break;
}
/*
** get the current value from the watched address and see if it is
** different from the starting value. If so, then get the instruction
** pointer from the target's user area, 'cause this is the instruction
** that changed the value!
*/
memcontents = ptrace(PTRACE_PEEKTEXT, Tpid, watchaddr, 0);
if (memcontents != startcontents) {
ip = ptrace(PTRACE_PEEKUSER, Tpid, ipoffs, 0);
printf("!!!!! Watchpoint address changed !!!!!\n");
printf("Instruction that changed it at %x\n",ip);
printf("New contents of address %x\n",memcontents);
break;
}
}
printf("Debugging complete\n");
return(0);
}
Once this prints out the address of the instruction that changed
the watched address, then we can use the same techniques as above
to find out more about where the program was. Again, sometimes just
knowing which routine we were in when the memory address was corrupted
is enough to jumpstart the problem solving.
Other ptrace(1) Functionality
The following table lists some of the other useful arguments
to the ptrace(1) call. Note that different vendors implement
various flavors of these, so it would be best to refer to the
man pages on the target system. This list provides a sort of
generic set of capabilities.
PTRACE_ATTACH |
This 'attaches' the parent process to the target PID. In otherwords,
the process with the supplied PID is stopped and becomes a child of the
parent, allowing the parent access into its process space. The parent
will be trapped on all child state changes. This doesn't work so
well on Solaris |
PTRACE_TRACEME |
This is a call from a target process to tell the parent to trace it.
This is usually done after the parent fork(2)s the child and before the
child exec(3)s the new image. The parent will then be able to trace
the child from the start
|
PTRACE_SINGLESTEP |
This places the target child in 'singlestep' mode. Since the
target process is a child of the ptrace debugging process, the
parent will get a child state changed trap (child changed to STOPPTED)
that can be detected with the wait(2) call. |
PTRACE_SYSCALL |
This will trap the parent process for each system call (context
change into kernel mode) made by the child. Appendix A describes the
strace(2) utility which uses this.
|
PTRACE_CONT |
This call continues execution of the STOPPED target process.
If the target process stopped on a signal, the parent must deliver
the signal on to the child through this call.
|
PTRACE_PEEKTEXTPTRACE_POKETEXT |
Allows the program to read or write the word at addr in the
target's TEXT memory area. |
PTRACE_PEEKUSERPTRACE_POKEUSER |
Read or write a word from the target process's USER structure. This
structure holds the registers, stack and text start address, context
information, and other process information. See sys/user.h |
PTRACE_GETREGSPTRACE_SETREGS |
Read/write a copy of the target's general purpose registers to/from the
supplied location. This will be OS and machine architecture specific. |
PTRACE_GETFPREGSPTRACE_SETFPREGS |
Read/write a copy of the target's floating point registers to/from the
supplied location. This will be OS and machine architecture specific. |
The /proc File System
/proc is a pseudo-filesystem that allows us to read (and sometimes
write) both kernel and process data structures. Before, if we wanted to
do this, we had to tip-toe through /dev/kmem, and know all the
various kernel data structures. /proc makes this easy. For example,
if I want to know the status of a process with the PID 806, I don't
have to read /dev/kmem, or write some shell script to loop ps and use
grep/awk to filter the ps output. All I have to do is open /proc/806/stat
and read it. As a matter of fact, almost all modern implementations of
ps(1) use the /proc feature.
Keep in mind that when you are accessing these data structures, you are
not accessing copies of the data, you are reading/writing the actual kernel
data structures. The /proc filesystem manager is simply a filesystem driver
that translates I/O requests into read/writes of known kernel data structures.
The values you read are immediate.
This allows for amazing power, but also entails the usual risk of crashing
the kernel.
For example, in Linux if I use vi and modify the file
/proc/sys/net/ipv4/ip_forward to be a 1 instead of a 0, I've dynamically
enabled IP forwarding in the kernel from that point on. If I overwrite
/proc/kcore, I've overwritten the kernel. Bad news. For that reason,
many of the /proc files are read-only, and access to process specific
/proc files is limited to the normal user ownership access rules.
Kernel Access Through /proc
The /proc filesystem driver on Linux allows for access to many of
the kernel data structures, as well as read/write access to kernel
tunable parameters. Other Unix flavors implement this differently.
Solaris for example, does not allow access to kernel data structures
through /proc.
Appendix B describes some of the kernel data structures available
on Linux through the /proc filesystem.
Individual Process Access Through /proc
This works for almost all Unix implementations that offer the
/proc filesystem. Each vendor implements the structures differently, but
the concepts described here should be valid across platforms.
Every process on a Unix system has its own directory under the /proc
filesystem. The directory name is the process ID. Under that directory
are files that give visibility (and, for some files, control) into
that process. For example the /proc/PID/maps will give information
on how and where each segment is mapped for that process. The stat (or ps)
file will give information on how the process is running (same as the ps
command).
On Linux, these files are formatted in ASCII by the /proc filesystem
driver. On most other types of Unix, you read them as binary structures.
These structures are usually defined in /usr/include/sys/procfs.h.
The following table lists some common files available for each
process. Again, this is OS specific.
/proc/PID/stat |
This file contains information about the current status of the
process. This is the file the ps utility uses for its listing. Under
Linux, this is an ASCII printout. Under Solaris, this is a binary
structure found in sys/procfs.h
|
/proc/PID/maps |
This file contains entries for each currently mapped memory region,
its address offset and size, and the read/write/execute permission on it.
|
/proc/PID/cmdline |
This is a copy of the command line that started the process.
|
/proc/PID/fd |
This is a subdirectory containing one entry for each File Descriptor
the process has open. The entries in this subdirectory are special 'links'
to the actual files opened.
|
/proc/PID/mem |
This is an access point into the memory space for the target process.
On Solaris, this is called /proc/PID/as.
|
These are just some of the available files in /proc. Each Unix flavor
offers more, but these are the common ones.
An example of using /proc
The following code example searches the target process's address
space for valid segments. When it finds valid segments, it reads in
1k chunks from the target's memory and searches these for the supplied
text string. When it finds the string, modifies the first character
and writes it back to the target. The next time the target prints
the string, the first character will be different.
The /proc/PID/mem file can be treated just like any other
file. Once opened, you can use lseek() to seek to the target address.
The proc driver takes care of mapping that to process address space.
Then you can read/write as necessary. This example works under both
Linux and Solaris, with some exceptions noted below).
/* ------------------------------------------------------------------
** readmem.c
**
** This program is a quick example of using the /proc filesystem
** to access a process memory space.
**
** This process will scan through all addresses on 1k boundaries
** looking for readable segments by lseek()ing through the /proc/PID/mem
** file. Once it finds a valid segment, it will read buffers and search
** for specific values
**
** On Non-Linux systems, it will then change the first character of the
** buffer to an 'A'
**
** To run:
**
** readproc PID "string to search for "
**
** ------------------------------------------------------------------
*/
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/wait.h>
#include <sys/ptrace.h>
#include <sys/user.h>
#include <sys/signal.h>
#define D_LINUX 1
main(int argc, char *argv[])
{
int Tpid;
int pfd;
char buff[1024];
char *srchstr;
char *eptr;
unsigned int addr;
int goodaddr;
int goodread;
int srchlen;
int j;
/*
** Get arguments. Since this is an example program, I won't validate them
*/
if ((argv[1] == NULL) || (argv[2] == NULL)) {
printf("Need PID and string to search for\n");
printf("usage: readmem PID string\n");
exit(1);
}
Tpid = strtoul(argv[1], &eptr, 10);
srchstr = (char *)strdup(argv[2]);
srchlen = strlen(srchstr);
printf("readproc: Tracing PID %d for string [%s] len %d\n",Tpid,
srchstr, srchlen);
/*
** In order to read /proc/PID/mem from this process, I have to have already
** stopped it via Ptrace(). (This is not documented anywhere, by the
** way). Anyway, this will leave the process in a STOPPED state. We'll
** start it again in a minute...
*/
if ((ptrace(PTRACE_ATTACH, Tpid, 0, 0)) != 0) {
printf("procexa: Attached to process %d\n",Tpid);
}
/*
** Create the string and open the proc mem file. Note under Linux this
** file is /proc/PID/mem while under Solaris this is /proc/PID/as
** Also, even though we open this RDWR, Linux will not allow us to write
** to it.
*/
sprintf(buff,"/proc/%d/as", Tpid);
printf("procexa:opening [%s]\n",buff);
if ((pfd = open(buff,O_RDWR)) <= 0) {
perror("Error opening /proc/PID/mem file");
exit(2);
}
/*
** Start at zero, lseek and try to read. increment by 1024 bytes. If
** lseek returns a good status, then the address range is mapped. It
** should be readable. Print out start and end of valid mapped address
** ranges.
*/
goodaddr = 0; goodread = 0;
for (addr = 0; addr < (unsigned int)0xf0000000; addr += 1024) {
if (lseek(pfd, addr, SEEK_SET) != addr) {
if (goodaddr == 1) {
printf("Address: %x RANGE END\n",addr);
goodaddr = 0;
}
continue;
} else {
if (goodaddr == 0) {
printf("Address %x RANGE START\n",addr);
goodaddr = 1;
}
}
/*
** Read a 1k buffer and search it for the supplied text string
*/
if (read(pfd, buff, 1024) <= 0) {
if (goodread == 1) {
printf("READ address %x RANGE END\n",addr);
goodread = 0;
}
continue;
} else {
if (goodread == 0) {
printf("READ address %x RANGE START\n",addr);
goodread = 1;
}
for (j = 0; j < 1024; j++) {
if (memcmp(&buff[j], srchstr, srchlen) == 0) {
printf("*****Pattern found %x\n",
addr + j);
buff[j] = 'A';
}
}
/*
** If NOT Linux, then write the modified buffer back to the address
** space.
*/
#ifndef D_LINUX
lseek(pfd, addr, SEEK_SET);
if (write(pfd, buff, 1024) <= 0) {
printf("Nope on write\n");
}
#endif
}
}
printf("Stopped at address %x\n",addr);
}
There are a few things to note about this example. First, although
it is not documented anywhere, you must first PTRACE_ATTACH to the target
process before you can open the /proc memory image file. This is a
security feature. Also, unless you're attaching to yourself (/proc/self/mem),
you must be root.
Second, there is a bug in the Linux implementation of the /proc/mem
driver's lseek logic. Technically, stack space should be available for
reading. Stack space starts at 0xbffffxxx. Note that this has the sign
bit set. The lseek() code checks for a negative seek offset, and returns
an error, where it should treat the offset as unsigned as it is a valid
part of the process address space. The kernel /dev/mem driver does this
correctly. This will be fixed in a later release of Linux.
Third, under Linux you can not write to the /proc/PID/mem
file. The standard distribution kernel has the write code in the
/proc/mem driver conditionally compiled out. This was done for
security reasons, as a non-suid process could fork and ptrace a
suid process, then overwrite its executable memory with evil
code. Solaris either handles this differently or does not consider
this a problem.
Finally, you may notice that the string seached for is found twice.
This is because the dynamic loader (ld.so) mmaps two version of the
text/data segments into process space. One is read/execute-only, the
other is enabled for write. See the /proc/PID/maps (or similar)
file to verify this.
Issues With These Approaches
Heisenbugs
A 'Heisenbug' is a change in the behavior of a bug through the actions
of debugging. Sometimes the act of debugging a process will cause the
bug you're searching for to disappear, or new, fun and exciting bugs to
appear.
Anytime you introduce a debugger into a process, things change. For
simple problems, this is usually not an issue. However for timing related
problems, or for debugging real-time processes, using a standard debugger
can cause more headaches than it solves.
Debugging is an expensive process. For each breakpoint or single
step trap, an interrupt is generated, the child's context is saved,
the parent's context is loaded, and the parent is dispatched. This
can take a lot of time. Then, with a traditional debugger, the meat
behind the keyboard has to manually press the 'next' button.
The process is then reversed. In computer time, this is eons.
Now, being able to write a program that automatically single-steps
a process, or handles a breakpoint, cuts those eons down to computer-
managable times. Further, the program can be tailored in such a way
that the initial debugging is very lightweight (monitoring, say, an address
using the /proc/PID/mem file) until a set of conditions is met.
Only then does expensive breakpoint/single step debugging kick in.
So it may be possible using these techniques to avoid some of the
normal problems encountered debugging timing critical or real-time
problems.
Summary
The usefulness of these features should be obvious. The
examples given show several interesting cases, but by no means is
that all we can do. For example, we can attach to several processes
at once, and watch their interaction. We can use a trigger in one
process to start debugging in another. We can tailor our debugging
to wait silently for a certain event (memory change, process output,
etc) before attaching and debugging. In effect, we are no longer
constrained by the limitations of available debuggers.
Appendix A: Special Bonus: the strace(2) utility
The strace(2) utility is tool written using ptrace(2) calls
that allow you to trace and log system calls a target process
makes. This is very useful for profiling a process's execution,
as well as gaining some insite into what the process is doing.
For each system call executed, the name, arguments, and return code
are logged to stdout.
Note that under Solaris, the strace(1) utility is a Streams
utility. To trace process system call execution on Solaris, use
truss(1).
The following is a listing of a very simple "Hello World" program
under Linux. Note that there is a lot happening here under the covers.
I'll discuss that in a minute.
[jimb@chaos PTrace]$ strace world
execve("./world", ["world"], [/* 34 vars */]) = 0
uname({sys="Linux", node="chaos", ...}) = 0
brk(0) = 0x8049620
open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=95241, ...}) = 0
old_mmap(NULL, 95241, PROT_READ, MAP_PRIVATE, 3, 0) = 0x40017000
close(3) = 0
open("/lib/i686/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0 \306\1"..., 1024) = 1024
fstat64(3, {st_mode=S_IFREG|0755, st_size=5772268, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x4002f000
old_mmap(NULL, 1290088, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x40030000
mprotect(0x40162000, 36712, PROT_NONE) = 0
old_mmap(0x40162000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x131000) = 0x40162000
old_mmap(0x40167000, 16232, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x40167000
close(3) = 0
munmap(0x40017000, 95241) = 0
fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 9), ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40017000
write(1, "Hello world\n", 12Hello world
) = 12
munmap(0x40017000, 4096) = 0
_exit(12) = ?
This is not difficult to interpret, given what we know about
ptrace and a few other bits of information.
Strace forks the target process and uses the PTRACE_TRACEME call
before execing it. Therefore, as we would expect, the first system call
is execv(). Brk() is the system call to change a process data
segment size. The malloc(3) library call will often result in a brk(2)
system call.
All ELF format executables run under a dynamic loader (ld.so) that
resolves undefined symbols as needed. With that in mind, the next few
system calls are from ld.so on behalf of our world.c task. We see it
opening several important files (/etc/ld.so.preload and /etc/ld.so.cache)
that tell ld.so where to find library files, then opening /lib/i686/libc.so.6
to resolve printf() and other library calls. The calls to old_mmap() are
ld.so mapping segments of libc.so.6 into our process space (around
0x400xx000).
Finally, the one executable line of world.c (printf("Hello World\n");)
breaks down to a write() system call, where the string "Hello World" is
visable.
Keep in mind when analyzing strace logs that ld.so plays an important
part during the entire execution life of a process. You will often see
signs of its activity as it dynamically resolves symbols during runtime.
Appendix B: Interesting /proc system files in Linux
Note that this is Linux specific. The Solaris implementation
allows for access to all processes, but kernel data structures are not
exposed.
There are several interesting files in the /proc filesystem that
relate to kernel information. For example, the /proc/interrupts file
contains information relating to the IRQ allocations. Note how the
filesystem driver takes the raw kernel structure and formats it for
us as readable ASCII text.
[chaos] cat /proc/interrupts
CPU0
0: 3016820 XT-PIC timer
1: 53769 XT-PIC keyboard
2: 0 XT-PIC cascade
8: 1 XT-PIC rtc
9: 107195 XT-PIC usb-uhci, Ricoh Co Ltd RL5c475, ymfpci, eth0
12: 177043 XT-PIC PS/2 Mouse
14: 94042 XT-PIC ide0
NMI: 0
ERR: 0
|
So from this we can see which kernel modules are using which IRQs,
and a running count of the number of interrupts received on each IRQ.
Remember, that these reflect immediate current values for IRQs.
Two other interesting files are the /proc/ioports and /proc/iomem files.
These show which I/O ports are used by which kernel modules, and exactly
how the I/O memory is mapped. This is very useful for solving memory and
IRQ conflicts when writing PC device drivers.
[chaos] cat /proc/ioports
0000-001f : dma1
0020-003f : pic1
0040-005f : timer
0060-006f : keyboard
0070-007f : rtc
0080-008f : dma page reg
00a0-00bf : pic2
00c0-00df : dma2
00f0-00ff : fpu
01f0-01f7 : ide0
03c0-03df : vga+
03e8-03ef : serial(auto)
03f6-03f6 : ide0
03f8-03ff : serial(set)
0cf8-0cff : PCI conf1
1040-105f : Intel Corp. 82371AB/EB/MB PIIX4 ACPI
4000-40ff : PCI CardBus #02
4400-44ff : PCI CardBus #02
8000-803f : Intel Corp. 82371AB/EB/MB PIIX4 ACPI
fc38-fc3f : Conexant HSF 56k Data/Fax Modem (Mob WorldW SmartDAA)
fc40-fc7f : Intel Corp. 82557/8/9 [Ethernet Pro 100]
fc40-fc7f : eepro100
fc8c-fc8f : Yamaha Corporation YMF-744B [DS-1S Audio Controller]
fc90-fc9f : Intel Corp. 82371AB/EB/MB PIIX4 IDE
fc90-fc97 : ide0
fc98-fc9f : ide1
fca0-fcbf : Intel Corp. 82371AB/EB/MB PIIX4 USB
fca0-fcbf : usb-uhci
fcc0-fcff : Yamaha Corporation YMF-744B [DS-1S Audio Controller]
[chaos] cat /proc/iomem
00000000-0009f7ff : System RAM
0009f800-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000c7fff : Video ROM
000f0000-000fffff : System ROM
00100000-07feffff : System RAM
00100000-00211f08 : Kernel code
00211f09-0028006b : Kernel data
07ff0000-07fff7ff : ACPI Tables
07fff800-07ffffff : ACPI Non-volatile Storage
10000000-10000fff : Ricoh Co Ltd RL5c475
10001000-100013ff : Sony Corporation Memory Stick Controller
10400000-107fffff : PCI CardBus #02
10800000-10bfffff : PCI CardBus #02
40000000-40ffffff : Intel Corp. 440BX/ZX/DX - 82443BX/ZX/DX Host bridge
a0000000-a0000fff : card services
fc000000-fdffffff : PCI Bus #01
fc000000-fdffffff : Neomagic Corporation NM2380 [MagicMedia 256XL+]
fe400000-febfffff : PCI Bus #01
fe400000-fe7fffff : Neomagic Corporation NM2380 [MagicMedia 256XL+]
feb00000-febfffff : Neomagic Corporation NM2380 [MagicMedia 256XL+]
fec00000-fecfffff : Intel Corp. 82557/8/9 [Ethernet Pro 100]
fede0000-fedeffff : Conexant HSF 56k Data/Fax Modem (Mob WorldW SmartDAA)
fedf6000-fedf6fff : Intel Corp. 82557/8/9 [Ethernet Pro 100]
fedf6000-fedf6fff : eepro100
fedf7000-fedf77ff : Sony Corporation CXD3222 i.LINK Controller
fedf7c00-fedf7dff : Sony Corporation CXD3222 i.LINK Controller
fedf8000-fedfffff : Yamaha Corporation YMF-744B [DS-1S Audio Controller]
fedf8000-fedfffff : ymfpci
fff80000-ffffffff : reserved
|
Finally, as an example of the flexability of this implementation,
follow out some of the directory structure under /proc/sys. For example,
look in the /proc/sys/net/ipv4 directory. This contains many files, each
describing tunable IP options. You can dynamically modify many of these
to change the behavior of the system. These changes take effect immediately.
Appendix C: Solaris Implementations of /proc Debugging
Ptrace exists on Solaris, but is poorly implemented. Several important
features do not work as expected, and the ptrace call itself is very
poorly documented. Sun has choosen instead to implement debugging through
extending the /proc filesystem. Solaris provides several new files, including
/proc/PID/ctl and /proc/PID/lwpctl, which are writable
control files that provide a rich set of process control and debug
features.
|