lsof+strace
The Basics
This section briefly covers the terminology you need to be familiar with in order to understand the results of the operations described later in the article. Do not worry, it is no more difficult than answering a Windows Vista UAC question ;)
System Call (a.k.a. "syscall")
A system call is the mechanism an application uses to make a request to the operating system. The OS usually has a limited number of "entry points" that are used for basic and fundamental tasks like file I/O, memory allocation, networking etc. For example, most UNIX systems have only 1-3 different system calls to open a file (although various libraries offer dozens of ways to deal with a file, they all ultimately make the same system call). The important thing to understand about system calls is that there are few of them and that they are the only way for any program to interact with the OS.
Process ID (PID)
Any process in a UNIX OS is identified by its unique numeric PID, even if the process runs several threads. From the moment you start a program until the moment it finishes, it keeps the same PID. However, processes can spawn other processes (usually by using one of the fork() family of syscalls).
File Handle
Any I/O operation in UNIX is associated with a file handle, which is a non-negative integer number. This applies to any file on disk, a network socket etc. A typical sequence of operations on a disk file, for example, looks like this:
1. open("filename") -> returns a handle that from this moment will be associated with this open file
2. read(handle, where, how_much)
3. write(handle, from_where, how_much)
4. close(handle) -> removes the association established with open()
Handles are unique only within an individual process. There are well-known handles available by default to any application: 0 for the standard input, 1 for the standard output and 2 for the standard error stream.
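The same open-write-close sequence can be observed from the shell, which exposes file handles through its redirection syntax (a minimal sketch; the file name /tmp/fd-demo.txt is just an example):

```shell
# open("/tmp/fd-demo.txt") -> the shell associates handle 3 with the file
exec 3>/tmp/fd-demo.txt
# write(3, ...) -- redirect echo's standard output to handle 3
echo "hello" >&3
# close(3) -- removes the association established by the open
exec 3>&-
cat /tmp/fd-demo.txt   # prints "hello"
```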
Turn out your pockets, process!
We will start with a tool that can show us a snapshot of what is happening right now in our system. There is an excellent tool available on many UNIX systems called "lsof" ("LiSt Open Files"); you can find more information about it in its manual page. Most Linux distributions and Mac OS X include this tool.
Here is what this tool can do:
* Show the list of files and other resources open by a selected process or by all processes in the system
* Show the list of processes using a particular resource (like a TCP socket, for example)
You can do various kinds of sorting, filtering and formatting to get exactly the results you want. The program has about three dozen command-line options; we will try a couple of typical and representative examples now. Note: most of these examples require root access to the system (otherwise you will be limited to seeing only the information related to your own processes).
How to see the list of files opened by a running process
For example, we would like to see which resources are currently used by the Apache HTTP server. First, we need to identify the process we are interested in:
[root@localhost]$ ps ax | grep httpd | grep -v grep
2383 ? Ss 0:00 /usr/sbin/httpd
2408 ? S 0:00 /usr/sbin/httpd
2409 ? S 0:00 /usr/sbin/httpd
2410 ? S 0:00 /usr/sbin/httpd
2411 ? S 0:00 /usr/sbin/httpd
2412 ? S 0:00 /usr/sbin/httpd
2414 ? S 0:00 /usr/sbin/httpd
2415 ? S 0:00 /usr/sbin/httpd
2416 ? S 0:00 /usr/sbin/httpd
Hmm... which one am I interested in? Unfortunately, there is no universal recipe to answer this question. However, let's look at the problem from another perspective. I mentioned earlier that when you start a program, the OS creates a process, and a process can spawn other processes. So we can assume that there may be a tree of sub-processes related to one parent. Let's see how we can get this information (this command will work on both Linux and OS X):
[root@localhost]$ ps axo pid,ppid,command | egrep '(httpd)|(PID)' |
grep -v grep
2383 1 /usr/sbin/httpd
2408 2383 /usr/sbin/httpd
2409 2383 /usr/sbin/httpd
2410 2383 /usr/sbin/httpd
2411 2383 /usr/sbin/httpd
2412 2383 /usr/sbin/httpd
2414 2383 /usr/sbin/httpd
2415 2383 /usr/sbin/httpd
2416 2383 /usr/sbin/httpd
PPID is simply the Parent PID, i.e. the Parent Process ID. As you can see, most of these processes have the same PPID (2383), and there is one process with a PID of 2383. This must be the parent process. If you are curious to find out who its parent is, I will let you do this exercise yourself.
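On Linux you can also read the parent PID straight out of the /proc filesystem, which is where ps gets its data (a sketch using the current shell's own PID, $$, instead of the httpd PID from the example above):

```shell
# The first fields of /proc/PID/stat are: pid, (comm), state, ppid, ...
read -r pid comm state ppid _ < "/proc/$$/stat"
echo "process $pid $comm was spawned by $ppid"
```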
Linux has another fancy tool called "pstree". We could use it to get the same kind of information:
[root@localhost]$ pstree -p | grep httpd
|-httpd(2383)-+-httpd(2408)
| |-httpd(2409)
| |-httpd(2410)
| |-httpd(2411)
| |-httpd(2412)
| |-httpd(2414)
| |-httpd(2415)
| `-httpd(2416)
Looks better and is quite self-explanatory, isn't it? OK, back to the original question. Now we know that there are certain relationships between these "httpd"s. So our target will be process 2383.
In its simplest form, "lsof" can be called like this:
[root@localhost]$ lsof -p 2383
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
httpd 2383 root cwd DIR 3,6 1024 2 /
httpd 2383 root rtd DIR 3,6 1024 2 /
httpd 2383 root txt REG 3,8 328136 261136 /usr/sbin/httpd
httpd 2383 root mem REG 3,6 130448 8167 /lib64/ld-2.5.so
httpd 2383 root DEL REG 0,8 8072 /dev/zero
httpd 2383 root mem REG 3,6 615136 79579 /lib64/libm-2.5.so
httpd 2383 root mem REG 3,6 117656 8186 /lib64/libpcre.so.0.0.1
httpd 2383 root mem REG 3,6 95480 8213 /lib64/libselinux.so.1
...
httpd 2383 root mem REG 3,8 46160 1704830 /usr/lib64/
php/modules/ldap.so
httpd 2383 root DEL REG 0,8 8120 /dev/zero
httpd 2383 root DEL REG 0,8 8122 /dev/zero
httpd 2383 root 0r CHR 1,3 1488 /dev/null
httpd 2383 root 1w CHR 1,3 1488 /dev/null
httpd 2383 root 2w REG 3,12 1077 4210 /var/log/
httpd/error_log
httpd 2383 root 3r CHR 1,9 1922 /dev/urandom
httpd 2383 root 4u IPv6 8052 TCP *:http (LISTEN)
httpd 2383 root 5u sock 0,5 8053 can't identify protocol
httpd 2383 root 6u IPv6 8057 TCP *:https (LISTEN)
httpd 2383 root 7u sock 0,5 8058 can't identify protocol
httpd 2383 root 8r FIFO 0,6 8069 pipe
httpd 2383 root 9w FIFO 0,6 8069 pipe
httpd 2383 root 10w REG 3,12 1077 4210 /var/log/
httpd/error_log
httpd 2383 root 11w REG 3,12 711 4215 /var/log/
httpd/ssl_error_log
httpd 2383 root 12w REG 3,12 0 4138 /var/log/
httpd/access_log
httpd 2383 root 13w REG 3,12 0 4151 /var/log/
httpd/ssl_access_log
httpd 2383 root 14w REG 3,12 0 4152 /var/log/
httpd/ssl_request_log
Looks quite interesting, doesn't it? Now I am going to briefly explain the meaning of each column (you can get more information from the manual page for "lsof").
COMMAND
Contains the first nine characters of the UNIX command name associated with the process
PID
The process ID. In our example it is simply the PID we requested the information for
USER
The UNIX user owning the process
FD
The file descriptor number. Actually, it is more than the descriptor I mentioned before: there are also special descriptors identifying various resources in the UNIX system (for more details, check the man page). As you can see, everything is there: the current working directory (cwd), the process image itself (txt), memory-mapped libraries etc. The most interesting entries for us are the open files and sockets: you can see the numbers 0..14 followed by a suffix ("w" for writing only, "r" for reading only, "u" for both reading and writing, and others). For example, you can see that Apache httpd redirects all its error output (2w) to /var/log/httpd/error_log. It also has a bunch of other regular files open for writing (handles 10-14, various log files). You can also see that it is listening on two TCP sockets (the port numbers shown are resolved through the /etc/services file; you can always use the "-n" option to disable any name resolution, which is useful if you already know the port number and want to look for it specifically). You can see two more sockets (do not be afraid of the message "can't identify...", it is just an application-specific protocol); Apache is using them to talk to its sub-processes.
TYPE
The type of the node. There are several dozen node types, like "REG" for regular files, "IPv4" (or "IPv6" for IP version 6) for Internet sockets, "CHR" for character devices, "DIR" for directories etc.
DEVICE
Shows the UNIX device numbers (major, minor) associated with the resource. Another interesting parameter. For example, on my system the numbers 3,12 correspond to:
[root@localhost]$ ls -l /dev/* | egrep '3,\s*12'
brw-r----- 1 root disk 3, 12 Jul 17 19:17 /dev/hda12
[root@localhost]$ mount | grep hda12
/dev/hda12 on /var type ext3 (rw)
This checks out: the files are located on one of the partitions of the first IDE disk, mounted at /var.
SIZE
The size of the file or the current offset (in bytes). This number does not make sense for some of the resource types so you see blank values there.
NODE
Node number of the file, protocol identifier (for Internet sockets) etc.
NAME
The name of the file or resource, if applicable. The value is quite self-explanatory and probably one of the most useful.
Try using this command for other processes in your system. Probably one of the most interesting ones to look at will be your X11 server.
There are a number of filters you can use to see only the information you are interested in. For example, I want to know whether my iTunes talks to anyone over the network or not. Here is the command I would use:
[root@localhost]$ lsof -p 1559 -a -i
This produces an empty answer (exactly what I was hoping for :) ). But if I try my SSH client instead of iTunes:
[root@localhost]$ lsof -p 1586 -a -i
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
ssh 1586 nik 3u IPv4 0x40e7394 0t0 TCP macbook:50648
->oberon:ssh (ESTABLISHED)
Summary: you can get a snapshot of any live process using this command. Sometimes it is simpler to use "lsof" (especially on a system you are not familiar with) to find the location of a service's log files, config files etc.
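On Linux, much of what "lsof -p" reports for open descriptors is also visible directly under /proc (a quick sketch using the current shell's own PID):

```shell
# Each entry under /proc/PID/fd is a symlink from a descriptor number
# to the file, socket or pipe it refers to.
ls -l /proc/$$/fd
```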
Who is using that file?
From time to time you may become very curious about who (or at least which process :) ) is using a certain resource on your system. There may be a number of reasons: a server that does not start, complaining that its port is already in use; a suspicious network connection that you see in the "netstat" output; an inability to unmount a network disk or an external storage unit etc.
The traditional "fuser" command is not always good enough for these purposes. For example, suppose I want to see all the processes that have any files open under the /etc directory. Here is the command that gives me the answer:
[root@localhost]$ lsof +D /etc
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
dbus-daem 2206 dbus 5r DIR 3,6 1024 59500 /etc/dbus-1/system.d
acpid 2305 root 6r REG 3,6 236 63639 /etc/acpi/
events/video.conf
avahi-dae 2541 avahi cwd DIR 3,6 1024 63468 /etc/avahi
avahi-dae 2541 avahi rtd DIR 3,6 1024 63468 /etc/avahi
prefdm 2662 root 255r REG 3,6 1465 59322 /etc/X11/prefdm
000-delay 3831 root 255r REG 3,6 577 59446 /etc/cron.
daily/000-delay.cron
When you use the "+D path" option, lsof checks every single file under the specified path and shows information about the processes using it. This command may take some time to execute if there are a lot of files there.
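For a rough idea of what lsof is doing here, you can scan the fd symlinks under /proc yourself (a simplified, Linux-only sketch; unlike lsof it misses memory-mapped files, working directories and the descriptors of processes you are not allowed to inspect):

```shell
# Print every readable descriptor in the system that points under /etc.
for fd in /proc/[0-9]*/fd/*; do
    # readlink fails for descriptors we lack permission to read; skip those.
    target=$(readlink "$fd" 2>/dev/null) || continue
    case $target in
        /etc/*) echo "$fd -> $target" ;;
    esac
done
```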
Another situation: a network connection. The "-i" option of lsof allows you to search for a specific network connection. The format is as follows: lsof -i [4|6][protocol][@hostname|hostaddr][:service|port]
"4" or "6" selects the IP protocol version. If you do not use any filters and just use "-i", you will get the list of all connections in your system. Let's find all the UDP connections that we have on an OS X box:
[root@localhost]$ lsof -i UDP
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
launchd 1 root 10u IPv4 0x393ed40 0t0 UDP *:netbios-ns
mDNSRespo 39 root 7u IPv4 0x393e790 0t0 UDP *:mdns
mDNSRespo 39 root 8u IPv6 0x393e6c0 0t0 UDP *:mdns
mDNSRespo 39 root 9u IPv4 0x41556c0 0t0 UDP *:mdns
mDNSRespo 39 root 11u IPv4 0x4155ee0 0t0 UDP *:mdns
mDNSRespo 39 root 12u IPv4 0x393dd00 0t0 UDP macbook:49727
netinfod 40 root 6u IPv4 0x393ee10 0t0 UDP localhost:
netinfo-local
syslogd 41 root 16u IPv4 0x393ec70 0t0 UDP *:49175
Directory 52 root 10u IPv4 0x3fbd1a0 0t0 UDP *:*
Directory 52 root 12u IPv4 0x4155d40 0t0 UDP *:*
ntpd 141 root 5u IPv4 0x393e520 0t0 UDP *:ntp
ntpd 141 root 6u IPv6 0x393e110 0t0 UDP *:ntp
ntpd 141 root 7u IPv6 0x393ead0 0t0 UDP localhost:ntp
ntpd 141 root 8u IPv6 0x393e860 0t0 UDP [fe80:1::1]:ntp
ntpd 141 root 9u IPv4 0x393e5f0 0t0 UDP localhost:ntp
ntpd 141 root 10u IPv6 0x393e450 0t0 UDP [fe80:5::216:
cbff:febf:a08d]:ntp
ntpd 141 root 11u IPv4 0x393e380 0t0 UDP macbook:ntp
prl_dhcpd 147 root 5u IPv4 0x393db60 0t0 UDP *:*
automount 168 root 8u IPv4 0x3fbd000 0t0 UDP localhost:1023
automount 179 root 8u IPv4 0x393e930 0t0 UDP localhost:1022
nmbd 1078 root 0u IPv4 0x393ed40 0t0 UDP *:netbios-ns
nmbd 1078 root 1u IPv4 0x393ed40 0t0 UDP *:netbios-ns
nmbd 1078 root 8u IPv4 0x393ea00 0t0 UDP *:netbios-dgm
nmbd 1078 root 9u IPv4 0x3fbd4e0 0t0 UDP 10.37.129.2:
netbios-ns
nmbd 1078 root 10u IPv4 0x3fbeba0 0t0 UDP 10.37.129.2:
netbios-dgm
nmbd 1078 root 11u IPv4 0x4155380 0t0 UDP macbook:
netbios-ns
nmbd 1078 root 12u IPv4 0x4155450 0t0 UDP macbook:
netbios-dgm
cupsd 1087 root 6u IPv4 0x3fbd340 0t0 UDP *:ipp
Do not be surprised if you do not see anything on your machine - you will not see the information about the processes that do not belong to you unless you are root.
Summary: using lsof with various filters allows you to search through your system and find detailed information about any local or network resource in use.
Watch your steps
We have learned how to get a snapshot of your UNIX system (or of any individual process living in it). But sometimes looking at static information is not enough. Dynamic runtime information may answer a lot of other questions: why does the program crash, why does it say "cannot open resource" without providing the resource location, is it accepting the network connection or not etc.
In the first section it was mentioned that the way a UNIX program talks to the operating system (i.e. deals with almost any resource on your computer) is through system calls. Most UNIX kernels, including Linux and OS X, provide a way for the user to see the list of system calls a process makes in real time, including the parameters the process passes to the kernel. Different systems provide this functionality through different tools. On Linux it is called "strace"; on BSD (including OS X) there are two tools working together - "ktrace" and "kdump". We will briefly cover both of them, starting with the Linux one.
The really great thing about these tools is that they use the kernel tracing facility in order to work. Thus, they work with ANY program running on your system; it does not have to be something you developed, and you do not need the program's source code in order to understand (in general terms, of course) what it is doing.
It is important to mention again that, just as with "lsof", you cannot see anything that you are not supposed to see, i.e. you cannot see what a process is doing if it is not yours (unless you are the root user).
The simplest way to run "strace" is: strace command. Let's see what a simple, well-known command like "w" (the command that shows the users logged in and their current activities) does:
[root@localhost]$ strace /usr/bin/w 2>&1 | less
execve("/usr/bin/w", ["/usr/bin/w"], [/* 30 vars */]) = 0
brk(0) = 0x603000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x2aaaaaaab000
uname({sys="Linux", node="oberon.anglab.com", ...}) = 0
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such
file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=103874, ...}) = 0
mmap(NULL, 103874, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2aaaaaaac000
close(3) = 0
open("/lib64/libproc-3.2.7.so", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\
240;`\314"..., 832) = 832
...
open("/proc/uptime", O_RDONLY) = 3
lseek(3, 0, SEEK_SET) = 0
read(3, "7267.86 7230.07\n", 1023) = 16
access("/var/run/utmpx", F_OK) = -1 ENOENT (No such file
or directory)
open("/var/run/utmp", O_RDWR) = -1 EACCES (Permission denied)
open("/var/run/utmp", O_RDONLY) = 4
fcntl(4, F_GETFD) = 0
...
open("/proc", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = 3
fstat(3, {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
fcntl(3, F_SETFD, FD_CLOEXEC) = 0
getdents(3, /* 35 entries */, 1024) = 1000
getdents(3, /* 38 entries */, 1024) = 1016
stat("/proc/1", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
open("/proc/1/stat", O_RDONLY) = 4
read(4, "1 (init) S 0 1 1 0 -1 4194560 64"..., 1023) = 216
close(4) = 0
socket(PF_FILE, SOCK_STREAM, 0) = 4
...
fcntl(5, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=0, len=0}) = 0
alarm(0) = 1
rt_sigaction(SIGALRM, {SIG_DFL}, NULL, 8) = 0
close(5) = 0
write(1, " 21:31:48 up 2:01, 2 users, l"..., 281 21:31:48 up 2:01,
2 users, load average: 0.00, 0.00, 0.00
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
nick pts/0 192.168.8.18 21:12 9.00s 0.03s 0.03s -bash
nick pts/1 192.168.8.18 21:19 0.00s 0.13s 0.04s strace
/usr/bin
) = 281
exit_group(0) = ?
Process 3947 detached
Only a small part of the long output is shown here. As you can see, everything starts with the "execve" syscall, which is correct - this is the call that executes a program in UNIX. By the way, for each of these calls you can get more information from the manual pages - just type "man execve" to learn more about the call and its parameters (as you can see, the parameters are shown in the trace). And, as it is supposed to, the execution ends with the "exit_group" call with status 0 (success).
You can see how the program being started loads the shared libraries it depends on. You can see that at some point it tries to open the "/var/run/utmpx" file, but it does not exist (ENOENT). Then it tries to open the "/var/run/utmp" file, but it is not readable for this program (i.e. for this user), so it gets back EACCES. Finally, this program decides that the only way to find out who is logged in is to go through all the active processes and check where their standard input and output are attached. In order to do that, "w" reads the list of files and directories under "/proc" (open - getdents - close). By the way, the number returned by the open() call is the file handle. Using it, you can follow the operations on this file, since all subsequent read() or write() system calls will use that handle. After getting the list of files, "w" goes over all the process directories and reads the virtual files there to get more information about the processes. And finally it displays the results by writing them to file descriptor 1 (the standard output, stdout). Very simple and informative.
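The /proc walk that the trace reveals can be sketched in a few lines of shell (a simplified, Linux-only illustration - the real "w" reads much more from each stat file):

```shell
# Walk /proc the way "w" does: one directory per numeric PID.
for d in /proc/[0-9]*; do
    # Field 2 of /proc/PID/stat is the command name in parentheses.
    read -r pid comm _ < "$d/stat" 2>/dev/null || continue
    echo "$pid $comm"
done | head -5
```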
There are a number of parameters that you can use with "strace" to filter the results and get exactly the information you are looking for. For example, suppose you are interested only in the network operations; let's see what the "ping" command does (run it as the root user):
[root@localhost]$ strace -x -e trace=network,desc ping -c 1 127.0.0.1
...
socket(PF_INET, SOCK_RAW, IPPROTO_ICMP) = 3
socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 4
connect(4, {sa_family=AF_INET, sin_port=htons(1025), sin_addr=
inet_addr("127.0.0.1")}, 16) = 0
getsockname(4, {sa_family=AF_INET, sin_port=htons(1028), sin_addr=
inet_addr("127.0.0.1")}, [16]) = 0
close(4) = 0
setsockopt(3, SOL_RAW, ICMP_FILTER, ~(ICMP_ECHOREPLY|ICMP_DEST_UNREACH|
ICMP_SOURCE_QUENCH|ICMP_REDIRECT|ICMP_TIME_EXCEEDED|
ICMP_PARAMETERPROB),
4) = 0
setsockopt(3, SOL_IP, IP_RECVERR, [1], 4) = 0
setsockopt(3, SOL_SOCKET, SO_SNDBUF, [324], 4) = 0
setsockopt(3, SOL_SOCKET, SO_RCVBUF, [65536], 4) = 0
getsockopt(3, SOL_SOCKET, SO_RCVBUF, [27472947287556096], [4]) = 0
fstat(1, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
setsockopt(3, SOL_SOCKET, SO_TIMESTAMP, [1], 4) = 0
setsockopt(3, SOL_SOCKET, SO_SNDTIMEO, "\x01\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00"..., 16) = 0
setsockopt(3, SOL_SOCKET, SO_RCVTIMEO, "\x01\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00"..., 16) = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff418a7c90) = -1
EINVAL (Invalid argument)
sendmsg(3, {msg_name(16)={sa_family=AF_INET, sin_port=htons(0),
sin_addr=inet_addr("127.0.0.1")}, msg_iov(1)=[{"\x08\x00
\xc7\x8b\xd6\x10\x00\x01\x86\x16\xa0\x46\x00\x00"..., 64}],
msg_controllen=0, msg_flags=0}, 0) = 64
recvmsg(3, {msg_name(16)={sa_family=AF_INET, sin_port=htons(0),
sin_addr=inet_addr("127.0.0.1")}, msg_iov(1)=[{"\x45
\x00\x00\x54\xed\xd5\x00\x00\x40\x01\x8e\xd1\x7f\x00"..., 192}],
msg_controllen=32, {cmsg_len=32, cmsg_level=SOL_SOCKET,
cmsg_type=0x1d /* SCM_??? */, ...}, msg_flags=0}, 0) = 84
write(1, "PING 127.0.0.1 (127.0.0.1) 56(84"..., 106PING
127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.321 ms
) = 106
write(1, "\n", 1
) = 1
write(1, "--- 127.0.0.1 ping statistics --"..., 144---
127.0.0.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.321/0.321/0.321/0.000 ms
) = 144
Process 4310 detached
The "-e" option allows you to specify what kind(s) of system calls you are interested in.
As you can see, this program opens 2 network sockets (with descriptors 3 and 4). One of them is used only to do the getsockname() call (do "man getsockname" for more information on this call) and then it gets closed. The one associated with descriptor 3 is used to send and receive raw IP traffic (this is the way to send packets using protocols other than UDP and TCP). It configures the socket using a number of options and then finally sends an IP packet (sendmsg) to the given destination. You can even see part of the message data (the "-x" option requests that strace display hex numbers instead of non-ASCII strings). If you want to see the complete data, you may want to use the "-s" option to set the maximum string length to be displayed. After sending the message, it gets back the response (recvmsg). Then it displays the program's output.
You can use the "-o" option with a file name argument to redirect the output to a text file. Usually, when analyzing more complicated examples, it is easier to get the trace into a file first and then study that file using a text editor or the "grep" command.
Mac OS X (like all BSD systems) offers "ktrace" command. In order to get similar result when analyzing the "ping" command we would have to do something like this:
[root@localhost]$ ktrace -f ping.trace -t ci ping -c 1 127.0.0.1
[root@localhost]$ kdump -f ping.trace | less
...
684 ping CALL socket(0x2,0x3,0x1)
684 ping RET socket 3
684 ping CALL getuid
684 ping RET getuid 0
684 ping CALL setuid(0)
684 ping RET setuid 0
684 ping CALL getuid
684 ping RET getuid 0
684 ping CALL getpid
684 ping RET getpid 684/0x2ac
684 ping CALL setsockopt(0x3,0xffff,0x400,0xbffffdc8,0x4)
684 ping RET setsockopt 0
684 ping CALL setsockopt(0x3,0xffff,0x1002,0xbffffdb8,0x4)
684 ping RET setsockopt 0
684 ping CALL setsockopt(0x3,0xffff,0x1001,0xbffffdb8,0x4)
684 ping RET setsockopt 0
684 ping CALL fstat(0x1,0xbffef0c0)
684 ping RET fstat 0
684 ping CALL ioctl(0x1,FIODTYPE,0xbffef08c)
684 ping RET ioctl 0
...
684 ping CALL sendto(0x3,0x5294,0x40,0,0x5260,0x10)
684 ping GIO fd 3 wrote 64 bytes
"\b\08\^R\M-,\^B\0\0\M-T\^X\240F\M-'\M^H\r\0\b
\v\f\r\^N\^O\^P\^Q\^R\^S\^T\^U\^V\^W\^X\^Y\^Z
\^[\^\\^]\^^\^_ !"#$%&'()*+,-./01234567"
684 ping RET sendto 64/0x40
684 ping CALL select(0x4,0xbffffc3c,0,0,0xbffffd90)
684 ping RET select 1
684 ping CALL recvmsg(0x3,0xbffffd24,0)
684 ping GIO fd 3 wrote 84 bytes
"E\0@\0|\M^N\0\0@\^A\0\0\^?\0\0\^A\^?\0\0
\^A\0\0@\^R\M-,\^B\0\0\M-T\^X\240F\M-'\M^H\r\0\b
\v\f\r\^N\^O\^P\^Q\^R\^S\^T\^U\^V\^W\^X\^Y\^Z
\^[\^\\^]\^^\^_ !"#$%&'()*+,-./01234567"
684 ping RET recvmsg 84/0x54
684 ping CALL write(0x1,0x18000,0x39)
684 ping GIO fd 1 wrote 57 bytes
"64 bytes from 127.0.0.1: icmp_seq=0 ttl=64 time=0.069 ms
"
684 ping RET write 57/0x39
684 ping CALL sigaction(0x2,0xbffef8c0,0xbffef8f8)
684 ping RET sigaction 0
684 ping CALL sigaction(0xe,0xbffef8c0,0xbffef8f8)
684 ping RET sigaction 0
684 ping CALL write(0x1,0x18000,0x1)
684 ping GIO fd 1 wrote 1 byte
"
"
684 ping RET write 1
684 ping CALL write(0x1,0x18000,0x22)
684 ping GIO fd 1 wrote 34 bytes
"--- 127.0.0.1 ping statistics ---
...
"ktrace" writes the tracing information into a file ("ktrace.out" if the "-f" option is not specified), and the "kdump" tool is used to decode that file into a more human-readable format. As you can see, the behavior of the ping program on OS X is similar to what we saw on Linux. ktrace/kdump also have some basic filtering capabilities.
An important thing to know about strace is that by default it does not follow child processes. I.e., if you trace a program that spawns other processes, you will not see the operations performed by the children - you will see only the fork() calls. In order to "follow" the execution of all child processes, you need to use the "-f" option.
Tapping the wire
Another nice feature offered by strace (and ktrace on BSD systems) is the ability to attach to any process in the system for a limited time and capture its activity. Sometimes this helps a lot when troubleshooting network server processes. In order to "attach" to a process, you need to use the "-p process_ID" option of strace. When you are done capturing the information, it is enough to terminate strace by pressing Ctrl+C. Let's try to see what happens when Sendmail accepts an incoming connection (in this example, 233 is the process ID of the sendmail process).
[root@localhost]$ strace -f -p 233
And on the second console do:
[root@localhost]$ telnet localhost 25
Trying 127.0.0.1...
Connected to oberon.
Escape character is '^]'.
220 oberon ESMTP Sendmail 8.11.6p2/8.11.6-20030304; Sat, 21 Jul 2007
06:00:02 +0400
quit
221 2.0.0 oberon closing connection
Connection closed by foreign host.
And then press Ctrl+C in the first console. Here is what you will see there (use the "-o" option when running strace if you want to write the output to a file):
233 select(5, [4], NULL, NULL, {5, 0}) = 1 (in [4], left {1, 750000})
233 accept(4, {sin_family=AF_INET, sin_port=htons(1226), sin_addr=
inet_addr("127.0.0.1")}, [16]) = 5
233 time([1184983202]) = 1184983202
233 sigprocmask(SIG_BLOCK, [ALRM], []) = 0
233 pipe([6, 7]) = 0
233 sigprocmask(SIG_BLOCK, [CHLD], [ALRM]) = 0
233 fork() = 20953
20953 sigprocmask(SIG_UNBLOCK, [ALRM],
233 getpid(
20953 <... sigprocmask resumed> [ALRM CHLD]) = 0
233 <... getpid resumed> ) = 233
20953 sigprocmask(SIG_UNBLOCK, [CHLD],
233 sigprocmask(SIG_UNBLOCK, [CHLD],
20953 <... sigprocmask resumed> [CHLD]) = 0
233 <... sigprocmask resumed> [ALRM CHLD]) = 0
20953 sigaction(SIGCHLD, {SIG_DFL},
233 close(6
...
20953 open("/etc/hosts", O_RDONLY) = 6
20953 fstat(6, {st_mode=S_IFREG|0644, st_size=963, ...}) = 0
20953 mmap(0, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|0x20, 4294967295, 0)
= 0x11a000
20953 read(6, "#\n# hosts\t\tThis file describe"..., 4096) = 963
20953 close(6) = 0
...
20953 write(3, "<22>Jul 21 06:00:03 sendmail[209"..., 143) = -1 EPIPE
(Broken pipe)
20953 --- SIGPIPE (Broken pipe) ---
20953 close(3) = 0
20953 close(-1) = -1 EBADF (Bad file number)
20953 sigaction(SIGPIPE, {SIG_IGN}, NULL) = 0
20953 sigprocmask(SIG_UNBLOCK, [ALRM], []) = 0
20953 sigprocmask(SIG_BLOCK, NULL, []) = 0
20953 time([1184983203]) = 1184983203
20953 unlink("./dfl6L202l20953") = -1 ENOENT (No such
file or directory)
20953 unlink("./qfl6L202l20953") = -1 ENOENT (No such
file or directory)
20953 unlink("./xfl6L202l20953") = -1 ENOENT (No such
file or directory)
20953 getpid() = 20953
20953 close(7) = 0
20953 setuid(0) = 0
20953 _exit(71) = ?
233 <... select resumed> ) = ? ERESTARTNOHAND (To be restarted)
233 --- SIGCHLD (Child exited) ---
233 wait4(-1, [WIFEXITED(s) && WEXITSTATUS(s) == 71], WNOHANG, NULL)
= 20953
233 wait4(-1, 0xbfffdf60, WNOHANG, NULL) = 0
233 sigprocmask(SIG_BLOCK, [ALRM], []) = 0
233 time([1184983203]) = 1184983203
233 open("/proc/loadavg", O_RDONLY) = 5
233 fstat(5, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
233 mmap(0, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|0x20, 4294967295, 0)
= 0x118000
233 read(5, "0.16 0.09 0.29 3/53 21316\n", 1024) = 26
233 close(5) = 0
233 munmap(0x118000, 4096) = 0
233 sigprocmask(SIG_UNBLOCK, [ALRM], [ALRM]) = 0
233 getpid() = 233
233 select(5, [4], NULL, NULL, {5, 0}
As you can see, sendmail is initially sitting there listening for incoming connections on file descriptor 4 (you can use "lsof" to see that it is associated with a TCP socket) - see the second parameter of the select() call. Then the incoming connection arrives, and sendmail spawns another process to handle it (fork() is not followed by an execve() call because the child process is also sendmail). The child process (PID=20953) then handles the SMTP dialog and finally exits.
As you can see, it is very simple. Just a couple of important points to add: do not forget to use the "-f" flag when troubleshooting server processes, because most of them spawn child processes to handle connections. You can also specify multiple "-p" options with different PIDs to monitor many processes at the same time.
I hope this tutorial was useful. The process tracing mechanisms available in UNIX systems give most people confidence that they actually control their systems and that there is nothing "mysterious" running on their computers.
Reference: http://myhowto.org/solving-problems/7-exploring-system-internals-with-lsof-and-strace/
-----------------------------
For tracing the system calls of a program, we have a very good tool in strace. What is unique about strace is that, when run in conjunction with a program, it outputs all the calls the program makes to the kernel. In many cases, a program may fail because it is unable to open a file or because of insufficient memory, and tracing the program will clearly show the cause of either problem.
The use of strace is quite simple and takes the following form:
$ strace command
For example, I can run a trace on 'ls' as follows:
$ strace ls
And this will output a great amount of data to the screen. If it is hard to keep track of the scrolling mass of data, there is an option to write the output of strace to a file instead, using the -o option. For example,
$ strace -o strace_ls_output.txt ls
... will write all the tracing output of 'ls' to the 'strace_ls_output.txt' file. Now all that remains is to open the file in a text editor and analyze the output to get the necessary clues.
It is common to find a lot of system calls in the strace output, the most common being open(), write(), read(), close() and so on. But the calls are not limited to these four, as you will find many others too.
For example, if you look in the strace output of ls, you will find the following line:
open("/lib/libselinux.so.1", O_RDONLY) = 3
This means that some aspect of ls requires the library libselinux.so.1 to be present in the /lib folder. If the library were missing or in a different path, the aspect of ls which depends on it would fail to function. The return value of 3 (a valid file handle) signifies that opening the library libselinux.so.1 succeeded.
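If all you want is the list of libraries a binary depends on, you do not even need strace; on glibc-based Linux systems the ldd utility prints it directly (a sketch; the exact output varies from system to system):

```shell
# Print the shared libraries /bin/ls is linked against, with the paths
# the dynamic loader resolved them to.
ldd /bin/ls
```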
Here I will share my experience of using strace to solve a particular problem I faced. I had installed all the multimedia codecs, including libdvdcss, which allows playing encrypted DVDs in Ubuntu Linux, which I use on a daily basis. But after installing all the necessary codecs, when I tried playing a DVD movie, totem gave me an error saying that it was unable to play the movie (see the picture below). Since I knew that I had already installed libdvdcss on my machine, I was at a loss as to what to do.
Fig: Totem showing error saying that it cannot find libdvdcss
Then I ran strace on totem as follows:
$ strace -o strace.totem totem
... and then opened the file strace.totem in a text editor and searched for the string 'libdvdcss'. Not surprisingly, I came across the lines of output shown in the listing below.
# Output of strace on totem
open("/etc/ld.so.cache", O_RDONLY) = 26
fstat64(26, {st_mode=S_IFREG|0644, st_size=58317, ...}) = 0
old_mmap(NULL, 58317, PROT_READ, MAP_PRIVATE, 26, 0) = 0xb645e000
close(26)
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
...
open("/lib/tls/i686/cmov/libdvdcss.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
stat64("/lib/tls/i686/cmov", {st_mode=S_IFDIR|0755, st_size=1560, ...}) = 0
...
stat64("/lib/i486-linux-gnu", 0xbfab4770) = -1 ENOENT (No such file or directory)
munmap(0xb645e000, 58317) = 0
open("/usr/lib/xine/plugins/1.1.1/xineplug_inp_mms.so", O_RDONLY) = 26
read(26, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\320\27"..., 512) = 512
fstat64(26, {st_mode=S_IFREG|0644, st_size=40412, ...}) = 0
In the above listing, which I have truncated for clarity, the failed open() line clearly shows that totem is trying to find the library in, among other places, the '/lib/tls/i686/cmov/' directory, and the return value of -1 (ENOENT) shows that it has failed to find it. So I realized that for totem to correctly play the encrypted DVD, it has to find the libdvdcss.so.2 file in the path it is searching.
Then I used the find command to locate the library and copied it to the directory /lib/tls/i686/cmov/. Once I accomplished this, I tried playing the DVD again in totem and it started playing without a hitch.
Fig: Totem playing an encrypted DVD Movie
Just to make sure, I took another trace of totem, and it showed that the error was rectified, as shown by the successful open() call in the output below.
# Output of the second strace on totem
open("/etc/ld.so.cache", O_RDONLY) = 26
fstat64(26, {st_mode=S_IFREG|0644, st_size=58317, ...}) = 0
old_mmap(NULL, 58317, PROT_READ, MAP_PRIVATE, 26, 0) = 0xb644d000
close(26) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
...
open("/lib/tls/i686/cmov/libdvdcss.so.2", O_RDONLY) = 26
...
stat64("/lib/tls/i686/sse2", 0xbffa4020) = -1 ENOENT (No such file or directory)
munmap(0xb645e000, 58317) = 0
open("/usr/lib/xine/plugins/1.1.1/xineplug_inp_mms.so", O_RDONLY) = 26
read(26, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\360\20"..., 512) = 512
fstat64(26, {st_mode=S_IFREG|0644, st_size=28736, ...}) = 0
Opening the man page of strace, one will find scores of options. For example, if you use the option -t, then strace will prefix each line of the trace with the time of day. One can even specify the system call functions to trace using the -e option. For example, to trace only open() and close() function system calls, one can use the command as follows:
$ strace -o strace.totem -e trace=open,close totem
The ubiquitous strace should not be confused with DTrace, which ships with Sun Solaris. strace is a single tool that handles one small task: tracing a single program. Sun's DTrace toolkit is much more powerful; it consists of a collection of scripts that can track, tune and aid the user in troubleshooting a system in real time. Moreover, DTrace provides a scripting language with a close resemblance to C and awk. Put another way, strace on GNU/Linux provides only one of the many functions DTrace provides on Sun Solaris. That being said, strace plays an important part in helping the user troubleshoot programs by providing a view of the system calls a program makes to the Linux kernel.
refer:
http://linuxhelp.blogspot.com/2006/05/strace-very-powerful-troubleshooting.html
This section briefly covers the terminology you need to familiarize yourself with in order to understand the results of the operations described later in the article. Do not worry, it is not more difficult than answering a Windows Vista UAC question ;)
System Call (a.k.a. "syscall")
System Call is a mechanism for the application to make a call to the operating system. The OS usually has a limited number of the "entry points" that can be used for the basic and fundamental tasks like file I/O, memory allocation, networking etc. For example, most of the UNIX systems have only 1-3 different system calls to open a file (although various libraries offer dozens of ways to deal with a file, they all finally make the same system call). The important thing about the system call that you need to understand is that there are few of them and they are the only way for any program to interact with the OS.
Process ID (PID)
Any process in a UNIX OS is identified by its unique numeric PID, even if the process runs several threads. From the moment you start a program until the moment it finishes, it maintains the same PID. However, a process can spawn other processes (usually by using one of the fork() family of syscalls).
File Handle
Any I/O operation in UNIX is associated with a file handle, which is a small non-negative integer. This applies to any file on the disk, a network socket etc. A typical sequence of operations with a file on the disk, for example, is the following:
1. open("filename") -> returns a handle that from this moment will be associated with this open file
2. read(handle, where, how_much)
3. write(handle, from_where, how_much)
4. close(handle) -> removes the association established with open()
Handles are unique only within an individual process. There are well-known handles available by default to any application: 0 for standard input, 1 for standard output and 2 for the standard error stream.
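These well-known handles are exactly what shell redirections operate on, which makes them easy to demonstrate:

```shell
# fd 1 (standard output) and fd 2 (standard error) can each be
# redirected to a separate file
{ echo "to stdout"; echo "to stderr" >&2; } 1>out.txt 2>err.txt

cat out.txt   # contains: to stdout
cat err.txt   # contains: to stderr
```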
Turn out your pockets, process!
We will start with a tool that can show us a snapshot of what is happening right now on our system: an excellent tool available for many UNIX systems called "lsof" ("LiSt Open Files"). Most Linux distributions and Mac OS X include it.
Here is what this tool can do:
* Show the list of files and other resources open by a selected process or by all processes in the system
* Show the list of processes using a particular resource (like a TCP socket, for example)
You can sort, filter and format the output in various ways to get exactly the results you want. The program has about three dozen command line options; we will try a couple of typical, representative examples now. Note: most of these examples require root access to the system (otherwise you will be limited to seeing only the information related to your own processes).
How to see the list of files opened by a running process
For example, we would like to see which resources are currently used by Apache HTTP server. First, we need to identify the process we are interested in:
[root@localhost]$ ps ax | grep httpd | grep -v grep
2383 ? Ss 0:00 /usr/sbin/httpd
2408 ? S 0:00 /usr/sbin/httpd
2409 ? S 0:00 /usr/sbin/httpd
2410 ? S 0:00 /usr/sbin/httpd
2411 ? S 0:00 /usr/sbin/httpd
2412 ? S 0:00 /usr/sbin/httpd
2414 ? S 0:00 /usr/sbin/httpd
2415 ? S 0:00 /usr/sbin/httpd
2416 ? S 0:00 /usr/sbin/httpd
Hmm... which one am I interested in? Unfortunately, there is no universal recipe to answer this question. However, let's look at the problem from another perspective. I mentioned earlier that when you start a program, the OS creates a process, and a process can spawn other processes. So we can assume that there may be a tree of sub-processes related to one parent. Let's see how we can get this information (this command will work on both Linux and OS X):
[root@localhost]$ ps axo pid,ppid,command | egrep '(httpd)|(PID)' |
grep -v grep
2383 1 /usr/sbin/httpd
2408 2383 /usr/sbin/httpd
2409 2383 /usr/sbin/httpd
2410 2383 /usr/sbin/httpd
2411 2383 /usr/sbin/httpd
2412 2383 /usr/sbin/httpd
2414 2383 /usr/sbin/httpd
2415 2383 /usr/sbin/httpd
2416 2383 /usr/sbin/httpd
PPID is simply Parent PID, i.e. Parent Process ID. As you can see, most of these processes have the same PPID (2383), and there is one process with the PID of 2383. This must be the parent process. If you are curious to find out who is its parent, I will let you do this exercise yourself.
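You can see the parent/child relationship directly from a shell: $$ expands to the shell's own PID, and a child process sees that same number as its $PPID. A small sketch:

```shell
# The shell's own PID
echo "this shell: $$"

# A child sh process reports its parent's PID via $PPID
sh -c 'echo "child sees PPID: $PPID"'
```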
Linux has another fancy tool called "pstree". We could use it to get the same kind of information:
[root@localhost]$ pstree -p | grep httpd
|-httpd(2383)-+-httpd(2408)
| |-httpd(2409)
| |-httpd(2410)
| |-httpd(2411)
| |-httpd(2412)
| |-httpd(2414)
| |-httpd(2415)
| `-httpd(2416)
Looks better and is quite self-explanatory, doesn't it? OK, back to the original question. Now we know that there are certain relationships between these "httpd"s. So, our target will be the process 2383.
In its simplest form, "lsof" can be called like this:
[root@localhost]$ lsof -p 2383
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
httpd 2383 root cwd DIR 3,6 1024 2 /
httpd 2383 root rtd DIR 3,6 1024 2 /
httpd 2383 root txt REG 3,8 328136 261136 /usr/sbin/httpd
httpd 2383 root mem REG 3,6 130448 8167 /lib64/ld-2.5.so
httpd 2383 root DEL REG 0,8 8072 /dev/zero
httpd 2383 root mem REG 3,6 615136 79579 /lib64/libm-2.5.so
httpd 2383 root mem REG 3,6 117656 8186 /lib64/libpcre.so.0.0.1
httpd 2383 root mem REG 3,6 95480 8213 /lib64/libselinux.so.1
...
httpd 2383 root mem REG 3,8 46160 1704830 /usr/lib64/
php/modules/ldap.so
httpd 2383 root DEL REG 0,8 8120 /dev/zero
httpd 2383 root DEL REG 0,8 8122 /dev/zero
httpd 2383 root 0r CHR 1,3 1488 /dev/null
httpd 2383 root 1w CHR 1,3 1488 /dev/null
httpd 2383 root 2w REG 3,12 1077 4210 /var/log/
httpd/error_log
httpd 2383 root 3r CHR 1,9 1922 /dev/urandom
httpd 2383 root 4u IPv6 8052 TCP *:http (LISTEN)
httpd 2383 root 5u sock 0,5 8053 can't identify protocol
httpd 2383 root 6u IPv6 8057 TCP *:https (LISTEN)
httpd 2383 root 7u sock 0,5 8058 can't identify protocol
httpd 2383 root 8r FIFO 0,6 8069 pipe
httpd 2383 root 9w FIFO 0,6 8069 pipe
httpd 2383 root 10w REG 3,12 1077 4210 /var/log/
httpd/error_log
httpd 2383 root 11w REG 3,12 711 4215 /var/log/
httpd/ssl_error_log
httpd 2383 root 12w REG 3,12 0 4138 /var/log/
httpd/access_log
httpd 2383 root 13w REG 3,12 0 4151 /var/log/
httpd/ssl_access_log
httpd 2383 root 14w REG 3,12 0 4152 /var/log/
httpd/ssl_request_log
Looks quite interesting, doesn't it? Now I am going to briefly explain the meaning of each kind of entry (you can get more information from the manual page for "lsof").
COMMAND
Contains the first nine characters of the UNIX command name associated with the process
PID
The process ID. In our example it is simply the PID we have asked the information for
USER
The UNIX user owning the process
FD
File Descriptor number. Actually, it covers more than the descriptors mentioned before: there are special entries identifying various resources in a UNIX system (for the details I recommend checking the man page). As you can see, everything is there: the current working directory (cwd), the process image itself (txt), memory-mapped libraries (mem) etc. The most interesting for us are the open files and sockets: you can see the numbers 0..14 followed by a suffix ("w" for writing only, "r" for reading only, "u" for both reading and writing, and others). For example, you can see that Apache httpd redirects all its error output (2w) to /var/log/httpd/error_log. It also has a bunch of other regular files open for writing (handles 10-14, various log files). You can also see that it is listening on two TCP sockets (the port numbers shown are resolved through the /etc/services file; use the "-n" option to disable name resolution, which is useful if you already know the port number and want to look for it specifically). There are two more sockets (do not be afraid of the "can't identify..." message; it is just an application-specific protocol) that Apache uses to talk to its sub-processes.
TYPE
Type of the node. There are several dozen node types, such as "REG" for regular files, "IPv4" (or "IPv6" for IP version 6) for Internet sockets, "CHR" for character devices, "DIR" for directories etc.
DEVICE
Shows the UNIX device numbers (major, minor) associated with the resource. Another interesting parameter. For example, on my system the numbers 3,12 correspond to:
[root@localhost]$ ls -l /dev/* | egrep '3,\s*12'
brw-r----- 1 root disk 3, 12 Jul 17 19:17 /dev/hda12
[root@localhost]$ mount | grep hda12
/dev/hda12 on /var type ext3 (rw)
This checks out: the files are located on one of the partitions of the first IDE disk, mounted at /var.
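The major/minor pair can also be read for a single file with stat; this sketch assumes GNU stat on Linux, where /dev/null is conventionally character device (1,3):

```shell
# %t and %T print the major and minor device numbers (in hex)
# of a character or block device node
stat -c 'major:%t minor:%T %n' /dev/null
```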
SIZE
The size of the file or the current offset (in bytes). This number does not make sense for some of the resource types so you see blank values there.
NODE
Node number of the file, protocol identifier (for Internet sockets) etc.
NAME
The name of the file or resource, if applicable. This is quite self-explanatory and probably one of the most useful columns.
Try using this command for other processes in your system. Probably one of the most interesting ones to look at will be your X11 server.
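On Linux you can also inspect a process's descriptor table directly under /proc, which is essentially where lsof gets this information. A sketch, assuming a Linux /proc filesystem:

```shell
# Each entry under /proc/<pid>/fd is a symlink from a descriptor
# number to the file, socket or pipe it refers to; "self" is a
# shortcut for the current process
ls -l /proc/self/fd
```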
There are a number of filters you can use to see only the information you are interested in. For example, I want to know whether my iTunes talks to anyone over the network or not. Here is the command I would use:
[root@localhost]$ lsof -p 1559 -a -i
This produces an empty answer (exactly what I was hoping for :) ). But if I try my SSH client instead of iTunes:
[root@localhost]$ lsof -p 1586 -a -i
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
ssh 1586 nik 3u IPv4 0x40e7394 0t0 TCP macbook:50648
->oberon:ssh (ESTABLISHED)
Summary: you can get a snapshot of any live process using this command. Sometimes it is simpler to use "lsof" (especially on a system you are not familiar with) to find the location of a service's log files, config files etc.
Who is using that file?
From time to time you may become very curious about who (or at least which process :) ) is using a certain resource on your system. There may be a number of reasons: a server that does not start, complaining that its port is already in use; a suspicious network connection that you see in the "netstat" output; an inability to unmount a network disk or an external storage unit; etc.
The traditional "fuser" command is not always good enough for these purposes. For example, suppose I want to see all the processes that have any open files under the /etc directory. Here is the command that gives me the answer:
[root@localhost]$ lsof +D /etc
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
dbus-daem 2206 dbus 5r DIR 3,6 1024 59500 /etc/dbus-1/system.d
acpid 2305 root 6r REG 3,6 236 63639 /etc/acpi/
events/video.conf
avahi-dae 2541 avahi cwd DIR 3,6 1024 63468 /etc/avahi
avahi-dae 2541 avahi rtd DIR 3,6 1024 63468 /etc/avahi
prefdm 2662 root 255r REG 3,6 1465 59322 /etc/X11/prefdm
000-delay 3831 root 255r REG 3,6 577 59446 /etc/cron.
daily/000-delay.cron
When you use "+D path" option, lsof checks every single file under the specified path and shows the information about the processes using it. This command may take some time to execute if you have a lot of files there.
Another situation: a network connection. The "-i" option of lsof allows you to search for a specific network connection. The format is as follows: lsof -i [4|6][protocol][@hostname|hostaddr][:service|port]
"4" or "6" defines the IP protocol version. If you do not use any filters and just use "-i" you will get the list of all connections in your system. Lets find all the UDP connections that we have on a OS X box:
[root@localhost]$ lsof -i UDP
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
launchd 1 root 10u IPv4 0x393ed40 0t0 UDP *:netbios-ns
mDNSRespo 39 root 7u IPv4 0x393e790 0t0 UDP *:mdns
mDNSRespo 39 root 8u IPv6 0x393e6c0 0t0 UDP *:mdns
mDNSRespo 39 root 9u IPv4 0x41556c0 0t0 UDP *:mdns
mDNSRespo 39 root 11u IPv4 0x4155ee0 0t0 UDP *:mdns
mDNSRespo 39 root 12u IPv4 0x393dd00 0t0 UDP macbook:49727
netinfod 40 root 6u IPv4 0x393ee10 0t0 UDP localhost:
netinfo-local
syslogd 41 root 16u IPv4 0x393ec70 0t0 UDP *:49175
Directory 52 root 10u IPv4 0x3fbd1a0 0t0 UDP *:*
Directory 52 root 12u IPv4 0x4155d40 0t0 UDP *:*
ntpd 141 root 5u IPv4 0x393e520 0t0 UDP *:ntp
ntpd 141 root 6u IPv6 0x393e110 0t0 UDP *:ntp
ntpd 141 root 7u IPv6 0x393ead0 0t0 UDP localhost:ntp
ntpd 141 root 8u IPv6 0x393e860 0t0 UDP [fe80:1::1]:ntp
ntpd 141 root 9u IPv4 0x393e5f0 0t0 UDP localhost:ntp
ntpd 141 root 10u IPv6 0x393e450 0t0 UDP [fe80:5::216:
cbff:febf:a08d]:ntp
ntpd 141 root 11u IPv4 0x393e380 0t0 UDP macbook:ntp
prl_dhcpd 147 root 5u IPv4 0x393db60 0t0 UDP *:*
automount 168 root 8u IPv4 0x3fbd000 0t0 UDP localhost:1023
automount 179 root 8u IPv4 0x393e930 0t0 UDP localhost:1022
nmbd 1078 root 0u IPv4 0x393ed40 0t0 UDP *:netbios-ns
nmbd 1078 root 1u IPv4 0x393ed40 0t0 UDP *:netbios-ns
nmbd 1078 root 8u IPv4 0x393ea00 0t0 UDP *:netbios-dgm
nmbd 1078 root 9u IPv4 0x3fbd4e0 0t0 UDP 10.37.129.2:
netbios-ns
nmbd 1078 root 10u IPv4 0x3fbeba0 0t0 UDP 10.37.129.2:
netbios-dgm
nmbd 1078 root 11u IPv4 0x4155380 0t0 UDP macbook:
netbios-ns
nmbd 1078 root 12u IPv4 0x4155450 0t0 UDP macbook:
netbios-dgm
cupsd 1087 root 6u IPv4 0x3fbd340 0t0 UDP *:ipp
Do not be surprised if you do not see anything on your machine - you will not see the information about the processes that do not belong to you unless you are root.
Summary: using lsof with various filters allows you to search through your system and find out the detailed information about any local or network resource used.
Watch your steps
We have learned how to get a snapshot of a UNIX system (or of any individual process living in it). But sometimes static information is not enough. Dynamic runtime information may answer a lot of other questions: why does the program crash, why does it say "cannot open resource" without providing the resource location, is it accepting the network connection or not, etc.
In the first section it was mentioned that the way a UNIX program talks to the operating system (i.e. deals with almost any resource on your computer) is through "system calls". Most UNIX kernels, including Linux and OS X, provide a way for the user to see the list of system calls a process makes in real time, including the parameters the process passes to the kernel. Different systems provide this functionality through different tools: on Linux it is called "strace"; on BSD (including OS X) there are two tools working together, "ktrace" and "kdump". We will briefly cover both, starting with the Linux one.
The really great thing about these tools is that they use the kernel's tracing facility in order to work. Thus, they work with ANY program running on your system; it does not have to be something you developed, and you do not need the program's source code in order to understand (in general terms, of course) what it is doing.
It is important to mention again that, just like in the case of "lsof", you cannot see anything that you are not supposed to see, i.e. you cannot see what a process is doing if it is not yours (unless you are the root user).
The simplest way to run "strace" is: strace command. Let's see what a simple, well-known command like "w" (the command that shows the users logged in and their current activities) does:
[root@localhost]$ strace /usr/bin/w 2>&1 | less
execve("/usr/bin/w", ["/usr/bin/w"], [/* 30 vars */]) = 0
brk(0) = 0x603000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x2aaaaaaab000
uname({sys="Linux", node="oberon.anglab.com", ...}) = 0
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such
file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=103874, ...}) = 0
mmap(NULL, 103874, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2aaaaaaac000
close(3) = 0
open("/lib64/libproc-3.2.7.so", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\
240;`\314"..., 832) = 832
...
open("/proc/uptime", O_RDONLY) = 3
lseek(3, 0, SEEK_SET) = 0
read(3, "7267.86 7230.07\n", 1023) = 16
access("/var/run/utmpx", F_OK) = -1 ENOENT (No such file
or directory)
open("/var/run/utmp", O_RDWR) = -1 EACCES (Permission denied)
open("/var/run/utmp", O_RDONLY) = 4
fcntl(4, F_GETFD) = 0
...
open("/proc", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = 3
fstat(3, {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
fcntl(3, F_SETFD, FD_CLOEXEC) = 0
getdents(3, /* 35 entries */, 1024) = 1000
getdents(3, /* 38 entries */, 1024) = 1016
stat("/proc/1", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
open("/proc/1/stat", O_RDONLY) = 4
read(4, "1 (init) S 0 1 1 0 -1 4194560 64"..., 1023) = 216
close(4) = 0
socket(PF_FILE, SOCK_STREAM, 0) = 4
...
fcntl(5, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=0, len=0}) = 0
alarm(0) = 1
rt_sigaction(SIGALRM, {SIG_DFL}, NULL, 8) = 0
close(5) = 0
write(1, " 21:31:48 up 2:01, 2 users, l"..., 281 21:31:48 up 2:01,
2 users, load average: 0.00, 0.00, 0.00
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
nick pts/0 192.168.8.18 21:12 9.00s 0.03s 0.03s -bash
nick pts/1 192.168.8.18 21:19 0.00s 0.13s 0.04s strace
/usr/bin
) = 281
exit_group(0) = ?
Process 3947 detached
Only a small part of the long output is shown here. As you can see, everything starts with the execve() syscall, which is correct: this is the call that executes a program in UNIX. By the way, for each of these calls you can get more information from the manual pages - just type "man execve" to learn more about the call and its parameters (as you can see, the parameters are shown in the trace). And, as it should, the execution ends with the exit_group() call with status 0 (success).
You can see how the program loads the shared libraries it depends on as it starts. You can see that at some point it tries to open the /var/run/utmpx file, but it does not exist (ENOENT). It then tries to open /var/run/utmp read-write, but the file is not writable by this user, so the call fails (EACCES) and it falls back to opening the file read-only. Finally, this program decides that the only way to find out who is logged in is to go through all the active processes and check where their standard input and output are attached. To do that, "w" reads the list of files and directories under /proc (open - getdents - close). By the way, the number returned by the open() call is the file handle; by following it, you can track the operations on this file, since all subsequent read() and write() system calls will use the same handle. After getting the list of files, "w" goes over all the process directories and reads the virtual files there to get more information about each process. At the end, it displays the results by writing them to file descriptor 1 (standard output, stdout). Very simple and informative.
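The /proc files that "w" reads in the trace above are ordinary (virtual) files, so you can read them yourself. A quick sketch, assuming a Linux /proc filesystem:

```shell
# The same virtual file "w" opened in the trace: seconds since boot,
# followed by the accumulated idle time across CPUs
cat /proc/uptime
```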
There are a number of parameters you can use with "strace" to filter the results and get exactly the information you are looking for. For example, suppose you are interested only in the network operations; let's see what the "ping" command does (run it as the root user):
[root@localhost]$ strace -x -e trace=network,desc ping -c 1 127.0.0.1
...
socket(PF_INET, SOCK_RAW, IPPROTO_ICMP) = 3
socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 4
connect(4, {sa_family=AF_INET, sin_port=htons(1025), sin_addr=
inet_addr("127.0.0.1")}, 16) = 0
getsockname(4, {sa_family=AF_INET, sin_port=htons(1028), sin_addr=
inet_addr("127.0.0.1")}, [16]) = 0
close(4) = 0
setsockopt(3, SOL_RAW, ICMP_FILTER, ~(ICMP_ECHOREPLY|ICMP_DEST_UNREACH|
ICMP_SOURCE_QUENCH|ICMP_REDIRECT|ICMP_TIME_EXCEEDED|
ICMP_PARAMETERPROB),
4) = 0
setsockopt(3, SOL_IP, IP_RECVERR, [1], 4) = 0
setsockopt(3, SOL_SOCKET, SO_SNDBUF, [324], 4) = 0
setsockopt(3, SOL_SOCKET, SO_RCVBUF, [65536], 4) = 0
getsockopt(3, SOL_SOCKET, SO_RCVBUF, [27472947287556096], [4]) = 0
fstat(1, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
setsockopt(3, SOL_SOCKET, SO_TIMESTAMP, [1], 4) = 0
setsockopt(3, SOL_SOCKET, SO_SNDTIMEO, "\x01\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00"..., 16) = 0
setsockopt(3, SOL_SOCKET, SO_RCVTIMEO, "\x01\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00"..., 16) = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff418a7c90) = -1
EINVAL (Invalid argument)
sendmsg(3, {msg_name(16)={sa_family=AF_INET, sin_port=htons(0),
sin_addr=inet_addr("127.0.0.1")}, msg_iov(1)=[{"\x08\x00
\xc7\x8b\xd6\x10\x00\x01\x86\x16\xa0\x46\x00\x00"..., 64}],
msg_controllen=0, msg_flags=0}, 0) = 64
recvmsg(3, {msg_name(16)={sa_family=AF_INET, sin_port=htons(0),
sin_addr=inet_addr("127.0.0.1")}, msg_iov(1)=[{"\x45
\x00\x00\x54\xed\xd5\x00\x00\x40\x01\x8e\xd1\x7f\x00"..., 192}],
msg_controllen=32, {cmsg_len=32, cmsg_level=SOL_SOCKET,
cmsg_type=0x1d /* SCM_??? */, ...}, msg_flags=0}, 0) = 84
write(1, "PING 127.0.0.1 (127.0.0.1) 56(84"..., 106PING
127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.321 ms
) = 106
write(1, "\n", 1
) = 1
write(1, "--- 127.0.0.1 ping statistics --"..., 144---
127.0.0.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.321/0.321/0.321/0.000 ms
) = 144
Process 4310 detached
"-e" option allows to specify what kind(s) of system calls you are interested in.
As you can see, this program opens two network sockets (with descriptors 3 and 4). One of them is used only for the getsockname() call (do "man getsockname" for more information on this call) and then gets closed. The one associated with descriptor 3 is used to send and receive raw IP traffic (this is the way to send packets using protocols other than UDP and TCP). The program configures the socket with a number of options and then finally sends an IP packet (sendmsg) to the given destination. You can even see part of the message data (the "-x" option requests that strace display hex numbers instead of non-ASCII strings). If you want to see the complete data, use the "-s" option to set the maximum string length to display. After sending the message, it gets back the response (recvmsg). Then it displays the output of the program.
You can use the "-o" option with a file name argument to redirect the output to a text file. When analyzing more complicated examples, it is usually easier to get the trace into a file first and then study that file with a text editor or the "grep" command.
Mac OS X (like all BSD systems) offers the "ktrace" command. To get a similar result when analyzing the "ping" command, we would do something like this:
[root@localhost]$ ktrace -f ping.trace -t ci ping -c 1 127.0.0.1
[root@localhost]$ kdump -f ping.trace | less
...
684 ping CALL socket(0x2,0x3,0x1)
684 ping RET socket 3
684 ping CALL getuid
684 ping RET getuid 0
684 ping CALL setuid(0)
684 ping RET setuid 0
684 ping CALL getuid
684 ping RET getuid 0
684 ping CALL getpid
684 ping RET getpid 684/0x2ac
684 ping CALL setsockopt(0x3,0xffff,0x400,0xbffffdc8,0x4)
684 ping RET setsockopt 0
684 ping CALL setsockopt(0x3,0xffff,0x1002,0xbffffdb8,0x4)
684 ping RET setsockopt 0
684 ping CALL setsockopt(0x3,0xffff,0x1001,0xbffffdb8,0x4)
684 ping RET setsockopt 0
684 ping CALL fstat(0x1,0xbffef0c0)
684 ping RET fstat 0
684 ping CALL ioctl(0x1,FIODTYPE,0xbffef08c)
684 ping RET ioctl 0
...
684 ping CALL sendto(0x3,0x5294,0x40,0,0x5260,0x10)
684 ping GIO fd 3 wrote 64 bytes
"\b\08\^R\M-,\^B\0\0\M-T\^X\240F\M-'\M^H\r\0\b
\v\f\r\^N\^O\^P\^Q\^R\^S\^T\^U\^V\^W\^X\^Y\^Z
\^[\^\\^]\^^\^_ !"#$%&'()*+,-./01234567"
684 ping RET sendto 64/0x40
684 ping CALL select(0x4,0xbffffc3c,0,0,0xbffffd90)
684 ping RET select 1
684 ping CALL recvmsg(0x3,0xbffffd24,0)
684 ping GIO fd 3 wrote 84 bytes
"E\0@\0|\M^N\0\0@\^A\0\0\^?\0\0\^A\^?\0\0
\^A\0\0@\^R\M-,\^B\0\0\M-T\^X\240F\M-'\M^H\r\0\b
\v\f\r\^N\^O\^P\^Q\^R\^S\^T\^U\^V\^W\^X\^Y\^Z
\^[\^\\^]\^^\^_ !"#$%&'()*+,-./01234567"
684 ping RET recvmsg 84/0x54
684 ping CALL write(0x1,0x18000,0x39)
684 ping GIO fd 1 wrote 57 bytes
"64 bytes from 127.0.0.1: icmp_seq=0 ttl=64 time=0.069 ms
"
684 ping RET write 57/0x39
684 ping CALL sigaction(0x2,0xbffef8c0,0xbffef8f8)
684 ping RET sigaction 0
684 ping CALL sigaction(0xe,0xbffef8c0,0xbffef8f8)
684 ping RET sigaction 0
684 ping CALL write(0x1,0x18000,0x1)
684 ping GIO fd 1 wrote 1 byte
"
"
684 ping RET write 1
684 ping CALL write(0x1,0x18000,0x22)
684 ping GIO fd 1 wrote 34 bytes
"--- 127.0.0.1 ping statistics ---
...
"ktrace" writes the tracing information into a file ("ktrace.out" if the "-f" option is not specified) and the "kdump" tool is used to decode the file into more human-readable format. As you can see the behavior of the ping program on OS X is similar to what we saw on Linux. ktrace/kdump also have some basic filtering capabilities.
An important thing to know about strace is that by default it does not follow child processes. That is, if you trace a program that spawns other processes, you will not see the operations performed by the children; you will see only the fork() calls. To "follow" the execution of all child processes, you need to use the "-f" option.
Tapping the wire
Another nice feature offered by strace (and ktrace on BSD systems) is the ability to attach to any process in the system for a limited time and capture its activities. This can help a lot when troubleshooting network server processes. To "attach" to a process, use strace's "-p process_ID" option. When you are done capturing, simply terminate strace by pressing Ctrl+C. Let's see what happens when Sendmail accepts an incoming connection (in this example, 233 is the process ID of the sendmail process).
[root@localhost]$ strace -f -p 233
And on the second console do:
[root@localhost]$ telnet localhost 25
Trying 127.0.0.1...
Connected to oberon.
Escape character is '^]'.
220 oberon ESMTP Sendmail 8.11.6p2/8.11.6-20030304; Sat, 21 Jul 2007
06:00:02 +0400
quit
221 2.0.0 oberon closing connection
Connection closed by foreign host.
And then press Ctrl+C in the first console. Here is what you will see there (use the "-o" option when running strace if you want to write the output to a file):
233 select(5, [4], NULL, NULL, {5, 0}) = 1 (in [4], left {1, 750000})
233 accept(4, {sin_family=AF_INET, sin_port=htons(1226), sin_addr=
inet_addr("127.0.0.1")}, [16]) = 5
233 time([1184983202]) = 1184983202
233 sigprocmask(SIG_BLOCK, [ALRM], []) = 0
233 pipe([6, 7]) = 0
233 sigprocmask(SIG_BLOCK, [CHLD], [ALRM]) = 0
233 fork() = 20953
20953 sigprocmask(SIG_UNBLOCK, [ALRM],
233 getpid(
20953 <... sigprocmask resumed> [ALRM CHLD]) = 0
233 <... getpid resumed> ) = 233
20953 sigprocmask(SIG_UNBLOCK, [CHLD],
233 sigprocmask(SIG_UNBLOCK, [CHLD],
20953 <... sigprocmask resumed> [CHLD]) = 0
233 <... sigprocmask resumed> [ALRM CHLD]) = 0
20953 sigaction(SIGCHLD, {SIG_DFL},
233 close(6
...
20953 open("/etc/hosts", O_RDONLY) = 6
20953 fstat(6, {st_mode=S_IFREG|0644, st_size=963, ...}) = 0
20953 mmap(0, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|0x20, 4294967295, 0)
= 0x11a000
20953 read(6, "#\n# hosts\t\tThis file describe"..., 4096) = 963
20953 close(6) = 0
...
20953 write(3, "<22>Jul 21 06:00:03 sendmail[209"..., 143) = -1 EPIPE
(Broken pipe)
20953 --- SIGPIPE (Broken pipe) ---
20953 close(3) = 0
20953 close(-1) = -1 EBADF (Bad file number)
20953 sigaction(SIGPIPE, {SIG_IGN}, NULL) = 0
20953 sigprocmask(SIG_UNBLOCK, [ALRM], []) = 0
20953 sigprocmask(SIG_BLOCK, NULL, []) = 0
20953 time([1184983203]) = 1184983203
20953 unlink("./dfl6L202l20953") = -1 ENOENT (No such
file or directory)
20953 unlink("./qfl6L202l20953") = -1 ENOENT (No such
file or directory)
20953 unlink("./xfl6L202l20953") = -1 ENOENT (No such
file or directory)
20953 getpid() = 20953
20953 close(7) = 0
20953 setuid(0) = 0
20953 _exit(71) = ?
233 <... select resumed> ) = ? ERESTARTNOHAND (To be restarted)
233 --- SIGCHLD (Child exited) ---
233 wait4(-1, [WIFEXITED(s) && WEXITSTATUS(s) == 71], WNOHANG, NULL)
= 20953
233 wait4(-1, 0xbfffdf60, WNOHANG, NULL) = 0
233 sigprocmask(SIG_BLOCK, [ALRM], []) = 0
233 time([1184983203]) = 1184983203
233 open("/proc/loadavg", O_RDONLY) = 5
233 fstat(5, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
233 mmap(0, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|0x20, 4294967295, 0)
= 0x118000
233 read(5, "0.16 0.09 0.29 3/53 21316\n", 1024) = 26
233 close(5) = 0
233 munmap(0x118000, 4096) = 0
233 sigprocmask(SIG_UNBLOCK, [ALRM], [ALRM]) = 0
233 getpid() = 233
233 select(5, [4], NULL, NULL, {5, 0}
As you can see, sendmail is initially sitting there listening for incoming connections on file descriptor 4 (you can use "lsof" to see that it is associated with a TCP socket) - see the second parameter of the select() call. Then the incoming connection comes in and sendmail spawns another process to handle it (fork() is not followed by an execve() call because the child process is also sendmail). The child process (PID=20953) then handles the SMTP dialog and finally exits.
As you can see, it is very simple. Just a couple of important points to add: do not forget to use the "-f" flag when troubleshooting server processes, because most of them spawn child processes to handle connections. You can also specify the "-p" option multiple times with different PIDs to monitor many processes at the same time.
I hope this tutorial was useful. The process tracing mechanisms available in UNIX systems give most people confidence that they actually control their systems and that there is nothing "mysterious" running on their computers.
refer:http://myhowto.org/solving-problems/7-exploring-system-internals-with-lsof-and-strace/
-----------------------------
For tracing the system calls of a program, we have a very good tool in strace. What is unique about strace is that, when run against a program, it outputs all the calls the program makes to the kernel. In many cases a program may fail because it is unable to open a file or because of insufficient memory, and tracing the program's system calls will clearly show the cause of either problem.
The use of strace is quite simple and takes the following form:
$ strace <program>
For example, I can run a trace on 'ls' as follows :
$ strace ls
This will output a great amount of data onto the screen. If it is hard to keep track of the scrolling mass of data, there is an option to write the output of strace to a file instead, using the -o option. For example,
$ strace -o strace_ls_output.txt ls
... will write all the tracing output of 'ls' to the 'strace_ls_output.txt' file. Now all it requires is to open the file in a text editor and analyze the output to get the necessary clues.
It is common to find a lot of system calls in the strace output. The most common of them are open(), read(), write(), and close(), but the calls are not limited to these four; you will find many others too.
For example, if you look in the strace output of ls, you will find the following line:
open("/lib/libselinux.so.1", O_RDONLY) = 3
This means that some aspect of ls requires the library module libselinux.so.1 to be present in the /lib folder. If the library were missing or in a different path, the aspect of ls that depends on it would fail to function. The return value of 3 (a valid file descriptor) signifies that opening the library libselinux.so.1 succeeded.
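A quick way to pick failed open() calls out of a large trace is to grep the log for calls that returned -1. A minimal sketch, using hand-written sample lines in the style of strace output (not a real trace):

```shell
# Two sample lines in the style of strace output (illustrative only)
cat > /tmp/strace_sample.txt <<'EOF'
open("/lib/libselinux.so.1", O_RDONLY) = 3
open("/lib/tls/i686/cmov/libdvdcss.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
EOF

# Keep only the open() calls that failed (returned -1)
grep 'open(.* = -1' /tmp/strace_sample.txt
```

On a real trace file produced with `strace -o`, the same grep quickly narrows hundreds of lines down to the failures worth investigating.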
Here I will share my experience of using strace to solve a particular problem I faced. I had installed all the multimedia codecs, including libdvdcss, which allowed me to play encrypted DVDs in Ubuntu Linux, which I use on a daily basis. But after installing all the necessary codecs, when I tried playing a DVD movie, totem gave me an error saying that it was unable to play the movie (see the picture below). Since I knew that I had already installed libdvdcss on my machine, I was at a loss as to what to do.
Fig: Totem showing error saying that it cannot find libdvdcss
Then I ran strace on totem as follows :
$ strace -o strace.totem totem
... and then opened the file strace.totem in a text editor and searched for the string "libdvdcss". Not surprisingly, I came across the lines of output shown in the listing below.
# Output of strace on totem
open("/etc/ld.so.cache", O_RDONLY) = 26
fstat64(26, {st_mode=S_IFREG|0644, st_size=58317, ...}) = 0
old_mmap(NULL, 58317, PROT_READ, MAP_PRIVATE, 26, 0) = 0xb645e000
close(26)
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
...
open("/lib/tls/i686/cmov/libdvdcss.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
stat64("/lib/tls/i686/cmov", {st_mode=S_IFDIR|0755, st_size=1560, ...}) = 0
...
stat64("/lib/i486-linux-gnu", 0xbfab4770) = -1 ENOENT (No such file or directory)
munmap(0xb645e000, 58317) = 0
open("/usr/lib/xine/plugins/1.1.1/xineplug_inp_mms.so", O_RDONLY) = 26
read(26, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\320\27"..., 512) = 512
fstat64(26, {st_mode=S_IFREG|0644, st_size=40412, ...}) = 0
In the above listing, which I have truncated for clarity, the failed open() on libdvdcss.so.2 clearly shows that totem is trying to find the library in, among other places, the '/lib/tls/i686/cmov/' directory, and the return value of -1 shows that it has failed to find it. So I realized that for totem to correctly play the encrypted DVD, it has to find the libdvdcss.so.2 file in one of the paths it searches.
Then I used the find command to locate the library and copied it to the directory /lib/tls/i686/cmov/. Once I had done this, I tried playing the DVD again in totem and it played without a hitch.
Fig: Totem playing an encrypted DVD Movie
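The find-then-copy step can be sketched as follows. Here temporary directories stand in for the real /usr/lib source and /lib/tls/i686/cmov destination, so the commands can be tried safely; on the real system you would use the actual paths and need root for the copy:

```shell
# Stand-in directories for the real library source and destination paths
src=$(mktemp -d)
dst=$(mktemp -d)
touch "$src/libdvdcss.so.2"

# Locate the library, then copy it to where the player looks for it
lib=$(find "$src" -name 'libdvdcss.so.2')
cp "$lib" "$dst/"
ls "$dst"
```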
Just to make sure, I took another trace of totem, and it showed that the error was rectified, as the successful open() line in the output below shows.
# Output of the second strace on totem
open("/etc/ld.so.cache", O_RDONLY) = 26
fstat64(26, {st_mode=S_IFREG|0644, st_size=58317, ...}) = 0
old_mmap(NULL, 58317, PROT_READ, MAP_PRIVATE, 26, 0) = 0xb644d000
close(26) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
...
open("/lib/tls/i686/cmov/libdvdcss.so.2", O_RDONLY) = 26
...
stat64("/lib/tls/i686/sse2", 0xbffa4020) = -1 ENOENT (No such file or directory)
munmap(0xb645e000, 58317) = 0
open("/usr/lib/xine/plugins/1.1.1/xineplug_inp_mms.so", O_RDONLY) = 26
read(26, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\360\20"..., 512) = 512
fstat64(26, {st_mode=S_IFREG|0644, st_size=28736, ...}) = 0
Opening the man page of strace, one will find scores of options. For example, if you use the -t option, strace will prefix each line of the trace with the time of day. One can even specify which system calls to trace using the -e option. For example, to trace only the open() and close() system calls, use the command as follows:
$ strace -o strace.totem -e trace=open,close totem
The ubiquitous strace should not be confused with DTrace, which ships with Sun Solaris. strace is a single tool that takes care of one small part: tracing a single program. Sun's DTrace toolkit is much more powerful and consists of a collection of scripts which can track, tune, and aid the user in troubleshooting a system in real time. Moreover, DTrace is a scripting language with a close resemblance to C/C++ and awk. Put another way, the strace tool in GNU/Linux provides only one of the many functions provided by DTrace in Sun Solaris. That being said, strace plays an important part in helping the user troubleshoot programs by providing a view of the system calls that a program makes to the Linux kernel.
refer:
http://linuxhelp.blogspot.com/2006/05/strace-very-powerful-troubleshooting.html
Monday, December 28, 2009
cpanel tutorial: how to increase mail send limit for domain
You can change the maximum number of emails allowed per hour for a particular domain to a number different from the system default using the file /var/cpanel/maxemails.
Just add an entry like 'domain.com = 100'. Now 100 is the maximum emails-per-hour limit for domain.com.
Make sure that you run the following script after updating /var/cpanel/maxemails:
/scripts/build_maxemails_config
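The whole change can be scripted in a couple of lines. This sketch writes to a temporary file instead of the real /var/cpanel/maxemails so it can be run anywhere; on an actual cPanel server you would edit the real file and then run the rebuild script:

```shell
# Stand-in for /var/cpanel/maxemails (the real path exists only on a cPanel server)
conf=$(mktemp)

# Cap domain.com at 100 outgoing emails per hour
echo 'domain.com = 100' >> "$conf"
cat "$conf"

# On a real server, rebuild the limits afterwards:
# /scripts/build_maxemails_config
```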
refer:
http://blog.webhostinghelps.net/?cat=4
http://blog.webhostinghelps.net/?cat=26
http://blog.webhostinghelps.net/?cat=6
http://blog.webhostinghelps.net/?cat=86
http://blog.webhostinghelps.net/?cat=1
kernel boot time parameters
The Linux kernel accepts boot time parameters as it starts to boot the system. These are used to inform the kernel about various hardware parameters. You need boot time parameters to:
* Troubleshoot the system
* Supply hardware parameters that the kernel would not be able to determine on its own
* Force the kernel to override the default hardware parameters in order to increase performance
* Perform password and other recovery operations
The kernel command line syntax
name=value1,value2,value3…
Where,
* name : Keyword name, for example, init, ro, boot etc
Ten common Boot time parameters
init
This sets the initial command to be executed by the kernel. The default is /sbin/init, which is the parent of all processes.
To boot the system straight to a shell without a password, pass /bin/bash or /bin/sh as the argument to init:
init=/bin/bash
single
The most common argument passed to the init process is the word 'single', which instructs init to boot the computer in single user mode and not launch all the usual daemons.
root=/dev/device
This argument tells the kernel what device (hard disk, floppy disk) to use as the root filesystem while booting. For example, the following boot parameter uses /dev/sda1 as the root file system:
root=/dev/sda1
If you copy the entire partition from /dev/sda1 to /dev/sdb1, then use:
root=/dev/sdb1
ro
This argument tells the kernel to mount the root file system read-only. This is done so that the fsck program can check and repair the file system. Note that you should never run fsck on a file system mounted read/write.
rw
This argument tells the kernel to mount the root file system in read-write mode.
panic=SECONDS
Specify kernel behavior on panic. By default, the kernel will not reboot after a panic, but this option will cause a kernel reboot after N seconds. For example, the following boot parameter will force Linux to reboot 10 seconds after a panic:
panic=10
maxcpus=NUMBER
Specify the maximum number of processors that an SMP kernel should use. For example, if you have four CPUs and would like to use only two, pass 2 to maxcpus (useful for testing different software performance and configurations):
maxcpus=2
debug
Enable kernel debugging. This option is useful for kernel hackers and developers who wish to troubleshoot problems.
selinux=[0|1]
Disable or enable SELinux at boot time.
* Value 0 : Disable selinux
* Value 1 : Enable selinux
raid=/dev/mdN
This argument tells the kernel how to assemble RAID arrays at boot time. Note that when md is compiled into the kernel (not as a module), partitions of type 0xfd are scanned and automatically assembled into RAID arrays. This autodetection may be suppressed with the kernel parameter "raid=noautodetect". As of kernel 2.6.9, only drives with a type 0 superblock can be autodetected and run at boot time.
mem=MEMORY_SIZE
This is a classic parameter. It forces a specific amount of memory to be used when the kernel is not able to see the whole system memory, or for testing. For example:
mem=1024M
The kernel command line is a null-terminated string currently up to 255 characters long, plus the final null. A string that is too long will be automatically truncated by the kernel; a boot loader may allow a longer command line to be passed, to permit future kernels to extend this limit (H. Peter Anvin).
Other parameters
initrd /boot/initrd.img
Tells the boot loader that an initrd should be loaded: the boot process loads the kernel and an initial ramdisk; the kernel converts the initrd into a "normal" ramdisk, which is mounted read-write as the root device; /linuxrc is executed; afterwards the "real" root file system is mounted and the initrd file system is moved over to /initrd; finally the usual boot sequence (e.g. invocation of /sbin/init) is performed. initrd is used to provide/load additional modules (device drivers); for example, a SCSI or RAID device driver can be loaded using initrd.
hdX=noprobe
Do not probe for the hdX drive. For example, to disable the hdb hard disk:
hdb=noprobe
Even if you disable hdb in the BIOS, Linux will still detect it; this kernel parameter is the only way to disable hdb.
ether=irq,iobase,[ARG1,ARG2],name
Where,
* ether: ETHERNET DEVICES
For example, the following boot argument forces probing for a second Ethernet card (NIC), as the default is to probe only for one (irq=0,iobase=0 means detect them automatically):
ether=0,0,eth1
How do I enter these parameters?
You enter these parameters at the GRUB or LILO boot prompt. For example, if you are using GRUB as the boot loader, at the GRUB prompt press 'e' to edit a command before booting:
1) Select the second line
2) Again, press 'e' to edit the selected command
3) Type any of the above parameters
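Putting several of the parameters above together, an edited GRUB kernel line might look like the following (the kernel image name and root device are illustrative; use your own):

```
kernel /boot/vmlinuz-2.6.18 ro root=/dev/sda1 single panic=10 maxcpus=2
```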
refer:
http://www.cyberciti.biz/tips/linux-limiting-or-restricting-smp-cpu-activation-in-smp-mode.html
http://www.cyberciti.biz/tips/10-boot-time-parameters-you-should-know-about-the-linux-kernel.html
compression in smp+auditing
PBZIP2 is a parallel implementation of the bzip2 block-sorting file compressor that uses pthreads and achieves near-linear speedup on SMP machines. The output of this version is fully compatible with bzip2 v1.0.2 or newer (ie: anything compressed with pbzip2 can be decompressed with bzip2). PBZIP2 should work on any system that has a pthreads compatible C++ compiler (such as gcc). It has been tested on: Linux, Windows (cygwin & MinGW), Solaris, Tru64/OSF1, HP-UX, OS/2, and Irix.
NOTE: If you are looking for a parallel BZIP2 that works on cluster machines, you should check out MPIBZIP2 which was designed for a distributed-memory message-passing architecture.
The pbzip2 program is a parallel version of bzip2 for use on shared memory machines. It provides near-linear speedup when used on true multi-processor machines and 5-10% speedup on Hyperthreaded machines. The output is fully compatible with the regular bzip2 data so any files created with pbzip2 can be uncompressed by bzip2 and vice-versa.
The default settings for pbzip2 will work well in most cases. The only switches you will likely need are -d to decompress files and -p to set the number of processors for pbzip2 to use, if autodetection is not supported on your system or you want a specific number of CPUs.
Example 1: pbzip2 -v myfile.tar
This example will compress the file "myfile.tar" into the compressed file "myfile.tar.bz2". It will use the autodetected # of processors (or 2 processors if autodetect not supported) with the default file block size of 900k and default BWT block size of 900k.
The program would report something like:
===================================================================
Parallel BZIP2 v1.0.5 - by: Jeff Gilchrist [http://compression.ca]
[Jan. 08, 2009] (uses libbzip2 by Julian Seward)
# CPUs: 2
BWT Block Size: 900k
File Block Size: 900k
-------------------------------------------
File #: 1 of 1
Input Name: myfile.tar
Output Name: myfile.tar.bz2
Input Size: 7428687 bytes
Compressing data...
Output Size: 3236549 bytes
-------------------------------------------
Wall Clock: 2.809000 seconds
===================================================================
Example 2: pbzip2 -b15vk myfile.tar
This example will compress the file "myfile.tar" into the compressed file "myfile.tar.bz2". It will use the autodetected # of processors (or 2 processors if autodetect not supported) with a file block size of 1500k and a BWT block size of 900k. The file "myfile.tar" will not be deleted after compression is finished.
The program would report something like:
===================================================================
Parallel BZIP2 v1.0.5 - by: Jeff Gilchrist [http://compression.ca]
[Jan. 08, 2009] (uses libbzip2 by Julian Seward)
# CPUs: 2
BWT Block Size: 900k
File Block Size: 1500k
-------------------------------------------
File #: 1 of 1
Input Name: myfile.tar
Output Name: myfile.tar.bz2
Input Size: 7428687 bytes
Compressing data...
Output Size: 3236394 bytes
-------------------------------------------
Wall Clock: 3.059000 seconds
===================================================================
Example 3: pbzip2 -p4 -r -5 -v myfile.tar second*.txt
This example will compress the file "myfile.tar" into the compressed file "myfile.tar.bz2". It will use 4 processors with a BWT block size of 500k. The file block size will be the size of "myfile.tar" divided by 4 (# of processors) so that the data will be split evenly among each processor. This requires you have enough RAM for pbzip2 to read the entire file into memory for compression. Pbzip2 will then use the same options to compress all other files that match the wildcard "second*.txt" in that directory.
The program would report something like:
===================================================================
Parallel BZIP2 v1.0.5 - by: Jeff Gilchrist [http://compression.ca]
[Jan. 08, 2009] (uses libbzip2 by Julian Seward)
# CPUs: 4
BWT Block Size: 500k
File Block Size: 1857k
-------------------------------------------
File #: 1 of 3
Input Name: myfile.tar
Output Name: myfile.tar.bz2
Input Size: 7428687 bytes
Compressing data...
Output Size: 3237105 bytes
-------------------------------------------
File #: 2 of 3
Input Name: secondfile.txt
Output Name: secondfile.txt.bz2
Input Size: 5897 bytes
Compressing data...
Output Size: 3192 bytes
-------------------------------------------
File #: 3 of 3
Input Name: secondbreakfast.txt
Output Name: secondbreakfast.txt.bz2
Input Size: 83531 bytes
Compressing data...
Output Size: 11832 bytes
-------------------------------------------
Wall Clock: 5.127381 seconds
===================================================================
Example 4: tar cf myfile.tar.bz2 --use-compress-prog=pbzip2 dir_to_compress/
or, equivalently: tar -c directory_to_compress/ | pbzip2 -vc > myfile.tar.bz2
This example will compress the data being given to pbzip2 via pipe from TAR into the compressed file "myfile.tar.bz2". It will use the autodetected # of processors (or 2 processors if autodetect not supported) with the default file block size of 900k and default BWT block size of 900k. TAR is collecting all of the files from the "directory_to_compress/" directory and passing the data to pbzip2 as it works.
The program would report something like:
===================================================================
Parallel BZIP2 v1.0.5 - by: Jeff Gilchrist [http://compression.ca]
[Jan. 08, 2009] (uses libbzip2 by Julian Seward)
# CPUs: 2
BWT Block Size: 900k
File Block Size: 900k
-------------------------------------------
File #: 1 of 1
Input Name:
Output Name:
Compressing data...
-------------------------------------------
Wall Clock: 0.176441 seconds
===================================================================
Example 5: pbzip2 -dv myfile.tar.bz2
This example will decompress the file "myfile.tar.bz2" into the decompressed file "myfile.tar". It will use the autodetected # of processors (or 2 processors if autodetect not supported). The switches -b, -r, and -1..-9 are not valid for decompression.
The program would report something like:
===================================================================
Parallel BZIP2 v1.0.5 - by: Jeff Gilchrist [http://compression.ca]
[Jan. 08, 2009] (uses libbzip2 by Julian Seward)
# CPUs: 2
-------------------------------------------
File #: 1 of 1
Input Name: myfile.tar.bz2
Output Name: myfile.tar
BWT Block Size: 900k
Input Size: 3236549 bytes
Decompressing data...
Output Size: 7428687 bytes
-------------------------------------------
Wall Clock: 1.154000 seconds
refer:
http://compression.ca/pbzip2/
----------------------------------------
Linux Setting processor affinity for a certain task or process
by nixcraft · 25 comments
When you are using SMP (Symmetric MultiProcessing) you might want to override the kernel's process scheduling and bind a certain process to specific CPU(s).
But what is CPU affinity?
CPU affinity is a scheduler property that "binds" a process to a given set of CPUs on an SMP system. The Linux scheduler will honor the given CPU affinity and the process will not run on any other CPUs. Note that the Linux scheduler also supports natural CPU affinity:
the scheduler attempts to keep processes on the same CPU as long as practical, for performance reasons. Therefore, forcing a specific CPU affinity is useful only in certain applications. For example, applications such as Oracle (ERP apps) are licensed per number of CPUs per instance, so you can bind Oracle to specific CPUs to avoid licensing problems. This is really useful on large servers with 4 or 8 CPUs.
Setting processor affinity for a certain task or process using taskset command
taskset is used to set or retrieve the CPU affinity of a running process given its PID, or to launch a new COMMAND with a given CPU affinity. However, taskset may not be installed by default; you need to install the schedutils (Linux scheduler utilities) package.
Install schedutils
Debian Linux:
# apt-get install schedutils
Red Hat Enterprise Linux:
# up2date schedutils
OR
# rpm -ivh schedutils*
Under latest version of Debian / Ubuntu Linux taskset is installed by default using util-linux package.
The CPU affinity is represented as a bitmask, with the lowest order bit corresponding to the first logical CPU and the highest order bit corresponding to the last logical CPU. For example:
* 0x00000001 is processor #0 (1st processor)
* 0x00000003 is processors #0 and #1
* 0x00000004 is processor #2 (3rd processor)
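The mask is simply an OR of one bit per CPU (bit n for CPU #n). A quick shell sketch of how the examples above are derived:

```shell
# CPU #0 alone: bit 0
printf '0x%08x\n' $(( 1 << 0 ))                 # 0x00000001
# CPUs #0 and #1: bits 0 and 1
printf '0x%08x\n' $(( (1 << 0) | (1 << 1) ))    # 0x00000003
# CPU #2 alone: bit 2
printf '0x%08x\n' $(( 1 << 2 ))                 # 0x00000004
```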
To set the processor affinity of process 13545 to processor #0 (1st processor), type the following command:
# taskset -p 0x00000001 13545
If you find a bitmask hard to use, you can specify a numerical list of processors instead of a bitmask using the -c flag:
# taskset -pc 1 13545
# taskset -pc 3,4 13545
Where,
* -p : Operate on an existing PID and not launch a new task (default is to launch a new task)
---------------------------
Linux audit files to see who made changes to a file
by Vivek Gite · 24 comments
This is one of the key questions many new sysadmins ask:
How do I audit file events such as read / write etc? How can I use audit to see who changed a file in Linux?
The answer is to use the 2.6 kernel's audit system. Modern Linux kernels (2.6.x) come with the auditd daemon, which is responsible for writing audit records to disk. During startup, the rules in /etc/audit.rules are read by this daemon. You can open the /etc/audit.rules file and make changes, such as setting the audit log location and other options. The default file is good enough to get started with auditd.
In order to use the audit facility you need the following utilities:
=> auditctl - a command to assist in controlling the kernel's audit system. You can get status, and add or delete rules in the kernel audit system. Setting a watch on a file is accomplished using this command.
=> ausearch - a command that can query the audit daemon logs for events based on different search criteria.
=> aureport - a tool that produces summary reports of the audit system logs.
Note that all of the following instructions were tested on CentOS 4.x, Fedora Core, and RHEL 4/5 Linux.
Task: install audit package
The audit package contains the user space utilities for storing and searching the audit records generated by the audit subsystem in the Linux 2.6 kernel. CentOS/Red Hat and Fedora Core include the audit RPM package. Use the yum or up2date command to install it:
# yum install audit
or
# up2date install audit
Auto start auditd service on boot
# ntsysv
OR
# chkconfig auditd on
Now start service:
# /etc/init.d/auditd start
How do I set a watch on a file for auditing?
Let us say you would like to audit the /etc/passwd file. You need to type a command as follows:
# auditctl -w /etc/passwd -p war -k password-file
Where,
* -w /etc/passwd : Insert a watch for the file system object at given path i.e. watch file called /etc/passwd
* -p war : Set the permissions filter for the file system watch. It can be r for read, w for write, x for execute, or a for append; here w, a, and r.
* -k password-file : Set a filter key on a /etc/passwd file (watch). The password-file is a filterkey (string of text that can be up to 31 bytes long). It can uniquely identify the audit records produced by the watch. You need to use password-file string or phrase while searching audit logs.
In short, you are monitoring (watching) the /etc/passwd file for anyone (including via syscalls) who may perform a write, append, or read operation on the file.
Wait for some time or as a normal user run command as follows:
$ grep 'something' /etc/passwd
$ vi /etc/passwd
Following are more examples:
File System audit rules
Add a watch on "/etc/shadow" with the arbitrary filterkey "shadow-file" that generates records for "reads, writes, executes, and appends" on "shadow"
# auditctl -w /etc/shadow -k shadow-file -p rwxa
syscall audit rule
The next rule suppresses auditing for mount syscall exits
# auditctl -a exit,never -S mount
File system audit rule
Add a watch "tmp" with a NULL filterkey that generates records "executes" on "/tmp" (good for a webserver)
# auditctl -w /tmp -p e -k webserver-watch-tmp
syscall audit rule using pid
To see all syscalls made by a program called sshd (pid - 1005):
# auditctl -a entry,always -S all -F pid=1005
How do I find out who changed or accessed a file /etc/passwd?
Use ausearch command as follows:
# ausearch -f /etc/passwd
OR
# ausearch -f /etc/passwd | less
OR
# ausearch -f /etc/passwd -i | less
Where,
* -f /etc/passwd : Only search for this file
* -i : Interpret numeric entities as text. For example, a uid is converted to an account name.
Output:
----
type=PATH msg=audit(03/16/2007 14:52:59.985:55) : name=/etc/passwd flags=follow,open inode=23087346 dev=08:02 mode=file,644 ouid=root ogid=root rdev=00:00
type=CWD msg=audit(03/16/2007 14:52:59.985:55) : cwd=/webroot/home/lighttpd
type=FS_INODE msg=audit(03/16/2007 14:52:59.985:55) : inode=23087346 inode_uid=root inode_gid=root inode_dev=08:02 inode_rdev=00:00
type=FS_WATCH msg=audit(03/16/2007 14:52:59.985:55) : watch_inode=23087346 watch=passwd filterkey=password-file perm=read,write,append perm_mask=read
type=SYSCALL msg=audit(03/16/2007 14:52:59.985:55) : arch=x86_64 syscall=open success=yes exit=3 a0=7fbffffcb4 a1=0 a2=2 a3=6171d0 items=1 pid=12551 auid=unknown(4294967295) uid=lighttpd gid=lighttpd euid=lighttpd suid=lighttpd fsuid=lighttpd egid=lighttpd sgid=lighttpd fsgid=lighttpd comm=grep exe=/bin/grep
Let us try to understand output
* audit(03/16/2007 14:52:59.985:55) : Audit log time
* uid=lighttpd gid=lighttpd : User ids, in numerical format by default. By passing the -i option to the command you can convert most numeric data to a human readable format. In our example, user lighttpd used the grep command to open the file.
* exe="/bin/grep" : Command grep used to access /etc/passwd file
* perm_mask=read : File was open for read operation
So from log files you can clearly see who read file using grep or made changes to a file using vi/vim text editor. Log provides tons of other information. You need to read man pages and documentation to understand raw log format.
Other useful examples
Search for events with date and time stamps. if the date is omitted, today is assumed. If the time is omitted, now is assumed. Use 24 hour clock time rather than AM or PM to specify time. An example date is 10/24/05. An example of time is 18:00:00.
# ausearch -ts today -k password-file
# ausearch -ts 3/12/07 -k password-file
Search for an event matching the given executable name using -x option. For example find out who has accessed /etc/passwd using rm command:
# ausearch -ts today -k password-file -x rm
# ausearch -ts 3/12/07 -k password-file -x rm
Search for an event with the given user name (UID). For example find out if user vivek (uid 506) try to open /etc/passwd:
# ausearch -ts today -k password-file -x rm -ui 506
# ausearch -k password-file -ui 506
refer:
http://www.cyberciti.biz/tips/linux-audit-files-to-see-who-made-changes-to-a-file.html
NOTE: If you are looking for a parallel BZIP2 that works on cluster machines, you should check out MPIBZIP2 which was designed for a distributed-memory message-passing architecture.
The pbzip2 program is a parallel version of bzip2 for use on shared memory machines. It provides near-linear speedup when used on true multi-processor machines and 5-10% speedup on Hyperthreaded machines. The output is fully compatible with the regular bzip2 data so any files created with pbzip2 can be uncompressed by bzip2 and vice-versa.
The default settings for pbzip2 will work well in most cases. The only switch you will likely need to use is -d to decompress files and -p to set the # of processors for pbzip2 to use if autodetect is not supported on your system, or you want to use a specific # of CPUs.
Example 1: pbzip2 -v myfile.tar
This example will compress the file "myfile.tar" into the compressed file "myfile.tar.bz2". It will use the autodetected # of processors (or 2 processors if autodetect not supported) with the default file block size of 900k and default BWT block size of 900k.
The program would report something like:
===================================================================
Parallel BZIP2 v1.0.5 - by: Jeff Gilchrist [http://compression.ca]
[Jan. 08, 2009] (uses libbzip2 by Julian Seward)
# CPUs: 2
BWT Block Size: 900k
File Block Size: 900k
-------------------------------------------
File #: 1 of 1
Input Name: myfile.tar
Output Name: myfile.tar.bz2
Input Size: 7428687 bytes
Compressing data...
Output Size: 3236549 bytes
-------------------------------------------
Wall Clock: 2.809000 seconds
===================================================================
Example 2: pbzip2 -b15vk myfile.tar
This example will compress the file "myfile.tar" into the compressed file "myfile.tar.bz2". It will use the autodetected # of processors (or 2 processors if autodetect not supported) with a file block size of 1500k and a BWT block size of 900k. The file "myfile.tar" will not be deleted after compression is finished.
The program would report something like:
===================================================================
Parallel BZIP2 v1.0.5 - by: Jeff Gilchrist [http://compression.ca]
[Jan. 08, 2009] (uses libbzip2 by Julian Seward)
# CPUs: 2
BWT Block Size: 900k
File Block Size: 1500k
-------------------------------------------
File #: 1 of 1
Input Name: myfile.tar
Output Name: myfile.tar.bz2
Input Size: 7428687 bytes
Compressing data...
Output Size: 3236394 bytes
-------------------------------------------
Wall Clock: 3.059000 seconds
===================================================================
Example 3: pbzip2 -p4 -r -5 -v myfile.tar second*.txt
This example will compress the file "myfile.tar" into the compressed file "myfile.tar.bz2". It will use 4 processors with a BWT block size of 500k. The file block size will be the size of "myfile.tar" divided by 4 (# of processors) so that the data will be split evenly among each processor. This requires you have enough RAM for pbzip2 to read the entire file into memory for compression. Pbzip2 will then use the same options to compress all other files that match the wildcard "second*.txt" in that directory.
The program would report something like:
===================================================================
Parallel BZIP2 v1.0.5 - by: Jeff Gilchrist [http://compression.ca]
[Jan. 08, 2009] (uses libbzip2 by Julian Seward)
# CPUs: 4
BWT Block Size: 500k
File Block Size: 1857k
-------------------------------------------
File #: 1 of 3
Input Name: myfile.tar
Output Name: myfile.tar.bz2
Input Size: 7428687 bytes
Compressing data...
Output Size: 3237105 bytes
-------------------------------------------
File #: 2 of 3
Input Name: secondfile.txt
Output Name: secondfile.txt.bz2
Input Size: 5897 bytes
Compressing data...
Output Size: 3192 bytes
-------------------------------------------
File #: 3 of 3
Input Name: secondbreakfast.txt
Output Name: secondbreakfast.txt.bz2
Input Size: 83531 bytes
Compressing data...
Output Size: 11832 bytes
-------------------------------------------
Wall Clock: 5.127381 seconds
===================================================================
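The file block size reported above (1857k) is just the input size divided by the number of CPUs. A quick check of the arithmetic (assuming, as the report suggests, that pbzip2 states sizes in units of 1000 bytes):

```python
# Example 3: pbzip2 splits the input evenly among the 4 processors.
input_size = 7428687  # bytes, "Input Size" from the report above
ncpus = 4

block_size = input_size // ncpus
print(block_size, "bytes =", block_size // 1000, "k")  # 1857171 bytes = 1857 k
```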
Example 4: tar -c directory_to_compress/ | pbzip2 -vc > myfile.tar.bz2
(Equivalently: tar cf myfile.tar.bz2 --use-compress-prog=pbzip2 dir_to_compress/)
This example will compress the data being given to pbzip2 via pipe from TAR into the compressed file "myfile.tar.bz2". It will use the autodetected # of processors (or 2 processors if autodetect not supported) with the default file block size of 900k and default BWT block size of 900k. TAR is collecting all of the files from the "directory_to_compress/" directory and passing the data to pbzip2 as it works.
The program would report something like:
===================================================================
Parallel BZIP2 v1.0.5 - by: Jeff Gilchrist [http://compression.ca]
[Jan. 08, 2009] (uses libbzip2 by Julian Seward)
# CPUs: 2
BWT Block Size: 900k
File Block Size: 900k
-------------------------------------------
File #: 1 of 1
Input Name:
Output Name:
Compressing data...
-------------------------------------------
Wall Clock: 0.176441 seconds
===================================================================
Example 5: pbzip2 -dv myfile.tar.bz2
This example will decompress the file "myfile.tar.bz2" into the decompressed file "myfile.tar". It will use the autodetected # of processors (or 2 processors if autodetect not supported). The switches -b, -r, and -1..-9 are not valid for decompression.
The program would report something like:
===================================================================
Parallel BZIP2 v1.0.5 - by: Jeff Gilchrist [http://compression.ca]
[Jan. 08, 2009] (uses libbzip2 by Julian Seward)
# CPUs: 2
-------------------------------------------
File #: 1 of 1
Input Name: myfile.tar.bz2
Output Name: myfile.tar
BWT Block Size: 900k
Input Size: 3236549 bytes
Decompressing data...
Output Size: 7428687 bytes
-------------------------------------------
Wall Clock: 1.154000 seconds
refer:
http://compression.ca/pbzip2/
----------------------------------------
Linux Setting processor affinity for a certain task or process
by nixcraft · 25 comments
When you are using SMP (Symmetric MultiProcessing) you might want to override the kernel's process scheduling and bind a certain process to a specific CPU(s).
But what is CPU affinity?
CPU affinity is a scheduler property that binds a process to a given set of CPUs on an SMP system. The Linux scheduler will honor the given CPU affinity and the process will not run on any other CPUs. Note that the Linux scheduler also supports natural CPU affinity:
The scheduler attempts to keep processes on the same CPU as long as practical, for performance reasons. Therefore, forcing a specific CPU affinity is useful only in certain applications. For example, applications such as Oracle (ERP apps) are often licensed per CPU per instance; you can bind Oracle to specific CPUs to avoid license problems. This is really useful on large servers with 4 or 8 CPUs.
Setting processor affinity for a certain task or process using taskset command
taskset is used to set or retrieve the CPU affinity of a running process given its PID, or to launch a new COMMAND with a given CPU affinity. However, taskset may not be installed by default; on older distributions you need to install the schedutils (Linux scheduler utilities) package.
Install schedutils
Debian Linux:
# apt-get install schedutils
Red Hat Enterprise Linux:
# up2date schedutils
OR
# rpm -ivh schedutils*
Under recent versions of Debian / Ubuntu Linux, taskset is installed by default as part of the util-linux package.
The CPU affinity is represented as a bitmask, with the lowest order bit corresponding to the first logical CPU and the highest order bit corresponding to the last logical CPU. For example:
* 0x00000001 is processor #0 (1st processor)
* 0x00000003 is processors #0 and #1
* 0x00000004 is processor #2 (3rd processor)
To set the processor affinity of process 13545 to processor #0 (1st processor), type the following command:
# taskset -p 0x00000001 13545
If you find a bitmask hard to use, then you can specify a numerical list of processors instead of a bitmask using -c flag:
# taskset -c 1 -p 13545
# taskset -c 3,4 -p 13545
Where,
* -p : Operate on an existing PID rather than launching a new task (the default is to launch a new task)
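The same binding can also be done programmatically. Below is a minimal sketch using Python's os.sched_setaffinity (a wrapper for the Linux sched_setaffinity syscall, available on Linux since Python 3.3); the PID 13545 and the mask are just the illustrative values from above:

```python
import os

def set_affinity_from_mask(pid: int, mask: int) -> None:
    """Bind pid to the CPUs set in mask (same bitmask convention as taskset)."""
    cpus = {bit for bit in range(mask.bit_length()) if mask & (1 << bit)}
    os.sched_setaffinity(pid, cpus)

# Equivalent of: taskset -p 0x00000001 13545
# set_affinity_from_mask(13545, 0x00000001)

# pid 0 means "the calling process"; read the current affinity back:
current = os.sched_getaffinity(0)
print(current)
```

os.sched_getaffinity returns a set of CPU numbers, so the bitmask-to-set conversion above mirrors taskset's -c list form.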
---------------------------
Linux audit files to see who made changes to a file
by Vivek Gite · 24 comments
This is one of the key questions many new sysadmins ask:
How do I audit file events such as read / write etc? How can I use audit to see who changed a file in Linux?
The answer is to use the 2.6 kernel's audit system. Modern Linux kernels (2.6.x) come with the auditd daemon, which is responsible for writing audit records to the disk. During startup, the rules in /etc/audit.rules are read by this daemon. You can open the /etc/audit.rules file and make changes such as setting the audit log file location and other options. The default file is good enough to get started with auditd.
In order to use the audit facility you need the following utilities:
=> auditctl - a command to control the kernel's audit system. You can get status, and add or delete rules in the kernel audit system. Setting a watch on a file is accomplished using this command.
=> ausearch - a command that can query the audit daemon logs for events based on different search criteria.
=> aureport - a tool that produces summary reports of the audit system logs.
Note that all of the following instructions were tested on CentOS 4.x, Fedora Core and RHEL 4/5 Linux.
Task: install audit package
The audit package contains the user space utilities for storing and searching the audit records generated by the audit subsystem in the Linux 2.6 kernel. CentOS / Red Hat and Fedora Core include the audit rpm package. Use the yum or up2date command to install it:
# yum install audit
or
# up2date install audit
Auto start auditd service on boot
# ntsysv
OR
# chkconfig auditd on
Now start service:
# /etc/init.d/auditd start
How do I set a watch on a file for auditing?
Let us say you would like to audit the /etc/passwd file. Type the command as follows:
# auditctl -w /etc/passwd -p war -k password-file
Where,
* -w /etc/passwd : Insert a watch for the file system object at the given path, i.e. watch the file /etc/passwd
* -p war : Set the permissions filter for the file system watch. It can be r for read, w for write, x for execute, a for append.
* -k password-file : Set a filter key on the /etc/passwd watch. The password-file is a filterkey (a string of text up to 31 bytes long) that uniquely identifies the audit records produced by the watch. You use the password-file string or phrase while searching the audit logs.
In short, you are monitoring (read as watching) the /etc/passwd file for any write, append or read operation.
Wait for some time, or as a normal user run commands such as:
$ grep 'something' /etc/passwd
$ vi /etc/passwd
Following are more examples:
File System audit rules
Add a watch on "/etc/shadow" with the arbitrary filterkey "shadow-file" that generates records for "reads, writes, executes, and appends" on "shadow"
# auditctl -w /etc/shadow -k shadow-file -p rwxa
syscall audit rule
The next rule suppresses auditing for mount syscall exits
# auditctl -a exit,never -S mount
File system audit rule
Add a watch on "/tmp" with the filterkey "webserver-watch-tmp" that generates records for "execute" operations on "/tmp" (good for a webserver)
# auditctl -w /tmp -p e -k webserver-watch-tmp
syscall audit rule using pid
To see all syscalls made by a program, for example sshd (pid 1005):
# auditctl -a entry,always -S all -F pid=1005
How do I find out who changed or accessed a file /etc/passwd?
Use ausearch command as follows:
# ausearch -f /etc/passwd
OR
# ausearch -f /etc/passwd | less
OR
# ausearch -f /etc/passwd -i | less
Where,
* -f /etc/passwd : Only search for this file
* -i : Interpret numeric entities into text. For example, uid is converted to account name.
Output:
----
type=PATH msg=audit(03/16/2007 14:52:59.985:55) : name=/etc/passwd flags=follow,open inode=23087346 dev=08:02 mode=file,644 ouid=root ogid=root rdev=00:00
type=CWD msg=audit(03/16/2007 14:52:59.985:55) : cwd=/webroot/home/lighttpd
type=FS_INODE msg=audit(03/16/2007 14:52:59.985:55) : inode=23087346 inode_uid=root inode_gid=root inode_dev=08:02 inode_rdev=00:00
type=FS_WATCH msg=audit(03/16/2007 14:52:59.985:55) : watch_inode=23087346 watch=passwd filterkey=password-file perm=read,write,append perm_mask=read
type=SYSCALL msg=audit(03/16/2007 14:52:59.985:55) : arch=x86_64 syscall=open success=yes exit=3 a0=7fbffffcb4 a1=0 a2=2 a3=6171d0 items=1 pid=12551 auid=unknown(4294967295) uid=lighttpd gid=lighttpd euid=lighttpd suid=lighttpd fsuid=lighttpd egid=lighttpd sgid=lighttpd fsgid=lighttpd comm=grep exe=/bin/grep
Let us try to understand the output:
* audit(03/16/2007 14:52:59.985:55) : Audit log time stamp
* uid=lighttpd gid=lighttpd : User IDs, normally in numeric format. By passing the -i option you can convert most numeric data to a human-readable format. In our example, the user lighttpd used the grep command to open the file.
* exe=/bin/grep : The command used to access the /etc/passwd file
* perm_mask=read : The file was opened for a read operation
So from the log files you can clearly see who read the file using grep or who made changes to it using the vi/vim text editor. The log provides tons of other information; you need to read the man pages and documentation to understand the raw log format.
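Because each raw record is a series of key=value tokens, a few lines of script can extract the fields of interest. A minimal, illustrative sketch (field names vary between audit versions, so treat this as a starting point, not the official log format):

```python
def parse_audit_record(line: str) -> dict:
    """Parse the key=value fields of a raw audit record into a dict."""
    fields = {}
    for token in line.split():
        if "=" in token:
            key, _, value = token.partition("=")
            fields[key] = value
    return fields

# Abbreviated SYSCALL record from the ausearch output above:
record = ('type=SYSCALL msg=audit(03/16/2007 14:52:59.985:55) : '
          'arch=x86_64 syscall=open success=yes exit=3 pid=12551 '
          'uid=lighttpd gid=lighttpd comm=grep exe=/bin/grep')
fields = parse_audit_record(record)
print(fields["comm"], fields["exe"], fields["uid"])  # grep /bin/grep lighttpd
```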
Other useful examples
Search for events with date and time stamps. If the date is omitted, today is assumed. If the time is omitted, now is assumed. Use 24-hour clock time rather than AM or PM to specify time. An example date is 10/24/05. An example time is 18:00:00.
# ausearch -ts today -k password-file
# ausearch -ts 3/12/07 -k password-file
Search for an event matching the given executable name using the -x option. For example, find out who has accessed /etc/passwd using the rm command:
# ausearch -ts today -k password-file -x rm
# ausearch -ts 3/12/07 -k password-file -x rm
Search for an event with the given user ID (UID). For example, find out if user vivek (uid 506) tried to open /etc/passwd:
# ausearch -ts today -k password-file -x rm -ui 506
# ausearch -k password-file -ui 506
refer:
http://www.cyberciti.biz/tips/linux-audit-files-to-see-who-made-changes-to-a-file.html
Sunday, December 27, 2009
file system related commands
dumpe2fs prints the superblock and block group information for the filesystem present on a device.
-bash-2.05b# dumpe2fs -h /dev/ubd/0
dumpe2fs 1.35 (28-Feb-2004)
Filesystem volume name:
Last mounted on:
Filesystem UUID: 47ce1382-4487-40db-949a-ce0b22d70cd0
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal filetype needs_recovery sparse_super
Default mount options: (none)
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 384768
Block count: 768256
Reserved block count: 38412
Free blocks: 226824
Free inodes: 248837
First block: 0
Block size: 4096
Fragment size: 4096
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 16032
Inode blocks per group: 501
Filesystem created: Tue Jul 27 12:59:32 2004
Last mount time: Sun Dec 27 18:55:42 2009
Last write time: Sun Dec 27 18:55:42 2009
Mount count: 5
Maximum mount count: 20
Last checked: Sat Feb 12 17:56:04 2005
Check interval: 15552000 (6 months)
Next check after: Thu Aug 11 18:56:04 2005
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 128
Journal inode: 8
Default directory hash: tea
Directory Hash Seed: 20cb2afa-b1ca-4f4d-974f-e54d63b4e3ff
Journal backup: inode blocks
The basic recovery process
In this section we will go step-by-step through the data recovery process and describe the tools, and their options, in detail. We start by listing a directory below.
[abe@abe-laptop test]$ ls -al
total 27
drwxrwxr-x 2 abe abe 4096 2008-03-29 17:48 .
drwx------ 71 abe abe 4096 2008-03-29 17:47 ..
-rwxr--r-- 1 abe abe 42736 2008-03-29 17:47 weimaraner1.jpg
In the listing above we can see that there is a file named weimaraner1.jpg in the test directory. This is a picture of my dog. I don't want to delete it. I like my dog.
[abe@abe-laptop test]$ rm -f *
Here we can see I am deleting it. Whoops! Sorry buddy. Let's gather some basic information about the system so we can begin the recovery process.
[abe@abe-laptop test]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 71G 14G 53G 21% /
/dev/sda1 99M 19M 76M 20% /boot
tmpfs 1007M 12K 1007M 1% /dev/shm
/dev/sdb1 887M 152M 735M 18% /media/PUBLIC
Here we see that the full path to the test directory (which is /home/abe/test) is part of the / filesystem, represented by the device file /dev/sda2.
[abe@abe-laptop test]$ su -
Password:
[root@abe-laptop ~]# debugfs /dev/sda2
Using su to gain root access, we can start the debugfs program giving it the target of /dev/sda2. The debugfs program is an interactive file system debugger that is installed by default with most common Linux distributions. This program is used to manually examine and change the state of a filesystem. In our situation, we're going to use this program to determine the inode which stored information about the deleted file and to what block group the deleted file belonged.
debugfs 1.40.4 (31-Dec-2007)
debugfs: cd /home/abe/test
debugfs: ls -d
1835327 (12) . 65538 (4084) .. <1835328> (4072) weimaraner1.jpg
After debugfs starts, we cd into /home/abe/test and run the ls -d command. This command shows us all deleted entries in the current directory. The output shows that we have one deleted entry and that its inode number is 1835328 -- that is, the number between the angle brackets.
debugfs: imap <1835328>
Inode 1835328 is part of block group 56
located at block 1835019, offset 0x0f80
The next command we want to run is imap, giving it the inode number above so we can determine to which block group the file belonged. We see by the output that it belonged to block group 56.
debugfs: stats
[...lots of output...]
Blocks per group: 32768
[...lots of output...]
debugfs: q
Running the stats command will generate a lot of output. The only data we are interested in from this list, however, is the number of blocks per group. In this case, and most cases, it’s 32768. Now we have enough data to be able to determine the specific set of blocks in which the data resided. We're done with debugfs now, so we type q to quit.
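With the block group (56) and blocks-per-group (32768) figures from the session above, the block range that held the file's data follows directly. A quick check of the arithmetic (this filesystem uses 4k blocks with first block 0, so group N starts at block N * blocks_per_group):

```python
# Values taken from the debugfs session above.
block_group = 56
blocks_per_group = 32768

first_block = block_group * blocks_per_group
last_block = first_block + blocks_per_group - 1
print(first_block, last_block)  # 1835008 1867775
```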
refer:
http://www.securityfocus.com/infocus/1902
debugfs: dump <2048262> /home/jake/recovery.file
Especially if you can't unmount the file system containing the deleted data, debugfs is a less comfortable, but usable alternative if it is already installed on your system. (If you have to install it, you can use the more comfortable e2undel as well.) Just try a
/sbin/debugfs device
Replace device with your file system device, e.g. /dev/hda1 for the first partition on your first IDE drive. At the "debugfs:" prompt, enter the command
lsdel
After some time, you will be presented with a list of deleted files. You must identify the file you want to recover by its owner (2nd column), size (4th column), and deletion date. When found, you can write the data of the file via
dump <inode_number> filename
The inode_number is printed in the 1st column of the "lsdel" output. The file filename should reside on a different file system than the one you opened with debugfs. This might be another partition, a RAM disk or even a floppy disk.
Repeat the "dump" command for all files that you want to recover; then quit debugfs by entering "q".
refer:http://e2undel.sourceforge.net/recovery-howto.html
Disable ext3 boot-time check with tune2fs
by Ryan
on October 26, 2008
The ext3 file system forces an fsck once it has been mounted a certain number of times. By default this maximum mount count is usually set between 20-30. On many systems such as laptops which can be rebooted quite often this can quickly become a problem. To turn off this checking you can use the tune2fs command.
The tune2fs command utility operates exclusively on ext2/ext3 file systems.
To run these commands you must run the command as root or use sudo. You must also make sure that your filesystem is unmounted before making any changes. If you are doing this on your root partition the best solution is to use a LiveCD.
You can run tune2fs on the ext3 partition with the ‘-l‘ option to view the current and maximum mount counts.
tune2fs -l /dev/sda1
...
Mount count: 2
Maximum mount count: 25
...
To turn off this check set the maximum count to 0 with the ‘-c‘ option.
# tune2fs -c 0 /dev/sda1
If you do not want to completely disable the file system checking, you can also increase the maximum count.
# tune2fs -c 100 /dev/sda1
-------
debugfs: params
Open mode: read-only
Filesystem in use: /dev/ubd/0
recover deleted files in ext3 FS
Download and Install ext3grep in your File system
Download the source code from: http://ext3grep.googlecode.com/files/ext3grep-0.9.0.tar.gz or you can download them through svn access. Follow the steps below for the installation:
mkdir ext3grep
svn checkout http://ext3grep.googlecode.com/svn/trunk/ ext3grep
cd ext3grep
./configure --prefix=/opt/ext3grep   # make sure it does not get installed on the affected partition
make
make install
The Basics of the ext3 File system:
Let’s take a look at the basics of the ext3 file system that ext3grep operates on. Ext3 is an ext2 file system with the journaling option. Journaling is nothing but keeping track of transactions, so that in case of a crash the files may be recovered to a previous state. All transaction information is passed to the journaling block device (JBD) layer, which is independent of the ext3 file system.
The ext3 partition consists of a set of groups which are created during disk formatting. Each group consists of a super block, a group descriptor, a block bitmap, an i-node bitmap, an i-node table and data blocks. A simple layout can be specified as follows:
,---------+---------+---------+---------+---------+---------,
| Super | FS | Block | Inode | Inode | Data |
| block | desc. | bitmap | bitmap | table | blocks |
`---------+---------+---------+---------+---------+---------'
You can get the total number of groups in the particular partition using the following command:
./ext3grep /dev/hda2 --superblock | grep 'Number of groups'
Number of groups: 24
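The group count is simply the block count divided by blocks per group, rounded up. As an arithmetic illustration using the dumpe2fs figures shown earlier in this post (a different device, so purely illustrative):

```python
import math

block_count = 768256       # from dumpe2fs "Block count:"
blocks_per_group = 32768   # from dumpe2fs "Blocks per group:"

groups = math.ceil(block_count / blocks_per_group)
print(groups)  # 24
```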
Each group consists of a set of fixed-size blocks of 1024, 2048 or 4096 bytes.
Some of the basic terminology associated with the ext3 file system:
Superblock:
Superblock is a header that tells the kernel about the layout of the file system. It contains information about the block size, block-count and several such details. The first superblock is the one that is used when the file system is mounted.
To get information related to the blocks per group, use the command:
/opt/ext3grep/bin/ext3grep /dev/hda2 --superblock | grep 'blocks per group'
Number of blocks per group: 32768
To get the block size details from the superblock, use the command:
/opt/ext3grep/bin/ext3grep /dev/hda5 --superblock|grep size
Block size: 4096
Fragment size: 4096
You can get a complete list of the superblock details using the command:
/opt/ext3grep/bin/ext3grep /dev/hda5 --superblock
Group Descriptor:
The next block is the group descriptor which stores information of each group. Within each group descriptor, is a pointer to the table of i-nodes and the allocation bitmaps for the i-nodes and data blocks.
Allocation Bitmap:
An allocation bitmap is a list of bits describing the block and the i-nodes which are used so that the allocation of files can be done efficiently.
I-nodes:
Each file is associated with one i-node, which contains various metadata about the file. The file's data is not stored in the i-node itself; instead, the i-node points to the location of the data on the disk.
I-nodes are stored in the i-node tables. The command: df -i will give you the total number of i-nodes in the partition and the command ls -i filename will give you the i-node number of the respective file.
df -i | grep /dev/hda5
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/hda5 18233952 33671 18200281 1% /
-------------------------------------------------
ll -i ext3grep
inode no permission owner group size in bytes date filename
6350788 -rwxr-xr-x 1 root root 2765388 Oct 5 23:49 ext3grep
Directories:
In the ext3 file system, each directory is a file. This directory uses an i-node and this i-node contains all the information about the contents of the directory. Each file has a list of directory entries and each entry associates one file name with one i-node number. You can get the directory i-node information using the command:
ll -id bin
6350787 drwxr-xr-x 2 root root 4096 Oct 5 23:49 bin
Superblock Recovery:
Sometimes the superblock gets corrupted and all the data information of that particular group is lost. In this case we can recover the superblock using the alternate superblock backup.
First, list the backup superblock
dumpe2fs -h /dev/hda5
Primary superblock at 0, Group descriptors at 1-5
Backup superblock at 32768, Group descriptors at 32769-32773
Backup superblock at 98304, Group descriptors at 98305-98309
Backup superblock at 163840, Group descriptors at 163841-163845
Backup superblock at 229376, Group descriptors at 229377-229381
Backup superblock at 294912, Group descriptors at 294913-294917
Next, find the position of backup superblock.
Usually the block size of ext3 will be 4096 bytes, unless defined manually during file system creation.
position= backup superblock *4
32768*4=131072
Now, mount the file system using an alternate superblock.
mount -o sb=131072 /dev/hda5 /main
The ext3grep is a simple tool that can aid anyone who would have accidentally deleted a file on an ext3 file system, only to later realize that they required it.
Some important commands for the partition
Find the number of group to which a particular i-node belongs.
The number of i-nodes per group can be found using ext3grep described below:
group = (inode_number - 1) / inodes_per_group
To find the block to which the i-node belongs, use the command:
/opt/ext3grep/bin/ext3grep /dev/hda2 --inode-to-block 272
Inode 272 resides in block 191 at offset 0x780.
To find the journal i-node of the drive:
/opt/ext3grep/bin/ext3grep /dev/hda2 --superblock | grep 'Inode number of
journal file'
Inode number of journal file: 8
The Recovery Process
In the recovery process the first thing to do is to list the files of the particular disk. You can use the command:
/opt/ext3grep/bin/ext3grep /dev/hda2 --dump-names
Before working on the recovery process make sure that you have unmounted the partition.
To Recover all files:
The following command will recover all the files to a new directory RESTORED_FILES which is in the current working directory. The current working directory should be a new drive.
/opt/ext3grep/bin/ext3grep /dev/hda2 --restore-all
After this, you will have a copy of all the files in the directory RESTORED_FILES .
To Recover a Single File:
If you want to recover a single file, then find the i-node corresponding to the directory that contains that file. For example, if I accidentally lost a file named backup.sql which was in /home2. First I need to find its i-node:
ll -id /home2/
2 drwxr-xr-x 5 root root 4096 Aug 27 09:21 /home2/
Here the first entry ‘2′ is the i-node of /home2. Now I can use ext3grep to list the contents of /home2.
/opt/ext3grep/bin/ext3grep /dev/hda2 --ls --inode 2
The first block of the directory is 683. Inode 2 is directory “”.
Directory block 683:
.-- File type in dir_entry (r=regular file, d=directory, l=symlink)
| .-- D: Deleted ; R: Reallocated
Index Next | I-node | Deletion time Mode File name
==========+==========+----------------data-from-inode------+-----------+=========
0 1 d 2 drwxr-xr-x .
1 2 d 2 drwxr-xr-x ..
2 3 d 11 drwx------ lost+found
3 4 d 144001 drwxr-xr-x testfol
4 6 r 13 rrw-r--r-- aba.txt
5 6 d 112001 D 1219344156 Thu Aug 21 14:42:36 2008 drwxr-xr-x db
6 end d 176001 drwxr-xr-x log
7 end r 12 D 1219843315 Wed Aug 27 09:21:55 2008 rrw-r--r-- backup.sql
Here, we see that the file backup.sql is already deleted. I can recover it using ext3grep through two methods.
Recovery using the file name:
You can recover the file by providing the path of the file to the ext3grep tool. In my case /home2 was added as a separate partition. So I should give the path of the file as simply backup.sql, since it is in root directory of that partition.
umount /home2
/opt/ext3grep/bin/ext3grep /dev/hda2 --restore-file backup.sql
Loading journal descriptors... sorting... done
The oldest inode block that is still in the journal, appears to be from
1217936328 = Tue Aug 5 07:38:48 2008
Number of descriptors in journal: 1315; min / max sequence numbers: 203 / 680
Loading hda2.ext3grep.stage2... done
Restoring backup.sql
Ensure that the file has been recovered to the folder “RESTORED_FILES”
ll -d RESTORED_FILES/backup.sql
-rw-r--r-- 1 root root 49152 Dec 26 2006 RESTORED_FILES/backup.sql
Recovering using the i-node information.:
You can recover the file also by using the i-node information of the file. The i-node number can be obtained using the command:
/opt/ext3grep/bin/ext3grep /dev/hda2 --ls --inode 2
------------------------------------
7 end r 12 D 1219843315 Wed Aug 27 09:21:55 2008 rrw-r--r-- backup.sql
Here the i-node number is 12 and you can restore the file by issuing the following command:
/opt/ext3grep/bin/ext3grep /dev/hda2 --restore-inode 12
Loading journal descriptors... sorting... done
The oldest i-node block that is still in the journal, appears to be from
1217936328 = Tue Aug 5 07:38:48 2008
Number of descriptors in journal: 1315; min / max sequence numbers: 203 / 680
Restoring inode.12
mv RESTORED_FILES/inode.12 backup.sql
ll -h backup.sql
-rw-r--r-- 1 root root 48K Dec 26 2006 backup.sql
To Recover files based on time:
Sometimes there can be a conflict where the ext3grep tool detects a lot of old files that were removed, but have the same name. In this case you have to use the “–after” option. In addition, you will also have to provide a Unix time stamp to recover the file. The Unix time stamp can be obtained from the following link: http://www.onlineconversion.com/unix_time.htm.
For example, if I would like to recover all the files that were deleted after Wed Aug 27 05:20:00 2008, the command used should be as follows:
/opt/ext3grep/bin/ext3grep /dev/hda2 --restore-all --after=1219828800
Only show/process deleted entries if they are deleted on or after Wed Aug 27 05:20:00 2008.
Number of groups: 23
Minimum / maximum journal block: 689 / 17091
Loading journal descriptors... sorting... done
The oldest inode block that is still in the journal, appears to be from
1217936328 = Tue Aug 5 07:38:48 2008
Number of descriptors in journal: 1315; min / max sequence numbers: 203 / 680
Writing output to directory RESTORED_FILES/
Loading hda2.ext3grep.stage2... done
Restoring aba.txt
Restoring backup.sql
You can also use the ‘–before’ option to get a file before that date.
/opt/ext3grep/bin/ext3grep /dev/hda2 --restore-all --before=1219828800
You can recover files between a set of dates combining both the above options. For example, in order to recover a file between 12/12/2007 and 12/9/2008, I need to use a command as follows:
/opt/ext3grep/bin/ext3grep /dev/hda2 --restore-all --after=1197417600 --before=1228780800
To List the Correct hard links
A recovery of the files can cause a lot of hard link related issues. To find out the hard linked files, you can use the command:
/opt/ext3grep/bin/ext3grep /dev/hda2 --show-hardlinks
After this, remove the unwanted hard linked files which are duplicates.
To List the Deleted files.
You can use the following command to list the deleted files.
/opt/ext3grep/bin/ext3grep /dev/hda2 --deleted
Reference
bobcares.com
http://www.xs4all.nl/~carlo17/howto/undelete_ext3.html
Download the source code from http://ext3grep.googlecode.com/files/ext3grep-0.9.0.tar.gz, or check it out via Subversion. Follow the steps below for the installation:
mkdir ext3grep
svn checkout http://ext3grep.googlecode.com/svn/trunk/ ext3grep
cd ext3grep
./configure --prefix=/opt/ext3grep # make sure it is not installed on the affected partition
make
make install
The Basics of the ext3 File system:
Let's take a look at the basics of the ext3 file system before using ext3grep. Ext3 is an ext2 file system with journaling added. Journaling keeps track of transactions so that, after a crash, the file system can be recovered to a consistent previous state. All transaction information is passed to the journaling block device (JBD) layer, which is independent of the ext3 file system.
The ext3 partition consists of a set of groups which are created when the disk is formatted. Each group consists of a superblock, a group descriptor, a block bitmap, an i-node bitmap, an i-node table and data blocks. A simple layout can be sketched as follows:
,---------+---------+---------+---------+---------+---------,
| Super | FS | Block | Inode | Inode | Data |
| block | desc. | bitmap | bitmap | table | blocks |
`---------+---------+---------+---------+---------+---------'
You can get the total number of groups in the particular partition using the following command:
./ext3grep /dev/hda2 --superblock | grep 'Number of groups'
Number of groups: 24
Each group consists of a set of fixed-size blocks, which can be 1024, 2048 or 4096 bytes in size.
Some of the basic terminology associated with the ext3 file system:
Superblock:
The superblock is a header that tells the kernel about the layout of the file system. It contains information such as the block size and block count. The first superblock is the one that is used when the file system is mounted.
To get information related to the blocks per group, use the command:
/opt/ext3grep/bin/ext3grep /dev/hda2 --superblock | grep 'blocks per group'
Number of blocks per group: 32768
To get the block size details from the superblock, use the command:
/opt/ext3grep/bin/ext3grep /dev/hda5 --superblock | grep size
Block size: 4096
Fragment size: 4096
You can get a complete list of the superblock details using the command:
/opt/ext3grep/bin/ext3grep /dev/hda5 --superblock
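The group count reported earlier follows directly from two superblock figures: the total block count divided by the blocks-per-group value, rounded up. A minimal sketch, where the block count is an assumed example value rather than one taken from the article's disk:

```shell
# Number of block groups = ceil(blocks_count / blocks_per_group).
# BLOCKS_COUNT is an assumed example value; read the real figure
# from the --superblock output of your own partition.
BLOCKS_COUNT=770048
BLOCKS_PER_GROUP=32768
echo $(( (BLOCKS_COUNT + BLOCKS_PER_GROUP - 1) / BLOCKS_PER_GROUP ))
```

For these example values the result is 24.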
Group Descriptor:
The next block is the group descriptor, which stores information about each group. Within each group descriptor is a pointer to the table of i-nodes and to the allocation bitmaps for the i-nodes and data blocks.
Allocation Bitmap:
An allocation bitmap is a list of bits marking which blocks and i-nodes are in use, so that new allocations can be made efficiently.
I-nodes:
Each file is associated with one i-node, which holds the file's metadata such as permissions, ownership, timestamps and size. The file's data is not stored in the i-node itself; instead, the i-node points to the locations of the data blocks on the disk.
I-nodes are stored in the i-node tables. The command df -i will give you the total number of i-nodes in the partition, and the command ls -i filename will give you the i-node number of the respective file.
df -i | grep /dev/hda5
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/hda5 18233952 33671 18200281 1% /
-------------------------------------------------
ll -i ext3grep
inode no permission owner group size in bytes date filename
6350788 -rwxr-xr-x 1 root root 2765388 Oct 5 23:49 ext3grep
Directories:
In the ext3 file system, each directory is itself a file. The directory uses an i-node, and its data blocks contain a list of directory entries; each entry associates one file name with one i-node number. You can get the directory i-node information using the command:
ll -id bin
6350787 drwxr-xr-x 2 root root 4096 Oct 5 23:49 bin
Superblock Recovery:
Sometimes the primary superblock gets corrupted and the layout information it holds is lost. In this case we can mount the file system using one of the backup superblocks.
First, list the backup superblocks:
dumpe2fs -h /dev/hda5
Primary superblock at 0, Group descriptors at 1-5
Backup superblock at 32768, Group descriptors at 32769-32773
Backup superblock at 98304, Group descriptors at 98305-98309
Backup superblock at 163840, Group descriptors at 163841-163845
Backup superblock at 229376, Group descriptors at 229377-229381
Backup superblock at 294912, Group descriptors at 294913-294917
Next, find the position of the backup superblock. The sb= mount option expects the offset in units of 1024 bytes, so with the usual ext3 block size of 4096 bytes (unless defined manually during file system creation) the backup superblock's block number must be multiplied by 4:
position = backup_superblock_block * 4
32768 * 4 = 131072
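The same arithmetic as a sketch, assuming the 4096-byte block size shown earlier (the sb= mount option counts in 1024-byte units):

```shell
# Offset for mount -o sb=...: the backup superblock's block number
# scaled from filesystem blocks (4096 bytes) to 1024-byte units.
BACKUP_BLOCK=32768
BLOCK_SIZE=4096
echo $(( BACKUP_BLOCK * BLOCK_SIZE / 1024 ))
```

This prints 131072, the value used in the mount command below.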
Now, mount the file system using an alternate superblock.
mount -o sb=131072 /dev/hda5 /main
ext3grep is a simple tool that can aid anyone who has accidentally deleted a file on an ext3 file system, only to realize later that they need it.
Some important commands for the partition
To find the group to which a particular i-node belongs, use the inodes-per-group figure from the superblock output shown above:
group = (inode_number - 1) / inodes_per_group
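A quick sketch of the formula; the inodes-per-group value below is an assumed example, so read the real figure from your own superblock output:

```shell
# Integer division gives the zero-based group number of an i-node.
INODE=272
INODES_PER_GROUP=16352   # assumed example value
echo $(( (INODE - 1) / INODES_PER_GROUP ))
```

With these values, i-node 272 falls in group 0.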
To find the block to which the i-node belongs, use the command:
/opt/ext3grep/bin/ext3grep /dev/hda2 --inode-to-block 272
Inode 272 resides in block 191 at offset 0x780.
To find the journal i-node of the drive:
/opt/ext3grep/bin/ext3grep /dev/hda2 --superblock | grep 'Inode number of journal file'
Inode number of journal file: 8
The Recovery Process
The first thing to do in the recovery process is to list the files on the partition. You can use the command:
/opt/ext3grep/bin/ext3grep /dev/hda2 --dump-names
Before starting the recovery process, make sure that you have unmounted the partition.
To Recover all files:
The following command will recover all the files to a new directory, RESTORED_FILES, in the current working directory. The current working directory should be on a different partition from the one being recovered.
/opt/ext3grep/bin/ext3grep /dev/hda2 --restore-all
After this, you will have a copy of all the files in the directory RESTORED_FILES.
To Recover a Single File:
If you want to recover a single file, first find the i-node corresponding to the directory that contains that file. For example, suppose I accidentally lost a file named backup.sql which was in /home2. First I need to find the directory's i-node:
ll -id /home2/
2 drwxr-xr-x 5 root root 4096 Aug 27 09:21 /home2/
Here the first entry '2' is the i-node of /home2. Now I can use ext3grep to list the contents of /home2.
/opt/ext3grep/bin/ext3grep /dev/hda2 --ls --inode 2
The first block of the directory is 683. Inode 2 is directory "".
Directory block 683:
.-- File type in dir_entry (r=regular file, d=directory, l=symlink)
| .-- D: Deleted ; R: Reallocated
Index Next | I-node | Deletion time Mode File name
==========+==========+----------------data-from-inode------+-----------+=========
0 1 d 2 drwxr-xr-x .
1 2 d 2 drwxr-xr-x ..
2 3 d 11 drwx------ lost+found
3 4 d 144001 drwxr-xr-x testfol
4 6 r 13 rrw-r--r-- aba.txt
5 6 d 112001 D 1219344156 Thu Aug 21 14:42:36 2008 drwxr-xr-x db
6 end d 176001 drwxr-xr-x log
7 end r 12 D 1219843315 Wed Aug 27 09:21:55 2008 rrw-r--r-- backup.sql
Here, we see that the file backup.sql is already deleted. I can recover it using ext3grep through two methods.
Recovery using the file name:
You can recover the file by providing its path to the ext3grep tool. In my case /home2 was added as a separate partition, so I should give the path of the file as simply backup.sql, since it is in the root directory of that partition.
umount /home2
/opt/ext3grep/bin/ext3grep /dev/hda2 --restore-file backup.sql
Loading journal descriptors... sorting... done
The oldest inode block that is still in the journal, appears to be from
1217936328 = Tue Aug 5 07:38:48 2008
Number of descriptors in journal: 1315; min / max sequence numbers: 203 / 680
Loading hda2.ext3grep.stage2... done
Restoring backup.sql
Ensure that the file has been recovered to the folder RESTORED_FILES:
ll -d RESTORED_FILES/backup.sql
-rw-r--r-- 1 root root 49152 Dec 26 2006 RESTORED_FILES/backup.sql
Recovery using the i-node information:
You can also recover the file by using its i-node number, which can be obtained using the command:
/opt/ext3grep/bin/ext3grep /dev/hda2 --ls --inode 2
------------------------------------
7 end r 12 D 1219843315 Wed Aug 27 09:21:55 2008 rrw-r--r-- backup.sql
Here the i-node number is 12 and you can restore the file by issuing the following command:
/opt/ext3grep/bin/ext3grep /dev/hda2 --restore-inode 12
Loading journal descriptors... sorting... done
The oldest inode block that is still in the journal, appears to be from
1217936328 = Tue Aug 5 07:38:48 2008
Number of descriptors in journal: 1315; min / max sequence numbers: 203 / 680
Restoring inode.12
mv RESTORED_FILES/inode.12 backup.sql
ll -h backup.sql
-rw-r--r-- 1 root root 48K Dec 26 2006 backup.sql
To Recover files based on time:
Sometimes ext3grep detects many old deleted files that share the same name. In this case you have to use the --after option and provide a Unix time stamp, so that only files deleted after that moment are recovered. The Unix time stamp can be obtained from the following link: http://www.onlineconversion.com/unix_time.htm.
For example, if I would like to recover all the files that were deleted after Wed Aug 27 05:20:00 2008, the command used should be as follows:
/opt/ext3grep/bin/ext3grep /dev/hda2 --restore-all --after=1219828800
Only show/process deleted entries if they are deleted on or after Wed Aug 27 05:20:00 2008.
Number of groups: 23
Minimum / maximum journal block: 689 / 17091
Loading journal descriptors... sorting... done
The oldest inode block that is still in the journal, appears to be from
1217936328 = Tue Aug 5 07:38:48 2008
Number of descriptors in journal: 1315; min / max sequence numbers: 203 / 680
Writing output to directory RESTORED_FILES/
Loading hda2.ext3grep.stage2... done
Restoring aba.txt
Restoring backup.sql
You can also use the --before option to recover only files deleted before a given date.
/opt/ext3grep/bin/ext3grep /dev/hda2 --restore-all --before=1219828800
You can recover files deleted between two dates by combining both options. For example, to recover files deleted between 12 December 2007 and 9 December 2008, I need to use a command as follows:
/opt/ext3grep/bin/ext3grep /dev/hda2 --restore-all --after=1197417600 --before=1228780800
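Instead of the conversion website, GNU date can produce the time stamps directly. A sketch for the date range above, with times given in UTC to keep the output unambiguous:

```shell
# Unix time stamps for the --after/--before range, via GNU date.
# -u interprets the dates as UTC midnight; +%s prints seconds
# since the epoch.
AFTER=$(date -u -d '2007-12-12 00:00:00' +%s)    # 1197417600
BEFORE=$(date -u -d '2008-12-09 00:00:00' +%s)   # 1228780800
echo "$AFTER $BEFORE"
```

Note that -d with a date string is a GNU coreutils extension; on BSD systems the flags differ.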
To List the Hard Links:
Recovery can restore duplicate copies of hard-linked files, causing hard-link-related issues. To find the hard-linked files, you can use the command:
/opt/ext3grep/bin/ext3grep /dev/hda2 --show-hardlinks
After this, remove the unwanted duplicate hard-linked files.
To List the Deleted Files:
You can use the following command to list the deleted files:
/opt/ext3grep/bin/ext3grep /dev/hda2 --deleted
Reference
bobcares.com
http://www.xs4all.nl/~carlo17/howto/undelete_ext3.html