Linux experts tips: compression in smp+auditing

PBZIP2 is a parallel implementation of the bzip2 block-sorting file compressor that uses pthreads and achieves near-linear speedup on SMP machines. The output of this version is fully compatible with bzip2 v1.0.2 or newer (i.e., anything compressed with pbzip2 can be decompressed with bzip2). PBZIP2 should work on any system that has a pthreads-compatible C++ compiler (such as gcc). It has been tested on Linux, Windows (cygwin & MinGW), Solaris, Tru64/OSF1, HP-UX, OS/2, and Irix.

NOTE: If you are looking for a parallel BZIP2 that works on cluster machines, you should check out MPIBZIP2 which was designed for a distributed-memory message-passing architecture.

The pbzip2 program is a parallel version of bzip2 for use on shared memory machines. It provides near-linear speedup when used on true multi-processor machines and 5-10% speedup on Hyperthreaded machines. The output is fully compatible with the regular bzip2 data so any files created with pbzip2 can be uncompressed by bzip2 and vice-versa.

The default settings for pbzip2 will work well in most cases. The only switches you will likely need are -d to decompress files and -p to set the number of processors for pbzip2 to use, if autodetect is not supported on your system or you want to use a specific number of CPUs.

Example 1: pbzip2 -v myfile.tar

This example will compress the file "myfile.tar" into the compressed file "myfile.tar.bz2". It will use the autodetected # of processors (or 2 processors if autodetect not supported) with the default file block size of 900k and default BWT block size of 900k.

The program would report something like:
===================================================================

Parallel BZIP2 v1.0.5 - by: Jeff Gilchrist [http://compression.ca]
[Jan. 08, 2009] (uses libbzip2 by Julian Seward)

# CPUs: 2
BWT Block Size: 900k
File Block Size: 900k
-------------------------------------------
File #: 1 of 1
Input Name: myfile.tar
Output Name: myfile.tar.bz2

Input Size: 7428687 bytes
Compressing data...
Output Size: 3236549 bytes
-------------------------------------------

Wall Clock: 2.809000 seconds

===================================================================

Example 2: pbzip2 -b15vk myfile.tar

This example will compress the file "myfile.tar" into the compressed file "myfile.tar.bz2". It will use the autodetected # of processors (or 2 processors if autodetect not supported) with a file block size of 1500k and a BWT block size of 900k. The file "myfile.tar" will not be deleted after compression is finished.

The program would report something like:
===================================================================

Parallel BZIP2 v1.0.5 - by: Jeff Gilchrist [http://compression.ca]
[Jan. 08, 2009] (uses libbzip2 by Julian Seward)

# CPUs: 2
BWT Block Size: 900k
File Block Size: 1500k
-------------------------------------------
File #: 1 of 1
Input Name: myfile.tar
Output Name: myfile.tar.bz2

Input Size: 7428687 bytes
Compressing data...
Output Size: 3236394 bytes
-------------------------------------------

Wall Clock: 3.059000 seconds

===================================================================

Example 3: pbzip2 -p4 -r -5 -v myfile.tar second*.txt

This example will compress the file "myfile.tar" into the compressed file "myfile.tar.bz2". It will use 4 processors with a BWT block size of 500k. The file block size will be the size of "myfile.tar" divided by 4 (# of processors) so that the data will be split evenly among each processor. This requires you have enough RAM for pbzip2 to read the entire file into memory for compression. Pbzip2 will then use the same options to compress all other files that match the wildcard "second*.txt" in that directory.

The program would report something like:
===================================================================

Parallel BZIP2 v1.0.5 - by: Jeff Gilchrist [http://compression.ca]
[Jan. 08, 2009] (uses libbzip2 by Julian Seward)

# CPUs: 4
BWT Block Size: 500k
File Block Size: 1857k
-------------------------------------------
File #: 1 of 3
Input Name: myfile.tar
Output Name: myfile.tar.bz2

Input Size: 7428687 bytes
Compressing data...
Output Size: 3237105 bytes
-------------------------------------------
File #: 2 of 3
Input Name: secondfile.txt
Output Name: secondfile.txt.bz2

Input Size: 5897 bytes
Compressing data...
Output Size: 3192 bytes
-------------------------------------------
File #: 3 of 3
Input Name: secondbreakfast.txt
Output Name: secondbreakfast.txt.bz2

Input Size: 83531 bytes
Compressing data...
Output Size: 11832 bytes
-------------------------------------------

Wall Clock: 5.127381 seconds

===================================================================
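
The 1857k file block size in the report above follows from the -r rule: input size divided by the number of processors. Here is a minimal sketch of that arithmetic. It assumes "k" means 1000 bytes, which matches this sample output but is an inference rather than documented behaviour, and block_size_k is a hypothetical helper name:

```shell
# Hypothetical helper: estimate the per-processor file block size implied by -r.
# Assumption: "k" is 1000 bytes (inferred from the sample report, not documented).
block_size_k() {
    file_bytes=$1
    cpus=$2
    # Integer division, as a rough stand-in for whatever rounding pbzip2 applies.
    echo "$(( file_bytes / cpus / 1000 ))k"
}

block_size_k 7428687 4   # prints 1857k
```

The same calculation gives a quick estimate of how much RAM each processor's share needs before you try -r on a large file.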

Example 4: tar cf myfile.tar.bz2 --use-compress-prog=pbzip2 directory_to_compress/
Example 4 (alternative, using a pipe): tar -c directory_to_compress/ | pbzip2 -vc > myfile.tar.bz2

This example will compress the data being given to pbzip2 via pipe from TAR into the compressed file "myfile.tar.bz2". It will use the autodetected # of processors (or 2 processors if autodetect not supported) with the default file block size of 900k and default BWT block size of 900k. TAR is collecting all of the files from the "directory_to_compress/" directory and passing the data to pbzip2 as it works.

The program would report something like:
===================================================================

Parallel BZIP2 v1.0.5 - by: Jeff Gilchrist [http://compression.ca]
[Jan. 08, 2009] (uses libbzip2 by Julian Seward)

# CPUs: 2
BWT Block Size: 900k
File Block Size: 900k
-------------------------------------------
File #: 1 of 1
Input Name:
Output Name:

Compressing data...
-------------------------------------------

Wall Clock: 0.176441 seconds

===================================================================

Example 5: pbzip2 -dv myfile.tar.bz2

This example will decompress the file "myfile.tar.bz2" into the decompressed file "myfile.tar". It will use the autodetected # of processors (or 2 processors if autodetect not supported). The switches -b, -r, and -1..-9 are not valid for decompression.

The program would report something like:
===================================================================

Parallel BZIP2 v1.0.5 - by: Jeff Gilchrist [http://compression.ca]
[Jan. 08, 2009] (uses libbzip2 by Julian Seward)

# CPUs: 2
-------------------------------------------
File #: 1 of 1
Input Name: myfile.tar.bz2
Output Name: myfile.tar

BWT Block Size: 900k
Input Size: 3236549 bytes
Decompressing data...
Output Size: 7428687 bytes
-------------------------------------------

Wall Clock: 1.154000 seconds

refer:
http://compression.ca/pbzip2/
----------------------------------------
Linux Setting processor affinity for a certain task or process

by nixcraft

When you are using SMP (Symmetric MultiProcessing) you might want to override the kernel's process scheduling and bind a certain process to a specific CPU(s).
But what is CPU affinity?

CPU affinity is nothing but a scheduler property that "bonds" a process to a given set of CPUs on the SMP system. The Linux scheduler will honor the given CPU affinity and the process will not run on any other CPUs. Note that the Linux scheduler also supports natural CPU affinity:

The scheduler attempts to keep processes on the same CPU as long as practical for performance reasons. Therefore, forcing a specific CPU affinity is useful only in certain applications. For example, applications such as Oracle (ERP apps) are licensed by the number of CPUs per instance. You can bind Oracle to specific CPUs to avoid licensing problems. This is really useful on large servers with 4 or 8 CPUs.

Setting processor affinity for a certain task or process using taskset command

taskset is used to set or retrieve the CPU affinity of a running process given its PID, or to launch a new COMMAND with a given CPU affinity. However, taskset is not installed by default. You need to install the schedutils (Linux scheduler utilities) package.
Install schedutils

Debian Linux:
# apt-get install schedutils
Red Hat Enterprise Linux:
# up2date schedutils
OR
# rpm -ivh schedutils*
Under the latest versions of Debian / Ubuntu Linux, taskset is installed by default as part of the util-linux package.

The CPU affinity is represented as a bitmask, with the lowest order bit corresponding to the first logical CPU and the highest order bit corresponding to the last logical CPU. For example:

* 0x00000001 is processor #0 (1st processor)
* 0x00000003 is processors #0 and #1
* 0x00000004 is processor #2 (3rd processor)
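
As an illustration of how such masks are formed, the sketch below sets one bit per CPU number and prints the result in the same hexadecimal form. The function name cpu_mask is made up for this example and is not part of taskset:

```shell
# Hypothetical helper: build a taskset-style bitmask from a list of CPU numbers.
cpu_mask() {
    mask=0
    for cpu in "$@"; do
        # Bit N of the mask corresponds to logical CPU N.
        mask=$(( $mask | (1 << $cpu) ))
    done
    printf '0x%08x\n' "$mask"
}

cpu_mask 0      # prints 0x00000001
cpu_mask 0 1    # prints 0x00000003
cpu_mask 2      # prints 0x00000004
```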

To set the processor affinity of process 13545 to processor #0 (1st processor), type the following command:
# taskset -p 0x00000001 13545
If you find a bitmask hard to use, you can specify a numerical list of processors instead of a bitmask using the -c flag:
# taskset -cp 1 13545
# taskset -cp 3,4 13545
Where,

* -p : Operate on an existing PID and not launch a new task (default is to launch a new task)
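
A bitmask and a -c list describe the same set of CPUs. The sketch below unpacks a mask into the comma-separated list form. It is only an illustration, and mask_to_cpus is a hypothetical helper name, not a taskset feature:

```shell
# Hypothetical helper: turn a bitmask into a comma-separated CPU list.
mask_to_cpus() {
    mask=$(( $1 ))   # accepts decimal or 0x-prefixed hexadecimal
    cpu=0
    list=""
    while [ "$mask" -gt 0 ]; do
        if [ $(( $mask & 1 )) -eq 1 ]; then
            # Append the CPU number, adding a comma only after the first entry.
            list="${list:+$list,}$cpu"
        fi
        mask=$(( $mask >> 1 ))
        cpu=$(( $cpu + 1 ))
    done
    echo "$list"
}

mask_to_cpus 0x00000018   # prints 3,4
```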

---------------------------
Linux audit files to see who made changes to a file

by Vivek Gite

This is one of the key questions many new sysadmins ask:

How do I audit file events such as read / write etc? How can I use audit to see who changed a file in Linux?

The answer is to use the 2.6 kernel’s audit system. Modern Linux kernels (2.6.x) come with the auditd daemon, which is responsible for writing audit records to disk. During startup, this daemon reads the rules in /etc/audit.rules. You can open the /etc/audit.rules file and make changes, such as setting the audit log file location and other options. The default file is good enough to get started with auditd.

In order to use the audit facility you need the following utilities:
=> auditctl - a command to assist in controlling the kernel’s audit system. You can get status, and add or delete rules in the kernel audit system. Setting a watch on a file is accomplished using this command.

=> ausearch - a command that can query the audit daemon logs for events based on different search criteria.

=> aureport - a tool that produces summary reports of the audit system logs.

Note that all of the following instructions were tested on CentOS 4.x, Fedora Core, and RHEL 4/5 Linux.
Task: install audit package

The audit package contains the user space utilities for storing and searching the audit records generated by the audit subsystem in the Linux 2.6 kernel. CentOS/Red Hat and Fedora Core include the audit rpm package. Use the yum or up2date command to install the package:
# yum install audit
or
# up2date audit

Auto start auditd service on boot
# ntsysv
OR
# chkconfig auditd on
Now start service:
# /etc/init.d/auditd start
How do I set a watch on a file for auditing?

Let us say you would like to audit the /etc/passwd file. Type the command as follows:
# auditctl -w /etc/passwd -p war -k password-file

Where,

* -w /etc/passwd : Insert a watch for the file system object at given path i.e. watch file called /etc/passwd
* -p war : Set permissions filter for a file system watch. It can be r for read, w for write, x for execute, a for append.
* -k password-file : Set a filter key on the /etc/passwd watch. The password-file is a filterkey (a string of text that can be up to 31 bytes long). It uniquely identifies the audit records produced by the watch. Use the password-file string when searching the audit logs.

In short, you are monitoring (read as watching) the /etc/passwd file for any access (including through system calls) that performs a write, append, or read operation on it.

Wait for some time, or run the following commands as a normal user:
$ grep 'something' /etc/passwd
$ vi /etc/passwd

Following are more examples:
File System audit rules

Add a watch on "/etc/shadow" with the arbitrary filterkey "shadow-file" that generates records for "reads, writes, executes, and appends" on "shadow"
# auditctl -w /etc/shadow -k shadow-file -p rwxa
syscall audit rule

The next rule suppresses auditing for mount syscall exits
# auditctl -a exit,never -S mount
File system audit rule

Add a watch on "/tmp" with the filterkey "webserver-watch-tmp" that generates records for "executes" (good for a webserver)
# auditctl -w /tmp -p x -k webserver-watch-tmp
syscall audit rule using pid

To see all syscalls made by a program called sshd (pid - 1005):
# auditctl -a entry,always -S all -F pid=1005
How do I find out who changed or accessed a file /etc/passwd?

Use ausearch command as follows:
# ausearch -f /etc/passwd
OR
# ausearch -f /etc/passwd | less
OR
# ausearch -f /etc/passwd -i | less
Where,

* -f /etc/passwd : Only search for this file
* -i : Interpret numeric entities into text. For example, uid is converted to account name.

Output:

----
type=PATH msg=audit(03/16/2007 14:52:59.985:55) : name=/etc/passwd flags=follow,open inode=23087346 dev=08:02 mode=file,644 ouid=root ogid=root rdev=00:00
type=CWD msg=audit(03/16/2007 14:52:59.985:55) : cwd=/webroot/home/lighttpd
type=FS_INODE msg=audit(03/16/2007 14:52:59.985:55) : inode=23087346 inode_uid=root inode_gid=root inode_dev=08:02 inode_rdev=00:00
type=FS_WATCH msg=audit(03/16/2007 14:52:59.985:55) : watch_inode=23087346 watch=passwd filterkey=password-file perm=read,write,append perm_mask=read
type=SYSCALL msg=audit(03/16/2007 14:52:59.985:55) : arch=x86_64 syscall=open success=yes exit=3 a0=7fbffffcb4 a1=0 a2=2 a3=6171d0 items=1 pid=12551 auid=unknown(4294967295) uid=lighttpd gid=lighttpd euid=lighttpd suid=lighttpd fsuid=lighttpd egid=lighttpd sgid=lighttpd fsgid=lighttpd comm=grep exe=/bin/grep

Let us try to understand output

* audit(03/16/2007 14:52:59.985:55) : Audit log time
* uid=lighttpd gid=lighttpd : User and group IDs, normally in numerical format. Passing the -i option to the command converts most numeric data to human-readable form. In our example, the user lighttpd used the grep command to open the file
* exe="/bin/grep" : Command grep used to access /etc/passwd file
* perm_mask=read : File was open for read operation
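
To pull individual fields out of a record like this in a script, one simple approach is to split the line on spaces and match on the key name. The sketch below uses a shortened copy of the SYSCALL line above. The helper name audit_field is hypothetical, and it assumes field values contain no spaces:

```shell
# A SYSCALL record, abbreviated from the sample ausearch -i output.
record='type=SYSCALL msg=audit(03/16/2007 14:52:59.985:55) : arch=x86_64 syscall=open success=yes exit=3 pid=12551 uid=lighttpd gid=lighttpd comm=grep exe=/bin/grep'

# Hypothetical helper: print the value of one key=value field from a record.
# Usage: audit_field KEY RECORD
audit_field() {
    # One token per line, then keep only the token that starts with "KEY=".
    printf '%s\n' "$2" | tr ' ' '\n' | sed -n "s/^$1=//p" | head -n 1
}

audit_field uid "$record"       # prints lighttpd
audit_field exe "$record"       # prints /bin/grep
audit_field syscall "$record"   # prints open
```

Because the match is anchored at the start of each token, asking for uid does not pick up euid, suid, or auid.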

So from the log files you can clearly see who read the file using grep or made changes to it using the vi/vim text editor. The log provides tons of other information. You need to read the man pages and documentation to understand the raw log format.
Other useful examples

Search for events with date and time stamps. If the date is omitted, today is assumed. If the time is omitted, now is assumed. Use 24 hour clock time rather than AM or PM to specify time. An example date is 10/24/05. An example of time is 18:00:00.
# ausearch -ts today -k password-file
# ausearch -ts 3/12/07 -k password-file

Search for an event matching the given executable name using the -x option. For example, find out who has accessed /etc/passwd using the rm command:
# ausearch -ts today -k password-file -x rm
# ausearch -ts 3/12/07 -k password-file -x rm

Search for an event with the given user ID (UID). For example, find out whether user vivek (uid 506) tried to open /etc/passwd:
# ausearch -ts today -k password-file -x rm -ui 506
# ausearch -k password-file -ui 506

refer:
http://www.cyberciti.biz/tips/linux-audit-files-to-see-who-made-changes-to-a-file.html

Monday, December 28, 2009