
OProfile -- Linux Profiling Tool

Novell Cool Solutions: Feature
By Darren R. Davis


Posted: 23 Mar 2005
 

So, you have created that brand new killer Linux application for SUSE Linux and you're running it on the latest hardware. While testing, you hypothesize that your recently purchased server is capable of giving you a little more out of your application. How do you go about figuring out where you can gain more performance?

Well, generally you would go grab a profiling tool. But, in the past, you would have to build a version of your application with profiling turned on in the compiler. What if you have a system with a collection of running applications and you don't have the option of building with profiling? What if you want to profile a collection of applications and their impact on the server? Well, enter the profiling tool OProfile.

OProfile consists of a loadable kernel module and a system daemon process that collects sample data from a running system. It also has several post-profiling tools for taking the data and turning it into useful information about your system. OProfile takes advantage of the hardware performance counters available in today's microprocessors to enable profiling of the entire system. OProfile is capable of profiling all code including the kernel, kernel modules, kernel interrupt handlers, system shared libraries, and the applications.

Features outlined from the OProfile web site ( http://oprofile.sourceforge.net/ ):

Unobtrusive

No special recompilations, wrapper libraries or the like are necessary. Even debug symbols (-g option to gcc) are not necessary unless you want to produce annotated source.
No kernel patch is needed - just insert the module.

System-wide profiling

All code running on the system is profiled, enabling analysis of system performance.

Performance counter support

Enables collection of various low-level data, and association with particular sections of code.

Call-graph support

With an x86 2.6 kernel, OProfile can provide gprof-style call-graph profiling data.

Low overhead

OProfile has a typical overhead of 1-8%, dependent on sampling frequency and workload.

Post-profile analysis

Profile data can be produced at function-level or instruction-level detail. Source trees annotated with profile information can be created. A hit list of applications and functions that take the most time across the whole system can be produced.

System support

OProfile works across a range of CPUs, including the Intel range, AMD's Athlon and AMD64 processor range, the Alpha, and more. OProfile will work against almost any 2.2, 2.4 or 2.6 kernel, and works on both UP and SMP systems from desktops to the scariest NUMAQ boxes.

OK, you're sold on using OProfile and now you're ready to get started profiling your application or system. First, you need to make sure that the OProfile package has been installed. SUSE Linux provides an OProfile package ready to be used. Using YaST, go to the Install and Remove Software option. Type oprofile in the search field and click the search button. Select the oprofile package for installation, and install it. Now you're ready to get started.

First, we need to be the root user to use OProfile. So, either log in as the root user, or use the su command to switch to the root user. Next, we need to set up OProfile. We have two options: we can profile our application either with or without the Linux kernel. If we want to profile with the Linux kernel, we need to reference the uncompressed kernel image file in the /boot directory. To do that, we do the following:

	# opcontrol --vmlinux=/boot/vmlinux-`uname -r`

Now, most likely, your /boot/vmlinux file is compressed (you can tell by checking for .gz file extension on vmlinux image). The opcontrol command needs the uncompressed version of your kernel image, so you may need to run gunzip on your vmlinux file.
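If you would rather not modify the file in /boot, one option is to decompress a copy somewhere else and point opcontrol at that. The function below is only a sketch of that idea: the name find_vmlinux and its three parameters are made up for this illustration, and it assumes the image follows the vmlinux-`uname -r` naming used above.

```shell
# find_vmlinux BOOT_DIR RELEASE OUT_DIR  (hypothetical helper, not part of OProfile)
# Prints the path of an uncompressed kernel image for RELEASE.
# If only a .gz copy exists, a decompressed copy is written to OUT_DIR.
find_vmlinux() {
    fv_boot=$1      # e.g. /boot
    fv_release=$2   # e.g. "$(uname -r)"
    fv_out=$3       # scratch directory for the decompressed copy
    fv_image="$fv_boot/vmlinux-$fv_release"

    if [ -f "$fv_image" ]; then
        # Already uncompressed: use it as is.
        printf '%s\n' "$fv_image"
    elif [ -f "$fv_image.gz" ]; then
        # gunzip -c writes to stdout, so the original .gz file is left untouched.
        gunzip -c "$fv_image.gz" > "$fv_out/vmlinux-$fv_release" || return 1
        printf '%s\n' "$fv_out/vmlinux-$fv_release"
    else
        echo "no vmlinux image for $fv_release in $fv_boot" >&2
        return 1
    fi
}
```

It could then be used as: opcontrol --vmlinux="$(find_vmlinux /boot "$(uname -r)" /tmp)"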

Now, if you don't want to profile the Linux kernel, you need to do this:

	# opcontrol --no-vmlinux

So, let's assume we are going to profile without the Linux kernel. We are now ready to start the oprofile daemon process to start collecting profile data. We do this with:

	
	# opcontrol --start

This begins the data collection process. It is now time to start running your application. Remember, we did not need to build our application with the gcc profiling option '-pg'. After running your application through its paces, we need to dump the collected profile data. There are a couple of ways to do this. We can either shut down profiling altogether, or we can just tell OProfile to dump the data collected so far while it continues to collect more.

To shut down OProfile:

	# opcontrol --shutdown

To just dump the collected data:

	# opcontrol --dump

The profile data is written to the directory /var/lib/oprofile/samples.

Now if for some reason you want to clear the profile data, at any time you can just do a reset with:

	# opcontrol --reset
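The commands above can be strung together into a small wrapper. This is a sketch rather than anything shipped with OProfile: profile_session and the OPCONTROL variable are names invented here, only the opcontrol options already shown in this article are used, and it assumes a reset before starting is acceptable (it discards samples from earlier runs). It must be run as root.

```shell
# OPCONTROL can be overridden (for example OPCONTROL=echo) for a dry run
# that only prints the options that would be passed.
OPCONTROL=${OPCONTROL:-opcontrol}

# profile_session CMD [ARGS...]  (hypothetical helper)
# Profiles one command without the kernel, then flushes the samples.
profile_session() {
    "$OPCONTROL" --no-vmlinux || return 1
    "$OPCONTROL" --reset      || return 1   # discard samples from earlier runs
    "$OPCONTROL" --start      || return 1
    "$@"                                    # the workload being profiled
    ps_status=$?
    "$OPCONTROL" --dump                     # write collected samples to disk
    "$OPCONTROL" --shutdown                 # stop profiling and the daemon
    return "$ps_status"                     # report the workload's exit status
}
```

For example: profile_session ./myapp --some-option (where ./myapp stands in for your own program).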

That is pretty much the quick way to get started. Remember though, OProfile is quite a sophisticated tool, so there are many more options to opcontrol for controlling your profiling session. For example, if you want to take advantage of the performance counter registers in your CPU to watch a couple of specific events, you would do something like the following (event names depend on the CPU; op_help lists the ones available to you):

	# opcontrol --event=CPU_CLK_UNHALTED:400000 \
		--event=DATA_MEM_REFS:10000
	# opcontrol --vmlinux=/boot/vmlinux-2.6.8-24.11
	# opcontrol --start

I recommend reading the OProfile manual at:
http://oprofile.sourceforge.net/doc/index.html

OK, now that we have collected all this data on our running application, how do we view it? We use the opreport command to generate a report:

	# opreport

	CPU: CPU with timer interrupt, speed 0 Mhz (estimated)
	Profiling through timer interrupt
	          TIMER:0|
	  samples|      %|
	------------------
	     3122 98.5791 no-vmlinux
	       16  0.5052 libc.so.6
	        8  0.2526 bash
	        4  0.1263 ld-2.3.3.so
	        4  0.1263 libgdk_pixbuf-2.0.so.0.400.9
	        3  0.0947 libglib-2.0.so.0.400.6
	        2  0.0632 libgobject-2.0.so.0.400.6
	        2  0.0632 libgtk-x11-2.0.so.0.400.9
	        1  0.0316 grep
	        1  0.0316 libpthread.so.0
	        1  0.0316 Xorg
	        1  0.0316 libX11.so.6.2
	        1  0.0316 libsw645li.so
	        1  0.0316 libvclplug_gtk645li.so

The default for opreport is to give a summary view. You can get a very detailed report by using the '-l' option to opreport. The output is too long to include in this article, so I will leave it as an exercise for the reader. When you get a chance, try:

	# opreport -l /boot/vmlinux-`uname -r`

OK, now that we know the basics of controlling OProfile, it is time for us to profile a real application. And where would any good developer article on open source be without the essential application, Hello World? Now, I could write that application quickly, but I am going to take advantage of the open source community and use the GNU version of this program.

Yes, that is right, there is a GNU version of this program at:

http://www.gnu.org/software/hello/hello.html

Now this is no simple version of hello world. It is capable of taking command line arguments to modify its behavior, and it supports greetings in many different languages based on your chosen language environment. So after downloading the source from the provided URL, we need to build our application. I grabbed the hello-2.1.1 version, so let's extract the tar file, configure the application, and make it.

	~> tar -xzvf hello-2.1.1.tar.gz
	~> cd hello-2.1.1/
	~/hello-2.1.1> ./configure
	...
	~/hello-2.1.1> make
	...
	~>

Provided that you have the developer tools installed (gcc, etc.), you should now have the built hello application in ~/hello-2.1.1/src/hello.

Now we are ready to profile this application, though it is not likely to take a huge amount of system resources. So, let's set up OProfile. I am not going to include the kernel during profiling. Also, in order to even show up on the radar, I am going to run the hello world application 100 times with a quick shell script.
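The script itself is not reproduced in this article. A loop along the following lines would do the job; run_n is a name chosen here for illustration, not the script actually used for the run below.

```shell
# run_n COUNT CMD [ARGS...]  (hypothetical helper)
# Runs CMD COUNT times, one after another.
run_n() {
    rn_count=$1
    shift                       # the remaining arguments are the command to run
    rn_i=0
    while [ "$rn_i" -lt "$rn_count" ]; do
        "$@"
        rn_i=$((rn_i + 1))
    done
}
```

For the test in this article that would be: run_n 100 ~/hello-2.1.1/src/hello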

Let's get started:

	# opcontrol --no-vmlinux
	# opcontrol --start
	Profiler running.
	# ~/hello-2.1.1/src/hello
	Hello, world!

	... 100 times ...

	# opcontrol --stop
	Stopping profiling.
	# opcontrol --dump
	# opreport
	CPU: CPU with timer interrupt, speed 0 MHz (estimated)
	Profiling through timer interrupt
	          TIMER:0|
	  samples|      %|
	------------------
	   235450 97.9601 no-vmlinux
	     1710  0.7115 libpixmap.so
	      677  0.2817 libgdk_pixbuf-2.0.so.0.400.9
	      571  0.2376 libc.so.6
	      388  0.1614 Xorg
	      332  0.1381 libglib-2.0.so.0.400.6
	      283  0.1177 libvte.so.4.1.1
	      233  0.0969 libgobject-2.0.so.0.400.6
	      136  0.0566 libXft.so.2.1.2
	      125  0.0520 libpthread.so.0
	       98  0.0408 libgdk-x11-2.0.so.0.400.9
	       94  0.0391 ld-2.3.3.so
	       47  0.0196 opreport
	       35  0.0146 libstdc++.so.5.0.7
	       34  0.0141 bash
	       34  0.0141 libgtk-x11-2.0.so.0.400.9
	       24  0.0100 oprofiled
	       22  0.0092 libX11.so.6.2
	       18  0.0075 libXrender.so.1.2.2
	        9  0.0037 libqt-mt.so.3.3.3
	        5  0.0021 libreadline.so.5.0
	        4  0.0017 gnome-terminal
	        4  0.0017 libkdecore.so.4.2.0
	        4  0.0017 libXt.so.6.0
	        3  0.0012 libgthread-2.0.so.0.400.6
	        2 8.3e-04 ps
	        2 8.3e-04 hello
	        2 8.3e-04 libcppu.so.3.1.0
	        2 8.3e-04 libvclplug_gtk645li.so
	        1 4.2e-04 grep
	        1 4.2e-04 gnome-smproxy
	        1 4.2e-04 metacity
	        1 4.2e-04 libwnck-1.so.4.9.0
	        1 4.2e-04 libdtransX11645li.so

	# opcontrol --shutdown
	Killing daemon.

So, from the output of opreport, we can see that even after running hello 100 times, our application takes a very small share of system resources. You can also see that the run exercised the X Window System (since I ran the application in a gnome-terminal) more than the application used itself. Maybe we should optimize the X Window System? ;-)
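To put a number on "very small", the summary report can be post-processed with awk. The sketch below assumes the three-column summary layout shown above (samples, percent, image name); image_samples is a name invented for this example, and other opreport layouts (for instance with several event columns) would need a different filter.

```shell
# image_samples NAME  (hypothetical helper)
# Reads an opreport summary on stdin and prints
# "<samples for NAME> <total samples>".
image_samples() {
    awk -v name="$1" '
        # Data rows look like "<samples> <percent> <image>"; header lines
        # do not start with a plain number, so they are skipped.
        $1 ~ /^[0-9]+$/ && NF == 3 {
            total += $1
            if ($3 == name) hits += $1
        }
        END { printf "%d %d\n", hits, total }
    '
}
```

Usage would be: opreport | image_samples hello. Fed the report shown above, it should print 2 240353, that is, two samples out of roughly 240,000.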

There are many tools provided by OProfile that can help you with your profiling task. Here is a quick summary of them:

op_help

This utility lists the available events you can profile and short descriptions of them.

opcontrol

Used for controlling the OProfile data collection.

opreport

Used for retrieving the profile data.

opannotate

Used for producing annotated source, assembly, or mixed source and assembly. Application must be compiled with debugging symbols to get source level annotation.

opstack

Used for getting call-graph profile output. This requires x86 and the Linux 2.6 kernel (SLES 9, SUSE Pro 9.2, Novell Linux Desktop).

opgprof

Used for getting gprof-style data file output for a binary. Used with gprof -p.

oparchive

Used to collect executables, debug information, and sample files into an archive. The files are copied into an archive that can be sent to another person or machine for further analysis.

op_import

Used to convert sample database files from a foreign binary format (ABI) to the native format. This is useful only when moving sample files between hosts for analysis on platforms other than the one used for collection.

So, given that quick introduction, how does this "bad boy" of the Linux profiling world do its amazing work? It uses Voodoo. OK, maybe not, it just sounds like Voodoo. OProfile is a statistical continuous profiler. It continuously samples the saved program counter from an interrupt handler and converts the runtime program counter into useful data for the system profiler. To do this, it takes a stream of sampled values, along with information on which task was running when the interrupt occurred, and converts each value into a file offset against a particular binary file.

What does it mean to be a statistical continuous profiler? Think of a simplified system where only two tasks are running. Task A takes only 1 cycle to execute, compared to Task B, which takes 99 cycles. If we run our system for 100 cycles and interrupt it at a regular event, say every clock cycle, we should see that 1% of the time we were in Task A and 99% of the time we were in Task B. OProfile can analyze many types of system events, since you can set the CPU performance counters to count events such as cache misses or pageouts and take a sample each time a set number of them has occurred. Many modern CPUs have multiple counters to take advantage of. Even if the CPU does not have these performance counters, we can take advantage of the kernel timer interrupt to drive a simple time-based measurement.
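The two-task thought experiment can be played out in a few lines of shell. This is a toy model only and has nothing to do with how OProfile is implemented; simulate_sampling and its optional third argument (the number of cycles between samples) are inventions for this illustration.

```shell
# simulate_sampling A_CYCLES B_CYCLES [STEP]  (toy model, hypothetical name)
# Task A owns the first A_CYCLES cycles, task B the next B_CYCLES cycles.
# A sample is taken every STEP cycles (default 1), starting at cycle 0,
# and the share of samples that landed in each task is printed.
simulate_sampling() {
    ss_a=$1
    ss_b=$2
    ss_step=${3:-1}
    ss_total=$((ss_a + ss_b))
    ss_hits_a=0
    ss_hits_b=0
    ss_cycle=0
    while [ "$ss_cycle" -lt "$ss_total" ]; do
        # The "interrupt": record which task owns the current cycle.
        if [ "$ss_cycle" -lt "$ss_a" ]; then
            ss_hits_a=$((ss_hits_a + 1))
        else
            ss_hits_b=$((ss_hits_b + 1))
        fi
        ss_cycle=$((ss_cycle + ss_step))
    done
    ss_samples=$((ss_hits_a + ss_hits_b))
    echo "A: $ss_hits_a samples ($((100 * ss_hits_a / ss_samples))%)"
    echo "B: $ss_hits_b samples ($((100 * ss_hits_b / ss_samples))%)"
}
```

Running simulate_sampling 1 99 reports 1% for A and 99% for B, matching the example. With a coarser interval, simulate_sampling 1 99 10 takes only ten samples and reports 10% and 90%, which shows why a statistical profile becomes more trustworthy as the number of samples grows.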

I recommend that the developer take the time to learn this powerful tool and to profile a set of applications to understand the output from OProfile. This is an excellent tool to have in your toolbox when you quickly need to determine what is impacting your application.

