GSoC’13 Summary-1: Profiling Python and its C-level code.
The first step in finding bottlenecks is profiling, for time or for space. During the project I used several tools to profile NumPy’s execution flow and to visualize the resulting data.
Small NumPy arrays are very similar to Python scalars, but NumPy incurs a fair amount of extra overhead for simple operations. For large arrays this doesn’t matter, but for code that manipulates many small pieces of data it can be a serious bottleneck.
For example:
In [1]: x = 1.0
In [2]: numpy_x = np.asarray(x)
In [3]: timeit x + x
10000000 loops, best of 3: 61 ns per loop
In [4]: timeit numpy_x + numpy_x
1000000 loops, best of 3: 1.66 us per loop
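The same comparison can be scripted with the timeit module instead of IPython’s timeit magic — a minimal sketch; the absolute numbers will differ from machine to machine, but the 0-d array should consistently be the slower of the two:

```python
import timeit
import numpy as np

# A plain Python float and the equivalent 0-d NumPy array.
x = 1.0
numpy_x = np.asarray(x)

n = 100_000

# Time the identical addition on each object.
py_time = timeit.timeit("x + x", globals={"x": x}, number=n)
np_time = timeit.timeit("x + x", globals={"x": numpy_x}, number=n)

print(f"python float: {py_time / n * 1e9:.0f} ns/op")
print(f"numpy 0-d:    {np_time / n * 1e9:.0f} ns/op")
```

The gap between the two numbers is the per-operation overhead this project set out to shrink.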
This project involved:
- profiling simple operations like the above
- determining possible bottlenecks
- devising improved algorithms to solve them, with the goal of getting the NumPy time as close as possible to the Python time.
Profiling tools
Google profiling tool
This is a suite of tools provided by Google. It includes TCMalloc, a heap checker, a heap profiler, and a CPU profiler. Since the project’s goal was to reduce time, the CPU profiler was used.
Setting up Gperftools
Following are the steps used to set up a C-level profiler for Python on Ubuntu 13.04. (For options on other systems, see [1].)
- Build it from source: check out the svn repository from http://gperftools.googlecode.com/svn/trunk/
- To build gperftools from a subversion checkout, you need autoconf, automake, and libtool installed.
- First, run the ./autogen.sh script, which generates ./configure and other files. Then run ./configure
- Run ‘make check’ to execute any self-tests that come with the package. This step is optional but recommended.
- Once all the tests pass, type ‘sudo make install’ to install the programs along with any data files and documentation.
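The steps above can be condensed into a shell session (assuming a Debian-style system for the prerequisites; the checkout URL is the one from the post):

```shell
# Prerequisites for building from a source checkout
sudo apt-get install autoconf automake libtool subversion

# Check out gperftools from the svn trunk
svn checkout http://gperftools.googlecode.com/svn/trunk/ gperftools
cd gperftools

# Generate ./configure, then build, self-test, and install
./autogen.sh
./configure
make
make check          # optional, but recommended
sudo make install
```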
Running CPU profiler
I invoked the profiler manually before running the sample code. Suppose the Python code to be profiled is in the file num.py:
$CPUPROFILE=num.py.prof LD_PRELOAD=/usr/lib/libprofiler.so python num.py
Alternatively, start the profiler from within the code as follows:
import ctypes
import timeit

# Load the profiler library and bracket the code of interest
profiler = ctypes.CDLL("libprofiler.so")
profiler.ProfilerStart("num.py.prof")
timeit.timeit('x+y', number=10000000,
              setup='import numpy as np; x = np.asarray(1.0); y = np.asarray(2.0)')
profiler.ProfilerStop()
To analyze the stats, pass pprof the binary that was actually profiled (the Python interpreter, not the script) along with the profile:
$pprof --gv $(which python) num.py.prof
OProfile
OProfile is a system-wide profiler for Linux systems, capable of profiling all running code at low overhead. OProfile is released under the GNU GPL.
Setting up OProfile
- Access the source via Git: git clone git://git.code.sf.net/p/oprofile/oprofile
- Automake and autoconf are needed.
- Run autogen.sh before attempting to build as normal.
Running CPU profiler
$opcontrol --callgraph=16
$opcontrol --start
$python num.py
$opcontrol --stop
$opcontrol --dump
$opreport -cgf | gprof2dot.py -f oprofile | dot -Tpng -o output.png
Perf from Linux-tools
Perf provides rich generalized abstractions over hardware-specific capabilities. Among others, it provides per task, per CPU, and per-workload counters, sampling on top of these, and source code event annotation.
Setting up perf
$sudo apt-get install linux-tools-common
$sudo apt-get install linux-tools-
Running Profiler and visualizing data as flame-graph
$perf record -a -g -F 1000 ./num.py
$perf script | ./stackcollapse-perf.pl > out.perf-folded
$cat out.perf-folded | ./flamegraph.pl > perf-numpy.svg
The first command runs perf in sampling mode (polling) at 1000 Hertz (-F 1000) across all CPUs (-a), capturing stack traces so that a call graph (-g) of function ancestry can be generated later. The samples are saved in a perf.data file.
Originally published at http://arinkverma1.wordpress.com on September 23, 2013.