GSoC’13 Summary-1: Profiling Python and its C-level code.

Arink Verma
Sep 23, 2013


Small NumPy arrays are very similar to Python scalars but NumPy incurs a fair amount of extra overhead for simple operations. For large arrays this doesn’t matter, but for code that manipulates a lot of small pieces of data, it can be a serious bottleneck.
For example (with numpy imported as np):

In [1]: x = 1.0
In [2]: numpy_x = np.asarray(x)
In [3]: timeit x + x
10000000 loops, best of 3: 61 ns per loop
In [4]: timeit numpy_x + numpy_x
1000000 loops, best of 3: 1.66 us per loop

This project involved

  • profiling simple operations like the above
  • determining possible bottlenecks
  • devising improved algorithms to solve them, with the goal of getting the NumPy time as close as possible to the Python time.

Profiling tools

The first step in finding bottlenecks is profiling for time or space. During the project, I used a few tools to profile and to visualize data on NumPy’s execution flow.

Google profiling tools (gperftools)

This is a suite of tools provided by Google. It includes TCMalloc, a heap checker, a heap profiler, and a CPU profiler. Since the project’s aim was to reduce execution time, the CPU profiler was used.

Setting up Gperftools

The following steps set up a Python C-level profiler on Ubuntu 13.04 (for options on other systems, see [1]).

  1. Build gperftools from source. Clone the SVN repository from http://gperftools.googlecode.com/svn/trunk/
  2. To build gperftools checked out from the Subversion repository, you need autoconf, automake and libtool installed.
  3. First, run the ./autogen.sh script, which generates ./configure and other files. Then run ./configure.
  4. Run ‘make check’ to execute any self-tests that come with the package. This step is optional but recommended.
  5. Once the tests pass, run ‘sudo make install’ to install the programs along with any data files and documentation. (The full command sequence is sketched below.)
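
Putting these steps together, the build looks roughly like this (a sketch for Ubuntu 13.04; the local checkout directory name is arbitrary):

$svn checkout http://gperftools.googlecode.com/svn/trunk/ gperftools
$cd gperftools
$./autogen.sh
$./configure
$make
$make check # optional self-tests
$sudo make install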

Running CPU profiler

I invoked the profiler manually before running the sample code. Suppose the Python code to be profiled is in the file num.py.

$CPUPROFILE=num.py.prof LD_PRELOAD=/usr/lib/libprofiler.so python num.py

Alternatively, include the profiler in the code as follows:

import ctypes
import timeit

# Load gperftools' CPU profiler and write samples to num.py.prof
profiler = ctypes.CDLL("libprofiler.so")
profiler.ProfilerStart("num.py.prof")
timeit.timeit('x + y', number=10000000,
              setup='import numpy as np; x = np.asarray(1.0); y = np.asarray(2.0)')
profiler.ProfilerStop()

To analyze the stats, pass pprof the profiled binary (the Python interpreter) and the profile file:

$pprof --gv `which python` num.py.prof

Callgraph generated by gperftools; each block represents a method with its local and cumulative percentages.
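
If the graphical viewer is not available, the same profile can also be dumped as a flat text listing (a sketch under the same assumptions as above):

$pprof --text `which python` num.py.prof

Each row shows the samples attributed to a function locally and cumulatively, the same numbers the callgraph blocks display.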

OProfile

OProfile is a system-wide profiler for Linux systems, capable of profiling all running code at low overhead. OProfile is released under the GNU GPL.

Setting up OProfile

  1. Access the source via Git: git clone git://git.code.sf.net/p/oprofile/oprofile
  2. Automake and autoconf are needed.
  3. Run autogen.sh before attempting to build as normal (see the sketch below).
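
A rough command sequence, under the same assumptions (autoconf and automake installed, default install prefix):

$git clone git://git.code.sf.net/p/oprofile/oprofile
$cd oprofile
$./autogen.sh
$./configure
$make
$sudo make install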

Running CPU profiler

$opcontrol --callgraph=16
$opcontrol --start
$python num.py
$opcontrol --stop
$opcontrol --dump
$opreport -cgf | gprof2dot.py -f oprofile | dot -Tpng -o output.png
The call graph is visualized with the help of the gprof2dot.py script.

Perf from Linux-tools

Perf provides rich, generalized abstractions over hardware-specific capabilities. Among others, it provides per-task, per-CPU, and per-workload counters, sampling on top of these, and source code event annotation.

Setting up perf

$sudo apt-get install linux-tools-common
$sudo apt-get install linux-tools-
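
With perf installed, the counter abstraction mentioned above can be exercised directly on the benchmark script (a minimal sketch; num.py is the same script used earlier):

$perf stat python num.py

perf stat prints totals for events such as cycles, instructions, and cache misses for the whole run.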

Running the profiler and visualizing data as a flame graph

$perf record -a -g -F 1000 ./num.py
$perf script | ./stackcollapse-perf.pl > out.perf-folded
$cat out.perf-folded | ./flamegraph.pl > perf-numpy.svg

The first command runs perf in sampling mode (polling) at 1000 Hertz (-F 1000) across all CPUs (-a), capturing stack traces so that a call graph (-g) of function ancestry can be generated later. The samples are saved in a perf.data file.
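
The same perf.data file can also be browsed interactively instead of being rendered as a flame graph (a minimal sketch; perf report reads ./perf.data from the current directory by default):

$perf report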

The scripts used to generate the flame graph above (stackcollapse-perf.pl and flamegraph.pl) are at https://github.com/brendangregg/FlameGraph.

Originally published at http://arinkverma1.wordpress.com on September 23, 2013.
