Yocto Project Profiling and Tracing Manual

Scott Rifenbark

Scotty's Documentation Services, INC

<srifenbark@gmail.com>

Permission is granted to copy, distribute and/or modify this document under the terms of the Creative Commons Attribution-Share Alike 2.0 UK: England & Wales as published by Creative Commons.

Manual Notes

This version of the Yocto Project Profiling and Tracing Manual is for the 2.5.1 release of the Yocto Project. To be sure you have the latest version of the manual for this release, go to the Yocto Project documentation page and select the manual from that site. Manuals from the site are more up-to-date than manuals derived from the Yocto Project released TAR files.

If you located this manual through a web search, the version of the manual might not be the one you want (e.g. the search might have returned a manual much older than the Yocto Project version with which you are working). You can see all Yocto Project major releases by visiting the Releases page. If you need a version of this manual for a different Yocto Project release, visit the Yocto Project documentation page and select the manual set by using the "ACTIVE RELEASES DOCUMENTATION" or "DOCUMENTS ARCHIVE" pull-down menus.

To report any inaccuracies or problems with this manual, send an email to the Yocto Project discussion group at yocto@yoctoproject.com or log into the freenode #yocto channel.

Revision History

Revision 1.4      April 2013       Released with the Yocto Project 1.4 Release.
Revision 1.5      October 2013     Released with the Yocto Project 1.5 Release.
Revision 1.5.1    January 2014     Released with the Yocto Project 1.5.1 Release.
Revision 1.6      April 2014       Released with the Yocto Project 1.6 Release.
Revision 1.7      October 2014     Released with the Yocto Project 1.7 Release.
Revision 1.8      April 2015       Released with the Yocto Project 1.8 Release.
Revision 2.0      October 2015     Released with the Yocto Project 2.0 Release.
Revision 2.1      April 2016       Released with the Yocto Project 2.1 Release.
Revision 2.2      October 2016     Released with the Yocto Project 2.2 Release.
Revision 2.3      May 2017         Released with the Yocto Project 2.3 Release.
Revision 2.4      October 2017     Released with the Yocto Project 2.4 Release.
Revision 2.5      May 2018         Released with the Yocto Project 2.5 Release.
Revision 2.5.1    September 2018   The initial document released with the Yocto Project 2.5.1 Release.

Table of Contents

1. Yocto Project Profiling and Tracing Manual

1.1. Introduction

1.2. General Setup

2. Overall Architecture of the Linux Tracing and Profiling Tools

2.1. Architecture of the Tracing and Profiling Tools

3. Basic Usage (with examples) for each of the Yocto Tracing Tools

3.1. perf

3.1.1. Setup

3.1.2. Basic Usage

3.1.3. Documentation

3.2. ftrace

3.2.1. Setup

3.2.2. Basic ftrace usage

3.2.3. The 'trace events' Subsystem

3.2.4. trace-cmd/kernelshark

3.2.5. Documentation

3.3. systemtap

3.3.1. Setup

3.3.2. Running a Script on a Target

3.3.3. Documentation

3.4. Sysprof

3.4.1. Setup

3.4.2. Basic Usage

3.4.3. Documentation

3.5. LTTng (Linux Trace Toolkit, next generation)

3.5.1. Setup

3.5.2. Collecting and Viewing Traces

3.5.3. Documentation

3.6. blktrace

3.6.1. Setup

3.6.2. Basic Usage

3.6.3. Documentation

4. Real-World Examples

4.1. Slow Write Speed on Live Images

Chapter 1. Yocto Project Profiling and Tracing Manual

1.1. Introduction

Yocto bundles a number of tracing and profiling tools - this 'HOWTO' describes their basic usage and shows by example how to make use of them to examine application and system behavior.

The tools presented are for the most part completely open-ended and have quite good and/or extensive documentation of their own which can be used to solve just about any problem you might come across in Linux. Each section that describes a particular tool has links to that tool's documentation and website.

The purpose of this 'HOWTO' is to present a set of common and generally useful tracing and profiling idioms along with their application (as appropriate) to each tool, in the context of a general-purpose 'drill-down' methodology that can be applied to solving a large number (90%?) of problems. For help with more advanced usages and problems, please see the documentation and/or websites listed for each tool.

The final section of this 'HOWTO' is a collection of real-world examples which we'll be continually adding to as we solve more problems using the tools - feel free to add your own examples to the list!

1.2. General Setup

Most of the tools are available only in 'sdk' images or in images built after adding 'tools-profile' to your local.conf. So, in order to be able to access all of the tools described here, please first build and boot an 'sdk' image, for example:

$ bitbake core-image-sato-sdk

Alternatively, add 'tools-profile' to the EXTRA_IMAGE_FEATURES line in your local.conf:

EXTRA_IMAGE_FEATURES = "debug-tweaks tools-profile"

If you use the 'tools-profile' method, you don't need to build an sdk image - the tracing and profiling tools will be included in non-sdk images as well, for example:

$ bitbake core-image-sato

Note

By default, the Yocto build system strips symbols from the binaries it packages, which makes it difficult to use some of the tools.

You can prevent that by setting the INHIBIT_PACKAGE_STRIP variable to "1" in your local.conf when you build the image:

INHIBIT_PACKAGE_STRIP = "1"

The above setting will noticeably increase the size of your image.

If you've already built a stripped image, you can generate debug packages (xxx-dbg) which you can manually install as needed.

To generate debug info for packages, you can add dbg-pkgs to EXTRA_IMAGE_FEATURES in local.conf. For example:

EXTRA_IMAGE_FEATURES = "debug-tweaks tools-profile dbg-pkgs"

Additionally, in order to generate the right type of debuginfo, we also need to add the following to local.conf:

PACKAGE_DEBUG_SPLIT_STYLE = 'debug-file-directory'
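As a quick sanity check before starting a long build, the settings above can be verified with a small script. The following is only a minimal sketch: the helper names are hypothetical, and the pattern handles just simple one-line assignments, not the full BitBake syntax (appends, overrides, includes and so on are ignored).

```python
import re

# Matches simple one-line assignments such as NAME = "value" or NAME = 'value'.
# Deliberately naive: real BitBake configuration syntax is much richer.
ASSIGNMENT = re.compile(
    r"""^\s*(\w+)\s*(?:\?\?=|\?=|:=|\+=|=\+|\.=|=\.|=)\s*["']([^"']*)["']\s*$""",
    re.MULTILINE,
)


def parse_assignments(conf_text):
    """Return {variable: value}, keeping the last assignment seen for each name."""
    return {name: value for name, value in ASSIGNMENT.findall(conf_text)}


def profiling_setup_report(conf_text):
    """Summarize which of the settings discussed in this section are present."""
    settings = parse_assignments(conf_text)
    features = settings.get("EXTRA_IMAGE_FEATURES", "").split()
    return {
        "tools-profile": "tools-profile" in features,
        "dbg-pkgs": "dbg-pkgs" in features,
        "unstripped": settings.get("INHIBIT_PACKAGE_STRIP") == "1",
        "debug-split-style": settings.get("PACKAGE_DEBUG_SPLIT_STYLE"),
    }


# Sample text standing in for the contents of a local.conf file.
SAMPLE_CONF = '''
EXTRA_IMAGE_FEATURES = "debug-tweaks tools-profile dbg-pkgs"
PACKAGE_DEBUG_SPLIT_STYLE = 'debug-file-directory'
'''

print(profiling_setup_report(SAMPLE_CONF))
```

In practice you would pass in the text of your own conf/local.conf instead of SAMPLE_CONF.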

Chapter 2. Overall Architecture of the Linux Tracing and Profiling Tools

2.1. Architecture of the Tracing and Profiling Tools

It may seem surprising to see a section covering an 'overall architecture' for what seems to be a random collection of tracing tools that together make up the Linux tracing and profiling space. The fact is, however, that in recent years this seemingly disparate set of tools has started to converge on a 'core' set of underlying mechanisms:

static tracepoints

dynamic tracepoints

kprobes

uprobes

the perf_events subsystem

debugfs

Tying it Together: Rather than enumerating here how each tool makes use of these common mechanisms, textboxes like this will make note of the specific usages in each tool as they come up in the course of the text.

Chapter 3. Basic Usage (with examples) for each of the Yocto Tracing Tools

This chapter presents basic usage examples for each of the tracing tools.

3.1. perf

The 'perf' tool is the profiling and tracing tool that comes bundled with the Linux kernel.

Don't let the fact that it's part of the kernel fool you into thinking that it's only for tracing and profiling the kernel - you can indeed use it to trace and profile just the kernel, but you can also use it to profile specific applications separately (with or without kernel context), and you can also use it to trace and profile the kernel and all applications on the system simultaneously to gain a system-wide view of what's going on.

In many ways, perf aims to be a superset of all the tracing and profiling tools available in Linux today, including all the other tools covered in this HOWTO. The past couple of years have seen perf subsume a lot of the functionality of those other tools and, at the same time, those other tools have removed large portions of their previous functionality and replaced it with calls to the equivalent functionality now implemented by the perf subsystem. Extrapolation suggests that at some point those other tools will simply become completely redundant and go away; until then, we'll cover those other tools in these pages and in many cases show how the same things can be accomplished in perf and the other tools when it seems useful to do so.

The coverage below details some of the most common ways you'll likely want to apply the tool; full documentation can be found either within the tool itself or in the man pages at perf(1).

3.1.1. Setup

For this section, we'll assume you've already performed the basic setup outlined in the General Setup section.

In particular, you'll get the most mileage out of perf if you profile an image built with the following in your local.conf file:

INHIBIT_PACKAGE_STRIP = "1"

perf runs on the target system for the most part. You can archive profile data and copy it to the host for analysis, but for the rest of this document we assume you've ssh'ed to the target and will be running the perf commands there.

3.1.2. Basic Usage

The perf tool is pretty much self-documenting. To remind yourself of the available commands, simply type 'perf', which will show you basic usage along with the available perf subcommands:

root@crownbay:~# perf

 usage: perf [--version] [--help] COMMAND [ARGS]

 The most commonly used perf commands are:
   annotate        Read perf.data (created by perf record) and display annotated code
   archive         Create archive with object files with build-ids found in perf.data file
   bench           General framework for benchmark suites
   buildid-cache   Manage build-id cache.
   buildid-list    List the buildids in a perf.data file
   diff            Read two perf.data files and display the differential profile
   evlist          List the event names in a perf.data file
   inject          Filter to augment the events stream with additional information
   kmem            Tool to trace/measure kernel memory(slab) properties
   kvm             Tool to trace/measure kvm guest os
   list            List all symbolic event types
   lock            Analyze lock events
   probe           Define new dynamic tracepoints
   record          Run a command and record its profile into perf.data
   report          Read perf.data (created by perf record) and display the profile
   sched           Tool to trace/measure scheduler properties (latencies)
   script          Read perf.data (created by perf record) and display trace output
   stat            Run a command and gather performance counter statistics
   test            Runs sanity tests.
   timechart       Tool to visualize total system behavior during a workload
   top             System profiling tool.

 See 'perf help COMMAND' for more information on a specific command.

3.1.2.1. Using perf to do Basic Profiling

As a simple test case, we'll profile the 'wget' of a fairly large file, which is a minimally interesting case because it has both file and network I/O aspects, and at least in the case of standard Yocto images, it's implemented as part of busybox, so the methods we use to analyze it can be used in a very similar way to the whole host of supported busybox applets in Yocto.

root@crownbay:~# rm linux-2.6.19.2.tar.bz2; \
         wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2

The quickest and easiest way to get some basic overall data about what's going on for a particular workload is to profile it using 'perf stat'. 'perf stat' basically profiles using a few default counters and displays the summed counts at the end of the run:

root@crownbay:~# perf stat wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2
Connecting to downloads.yoctoproject.org (140.211.169.59:80)
linux-2.6.19.2.tar.b 100% |***************************************************| 41727k 0:00:00 ETA

 Performance counter stats for 'wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2':

      4597.223902 task-clock                #    0.077 CPUs utilized
            23568 context-switches          #    0.005 M/sec
               68 CPU-migrations            #    0.015 K/sec
              241 page-faults               #    0.052 K/sec
       3045817293 cycles                    #    0.663 GHz
  <not supported> stalled-cycles-frontend
  <not supported> stalled-cycles-backend
        858909167 instructions              #    0.28  insns per cycle
        165441165 branches                  #   35.987 M/sec
         19550329 branch-misses             #   11.82% of all branches

     59.836627620 seconds time elapsed
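The figures perf prints after each '#' are derived from the raw counts in the left-hand column. As an illustration only (this is a re-computation from the numbers shown above, not perf's own code), the following sketch reproduces several of them:

```python
# Raw values copied from the 'perf stat' output above.
task_clock_ms = 4597.223902   # CPU time consumed by the workload, in milliseconds
elapsed_s = 59.836627620      # wall-clock time of the run, in seconds
cycles = 3045817293
instructions = 858909167
branches = 165441165
branch_misses = 19550329


def derived_metrics():
    """Recompute the annotations perf shows to the right of the raw counts."""
    cpu_seconds = task_clock_ms / 1000.0
    return {
        # Fraction of wall-clock time the workload actually spent on a CPU.
        "cpus_utilized": cpu_seconds / elapsed_s,
        # Average clock rate while the workload was running.
        "ghz": cycles / cpu_seconds / 1e9,
        # Instructions retired per cycle.
        "insns_per_cycle": instructions / cycles,
        # Branch rate per second of CPU time, in millions.
        "branches_m_per_sec": branches / cpu_seconds / 1e6,
        # Share of branches that were mispredicted.
        "branch_miss_pct": 100.0 * branch_misses / branches,
    }


for name, value in derived_metrics().items():
    print(f"{name:>20}: {value:.3f}")
```

Read this way, the '0.077 CPUs utilized' line says the workload was on a CPU for under 8% of the roughly 60 seconds it took to run; for the rest of the time it was off the CPU.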

Many times such a simple-minded test doesn't yield much of interest, but sometimes it does (see Real-world Yocto bug (slow loop-mounted write speed)).

Also, note that 'perf stat' isn't restricted to a fixed set of counters - basically any event listed in the output of 'perf list' can be tallied by 'perf stat'. For example, suppose we wanted to see a summary of all the events related to kernel memory allocation/freeing along with cache hits and misses:

root@crownbay:~# perf stat -e kmem:* -e cache-references -e cache-misses wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2
Connecting to downloads.yoctoproject.org (140.211.169.59:80)
linux-2.6.19.2.tar.b 100% |***************************************************| 41727k 0:00:00 ETA

 Performance counter stats for 'wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2':

             5566 kmem:kmalloc
           125517 kmem:kmem_cache_alloc
                0 kmem:kmalloc_node
                0 kmem:kmem_cache_alloc_node
            34401 kmem:kfree
            69920 kmem:kmem_cache_free
              133 kmem:mm_page_free
               41 kmem:mm_page_free_batched
            11502 kmem:mm_page_alloc
            11375 kmem:mm_page_alloc_zone_locked
                0 kmem:mm_page_pcpu_drain
                0 kmem:mm_page_alloc_extfrag
         66848602 cache-references
          2917740 cache-misses              #    4.365 % of all cache refs

     44.831023415 seconds time elapsed
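A few simple totals can be pulled out of these counts by hand. The sketch below is just arithmetic on the numbers shown above; grouping kmalloc with kmem_cache_alloc (and kfree with kmem_cache_free) is our own choice for illustration, not something perf reports.

```python
# Counts copied from the 'perf stat -e kmem:* ...' output above.
counts = {
    "kmem:kmalloc": 5566,
    "kmem:kmem_cache_alloc": 125517,
    "kmem:kfree": 34401,
    "kmem:kmem_cache_free": 69920,
    "cache-references": 66848602,
    "cache-misses": 2917740,
}

# Total allocation and free events seen while the workload ran.
alloc_events = counts["kmem:kmalloc"] + counts["kmem:kmem_cache_alloc"]
free_events = counts["kmem:kfree"] + counts["kmem:kmem_cache_free"]

# The percentage perf prints next to cache-misses.
cache_miss_pct = 100.0 * counts["cache-misses"] / counts["cache-references"]

print("allocation events:", alloc_events)
print("free events:      ", free_events)
print(f"cache misses:      {cache_miss_pct:.3f} % of all cache refs")
```

The allocation and free totals need not match: memory allocated during the run may be freed after the counters stop, or in a context that isn't counted against the workload, so a difference here is not by itself evidence of a leak.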

So 'perf stat' gives us a nice easy way to get a quick overview of what might be happening for a set of events, but normally we'd need a little more detail in order to understand what's going on well enough to act on it.

To dive down into the next level of detail, we can use 'perf record'/'perf report', which will collect profiling data and present it to us using an interactive text-based UI (or simply as text if we specify --stdio to 'perf report').

As our first attempt at profiling this workload, we'll simply run 'perf record', handing it the workload we want to profile (everything after 'perf record' and any perf options we hand it - here none - will be executed in a new shell). perf collects samples until the process exits and records them in a file named 'perf.data' in the current working directory.

root@crownbay:~# perf record wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2
Connecting to downloads.yoctoproject.org (140.211.169.59:80)
linux-2.6.19.2.tar.b 100% |************************************************| 41727k 0:00:00 ETA
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.176 MB perf.data (~7700 samples) ]
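If you'd rather drive the record step and a plain-text report from a script, something along the following lines could work. This is a rough sketch under several assumptions: the helper names are hypothetical, Python 3.7 or later is assumed to be available wherever perf runs, and only the 'perf record -o', 'perf report -i' and '--stdio' options are used.

```python
import shutil
import subprocess


def perf_record_cmd(workload, output="perf.data"):
    """Build the argv for 'perf record', writing samples to the given file."""
    # '--' separates perf's own options from the workload command line.
    return ["perf", "record", "-o", output, "--"] + list(workload)


def perf_report_cmd(data="perf.data"):
    """Build the argv for a non-interactive, plain-text 'perf report'."""
    return ["perf", "report", "-i", data, "--stdio"]


def profile(workload, data="perf.data"):
    """Record the workload and return the text report, or None if perf is absent."""
    if shutil.which("perf") is None:
        return None
    subprocess.run(perf_record_cmd(workload, data), check=True)
    result = subprocess.run(
        perf_report_cmd(data), check=True, capture_output=True, text=True
    )
    return result.stdout


URL = "http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2"
print(" ".join(perf_record_cmd(["wget", URL])))
print(" ".join(perf_report_cmd()))
```

Calling profile(["wget", URL]) would then run both steps in sequence; as written, the block only prints the command lines it would use.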

To see the results in a 'text-based UI' (tui), simply run 'perf report', which will read the perf.data file in the current working directory and display the results in an interactive UI:

root@crownbay:~# perf report

The above screenshot displays a 'flat' profile, one entry for each 'bucket' corresponding to the functions that were profiled during the profiling run, ordered from the most popular to the least (perf has options to sort in various orders and keys as well as display entries only above a certain threshold and so on - see the perf documentation for details). Note that this includes both userspace functions (entries containing a [.]) and kernel functions accounted to the process (entries containing a [k]). (perf has command-line modifiers that can be used to restrict the profiling to kernel or userspace, among others.)
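To make the [.] and [k] markers concrete, here is a small sketch that totals the overhead by marker from text-mode report output. The sample lines are made up for illustration (only the 1.77% figure for __copy_to_user_ll() comes from this document), and the column layout is an assumption that may differ between perf versions.

```python
import re
from collections import defaultdict

# Assumed layout: "<overhead>%  <command>  <shared object>  [<marker>] <symbol>"
LINE = re.compile(r"^\s*(\d+(?:\.\d+)?)%\s+(\S+)\s+(\S+)\s+\[(.)\]\s+(.+?)\s*$")

# '[.]' marks userspace entries and '[k]' marks kernel entries.
MARKERS = {".": "user", "k": "kernel"}


def overhead_by_space(report_text):
    """Sum the overhead percentages per address space."""
    totals = defaultdict(float)
    for line in report_text.splitlines():
        match = LINE.match(line)
        if not match:
            continue  # skip headers, comments and blank lines
        percent, _command, _dso, marker, _symbol = match.groups()
        totals[MARKERS.get(marker, "other")] += float(percent)
    return dict(totals)


# Hypothetical report excerpt; the second kernel symbol and the hex address
# are invented placeholders.
SAMPLE = """\
# Overhead  Command  Shared Object      Symbol
     1.77%     wget  [kernel.kallsyms]  [k] __copy_to_user_ll
     1.25%     wget  [kernel.kallsyms]  [k] hypothetical_kernel_fn
     0.50%     wget  busybox            [.] 0x000000000001f3ab
"""

print(overhead_by_space(SAMPLE))
```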

Notice also that the above report shows an entry for 'busybox', which is the executable that implements 'wget' in Yocto, but instead of a useful function name, that entry displays a not-so-friendly hex value. The steps below will show how to fix that problem.

Before we do that, however, let's try running a different profile, one which shows something a little more interesting. The only difference between the new profile and the previous one is that we'll add the -g option, which will record not just the address of a sampled function, but the entire callchain to the sampled function as well:

root@crownbay:~# perf record -g wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2
Connecting to downloads.yoctoproject.org (140.211.169.59:80)
linux-2.6.19.2.tar.b 100% |************************************************| 41727k 0:00:00 ETA
[ perf record: Woken up 3 times to write data ]
[ perf record: Captured and wrote 0.652 MB perf.data (~28476 samples) ]

root@crownbay:~# perf report

Using the callgraph view, we can actually see not only which functions took the most time, but we can also see a summary of how those functions were called and learn something about how the program interacts with the kernel in the process.

Notice that each entry in the above screenshot now contains a '+' on the left-hand side. This means that we can expand the entry and drill down into the callchains that feed into that entry. Pressing 'enter' on any one of them will expand the callchain (you can also press 'E' to expand them all at the same time or 'C' to collapse them all).

In the screenshot above, we've toggled the __copy_to_user_ll() entry and several subnodes all the way down. This lets us see which callchains contributed to the profiled __copy_to_user_ll() function, which itself accounts for 1.77% of the total profile.

As a bit of background explanation for these callchains, think about what happens at a high level when you run wget to get a file out on the network. Basically what happens is that the data comes into the kernel via the network connection (socket) and is passed to the userspace program 'wget' (which is actually a part of busybox, but that's not important for now), which takes the buffers the kernel passes to it and writes them to a disk file.

The part of this process that we're looking at in the above call stacks is the part where the kernel passes the data it's read from the socket down to wget i.e. a copy-to-user.
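The userspace side of that data flow can be sketched in a few lines. This is not how busybox implements wget; it is a minimal stand-in that uses a local socket pair instead of a real network connection, with hypothetical function names. Each recv() call is the point where the kernel copies received data into a userspace buffer, and each write() hands it back to the kernel for the file.

```python
import os
import socket
import tempfile


def save_stream(sock, path, bufsize=4096):
    """Read from the socket until the peer closes it, writing everything to a file."""
    total = 0
    with open(path, "wb") as out:
        while True:
            # The kernel copies received data into our buffer here.
            chunk = sock.recv(bufsize)
            if not chunk:
                break  # peer closed the connection
            out.write(chunk)
            total += len(chunk)
    return total


def demo(payload=b"yocto" * 1000):
    """Push a payload through a local socket pair and save it to a temp file."""
    sender, receiver = socket.socketpair()
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        sender.sendall(payload)
        sender.close()  # signals end of stream to the receiver
        received = save_stream(receiver, path)
        with open(path, "rb") as saved:
            return received, saved.read() == payload
    finally:
        receiver.close()
        os.remove(path)


print(demo())
```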

Notice also that there's a case here where a hex value is displayed in the callstack, in the expanded sys_clock_gettime() function. Later we'll see it resolve to a userspace function call in busybox.
