LOSUG February – DTrace

Finally, I have a chance to write up the interesting introduction to DTrace that Jim Mauro gave to a well-attended group. This man loves to talk about DTrace, a fact which came across quickly on the night (and he told us so himself)!

I’ve hardly used DTrace myself, having spent the latter part of my Unix sysadmin career mainly on Linux platforms. All the text below is straight from my notes of Jim’s talk. Inaccuracies will be all mine.

First of all: the shameless plug. Yes, The DTrace Cookbook will be available soon: 1200 pages of DTrace tips and recipes. See www.dtracebook.com (I couldn’t actually find this, but my search turned up some YouTube videos of Jim talking about the book).

Jim wanted the main take-away from the talk to be that DTrace is complicated, but by necessity, as it was designed to look at complex systems. It is like an MRI scanner: the output is complex but, like an MRI operator, you don’t need much experience to learn or use the tool. Interpreting the output is where experience counts.

Within the DTrace toolkit, which is all open source, you get DTrace plus Perl or shell scripts. The three main DTrace components are Probes, Providers and Consumers.

Probes can insert code dynamically into unmodified running code by altering its image in memory. Typically this is done at the entry or exit point of functions.

A Provider is a library of probes, used to manage them under sensible names, e.g. IO. In Jim’s experience 50% of problems are due to IO. There is a lot of code written to do disk IO, and a key question is often who is starting the IO.

A probe description has four fields (provider:module:function:name), and a blank field matches anything in that position. (Unfortunately I didn’t get the examples down. Incidentally, the slides should be available from the LOSUG website http://opensolaris.org/jive/forum.jspa?forumID=64).
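Since I missed Jim’s examples, here is a rough Python sketch of my own (a toy model, not DTrace code) of how blank fields behave in a probe description. It assumes the four-field form provider:module:function:name, treats a blank field as "match anything", and allows shell-style wildcards. The function names `matches` and `select` are hypothetical, and the probe list is illustrative rather than output from a real system.

```python
from fnmatch import fnmatchcase

def matches(description, probe):
    """Return True if a (provider, module, function, name) tuple matches.

    A blank field in the description matches any value in that position;
    non-blank fields are compared as shell-style patterns.
    """
    fields = description.split(":")
    if len(fields) != 4:
        raise ValueError("expected provider:module:function:name")
    return all(pattern == "" or fnmatchcase(value, pattern)
               for pattern, value in zip(fields, probe))

# Illustrative probes only (assumed for this sketch)
PROBES = [
    ("syscall", "", "read", "entry"),
    ("syscall", "", "read", "return"),
    ("syscall", "", "write", "entry"),
    ("io", "genunix", "bdev_strategy", "start"),
]

def select(description):
    """List every known probe that the description would enable."""
    return [p for p in PROBES if matches(description, p)]
```

In this model `select("syscall:::entry")` picks the entry probe of every syscall in the list, while `select(":::")` picks everything.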

Consumers are the commands: dtrace, lockstat, plockstat and intrstat.

DTrace User Components: these comprise predicates and actions. Traditional performance analysis involves gathering a lot of data, then spending 80% of the effort pruning the data down and 20% looking at the resulting good data. Predicates in DTrace do this pruning for you.
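The pruning idea can be sketched in Python (again a toy model of mine, not DTrace itself). A clause here pairs a predicate with an action, and the action only runs for events the predicate accepts, so unwanted data is never recorded in the first place. The `Clause` class, the event fields and the process names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Clause:
    predicate: Callable[[dict], bool]   # decides whether the event is interesting
    action: Callable[[dict], None]      # what to record when it is

    def fire(self, event):
        # The action runs only when the predicate holds
        if self.predicate(event):
            self.action(event)

# Made-up events standing in for probe firings
EVENTS = [
    {"execname": "mysqld", "syscall": "read", "bytes": 4096},
    {"execname": "sshd", "syscall": "read", "bytes": 64},
    {"execname": "mysqld", "syscall": "write", "bytes": 8192},
    {"execname": "cron", "syscall": "read", "bytes": 128},
]

kept = []
clause = Clause(predicate=lambda e: e["execname"] == "mysqld",
                action=kept.append)
for event in EVENTS:
    clause.fire(event)
```

After the loop, `kept` holds only the two mysqld events. In D itself the predicate is written between slashes ahead of the action block, for example `/execname == "mysqld"/`.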

A D program: syscall is a very useful provider, e.g. to collect some data at the entry point of all syscalls. Use D when the command line gets complex. DTrace has aggregating functions and variables. “@” indicates an aggregating variable, which is akin to an associative array: the index is a DTrace variable.
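To show what "akin to associative arrays" means, here is a small Python sketch (my own analogy, not DTrace code) of a count aggregation keyed by a variable. It mirrors the spirit of a D clause such as `syscall:::entry { @[execname] = count(); }`. The event data and the helper name `count_by` are made up for illustration.

```python
from collections import Counter

# Made-up (execname, syscall) pairs standing in for syscall entry events
SYSCALL_ENTRIES = [
    ("mysqld", "read"), ("mysqld", "write"), ("sshd", "read"),
    ("mysqld", "read"), ("cron", "open"),
]

def count_by(events, key):
    """Aggregate events into counts, indexed by whatever key() returns."""
    agg = Counter()
    for event in events:
        agg[key(event)] += 1
    return agg

# Index by process name only, like @[execname]
by_exec = count_by(SYSCALL_ENTRIES, key=lambda e: e[0])

# Index by a tuple, like @[execname, probefunc]
by_exec_and_call = count_by(SYSCALL_ENTRIES, key=lambda e: e)
```

Only the running counts are kept, not the individual events, which is why aggregations stay cheap even when a probe fires very often.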

Getting Started: DTrace was created to debug production systems. Previously the right tools were not available. You had to core dump a running system! DTrace is safe and the probe effect is minimal. It has a built-in watchdog which turns DTrace off if it detects problems. DTrace is not necessarily the first tool to use.

Performance Metrics: How Fast (throughput) / How Long (latency) / How Many (IOPS) / How Much (utilisation).

DTrace – Getting the Big Picture: After the “stat” tools, use the “big” providers.

Getting Started One-Liners: looking at CPU: the profile provider gives time-based data collection. Use an odd sampling rate (because housekeeping is done every 10ms, and an odd rate keeps the samples from falling into step with it). A tick probe can exit a script after a set period. Even if you can’t read stack traces, you can get useful hints from looking at them.
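The advice about odd numbers can be illustrated with a toy Python model (my own sketch, with invented numbers). It assumes housekeeping starts every 10ms, as mentioned in the talk, and, purely hypothetically, that it occupies the first tenth of each 10ms period. The function then counts how many samples land inside housekeeping at a given sampling rate.

```python
TICKS_PER_SECOND = 100      # housekeeping every 10 ms (from the talk)
BUSY_TENTHS = 1             # hypothetical: housekeeping fills the first 10% of each period

def housekeeping_hits(rate_hz, seconds=1):
    """Count samples that land inside housekeeping when sampling at rate_hz."""
    hits = 0
    for k in range(rate_hz * seconds):
        # Sample k occurs at k / rate_hz seconds; its position within the
        # current 10 ms period is phase / rate_hz of that period.
        phase = (k * TICKS_PER_SECOND) % rate_hz
        if phase * 10 < BUSY_TENTHS * rate_hz:
            hits += 1
    return hits
```

In this model, sampling at 100Hz puts every sample inside housekeeping, which would make the system look permanently busy with it. Rates of 97Hz or 997Hz spread the samples across the period, so roughly one in ten lands there, matching the true share. Strictly, what matters in the model is that the rate shares no common factor with 100; rates such as 997 are the ones I have usually seen in published DTrace one-liners.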

System Metrics – Example: sysinfo provider.

Memory One-Liners: vminfo.

DTrace can “connect the dots”.

Well, that is the end of my notes, which doesn’t seem like much for 90 mins of fast chat. In my defence, Jim is a difficult talker to make notes on and there were a lot of examples! I noticed a camera at the back of the room on my way out so perhaps a video of the event is available from LOSUG.

There is, of course, a Wikipedia entry for DTrace which can be found at http://en.wikipedia.org/wiki/DTrace.

While DTrace is undoubtedly brilliant, the main drawback is of course that it is not available for more systems.