Searching around the web will reveal a number of debugging setup guides. There are lots of little tips and tricks that you pick up through a career of figuring out why your production code is misbehaving, and it's helpful to jot it all down in one place.
This is my toolkit. There are many like it, but this one is mine.
More specifically, this is for dealing with .Net applications on Windows. I might create something similar for dealing with Java on Linux at some point.
- Windows Driver Kit 8.1 (requires VS2013). WDK 8 is no longer supported.
- SOSEX 4
- Psscor4 Managed-Code Debugging Extension for WinDbg
Alternatively, you can download my zip file of these tools. No installation needed, just copy it where you need it. Of course, if you don't trust me, get everything from source. And get permission from your friendly sysadmin before putting this stuff on a production box.
Create directories on a disk with a couple of gigabytes of free space:
mkdir C:\Symbols
mkdir C:\SymbolCache
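If you end up setting this up on more than one box, the same step is easy to script. Here is a minimal Python sketch of it. The function name `prepare_symbol_dirs` and the 2 GiB threshold are my own assumptions (the latter standing in for "a couple of gigabytes"), not part of any tool.

```python
import shutil
from pathlib import Path

GIB = 1024 ** 3


def prepare_symbol_dirs(root, names=("Symbols", "SymbolCache"), min_free_gib=2):
    """Create the symbol directories under root and check free disk space.

    Returns (created_paths, has_enough_space, free_gib).
    The helper name and the default threshold are illustrative assumptions.
    """
    root = Path(root)
    created = []
    for name in names:
        path = root / name
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(path)
    # disk_usage reports on the volume that holds root
    free_gib = shutil.disk_usage(root).free / GIB
    return created, free_gib >= min_free_gib, free_gib
```

Calling `prepare_symbol_dirs("C:\\")` would create the two directories above and tell you whether the disk has enough room before the symbol cache starts filling up.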
Create the following environment variables:
Add any local/app symbols. For instance if you have an application and
associated PDB files in
C:\> "C:\Program Files (x86)\Windows Kits\8.0\Debuggers\x86\symstore" add /f "C:\temp\PDB\*.*" /s c:\Symbols /t "Debuggable Server"
C:\> "C:\Program Files (x86)\Windows Kits\8.0\Debuggers\x86\symstore" query /s c:\Symbols /f C:\temp\PDB\Server.exe
Add C:\Program Files (x86)\Windows Kits\8.0\Debuggers\x86 to your path if you expect to use
For more information on symstore, check out the symstore docs.
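If you publish symbols as part of a build or deploy script, it can help to assemble the symstore command lines in code rather than by hand. The sketch below only uses the flags shown in the commands above (`add /f /s /t` and `query /s /f`). The helper names are hypothetical, and the default install path is simply the one from the example, so adjust it to wherever your debugging tools actually live.

```python
import subprocess

# Path taken from the commands above; an assumption about your install location.
SYMSTORE = r"C:\Program Files (x86)\Windows Kits\8.0\Debuggers\x86\symstore"


def symstore_add_args(pdb_glob, store, product, symstore=SYMSTORE):
    """Argument list for adding files matching pdb_glob to the symbol store."""
    return [symstore, "add", "/f", pdb_glob, "/s", store, "/t", product]


def symstore_query_args(binary, store, symstore=SYMSTORE):
    """Argument list for checking whether binary is present in the store."""
    return [symstore, "query", "/s", store, "/f", binary]


def run(args):
    """Run a prepared command and return its stdout; raises on a non-zero exit."""
    result = subprocess.run(args, check=True, capture_output=True, text=True)
    return result.stdout
```

Passing the arguments as a list means `subprocess` handles the quoting of paths with spaces, which is the part that usually goes wrong when these commands are pasted into batch files.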
Also see Setting Yourself up for Debugging at Thomas Kejser's Database Blog.
Event Tracing for Windows is a low-level, low-impact form of system tracing that lies dormant until activated with either xperf or Windows Performance Recorder. It is analogous to dtrace on *nix systems.
Windows Performance Recorder in conjunction with Windows Performance Analyzer is an insanely powerful way of profiling performance of .Net applications running in production, without the overhead of more traditional code profilers.
To look at kernel context switches (indicative of blocking calls and lock
Computation -> CPU Usage (Precise) -> Context Switch Count by Process, Thread. Rearrange the columns so that
ReadyThreadStack are to the left of the thick yellow line. Sort descending by Waits (us) on the right. Select Load Symbols from the Trace menu. This will take a while, but once done you can drill down into your code and see exactly where threads are being switched back in and what happened to allow them to continue (e.g. which line of code was blocking, and which line of code unblocked it).
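WPA does all the grouping and sorting for you in its UI, but it can help to see what that view is actually computing. The toy Python sketch below groups context-switch records by process and thread, counts them, totals the wait time, and sorts descending by wait. The input tuples are a hypothetical shape I made up for illustration; this is not WPA's export format.

```python
from collections import defaultdict


def waits_by_thread(switches):
    """Summarise context switches per (process, thread).

    switches: iterable of (process, thread_id, wait_us) tuples (hypothetical shape).
    Returns rows of (process, thread_id, switch_count, total_wait_us),
    sorted by total wait, largest first.
    """
    totals = defaultdict(lambda: [0, 0])  # (process, thread) -> [count, wait]
    for process, thread_id, wait_us in switches:
        entry = totals[(process, thread_id)]
        entry[0] += 1
        entry[1] += wait_us
    rows = [(p, t, count, wait) for (p, t), (count, wait) in totals.items()]
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows
```

The threads at the top of that list are the ones spending the most time switched out, which is where the stack columns in WPA become worth drilling into.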
With a bit of practice, this is like having the Eye of freakin' Sauron glaring at your code for you. Coarse-grained locks deep in the .Net framework itself are dragged kicking and screaming into the sunlight. Awful connection pool management in your database driver is held up for all to see. No-one escapes.
Flame graphs are a very useful visualisation of CPU usage broken down by stack trace. They were originally designed to process dtrace profiles, but Bruce Dawson wrote a pre-processor that converts xperf/WPR traces to a compatible format. Check out the linked blog posts for details. Note that you probably want to use WPA first to pin down short intervals of interest, as trying to generate a flame graph of, say, 5 seconds' duration on software doing 30k requests per second is a bit of a system killer to say the least.
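For a feel of what "a compatible format" means: the flame graph scripts consume folded stacks, one line per unique stack, with frames joined by semicolons (root first) and followed by a sample count. The Python sketch below shows that folding step on made-up data. It is not Bruce Dawson's pre-processor, and the frame names in the usage example are hypothetical.

```python
from collections import Counter


def fold_stacks(samples):
    """Collapse sampled stacks into folded 'frame;frame;frame count' lines.

    samples: iterable of stacks, each a list of frame names, root first.
    Returns the folded lines sorted by stack text.
    """
    counts = Counter(";".join(frames) for frames in samples)
    return [f"{stack} {count}" for stack, count in sorted(counts.items())]
```

For example, `fold_stacks([["main", "Handle", "Lock"], ["main", "Handle", "Lock"], ["main", "Parse"]])` returns `["main;Handle;Lock 2", "main;Parse 1"]`. Identical stacks collapse into a single line, which is why a short interval of a busy server can still produce a very large input.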
Production debugging is a tricky beast. If you have a route through the network and some off-peak time, you can connect with Visual Studio's remote debugger. This can kill performance though. For memory problems, you can just as usefully grab a process dump and debug it on your own workstation at your leisure.
Open crash dump file (Ctrl-D)
.loadby sos clr
Try running an SOS command, e.g. !threads. If it fails with a 'load data access DLL' error, it's probably the wrong version of SOS (even the revision numbers have to match). Follow the instructions and run .cordll -ve -u -l to check, and if necessary grab SOS.dll from the dump machine (typical path
Set up symbol path. If you have a local symstore (as above), use:
If you just have an app directory containing PDBs, use:
.sympath srv*http://msdl.microsoft.com/download/symbols
.sympath+ "C:\Program Files\DeployedServer"
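If you find yourself retyping these for every dump, a small generator can spit out the commands to paste into WinDbg. This is a sketch only. The helper name is hypothetical, and the optional `cache_dir` argument uses the `srv*<local cache>*<server>` form of a symbol path, which is my addition here rather than something shown in the commands above.

```python
MS_SYMBOL_SERVER = "http://msdl.microsoft.com/download/symbols"


def sympath_commands(pdb_dirs=(), cache_dir=None, server=MS_SYMBOL_SERVER):
    """Build the .sympath commands for a debugging session.

    pdb_dirs: directories containing application PDBs, appended with .sympath+.
    cache_dir: optional local download cache (an assumption, see lead-in).
    """
    if cache_dir:
        first = f".sympath srv*{cache_dir}*{server}"
    else:
        first = f".sympath srv*{server}"
    commands = [first]
    for directory in pdb_dirs:
        # Quote each directory so paths with spaces survive
        commands.append(f'.sympath+ "{directory}"')
    return commands
```

Calling `sympath_commands([r"C:\Program Files\DeployedServer"])` reproduces the two commands above.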
Toggle debug info with:
!sym noisy
!sym quiet
Enable DML (hyperlinks symbols so you can navigate the object graph with the mouse):
WinDbg is not what you'd call beginner-friendly. The following pages have some useful lists of commands in addition to those I've covered below.
| Command | Description |
|---|---|
| | Display all threads |
| | pipes output of |
| | switch debugger to thread ordinal 22 |
| | switch to managed thread ID 0x12AB |
| | dump exception on current thread |
| | dump managed stack |
| | dump managed and native stack |
| | search for deadlocks |
| | search for threads holding locks |
| | search for threads waiting on locks |
| | run command for all threads |
| | run command for all threads |
| | set current frame for |
| | display arguments and parameters for current stack frame |
| | heap for given type |