It's in the nature of profiling on modern CPUs that the more instrumentation you add, the harder it is to work out how the program would have behaved without the profiler. ANTS has to compromise, and instead tries to ensure that the relationship between method times remains constant. Turning down the detail level in the startup dialog reduces the amount of instrumentation and improves ANTS's estimate of the application's performance.
With multithreaded applications, consider using sampling mode wherever possible to get the best impression of how the application is performing. With instrumentation, the profiler may cause threads to synchronise under heavy load where they would not normally do so.
It's also worth noting that ANTS counts CPU ticks rather than measuring real time (except in sampling mode). This gives a better representation of the work that an application must do, but technologies like SpeedStep and TurboBoost mean that the duration of any given clock tick can vary considerably. ANTS works out the clock speed at any given point in time and uses it to convert ticks into real-time values, but because the CPU speed varies, these values don't necessarily correspond very well to the amount of work the application is actually doing at that point.
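The effect of a varying clock speed can be illustrated with a small sketch. This is not how ANTS is implemented; the function name `ticks_to_seconds` and the sample figures are made up for illustration. It assumes the run is split into segments, each with a tick count and the clock speed in force during that segment, and shows that the same number of ticks can correspond to different amounts of real time.

```python
def ticks_to_seconds(segments):
    """Convert (ticks, clock_hz) segments into elapsed seconds.

    Hypothetical helper: each segment's ticks are divided by the
    clock speed that applied while those ticks were counted.
    """
    return sum(ticks / clock_hz for ticks, clock_hz in segments)


# 3 billion ticks at a steady 3 GHz (illustrative numbers only).
steady = [(3_000_000_000, 3_000_000_000)]

# The same 3 billion ticks, but the second half runs at 1.5 GHz.
throttled = [
    (1_500_000_000, 3_000_000_000),
    (1_500_000_000, 1_500_000_000),
]

print(ticks_to_seconds(steady))
print(ticks_to_seconds(throttled))
```

Both runs do the same amount of work in ticks, but the second takes longer in real time, which is why tick counts and real-time values can diverge.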
Finally, with multithreaded applications the total amount of 'time' reported is multiplied by the number of threads (or the number of running threads if you choose CPU time). This is particularly noticeable when using wall-clock time (where having 3 threads for 3 minutes will give you 9 minutes of time in the call tree), but will also happen with CPU time. We do this because dividing the time between the threads doesn't actually make sense, especially on multicore systems where you really can do 9 minutes of work in 3 minutes of time. The timeline can be used to untangle this: when you select a method, it displays a bar showing exactly when that method was running. For a method that's running in parallel, this bar will be shorter than the amount of time shown in the call tree. You can also select an individual thread using the drop-down to eliminate the extra time caused by parallelism.
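The difference between the call tree total and the timeline bar can be sketched as follows. This is only an illustration of the arithmetic, not ANTS's own code; the function names, thread names and interval data are hypothetical. It assumes each thread records the (start, end) intervals, in seconds, during which a method was running.

```python
def call_tree_time(intervals_by_thread):
    """Sum the time spent on every thread, as a call tree total would."""
    return sum(
        end - start
        for intervals in intervals_by_thread.values()
        for start, end in intervals
    )


def timeline_time(intervals_by_thread):
    """Measure how long the method was running on at least one thread."""
    intervals = sorted(
        interval
        for thread_intervals in intervals_by_thread.values()
        for interval in thread_intervals
    )
    total = 0
    covered_until = None
    for start, end in intervals:
        if covered_until is None or start > covered_until:
            # A gap before this interval: count the whole interval.
            total += end - start
            covered_until = end
        elif end > covered_until:
            # Overlaps time already counted: add only the new part.
            total += end - covered_until
            covered_until = end
    return total


# Three threads each running the method for the same 3 minutes.
three_threads = {
    "worker-1": [(0, 180)],
    "worker-2": [(0, 180)],
    "worker-3": [(0, 180)],
}

print(call_tree_time(three_threads))  # 540 seconds, i.e. 9 minutes
print(timeline_time(three_threads))   # 180 seconds, i.e. 3 minutes
```

Passing only one thread's intervals, for example `{"worker-1": [(0, 180)]}`, gives 180 seconds from both functions, which mirrors what selecting a single thread in the drop-down does.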
Red Gate Software Ltd.