Live Chat Programs

March 1, 2008

Web Access for Visual Studio Team System

Filed under: Live Chat software

Microsoft Distinguished Engineer Brian Harry follows up on his recent announcement about Microsoft’s acquisition of devBiz, developers of TeamPlain Web Access for VSTS. Brian heads up our Visual Studio Team Foundation development team, based in Raleigh, NC. Note that TeamPlain is unrelated to Teamprise, which, to quote Jim Newkirk, recently “announced a complimentary license of the Teamprise client suite for anyone wanting to connect to an open source project on CodePlex.”

If you are an existing devBiz customer, I encourage you to think about and weigh in on the following comment on Brian’s weblog:

“Another set of feedback we’ve gotten revolves around the devBiz components products - devMail, devDns and others.  We have removed these products from the market and are unsure what our future plans for them are.  I’ve seen requests that we open source them among other things.  We are considering many options ranging from including them in other products to making the source available in some form - either to existing customers, publicly or otherwise.  We want to make sure that customers feel that they have a good path forward.  We hope to reach a conclusion on a plan in the next few weeks on this issue as well.”

If the VSTS team decides to release some of the products Brian mentions as open source projects, I hope, and wouldn’t be surprised, to see them posted on CodePlex, which runs on Visual Studio Team System.


http://blogs.msdn.com/korbyp/archive/2007/04/02/web-access-for-visual-studio-team-system.aspx

Exception Handling in Running a Business

I’m going to the Rose Bowl.

I am a University of Illinois alum and an avid fan of college
sports.  The Illini football team had a great season this year and will play
USC in Pasadena on January 1st.  In fact, this is just the second time in
my lifetime that Illinois has made it to the Rose Bowl.  For those of us here
in central Illinois, this is a really big deal.  Who knows when it will happen
again?

So last week when the University started selling tickets, I
placed my order.  A few days later I received confirmation that I was going to
actually get the tickets I had requested.  That email said:

“tickets will be shipped to the
address listed above via UPS Overnight Delivery”

I laughed out loud.  UPS Overnight?  I live right here in
Champaign-Urbana.  The University of Illinois Athletic Ticket Office is less
than two miles from my office.  Surely I could just go over during my lunch
hour and pick them up?

No, I suppose not.  These folks are trying to process orders
for over 25,000 tickets and they have very little time to do it.  They probably
just want to have one standard method of handling them all.  Dealing with the
special cases would slow everything down.

The next day I got email from UPS with a tracking number for
my tickets:

Sure enough — my tickets were being sent 1.8 miles by “Next
Day Air”.  At this point, I fully expected that this envelope would be
traveling across town by way of O’Hare.

Much to my surprise, UPS actually figured out that it was
already in its destination city:

So, let’s review:  Both the University and UPS faced a
situation which was somewhat of an exception to their normal workflow.  One of
them treated the exception as a special case.  The other one did not. 

And in my opinion, both of these organizations did exactly
the right thing.

I think one of the toughest parts of running a business is
dealing with all the exceptions.  These things never get much attention at the
genesis of a company.  We write our business plan and we try to figure out how
we’re going to handle everything from customer issues to staffing issues to
bugs to parking.  But then life hands us a diversity of circumstances we never
expected.

  • One of your staff needs to have surgery but they’ve used
    up all their leave days.

  • Your biggest customer wants you to add a special feature
    that won’t be useful to anybody else.

  • The policy says anybody who purchased on or after June 17th
    will get the upgrade for free.  The guy who bought at 10:00pm on June 16th
    is on the phone.

Sometimes the right thing to do is to handle the situation
as a special case, even if doing so takes extra time.

And sometimes, it’s best to just shove everything into the
meat grinder and let sausage come out the other side.

But how do we know which approach to use for a given
situation?  The issues in play can include fairness, cost, ethics, focus, and
so on.

And when is the time to realize that a certain kind of
exception is happening often enough that it’s worth defining a way to handle
it?

I don’t have any silver bullet answers for these questions. 
In entrepreneurship, there is no substitute for good judgment. 

Just keep in mind that exceptions are going to
happen, and how we navigate them can be a major definer of our success in
business.  Pay attention, and use common sense.


http://software.ericsink.com/entries/Business_Exceptions.html

Natural Sorting in C#

Jeff Atwood recently posted about natural sorting. This is all about making sure that strings that contain numbers sort numerically. I’m slightly surprised to see that he wants to call it alphabetical sorting. Surely by definition, alphabetical sorting is defined by, well, the alphabet. This is an issue about numbers, not letters.

Anyway, he says he tried and gave up on a succinct C# version. He suggests that it will take 40+ lines of code. I believe that’s misleading, because as far as I can tell, the Python versions are only able to be so succinct because Python already appears to know how to sort an array. Both examples he shows rely on this. In .NET, collections aren’t intrinsically sortable. Let’s sort that:

using System.Collections.Generic;

/// <summary>
/// Compares two sequences.
/// </summary>
/// <typeparam name="T">Type of item in the sequences.</typeparam>
/// <remarks>
/// Compares elements from the two input sequences in turn. If we
/// run out of list before finding unequal elements, then the shorter
/// list is deemed to be the lesser list.
/// </remarks>
public class EnumerableComparer<T> : IComparer<IEnumerable<T>>
{
    /// <summary>
    /// Create a sequence comparer using the default comparer for T.
    /// </summary>
    public EnumerableComparer()
    {
        comp = Comparer<T>.Default;
    }

    /// <summary>
    /// Create a sequence comparer, using the specified item comparer
    /// for T.
    /// </summary>
    /// <param name="comparer">Comparer for comparing each pair of
    /// items from the sequences.</param>
    public EnumerableComparer(IComparer<T> comparer)
    {
        comp = comparer;
    }

    /// <summary>
    /// Object used for comparing each element.
    /// </summary>
    private IComparer<T> comp;

    /// <summary>
    /// Compare two sequences of T.
    /// </summary>
    /// <param name="x">First sequence.</param>
    /// <param name="y">Second sequence.</param>
    public int Compare(IEnumerable<T> x, IEnumerable<T> y)
    {
        using (IEnumerator<T> leftIt = x.GetEnumerator())
        using (IEnumerator<T> rightIt = y.GetEnumerator())
        {
            while (true)
            {
                bool left = leftIt.MoveNext();
                bool right = rightIt.MoveNext();

                // Both sequences ended together: they are equal.
                if (!(left || right)) return 0;

                // The sequence that runs out first is the lesser one.
                if (!left) return -1;
                if (!right) return 1;

                int itemResult = comp.Compare(leftIt.Current, rightIt.Current);
                if (itemResult != 0) return itemResult;
            }
        }
    }
}

(Note: I offer the code samples on this page under the MIT license.)

So yes, I need a lot of code. However, that’s a utility class that is applicable to a wide range of scenarios, not just this one. It’s slightly irritating that it’s not already built into the .NET Framework. Heck, maybe it is, and I’ve just been looking in the wrong place.

Given an easy way to compare two sequences, a C# 3.0 natural sort becomes roughly as trivial as the Python examples in Jeff’s blog:

// Requires: using System; using System.Linq; using System.Text.RegularExpressions;
string[] testItems = { "z24", "z2", "z15", "z1",
                       "z3", "z20", "z5", "z11",
                       "z 21", "z22" };

// Digit runs become ints; anything that fails to parse stays a string.
Func<string, object> convert = str =>
{
    try { return int.Parse(str); }
    catch { return str; }
};
var sorted = testItems.OrderBy(
    str => Regex.Split(str.Replace(" ", ""), "([0-9]+)").Select(convert),
    new EnumerableComparer<object>());

It’s probably not meaningful to count lines of code. This being C#, I could have put it all on one line. As it is, I split it across more lines than I normally would, to avoid an annoying HTML layout issue. (I put my code samples in PRE blocks to get the formatting right; PRE blocks and long lines are a bad combination.) But I think it’s fair to say that any differences in size are due merely to syntactic differences between Python and C#. Structurally, there’s no substantial difference – I’ve been able to apply exactly the same techniques the Python examples used in C#.

If I print out the results using this code:

foreach (string s in sorted)
{
 Console.WriteLine(s);
}

It prints out the test items in this order:

z1
z2
z3
z5
z11
z15
z20
z 21
z22
z24

I.e., ascending numeric order, rather than what you’d get with most string ordering.

[Updated 21st December 2007: Charles Petzold didn’t like the original version, which treated spaces as significant for sorting. So I’ve updated the example to ignore spaces, as the position of “z 21” in the output above shows. I simply added a call to Replace(" ", "") on the string before passing it into Regex.Split.]
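
For comparison, here is a rough Python sketch of the same technique. It is a reconstruction for illustration, not the exact code from Jeff’s post, and the name natural_key is made up. It relies on Python comparing lists element by element, which is the job EnumerableComparer does in the C# version.

```python
import re

def natural_key(text):
    """Split text into alternating text and digit runs, turning digit runs into ints."""
    # With a capturing group, re.split returns text runs at even indexes and
    # the captured digit runs at odd indexes, so ints only ever meet ints.
    parts = re.split(r"([0-9]+)", text.replace(" ", ""))
    return [int(part) if index % 2 else part for index, part in enumerate(parts)]

test_items = ["z24", "z2", "z15", "z1", "z3", "z20", "z5", "z11", "z 21", "z22"]

# sorted() compares the key lists element by element, shorter list first on a tie.
sorted_items = sorted(test_items, key=natural_key)
print(sorted_items)
```

As in the C# version, spaces are stripped before splitting, so “z 21” sorts between “z20” and “z22”.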


http://www.interact-sw.co.uk/iangblog/2007/12/13/natural-sorting

NHL seven days a week

Trying to find hockey on TV is like, well, even worse than it used to be. Much worse. The national coverage is pretty awful, even among five networks. I’d resigned myself to the fact that I’ll have to go to one of the neighborhood Red Wings bars if I want to catch the Wings. I don’t have a cable box, and I’m not about to get one just to watch one game per week on the Outdoor Life Channel. (Yes, non-hockey fans, that is the channel that carries the NHL these days. Hey, at least they’re playing this year.)

Still, just having hockey on in the background is comforting. I’d almost given up when I remembered that Comcast.net shows games live over the internet for Comcast subscribers. This is awesome. Even better is that they have at least one game on every day! (2/3 of the days have two games.)

A Windows Media Center computer drives my TV, so making this happen is incredibly easy. Launch IE, click the link on the Comcast.net front page, right-click the video feed, select “Full Screen” and I’m enjoying NHL hockey in all of its compressed glory.

The video quality is surprisingly good. If HD is a 10 and normal TV is a 5, I’d say this is a 4. I’m just happy I can stream live TV over wireless without a single blip.

Nice job, Comcast. Now how about a Media Center plug-in and HD-quality? That I’d pay for.

http://weblogs.asp.net/jkey/archive/2005/11/19/430980.aspx

Interested in Artificial Intelligence? What about Wikis? Well, now you can have both.

Unfortunately I’m not talking about a Wiki that actually is artificially intelligent, summarily filling itself out and saving me gobs of time by learning off of the Google-Sphere. What I am talking about is a site focused on covering the algorithms that a first year AI student might be faced with during their coursework. Hopefully they’ll get some additional material in there as well, but the initial focus is just that first year.

http://ai.squeakydolphin.com/wiki.php?pagename=AIAWiki.HomePage

If you are a .NET supporter like me, maybe you’ll try and throw in your hat by providing alternate versions of some of the programs seen on the site. I have my eyes on a few of them already.


http://weblogs.asp.net/justin_rogers/archive/2004/11/06/253471.aspx

Finally, the Killer App

If you’ve yet to be sold on the Internet, grab a seat and your favorite pointing device. My good man Ryan sent me a link to what is undoubtedly the Internet’s Killer App: The Beer Mapping Project.

Chicago’s map is a bit limited — my neighborhood alone has over 450 bars — so get on it. It takes a village, comrades.

God Bless America!

http://weblogs.asp.net/jkey/archive/2006/01/18/435889.aspx

I love ClearContext!!

After several months of using the Free version of the ClearContext addon for Microsoft Outlook, I just can’t imagine what I would do without it.  It has reduced my email time, kept me more organized, and uncluttered my Inbox better & faster than any ad-hoc system I have devised in the past.

As a developer, I hate it when I have to “code in Outlook”.  If it were up to me, I would ban all email during a project and deal with all communication via instant messaging, Scrum meetings, and whiteboards, but the truth is that email is a necessary evil, especially as a Tech Lead who needs to interface with the Project Manager, Customer, and IT personnel.

Enter ClearContext Information Management System…

First, I set it up to flag emails from my bosses in Red, so I don’t miss them.  Plus, for good measure, I have an Outlook rule that sets a FollowUp flag to make sure I don’t overlook them.  Also, ClearContext automagically ranks emails based upon my prior history with this person, so I know what to do when I get some nice blue and green colored mail too.

If I receive an email relating to my current project, I simply hit ALT-P to pop up the CC dialog and flag it with the topic “projects/MyProject” then either leave it in the inbox for further review, or hit ALT-M to file the message for future reference.  Likewise, if I receive some corporate or administrative email, then I assign its topic appropriately and file the message to send it to its respective holding area.

The act of assigning a Topic (ALT-P) automatically creates subfolders within my Inbox (e.g. inbox/projects/MyProject) matching the topic name (note the trick of adding a “/” to the topic name to create a nested subfolder at the same time).  The act of filing a message (ALT-M) moves it to the subfolder identified by the topic name.  This is great because the messages are no longer visible in the Inbox listing, but are still within the Inbox via the subfolder.

At that point, my AutoArchive settings will take care of moving it off on a monthly basis in case I need it later.

At some point, I want to look at the full product, which has features for deferring emails, converting them to tasks & appointments, assigning them to other people, etc.   See their Features Overview section for more on these areas.

If these features are nearly as useful as the ones I use now, then I could *gasp* become even more productive!  woot!


http://weblogs.asp.net/lhunt/archive/2007/12/18/i-love-clearcontext.aspx

Startup, Shutdown and related matters

Usually
I write blog articles on topics that people request via email or comments on
other blogs.  Well, nobody has ever
asked me to write anything about shutdown.

But then
I look at all the problems that occur during process shutdown in the unmanaged
world.  These problems occur because
many people don’t understand the rules, or they don’t follow the rules, or the
rules couldn’t possibly work anyway.

We’ve
taken a somewhat different approach for managed applications.  But I don’t think we’ve ever explained
in detail what that approach is, or how we expect well-written applications to
survive an orderly shutdown. 
Furthermore, managed applications still execute within an unmanaged OS
process, so they are still subject to the OS rules.  And in V1 and V1.1 of the CLR we’ve
horribly violated some of those OS rules related to startup and shutdown.  We’re trying to improve our behavior
here, and I’ll discuss that too.

Questionable APIs

Unfortunately, I can’t discuss the model for shutting down managed
applications without first discussing how unmanaged applications terminate.  And, as usual, I’ll go off on a bunch of
wild tangents.

Ultimately, every OS process shuts down via a call to ExitProcess or
TerminateProcess.  ExitProcess is
the nice orderly shutdown, which notifies each DLL of the termination.  TerminateProcess is ruder, in that the
DLLs are not informed.

The
relationship between ExitProcess and TerminateProcess has a parallel in the
thread routines ExitThread and TerminateThread.  ExitThread is the nice orderly thread
termination, whereas if you ever call TerminateThread you may as well kill the
process.  It’s almost guaranteed to
be in a corrupt state.  For example,
you may have terminated the thread while it holds the lock for the OS heap.  Any thread attempting to allocate or
release memory from that same heap will now block forever.

Realistically, Win32 shouldn’t contain a TerminateThread service.  To a first approximation, anyone who has
ever used this service has injected a giant bug into his application.  But it’s too late to remove it
now.

In that
sense, TerminateThread is like System.Threading.Thread.Suspend and Resume.  I cannot justify why I added those
services.  The OS SuspendThread and
ResumeThread are extremely valuable to a tiny subset of applications.  The CLR itself uses these routines to
take control of threads for purposes like Garbage Collection and – as we’ll see
later – for process shutdown.  As
with TerminateThread, there’s a significant risk of leaving a thread suspended
at a “bad” spot.  If you call
SuspendThread while a thread is inside the OS heap lock, you better not try to
allocate or free from that same heap. 
In a similar fashion, if you call SuspendThread while a thread holds the
OS loader lock (e.g. while the thread is executing inside DllMain) then you
better not call LoadLibrary, GetProcAddress, GetModuleHandle, or any of the other OS
services that require that same lock.

Even
worse, if you call SuspendThread on a thread that is in the middle of exception
dispatching inside the kernel, a subsequent GetThreadContext or SetThreadContext
can actually produce a blend of the register state at the point of the
suspension and the register state that was captured when the exception was
triggered.  If we attempt to modify
a thread’s context (perhaps bashing the EIP – on X86 – to redirect the thread’s
execution to somewhere it will synchronize with the GC or other managed
suspension), our update to EIP might quietly get lost.  Fortunately it’s possible to coordinate
our user-mode exception dispatching with our suspension attempts in order to
tolerate this race condition.

 

And
probably the biggest gotcha with using the OS SuspendThread & ResumeThread
services is on Win9X.  If a Win9X
box contains real-mode device drivers (and yes, some of them still do), then
it’s possible for the hardware interrupt associated with the device to interact
poorly with the thread suspension. 
Calls to GetThreadContext can deliver a register state that is perturbed
by the real-mode exception processing. 
The CLR installs a VxD on those operating systems to detect this case and
retry the suspension.

Anyway,
with sufficient care and discipline it’s possible to use the OS SuspendThread
& ResumeThread to achieve some wonderful things.

But the
managed Thread.Suspend & Resume are harder to justify.  They differ from the unmanaged
equivalents in that they only ever suspend a thread at a spot inside managed
code that is “safe for a garbage collection.”  In other words, we can report all the GC
references at that spot and we can unwind the stack and register state to reveal
our caller’s execution state.

Because
we are at a place that’s safe for garbage collection, we can be sure that
Thread.Suspend won’t leave a thread suspended while it holds an OS heap
lock.  But it may be suspended while
it holds a managed Monitor (‘lock’ in C# or ‘SyncLock’ in VB.NET).  Or it may be suspended while it is
executing the class constructor (.cctor) of an important class like
System.String.  And over time we
intend to write more of the CLR in managed code, so we can enjoy all the
benefits.  When that happens, a
thread might be suspended while loading a class or resolving security policy for
a shared assembly or generating shared VTables for COM Interop.

The real
problem is that developers sometimes confuse Thread.Suspend with a
synchronization primitive.  It is
not.  If you want to synchronize two
threads, you should use appropriate primitives like Monitor.Enter,
Monitor.Wait, or WaitHandle.WaitOne. 
Of course, it’s harder to use these primitives because you actually have
to write code that’s executed by both threads so that they cooperate
nicely.  And you have to eliminate
the race conditions.
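
The difference can be sketched as follows. This is an illustration in Python rather than managed code, chosen only for brevity; the Mailbox class and run_demo function are hypothetical names. Python’s threading.Condition stands in for a managed Monitor with Wait and Pulse. Neither thread stops the other at an arbitrary point: each one runs code that waits for, and signals, a shared condition.

```python
import threading

class Mailbox:
    """One-slot hand-off between two threads, guarded by a condition variable."""

    def __init__(self):
        self._cond = threading.Condition()
        self._item = None
        self._full = False

    def put(self, item):
        with self._cond:
            # Re-check the predicate in a loop: a wakeup is only a hint.
            while self._full:
                self._cond.wait()
            self._item, self._full = item, True
            self._cond.notify_all()

    def take(self):
        with self._cond:
            while not self._full:
                self._cond.wait()
            item, self._item, self._full = self._item, None, False
            self._cond.notify_all()
            return item

def run_demo(count=5):
    """Hand `count` integers from the main thread to a consumer thread, in order."""
    box = Mailbox()
    received = []

    def consumer():
        for _ in range(count):
            received.append(box.take())

    worker = threading.Thread(target=consumer)
    worker.start()
    for value in range(count):
        box.put(value)
    worker.join()
    return received

print(run_demo())
```

Both sides have to contain cooperating code, which is exactly the extra work described above, but neither thread is ever frozen while it holds a lock the other needs.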

I’m
already wandering miles away from Shutdown, and I need to get back.  But I can’t resist first mentioning that
TerminateThread is distinctly different from the managed Thread.Abort service,
both in terms of our aspirations and in terms of our current
implementation.

Nobody
should ever call TerminateThread. 
Ever.

Today
you can safely call Thread.Abort in two scenarios.

  1. You can call Abort on your own thread
    (Thread.CurrentThread.Abort()).
    This is not much different than throwing any exception on your thread,
    other than the undeniable manner in which the exception propagates.  The propagation is undeniable in the
    sense that your thread will continue to abort, even if you attempt to swallow
    the ThreadAbortException in a catch clause.  At the end-catch, the CLR notices that
    an abort is in progress and we re-throw the abort.  You must either explicitly call the
    ResetAbort method – which carries a security demand – or the exception must
    propagate completely out of all managed handlers, at which point we reset the
    undeniable nature of the abort and allow unmanaged code to (hopefully) swallow
    it.

  2. An Abort is performed on all threads that have stack in an
    AppDomain that is being unloaded. 
    Since we are throwing away the AppDomain anyway, we can often tolerate
    surprising execution of threads at fairly arbitrary spots in their
    execution.  Even if this leaves
    managed locks unreleased and AppDomain statics in an inconsistent state, we’re
    throwing away all that state as part of the unload anyway.  This situation isn’t as robust as we
    would like it to be.  So we’re
    investing a lot of effort into improving our behavior as part of getting
    “squeaky clean” for highly available execution inside SQL Server in our next
    release.

Longer
term, we’re committed to building enough reliability infrastructure around
Thread.Abort that you can reasonably expect to use it to control threads that
remain completely inside managed code. 
Aborting threads that interleave managed and unmanaged execution in a
rich way will always remain problematic, because we are limited in how much we
can control the unmanaged portion of that execution.

ExitProcess in a nutshell

So what
does the OS ExitProcess service actually do?  I’ve never read the source code.  But based on many hours of stress
investigations, it seems to do the following:

1) Kill all the threads except one,
whatever they are doing in user mode. 
On NT-based operating systems, the surviving thread is the thread that
called ExitProcess.  This becomes
the shutdown thread.  On Win9X-based
operating systems, the surviving thread is somewhat random.  I suspect that it’s the last thread to
get around to committing suicide.

2) Once only one thread survives, no
further threads can enter the process… almost.  On NT-based systems, I only see
superfluous threads during shutdown if a debugger attaches to the process during
this window.  On Win9X-based
systems, any threads that were created during this early phase of shutdown are
permitted to start up.  The
DLL_THREAD_ATTACH notifications to DllMain for the starting threads will be
arbitrarily interspersed with the DLL_PROCESS_DETACH notifications to DllMain
for the ensuing shutdown.  As you
might expect, this can cause crashes.

3) Since only one thread has survived
(on the more robust NT-based operating systems), the OS now weakens all the
CRITICAL_SECTIONs.  This is a mixed
blessing.  It means that the
shutdown thread can allocate and free objects from the system heap without
deadlocking.  And it means that
application data structures protected by application CRITICAL_SECTIONs are
accessible.  But it also means that
the shutdown thread can see corrupt application state.  If one thread was whacked in step #1
above while it held a CRITICAL_SECTION and left shared data in an inconsistent
state, the shutdown thread will see this inconsistency and must somehow tolerate
it.  Also, data structures that are
protected by synchronization primitives other than CRITICAL_SECTION are still
prone to deadlock.

4) The OS calls the DllMain of each
loaded DLL, giving it a DLL_PROCESS_DETACH notification.  The ‘lpReserved’ argument to DllMain
indicates whether the DLL is being unloaded from a running process or whether
the DLL is being unloaded as part of a process shutdown.  (In the case of the CLR’s DllMain, we
only ever receive the latter style of notification.  Once we’re loaded into a process, we
won’t be unloaded until the process goes away).

5) The process actually terminates,
and the OS reclaims all the resources associated with the process.

Well,
that sounds orderly enough.  But try
running a multi-threaded process that calls ExitProcess from one thread and
calls HeapAlloc / HeapFree in a loop from a second thread.  If you have a debugger attached,
eventually you will trap with an ‘INT 3’ instruction in the OS heap code.  The OutputDebugString message will
indicate that a block has been freed, but has not been added to the free list…
It has been leaked.  That’s because
the ExitProcess whacked your 2nd thread while it was in the middle of
a HeapFree operation.

This is
symptomatic of a larger problem.  If
you whack threads while they are performing arbitrary processing, your
application will be left in an arbitrary state.  When the DLL_PROCESS_DETACH
notifications reach your DllMain, you must tolerate that arbitrary
state.

I’ve
been told by several OS developers that it is the application’s responsibility
to take control of all the threads before calling ExitProcess.  That way, the application will be in a
consistent state when DLL_PROCESS_DETACH notifications occur. If you work in the
operating system, it’s reasonable to consider the “application” to be a
monolithic homogenous piece of code written by a single author.  So of course that author should put his
house in order and know what all the threads are doing before calling
ExitProcess.

But if
you work on an application, you know that there are always multiple components
written by multiple authors from different vendors.  These components are only loosely aware
of each other’s implementations – which is how it should be.  And some of these components have extra
threads on the side, or they are performing background processing via
IOCompletion ports, threadpools, or other techniques.

Under
those conditions, nobody can have the global knowledge and global control
necessary to call ExitProcess “safely”. 
So, regardless of the official rules, ExitProcess will be called while
various threads are performing arbitrary processing.

The OS Loader Lock

It’s
impossible to discuss the Win32 model for shutting down a process without
considering the OS loader lock. 
This is a lock that is present on all Windows operating systems.  It provides mutual exclusion during
loading and unloading.

Unfortunately, this lock is held while application code executes.  This fact alone is sufficient to
guarantee disaster.

If you
can avoid it, you must never hold one of your own locks while calling into
someone else’s code.  They will
screw you every time.

Like all
good rules, this one is made to be broken. 
The CLR violates this rule in a few places.  For example, we hold a ‘class
constructor’ lock for your class when we call your .cctor method.  However, the CLR recognizes that this
fact can lead to deadlocks and other problems.  So we have rules for weakening this lock
when we discover cycles of .cctor locks in the application, even if these cycles
are distributed over multiple threads in multi-threaded scenarios.  And we can see through various other
locks, like the locks that coordinate JITting, so that larger cycles can be
detected.  However, we deliberately
don’t look through user locks (though we could see through many of these, like
Monitors, if we chose).  Once we
discover a visible, breakable lock, we allow one thread in the cycle to see
uninitialized state of one of the classes. 
This allows forward progress and the application continues.  See my earlier blog on “Initializing
code” for more details.

Incidentally, I find it disturbing that there’s often little discipline
in how managed locks like Monitors are used.  These locks are so convenient,
particularly when exposed with language constructs like C# lock and VB.NET
SyncLock (which handle backing out of the lock during exceptions), that many
developers ignore good hygiene when using them.  For example, if code uses multiple locks
then these locks should typically be ranked so that they are always acquired in
a predictable order.  This is one
common technique for avoiding deadlocks.
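
A minimal sketch of lock ranking, again in Python rather than C# or VB.NET, purely for illustration. RankedLock, acquire_ranked and the lock names are hypothetical. Every lock carries a fixed rank, and callers always take locks in ascending rank no matter what order they name them in, so two threads can never each hold the lock the other is waiting for.

```python
import threading
from contextlib import contextmanager

class RankedLock:
    """A lock with a fixed rank that determines acquisition order."""

    def __init__(self, rank, name):
        self.rank = rank
        self.name = name
        self.lock = threading.Lock()

@contextmanager
def acquire_ranked(*locks):
    """Acquire the given locks in ascending rank; release them in reverse."""
    ordered = sorted(locks, key=lambda ranked: ranked.rank)
    taken = []
    try:
        for ranked in ordered:
            ranked.lock.acquire()
            taken.append(ranked)
        yield [ranked.name for ranked in ordered]
    finally:
        for ranked in reversed(taken):
            ranked.lock.release()

def ranking_demo(iterations=1000):
    """Two threads request the same locks in opposite orders without deadlocking."""
    accounts = RankedLock(1, "accounts")
    audit_log = RankedLock(2, "audit_log")
    observed_orders = []

    def worker(first, second):
        for _ in range(iterations):
            with acquire_ranked(first, second) as order:
                # Appended while both locks are held, so no extra locking is needed.
                observed_orders.append(tuple(order))

    threads = [
        threading.Thread(target=worker, args=(accounts, audit_log)),
        threading.Thread(target=worker, args=(audit_log, accounts)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return set(observed_orders)

print(ranking_demo())
```

Without the sort, the second worker would take audit_log first and the two threads could deadlock; with it, only one acquisition order is ever observed.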

Anyway,
back to the loader lock.  The
OS takes this lock implicitly when it is executing inside APIs like
GetProcAddress, GetModuleHandle and GetModuleFileName.  By holding this lock inside these APIs,
the OS ensures that DLLs are not loading and unloading while it is groveling
through whatever tables it uses to record the state of the process.

So if
you call those APIs, you are implicitly acquiring a lock.

That
same lock is also acquired during a LoadLibrary, FreeLibrary, or CreateThread
call.  And – while it is held – the
operating system will call your DllMain routine with a notification.  The notifications you might see
are:

DLL_THREAD_ATTACH

The
thread that calls your DllMain has just been injected into the process.  If you need to eagerly allocate any TLS
state, this is your opportunity to do so. 
In the managed world, it is preferable to allocate TLS state lazily on
the first TLS access on a given thread.

DLL_THREAD_DETACH

The
thread that calls your DllMain has finished executing the thread procedure that
it was started up with.  After it
finishes notifying all the DLLs of its death in this manner, it will
terminate.  Many unmanaged
applications use this notification to de-allocate their TLS data.  In the managed world, managed TLS is
automatically cleaned up without your intervention.  This happens as a natural consequence of
garbage collection.

DLL_PROCESS_ATTACH

The
thread that calls your DllMain is loading your DLL via an explicit LoadLibraryEx
call or similar technique, like a static bind.  The lpReserved argument indicates
whether a dynamic or static bind is in progress.  This is your opportunity to initialize
any global state that could not be burned into the image.  For example, C++ static initializers
execute at this time.  The managed
equivalent has traditionally been a class constructor method, which executes
once per AppDomain.  In a future
version of the CLR, we hope to provide a more convenient module constructor
concept.

DLL_PROCESS_DETACH

If the
process is terminating in an orderly fashion (ExitProcess), your DllMain will
receive a DLL_PROCESS_DETACH notification where the lpReserved argument is
non-null.  If the process is
terminating in a rude fashion (TerminateProcess), your DllMain will receive no
notification.  If someone unloads
your DLL via a call to FreeLibrary or equivalent, the process will continue
executing after you unload.  This case is indicated by a null value for
lpReserved.  In the managed world, de-initialization
happens through notifications of AppDomain unload or process exit, or through
finalization activity.

The DLL_THREAD_ATTACH and
DLL_THREAD_DETACH calls have a performance implication.  If you have loaded
100 DLLs into your process and you start a new thread, that thread must call 100
different DllMain routines.  Let’s say that these routines touch a page or
two of code each, and a page of data.  That might be 250 pages (1 MB) in your
working set, for no good reason.
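The arithmetic behind that estimate can be written out. This is only a back-of-the-envelope sketch: the 4 KB page size and the 1.5-page average for "a page or two of code" are assumptions, and the function names are made up for the illustration.

```cpp
#include <cstddef>

// Assumption: 4 KB pages, as on x86 Windows.
constexpr std::size_t kPageBytes = 4096;

// Pages touched when one new thread delivers DLL_THREAD_ATTACH to every DLL.
constexpr double pages_touched(int dll_count,
                               double code_pages_per_dll,
                               double data_pages_per_dll) {
    return dll_count * (code_pages_per_dll + data_pages_per_dll);
}

// The same figure expressed in bytes.
constexpr double bytes_touched(int dll_count,
                               double code_pages_per_dll,
                               double data_pages_per_dll) {
    return pages_touched(dll_count, code_pages_per_dll, data_pages_per_dll) *
           kPageBytes;
}
```

For 100 DLLs at 1.5 code pages and 1 data page each, `pages_touched` gives 250 pages and `bytes_touched` gives 1,024,000 bytes, which is the "250 pages (1 MB)" quoted above.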

The CLR calls DisableThreadLibraryCalls
on all managed assemblies other than certain MC++ IJW assemblies (more on this
later) to avoid this overhead for you.  And it’s a good idea to do the same on your
unmanaged DLLs if they don’t need these notifications to manage their
TLS.
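For an unmanaged DLL, opting out might look roughly like the sketch below. The `LoaderEvent` enum and the `classify` helper are hypothetical names used only to make the lpReserved rules described earlier explicit; the Win32 part is guarded by `_WIN32` so that the classification logic also compiles elsewhere.

```cpp
#include <cstdint>

#ifdef _WIN32
#include <windows.h>
#endif

// The four notification codes, with the values winnt.h assigns to them.
constexpr std::uint32_t kProcessDetach = 0;  // DLL_PROCESS_DETACH
constexpr std::uint32_t kProcessAttach = 1;  // DLL_PROCESS_ATTACH
constexpr std::uint32_t kThreadAttach = 2;   // DLL_THREAD_ATTACH
constexpr std::uint32_t kThreadDetach = 3;   // DLL_THREAD_DETACH

// Hypothetical classification of what the loader is telling us.
enum class LoaderEvent {
    StaticLoad,    // process attach, lpReserved non-null
    DynamicLoad,   // process attach, lpReserved null
    ProcessExit,   // process detach, lpReserved non-null
    DllUnload,     // process detach, lpReserved null (FreeLibrary)
    ThreadAttach,
    ThreadDetach,
    Unknown
};

LoaderEvent classify(std::uint32_t reason, const void* reserved) {
    switch (reason) {
    case kProcessAttach:
        return reserved ? LoaderEvent::StaticLoad : LoaderEvent::DynamicLoad;
    case kProcessDetach:
        return reserved ? LoaderEvent::ProcessExit : LoaderEvent::DllUnload;
    case kThreadAttach:
        return LoaderEvent::ThreadAttach;
    case kThreadDetach:
        return LoaderEvent::ThreadDetach;
    default:
        return LoaderEvent::Unknown;
    }
}

#ifdef _WIN32
BOOL WINAPI DllMain(HINSTANCE module, DWORD reason, LPVOID reserved) {
    switch (classify(reason, reserved)) {
    case LoaderEvent::StaticLoad:
    case LoaderEvent::DynamicLoad:
        // This DLL keeps no per-thread state, so it asks the loader to stop
        // sending DLL_THREAD_ATTACH / DLL_THREAD_DETACH notifications.
        DisableThreadLibraryCalls(module);
        break;
    default:
        break;  // Do as little as possible while the loader lock is held.
    }
    return TRUE;
}
#endif
```

One caveat from the Win32 documentation: a DLL linked against the static C runtime should not make this call, because the static CRT relies on the thread notifications itself.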

DllMain is one of the most dangerous places to write code.  This is
because you are executing inside a callback from the OS loader, inside
the OS loader lock.

Here are some of the rules related to
code inside DllMain:

1) You must never call LoadLibrary or
otherwise perform a dynamic bind.

2) You must never attempt to acquire a
lock, if that lock might be held by a thread that needs the OS loader lock.  (Acquiring a heap
lock by calling HeapAlloc or HeapFree is probably okay).

3) You should never call into another
DLL.  The
danger is that the other DLL may not have initialized yet, or it may have
already uninitialized.  (Calling into kernel32.dll is probably
okay).

4) You should never start up a thread or
terminate a thread, and then rendezvous with that other thread’s start or
termination.

As we shall see, the CLR violates some
of these rules. 
And these violations have resulted in serious consequences for managed
applications – particularly managed applications written in MC++.

And if you’ve ever written code inside
DllMain – including code that’s implicitly inside DllMain like C++ static
initializers or ‘atexit’ routines – then you’ve probably violated some of these
rules.  Rule #3
is especially harsh.

The fact is, programs violate these
rules all the time and get away with it.  Knowing this, the MC++ and CLR teams made a
bet that they could violate some of these rules when executing IJW
assemblies.  It
turns out that we bet wrong.

I’m going to explain exactly how we
screwed this up with IJW assemblies, but first I need to explain what IJW
assemblies are.

IJW

IJW is how we internally refer to mixed
managed / unmanaged images.  If you compile a MC++ assembly with ‘/clr’ in
V1 or V1.1, it almost certainly contains a mixture of managed and unmanaged
constructs.

In future versions, I expect there will
be ways to compile MC++ assemblies with compiler-enforced guarantees that the
image is guaranteed pure managed, or guaranteed pure verifiable managed, or –
ultimately – perhaps even pure verifiable 32-bit / 64-bit neutral managed.  In each case, the
compiler will necessarily have to restrict you to smaller and smaller subsets of
the C++ language. 
For example, verifiable C++ cannot use arbitrary unmanaged pointers.  Instead, it must
restrict itself to managed pointers and references, which are reported to the
garbage collector and which follow certain strict rules.  Furthermore, 32-bit
/ 64-bit neutral code cannot consume the declarations strewn through the
windows.h headers, because these pick a word size during compilation.

IJW is an acronym for “It Just Works”
and it reflects the shared goal of the C++ and CLR teams to transparently
compile existing arbitrary C++ programs into IL.  I think we did an amazing job of approaching
that goal, but of course not everything “just works.”  First, there are a
number of constructs like inline assembly language that cannot be converted to
managed execution. 
The C++ compiler, linker and CLR ensure that these methods are left as
unmanaged and that managed callers transparently switch back to unmanaged before
calling them.

So inline X86 assembly language must
necessarily remain in unmanaged code.  Some other constructs are currently left in
unmanaged code, though with sufficient effort we could provide managed
equivalents. 
These other constructs include setjmp / longjmp, member pointers (like
pointer to virtual method), and a reasonable startup / shutdown story (which is
what this blog article is supposed to be about).

I’m not sure if we ever documented the
constructs that are legal in a pure managed assembly, vs. those constructs which
indicate that the assembly is IJW.  Certainly we have a strict definition of this
distinction embedded in our code, because the managed loader considers it when
loading.  Some
of the things we consider are:

  • A pure managed assembly has exactly one DLL import.  This import is to
    mscoree.dll’s _CorExeMain (for an EXE) or _CorDllMain (for a DLL).  The
    entrypoint of the EXE or DLL must be a JMP to this import.  This is how we
    force the runtime to load and get control whenever a managed assembly is
    loaded.

  • A pure managed assembly can have no DLL exports.  When we bind to pure
    managed assemblies, it is always through managed Fusion services, via
    AssemblyRefs and assembly identities (ideally with cryptographic strong
    names).

  • A pure managed assembly has exactly one rebasing fixup.  This fixup is for
    the JMP through the import table that I mentioned above.  Unmanaged EXEs
    tend to strip all their rebasing fixups, since EXEs are almost guaranteed to
    load at their preferred addresses.  However, managed EXEs can be loaded like
    DLLs into a running process.  That single fixup is useful for cases where
    we want to load via LoadLibraryEx on versions of the operating system that
    support this.

  • A pure managed assembly has no TLS section and no other exotic constructs
    that are legal in arbitrary unmanaged PE files.

Of course, IJW assemblies can have many
imports, exports, fixups, and other constructs.  As with pure managed assemblies, the
entrypoint is constrained to be a JMP to mscoree.dll’s _CorExeMain or
_CorDllMain function. 
This is the “outer entrypoint”.  However, the COM+ header of the PE file has
an optional “inner entrypoint”.  Once the CLR has proceeded far enough into
the loading process on a DLL, it will dispatch to this inner entrypoint which
is… your normal DllMain.  In V1 and V1.1, this inner entrypoint is
expressed as a token to a managed function.  Even if your DllMain is written as an
unmanaged function, we dispatch to a managed function which is defined as a
PInvoke out to the unmanaged function.

Now we can look at the set of rules for
what you can do in a DllMain, and compare it to what the CLR does when it sees
an IJW assembly. 
The results aren’t pretty.  Remember that inside DllMain:

You must never call LoadLibrary or otherwise perform a
dynamic bind

With normal managed assemblies, this
isn’t a concern. 
For example, most pure managed assemblies are loaded through
Assembly.Load or resolution of an AssemblyRef – outside of the OS loader
lock.  Even
activation of a managed COM object through OLE32’s CoCreateInstance will
sidestep this issue. 
The registry entries for the CLSID always mention mscoree.dll as the
server.  A
subkey is consulted by mscoree.dll – inside DllGetClassObject and outside of the
OS loader lock – to determine which version of the runtime to spin up and which
assembly to load.

But IJW assemblies have arbitrary DLL
exports. 
Therefore other DLLs, whether unmanaged or themselves IJW, can have
static or dynamic (GetProcAddress) dependencies on an IJW assembly.  When the OS loads
the IJW assembly inside the loader lock, the OS further resolves the static
dependency from the IJW assembly to mscoree.dll’s _CorDllMain.  Inside _CorDllMain,
we must select an appropriate version of the CLR to initialize in the
process.  This
involves calling LoadLibrary on a particular version of mscorwks.dll, violating
our first rule for DllMain.

So what goes wrong when this rule is
violated? 
Well, the OS loader has already processed all the DLLs and their imports,
walking the tree of static dependencies and forming a loading plan.  It is now executing
on this plan. 
Let’s say that the loader’s plan is to first initialize an IJW assembly,
then initialize its dependent mscoree.dll reference, and then initialize
advapi32.dll. 
(By ‘initialize’, I mean give that DLL its DLL_PROCESS_ATTACH
notification). 
When mscoree.dll decides to LoadLibrary mscorwks.dll, a new loader plan
must be created. 
If mscorwks.dll depends on advapi32.dll (and of course it does), we have
a problem.  The
OS loader already has advapi32.dll on its pending list.  It will initialize
that DLL when it gets far enough into its original loading plan, but not
before.

If mscorwks.dll needs to call some APIs
inside advapi32.dll, it will now be making those calls before advapi32.dll’s
DllMain has been called.  This can and does lead to arbitrary
failures.  I
personally hear about problems with this every 6 months or so.  That’s a pretty low
rate of failure. 
But one of those failures was triggered when a healthy application
running on V1 of the CLR was moved to V1.1 of the CLR.  Ouch.

You must never attempt to acquire a lock, if that lock
might be held by a thread that needs the OS loader lock

It’s not possible to execute managed
code without potentially acquiring locks on your thread.  For example, we may
need to initialize a class that you need access to.  If that class isn’t
already initialized in your AppDomain, we will use a .cctor lock to coordinate
initialization. 
Along the same lines, if a method requires JIT compilation we will use a
lock to coordinate this.  And if your thread allocates a managed
object, it may have to take a lock.  (We don’t take a lock on each allocation if
we are executing on a multi-processor machine, for obvious reasons.  But eventually your
thread must coordinate with the garbage collector via a lock before it can
proceed with more allocations).

So if you execute managed code inside
the OS loader lock, you are going to contend for a CLR lock.  Now consider what
happens if the CLR ever calls GetModuleHandle or GetProcAddress or
GetModuleFileName while it holds one of those other locks.  This includes
implicit calls to LoadLibrary / GetProcAddress as we fault in any lazy DLL
imports from the CLR.

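Unfortunately, the sequence of lock
acquisition is inverted on the two threads.  The thread inside DllMain holds
the OS loader lock and waits for a CLR lock, while the other thread holds that
CLR lock and waits for the OS loader lock.  This yields a classic deadlock.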
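One way to reason about this inversion is as a cycle in a wait-for graph. The sketch below is a toy model and not how the OS or the CLR tracks its locks; `WaitForGraph`, `can_never_proceed`, and the thread and lock names are all made up for the illustration. It assumes each lock has at most one owner and each thread is blocked on at most one lock.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Toy model: who holds which lock, and who is blocked on which lock.
struct WaitForGraph {
    std::map<std::string, std::string> owner;  // lock -> thread holding it
    std::map<std::string, std::string> wants;  // thread -> lock it is blocked on
};

// Follows the chain thread -> wanted lock -> owner -> ... starting at `thread`.
// The chain ends if it reaches a runnable thread or a free lock. If it is
// still going after more hops than there are blocked threads, some thread has
// been visited twice, so the chain is a cycle and `thread` is stuck forever.
bool can_never_proceed(const WaitForGraph& graph, const std::string& thread) {
    std::string current = thread;
    for (std::size_t hops = 0; hops <= graph.wants.size(); ++hops) {
        auto blocked = graph.wants.find(current);
        if (blocked == graph.wants.end()) {
            return false;  // current is runnable and can release its locks
        }
        auto held = graph.owner.find(blocked->second);
        if (held == graph.owner.end()) {
            return false;  // the wanted lock is free
        }
        current = held->second;
    }
    return true;
}
```

Modelling the case above means recording that thread T1 owns the loader lock and wants a CLR lock, while T2 owns that CLR lock and wants the loader lock. `can_never_proceed` then reports that T1 is stuck, and it stops reporting that as soon as T2 is no longer waiting on the loader lock.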

Once again, this isn’t a concern for
pure managed assemblies.  The only way a pure managed assembly can
execute managed code inside the OS loader lock is if some unmanaged code
explicitly calls into it via a marshaled out delegate or via a COM call from its own
DllMain. 
That’s a bug in the unmanaged code!  But with an IJW assembly, some methods are
managed and some are unmanaged.  The compiler, linker and CLR conspire to make
this fact as transparent as possible.  But any call from your DllMain (i.e. from
your inner entrypoint) to a method that happened to be emitted as IL will set
you up for this deadlock.

You should
never call into another DLL

It’s really not possible to execute
managed code without making cross-DLL calls.  The JIT compiler is in a different DLL from
the ExecutionEngine. 
The ExecutionEngine is in a different DLL from your IJW
assembly.

Once again, pure managed assemblies
don’t usually have a problem here.  I did run into one case where one of the
Microsoft language compilers was doing a LoadLibrary of mscorlib.dll.  This had the side
effect of spinning up the CLR inside the OS loader lock and inflicting all the
usual IJW problems onto the compilation process.  Since managed assemblies have no DLL exports,
it’s rare for applications to load them in this manner.  In the case of this
language compiler, it was doing so for the obscure purpose of printing a banner
to the console at the start of compilation, telling the user what version of the
CLR it was bound to. 
There are much better ways of doing this sort of thing, and none of those
other ways would interfere with the loader lock.  This has been corrected.

 

You should never start up a thread or terminate a thread,
and then rendezvous

This probably doesn’t sound like
something you would do.  And yet it’s one of the most common deadlocks
I see with IJW assemblies on V1 and V1.1 of the CLR.  The typical stack
trace contains a load of an IJW assembly, usually via a DLL import.  This causes
mscoree.dll’s _CorDllMain to get control.  Eventually, we notice that the IJW assembly
has been strong name signed, so we call into WinVerifyTrust in
WinTrust.dll. 
That API has a perfectly reasonable expectation that it is not inside the
OS loader lock. 
It calls into the OS threadpool (not the managed CLR threadpool), which
causes the OS threadpool to lazily initialize itself.  Lazy initialization
involves spinning up a waiter thread, and then blocking until that waiter thread
starts executing.

Of course, the new waiter thread must
first deliver DLL_THREAD_ATTACH notifications to any DLLs that expect such
notifications. 
And it must obviously obtain the OS loader lock before it can deliver the
first notification. 
The result is a deadlock.

So I’ve painted a pretty bleak picture
of all the things that can go wrong with IJW assemblies in V1 and V1.1 of the
CLR.  If we had
seen a disturbing rate of failures prior to shipping V1, we would have
reconsidered our position here.  But it wasn’t until later that we had enough
external customers running into these difficulties.  With the benefit
of perfect hindsight, it is now clear that we screwed up.

Fortunately, much of this is fixable in
our next release. 
Until then, there are some painful workarounds that might bring you some
relief.  Let’s
look at the ultimate solution first, and then you can see how the workarounds
compare.  We
think that the ultimate solution would consist of several parts:

  1. Just loading an IJW assembly must not spin up a version of the CLR.
    That’s because spinning up a version of the CLR necessarily involves a
    dynamic load, and we’ve seen that dynamic loads are illegal during loading
    and initializing of static DLL dependencies.  Instead, mscoree.dll must
    perform enough initialization of the IJW assembly without actually setting
    up a full runtime.  This means that all calls into the managed portion of
    the IJW assembly must be bashed so that they lazily load a CLR and
    initialize it on first call.

  2. Along the same lines, the inner entrypoint of an IJW assembly must either
    be omitted or must be encoded as an unmanaged entrypoint.  Recall that the
    current file format doesn’t have a way of representing unmanaged inner
    entrypoints, since this is always in the form of a token.  Even if the
    token refers to an unmanaged method, we would have to spin up a version of
    the CLR to interpret that token for us.  So we’re going to need a tweak to
    the current file format to enable unmanaged inner entrypoints.

  3. An unmanaged inner entrypoint is still a major risk.  If that inner
    entrypoint calls into managed code, we will trap the call and lazily spin up
    the correct version of the CLR.  At that point, you are in exactly the same
    situation as if we had left the entrypoint as managed.  Ideally,
    assembly-level initialization and uninitialization would never happen inside
    the OS loader lock.  Instead, they would be replaced with modern managed
    analogs that are unrelated to the unmanaged OS loader’s legacy behavior.
    If you read my old blog on “Initializing code” at
    http://blogs.gotdotnet.com/cbrumme/PermaLink.aspx/611cdfb1-2865-4957-9a9c-6e2655879323 ,
    I mention that we’re under some pressure to add a module-level equivalent
    of .cctor methods.  That mechanism would make a great replacement for
    traditional DLL_PROCESS_ATTACH notifications.  In fact, the CLR has always
    supported a .cctor method at a global module scope.  However, the semantics
    associated with such a method were that it ran before any access to static
    members at global module scope.  A more useful semantic for a future
    version of the CLR would be for such a global .cctor to execute before any
    access to members in the containing Module, whether global or contained in
    any of the Module’s types.

  4. The above changes make it possible to avoid execution of managed code
    inside the OS loader lock.  But it’s still possible for a naïve or
    misbehaved unmanaged application to call a managed service (like a marshaled
    out delegate or a managed COM object) from inside DllMain.  This final
    scenario is not specific to IJW.  All managed execution is at risk of this
    kind of abuse.  Ideally, the CLR would be able to detect attempts to enter
    it while the loader lock is held, and fail these attempts.  It’s not clear
    whether such detection / prevention should be unconditional or whether it
    should be enabled through a Customer Debug Probe.

If you don’t know what Customer Debug Probes are,
please hunt them down on MSDN.  They are a life-saver for debugging certain
difficult problems in managed applications.  I would recommend starting with
http://www.gotdotnet.com/Community/UserSamples/Details.aspx?SampleGuid=c7b955c7-231a-406c-9fa5-ad09ef3bb37f , and then reading most of Adam Nathan’s
excellent blogs at
http://blogs.gotdotnet.com/anathan .

Of the above 4 changes, we’re relatively
confident that the first 3 will happen in the next release.  We also
experimented with the 4th change, but it’s
unlikely that we will make much further progress.  A key obstacle is that there is no
OS-approved way that can efficiently detect execution inside the loader
lock.  Our hope
is that a future version of the OS would provide such a mechanism.
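The lazy spin-up described in the first change can be pictured with a self-replacing stub. This is only a sketch of the general idea, not the mechanism mscoree.dll uses: `runtime_starts`, `managed_add`, `lazy_add_stub` and `add_entry` are hypothetical names, and the counter merely stands in for loading and initializing a runtime. The sketch is also not thread-safe.

```cpp
// Counts how many times the pretend runtime has been spun up.
int runtime_starts = 0;

// Stands in for a method in the managed portion of the assembly.
int managed_add(int a, int b) { return a + b; }

using AddFn = int (*)(int, int);

int lazy_add_stub(int a, int b);

// Callers always go through this entry. It starts out pointing at the stub,
// so merely loading the module starts nothing.
AddFn add_entry = &lazy_add_stub;

int lazy_add_stub(int a, int b) {
    ++runtime_starts;          // stand-in for loading and initializing the CLR
    add_entry = &managed_add;  // "bash" the entry so later calls go direct
    return add_entry(a, b);
}
```

Until the first call to `add_entry`, `runtime_starts` stays at zero. The first call bumps it to one and repoints the entry, and every later call goes straight to `managed_add`.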

This is all great.  But you have an
application that must run on V1 or V1.1.  What options do you have?  Fortunately, Scott
Currie has written an excellent article on this very subject.  If you build IJW
assemblies, please read it at
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dv_vstechart/html/vcconmixeddllloadingproblem.asp .

The Pure Managed Story

If you code in a language other than
MC++, you’re saying “Enough about IJW and the OS loader lock
already.”

Let’s look at what the CLR does during
process shutdown. 
I’ll try not to mention IJW, but I’ll have to keep talking about that
darn loader lock.

From the point of view of a managed
application, there are three types of shutdown:

1) A shutdown initiated by a call to
TerminateProcess doesn’t involve any further execution of the CLR or managed
code.  From our
perspective, the process simply disappears.  This is the rudest of all shutdowns, and
neither the CLR developer nor the managed developer has any obligations related
to it.

2) A shutdown initiated by a direct call to
ExitProcess is an unorderly shutdown from the point of view of the managed
application. 
Our first notification of the shutdown is via a DLL_PROCESS_DETACH
notification. 
This notification could first be delivered to the DllMain of
mscorwks.dll, mscoree.dll, or any of the managed assemblies that are currently
loaded. 
Regardless of which module gets the notification first, it is always
delivered inside the OS loader lock.  It is not safe to execute any managed code at
this time.  So
the CLR performs a few house-keeping activities and then returns from its
DllMain as quickly as possible.  Since no managed code runs, the managed
developer still has no obligations for this type of shutdown.

3) An orderly managed shutdown gives
managed code an opportunity to execute outside of the OS loader lock, prior to
calling ExitProcess. 
There are several ways we can encounter an orderly shutdown.  Because we will
execute managed code, including Finalize methods, the managed developer must
consider this case.

Examples of an orderly managed shutdown
include:

1) Call System.Environment.Exit().  I already mentioned
that some Windows developers have noted that you must not call ExitProcess
unless you first coordinate all your threads… and then they work like mad to
make the uncoordinated case work.  For Environment.Exit we are under no
illusions.  We
expect you to call it in races from multiple threads at arbitrary times.  It’s our job to
somehow make this work.

2) If a process is launched with a managed
EXE, then the CLR tracks the number of foreground vs. background managed
threads.  (See
Thread.IsBackground). 
When the number of foreground threads drops to zero, the CLR performs an
orderly shutdown of the process.  Note that the distinction between foreground
and background threads serves exactly this purpose and no other
purpose.

3) Starting with MSVCRT 7.0, an explicit
call to ‘exit()’ or an implicit call to ‘exit()’ due to a return from ‘main()’
can turn into an orderly managed shutdown.  The CRT checks to see if mscorwks.dll or
mscoree.dll is in the process (I forget which).  If it is resident, then it calls
CorExitProcess to perform an orderly shutdown.  Prior to 7.0, the CRT is of course unaware of
the CLR.

4) Some unmanaged applications are aware of
the CLR’s requirements for an orderly shutdown.  An example is devenv.exe, which is the EXE
for Microsoft Visual Studio.  Starting with version 7, devenv calls
CoEEShutDownCOM to force all the CLR’s references on COM objects to be
Release()’d. 
This at least handles part of the managed shutdown in an orderly
fashion.  It’s
been a while since I’ve looked at that code, but I think that ultimately devenv
triggers an orderly managed shutdown through a 2nd API.

If you are following along with the
Rotor sources, this all leads to an interesting quirk of EEShutDown in
ceemain.cpp. 
That method can be called:

  • 0 times, if someone calls TerminateProcess.
  • 1 time, if someone initiates an unorderly shutdown via ExitProcess.
  • 2 times, if we have a single-threaded orderly shutdown.  In this case, the
    first call is made outside of the OS loader lock.  Later, we call
    ExitProcess for the 2nd half of the shutdown.  This causes EEShutDown to be
    called a 2nd time.
  • Even more times, if we have a multi-threaded orderly shutdown.  Many
    threads will race to call EEShutDown the first time, outside the OS loader
    lock.  This routine protects itself by anointing a winner to proceed with
    the shutdown.  Then the eventual call to ExitProcess causes the OS to kill
    all threads except one, which calls back to EEShutDown inside the OS loader
    lock.
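Anointing a winner among racing callers is commonly done with a single atomic flag. The sketch below shows that general pattern. It is an assumption about the shape of the solution, not a transcription of what EEShutDown does in ceemain.cpp, and `ShutdownGate` is a hypothetical name.

```cpp
#include <atomic>

// Hypothetical gate: many threads may ask, exactly one is told to proceed.
class ShutdownGate {
public:
    // Returns true for exactly one caller, the thread that should run the
    // shutdown. Every later caller gets false and must stay out of the way.
    bool try_become_winner() {
        bool expected = false;
        return started_.compare_exchange_strong(expected, true);
    }

    bool has_started() const { return started_.load(); }

private:
    std::atomic<bool> started_{false};
};
```

The first call to `try_become_winner` returns true and every later call returns false, no matter how many threads race through it.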

Of course, our passage through
EEShutDown is quite different when we are outside the OS loader lock, compared
to when we are inside it.  When we are outside, we do something like
this:

  • First we synchronize at the top of EEShutDown, to handle the case where
    multiple threads race via calls to Environment.Exit or some equivalent
    entrypoint.
  • Then we finalize all objects that are unreachable.  This finalization
    sweep is absolutely normal and occurs while the rest of the application is
    still running.
  • Then we signal for the finalizer thread to finish its normal activity and
    participate in the shutdown.  The first thing it does is raise the
    AppDomain.ProcessExit event.  Once we get past this point, the system is no
    longer behaving normally.  You could either listen to this event, or you
    could poll System.Environment.HasShutdownStarted to discover this fact.
    This can be an important fact to discover in your Finalize method, because
    it’s more difficult to write robust finalization code when we have started
    finalizing reachable objects.  It’s no longer possible to depend on
    WaitHandles like Events, remoting infrastructure, or other objects.  The
    other time we can finalize reachable objects is during an AppDomain unload.
    This case can be discovered by listening to the AppDomain.DomainUnload
    event or by polling for the AppDomain.IsFinalizingForUnload state.  The
    other nasty thing to keep in mind is that you can only successfully listen
    to the ProcessExit event from the Default AppDomain.  This is something of
    a bug and I think we would like to try fixing it for the next release.
  • Before we can start finalizing reachable objects, we suspend all managed
    activity.  This is a suspension from which we will never resume.  Our goal
    is to minimize the number of threads that are surprised by the finalization
    of reachable state, like static fields, and it’s similar to how we prevent
    entry to a doomed AppDomain when we are unloading it.
  • This suspension is unusual in that we allow the finalizer thread to bypass
    the suspension.  Also, we change suspended threads that are in STAs, so
    that they pump COM messages.  We would never do this during a garbage
    collection, since the reentrancy would be catastrophic.  (Threads are
    suspended for a GC at pretty arbitrary places… down to an arbitrary machine
    code instruction boundary in many typical scenarios).  But since we are
    never going to resume from this suspension, and since we don’t want
    cross-apartment COM activity to deadlock the shutdown attempt, pumping
    makes sense here.  This suspension is also unusual in how we raise the
    barrier against managed execution.  For normal GC suspensions, threads
    attempting to call from unmanaged to managed code would block until the GC
    completes.  In the case of a shutdown, this could cause deadlocks when it
    is combined with cross-thread causality (like synchronous cross-apartment
    calls).  Therefore the barrier behaves differently during shutdown.
    Returns into managed code block normally.  But calls into managed code are
    failed.  If the call-in attempt is on an HRESULT plan, we return an
    HRESULT.  If it is on an exception plan, we throw.  The exception code we
    raise is 0xC0020001 and the argument to RaiseException is a failure HRESULT
    formed from the ERROR_PROCESS_ABORTED SCODE (1067).
  • Once all objects have been finalized, even if they are reachable, then we
    Release() all the COM pUnks that we are holding.  Normally, releasing a
    chain of pUnks from a traced environment like the CLR involves multiple
    garbage collections.  Each collection discovers a pUnk in the chain and
    subsequently Release’s it.  If that Release on the unmanaged side is the
    final release, then the unmanaged pUnk will be free’d.  If that pUnk
    contains references to managed objects, those references will now be
    dropped.  A subsequent GC may now collect this managed object and the cycle
    begins again.  So a chain of pUnks that interleaves managed and unmanaged
    execution can require a GC for each interleaving before the entire chain is
    recovered.  During shutdown, we bypass all this.  Just as we finalize
    objects that are reachable, we also drop all references to unmanaged pUnks,
    even if they are reachable.
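For the failure HRESULT mentioned in the suspension step above, the packing can be worked through by hand. winerror.h defines ERROR_PROCESS_ABORTED as decimal 1067. The sketch assumes the HRESULT is built the way the standard HRESULT_FROM_WIN32 macro builds one; the function names are illustrative and the zero check is a simplification of the real macro.

```cpp
#include <cstdint>

constexpr std::uint32_t kErrorProcessAborted = 1067;  // ERROR_PROCESS_ABORTED
constexpr std::uint32_t kFacilityWin32 = 7;           // FACILITY_WIN32

// Simplified form of HRESULT_FROM_WIN32: keep the low 16 bits of the error,
// place the facility in bits 16..26 and set the severity (failure) bit.
constexpr std::uint32_t hresult_from_win32(std::uint32_t code) {
    return code == 0
               ? 0u
               : ((code & 0xFFFFu) | (kFacilityWin32 << 16) | 0x80000000u);
}

// True when the severity bit marks the HRESULT as a failure.
constexpr bool failed(std::uint32_t hr) { return (hr & 0x80000000u) != 0; }
```

Under that assumption, `hresult_from_win32(kErrorProcessAborted)` comes out as 0x8007042B, since 1067 is 0x42B.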

From the perspective of managed code, at
this point we are finished with the shutdown, though of course we perform many
more steps for the unmanaged part of the shutdown.

There are a couple of points to note
with the above steps.

  1. style="MARGIN: 0in 0in 0pt; mso-list: l4 level1 lfo11; tab-stops: list.5in"
    >We never
    unwind threads. 
    Every so often developers express their surprise that ‘catch’, ‘fault’,
    ‘filter’ and ‘finally’ clauses haven’t executed throughout all their threads
    as part of a shutdown.But we would be nuts to try this.  It’s just too
    disruptive to throw exceptions through threads to unwind them, unless we have
    a compelling reason to do so (like AppDomain.Unload).  And if those
    threads contain unmanaged execution on their threads, the likelihood of
    success is even lower.If we were on that plan, some small
    percentage of attempted shutdowns would end up with “Unhandled Exception /
    Debugger Attach” dialogs, for no good reason.

  2. Along the same lines, developers sometimes express their surprise that
    all the AppDomains aren’t unloaded before the process exits.  Once again,
    the benefits don’t justify the risk or the overhead of taking these extra
    steps.  If you have termination code you must run, the ProcessExit event
    and Finalizable objects should be sufficient for doing so.

  3. We run most of the above shutdown under the protection of a watchdog
    thread.  By this I mean that the shutdown thread signals the finalizer
    thread to perform most of the above steps.  Then the shutdown thread enters
    a wait with a timeout.  If the timeout triggers before the finalizer thread
    has completed the next stage of the managed shutdown, the shutdown thread
    wakes up and skips the rest of the managed part of the shutdown.  It does
    this by calling ExitProcess.  This is almost fool-proof.  Unfortunately, if
    the shutdown thread is an STA thread it will pump COM messages (and
    SendMessages) while it is performing this watchdog blocking operation.  If
    it picks up a COM call into its STA that deadlocks, then the process will
    hang.  In a future release, we can fix this by using an extra thread.
    We’ve hesitated to do so in the past because the deadlock is exceedingly
    rare, and because it’s so wasteful to burn a thread in this manner.
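
The watchdog idea in the last point is easy to model outside the CLR.  What
follows is only a rough Python sketch of the pattern, not the CLR's code: the
name run_with_watchdog and its parameters are made up for illustration, and
where the real shutdown thread would call ExitProcess, this version just
reports that the timeout fired.

```python
import threading
import time


def run_with_watchdog(cleanup, timeout_seconds):
    """Run cleanup on a helper thread; return True if it finished in time."""
    done = threading.Event()

    def worker():
        cleanup()
        done.set()  # tell the waiting (watchdog) thread that we finished

    # Daemon thread, so a stuck cleanup cannot keep the process alive.
    threading.Thread(target=worker, daemon=True).start()

    # The shutdown thread blocks here, but only for a bounded time.
    return done.wait(timeout_seconds)


if __name__ == "__main__":
    finished = run_with_watchdog(lambda: time.sleep(0.01), timeout_seconds=1.0)
    print("orderly shutdown" if finished else "timed out; skipping the rest")
```

Note that this sketch blocks in a plain wait, so it has no counterpart to the
STA message-pumping hazard described above.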

Finally, a lot more happens inside
EEShutDown than the orderly managed steps listed above.  We have some
unmanaged shutdown that doesn’t directly impact managed execution.  Even here we try
hard to limit how much we do, particularly if we’re inside the OS loader
lock.  If we
must shut down inside the OS loader lock, we mostly just flush any logs we are
writing and detach from trusted services like the profiler or
debugger.

One thing we do not do during
shutdown is any form of leak detection.  This is somewhat controversial.  There are a number
of project teams at Microsoft which require a clean leak detection run whenever
they shut down.
And that sort of approach to leak detection has been formalized in
services like MSVCRT’s _CrtDumpMemoryLeaks, for external use.  The basic idea is
that if you can find what you have allocated and release it, then you never
really leaked it. 
Conversely, if you cannot release it by the time you return from your
DllMain then it’s a leak.

I’m not a big fan of that approach to
finding memory leaks, for a number of reasons:

  • The fact that you can reclaim memory doesn’t mean that you were
    productively using it.  For example, the CLR makes extensive use of
    “loader heaps” that grow without release until an AppDomain unloads.  At
    that point, we discard the entire heap without regard for the fine-grained
    allocations within it.  The fact that we remembered where all the heaps
    are doesn’t really say anything about whether we leaked individual
    allocations within those heaps.
  • In a few well-bounded cases, we intentionally leak.  For example, we often
    build little snippets of machine code dynamically.  These snippets are
    used to glue together pieces of JITted code, or to check security, or
    twiddle the calling convention, or for various other reasons.  If the
    circumstances of creation are rare enough, we might not even synchronize
    threads that are building these snippets.  Instead, we might use a
    light-weight atomic compare/exchange instruction to install the snippet.
    Losing the race means we must discard the extra snippet.  But if the
    snippet is small enough, the race is unlikely enough, and the leak is
    bounded enough (e.g. we only need one such snippet per AppDomain or
    process and reclaim it when the AppDomain or process terminates), then
    leaking is perfectly reasonable.  In that case, we may have allocated the
    snippet in a heap that doesn’t support free’ing.
  • This approach certainly encourages a lot of messy code inside the
    DLL_PROCESS_DETACH notification, which we all know is a very dangerous
    place to write code.  This is particularly true, given the way threads are
    whacked by the OS at arbitrary points of execution.  Sure, all the OS
    CRITICAL_SECTIONs have been weakened.  But all the other synchronization
    primitives are still owned by those whacked threads.  And the weakened OS
    critical sections were supposed to protect data structures that are now in
    an inconsistent state.  If your shutdown code wades into this landmine of
    deadlocks and trashed state, it will have a hard time cleanly releasing
    memory blocks.  Projects often deal with this case by keeping a count of
    all locks that are held.  If this count is non-zero when we get our
    DLL_PROCESS_DETACH notification, it isn’t safe to perform leak detection.
    But this leads to concerns about how often the leak detection code is
    actually executed.  For a while, we considered it a test case failure if
    we shut down a process while holding a lock.  But that was an insane
    requirement that was often violated in race conditions.
  • The OS is about to reclaim all resources associated with this process.
    The OS will perform a faster and more perfect job of this than the
    application ever could.  From a product perspective, leak detection at
    product shutdown is about the least interesting time to discover leaks.
  • DLL_PROCESS_DETACH notifications are delivered to different DLLs in a
    rather arbitrary order.  I’ve seen DLLs depend on brittle ordering, and
    I’ve seen them make cross-DLL calls out of their DllMain in an attempt to
    gain control over this ordering.  This is all bad practice.  However, I
    must admit that in V1 of the CLR, fusion.dll & mscorwks.dll played this
    “dance of death” to coordinate their termination.  Today, we’ve moved the
    Fusion code into mscorwks.dll.
  • I think it’s too easy for developers to confuse all the discipline
    surrounding this approach with actually being leak-free.  The approach is
    so onerous that the goal quickly turns into satisfying the requirements
    rather than chasing leaks.
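
To make the lock-counting trick from the third bullet concrete, here is a
minimal Python sketch.  It is an illustration only: CountedLock and
safe_to_check_leaks are hypothetical names, and a real implementation would
live in native code and be consulted from the DLL_PROCESS_DETACH handler.

```python
import threading


class CountedLock:
    """A lock that tracks how many instances are currently held process-wide."""

    _held = 0
    _guard = threading.Lock()  # protects the shared counter

    def __init__(self):
        self._lock = threading.Lock()

    def __enter__(self):
        self._lock.acquire()
        with CountedLock._guard:
            CountedLock._held += 1
        return self

    def __exit__(self, exc_type, exc, tb):
        with CountedLock._guard:
            CountedLock._held -= 1
        self._lock.release()
        return False  # never swallow exceptions

    @classmethod
    def held_count(cls):
        with cls._guard:
            return cls._held


def safe_to_check_leaks():
    """Leak detection is only attempted when no counted lock is held."""
    return CountedLock.held_count() == 0
```

The sketch also shows the weakness: any thread that happens to hold a lock
when the process goes down silently turns leak detection off for that run.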

There are at least two other ways to
track leaks.

One way is to identify scenarios that
can be repeated, and then monitor for leaks during the steady-state of repeating
those scenarios. 
For example, we have a test harness which can create an AppDomain, load
an application into it, run it, unload the AppDomain, then rinse and
repeat.  The
first few times that we cycle through this operation, memory consumption
increases. 
That’s because we actually JIT code and allocate data structures to
support creating a 2nd AppDomain, or support
making remote calls into the 2nd AppDomain, or
support unloading that AppDomain.  More subtly, the ThreadPool might create –
and retain – a waiter thread or an IO thread.  Or the application may trigger the creation
of a new segment in the GC heap which the GC decides to retain even after the
incremental contents have become garbage.  This might happen because the GC decides it
is not productive to perform a compacting collection at this time.  Even the OS heap
can make decisions about thread-relative look-aside lists or lazy VirtualFree
calls.

But if you ignore the first 5 cycles of
the application, and take a broad enough view over the next 20 cycles of the
application, a trend becomes clear.  And if you measure over a long enough period,
paltry leaks of 8 or 12 bytes per cycle can be discovered.  Indeed, V1 of the
CLR shipped with a leak for a simple application in this test harness that was
either 8 or 12 bytes (I can never remember which).  Of that, 4 bytes
was a known leak in our design.  It was the data structure that recorded the
IDs of all the AppDomains that had been unloaded.  I don’t know if we’ve subsequently addressed
that leak.  But
in the larger scheme of things, 8 or 12 bytes is pretty impressive.
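
The steady-state measurement can be sketched in a few lines of Python.  This
is an assumption about how one might compute the trend, not a description of
our actual harness: leak_per_cycle is a hypothetical helper that drops the
warm-up cycles and fits a least-squares line through the memory readings that
follow.  The slope is the estimated leak in bytes per cycle.

```python
def leak_per_cycle(samples, warmup=5, window=20):
    """Least-squares slope (bytes per cycle) over the cycles after warm-up."""
    steady = samples[warmup:warmup + window]
    n = len(steady)
    if n < 2:
        raise ValueError("need at least two steady-state samples")
    mean_x = (n - 1) / 2
    mean_y = sum(steady) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(steady))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


if __name__ == "__main__":
    # Made-up readings: noisy warm-up, then a steady 8 bytes per cycle.
    readings = [1000, 5000, 9000, 9500, 9800]
    readings += [10000 + 8 * i for i in range(20)]
    print(leak_per_cycle(readings))
```

Fitting a line rather than comparing two readings is what makes a tiny
per-cycle leak visible through the noise of heap and thread-pool decisions.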

Recently, one of our test developers has
started experimenting with leak detection based on tracing of our unmanaged data
structures. 
Fortunately, many of these internal data structures are already described
to remote processes, to support out-of-process debugging of the CLR.  The idea is that we
can walk out from the list of AppDomains, to the list of assemblies in each one,
to the list of types, to their method tables, method bodies, field descriptors,
etc.  If we
cannot reach all the allocated memory blocks through such a walk, then the
unreachable blocks are probably leaks.

Of course, it’s going to be much harder
than it sounds. 
We twiddle bits of pointers to save extra state.  We point to the
interiors of heap blocks.  We burn the addresses of some heap blocks,
like dynamically generated native code snippets, into JITted code and then
otherwise forget about the heap address.  So it’s too early to say whether this
approach will give us a sound mechanism for discovering leaks.  But it’s certainly
a promising idea and worth pursuing.
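
Stripped of all those complications, the tracing idea is just a reachability
walk.  Here is a toy Python sketch under heavy assumptions: blocks are plain
ids, references are given as a ready-made dictionary, and there are no tagged
or interior pointers.  The name find_probable_leaks and the sample block ids
are invented for illustration.

```python
def find_probable_leaks(allocated, roots, references):
    """Return allocated blocks that cannot be reached by walking from roots.

    allocated:  set of block ids known to the allocator
    roots:      iterable of block ids to start from (e.g. the AppDomain list)
    references: dict mapping a block id to the block ids it points to
    """
    reached = set()
    pending = list(roots)
    while pending:
        block = pending.pop()
        if block in reached:
            continue
        reached.add(block)
        pending.extend(references.get(block, ()))
    return allocated - reached


if __name__ == "__main__":
    allocated = {"domain", "assembly", "type", "method_table", "orphan"}
    references = {
        "domain": ["assembly"],
        "assembly": ["type"],
        "type": ["method_table"],
        "orphan": ["type"],  # points into the graph, but nothing points to it
    }
    print(find_probable_leaks(allocated, ["domain"], references))
```

Everything that makes the real problem hard lives in building the references
map, which this sketch simply takes as given.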

Rambling Security Addendum

Finally, an off-topic note as I close
down:

I haven’t blogged in about a month.  That’s because I
spent over 2 weeks (including weekends) on loan from the CLR team to the DCOM
team.  If
you’ve watched the tech news at all during the last month, you can guess
why.  It’s
security.

From outside the company, it’s easy to
see all these public mistakes and take a very frustrated attitude.  “When will
Microsoft take security seriously and clean up their act?”  I certainly
understand that frustration.  And none of you want to hear me whine about
how it’s unfair.

The company performed a much publicized
and hugely expensive security push.  Tons of bugs were filed and fixed.  More importantly,
the attitude of developers, PMs, testers and management was fundamentally
changed. 
Nobody on our team discusses new features without considering security
issues, like building threat models.  Security penetration testing is a fundamental
part of a test plan.

Microsoft has made some pretty strong
claims about the improved security of our products as a result of these
changes.  And
then the DCOM issues come to light.

Unfortunately, it’s still going to be a
long time before all our code is as clean as it needs to be.

Some of the code we reviewed in the DCOM
stack had comments about DGROUP consolidation (remember that precious 64KB
segment prior to 32-bit flat mode?) and OS/2 2.0 changes.  Some of these
source files contain comments from the ‘80s.  I thought that Win95 was ancient!

I’ve only been at Microsoft for 6
years.  But
I’ve been watching this company closely for a lot longer, first as a customer at
Xerox and then for over a decade as a competitor at Borland and Oracle.  For the greatest
part of Microsoft’s history, the development teams have been focused on enabling
as many scenarios as possible for their customers.  It’s only been for
the last few years that we’ve all realized that many scenarios should never be
enabled.  And
many of the remainder should be disabled by default and require an explicit
action to opt in.

One way you can see this change in the
company’s attitude is how we ship products.  The default installation is increasingly
impoverished. 
It takes an explicit act to enable fundamental goodies, like
IIS.

Another hard piece of evidence that
shows the company’s change is the level of resource that it is throwing at the
problem. 
Microsoft has been aggressively hiring security experts.  Many are in a new
Security Business Unit, and the rest are sprinkled through the product
groups.  Not
surprisingly, the CLR has its own security development, PM, test and penetration
teams.

I certainly wasn’t the only senior
resource sucked away from his normal duties because of the DCOM alerts.  Various folks from
the Developer Division and Windows were handed over for an extended period.  One of the other
CLR architects was called back from vacation for this purpose.

We all know that Microsoft will remain a
prime target for hacking.  There’s a reason that everyone attacks
Microsoft rather than Apple or Novell.  This just means that we have to do a lot
better.

Unfortunately, this stuff is still way
too difficult. 
It’s a simple fact that only a small percentage of developers can write
thread-safe free-threaded code.  And they can only do it part of the
time.  The
state of the art for writing 100% secure code requires that same sort of
super-human attention to detail.  And a hacker only needs to find a single
exploitable vulnerability.

I do think that managed code can avoid
many of the security pitfalls waiting in unmanaged code.  Buffer overruns are
far less likely. 
Our strong-name binding can guarantee that you call who you think you are
calling. 
Verifiable type safety and automatic lifetime management eliminate a
large number of vulnerabilities that can often be used to mount security
attacks. 
Consideration of the entire managed stack makes simple luring attacks
less likely. 
Automatic flow of stack evidence prevents simple asynchronous luring
attacks from succeeding.  And so on.

But it’s still way too
hard.  Looking
forwards, a couple of points are clear:

1)  We need to focus harder on the goal that managed applications are
secure, right out of the box.  This means aggressively chasing the weaknesses
of our present system, like the fact that locally installed assemblies by
default run with FullTrust throughout their execution.  It also means static
and dynamic tools to check for security holes.

2)  No matter what we do, hackers will find weak spots and attack them.  The
very best we can hope for is that we can make those attacks rarer and less
effective.

I’ll add managed security to my list for
future articles.


http://blogs.msdn.com/cbrumme/archive/2003/08/20/51504.aspx

Claimspace, a Long Tail Recognition System

Filed under: Live Chat software

Robert Rebholz is not only my boss*, he is also my muse, ideological sparring partner, alter ego, and mentor. Bob is possessed by a special kind of genius, with a sort of Jeffersonian breadth and intensity that makes it a pleasure and honor to collaborate with him, on a day-to-day basis. In my opinion, Bob is one of two people on Earth who can talk about the BIG idea that is Claimspace, with absolute confidence, competence, and credibility. If you have even a passing interest in online communities of practice, folksonomies, reputation systems, credibility, identity, recommendation systems, rewards, “flow”, collaborative filtering, “social search”, & related areas, I encourage you to subscribe to my RSS feed and Bob’s RSS feed.

Yesterday, Bob published an excellent post about Claimspace that wades into the broad river of uses that it might one day support, for both users and “community owners”, across the Web. He cites the following potential uses:

  • “Long tail recognition system”
  • Solution to the “Who can I trust? issue”
  • “Generalized polling mechanism” (and portable)
  • “A simple REST API gives everyone (and I mean everyone — the mashup possibilities are just staggering — caveat, keep the crawl, walk, run idea in mind) the ability use the data in a manner best suited to their needs: community (MVP or other influencer) reward programs, product design input, product feature voting, bug prioritization, and on and on and on, all without a ton of custom code. Any Digg-like application would love this kind of data. Can you imagine – hottest claims, hottest people making claims, most used claims, newest claims, by product, by solution area, by geographical region, and the list goes on.”

  • Lastly but not leastly… Bob identifies the possibility of using Claimspace as a bizarro substitute for a traditional, taxonomically hobbled, binary choice or n-scale rating system, which he describes thusly: “Claims can be created and applied by anyone, including the people hosting the community. They could be built right into the forums application, for instance, to support assertions or claims such as “was this post helpful”, or “this post answers the question asked”. A library team could, for instance, create several standard claims (a claim/assertion taxonomy) that relate to the quality or usefulness of the posted library content.”

Alas, it is true. Perversion will occur.

Alas, we must accommodate the taxonomy-doers and guide them to the right path, if we can. But Claimspace is a folksonomy.

Personally, I believe that the taxonomy-doers will come to see the futility of their ways and that if they don’t, they will lose the vast majority of their customers, over time. For example, if Typepad disallows xClaims and Blogger allows them, Typepad runs the risk of making Blogger appear to be a much better blogging platform than it actually is, relative to Typepad ;-) .

It’s tempting and easy to impose one’s way of thinking on others; to deprive one’s minions or customers of the ability to control the means by which resources of their creation are published, organized, discovered, and evaluated by other people. The organization of information, access to publication mechanisms, and permission to cite, annotate, edit, and otherwise alter the organization or substance of information resources or its metadata, both online and offline, has ALWAYS been closely and jealously guarded. Those who control “the tree” of information control you. The taxonomy-doers derive personal benefits from that control, often at our expense and often in the absence of compensatory benefits. In many cases, taxonomies are indeed helpful. But Claimspace is designed and is being developed primarily as a “folksonomical” resource rating system. As such, Claimspace has the potential to be a medium of social evaluation that empowers the little people: you, me and millions of other self-publishers, to gain recognition and evaluate credibility, on our terms, rather than in a way that is strictly and uniformly defined by AOL or Microsoft or O’Reilly or Yahoo or Google.

Are you subscribed to this blog?

*Note that this is the first time I’ve mentioned Bob, in my blog. Talking about one’s boss in a public forum is tricky, both socially (wrt personal credibility) and from a career perspective. However, I feel that Bob’s ideas deserve your recognition, as they command my attention and respect, and not just because he looks like Zod.


http://blogs.msdn.com/korbyp/archive/2007/05/17/claimspace-a-long-tail-recognition-system.aspx

Merry Christmas Indeed!

Filed under: Live Chat software

Janice went all out this year and got me an Ibanez JS1000 (Joe Satriani series) guitar, a Line 6 POD X3 Live effects board and a pair of Roland CM-30 amplified monitors. My fingers are all tore up since I’ve been out of practice for some time now. But it sure is fun to get back to some jamming. The JS1000 is pretty light and has easy action. Combined with the POD X3 I can get quite a variety of amazing sounds. I even got the X3 hooked up to my MacBook Pro and finally was able to try out Garage Band. I was able to lay down the rhythm track for Crushing Day (what I could remember from back in the day) and then play the lead part over it with no lag. The Roland CM-30s are nice because I can run my Alesis QS8 and the POD X3 into them at the same time. This is probably the best setup I’ve ever had.

Later today I hooked up a microphone to the X3 and the kids had a blast talking and play singing into it. “Daddy it sounds kinda like I’m in a cave…”. Perhaps I should cut down some of that reverb. :)  

Santa Claus was good to me this year. (Thanks Janice)


http://weblogs.asp.net/dfindley/archive/2007/12/26/merry-christmas-indeed.aspx
