Bug 1546 - File descriptor leak & potential deadlock
: File descriptor leak & potential deadlock
: 0.1
: PC All
: P2 major
: ---
Assigned To:
: 6192
  Show dependency treegraph
Reported: 2004-02-13 14:06 by
Modified: 2008-07-18 14:35 (History)

Patch for problem described above (984 bytes, patch)
2004-02-13 14:06, Alain Roy
Some suggested changes for globus_io_securesocket.c (1.28 KB, patch)
2004-03-01 07:29, David Smith


You need to log in before you can comment on or make changes to this bug.

Description From 2004-02-13 14:06:13
This is a patch for two GRAM problems: 

1) There is a file descriptor leak. 
2) There is a fix for a potential deadlock problem. 

These were developed by David Smith at CERN and tested as part of the VDT.

I will attach the patch separately. 

------- Comment #1 From 2004-02-13 14:06:45 -------
Created an attachment (id=310) [details]
Patch for problem described above

Patch for problem described above
------- Comment #2 From 2004-02-26 12:18:33 -------
The first problem (FD leak) ought to have been fixed in Globus 2.4. There was a 
problem in Globus I/O where failed connects would result in a FD leak, but that 
was fixed a long time ago (see bug #237). Can you verify that this problem 
exists with a recent version of the code (and identify version what you are 
using). A test case sent in with the patches would be excellent. 

The deadlock problem has been fixed already. See bug #1566 for details on bug 
and the fix I applied. 
------- Comment #3 From 2004-02-27 18:50:54 -------
the first problem obviously was *not* fixed in 2.4!

Look at the original globus_2_4_3 code in the attachment!

We found the leak running the Condor-G "gahp_server" under "valgrind",
so this is not some theoretical issue: it actually was a problem!

Please do apply the fix; it has been carefully tested.  Thanks.

------- Comment #4 From 2004-03-01 07:27:19 -------
Hi Joe,

Regarding the first problem addressed by this patch (file descriptor leak):

The version being discussed is globus-2.4.3, it's identified in the patch text,
although not directly in the bug comment. 

The patch is forcing a close on the connection handle from a failed connect.
(The close was previously conditional on whether it was a failed connect or
whether a subsequent write register failed). Infact there is still some room for
tidying up the handling of the error objects here, although as written it
doesn't cause any problem. (As an asside there are several places in the
existing code where error objects are never released, although given the small
finite size of the globus object cache it don't see that it causes any problems)

In this case I think there is argument for saying that, given the connection
handle should always be closed before globus_l_gram_protocol_connect_callback()
is called with an error, the close is indeed not necessary. However we've seen
that sometimes the failed connection handle is _not_ closed and thus we suffer
from the file descriptor leak. So in the end I made a decision to always attempt
a close when there is an error in this routine, rather than distinguish whether
the error was a connection error or some other error in the callback processing.

If you prefer to enforce the close in the connection logic that would also be
very accepable:

A test case for this could be attempting a connection (using a secure
authentication mode) to a destination that accepts the connection, receives the
token but then drops the connection. I had started to look into the lower level
connection handling, with the intention of including the required patch in some
future suggestions. Indeed I have one solution now, although I haven't convinced
myself that it really cannot cause any existing code to misbehave. I include it
now so that you can follow that up, if you like.

(see attached patch to globus_io_securesocket.c)

Many thanks,
------- Comment #5 From 2004-03-01 07:29:39 -------
Created an attachment (id=328) [details]
Some suggested changes for globus_io_securesocket.c

Some possible changes to ensure that failed connection attempts close the
connection handle before calling the connection callback routine.
------- Comment #6 From 2004-03-01 09:38:39 -------
I'm redirecting this bug to the I/O guys. You should definitely try without 
this patch with the 3.2 beta---Globus I/O has been reworked as a compatibility 
layer on top of XIO, so the bugs might be a bit different. 
------- Comment #7 From 2004-03-01 11:59:20 -------
The semantics of not requiring a close after a failed connect will remain.  The 
fd leak appears to be possible when authentication fails mid way through the 

Since 3.2 now uses a globus io compatibility wrapper over globus xio, the 
priority for fixing this will be pretty low.  As a work around, it does not 
hurt to always call globus_io_register_close().  If the handle has already been 
destroyed, the call will just fail.

As for error objects not being released, this is the way they're intended to be 
used.  As you note, the error object cache size is finite, so it's not a real 
leak.  The system was designed in such a way to allow you to ignore return 
codes.  If you're not going to do anything with an error object, you have no 
reason to clutter your code with get-ing and free-ing the object.