Bug 77

Summary: Venue Server caused VenueClient and Nodemanager to hang
Product: Virtual Venue Server Software
Reporter: Shawn Davis <wdavis@ncsa.uiuc.edu>
Component: Virtual Venue Server
Assignee: Robert Olson <olson@mcs.anl.gov>
Severity: critical
CC: lefvert@mcs.anl.gov
Priority: P2    
Version: 2.0-beta1   
Target Milestone: ---   
Hardware: PC   
OS: Windows XP   
Bug Depends on:    
Bug Blocks: 79    

Description From 2003-02-19 15:48:34
VenueClient and NodeManagement hung when trying to change venues; the same 
happened for another user connected to the same venue.  VenueServer continued 
to checkpoint properly.
When I restarted the venueserver, the venueclient and nodemanagement came back 
to life and entered the venue properly.
------- Comment #1 From 2003-02-22 10:42:54 -------

Is this still happening with Alpha 3?

------- Comment #2 From 2003-02-24 15:18:26 -------
And was it perhaps an expired credential? If so, this is another of the issues 
we're working on :-)
------- Comment #3 From 2003-02-24 15:51:26 -------
No, wasn't related to the expired credentials.  

As to whether or not I'm seeing it in Alpha 3, I have not yet encountered it, 
but will notify you if it happens again.
------- Comment #4 From 2003-02-27 13:51:47 -------
I am encountering similar behavior trying to connect to the transitional venue 
server right now.  So maybe it is still present in Alpha 3?
------- Comment #5 From 2003-02-27 18:47:20 -------
If I read your timestamp right, you were seeing it at about 1:51PM CST.  That 
should have been before I restarted it with the latest code base which fixes a 
few bugs.

Let me know if you see it again and/or if you can figure out how to 
deterministically cause this behavior.

------- Comment #6 From 2003-03-12 02:00:28 -------
Can you confirm this is still broken?  I'm confounded as to how to track this 
down :-)
------- Comment #7 From 2003-03-19 12:22:06 -------
Yes, this bug still exists. It should be noted that any applications that are 
connected to the venueserver at the time the lockup occurs also hang.

The applications can be unfrozen by manually breaking one point of the 
communication that is causing the deadlock.  So, if the hang was caused by 
communications between a venueserver and a venueclient, killing one of the two 
processes resumes normal operation of the rest of the processes involved.  

I just had this lockup occur, and this time I chose to kill the venueclient 
instead of the server.  The following output was sent to the venueserver:
Exception happened during processing of request from ('', 2369)
Traceback (most recent call last):
  File "c:\python22\lib\SocketServer.py", line 221, in handle_request
    self.process_request(request, client_address)
  File "c:\python22\lib\SocketServer.py", line 240, in process_request
    self.finish_request(request, client_address)
  File "c:\python22\lib\SocketServer.py", line 253, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "c:\python22\lib\SocketServer.py", line 514, in __init__
  File "c:\python22\lib\BaseHTTPServer.py", line 266, in handle
  File "C:\Python22\lib\site-
 line 3928, in do_POST
  File "c:\python22\lib\BaseHTTPServer.py", line 313, in send_response
    self.wfile.write("%s %s %s\r\n" %
  File "C:\Python22\lib\site-packages\pyGlobus\io.py", line 364, in write
    self.sock.write(str, len(str))
  File "C:\Python22\lib\site-packages\pyGlobus\io.py", line 253, in write
    raise ex
IOBaseException: a system call failed (Invalid argument)

There is no output in any of the logs that provides any indication of an error.

The venue management tool that was also frozen came back to life as soon as 
the venueclient was killed.
I was then able to reconnect my venueclient to the same instance of the 
venueserver and it operated as expected.
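
The traceback above passes through SocketServer's handle_request into the 
request handler, which would be consistent with requests being served one at 
a time, although the actual cause of the deadlock is not established in this 
report. The sketch below is not the venueserver code. It is a minimal 
illustration, using the modern Python 3 socketserver module and a hypothetical 
EchoHandler, of how a single stalled peer can block every other client of a 
serial server, and how closing that peer releases the rest. Here the handler 
stalls on a read; in the traceback it was a write, but the serialization 
effect is the same.

```python
import socket
import socketserver
import threading
import time


class EchoHandler(socketserver.StreamRequestHandler):
    """Hypothetical handler: echo one line back to the peer."""

    def handle(self):
        # Blocks until the peer sends a full line or closes the connection.
        line = self.rfile.readline()
        self.wfile.write(line)


def demo():
    # TCPServer (without ThreadingMixIn) serves requests strictly one at a time.
    server = socketserver.TCPServer(("127.0.0.1", 0), EchoHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()

    # First client connects but never sends anything: the handler stalls on it.
    stalled = socket.create_connection(("127.0.0.1", port))
    time.sleep(0.1)

    # Second client behaves correctly, yet gets no answer while the first stalls.
    good = socket.create_connection(("127.0.0.1", port))
    good.settimeout(0.5)
    good.sendall(b"ping\n")
    try:
        good.recv(16)
        blocked = False
    except socket.timeout:
        blocked = True

    # "Killing" the stalled client breaks the blocked call and frees the server.
    stalled.close()
    good.settimeout(5)
    reply = good.recv(16)

    server.shutdown()
    server.server_close()
    good.close()
    return blocked, reply


if __name__ == "__main__":
    print(demo())
```

In the standard library, the usual ways to avoid this pattern are to mix in 
socketserver.ThreadingMixIn or to set a timeout attribute on the handler class; 
whether either applies to the pyGlobus-based server discussed here is an 
open question.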
------- Comment #8 From 2003-04-21 15:58:59 -------
Can you try to make this happen in beta 3?
------- Comment #9 From 2003-04-29 09:19:18 -------
*** Bug 174 has been marked as a duplicate of this bug. ***
------- Comment #10 From 2003-05-29 18:50:57 -------
I'm going to reclassify this as the "general stability bug" and reassign it to 
Bob, who's actively working on this.
------- Comment #11 From 2003-05-29 18:52:56 -------
*** Bug 295 has been marked as a duplicate of this bug. ***
------- Comment #12 From 2003-08-14 17:30:12 -------
Resolving; much of the underlying code behind the instability we saw has been 
replaced. If we see more problems, please refile.