Bugzilla – Full Text Bug Listing
| Summary: | Venue Server caused VenueClient and Nodemanager to hang | | |
|---|---|---|---|
| Product: | Virtual Venue Server Software | Reporter: | Shawn Davis <wdavis@ncsa.uiuc.edu> |
| Component: | Virtual Venue Server | Assignee: | Robert Olson <olson@mcs.anl.gov> |
| Status: | RESOLVED FIXED | | |
| Severity: | critical | CC: | lefvert@mcs.anl.gov |
| Priority: | P2 | | |
| Version: | 2.0-beta1 | | |
| Target Milestone: | --- | | |
| Hardware: | PC | | |
| OS: | Windows XP | | |
| Bug Depends on: | | | |
| Bug Blocks: | 79 | | |
Shawn, is this still happening with Alpha 3? --Ivan
And was it perhaps an expired credential? If so, this is another one of yours that we're working on :-)
No, it wasn't related to the expired credentials. As to whether I'm seeing it in Alpha 3: I have not encountered it yet, but will notify you if it happens again.
I am encountering similar behavior trying to connect to the transitional venue server right now. So maybe it is still present in Alpha 3?
If I read your timestamp right, you were seeing it at about 1:51 PM CST. That should have been before I restarted it with the latest code base, which fixes a few bugs. Let me know if you see it again and/or if you can figure out how to deterministically cause this behavior. --Ivan
Can you confirm this is still broken? I'm confounded as to how to track this down :-)
Yes, this bug still exists. Note that any applications connected to the
venueserver at the time the lockup occurs also hang.
The applications can be unfrozen by manually breaking one end of the
connection that is causing the deadlock. So, if the hang was caused by
communication between a venueserver and a venueclient, killing either of the
two processes restores normal operation of the rest of the processes involved.
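The observation that killing either endpoint releases the other is consistent with how blocking socket I/O behaves: a call waiting on a peer returns only when data arrives or the peer's end of the connection goes away. A minimal sketch, assuming modern Python 3 (not the Python 2.2 the venueserver ran on) and using a local socket pair as a stand-in for the venueserver/venueclient connection; the function names are hypothetical and this is not the AccessGrid code:

```python
import socket
import threading


def blocked_reader(sock, results):
    # recv() blocks until data arrives or the peer closes its end.
    data = sock.recv(1024)
    # An empty bytes object signals EOF: the peer's end is gone.
    results.append(data)


def demo():
    # Hypothetical stand-ins for the two ends of a venueserver/venueclient link.
    server_side, client_side = socket.socketpair()
    results = []
    reader = threading.Thread(target=blocked_reader, args=(server_side, results))
    reader.start()
    # Nothing is ever sent, so the reader would otherwise wait indefinitely.
    # Closing the socket stands in for killing the client process, which
    # makes the operating system close that process's sockets.
    client_side.close()
    reader.join(timeout=5)
    server_side.close()
    return results


if __name__ == "__main__":
    print(demo())
```

Here the reader thread plays the role of the hung process: it unblocks only because the other end disappears, not because the exchange ever completes.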
The lockup just occurred again, and this time I chose to kill the venueclient
instead of the server. The following output was sent to the venueserver
console:
----------------------------------------
Exception happened during processing of request from ('141.142.66.181', 2369)
Traceback (most recent call last):
  File "c:\python22\lib\SocketServer.py", line 221, in handle_request
    self.process_request(request, client_address)
  File "c:\python22\lib\SocketServer.py", line 240, in process_request
    self.finish_request(request, client_address)
  File "c:\python22\lib\SocketServer.py", line 253, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "c:\python22\lib\SocketServer.py", line 514, in __init__
    self.handle()
  File "c:\python22\lib\BaseHTTPServer.py", line 266, in handle
    method()
  File "C:\Python22\lib\site-packages\AccessGrid\hosting\pyGlobus\AGGSISOAP.py", line 3928, in do_POST
    self.send_response(status)
  File "c:\python22\lib\BaseHTTPServer.py", line 313, in send_response
    self.wfile.write("%s %s %s\r\n" %
  File "C:\Python22\lib\site-packages\pyGlobus\io.py", line 364, in write
    self.sock.write(str, len(str))
  File "C:\Python22\lib\site-packages\pyGlobus\io.py", line 253, in write
    raise ex
IOBaseException: a system call failed (Invalid argument)
----------------------------------------
None of the logs contain any indication of an error.
The venue management tool, which was also frozen, came back to life as soon as
the venueclient was killed.
I was then able to reconnect my venueclient to the same instance of the
venueserver, and it operated as expected.
Can you try to reproduce this in Beta 3?
*** Bug 174 has been marked as a duplicate of this bug. ***
I'm going to reclassify this as the "general stability bug" and reassign it to Bob who's actively working on this.
*** Bug 295 has been marked as a duplicate of this bug. ***
Resolving: much of the code underlying the instability we saw has been replaced. If we see more problems, please refile.