Bugzilla – Bug 5617
GRAM4 seg hangs with fork jobs
Last modified: 2008-07-18 14:11:38
SEG hangs when the logfile, globus-fork.log, is on NFS.
I traced this down to the "fork_starter.c" code. Specifically, the permissions
of the log file are set to 622. The open/write C code in "fork_starter.c" has a
do-while loop around the "fcntl" call (see below). If the file is on an NFS
mount, fcntl returns -1 with errno 11 (EAGAIN) because it cannot get a write
lock over NFS. (Note: I am running rpc.rstatd on the system.) This becomes an
infinite loop, as all subsequent calls to fcntl return the same result. If I
change the file permissions to 666, it proceeds normally. Alternatively, if I
skip the write lock by setting rc=0 in the EAGAIN case (in my simple test
program), it works fine.
Below is the loop:
    do
    {
        rc = fcntl(logfd, F_SETLKW, &lock);
        if (rc < 0)
        {
            rc = 1;
            globus_assert(errno != EBADF);
            globus_assert(errno != EDEADLK);
            globus_assert(errno != EFAULT);
            rc = 1;
        }
    } while (rc == 1);
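For illustration only, here is a minimal sketch of one way to keep such a loop from spinning forever: cap the number of attempts and give up with an error once the cap is reached. The helper name lock_with_retry_limit, the max_attempts parameter, and the 100 ms pause are assumptions made for this example; none of this is the code in fork_starter.c or the fix that was eventually committed.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from fork_starter.c): try to take a write lock
 * on the whole file, retrying at most max_attempts times.
 * Returns 0 on success, or -1 with errno set to the last fcntl() error.
 */
static int lock_with_retry_limit(int fd, int max_attempts)
{
    struct flock lock;
    struct timespec pause_time;
    int attempt;
    int saved_errno = EAGAIN;

    memset(&lock, 0, sizeof(lock));
    lock.l_type = F_WRLCK;    /* exclusive (write) lock */
    lock.l_whence = SEEK_SET;
    lock.l_start = 0;
    lock.l_len = 0;           /* 0 means "through end of file" */

    pause_time.tv_sec = 0;
    pause_time.tv_nsec = 100 * 1000 * 1000; /* 100 ms, arbitrary choice */

    for (attempt = 0; attempt < max_attempts; attempt++)
    {
        if (fcntl(fd, F_SETLKW, &lock) == 0)
        {
            return 0;
        }
        saved_errno = errno;
        if (saved_errno != EINTR && saved_errno != EAGAIN &&
            saved_errno != EACCES)
        {
            break; /* treated as non-transient here: do not retry */
        }
        nanosleep(&pause_time, NULL);
    }

    errno = saved_errno;
    return -1;
}
```

With a cap in place, a file system that keeps answering EAGAIN produces an error the caller can report, rather than a hang.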
*** Bug 5620 has been marked as a duplicate of this bug. ***
Thanks for the report, Jeff. Joe is off until the end of the month, but we should
be able to make this change for 4.0.6.
Joe and I discussed this some. Seems this needs further investigation. I'm
removing the 4.0.6 milestone.
I've put a new version of the globus fork starter in
which should detect errors in this situation better before the job is
started and report them. I don't have access to a system that doesn't have
working fcntl locks, so I can't verify that this catches errors properly. If
this detects the problem for you, we can probably call that program in the
setup package to check that the logging file will work in practice.
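As a rough sketch of what such a setup-time check could look like (not the patched fork starter itself), the function below opens the log file for appending and makes a single non-blocking attempt to take and release a write lock. The name check_log_lockable and its return convention are assumptions made for this example.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical preflight check: returns 0 if a write lock can be taken on
 * the file at path, otherwise the errno value from the failing call.
 * Note that EAGAIN or EACCES may also mean another process currently
 * holds the lock, so a failure here is a hint rather than proof.
 */
static int check_log_lockable(const char *path)
{
    struct flock lock;
    int fd;
    int saved_errno;

    fd = open(path, O_WRONLY | O_APPEND);
    if (fd < 0)
    {
        return errno;
    }

    memset(&lock, 0, sizeof(lock));
    lock.l_type = F_WRLCK;    /* exclusive (write) lock */
    lock.l_whence = SEEK_SET;
    lock.l_start = 0;
    lock.l_len = 0;           /* whole file */

    /* F_SETLK fails immediately instead of blocking like F_SETLKW. */
    if (fcntl(fd, F_SETLK, &lock) < 0)
    {
        saved_errno = errno;
        close(fd);
        return saved_errno;
    }

    lock.l_type = F_UNLCK;
    (void) fcntl(fd, F_SETLK, &lock);
    close(fd);
    return 0;
}
```

A setup script could call this once on globus-fork.log and print strerror() of any nonzero result before any job is submitted.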
Any feedback on this patched version?
This fix is committed to 4.2 branch (for 4.2.1) and 4.0 branch (for 4.0.8) and