Bugzilla – Bug 1538
Gatekeeper log rotation and logging job accounting info
Last modified: 2008-08-15 04:44:09
We have here several related patches. It's kind of a big lump, but:
1) They are well-tested because they have been used heavily by EDG and LCG.
2) They are very useful.
There are two features added:
1) Rotate the gatekeeper's log upon receiving SIGUSR1
2) Do some logging for job accounting purposes. By default, this goes into the
gatekeeper log, but it can be configured to go into another log. This is
desired by MANY users, not just EDG and LCG.
These features are combined since the accounting log will also be rotated.
These patches are in the VDT, and we really hope we can stop distributing a
modified version of Globus. I hope we can work together to find a way to get
this, or something similar, into an upcoming version of Globus. I realize that
these are big patches: please let me know what we can do to work effectively
with you on these patches.
Let me repeat: these are well-tested patches.
I will add the patches as attachments one at a time after the initial bug
Please talk to me if there is any confusion or questions. I'm happy to discuss
-alain, from the VDT
Created an attachment (id=302) [details]
This patch modifies the gatekeeper in two ways:
1) Set up the job accounting log file and inform job managers about it.
2) Rotate gatekeeper and accounting log files when SIGUSR1 is received
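For readers who want a picture of the rotation behaviour described above, here is a minimal sketch in Python. It is not the attached patch (the gatekeeper is not written in Python), and the names RotatingLog, LogRotator, and install, as well as the timestamp suffix on rotated files, are assumptions made for illustration. The idea shown is that the SIGUSR1 handler only sets a flag, and the main loop then closes, renames, and reopens both the gatekeeper log and the accounting log together.

```python
import os
import signal
import time


class RotatingLog:
    """Append-only log file that can be renamed and reopened on demand."""

    def __init__(self, path):
        self.path = path
        self._fh = open(path, "a")

    def write(self, message):
        self._fh.write(message + "\n")
        self._fh.flush()

    def rotate(self, suffix):
        # Close the current file, move it aside, and start a fresh one.
        self._fh.close()
        os.rename(self.path, f"{self.path}.{suffix}")
        self._fh = open(self.path, "a")


class LogRotator:
    """Rotates a group of logs together when a rotation has been requested."""

    def __init__(self, logs):
        self.logs = logs
        self.rotate_requested = False

    def handle_sigusr1(self, signum, frame):
        # Keep the handler trivial: only record that rotation was requested.
        self.rotate_requested = True

    def rotate_if_requested(self):
        # Called from the main loop, outside the signal handler.
        if not self.rotate_requested:
            return False
        self.rotate_requested = False
        suffix = time.strftime("%Y%m%d%H%M%S")  # hypothetical naming scheme
        for log in self.logs:
            log.rotate(suffix)
        return True


def install(rotator):
    """Register the handler; must be called from the main thread."""
    signal.signal(signal.SIGUSR1, rotator.handle_sigusr1)
```

With something like this in place, an administrator would trigger rotation with "kill -USR1 <pid>", and both logs would be rotated the next time the main loop calls rotate_if_requested().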
Created an attachment (id=303) [details]
Job Manager patch
This modifies the job manager to allow logging of job accounting information.
Created an attachment (id=304) [details]
LSF Accounting patch
This allows LSF to log extra information about the job for the job accounting.
You get basic accounting without it; this just gives more information.
Created an attachment (id=305) [details]
Patch to find-lsf-tools
The LSF accounting patch uses bacct, so find-lsf-tools needs to be modified to
find it. This patch does that. It is a trivial patch.
Created an attachment (id=306) [details]
patch for globus-script-lsf-queue
If LSF returns FAILED, then the accounting doesn't work, so this patch changes
that. This is the part I'm least confident of: David, will users see
failed jobs properly?
For LSF the EXIT state means that a job finished with a non-zero exit code. (It
can also happen if the job is removed from the batch system.)
Depending on how LSF is set up, the 'job exit code' may be the exit code from the
user's job submission script - but in general it is the exit code of the
administrator defined 'LSF job starter' (also often a script). Therefore it's
not clear, for every site, what EXIT will imply regarding the success of the
system to run the job. (That was what I imagined was relevant for the globus job
Of course we could consider other approaches - for instance to exit with a zero
result after the user command, as written in the LSF submission script generated
by the lsf jobmanager. If there are then globus submitted jobs in EXIT we could
guess something went wrong. However this implies that we expect the user job
return code to be returned by the LSF job starter - possibly not true
everywhere. The simplest way forward appeared to be not to try to use the LSF
DONE/EXIT state information to determine the globus job state.
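To make that last approach concrete, here is a rough sketch in Python of a state mapping that does not distinguish LSF's DONE from EXIT. It is an illustration only, not the contents of the globus-script-lsf-queue patch: the table name LSF_TO_GLOBUS and the function globus_state are hypothetical, and the choice to report both terminal LSF states as DONE is an assumption drawn from the reasoning above.

```python
# LSF states on the left, Globus job states on the right.
# Assumption: both terminal LSF states are reported as DONE, because EXIT
# does not reliably say whether the user's job itself failed.
LSF_TO_GLOBUS = {
    "PEND": "PENDING",
    "RUN": "ACTIVE",
    "PSUSP": "SUSPENDED",
    "USUSP": "SUSPENDED",
    "SSUSP": "SUSPENDED",
    "DONE": "DONE",
    "EXIT": "DONE",
}


def globus_state(lsf_state):
    """Return the Globus state for an LSF state, or None if it is unknown."""
    return LSF_TO_GLOBUS.get(lsf_state.strip().upper())
```

The trade-off is the one raised in the question above: with this mapping a job that ends in EXIT is not reported as FAILED, so a user has to look at the job's own output to tell whether it succeeded.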
I am doubtful that this patch will make it into the 3.2 release. We are
testing release candidates now and hope to have the 3.2 beta out early next
week. After beta, only bug fixes will be applied. A patch of this
significance seems too risky. I would anticipate this patch being applied to
the 4.0 release.
I am removing this enhancement from the 4.0 target milestone. I don't think we are going
to have the manpower to review, apply and test this in time for the 4.0 release.
Subject: Re: Gatekeeper log rotation and logging job
>I am removing this enhancement from the 4.0 target milestone. I don't
>think we are going to have the manpower to review, apply and test this in
>time for the 4.0 release.
Really? That's too bad!
I think that the accounting log is not only quite simple, but is incredibly
useful. Is there anything I can do to help make the process easier?
For some reason bugzilla attributed comment 9 to "d arroyo" when it should have
been attributed to Alain Roy. I'll see why the email interface behaved like
that, but wanted to correct the attribution for the record.
Now that Globus 4.0 is being delayed, is it possible to include this patch
*** Bug 4771 has been marked as a duplicate of this bug. ***
The patches for these are committed to 4.2 branch and trunk.