Bugzilla – Bug 1538
Gatekeeper log rotation and logging job accounting info
Last modified: 2008-08-15 04:44:09
We have here several related patches. It's a fairly big lump, but:

1) They are well tested, because they have been used heavily by EDG and LCG.
2) They are very useful.

Two features are added:

1) Rotate the gatekeeper's log upon receiving SIGUSR1.
2) Log information for job accounting purposes. By default, this goes into the gatekeeper log, but it can be configured to go into a separate log. This is wanted by MANY users, not just EDG and LCG.

These features are combined because the accounting log is also rotated.

These patches are in the VDT, and we really hope we can stop distributing a modified version of Globus. I hope we can work together to get this, or something similar, into an upcoming version of Globus. I realize that these are big patches: please let me know what we can do to work effectively with you on them. Let me repeat: these are well-tested patches.

I will add the patches as attachments one at a time after the initial bug submission. Please talk to me if there is any confusion or if you have questions. I'm happy to discuss this further.

Thanks!
-alain, from the VDT
Created an attachment (id=302)
Gatekeeper patch

This patch modifies the gatekeeper in two ways:
1) Set up the job accounting log file and inform job managers about it.
2) Rotate the gatekeeper and accounting log files when SIGUSR1 is received.
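Rotation on SIGUSR1 usually follows a common Unix pattern: the signal handler only sets a flag (closing and opening files is not async-signal-safe), and the main loop later closes and reopens each log by path, so an external tool can rename the old file first. The following is a minimal C sketch under that assumption, not the code from attachment 302. The struct, the function names and the file paths are hypothetical; a NULL accounting path stands in for "accounting goes into the gatekeeper log".

```c
#define _XOPEN_SOURCE 700   /* expose sigaction and SA_RESTART */
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Set by the signal handler, polled by the main loop. */
static volatile sig_atomic_t rotate_requested = 0;

static void on_sigusr1(int signo)
{
    (void) signo;
    rotate_requested = 1;   /* only async-signal-safe work in the handler */
}

/* Hypothetical pair of logs. A NULL accounting_path means accounting
 * records share the gatekeeper log (the default described above). */
struct log_files {
    const char *gatekeeper_path;
    const char *accounting_path;
    FILE *gatekeeper;
    FILE *accounting;
};

int install_rotation_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;   /* restart interrupted system calls */
    return sigaction(SIGUSR1, &sa, NULL);
}

/* (Re)open every log in append mode, closing previous handles first.
 * Reopening by path picks up a fresh file if the old one was renamed. */
int open_logs(struct log_files *logs)
{
    if (logs->accounting != NULL && logs->accounting != logs->gatekeeper)
        fclose(logs->accounting);
    if (logs->gatekeeper != NULL)
        fclose(logs->gatekeeper);
    logs->accounting = NULL;

    logs->gatekeeper = fopen(logs->gatekeeper_path, "a");
    if (logs->gatekeeper == NULL)
        return -1;
    if (logs->accounting_path == NULL) {
        logs->accounting = logs->gatekeeper;   /* shared log */
        return 0;
    }
    logs->accounting = fopen(logs->accounting_path, "a");
    return logs->accounting == NULL ? -1 : 0;
}

/* Call from the main loop: returns 1 if logs were rotated, 0 if no
 * SIGUSR1 was pending, -1 if reopening failed. */
int rotate_if_requested(struct log_files *logs)
{
    if (!rotate_requested)
        return 0;
    rotate_requested = 0;
    return open_logs(logs) == 0 ? 1 : -1;
}
```

In this sketch an administrator would rename the current logs and then send `kill -USR1` to the gatekeeper process; the next pass through the main loop starts writing to new files at the original paths.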
Created an attachment (id=303)
Job Manager patch

This modifies the job manager to allow logging of job accounting information.
Created an attachment (id=304)
LSF Accounting patch

This allows LSF to log extra information about the job for job accounting. You get basic accounting without it; this just gives more information.
Created an attachment (id=305)
Patch to find-lsf-tools

The LSF accounting patch uses bacct, so find-lsf-tools needs to be modified to find it. This patch does that. It is a trivial patch.
Created an attachment (id=306)
Patch for globus-script-lsf-queue

If LSF returns FAILED, the accounting doesn't work, so this patch changes that. This is the part I'm least confident of: David, will users see failed jobs properly?
Hi,

For LSF, the EXIT state means that a job finished with a non-zero exit code. (It can also happen if the job is removed from the batch system.) Depending on how LSF is set up, the 'job exit code' may be the exit code of the user's job submission script, but in general it is the exit code of the administrator-defined 'LSF job starter' (also often a script). Therefore it's not clear, for every site, what EXIT implies about whether the system succeeded in running the job. (That is what I imagined was relevant for the Globus job state.)

Of course we could consider other approaches: for instance, exiting with a zero result after the user command in the LSF submission script generated by the LSF jobmanager. If Globus-submitted jobs then ended in EXIT, we could guess that something went wrong. However, this implies that we expect the user job's return code to be returned by the LSF job starter, which is possibly not true everywhere.

The simplest way forward appeared to be not to use the LSF DONE/EXIT state information to determine the Globus job state.

Yours,
David
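David's reasoning can be expressed as a status mapping that deliberately does not distinguish DONE from EXIT. Below is a minimal C sketch of that idea only; the enum and function are hypothetical (they are not the real Globus GRAM state constants, and globus-script-lsf-queue itself is a script, not C). The status strings are the ones LSF's bjobs reports.

```c
#include <string.h>

/* Hypothetical, simplified job states; not the real Globus GRAM constants. */
enum job_state {
    JOB_PENDING,
    JOB_ACTIVE,
    JOB_SUSPENDED,
    JOB_DONE,
    JOB_UNKNOWN
};

/* Map an LSF job status string to a job state. DONE and EXIT both map to
 * JOB_DONE: EXIT only says the job starter (or user script) returned
 * non-zero, or the job was removed, which does not reliably mean the
 * system failed to run the job. */
enum job_state map_lsf_status(const char *stat)
{
    if (strcmp(stat, "PEND") == 0)
        return JOB_PENDING;
    if (strcmp(stat, "RUN") == 0)
        return JOB_ACTIVE;
    if (strcmp(stat, "PSUSP") == 0 || strcmp(stat, "USUSP") == 0 ||
        strcmp(stat, "SSUSP") == 0)
        return JOB_SUSPENDED;
    if (strcmp(stat, "DONE") == 0 || strcmp(stat, "EXIT") == 0)
        return JOB_DONE;
    return JOB_UNKNOWN;   /* e.g. UNKWN, ZOMBI, or anything unexpected */
}
```

The trade-off is the one David describes: a job whose user command genuinely failed is reported as done, but a site-specific job starter's exit code can no longer make a successful run look like a system failure.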
Alain, I am doubtful that this patch will make it into the 3.2 release. We are testing release candidates now and hope to have the 3.2 beta out early next week. After the beta, only bug fixes will be applied, and a patch of this significance seems too risky. I would anticipate this patch being applied to the 4.0 release. -Stu
Alain, I am removing this enhancement from the 4.0 target milestone. I don't think we are going to have the manpower to review, apply and test this in time for the 4.0 release. -Stu
Subject: Re: Gatekeeper log rotation and logging job accounting info

> I am removing this enhancement from the 4.0 target milestone. I don't
> think we are going to have the manpower to review, apply and test this in
> time for the 4.0 release.

Really? That's too bad! I think the accounting log is not only quite simple, but also incredibly useful. Is there anything I can do to make the process easier?

Thanks,
-alain
For some reason Bugzilla attributed comment 9 to "d arroyo" when it should have been attributed to Alain Roy. I'll look into why the email interface behaved like that, but wanted to correct the attribution for the record.
Now that Globus 4.0 is being delayed, is it possible to include this patch in it? Thanks, -alain
*** Bug 4771 has been marked as a duplicate of this bug. ***
The patches for these features are committed to the 4.2 branch and trunk.