Bug 5731 - Add functionality to rotate the condor seg log file
Status: RESOLVED DUPLICATE of bug 5820
: development
: Macintosh All
: P2 normal
: ---
Assigned To:
Reported: 2007-12-14 13:11 by
Modified: 2008-02-05 10:36 (History)




Description From 2007-12-14 13:11:27
This campaign was prompted by comments in bug 3910.


Currently GRAM does not provide the functionality to rotate the condor SEG log
file used by the GRAM4 service.  This has been requested by Alain Roy of
VDT/OSG.  Two ways to handle this are:

1) Use Condor's system-wide log file for all jobs (in development).

Condor is coming out with a new system-wide log file that can be used much as
GRAM4/SEG already interfaces with other LRMs (PBS, LSF).  But this has a
reliability issue: if the GRAM4 service/SEG is down long enough, an LRM's job
info can move through the rotated log files and never be seen by the SEG.  The
job would then hang indefinitely.  As long as the rotated log files have a
minimum lifetime (e.g., no log file is rotated away before it is 2 days old),
this seems like a safe enough system.  But the system-wide Condor log file is
in the development version and will not likely be deployed soon.  So something
is needed in the interim.
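
The minimum-lifetime rule can be sketched as a simple age check.  This is only
an illustration: the function name, the constant, and the choice to measure a
file's age from its modification time are assumptions, not anything Condor or
GRAM is known to provide.

```python
import os
import time

# The "2 days" figure from the text above, expressed in seconds.
MIN_LIFETIME_SECONDS = 2 * 24 * 60 * 60


def safe_to_remove(path, min_lifetime=MIN_LIFETIME_SECONDS, now=None):
    """Return True if the rotated log at `path` has outlived the minimum lifetime.

    Age is measured from the file's last modification time (an assumption).
    """
    now = time.time() if now is None else now
    return now - os.stat(path).st_mtime >= min_lifetime
```

A rotation scheme would call such a check before deleting any rotated file, so
that a SEG that has been down for less than the minimum lifetime can still
catch up on the events it missed.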

2) GRAM4 continues to control the creation of the Condor SEG log file and
coordinates with condor daemons to safely rotate the log files.

Currently, GRAM tells condor to write all log information to a single condor
seg file.  I think we can stick with this method and implement a safe logfile
rotation system.  But there will be many condor writers to this file, so we
will need to manage this in a way that does not affect them or cause loss of
information.  I discussed some options with Jaime, along with how condor
writes to the specified log file.  When writing to a log file, condor will do
the following:

open (for create)
 lock (using one of 2 methods depending on platform: flock or fcntl)

A Condor SEG log rotator (CSLR) could be written to do the following:
open (for read)
 lock (possibly using both flock *and* fcntl in order to ensure the lock holds
on any platform)
   <note: condor has C++ functions for performing the file locks that could be
   Read all contents from the condor seg file and write to a temporary file.
   If the copy is successful, then truncate the seg log file.

Rename the temp file to be in line with the rotation scheme.
Check all old/expired condor seg log files that need to be removed.
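
The rotator steps above could look roughly like the following.  This is a
minimal sketch, not a proposed implementation: the function name, the
temporary file name, and the "seg.log.<timestamp>" naming scheme are all
hypothetical.  It uses fcntl locking only, and it opens the file read/write
rather than read-only, because an exclusive fcntl lock and a truncate both
need a writable descriptor.  It also assumes writers open the file in append
mode, so that they continue at offset 0 after the truncate.  Removal of
expired files is left out.

```python
import fcntl
import os
import time


def rotate_seg_log(log_path, rotated_dir):
    """Copy log_path into rotated_dir under an exclusive lock, then truncate it.

    Returns the path of the rotated copy, or None if the log was empty.
    """
    # No O_CREAT: the rotator never creates the log file itself.
    fd = os.open(log_path, os.O_RDWR)
    try:
        # Blocks until no writer holds a lock on the file.
        fcntl.lockf(fd, fcntl.LOCK_EX)
        if os.fstat(fd).st_size == 0:
            return None
        tmp_path = os.path.join(rotated_dir, ".seg-rotate.tmp")
        with open(tmp_path, "wb") as tmp:
            os.lseek(fd, 0, os.SEEK_SET)
            while True:
                chunk = os.read(fd, 65536)
                if not chunk:
                    break
                tmp.write(chunk)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Truncate only after the copy has been written out successfully.
        os.ftruncate(fd, 0)
    finally:
        # Closing the descriptor also releases the fcntl lock.
        os.close(fd)
    # Placeholder naming scheme; two rotations in the same second would collide.
    final_path = os.path.join(
        rotated_dir, "seg.log." + time.strftime("%Y%m%d%H%M%S"))
    os.rename(tmp_path, final_path)
    return final_path
```

The copy is read through the locked descriptor itself rather than a second
open of the same file, since POSIX drops a process's fcntl locks on a file as
soon as the process closes any descriptor for it.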

A program that actually performs the locks is necessary in order to prevent
the condor processes from creating the log file if it is not there.  If that
happens, the file would be owned by the user and the rotator would not be
able to remove it.

When the GRAM service starts up, it could also start up the Condor SEG log
rotator.  Or maybe the rotator can be started up by the SEG itself?

Something to consider for #2 is to use other log rotating software and extend
it with the necessary locking hooks to make this safe.

VDT supplies and uses logrotate (http://www.rt.com/man/logrotate.8.html).  The
prerotate and postrotate commands may provide the ability to lock/unlock the
file in order to allow for safe rotations.
------- Comment #1 From 2008-01-22 09:19:49 -------
*** Bug 3912 has been marked as a duplicate of this bug. ***
------- Comment #2 From 2008-01-22 10:13:51 -------
Condor uses fcntl() to lock the job log file on all currently-supported
platforms. It locks the entire file. That should simplify things for the SEG.
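
For illustration, a whole-file fcntl() lock of the kind described here can be
taken as follows.  This is a sketch only; the helper names are hypothetical
and this is not Condor's own code, which is C++.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def whole_file_lock(fd):
    """Hold an exclusive fcntl() lock over the entire file behind fd."""
    # Length 0 starting at offset 0 covers the whole file, including any
    # bytes appended later.
    fcntl.lockf(fd, fcntl.LOCK_EX, 0, 0, os.SEEK_SET)
    try:
        yield fd
    finally:
        fcntl.lockf(fd, fcntl.LOCK_UN, 0, 0, os.SEEK_SET)


def append_event(log_path, text):
    """Hypothetical writer: append one event while holding the lock."""
    fd = os.open(log_path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        with whole_file_lock(fd):
            os.write(fd, text.encode("utf-8"))
    finally:
        os.close(fd)
```

Because every writer locks the whole file, a reader or rotator that takes the
same lock never sees a partially written event.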
------- Comment #3 From 2008-02-05 10:36:23 -------

*** This bug has been marked as a duplicate of 5820 ***