Bugzilla – Bug 5731
Add functionality to rotate the condor seg log file
Last modified: 2008-02-05 10:36:23
This campaign was prompted by comments in bug 3910.
Currently GRAM does not provide the functionality to rotate the condor SEG log
file used by the GRAM4 service. This has been requested by Alain Roy of
VDT/OSG. Two ways to handle this are:
1) Use Condor's system-wide log file for all jobs (in development).
Condor is coming out with a new system-wide log file that could be used much as
GRAM4/SEG already interfaces with other LRMs (PBS, LSF). But this has some
reliability issues: if the GRAM4 service/SEG is down, an LRM's job info can
move through the rotated log files and never be seen by the SEG, and the job
would hang indefinitely. As long as the rotated log files have a guaranteed
minimum lifetime (e.g. no log file is rotated away before it is 2 days old),
this seems like a safe enough system. But the system-wide Condor log file is in
the development version and will not likely be deployed soon. So something is
needed in the interim.
2) GRAM4 continues to control the creation of the Condor SEG log file and
coordinates with condor daemons to safely rotate the log files.
Currently, GRAM tells condor to write all log information to a single condor
seg file. I think we can stick with this method and implement a safe logfile
rotation system. But there will be many condor writers to this file, so we
will need to manage this in a way that does not affect them or cause loss of
information. I discussed some options with Jaime, along with how condor
writes to the specified log file. When writing to a log file, condor will do
open (for create)
lock (using one of 2 methods depending on platform: flock or fcntl)
A Condor SEG log rotator (CSLR) could be written to do the following:
open (for read)
lock (possibly using both flock *and* fcntl in order to assure the lock for
<note: condor has C++ functions for performing the file locks that could be
Read all contents from the condor seg file and write to a temporary file.
If the copy is successful, truncate the seg log file.
Rename the temp file to be in line with the rotation scheme.
Check all old/expired condor seg log files that need to be removed.
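
The rotator steps above could be sketched roughly as follows in Python. This
is only a sketch under assumptions, not Condor or GRAM code: the function name
rotate_seg_log, the ".tmp" and ".<epoch seconds>" file naming, and the 2-day
default retention are all made up for illustration. It takes only an
fcntl()-style lock (not flock), and it opens the log read/write rather than
read-only, because an exclusive fcntl lock and the truncate both need write
access.

```python
import fcntl
import os
import shutil
import time


def rotate_seg_log(log_path, keep_seconds=2 * 24 * 3600, now=None):
    """Copy the locked log aside, truncate it in place, expire old copies.

    Returns the path of the new rotated file, or None if the log was empty.
    All naming conventions here are hypothetical.
    """
    now = time.time() if now is None else now
    tmp_path = log_path + ".tmp"
    rotated_path = None

    # Open without creating: the rotator never creates or unlinks the live
    # log, so writers keep appending to the same file with the same owner.
    with open(log_path, "r+b") as log:
        # Exclusive fcntl()-style lock on the whole file; blocks until held.
        fcntl.lockf(log, fcntl.LOCK_EX)
        try:
            if os.fstat(log.fileno()).st_size > 0:
                with open(tmp_path, "wb") as tmp:
                    shutil.copyfileobj(log, tmp)
                    tmp.flush()
                    os.fsync(tmp.fileno())
                # Truncate only after the copy is safely on disk.
                log.truncate(0)
                # Assumes rotations are more than a second apart; otherwise
                # two rotated files would get the same name.
                rotated_path = "%s.%d" % (log_path, int(now))
                os.rename(tmp_path, rotated_path)
        finally:
            fcntl.lockf(log, fcntl.LOCK_UN)

    # Remove rotated files whose timestamp suffix is past the retention time.
    directory = os.path.dirname(log_path) or "."
    prefix = os.path.basename(log_path) + "."
    for name in os.listdir(directory):
        if not name.startswith(prefix):
            continue
        suffix = name[len(prefix):]
        if suffix.isdigit() and now - int(suffix) > keep_seconds:
            os.remove(os.path.join(directory, name))
    return rotated_path
```

Because the live file is truncated rather than moved, it is never missing from
the writers' point of view. The sketch only protects against writers that take
the same kind of lock before writing.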
A program to actually perform the locks is necessary in order to prevent the
condor processes from creating the log file when it is not there. If that
happened, the file would be owned by the user and the rotator would not be
able to remove it.
When the GRAM service starts up, it could also start up the Condor SEG log
rotator. Or maybe the rotator can be started up by the SEG itself?
Something to consider for #2 is to use other log rotating software and extend
it with the necessary locking hooks to make this safe.
VDT supplies and uses logrotate (http://www.rt.com/man/logrotate.8.html). The
prerotate and postrotate commands may provide the ability to lock/unlock the
file in order to allow for safe rotations.
*** Bug 3912 has been marked as a duplicate of this bug. ***
Condor uses fcntl() to lock the job log file on all currently-supported
platforms. It locks the entire file. That should simplify things for the SEG.
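
For illustration, a writer that takes a whole-file fcntl() lock before
appending might look like the Python sketch below. This is not Condor's code;
the name append_event is hypothetical, and the use of O_APPEND is an
assumption (it makes a write after a truncate land at the new end of the
file). A rotator holding the same kind of lock would exclude this writer for
the duration of the copy and truncate.

```python
import fcntl
import os


def append_event(log_path, record):
    """Append one record (bytes) to the log under an exclusive fcntl lock."""
    # O_CREAT mirrors "open (for create)"; O_APPEND is an assumption here.
    fd = os.open(log_path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        # With the default start=0 and len=0, lockf locks the entire file.
        fcntl.lockf(fd, fcntl.LOCK_EX)
        try:
            view = memoryview(record)
            while view:
                # os.write may write fewer bytes than asked; loop until done.
                view = view[os.write(fd, view):]
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```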
*** This bug has been marked as a duplicate of 5820 ***