Bugzilla – Bug 5820
Improve Condor Logfile Processing in GRAM
Last modified: 2010-07-06 09:35:05
CAMPAIGN: Improve Condor Logfile Processing in GRAM
Technologies: GRAM2, GRAM4, Condor LRM
The Condor LRM implementation performs poorly on high-activity systems
because all job state changes are stored in a single log file, which tends to
grow extremely large.
GRAM2 parses the entire log file each time a job is polled, causing
performance problems. GRAM4's SEG relies on the Condor log remaining a
continuous stream, so system administrators cannot safely rotate it while
the Globus Container is running. Also, because of the log's liberal file
permissions, users can insert any job information they like into it.
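To illustrate why re-parsing the whole log on every poll is costly, here is a
minimal Python sketch of the alternative: remembering a byte offset and reading
only what was appended since the last poll. It is not GRAM code. The class name
LogTailer is hypothetical, and the sketch assumes a simple newline-terminated
record format rather than the real Condor user log event format.

```python
import os


class LogTailer:
    """Hypothetical helper: reads only the bytes appended since the last poll."""

    def __init__(self, path):
        self.path = path
        # Byte position just past the last complete line already consumed.
        self.offset = 0

    def poll(self):
        """Return the complete lines appended since the previous call."""
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            data = f.read()
        # Consume only up to the last newline, so a record that is still
        # being written is picked up whole on a later poll.
        end = data.rfind(b"\n") + 1
        self.offset += end
        return data[:end].decode("utf-8", errors="replace").splitlines()
```

With this approach each poll costs time proportional to the newly appended
data, not to the total size of the log. It still depends on the file never
being truncated or rotated underneath the reader, which is the same limitation
described above for the SEG.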
This campaign aims to modify the SEG / Condor interaction so that per-job log
files can be used by GRAM2 and GRAM4, and old Condor log files can be safely
removed. The goal is a less costly implementation of GRAM's interfaces when
used with Condor.
- Develop an algorithm for using multiple logfiles within the SEG framework
that can safely recover from abnormal ends.
- Modify the Condor LRM module and setup to write to per-job logs instead of a
single log file.
- Modify the SEG protocol to allow the Job Manager to signal to the SEG when
recovery state is updated.
- Modify the Condor SEG to implement the multiple logfile algorithm.
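
The tasks above can be pictured with a rough Python sketch of one possible
shape for a multiple-logfile reader with crash recovery. This is an
illustration under stated assumptions, not the algorithm referenced in this
bug: the class MultiLogReader, its JSON checkpoint file, and the explicit
commit() step (loosely standing in for the Job Manager signalling that its
recovery state is updated) are all hypothetical. It assumes one
newline-terminated log file per job in a directory, and a checkpoint file
stored outside that directory.

```python
import json
import os
import tempfile


class MultiLogReader:
    """Hypothetical reader: one log file per job, offsets checkpointed to disk."""

    def __init__(self, log_dir, state_path):
        self.log_dir = log_dir
        self.state_path = state_path  # assumed to live outside log_dir
        self.offsets = self._load_state()

    def _load_state(self):
        try:
            with open(self.state_path) as f:
                return json.load(f)
        except (FileNotFoundError, json.JSONDecodeError):
            # No usable checkpoint: start from the beginning of every log.
            return {}

    def poll(self):
        """Return (log_name, line) pairs for complete lines not yet seen."""
        events = []
        seen = set()
        for name in sorted(os.listdir(self.log_dir)):
            path = os.path.join(self.log_dir, name)
            if not os.path.isfile(path):
                continue
            seen.add(name)
            start = self.offsets.get(name, 0)
            with open(path, "rb") as f:
                f.seek(start)
                data = f.read()
            # Stop at the last newline so partial records are retried later.
            end = data.rfind(b"\n") + 1
            if end:
                self.offsets[name] = start + end
                text = data[:end].decode("utf-8", errors="replace")
                events.extend((name, line) for line in text.splitlines())
        # Forget logs that were removed, so old files can be deleted freely.
        self.offsets = {n: o for n, o in self.offsets.items() if n in seen}
        return events

    def commit(self):
        """Persist offsets once the consumer has safely recorded the events."""
        state_dir = os.path.dirname(self.state_path) or "."
        fd, tmp = tempfile.mkstemp(dir=state_dir)
        with os.fdopen(fd, "w") as f:
            json.dump(self.offsets, f)
            f.flush()
            os.fsync(f.fileno())
        # Atomic rename: a crash never leaves a half-written checkpoint.
        os.replace(tmp, self.state_path)
```

In this sketch, events returned by poll() but not yet followed by commit() are
read again after an abnormal end, so the consumer must tolerate duplicate
events. That trade-off is a design choice of the sketch, not something this
bug specifies.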
Algorithm description is at:
Patches implementing the algorithm:
*** Bug 5731 has been marked as a duplicate of this bug. ***
GRAM5 in 5.0.2 will process each job in a separate log file; see
http://jira.globus.org/browse/GRAM-130 for details.