Bugzilla – Bug 5820
Improve Condor Logfile Processing in GRAM
Last modified: 2010-07-06 09:35:05
CAMPAIGN: Improve Condor Logfile Processing in GRAM

Technologies: GRAM2, GRAM4, Condor LRM

Description:

The implementation of the LRM for Condor has problems on high-activity systems because all job state changes are stored in a single log file. The Condor log files tend to get extremely large. GRAM2 parses the entire log file each time a job is polled, causing performance problems. GRAM4's SEG relies on the Condor log remaining a continuous stream, so system administrators cannot safely rotate it while the Globus Container is running. Also, users can insert any job information they like into the log because of its liberal file permissions.

This campaign aims to modify the SEG / Condor interaction so that per-job log files can be used by GRAM2 and GRAM4, and old Condor log files can be safely removed. The goal is a less costly implementation of GRAM's interfaces when used with Condor.

Tasks:
- Develop an algorithm for using multiple logfiles within the SEG framework that can safely recover from abnormal ends.
- Modify the Condor LRM module and setup to write to per-job logs instead of a common log.
- Modify the SEG protocol to allow the Job Manager to signal to the SEG when recovery state is updated.
- Modify the Condor SEG to implement the multiple logfile algorithm.
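The per-job logfile idea in the tasks above can be illustrated with a minimal sketch. This is not the algorithm linked from this bug and not the SEG implementation; it only shows the general technique of keeping a byte offset per log file and persisting it so that a restart after an abnormal end resumes where it left off. The class name PerJobLogScanner, the ".log" suffix, the JSON state file, and the one-event-per-line format are all assumptions made for illustration; the real Condor user log event format is not modelled here.

```python
import json
import os
import tempfile


class PerJobLogScanner:
    """Hypothetical scanner: tails per-job log files and tracks a byte offset per file."""

    def __init__(self, log_dir, state_path):
        self.log_dir = log_dir
        self.state_path = state_path
        self.offsets = self._load_state()

    def _load_state(self):
        # A missing or corrupt state file means "start from the beginning".
        try:
            with open(self.state_path, "r", encoding="utf-8") as fh:
                return json.load(fh)
        except (FileNotFoundError, ValueError):
            return {}

    def commit(self):
        # Write to a temp file, then rename, so a crash never leaves
        # a half-written state file behind.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(self.state_path) or ".")
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(self.offsets, fh)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, self.state_path)

    def poll(self):
        """Return (file name, line) pairs for complete lines not yet seen."""
        events = []
        for name in sorted(os.listdir(self.log_dir)):
            if not name.endswith(".log"):
                continue
            start = self.offsets.get(name, 0)
            with open(os.path.join(self.log_dir, name), "rb") as fh:
                fh.seek(start)
                data = fh.read()
            # Consume only up to the last newline; a partial trailing
            # line is left for the next poll.
            end = data.rfind(b"\n") + 1
            if end == 0:
                continue
            for line in data[:end].splitlines():
                events.append((name, line.decode("utf-8", "replace")))
            self.offsets[name] = start + end
        # Forget offsets for logs that have been removed, so old
        # per-job files can be deleted without confusing the scanner.
        for name in list(self.offsets):
            if not os.path.exists(os.path.join(self.log_dir, name)):
                del self.offsets[name]
        return events
```

In this sketch the caller handles the events returned by poll() and only then calls commit(). If the process dies between the two calls, the same events are read again after restart, so delivery is at-least-once rather than exactly-once. Each poll reads only the bytes appended since the saved offset, rather than reparsing a whole log.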
Algorithm description is at: http://www-unix.mcs.anl.gov/~bester/patches/5820-algorithm.txt
Patches implementing the algorithm: http://www-unix.mcs.anl.gov/~bester/patches/bug5820-diff.txt
*** Bug 5731 has been marked as a duplicate of this bug. ***
GRAM5 in 5.0.2 will process each job in a separate log file; see http://jira.globus.org/browse/GRAM-130 for details.