Bug 7012 - Better round-robin scheduling of multiple VMs per node
Status: RESOLVED FIXED
Product: Nimbus
Component: Workspace service
Version: unspecified
Hardware: PC All
Importance: P3 enhancement
Target Milestone: 2.4
Assigned To:
URL: http://lists.globus.org/pipermail/wor...
Depends on/Blocks: 7013
Reported: 2010-05-04 10:02
Modified: 2010-06-08 19:55


Description From 2010-05-04 10:02:05
When running multiple VMs per node, the workspace service scheduler doesn't
distribute VMs evenly across nodes.

When a VM is being matched to a slot, the behavior is:

    1. If there is an empty node that matches requested memory and network, use
it.
    2. Otherwise, pick the first node from the pool that matches the requested
memory and network associations.

Once there is a VM running on each node, this has the effect of piling VMs onto
a single node until it is "full", after which the scheduler moves on to the next
node, and so on.

It would be better for networking and I/O in general if VMs were distributed in
a more round-robin fashion, or at least if this behavior were configurable.

See this code:

http://github.com/nimbusproject/nimbus/blob/master/service/service/java/source/src/org/globus/workspace/scheduler/defaults/ResourcepoolUtil.java#L88
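
For illustration, here is a minimal Java sketch of that first-fit behavior (the
Node and pickNode names are made up for this example; this is not the actual
ResourcepoolUtil code):

    import java.util.List;

    class Node {
        String hostname;
        int availMemMB;    // memory still unallocated on this VMM
        int totalMemMB;    // total memory configured for this VMM

        boolean isEmpty() { return availMemMB == totalMemMB; }
        boolean supportsNetwork(String network) { return true; }  // association check elided
    }

    class FirstFitSketch {
        // Old behavior: prefer an empty node, otherwise take the FIRST node
        // that fits.  Because the pool is always scanned in the same order,
        // the first non-empty node with room keeps winning until it is full.
        static Node pickNode(List<Node> pool, int reqMemMB, String network) {
            for (Node n : pool) {                   // pass 1: empty nodes only
                if (n.isEmpty() && n.availMemMB >= reqMemMB && n.supportsNetwork(network)) {
                    return n;
                }
            }
            for (Node n : pool) {                   // pass 2: first match wins
                if (n.availMemMB >= reqMemMB && n.supportsNetwork(network)) {
                    return n;
                }
            }
            return null;                            // no slot for this request
        }
    }

Pass 2 is what produces the piling: once every node hosts at least one VM, the
same front-of-pool node is returned for every request until it can no longer
fit one.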
------- Comment #1 From 2010-05-20 15:33:13 -------
Committed to master for 2.5

http://github.com/nimbusproject/nimbus/commit/f6738272220c0840b5539df0566dbf5732d17997

The default resource scheduler now operates with the notion of 'percentage
available' for each node in the VMM pool.  This is computed from the memory
already allocated on the node and the memory still available, i.e. the fraction
of the node's total RAM that remains free.  This allows the greedy and
round-robin strategies to work better with pools whose VMMs have varying
amounts of RAM.
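
A rough sketch of that computation (hypothetical names, not the actual
scheduler code):

    class PercentAvailable {
        // Fraction of the node's RAM that is still unallocated, expressed as a
        // percentage, so VMMs with different total RAM can be compared directly.
        static double percentAvailable(int availMemMB, int allocatedMemMB) {
            int totalMemMB = availMemMB + allocatedMemMB;
            return totalMemMB == 0 ? 0.0 : 100.0 * availMemMB / totalMemMB;
        }
    }

For example, a 16 GB VMM with 4 GB allocated and an 8 GB VMM with 2 GB
allocated both score 75% available, even though their absolute free memory
differs.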

The node selection can happen in one of two ways:

1. A "round-robin" configuration in resource-locator-ACTIVE.xml (this is the
default mode).  This looks for matching nodes (enough space to run, appropriate
network support, etc.) with the highest percentage of free space.  If there are
many equally free nodes it will pick randomly from those.  As should be clear,
this favors entirely empty nodes first.

2. A "greedy" configuration in resource-locator-ACTIVE.xml.  This looks for
matching nodes (enough space to run, appropriate network support, etc.) with
the lowest percentage of free space.  If there are many equally unfree nodes it
will pick randomly from those.
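
A self-contained sketch of that selection step, assuming the candidate list has
already been filtered for memory and network associations (all names here are
illustrative, not the actual Nimbus classes):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;

    class StrategySketch {

        static class Candidate {
            final String hostname;
            final double percentFree;   // 0..100, from allocated vs. available RAM
            Candidate(String hostname, double percentFree) {
                this.hostname = hostname;
                this.percentFree = percentFree;
            }
        }

        // roundRobin == true: pick among the nodes with the HIGHEST percent free.
        // roundRobin == false (greedy): pick among the nodes with the LOWEST.
        // Ties at the best score are broken randomly.
        static Candidate select(List<Candidate> matching, boolean roundRobin) {
            if (matching.isEmpty()) {
                return null;                        // no node can host this VM
            }
            double best = roundRobin ? -1.0 : 101.0;
            List<Candidate> tied = new ArrayList<>();
            for (Candidate c : matching) {
                boolean better = roundRobin ? c.percentFree > best
                                            : c.percentFree < best;
                if (better) {
                    best = c.percentFree;
                    tied.clear();
                    tied.add(c);
                } else if (c.percentFree == best) {
                    tied.add(c);
                }
            }
            return tied.get(new Random().nextInt(tied.size()));
        }
    }

With roundRobin set, entirely empty nodes (100% free) win first, which spreads
VMs out; with greedy, the fullest node that still fits wins, which packs nodes
before touching fresh ones.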
------- Comment #2 From 2010-06-08 19:55:11 -------
Confirmed this still works after Bug 7015 scheduler refactoring.