Bug 5503 - UDP driver sendto() fails on Solaris
Status: RESOLVED FIXED
: XIO
Globus XIO
: unspecified
: PC Solaris
: P3 normal
: 4.0.7
: 5920 6192
Reported: 2007-08-27 18:43
Modified: 2008-07-18 14:35


Description From 2007-08-27 18:43:35
When the udp driver is used unconnected, as it is in the usage stats sender
lib, the underlying sendto() fails with the error:
System error in sendto: Address family not supported by protocol family.

As a result, usage packets from the C sender don't get sent on Solaris.
------- Comment #1 From 2007-08-28 14:39:59 -------
The problem is an IPv4 vs. IPv6 mismatch.  By default Solaris seems to prefer
IPv6 for local sockets (I think based on the order of the localhost entries in
/etc/inet/ipnodes), and the udp driver as written assumes that the local
socket's address family (IPv4/IPv6) matches what the remote address will
resolve to.

I'm not sure yet of the best way to fix this for all cases, but for now I've
provided a patch to force the usage stats sender code to use IPv4, as it is
unlikely that there will be an IPv6 usage target.
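
The patch itself is not reproduced in this report, so the following is only a
sketch of the general idea of restricting a lookup to IPv4.  The helper name
resolve_udp_ipv4() is hypothetical and is not taken from globus_usage.c or the
XIO API; it uses only the standard getaddrinfo() call.

```c
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper (not part of Globus): resolve host/port to an
 * IPv4-only UDP destination.
 * Returns 0 on success, or a getaddrinfo() error code on failure. */
int resolve_udp_ipv4(const char *host, const char *port,
                     struct sockaddr_in *out)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    int rc;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;      /* restrict the lookup to IPv4 */
    hints.ai_socktype = SOCK_DGRAM; /* UDP */

    rc = getaddrinfo(host, port, &hints, &res);
    if (rc != 0)
        return rc;

    /* With AF_INET hints, every result is a struct sockaddr_in. */
    memcpy(out, res->ai_addr, sizeof(*out));
    freeaddrinfo(res);
    return 0;
}
```

A socket created with AF_INET can then be used with the returned address, so
the socket and the destination agree on the address family.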

Another way that should fix it is to change the order of the localhost entries
in /etc/inet/ipnodes so that the IPv4 address is first.
------- Comment #2 From 2007-08-28 14:42:30 -------
The patch is at http://www-unix.mcs.anl.gov/~mlink/bugs/5503_noipv6.patch
------- Comment #3 From 2007-08-28 14:51:23 -------
Is this a patch that can be applied to 4.0.5?
------- Comment #4 From 2007-08-28 14:58:24 -------
Yes. Probably anything >4.0.1.
source-trees/usage/c/sender/source/globus_usage.c
------- Comment #5 From 2007-08-28 15:30:14 -------
Does this mean that one could use the unpatched code and possibly fix the
problem just by changing the order of things in /etc/inet/ipnodes?  In other
words, would it be worth us asking the people who are having the problem to try
that as a workaround?

Also, the fix that you made seems to replace one bad assumption with another
bad assumption.  What if someone has a local usage stats listener service that
uses an IPv6 address?  Should we document that they can't do that, or do we
need some code, perhaps, that first finds out what address family the listener
targets resolve to and then does the right thing when calling sendto()?
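
To make the second option concrete, here is a minimal sketch of "resolve
first, then match the socket to the result".  It is not the code that was
eventually committed; send_udp_matching_family() is a hypothetical name, and
only standard socket calls are used.

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of Globus): send one datagram to host:port,
 * creating the socket with whatever address family the destination resolves
 * to.  Returns the number of bytes sent, or -1 if no resolved address could
 * be used. */
ssize_t send_udp_matching_family(const char *host, const char *port,
                                 const void *buf, size_t len)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    struct addrinfo *ai;
    ssize_t sent = -1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;    /* accept IPv4 or IPv6 results */
    hints.ai_socktype = SOCK_DGRAM; /* UDP */

    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    /* Try each resolved address until one send succeeds. */
    for (ai = res; ai != NULL && sent < 0; ai = ai->ai_next) {
        /* The socket family comes from the destination, not from a default. */
        int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        sent = sendto(fd, buf, len, 0, ai->ai_addr, ai->ai_addrlen);
        close(fd);
    }

    freeaddrinfo(res);
    return sent;
}
```

Opening a socket per send keeps the sketch short; a real sender would more
likely cache one socket per address family.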
------- Comment #6 From 2007-08-28 15:42:51 -------
I don't know the specifics of Solaris administration, but based on google and
man pages I think the ipnodes config change would help.

Right re: bad assumptions.  I only offered the patch as a preliminary fix and
don't even plan to commit it.  I've got some ideas for a permanent fix to the
UDP driver that should work: basically, prefer IPv6 sockets and map any IPv4
addresses that we wish to communicate with to IPv4-mapped IPv6 addresses.  I
think this is the 'right' way to move forward, and why Solaris returns an IPv6
local address by default.  I still need to do some more research on how
different OSes handle this before I can be sure.
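
As a rough illustration of the mapping mentioned above (not the eventual
driver change), an IPv4 address a.b.c.d corresponds to the IPv6 address
::ffff:a.b.c.d.  The helper name map_ipv4_to_ipv6() is hypothetical.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper (not part of Globus): convert an IPv4 socket address
 * into the equivalent IPv4-mapped IPv6 address (::ffff:a.b.c.d), keeping the
 * port unchanged. */
void map_ipv4_to_ipv6(const struct sockaddr_in *in4, struct sockaddr_in6 *out6)
{
    memset(out6, 0, sizeof(*out6));
    out6->sin6_family = AF_INET6;
    out6->sin6_port = in4->sin_port; /* already in network byte order */

    /* Bytes 0-9 stay zero, bytes 10-11 are 0xff, bytes 12-15 hold the
     * IPv4 address. */
    out6->sin6_addr.s6_addr[10] = 0xff;
    out6->sin6_addr.s6_addr[11] = 0xff;
    memcpy(&out6->sin6_addr.s6_addr[12], &in4->sin_addr, 4);
}
```

Whether an AF_INET6 socket will actually carry traffic to such an address
depends on the platform and on the IPV6_V6ONLY socket option, which is part of
what still needs checking per OS.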
------- Comment #7 From 2007-08-29 09:52:49 -------
Michael,
We were not able to get reordering the ipnodes entries to make any difference,
but your patch seems to do the trick.

root@maverick #  ./tcpdump -i skge0  host globus-usage.teragrid.org
tcpdump: listening on skge0
09:33:25.997176 maverick.tacc.utexas.edu.53475 > nereus.ncsa.teragrid.org.4810: udp 228 (DF)
09:33:25.997261 maverick.tacc.utexas.edu.53474 > nereus.ncsa.teragrid.org.4810: udp 219 (DF)
09:35:05.407342 maverick.tacc.utexas.edu.53483 > nereus.ncsa.teragrid.org.4810: udp 228 (DF)
09:35:05.407770 maverick.tacc.utexas.edu.53482 > nereus.ncsa.teragrid.org.4810: udp 219 (DF)
09:39:11.032102 maverick.tacc.utexas.edu.53499 > nereus.ncsa.teragrid.org.4810: udp 227 (DF)
09:39:11.032310 maverick.tacc.utexas.edu.53498 > nereus.ncsa.teragrid.org.4810: udp 218 (DF)
09:39:51.411866 maverick.tacc.utexas.edu.53501 > nereus.ncsa.teragrid.org.4810: udp 228 (DF)
09:39:51.411889 maverick.tacc.utexas.edu.53500 > nereus.ncsa.teragrid.org.4810: udp 219 (DF)
09:40:54.727253 maverick.tacc.utexas.edu.53507 > nereus.ncsa.teragrid.org.4810: udp 228 (DF)
09:40:54.728561 maverick.tacc.utexas.edu.53506 > nereus.ncsa.teragrid.org.4810: udp 219 (DF)
09:42:21.362474 maverick.tacc.utexas.edu.53508 > nereus.ncsa.teragrid.org.4810: udp 219 (DF)

I asked TeraGrid whether the usage reports are being received from maverick and
will update the ticket with their response.
------- Comment #8 From 2007-08-29 11:17:35 -------
Jason Brechin (brechin@ncsa.uiuc.edu) replied with a pointer to
http://isl.ncsa.uiuc.edu:8080/networkGraph/lastgridftp.jsp
that is showing that maverick (solaris) is now updating the gridftp usage data
at globus-usage.teragrid.org.

Thanks Michael for the patch for Solaris!
------- Comment #9 From 2008-03-20 16:48:41 -------
http://www-unix.mcs.anl.gov/~mlink/bugs/globus_xio-0.37.tar.gz

Here is an update to the XIO udp driver to correctly address this problem.  The
plan is to include this in the 4.0.7 release, but I'll need to verify operation
on a few other platforms before I commit it.
------- Comment #10 From 2008-03-25 13:06:26 -------
Committed.
------- Comment #11 From 2008-04-02 12:20:31 -------
The fix for this caused a crash on hosts that default to ipv6 address lookups.

Get globus_xio-0.38.tar.gz from http://www.globus.org/toolkit/advisories.html
for that fix (available soon).
------- Comment #12 From 2008-06-03 13:49:39 -------
*** Bug 6105 has been marked as a duplicate of this bug. ***