IPv6 Fallback

From GridPP Wiki

A networking incident in Edinburgh, which left the Tier2 site without IPv6 connectivity for about a month, prompted some discussions about IPv6 fallback and what behaviour should correctly be expected.

IPv6 is generally preferred by HEP-related software and services, and we would expect software on a dual-stacked host to attempt to connect to a resource via IPv6 first if an AAAA record exists for the host.


Advice when speaking with networking teams

A lot of user-land and browser-based tools make use of the 'Happy Eyeballs' algorithm (RFC 8305). Because connections then quietly succeed over IPv4, this can contribute to the false impression that IPv6 isn't important.

It is recommended to be explicit that when IPv6 breaks, this impacts a site's functionality in some way.

IPv6 breaking is sometimes viewed as lower priority than it should be, in part because user-land software _should_ fall back to IPv4 when this happens. This can delay the problem being addressed, and the delay can lead to knowledge of what was affected being lost, which further slows down finding a fix.

From a few conversations I've heard that IPv6 is viewed as important for increasing the availability and accessibility of national computing resources, so it should be viewed as more than 'just an IPv4 fallback'.


Expected Fallback Behaviour

If IPv6 is broken, it is hoped that clients will by default fall back to IPv4 transparently, and that technically this shouldn't disrupt normal operations.

Ideally a broken IPv6 config would be highlighted ASAP and an expert would be asked to look into the situation and advise.


Do all tests from dual-stacked clients detect IPv6 being broken?

Should there be dedicated IPvX tests that compare IPv4-only, dual-stacked and IPv6-only connections, in case there is a problem with the testing methodology?
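
One way such a comparison could look is to run the same request three times: forced over IPv4, forced over IPv6, and with the client's default behaviour. The sketch below assumes curl is available on the test host. The function name `probe_families`, the `CURL_BIN` override and the 10s default connect timeout are hypothetical choices made for illustration, not an agreed test.

```shell
# Probe one URL over IPv4 only, IPv6 only and the client's default path.
# CURL_BIN is a hypothetical override so the probe can be pointed at a stub.
probe_families() {
    url=$1
    timeout=${2:-10}    # seconds allowed for establishing the connection
    for label in ipv4 ipv6 default; do
        case $label in
            ipv4) flag=-4 ;;
            ipv6) flag=-6 ;;
            *)    flag= ;;
        esac
        # curl exits 7 when it could not connect and 28 when it timed out.
        # $flag is deliberately unquoted so that an empty value disappears.
        if out=$("${CURL_BIN:-curl}" $flag -s -o /dev/null \
                 --connect-timeout "$timeout" \
                 -w '%{time_connect}' "$url"); then
            rc=0
        else
            rc=$?
        fi
        printf '%s rc=%s time_connect=%s\n' "$label" "$rc" "${out:-n/a}"
    done
}

# Example (not run here):
#   probe_families https://mybadipv6host.uni.ac.uk/some/file 10
```

If the `ipv4` and `default` lines succeed while the `ipv6` line fails, the dual-stacked test is passing only because of fallback.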


Unexpected/Bad Fallback Behaviour

Happy Eyeballs

It's generally agreed that 'Happy Eyeballs' is bad behaviour for command-line tools: in principle there are 'no eyes to satisfy', and it creates a lot of ambiguity when trying to work out why a dual-stacked host might have dropped back to IPv4.

By default, curl uses Happy Eyeballs on the command line, which causes confusion unless close attention is paid to which protocol was actually used.
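
A minimal sketch of how to check which protocol curl ended up using, based on its `%{remote_ip}` write-out variable. The helper names `addr_family` and `curl_family` and the `CURL_BIN` override are hypothetical and only used for illustration.

```shell
# Classify an address literal by its separator characters.
addr_family() {
    case $1 in
        *:*) echo IPv6 ;;
        *.*) echo IPv4 ;;
        *)   echo unknown ;;    # empty: curl never connected
    esac
}

# Fetch a URL, discard the body and report the family of the peer address.
# CURL_BIN is a hypothetical override so a stub can stand in for curl.
curl_family() {
    ip=$("${CURL_BIN:-curl}" -s -o /dev/null -w '%{remote_ip}' "$@") || true
    addr_family "$ip"
}

# Example (not run here):
#   curl_family https://myslowipv6host.uni.ac.uk/some/file
```

An answer of `IPv4` from a dual-stacked client talking to a dual-stacked host is a hint that the IPv6 attempt lost the race or failed.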


Long connection delays

If no IPv6 route to a host can be found (because of a network problem, or because the host does not have IPv6 configured), the client should expect to receive an ICMP packet telling it that the host is unreachable. If this packet is not returned, the client will wait until the TCP connection attempt times out. This can be as high as 5min, but is typically around 30s.

In the case of ssh and xrdcp, if the ICMP packets are being dropped/filtered the clients will take 90s to establish a connection, which can look the same as the client hanging.
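
To tell a slow fallback apart from a genuine hang, it can help to bound the connection attempt and record how long it took. The sketch below is one possible approach; the helper name `time_attempt` and the thresholds in the example are hypothetical. `ConnectTimeout` and `BatchMode` are standard OpenSSH client options.

```shell
# Run a command, then report its exit status, the elapsed wall-clock time in
# whole seconds, and whether it reached the given "slow" threshold.
time_attempt() {
    limit=$1
    shift
    start=$(date +%s)
    if "$@"; then rc=0; else rc=$?; fi
    elapsed=$(( $(date +%s) - start ))
    if [ "$elapsed" -ge "$limit" ]; then verdict=slow; else verdict=ok; fi
    printf 'rc=%s elapsed=%ss %s\n' "$rc" "$elapsed" "$verdict"
}

# Example (not run here): force IPv6, give up after 20s, flag anything over 5s.
#   time_attempt 5 ssh -6 -o ConnectTimeout=20 -o BatchMode=yes \
#       mybadipv6host.uni.ac.uk true
```

Running the same command with `-4` gives a baseline to compare against.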


Any other strange behaviours when connecting to badly behaved dual-stacked hosts

Please add here...


Likely Causes of Unexpected Behaviour

Dropped/Filtered ICMP packets

When the ICMP packets are being dropped, it's difficult to determine whether the IPv6 connection isn't working because the host is down or because it isn't properly configured.


Bad gateway/route/upstream-router

In the case of a bad gateway or upstream config, if a host is configured to be dual-stacked it will struggle to connect to other dual-stacked resources such as yum package repos.

The best short-term fix here is to temporarily remove the IPv6 address configuration from the impacted server if external connectivity is needed for things such as installing debugging/expert tools.


Any other likely causes of IPv6 being broken?

Please add


Mitigations/Work-arounds

Specify IPv4 or IPv6 explicitly

With ssh, wget and curl, when connecting to a dual-stacked host from a dual-stacked client, you can use the `-4` flag or the `-6` flag to force a specific IP version.

e.g.
ssh -4 -p 2222 mybadipv6host.uni.ac.uk

For XRootD-based tools, the behaviour can be changed with the boolean environment variable `XRD_PREFERIPV4`.

e.g.
export XRD_PREFERIPV4=1; xrdcp root://mybadipv6host.uni.ac.uk//some/file .
This will try IPv4 first before falling back to IPv6.
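
Exporting the variable changes the behaviour of every XRootD client started from that shell afterwards. A small sketch of a wrapper that limits the setting to a single command follows; the names `xrdcp_v4` and `XRDCP_BIN` are hypothetical and only used for illustration.

```shell
# Run xrdcp with XRD_PREFERIPV4 set for this one invocation only.
# XRDCP_BIN is a hypothetical override so a stub can stand in for xrdcp.
xrdcp_v4() {
    # The subshell keeps the variable from leaking into the calling shell.
    (
        XRD_PREFERIPV4=1
        export XRD_PREFERIPV4
        "${XRDCP_BIN:-xrdcp}" "$@"
    )
}

# Example (not run here):
#   xrdcp_v4 root://mybadipv6host.uni.ac.uk//some/file .
```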


Stop curl Happy Eyeballs causing aggressive fallback(s)

curl uses "Happy Eyeballs", which means that if IPv6 is slow to connect, IPv4 is used instead. curl can be made to behave more sensibly (strongly preferring IPv6) with the command line option `--happy-eyeballs-timeout-ms`, which sets how long curl waits for the IPv6 connection before it also starts an IPv4 attempt.

e.g.
curl --happy-eyeballs-timeout-ms 15000 https://myslowipv6host.uni.ac.uk/some/file

This should make curl prefer IPv6 even if it's slower, as it gives the IPv6 connection attempt 15s to succeed before IPv4 is tried.


Any other tools/workarounds

...