18 comments:
Hunting good bugs is something every good software developer should experience. A good interview question is "tell me your favorite bug". Bugs are about reasoning, not intelligence. And I will take someone who can tell me what is wrong over what is correct any day. It requires a focus on getting things to actually work.
I have two favorite bug stories. The first is from a printout from the run of an IBM 360 assembly language program when I was just learning. Someone asked me why their program failed to run. I glanced quickly at the front page of the printout and it said "Too Long". So I told the person that was the problem. Something was too long. He looked at me very strangely, so I looked back at the page a little more closely, only to notice "Too Long" was in the name field of the person running the program. He was Vietnamese and his name was Too Long - literally. There is a powerful lesson (at least one) there.
The other happened when I was implementing some AppleTalk protocols - NBP to be exact. (Don't ask). I would capture the working packets, then compare all the checksums, headers, constants, and length fields against the packet my code generated and fix any problems. I was stuck on one failure. I just could not see any difference as I went through byte by byte, time after time. It was late and time to go home, so I decided to print off each packet on paper and compare them later - certain I was missing something. The problem was instantly obvious. One printout took a page, the other two pages. I had been appending junk data to the packet. Sigh.
Amazing debugging, I loved reading that. HN doesn't get enough good posts like this anymore :)
If https://github.com/pion/sctp/issues/12 had happened (not just in Pion but across all implementations) this could have been fixed years ago. The hardcoding we all settle for is tragic.
Author here, thank you, that means a lot coming from you. Pion was the prior art I pointed the webrtc-rs maintainers at. And pion/sctp#12 is super relevant. A known, proposed fix years before we hit it.
"The hardcoding we all settle for" might be the epigraph for the whole incident. webrtc-rs invited a PR for the configurable-MTU + better default half [webrtc-rs/webrtc#806] to unblock folks today. Whether PMTUD gets implemented will be interesting to see.
MTU black holes are the worst because every health check is small enough to survive.
I'm having flashbacks to 1990s-era PPPoE, where the slightly smaller MTU caused trouble with some server OSes whose TCP/IP stacks didn't support, or simply ignored, MTUs smaller than 1500 bytes, so bulk data transfers would get messed up. I don't remember which ones, but it was some commercial UNIX.
Weren't T1s running 576 MTU back then?
I don't understand how a product as popular as Tailscale can get this far while dropping certain ordinary types of packets.
It is impossible to parse the UDP or TCP port number out of a non-first fragment. This is surely the reason the ACL module entirely rejects them. TCP will adjust its segment size based on PMTUD so as to not require fragmentation. This is why it hasn't been noticed so far. But fragmented UDP packets are a corner case of normal behavior and it boggles the mind that someone could just decide to completely drop them.
UDP fragment filtering could be implemented by a global fragments on/off setting (works for "allow everything" = fragments on, cautious = fragments off) or by blocking the first fragment which includes the port number (and blocking it if the port number is split across fragments which I think is technically allowed but completely abnormal).
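A minimal Python sketch of the first-fragment idea, for illustration only. It is not Tailscale's filter: the function name, the `allowed_dst_ports` parameter, and the accept/drop policy are hypothetical, and it assumes no IPv6 extension headers other than a single Fragment header.

```python
import struct

IPV6_HEADER_LEN = 40
FRAGMENT_HEADER_LEN = 8
UDP_HEADER_LEN = 8
NEXT_HEADER_UDP = 17
NEXT_HEADER_FRAGMENT = 44


def classify_ipv6_udp(packet, allowed_dst_ports):
    """Return "accept" or "drop" for a raw IPv6 packet (hypothetical policy).

    Ports are matched on unfragmented packets and on first fragments.
    Later fragments carry no ports, so they are let through: they are
    useless to the receiver unless the first fragment was accepted too.
    """
    if len(packet) < IPV6_HEADER_LEN or packet[0] >> 4 != 6:
        return "drop"
    next_header = packet[6]
    offset = IPV6_HEADER_LEN

    if next_header == NEXT_HEADER_FRAGMENT:
        if len(packet) < offset + FRAGMENT_HEADER_LEN:
            return "drop"
        next_header = packet[offset]
        # 13-bit fragment offset, 2 reserved bits, 1 "more fragments" bit.
        (offset_and_flags,) = struct.unpack_from("!H", packet, offset + 2)
        fragment_offset = offset_and_flags >> 3
        offset += FRAGMENT_HEADER_LEN
        if fragment_offset != 0:
            return "accept" if next_header == NEXT_HEADER_UDP else "drop"

    # Unfragmented packet or first fragment: the UDP header must be complete.
    # A first fragment too short to hold the ports is dropped.
    if next_header != NEXT_HEADER_UDP or len(packet) < offset + UDP_HEADER_LEN:
        return "drop"
    _src_port, dst_port = struct.unpack_from("!HH", packet, offset)
    return "accept" if dst_port in allowed_dst_ports else "drop"
```

The trade-off is that later fragments pass without any port check, which is why real filters tend to use reassembly or flow tracking instead.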
> I don't understand how a product as popular as Tailscale can get this far while dropping certain ordinary types of packets.
I’d venture to guess based on this outcome that fragmented UDP over IPv6 isn’t really an ordinary occurrence. Given the preponderance of HTTPS traffic, the aversion to fragmentation in IPv6, and the weird corner case of there being a hardcoded packet size in webrtc, it’s reasonable to assume that this is a corner case.
A good one to be aware of, but not common.
Would agree it's uncommon in general traffic. Rare conditions [webrtc-rs, 1280 class tunnel / tailscale, and ipv6 pair] but deadly when they are met since every connection silently fails. That's what made it worth chasing down for 2 weeks [and good for sleuthing :)].
Author here,
Agreed. The port-number point is the most plausible rationale I've heard, more convincing than the RFC line in their source comment. The historical fix for "can't classify fragments" was virtual reassembly or flow tracking [conntrack on linux, scrub in pf], so dropping them outright punts past known prior approaches. Even your lighter idea would have saved us: a first-fragment match would have let our pair through.
We've reported upstream to both projects, tailscale/tailscale#20083 and webrtc-rs/webrtc#806, and webrtc-rs already invited a PR.
You are shadowbanned.
One day I hope to work on problems like this. Fantastic article.
Ah yes, the horrible anti-feature of IP fragmentation strikes again.
Pair it with the anti-solution of dropping large packets instead of truncating them and we get our perfect storm of bad design that is MTU incompatibility and modern MTU discovery.
Another fun happy iOS story: we were launching our app a year ago, with a self-imposed deadline. As usual, tons of bugs were being fixed at the last moment.
And then our authentication stopped working on simulated iOS devices (while still working on the real devices!). After hours of frantic debugging and staring at Wireshark dumps, I found the issue: HTTP3 and QUIC. Apparently, the simulated stack was not tracking the MTU correctly and was trying to send 1506-byte UDP packets.
The "fix" was to add deny rules for UDP ports 80/443 to our firewall.
Author here.
This started as a blank page on one device and ended two weeks later at the intersection of two bugs: webrtc-rs hardcodes INITIAL_MTU=1228 [never updated, no path probing, retransmits at the same size forever], and Tailscale's packet filter classifies any IPv6 packet with a Fragment header as unknown protocol, so the default deny fires. On every platform, counted under reason="acl". Neither is unreasonable alone. Together: silent wedge, every health check green, because everything that tests the path is small and only the payload fragments. Two-command repro on any tailnet: ping -s 100 works, ping -s 1400 over the Tailscale IPv6 address is 100% loss. Full WebRTC repro and captures: https://github.com/phact/mtu-webrtc-bug. We've reported upstream to both projects https://github.com/tailscale/tailscale/issues/20083 and https://github.com/webrtc-rs/webrtc/issues/806. Happy to answer questions. Especially interested if anyone knows the history behind the IPv6 fragment decision in Tailscale's filter.
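A rough back-of-the-envelope check of the sizes involved, as a Python sketch. The 1280-byte path MTU and the 37-byte DTLS record overhead (13-byte header, 8-byte explicit nonce, 16-byte tag, as with an AES-GCM suite in DTLS 1.2) are assumptions, not figures from the post; the IPv6, UDP, and ICMPv6 header sizes are the standard fixed ones.

```python
IPV6_HEADER = 40
UDP_HEADER = 8
ICMPV6_HEADER = 8

ASSUMED_PATH_MTU = 1280      # assumption: tunnel MTU at the IPv6 minimum
ASSUMED_DTLS_OVERHEAD = 37   # assumption: 13 header + 8 nonce + 16 tag


def ipv6_packet_size(payload, transport_header, extra_overhead=0):
    """Total on-wire IPv6 packet size for a given upper-layer payload."""
    return IPV6_HEADER + transport_header + extra_overhead + payload


def needs_fragmentation(packet_size, path_mtu=ASSUMED_PATH_MTU):
    """True when the packet cannot cross the path in one piece."""
    return packet_size > path_mtu


cases = [
    ("ping -s 100", ipv6_packet_size(100, ICMPV6_HEADER)),
    ("ping -s 1400", ipv6_packet_size(1400, ICMPV6_HEADER)),
    ("SCTP packet at INITIAL_MTU=1228",
     ipv6_packet_size(1228, UDP_HEADER, ASSUMED_DTLS_OVERHEAD)),
]
for label, size in cases:
    verdict = "fragments" if needs_fragmentation(size) else "fits"
    print(f"{label}: {size} bytes, {verdict}")
```

Under these assumptions the small probe fits comfortably, while both the large ping and a full-size SCTP packet exceed the path MTU, which matches the pattern of green health checks and a wedged payload.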
just wait till you try to send a data packet in webrtc that's too large in the browser. https://stackoverflow.com/questions/35381237/webrtc-data-cha...
last I checked, all browsers silently fail if it's too big.
This should be fixed!
I added this in Pion here[0] and I remember testing against Chrome + Firefox and it seemed to work great!
[0] https://github.com/pion/webrtc/commit/e4ff415b2bff31382bdb80...
Yes! That's the "other reconstruct" I mention in the post. maxMessageSize at least appears in SDP and getStats. We ended up patching both at our client to be safe [800 bytes and 16 KB respectively].
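A different, application-level way to stay under a message size limit is to chunk large payloads before sending. The Python sketch below illustrates that general technique only. It is not what the author describes patching, and the 4-byte chunk header and function names are hypothetical.

```python
import struct

# Hypothetical chunk header: chunk index and chunk count, 16 bits each.
CHUNK_HEADER = struct.Struct("!HH")


def split_message(payload, max_message_size):
    """Split payload into chunks no larger than max_message_size bytes."""
    body_size = max_message_size - CHUNK_HEADER.size
    if body_size <= 0:
        raise ValueError("max_message_size too small for the chunk header")
    count = max(1, -(-len(payload) // body_size))  # ceiling division
    if count > 0xFFFF:
        raise ValueError("payload needs more chunks than the header allows")
    return [
        CHUNK_HEADER.pack(index, count)
        + payload[index * body_size:(index + 1) * body_size]
        for index in range(count)
    ]


def join_chunks(chunks):
    """Reassemble chunks from split_message, in any arrival order."""
    parts = {}
    expected = None
    for chunk in chunks:
        index, count = CHUNK_HEADER.unpack_from(chunk)
        expected = count
        parts[index] = chunk[CHUNK_HEADER.size:]
    if expected is None or len(parts) != expected:
        raise ValueError("missing chunks")
    return b"".join(parts[index] for index in range(expected))
```

For example, `split_message(data, 16 * 1024)` keeps every chunk at or below 16 KB, and `join_chunks` on the receiving side restores the original bytes. A real protocol would also need a message identifier so that chunks from different messages are not mixed.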