Posting issues Last day or Two?

I cannot replicate the issue with FF v 94.0.1
I did a little bit of sleuthing with Firefox 94.0.1 (Windows) TLS logging and Wireshark, and I've also seen issues on Firefox 94.1.2 on Android. This seems to be a problem with TLS 1.3 0-RTT data and Firefox. In my testing, setting security.tls.enable_0rtt_data to false in about:config resolves the issue.

From what I'm seeing, if you do anything that tries to POST to hardforum.com when there's no active connection, Firefox opens a new connection with TLS 1.3 and some 0-RTT HTTP/2 setup, then sits around until the server tells it to go away, and only then tries to send the POST. That obviously doesn't work, because the server has already told it to go away. I'm pretty well versed in TLS 1.3 but don't know very much about HTTP/2; still, this looks like Firefox doing something dumb. I didn't find anything like this in Mozilla's Bugzilla, so I'm guessing nobody else has figured this out yet and I'm on the hook to write up a bug report; joy.

If the hardforum admins wanted to "fix" this, I'd guess they could turn off TLS 1.3 0-RTT (aka early data). That would have to be done for all browsers, because the decision has to be made during the TLS handshake, where it's somewhere between very hard and impossible to determine browser identity. I'm also guessing they turned it on for a reason: 0-RTT should help performance a bit and, AFAIK, it's not a default setting.
RoldanLT
 
Well, let's see.

I just typed this message into the box as slowly as I could make myself do it (as often quick replies would work just fine).

If this message posts, I'd say that this solved it for me.

Edit: Apparent success.
 
I was seeing it in the Firefox debugger network tab with drafts. As long as I didn't have a working connection open (the server timeout seems to be about 60 seconds), typing something in the text box would trigger a POST to save the draft, and that would usually fail.

Looks good now; I'll update when I get a bug report in and if I get a good response.
 
I have it turned off now.
Thanks a lot for the intensive testing.

Hopefully it will resolve the FF issue now.

Just because I am curious, is there something TLS 1.3 does that the forum overall loses out on with it disabled?

I mean, I am of course appreciative of a fix, but I'm always of the mindset of "security first". :p
 

I clicked through to RFC 8470 and read this:

TLS 1.3 [TLS13] introduces the concept of early data (also known as
zero round-trip time (0-RTT) data). If the client has spoken to the
same server recently, early data allows a client to send data to a
server in the first round trip of a connection, without waiting for
the TLS handshake to complete.

When used with HTTP [HTTP], early data allows clients to send
requests immediately, thus avoiding the one or two round-trip delays
needed for the TLS handshake. This is a significant performance
enhancement; however, it has significant limitations.

The primary risk of using early data is that an attacker might
capture and replay the request(s) it contains. TLS [TLS13] describes
techniques that can be used to reduce the likelihood that an attacker
can successfully replay a request, but these techniques can be
difficult to deploy and still leave some possibility of a successful
attack.

Note that this is different from automated or user-initiated retries;
replays are initiated by an attacker without the awareness of the
client.

So, it looks like disabling early data / 0-RTT data is a performance thing. Setting it to off might add a tiny bit of latency due to the added round trip, but it is probably only noticeable for people with very high latency connections. (Do we still have any dialup users?)

On the flip side it looks like when it is on there is a small risk of "replay attacks", so it might not be a bad idea to keep it off either way, as long as the added latency doesn't prove problematic for people.

That is, unless I have misunderstood this topic (which is very possible).

This is not exactly my domain.
 
You are much better at this stuff than I. I'm glad you took a look.

This is interesting. Yeah, it definitely looks like something that would be difficult for hardforum to fix on their own. I'm hoping a bug report helps get it tackled.

Thank you for doing that!
This hit the right combination of strange issue, in my comfort zone, and pisses me off. That's a recipe for me to figure out what's broken ;) Also, the tcpdump looked weird, so I wanted to see it with real data; but the weirdness was mostly because it has TCP Keep-Alives and I always forget what they look like.

Just because I am curious, is there something TLS 1.3 does that the forum overall loses out on with it disabled?

(you posted a pretty good summary after I already typed this, but since I already wrote it, here it is anyway)

The TLDR on TLS 1.3 early data or 0-RTT (interchangeable names for the same thing) is that when the client (Firefox) reconnects to a server it has connected to before, it can send an HTTP request immediately after it sends the TLS handshake, without waiting for the server's response to the handshake. This is an enhancement to TLS session resumption, and means there are no additional round trips vs plaintext HTTP on successful resumption. The tradeoff is that, depending on your server setup, the same early data can be sent again and the server may process it again, so you're only supposed to use early data for requests that are OK to run multiple times; that is usually interpreted as GET is OK, and POST isn't.
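
The "GET is OK, POST isn't" rule can be sketched as a small server-side check. This is only a Python illustration, not how hardforum or nginx actually handles it: the function name and the plain-dict request shape are made up for the example. The `Early-Data: 1` header and the 425 (Too Early) status code do come from RFC 8470, which describes how a front end can flag a request that arrived in early data so the backend can ask the client to retry after the handshake.

```python
from typing import Mapping, Optional

# Methods that HTTP semantics define as "safe": replaying one should not
# change server state.
SAFE_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE"}

TOO_EARLY = 425  # "Too Early" status code from RFC 8470


def early_data_status(method: str, headers: Mapping[str, str]) -> Optional[int]:
    """Return 425 if the request should be retried after the handshake.

    Hypothetical helper: assumes a front end has added "Early-Data: 1"
    (exact key, exact value) to requests that arrived in TLS early data.
    Returns None when the request can be processed normally.
    """
    arrived_early = headers.get("Early-Data") == "1"
    if arrived_early and method.upper() not in SAFE_METHODS:
        return TOO_EARLY
    return None


if __name__ == "__main__":
    print(early_data_status("GET", {"Early-Data": "1"}))   # safe to process
    print(early_data_status("POST", {"Early-Data": "1"}))  # ask client to retry
    print(early_data_status("POST", {}))                   # normal request
```

A replayed GET just returns the same page again, while a replayed POST could create a duplicate forum post, which is why only the second case is rejected.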

The long and short of it is, if your connection to the server times out (at 60 seconds), and you load another forum post, you should get that back a little faster with early data (one round trip time between your client and the server; that can be significant for dial-up or users across the world or on congested mobile or wifi networks), but it shouldn't make a difference for posting because browsers shouldn't be sending POST over early data.
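
To put rough numbers on that saving, here is a back-of-the-envelope sketch in Python. It assumes a simplified model of my own (not measured from the forum): one round trip for the TCP handshake, one for the TLS 1.3 handshake, and one for the HTTP request and response, with early data letting the request ride along with the TLS handshake. The function name and the example RTT values are invented for illustration.

```python
def time_to_first_response_ms(rtt_ms: float, early_data: bool) -> float:
    """Rough time from opening a new connection to receiving the response.

    Simplified model (an assumption, not a measurement):
      - 1 round trip for the TCP handshake
      - 1 round trip for the TLS 1.3 handshake
      - 1 round trip for the HTTP request/response
    With early data, the request travels with the TLS ClientHello, so the
    TLS handshake and the request share a single round trip.
    """
    round_trips = 2 if early_data else 3
    return round_trips * rtt_ms


if __name__ == "__main__":
    # Example RTTs are illustrative guesses, not measurements.
    for label, rtt in [("nearby broadband", 20), ("across the world", 250)]:
        without = time_to_first_response_ms(rtt, early_data=False)
        with_0rtt = time_to_first_response_ms(rtt, early_data=True)
        print(f"{label}: {without:.0f} ms without, {with_0rtt:.0f} ms with early data")
```

Under this model the saving is always exactly one round trip, so it only matters when the round trip itself is long.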
 
OK, well I wrote up a bug, so we'll see what happens. It looks like the details are a little bit different than what I suggested earlier. It seems like it's not entirely stuck, but at least on 94.0.1, in the right conditions, Firefox will wait almost 60 seconds after it sends its initial data before it sends an HTTP/2 ping, at which point it notices the handshake finished and sends the POST. But the hardforum server seems to have a connection timeout of 60 seconds, so it's a race to see if the POST arrives before the server decides to shut down the connection. If the request gets there in time, all is good (except for the response time), and if Firefox had a connection open already, it would use that (so reload and submit works too).

I tested on Firefox Nightly, and that seems to have the same behavior, but it only waits about 4 seconds before it gets going (I didn't look at packet contents, just timing for Nightly; so I'm not sure it's exactly the same problem, but it's a pain to get the session keys to get everything into wireshark). This reduced wait time wouldn't have resulted in service problems with the hardforum config, just slight delays that would be hard to measure and verify and maybe not worth it.
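
The race described above can be written down as a toy model in Python. This is only a sketch under loud assumptions: the function name is hypothetical, both timers are treated as starting at the same moment, and the network is reduced to a single one-way delay. The wait values in the demo (about 57 seconds for 94.0.1, about 4 seconds for Nightly) are rough figures in the spirit of the observations in this thread, and the delays are made up.

```python
def post_beats_timeout(client_wait_s: float, one_way_delay_s: float,
                       server_timeout_s: float = 60.0) -> bool:
    """True if the delayed POST reaches the server before it gives up.

    Toy model: the client sits idle for client_wait_s, then sends the POST,
    which takes one_way_delay_s to arrive. The server closes the connection
    server_timeout_s after it was opened (60 s assumed as the default).
    """
    arrival_s = client_wait_s + one_way_delay_s
    return arrival_s < server_timeout_s


if __name__ == "__main__":
    # Long client wait: the outcome depends on how slow the network is.
    print(post_beats_timeout(57.0, 0.1))
    print(post_beats_timeout(57.0, 3.5))
    # Short client wait (Nightly-like): plenty of margin either way.
    print(post_beats_timeout(4.0, 3.5))
```

With a wait that close to the server timeout, ordinary jitter is enough to flip the result, which fits the "sometimes it posts, sometimes it doesn't" reports.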

Hopefully, Firefox developers will fix the underlying issue, and there should be no delay vs with 0rtt disabled.
 
Awesome.

I think I found your report.

The bot seems to have moved it to Audio/Video playback for some reason ???

Hopefully it doesn't get lost in the wrong place.
 
Minor update: it looks like the relevant nginx timeout isn't the keepalive timeout but client_header_timeout, which defaults to 60 seconds. That's how long nginx will wait for a client to send the headers for the first request. Interestingly, if you set this really low, nginx sends out the FIN to say it's not sending any more data while Firefox is still in la-la land; Firefox gets that, too, and ignores it until it decides to ping the connection later. :banghead: Thankfully, Firefox takes pains to turn TCP keepalives on, with an unusually short interval (20 seconds), so that the OS can maybe tell if the connection is dead, but Firefox doesn't care.

This is the hard thing about looking at tcpdumps... anytime you open that up, you're going to find something dumb, and it might distract you from the task at hand.
 
Things are back to normal for me as well (Firefox 94.0.1). Liking and Posting are functioning as usual, which was always very quick. Thanks to those for tracking down the issue. It was weird that the problem (at least for me) was exactly 60 seconds and only seemed to be here.

Again thanks.
 
The 60 seconds (logging seemed to indicate 57.2 seconds?) seems to be a timer in Firefox somewhere before it checks on progress on this connection; a mozilla person poked at the bug today (asking a specific other person to look at it) so hopefully there will be some enlightenment from that. I don't mind poking at networking, but I don't really want to dig into their code to understand the why of that timer. Of course, the server side also has a timer to close connections that don't do anything meaningful in 60 seconds, so that interacts poorly.

I think this only showed up here because TLS 1.3 Early Data (aka 0-RTT) is relatively new, and requires server configuration to activate, and most browsers aren't using it; I couldn't get Chrome or Edge to send any requests in early data, although I didn't look at wireshark to check if they do any of the http/2 setup work in early data. This seems to be tied to TLS 1.3 early data, HTTP/2 and POST on a new connection. If you're missing any one of the three, I don't think you'll have this problem.
 
Just wanted to add another voice to this issue. It happens elsewhere, but the most noticeable for me is in GenMay on the Babe thread. I click the like button and it goes into hourglass mode, but the funny part is that if I scroll down further to another post and click like there too, it may work perfectly fine (no lag at all). Really weird.

This is on Win 10 and 11 but both in latest version of Firefox (since that's what both boxes are using).
 
If you are still having this problem now, it is unrelated and something else is going on.

RoldanLT disabled the feature that was causing this issue on the server, and it has fixed it.

 
FWIW, the fix is in. I just spent a few minutes testing with Firefox Nightly after they made a patch and can no longer reproduce the issue. :)

Looking at their documentation site, Firefox 96 (which is currently Nightly) is set to be released January 11, and I don't know if they're going to backport this fix to Firefox 94 (current) or 95 (beta). Based on the current state, I would say the server side of early data probably shouldn't get turned back on until most Firefox users are on 96; looking at their user activity stats, it looks like releases get to 70%+ about four weeks after release, so maybe mid-February would be a good time to turn it back on. Happy Valentine's Day?
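
For what it's worth, the date arithmetic behind "mid-February" can be checked with a few lines of Python. The year 2022 is my assumption (the post only says January 11), and the helper name is made up; the four-week figure is the rough adoption lag quoted above.

```python
from datetime import date, timedelta


def earliest_reenable(release: date, adoption_weeks: int = 4) -> date:
    """Release date plus the rough number of weeks until most users upgrade."""
    return release + timedelta(weeks=adoption_weeks)


if __name__ == "__main__":
    release = date(2022, 1, 11)  # assumed year; the thread gives only "January 11"
    target = earliest_reenable(release)
    valentines = date(release.year, 2, 14)
    print(target)                        # 2022-02-08
    print((valentines - target).days)    # 6 days of extra margin
```

So Valentine's Day lands a little under a week after the four-week mark, which leaves some slack for slow updaters.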
 