Blocking SYN-Flood Attacks on macOS?

I did see that. Unfortunately I don't have any experience with pf, and especially not with synproxy. I also couldn't find anything about synproxy and macOS on the interwebs; not even Stack Exchange posts about it not working with no solutions (like I saw for pf and divert).

That said, I did see this post about it not working on NetBSD in a similar way. The described symptoms sound similar to what you're seeing, so I wonder if their workaround will work for you, too. They suggested adding a second loopback interface with a new RFC1918 address, but I don't see why you couldn't just redirect to 127.0.0.1, or add an RFC1918 alias to lo0. So your rule might be:

Code:
pass in quick proto tcp from any to X.98 port { 80 443 } flags S/SA rdr-to 127.0.0.1 synproxy state

Or whatever other IP you add. If you're doing IP-based virtual hosts, you'd need to tweak your Apache config a smidge to manage it, but it's not too bad.
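If you do want to try the lo0-alias route instead of plain 127.0.0.1, a rough sketch would be something like this (the 10.66.0.1 address is just a made-up RFC1918 example, and en0 is a guess at your external interface):

Code:
# add an RFC1918 alias to the loopback interface (example address)
sudo ifconfig lo0 alias 10.66.0.1 netmask 255.255.255.255

# then point the redirect at that alias instead of 127.0.0.1
pass in quick on en0 proto tcp from any to X.98 port { 80 443 } flags S/SA rdr-to 10.66.0.1 synproxy state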
 
So I just added "rdr-to 127.0.0.1" to my web/mail port rule on my dummy server and, just like that, it appears to be working. I can't really tell if the proxy is working, because it's a legit connection when I load a dummy web page, so if it's proxied, it's only proxied for a millisecond. If only I knew how to execute a SYN flood so I could do it locally and see if the proxied ports pile up.
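From a bit of searching, it looks like hping3 (available via Homebrew/MacPorts, or on any Linux box) is the usual way to fake one; something like this, pointed only at the dummy server (the target address is a placeholder):

Code:
# send SYNs as fast as possible with randomized source addresses -- test box only!
sudo hping3 -S -p 80 --flood --rand-source <dummy-server-ip>

Then I could watch netstat -an | grep SYN_RCVD on the dummy server and see whether the half-open connections pile up or the proxy absorbs them.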

Also, before I enable this on my real server, can you (if you know) explain why this is working? I want to make sure I fully understand what is going on.

Also, is rdr-to the same thing you'd use to forward connections from a server to a VM running on that server, and then back to the server again? Well, I guess not exactly, because the redirect BACK from the VM to the real server would also have to go to a custom port number, otherwise it would get stuck in an infinite loop. If I had more free time, I'd play around with some infinite loops like that :p

Hopefully I won't need the VM approach; hopefully a working synproxy will be sufficient.
Also, my data center offered me free access to their big DDoS mitigation service, so I can fall back on that if I can't eventually get a working solution of my own.
 

I don't know why this is working; I saw it on that mailing list post and figured if it works there, it'll work for you too. :) If I had to guess, it's probably something like the response packets aren't getting properly mangled, but rdr-to helps them get flagged for mangling. I'm not going to dig into the firewall code and figure it out; I'm retired :p

rdr-to is "destination NAT", what people would normally use to forward a port from a firewall machine to a machine inside with only an RFC1918 address. You could use this for the VM, but I'd recommend route-to instead; it's a bit more work to setup, but it forwards the packets as-is, and doesn't need state tracking on the forwarder. (no sense tracking state in both places)

Incidentally, and unrelated: forwarding this way with a load balancer is called 'direct server return', because the server's response packets don't have to go back through the load balancer, which lets a simpler load balancer handle way more connections. Of course, if the destination server is a VM on the host, packets are going to have to go back through the host anyway.
 
So I'm back trying to get synproxy running on my dummy server.

There may have been some unrelated problems that caused previous attempts to fail, so they weren't necessarily reliable tests.

So now I'm back, trying to enable it with the following rule:

Code:
pass in quick proto tcp from en0 port { 25 80 443 587 993 } rdr-to 127.0.0.1 synproxy state

The rdr-to 127.0.0.1 is preventing the rules from being loaded. When I try to load the rule, it gives me a syntax error. If I delete that one portion, it loads the rules file. I'm not sure if I was doing something different when I was trying this a few weeks ago or if I just didn't notice.
 
Hmmm... I found a FreeBSD forum post that looks similar. I pulled config examples from OpenBSD, but it looks like it's a newer syntax, and you'll probably have to use older syntax like they said for FreeBSD.
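If the Mac's pf really is on the older grammar, the equivalent would be a separate rdr line plus the filter rule, roughly like this (en0 and the exact option placement are guesses; check pf.conf(5) on the Mac):

Code:
# older two-step syntax: translation rule first...
rdr on en0 proto tcp from any to any port { 25 80 443 587 993 } -> 127.0.0.1
# ...then the filter rule, which matches the already-translated destination
pass in quick on en0 proto tcp from any to 127.0.0.1 port { 25 80 443 587 993 } flags S/SA synproxy state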

pf is definitely not my strong suit. I'm pretty sure you should be able to adjust rules while the system is running, and I certainly have experience doing that with ipfw, but ipfw has rule numbers that make it a lot easier to turn rules on or off, etc. (Although I had a lot of issues with crashing while tweaking ipfw rules on FreeBSD <= 9 systems under load... it stopped happening when we upgraded, but I never figured out what change fixed it :()
 
So, some good news. I have a script up and running that resets my TCP stack if the server is offline for more than 5 minutes. It has been working very well; I haven't had to do any hard server reboots. Even during severe attacks, once the server gets knocked offline, after 5 minutes it shuts down all Ethernet interfaces for 10 seconds, then brings everything back up.
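In case it's useful to anyone else, the rough shape of it is something like this (not my exact script; the gateway address and interface are placeholders, and the "offline" check in particular is simplified down to a ping here):

Code:
#!/bin/sh
# watchdog sketch: if the gateway stops answering pings for ~5 minutes,
# bounce the interface for 10 seconds (placeholder values throughout)
GATEWAY=192.0.2.1   # placeholder: upstream gateway
IFACE=en0
FAILS=0
while true; do
    if ping -c 1 -t 5 "$GATEWAY" > /dev/null 2>&1; then
        FAILS=0
    else
        FAILS=$((FAILS + 1))
    fi
    if [ "$FAILS" -ge 5 ]; then
        ifconfig "$IFACE" down
        sleep 10
        ifconfig "$IFACE" up
        FAILS=0
    fi
    sleep 60
done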

But synproxy would still be helpful here. I was poking around with that some more today but had no luck. It seems like the completion of the proxied handshake is getting hung up in the firewall somehow. So I tried commenting out the "block in all" rule and turning on synproxy, but that still didn't work. The synproxy counter still goes up when you try to load web pages through it, but none of the pages actually load. I'm still convinced there's a way to make it work; I just need to figure out the secret :D

Strangely, not only can I not find any examples of people successfully using this specific feature on their Macs, I can't find anyone using this feature in `pf` on any system. I think there's some specific way you have to do it that is missing from the docs. But at least while I keep searching, the server is still online. I had some bad days where the server was offline for most of the day. Now it's generally down no more than 10 minutes, and usually less than that, before it resets itself and is back up and ready to go.
 
The easy SA answer would have been to subscribe any and all servers to a properly maintained blacklist to auto filter out malicious IPs.
Block all ports you don't actively use to catch new IPs.

Get a firewall or security appliance in front of the Mac.
You are running in a colo, not your house.
Any pressure on compute resources, even if the filtering succeeds, is still a successful attack, because it's eating time and compute to deal with.

I used to contract for an ISP that rented colo space to companies. I know of one client with a pair of older Mac Pros serving documents for North America that had to go this route, because they lacked the in-house engineering to maintain their presence.

I can see a use case for them to migrate to AWS, since publishing static content in 2020 shouldn't require server resources at this point for anything but an origin to feed a CDN.
 

From what I saw in the docs and mailing lists, it seems like synproxy was expected to be used on a firewall machine sitting between the internet and the server; I couldn't find anyone getting it to work on the same machine. If you have time to fiddle around with it, I'd try to get tcpdumps of the SYN received, the SYN+ACK sent, and any response. (In the firewall-in-the-middle case you'd also be able to see the forwarded SYN and the SYN+ACK that comes back to it, and whether that SYN+ACK gets modified, but alas.) If the SYN+ACK doesn't match the SYN, that's a smoking gun. Be sure to use -S on the tcpdump so it prints absolute sequence numbers instead of relative ones.
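Something like this would be enough to catch the handshake on the Mac itself (interface and port are guesses; the filter only keeps packets with SYN set, so the output stays small):

Code:
# -S: absolute sequence numbers, -n: no DNS lookups; keep only packets with SYN set
sudo tcpdump -S -n -i en0 'tcp[tcpflags] & tcp-syn != 0 and port 80'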


Blocking IPs doesn't work against SYN floods, because most SYN flooders are able to spoof source addresses. The only way that could work is a whitelist approach, and only if the whitelist is pretty small (like only allowing corp IPs or something).
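For completeness, the whitelist version in pf would look something like this (the table contents and interface are placeholders):

Code:
# hypothetical allow-list: only these ranges may reach the web ports, everything else gets dropped
table <allowed> persist { 203.0.113.0/24, 198.51.100.0/24 }
pass in quick on en0 proto tcp from <allowed> to any port { 80 443 } flags S/SA keep state
block in quick on en0 proto tcp from any to any port { 80 443 }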


He tried using the colo's security appliance, and it didn't really work. Sticking a decent OS in front of the Mac would probably work, but comes at a cost of hardware and rack space. Assuming there's a good reason to run on a Mac (maybe some automation that uses Mac-only software), and there's no budget for a firewall, going from "a 30-second SYN flood takes the box offline until someone gets around to rebooting it" to "it fixes itself 5 minutes later" is a win. Whatever the service is, it doesn't seem to need super high availability, so this is fine. Some tuning could probably cut the recovery time further, but there's also an escalation game with whoever is flooding the thing: if you get better at handling it, they get better at flooding you, and there's no way macOS is going to win a DDoS escalation, so quietly taking a smoke break is a good way to make it unfun for the flooders. In a mixed-customer colo, it could just be some idiot going sequentially through all the IPs because someone they don't like is in the same neighborhood; who knows.
 
Whitelist-only is a lot like treating your public subnet as if it were private, but it's still exposed.

I'm not sure what appliances are available in this specific case, but heuristic scanning that kicks off an automated block of an attack is something I take for granted. I'm used to the colo itself filtering the traffic and sending nasty emails to the owners of that cage, trying to upsell services.
 
Things are getting weirder.
I was playing around with my dummy server and found out that the `ifconfig en0 down` command wasn't quite working. It was bringing the interface down, but the interface would pop back up after a few seconds on its own, not waiting for my up command. Turns out the system was restoring itself to the settings configured in System Preferences. So instead of taking down the interfaces that way, I have to use Apple's own tool to take it down and keep it down:

Code:
networksetup -setnetworkserviceenabled Ethernet_1 on

So that works, toggling it with on/off. I guess. But now I'm starting to get incidents where my server suddenly goes down like before, and my script doesn't restore it. When I reboot it remotely and log in, there are no connections in SYN_RCVD, not even at the start of the incident. Furthermore, my script did in fact toggle the network off and on several times, but it didn't do the trick. This is very frustrating because it's so hard to tell what is actually going on.

So I'm not sure if this is a different kind of outage that my old script was still able to reset and mitigate, or if this is a totally different situation. These things happen infrequently enough that it's really hard to troubleshoot. I might go a week without an issue. Then have a bunch of issues in short succession.
 
Intermittent issues are almost worse than hourly incidents. ugh.

If you have the storage space, I'd really try to get some form of continuous tcpdump. Something like tcpdump -s 40 -G 60 -w /tmp/tcpdump%H%M.pcap goes a long way toward seeing what's going on. If you are sure you can catch the system within an hour, you can do -w tcpdump%M.pcap and only keep 60 files. -s 40 isn't always enough; you can experiment a bit, in case there are a lot of important TCP options you want to check out.
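Spelled out in full, that would be roughly the following (the interface is a guess; -G 60 rotates the output file every 60 seconds and the %H%M in the filename is expanded by strftime):

Code:
# rotating capture, 40 bytes per packet, one file per minute
sudo tcpdump -i en0 -s 40 -G 60 -w /tmp/tcpdump%H%M.pcap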

Of course, there's a danger. Once you spend enough time looking at Wireshark, you start looking at all problems as network problems, you get drawn into posts about SYN floods :)p), and you get an ISP tech to come out on Dec 30th because going to http://icmpcheckv6.popcount.org/ resets the DSL modem (every time :banghead:; thankfully it's a modern Linux kernel, 2.6.30, so it certainly doesn't have any bugs or anything like that).
 
It would be really hard to catch an incident with tcpdump. Plus, there is so much data in and out; this is a very high-traffic web server.

So I contacted my data center; they checked their logs for the time in question and didn't see anything. I looked through my own logs that save certain things any time my server is offline. What I saw with ifconfig is that after a few minutes, both of my interfaces dropped their IPs and never got them back. Now, my script does take the interfaces offline every 3 minutes, but the logs are taken every 1 minute. So the worst case should be that every third ifconfig log shows no IPs, and then they come back. But that's not happening; there is a stretch of ~15 minutes where there are no IPs on either interface. Makes me think maybe the router at the data center was attacked, or that there was some glitch in their system. Maybe this whole incident was a fluke. I guess for now I need to run it as-is for a while, and potentially add system rebooting to my script for when repeated TCP resets aren't doing the trick.

The size and complexity of my bandaid continues to grow. . .
 

The data being big is why you only store headers, not whole packets. Maybe you can store 5-15 minutes, and save the buffer the first time your script triggers in a day? Maybe even just 90 seconds. It'll be a lot to sift through in Wireshark, but often there's a big change in packet rate, and there's something fishy within a couple of screens before the change.
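If keeping only the most recent stretch is easier than time-based rotation, tcpdump's ring buffer mode works too (-C rotates the file after roughly that many megabytes and -W caps the number of files so it wraps around):

Code:
# ring buffer: ~10 files of ~50 MB of packet headers, oldest overwritten; copy them off when the script fires
sudo tcpdump -i en0 -s 40 -C 50 -W 10 -w /tmp/ring.pcap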

You could maybe capture only packets that aren't TCP segments with just ACK or ACK+PSH set. Those are likely less interesting than most, and more numerous than most, would be my guess. Although certainly not always: two of the nasty TCP bugs I ran into involved excess sending of data or just empty ACKs (sometimes up to 10 Gbps worth, fun times).
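A capture filter for that idea would be something like this (ACK-only is flags byte 0x10 and ACK+PSH is 0x18; the interface and output path are placeholders):

Code:
# drop TCP segments whose flags are exactly ACK or ACK+PSH, keep everything else
sudo tcpdump -i en0 -s 40 -w /tmp/filtered.pcap 'not (tcp and (tcp[tcpflags] == 0x10 or tcp[tcpflags] == 0x18))'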
 