This is a more recent story, from my time working for an MSP in Europe rather than my time working for an ISP in Australia.
The cast:
Me: Slazer
OT: Other Tech
I get a message on Slack.
OT: Hey, I am seeing something weird in the French office for a customer, can you help me look into it?
Me: Sure
Queue the Teams call.
OT: So all the Access Points in that office are reported as offline in the cloud.vendor.com portal, but the customer is not reporting an issue.
Me: Ok, that is odd. What is the monitoring system saying?
OT: Monitoring says everything is OK. I can ping them and do SNMP calls to all the APs, they are just reporting as offline in the portal.
OT: The other thing is the firewall says the APs are trying to access cloud.vendor.com but the local-in policy is denying the traffic.
Me: That is rather strange.
I log into the firewall and check the logs. The APs are in fact trying to access cloud.vendor.com, but the destination is 255.255.255.255, not the expected IP from the vendor’s documentation.
Me: Well, I want to say it’s a DNS issue. What happens when you reboot the AP?
OT: Rebooting from the portal doesn’t work, but I rebooted one from the switchport and the same thing happens.
Me: Is the on prem DNS server working?
OT: Yea, the domain controller is the DHCP/DNS server and it has no issue with access, the customer hasn’t reported connection issues. It looks to be just the APs.
Me: Ok then, are they being allocated the right DNS servers?
OT logs into the domain controller and everything is looking good.
Me: dafuq?.. Wait, do these even use the DNS server from DHCP or do we set one via the device template?
OT: Not sure, never had this happen before. When we provision these they are plug and play.
I log into the vendor portal and start poking around and notice all the APs have the same DNS server of 208.67.222.222 (OpenDNS)
Me: Ok, well the APs aren’t using the local DNS server, they are using OpenDNS. Let’s start a packet capture to see what is going on.
I set up a packet capture on the firewall, limit it to the IP of the AP we are looking at, let it run for a bit, and crack open the capture in Wireshark.
I just start laughing at the error.
OT: I know that laugh, what did you find?
Me: What do you make of this error?
Every single DNS query had this as the response.
The OpenDNS service is currently unavailable in France and some French territories due to a court order under Article L.333-10 of the French Sport code. See https://support.opendns.com/hc/en-us/
OT: Wha???
Me: Yea… Now for the hard part.
OT: Hard part?
Me: How do we fix this? There are no SSH logins to the APs, we can’t push config because the devices are offline according to the portal, and there is no way we are getting console to each of those units.
OT: I see.
Then the dumb idea occurred to me.
Me: I have a dumb idea. We DNAT any traffic destined for OpenDNS to Google’s DNS so the units come back online and we can reconfigure them to use the local DNS servers.
OT: Would that work?
Me: It should… I hope.
We then set up DNAT for the APs specifically, to rewrite the DNS requests destined for OpenDNS and forward them to Google’s DNS.
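For anyone who wants the logic spelled out, here is a rough Python sketch of what that DNAT rule does to a packet. This is not the actual firewall config: the AP subnet, the `Packet` class, the `dnat` function and the choice of 8.8.8.8 as the Google resolver are assumptions made up for illustration. Only the OpenDNS address comes from the story.

```python
from dataclasses import dataclass, replace
from ipaddress import ip_address, ip_network

# Hypothetical values: the real AP subnet and target resolver are not in the story.
AP_SUBNET = ip_network("10.0.50.0/24")
OPENDNS = ip_address("208.67.222.222")   # the resolver baked into the APs
GOOGLE_DNS = ip_address("8.8.8.8")       # assumed replacement resolver


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    proto: str
    dport: int


def dnat(pkt: Packet) -> Packet:
    """Rewrite the destination of AP DNS queries that are aimed at OpenDNS."""
    is_dns = pkt.proto in ("udp", "tcp") and pkt.dport == 53
    from_ap = ip_address(pkt.src) in AP_SUBNET
    to_opendns = ip_address(pkt.dst) == OPENDNS
    if is_dns and from_ap and to_opendns:
        # Only the destination changes; source, protocol and port are untouched.
        return replace(pkt, dst=str(GOOGLE_DNS))
    return pkt
```

A real firewall also tracks the connection so the reply from the new resolver is rewritten back to look like it came from OpenDNS; the sketch leaves that part out.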
After activating the config we start seeing the devices come online in the portal as if nothing happened to them.
OT: Hey, it worked.
Me: omg, it actually worked…
I am still somewhat shocked it worked.
At some point I will get some time to clean up that DNAT and finish reconfiguring the APs.
I don’t have “nslookup” handy, but does this mean that DNS queries to OpenDNS from France will return an A record of “255.255.255.255”? Or do the APs default to the broadcast IP when the DNS query fails?
The first scenario would be an utterly insane response from a DNS server. The second scenario is unreasonable, as it means the AP knows that DNS resolution failed but it still wants to try chucking “Hail Mary” packets at the aether anyway.
With scenario 1, I’d be concerned if other parts of the customer’s network are using the same nonfunctional DNS server. But in any case, your workaround was pretty clever.
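One way to tell the two scenarios apart without nslookup is to send a raw DNS query and look at the response header. The sketch below uses only the Python standard library; the function names (`build_query`, `summarize`, `ask`) are made up for illustration, and it makes no claim about what OpenDNS actually returns from France.

```python
import socket
import struct


def build_query(name: str, qid: int = 0x1234) -> bytes:
    """Build a minimal DNS query for the A record of `name`."""
    # Header: ID, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # Each label is length-prefixed; a zero byte terminates the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def summarize(response: bytes) -> tuple[int, int]:
    """Return (rcode, answer_count) from a DNS response header."""
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    return flags & 0x000F, ancount


def ask(server: str, name: str, timeout: float = 3.0) -> bytes:
    """Send the query over UDP port 53 and return the raw response."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, 53))
        data, _ = sock.recvfrom(4096)
        return data
```

Calling `summarize(ask("208.67.222.222", "cloud.vendor.com"))` from an affected network would show whether the server answers with a record (answer count above zero) or with an error code and no answers, which would point to the AP falling back on its own.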
APs are going to do whatever the fuck they want. I suspect it was a fallback on the APs, as they were constantly trying to get a resolution but were getting the error message instead.
It’s always DNS. Unless it’s not.
I guess the APs are not using DoH or DNSSec, otherwise DNAT would (should) fail.
I’d be quite surprised if they actually support DoH, but OP already said it was set to OpenDNS by default anyway, so if it did it’s probably disabled.
I also don’t believe DNSSec would affect this, since it just verifies that a DNS zone wasn’t modified by a non-authority, not that you’re actually talking to the same DNS server you’re expecting.
These are so much more readable and enjoyable without abbreviations as people. Can we just give them fake names?
I keep telling my coworkers this: It’s always DNS, even when it’s not.
My boss has since pinned that in our Teams channel.
*cue
While you would be right, I was waiting for a useless meeting to end, so there was a queue for the call.
What is a DNAT and what is an SNMP?
DNAT is Destination Network Address Translation.
In normal operation you do Source NAT when accessing the internet. Your PC’s private IP is rewritten to the public IP of your router.
DNAT rewrites the destination IP to something else.
SNMP is Simple Network Management Protocol.
You send a string of numbers (an OID) to a device and you get back information about the device.
If you query a device for the OID 1.3.6.1.2.1.1.5.0 (sysName), you will receive back the hostname of the device.
That was very clear and informative. Thank you.
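To make the "string of numbers" in the SNMP answer above a bit more concrete: on the wire the OID is not sent as dotted text but packed into bytes using ASN.1 BER. A minimal sketch, assuming a hypothetical helper named `encode_oid` and handling only short OIDs whose first two arcs fit in one byte:

```python
def encode_oid(oid: str) -> bytes:
    """BER-encode a dotted OID as it would appear inside an SNMP request."""
    arcs = [int(part) for part in oid.strip(".").split(".")]
    if len(arcs) < 2:
        raise ValueError("an OID needs at least two arcs")
    # The first two arcs are combined into one value: 40 * first + second.
    body = bytearray([40 * arcs[0] + arcs[1]])
    for arc in arcs[2:]:
        # Remaining arcs are base-128, with the high bit set on every
        # byte except the last one of each arc.
        chunk = [arc & 0x7F]
        arc >>= 7
        while arc:
            chunk.append((arc & 0x7F) | 0x80)
            arc >>= 7
        body.extend(reversed(chunk))
    if len(body) > 127:
        raise ValueError("OID too long for the short length form used here")
    # 0x06 is the ASN.1 tag for OBJECT IDENTIFIER, followed by the length.
    return bytes([0x06, len(body)]) + bytes(body)
```

For example, `encode_oid("1.3.6.1.2.1.1.5.0").hex()` gives `06082b06010201010500`, which is the sysName OID as it would sit inside the request.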