This is a more recent story, from my time working for an MSP in Europe rather than my earlier time at an ISP in Australia.
The cast:
Me: Slazer
OT: Other Tech
I get a message on Slack.
OT: Hey, I am seeing something weird in the French office for a customer, can you help me look into it?
Me: Sure
Queue the Teams call.
OT: So all the Access Points in that office are reported as offline in the cloud.vendor.com portal, but the customer is not reporting an issue.
Me: Ok, that is odd. What is the monitoring system saying?
OT: Monitoring says everything is OK. I can ping them and do SNMP calls to all the APs, they are just reporting as offline in the portal.
OT: The other thing is the firewall says the APs are trying to access cloud.vendor.com, but the local-in policy is denying the traffic.
Me: That is rather strange.
I log into the firewall, check the logs, and see that the APs are in fact trying to access cloud.vendor.com, but the destination is 255.255.255.255, not the expected IP from the vendor's documentation.
Me: Well, I want to say it's a DNS issue. What happens when you reboot the AP?
OT: Rebooting from the portal doesn't work, but I rebooted one from the switchport and the same thing happens.
Me: Is the on prem DNS server working?
OT: Yea, the domain controller is the DHCP/DNS server and it has no issue with access, the customer hasn’t reported connection issues. It looks to be just the APs.
Me: Ok then, are they being allocated the right DNS servers?
OT logs into the domain controller and everything is looking good.
Me: dafuq?.. Wait, do these even use the DNS server from DHCP or do we set one via the device template?
OT: Not sure, never had this happen before. When we provision these they are plug and play.
I log into the vendor portal, start poking around, and notice all the APs have the same DNS server: 208.67.222.222 (OpenDNS).
Me: Ok, well the APs aren't using the local DNS server, they are using OpenDNS. Let's start a packet capture to see what is going on.
I set up a packet capture on the firewall, limit it to the IP of the AP we are looking at, let it run for a bit, and crack open the capture in Wireshark.
I just start laughing at the error.
OT: I know that laugh, what did you find?
Me: What do you make of this error?
Every single DNS query had this as the response.
The OpenDNS service is currently unavailable in France and some French territories due to a court order under Article L.333-10 of the French Sport code. See https://support.opendns.com/hc/en-us/
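For anyone who wants to poke a resolver directly instead of reading a capture, here is a minimal Python sketch that sends one DNS query over UDP and reports the status code from the reply header. It is an illustration, not what we ran on the day: `build_query`, `parse_header` and `probe` are names made up for this example, cloud.vendor.com is the same placeholder used in the story, and the sketch only decodes the 12-byte header, so it will not show any explanatory text the resolver may attach elsewhere in the response.

```python
import socket
import struct

OPENDNS = "208.67.222.222"  # resolver the APs were configured with (from the story)

# Response codes defined in RFC 1035, section 4.1.1
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
          3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, other counts 0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def parse_header(response):
    """Return (rcode name, answer count) from the 12-byte DNS header."""
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    rcode = flags & 0x000F  # low four bits of the flags word
    return RCODES.get(rcode, str(rcode)), ancount


def probe(name, server=OPENDNS, timeout=3.0):
    """Send one UDP query and summarise the reply. Needs network access."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, 53))
        response, _ = sock.recvfrom(4096)
    return parse_header(response)
```

Calling `probe("cloud.vendor.com")` from a machine behind the same firewall returns a tuple of the status name and the number of answers. A status other than NOERROR, or zero answers for a name that should resolve, points at the resolver rather than the AP.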
OT: Wha???
Me: Yea… Now for the hard part.
OT: Hard part?
Me: How do we fix this? There are no SSH logins to the APs, we can't push config because the devices are offline according to the portal, and there is no way we are getting console to each of those units.
OT: I see.
Then the dumb idea occurred to me.
Me: I have a dumb idea. We DNAT any traffic destined for OpenDNS to Google's DNS so we can reconfigure the units to use the local DNS servers.
OT: Would that work?
Me: It should… I hope.
We then set up DNAT for the APs specifically, to rewrite DNS requests destined for OpenDNS and forward them to Google's DNS.
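The actual rule syntax depends on the firewall, so here is a rough Python model of the decision the rule makes rather than real config. Everything in it apart from 208.67.222.222 is an assumption for illustration: the AP subnet is made up, 208.67.220.220 is OpenDNS's other public resolver and is included in case the APs fall back to it, 8.8.8.8 stands in for "Google's DNS", and `dnat` is a hypothetical helper name.

```python
from ipaddress import ip_address, ip_network

# Hypothetical values for illustration; the story does not give the real ones.
AP_SUBNET = ip_network("10.20.30.0/24")  # assumed AP management subnet
OPENDNS_RESOLVERS = {
    ip_address("208.67.222.222"),  # the resolver seen in the vendor portal
    ip_address("208.67.220.220"),  # OpenDNS's second resolver (assumed in use)
}
GOOGLE_DNS = ip_address("8.8.8.8")
DNS_PORT = 53


def dnat(src, dst, dport):
    """Return the (possibly rewritten) destination for one outbound packet."""
    src_ip, dst_ip = ip_address(src), ip_address(dst)
    # Only touch DNS traffic from the APs that is headed for OpenDNS
    if src_ip in AP_SUBNET and dst_ip in OPENDNS_RESOLVERS and dport == DNS_PORT:
        return str(GOOGLE_DNS), dport
    # Everything else passes through with its destination unchanged
    return dst, dport
```

Scoping the match to the AP subnet and port 53 keeps the hack narrow: nothing else on the network has its DNS quietly redirected, and the rule is easy to remove once the APs point at the local DNS servers.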
After activating the config we start seeing the devices come online in the portal as if nothing happened to them.
OT: Hey, it worked.
Me: omg, it actually worked…
I am still somewhat shocked it worked.
At some point I will get some time to clean up that DNAT and finish reconfiguring the APs.
While you would be right about "cue", I was waiting for a useless meeting to end, so there was a queue for the call.