Since the HVAC coolant leak last Thursday, I’ve been working with Blackboard support and networking-savvy people here at Notre Dame to figure out how chat broke… The Chat/Whiteboard tool just says “Not available. Try back later.” It’s a pleasant enough message, but not exactly what the few non-face-to-face courses we host are looking for.
Like Alice going deeper down the rabbit hole, we found that port 2304 traffic goes blithely into an appropriate application server (one of three) but never comes out again. We found that lowering the Host IP Filters (Solaris app servers) had sporadic effects… We found two records in the database pointing to an active chat server on nodes that weren’t even active!
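When you’re chasing where traffic on a port disappears, the first question on each app server is simply whether anything answers there at all. A quick probe like this sketch narrows it down (the hostnames in the usage comment are placeholders, not our real servers):

```python
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # DNS failure, refused connection, or timeout all count as "no answer"
        return False

# Example (hypothetical hostnames):
#   for host in ["app1.example.edu", "app2.example.edu", "app3.example.edu"]:
#       print(host, port_open(host, 2304))
```

A probe like this only tells you the TCP handshake works; traffic that "goes in but never comes out" past that point is exactly where the filter and firewall layers below come in.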
What changed? Got sprayed with coolant? (Read: CMS Admin tears hair out)
Nope. I did it. I sent in a rule request to the firewall admins for the new systems I’m building. I did not want NATing from the load balancer front end to the application servers; instead I wanted to see the Load Balancer’s proxy IPs in the app server logs. I also thought I needed that for client IP address header inserts to function; the Load Balancer Admin says I’m wrong on that one.
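For context on the header insert I was counting on: in the usual X-Forwarded-For style setup, the load balancer proxies the connection (so the app server sees the LB’s proxy IP at the TCP level) but stuffs the original client address into an HTTP header. A minimal sketch of how an application might recover the client IP, assuming a single tier of trusted proxies (this is my illustration, not WebCT Vista’s actual code):

```python
def client_ip(headers: dict, peer_ip: str, trusted_proxies: set) -> str:
    """Return the original client IP, trusting X-Forwarded-For only when
    the direct peer is a known load-balancer proxy address."""
    if peer_ip not in trusted_proxies:
        # Direct connection: the header could be spoofed, so ignore it.
        return peer_ip
    xff = headers.get("X-Forwarded-For", "")
    if xff:
        # Leftmost entry is the original client; each proxy appends its own.
        return xff.split(",")[0].strip()
    return peer_ip
```

The point the Load Balancer Admin was making, as I understand it: the header insert works whether or not the LB NATs the connection, since the client address rides in the HTTP header either way.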
Turns out our Firewall Admins have WebCT Vista ‘rule sets’… so when they applied the change, it changed not only the new systems’ servers but also the current ones. Now the current servers are getting traffic from places they don’t trust, so they’re not talkin’.
The intermittent success of lowering IP Filters? Me again. I forgot we never got health checks working on the chat pool. Lowering filters on just one server in a three-server system meant we would have a successful chat connection only about one time in three.
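The arithmetic there is just round-robin odds: with three servers in the pool, no health checks, and filters lowered on only one, roughly one connection in three lands on the server that will actually accept it. A toy simulation, assuming plain round-robin distribution (server names are made up):

```python
from itertools import cycle

def success_rate(pool, healthy, attempts=300):
    """Fraction of connections that land on a 'healthy' server under
    plain round-robin with no health checks pulling bad nodes out."""
    rr = cycle(pool)
    hits = sum(1 for _ in range(attempts) if next(rr) in healthy)
    return hits / attempts

rate = success_rate(["app1", "app2", "app3"], healthy={"app2"})
print(f"{rate:.0%}")  # prints "33%"
```

With working health checks, the pool would stop handing connections to the two servers that won’t answer, and that one-in-three lottery goes away.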
Time for a break/fix change request to my Sys Admin…