Stack Overflow Careers is served up something like so:
user -> internet -> our fw -> nginx -> haproxy -> web farm
- FreeBSD is the operating system in use
- no firewalling or QoS is in place on this box
- nginx handles our SSL termination
- haproxy handles the load balancing
- nginx / haproxy are pushing about 15 Mbps each way
During normal operation, nginx receives the HTTP request, does its thing, and hands off the request to an haproxy instance that is bound to the loopback address (127.0.0.1) on that same box.
In order to do some troubleshooting the other day, I moved the haproxy instance onto the same interface that nginx was running on. This immediately added 100ms of latency to all requests. This interface is not a true physical interface, but a carp interface.
Can anyone explain to me why this was the case? Contention with the packet queue maybe? Or perhaps loopback is always faster because it is 'soft'? There is something fundamental that I'm missing here, and I'm hoping someone will kindly educate me.
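One way to put numbers on the difference is to time the same request against the loopback address and against the interface address. The following is only a minimal sketch, not the setup used above: it starts a throwaway TCP echo server rather than talking to haproxy, and the helper names start_echo_server and measure_rtt_ms are hypothetical.

```python
import socket
import statistics
import threading
import time


def start_echo_server(host="127.0.0.1"):
    """Start a throwaway TCP echo server on an OS-chosen port; return (host, port)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, 0))  # port 0 lets the OS pick a free port
    srv.listen(16)

    def serve():
        while True:
            try:
                conn, _ = srv.accept()
            except OSError:
                return  # listening socket was closed
            with conn:
                while True:
                    data = conn.recv(4096)
                    if not data:
                        break
                    conn.sendall(data)

    threading.Thread(target=serve, daemon=True).start()
    return srv.getsockname()


def measure_rtt_ms(host, port, samples=20, timeout=2.0):
    """Return per-request times in milliseconds (TCP connect plus one echo)."""
    results = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.sendall(b"ping")
            s.recv(4)  # wait for the echoed bytes to come back
        results.append((time.perf_counter() - start) * 1000.0)
    return results


if __name__ == "__main__":
    # 127.0.0.1 is the loopback case. To compare, append the address bound to
    # the carp interface here (assumption: it is configured on this host).
    for addr in ["127.0.0.1"]:
        host, port = start_echo_server(addr)
        rtts = measure_rtt_ms(host, port)
        print(f"{addr}: median {statistics.median(rtts):.3f} ms, "
              f"max {max(rtts):.3f} ms")
```

If both addresses show similar medians with this script, the extra latency is more likely tied to the nginx/haproxy configuration or load than to the interface itself.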
-
Just for clarity, you only changed how it was being accessed, from the 127 address, to the local IP; correct?
If that's the case and it made a difference, something ain't right. Check your routing table with netstat -rn and see what the local IPs are routed to, it should be routed to the lo0 interface (just like 127).
Your netstat -rn output should be vaguely similar to this:
  Internet:
  Destination       Gateway    Flags  Refs  Use       Netif  Expire
  default           1.2.3.1    UGS    131   2655014   nge1
  1.2.3.0/23        link#2     U      0     88        nge1
  1.2.3.4           link#2     UHS    0     34848     lo0
  127.0.0.1         link#5     UH     0     64678     lo0
  192.168.0.0/26    link#1     U      2     41703537  nge0
  192.168.0.1       link#1     UHS    0     70088     lo0
Michael Gorsuch : I should've included this in the post: these interfaces are carp interfaces. Completely slipped my mind until I ran netstat. Does that make a difference?
Chris S : Yeah, that's it right there. If the address you're using is assigned to a carp interface using that IP will force it through the carp stack before it hits the loopback device; 100ms would be excessive still though. Is the host in question the master of that IP, or are you using load balancing? It could be sending the traffic to the other carp host.
Michael Gorsuch : The host is the master of that IP.
Chris S : I just finished whipping up a similar environment and testing it out. I saw no appreciable difference in response times between the carp interface IP, 127 IP, and a physical IP. I've only got one server here to test, so no carp slaves, but I suspect something's amiss elsewhere in your environment (firewall or traffic shaping?) that's causing the latency. This is a i386-8.1-STABLE.
Michael Gorsuch : Thanks, Chris. I'll see if I can gather more info when traffic dies down. The current system is not using any firewall package or doing any traffic shaping. I should also note (will update original question) that we are receiving a large amount of traffic due to the job ads that we're displaying on the SO family sites. We're moving about 15 Mbps each way during normal hours.
From Chris S
-
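The routing-table check from Chris S's answer can also be scripted. This is a rough sketch that assumes the FreeBSD-style netstat -rn layout shown in that answer (a header row containing "Destination" and "Netif"); the function names parse_routes, local_ip_uses_loopback and live_routes are hypothetical, and the sample text is the example table from the answer, not output from the real host.

```python
import subprocess

# Example table from the answer above (illustrative addresses).
SAMPLE = """\
Internet:
Destination       Gateway    Flags  Refs  Use       Netif  Expire
default           1.2.3.1    UGS    131   2655014   nge1
1.2.3.0/23        link#2     U      0     88        nge1
1.2.3.4           link#2     UHS    0     34848     lo0
127.0.0.1         link#5     UH     0     64678     lo0
192.168.0.0/26    link#1     U      2     41703537  nge0
192.168.0.1       link#1     UHS    0     70088     lo0
"""


def parse_routes(netstat_output):
    """Map each destination to its Netif, using the header row to find the column."""
    routes = {}
    netif_col = None
    for line in netstat_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Destination":
            # Column positions differ between releases, so locate Netif by name.
            netif_col = fields.index("Netif") if "Netif" in fields else None
            continue
        if netif_col is None or len(fields) <= netif_col:
            continue  # section titles such as "Internet:" or short rows
        routes[fields[0]] = fields[netif_col]
    return routes


def local_ip_uses_loopback(netstat_output, ip, loopback="lo0"):
    """True if the host route for `ip` points at the loopback interface."""
    return parse_routes(netstat_output).get(ip) == loopback


def live_routes():
    """Parse the routing table of the current host (requires netstat)."""
    out = subprocess.run(["netstat", "-rn"], capture_output=True,
                         text=True, check=True).stdout
    return parse_routes(out)


if __name__ == "__main__":
    for ip in ("1.2.3.4", "192.168.0.1"):
        print(ip, "via lo0:", local_ip_uses_loopback(SAMPLE, ip))
```

On the real box, calling live_routes() and looking up the carp address would show whether its host route points at lo0 or at some other interface.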
I have seen loopback implemented as an interrupt-level software i/f such that traffic never gets outside the box. Could this have been the case when you were running loopback? Disclaimer: Just a general question; I know nothing about FreeBSD.
-- pete
Chris S : This is not the way it works in FreeBSD.
-
A constant 100ms delay looks weird. It looks like packets get buffered and not immediately delivered, or maybe some of them are dropped and retransmitted. Can you run tcpdump on this interface to show the problem? I don't know how the IP stack works on FreeBSD, nor how CARP is implemented, but would it be possible, for instance, that the slave regularly advertises its MAC address with gratuitous ARPs and that the master alternately sends packets to each side?
Could you also run tcpdump on the real interface to ensure that nothing gets emitted out?
Is it possible that the system refrains from caching a CARP device's ARP entry, thus causing an ARP request to be emitted for each packet of a session, which the CARP daemon would then have to answer?
Most of these are probably stupid ideas, but they are meant to help you search in the right direction.
Michael Gorsuch : Thanks for the ideas, Willy. I'll move the configuration back to the interface off-hours and see what a packet trace turns up.
From Willy Tarreau