System failures à go go

2009-11-26 22:41

Today was quite the day. As the title says, systems were failing all over the place. Our main switch at work (a Cisco 6509) crashed about three times this week, bringing down our vSphere environment each time and taking all the guest VMs with it. It took us a long while to discover that a faulty UPS battery was to blame for the switch’s instability. Meanwhile, we’re left with a misconfigured iSCSI SAN and three ESX hosts with no storage.

At home, my crazy MythTV/OpenVZ/KVM/PBX/Windows 2003/Seedbox/RADIUS server had to be shut down when my home network started acting up. DHCP stopped working, and the machines that were still connected had trouble pinging each other. This time, a Cisco device really was to blame. A WRT610N router running DD-WRT, which I use as an 802.11a/b/g/n access point, had somehow bricked itself and started flooding the network with broadcast packets, swamping my routers and other computers. Then I tried booting up my server again. MythTV and OpenVZ started up OK, but the qemu-server/KVM machines didn’t start, throwing “can’t open lock for VM 107 ‘/var/lock/qemu-server/lock-107.conf’ – No such file or directory”. Weird error. The fix is to create the missing /var/lock/qemu-server directory.
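
For anyone who hits the same lock error, here is a minimal sketch of that fix in Python. The function name ensure_lock_dir is made up for illustration; only the /var/lock/qemu-server path comes from the error message above. Running it against the real path needs root, and I am assuming the directory simply went missing (for example because /var/lock was cleared at boot), which I have not confirmed.

```python
import os
import tempfile


def ensure_lock_dir(path="/var/lock/qemu-server"):
    """Create the lock directory if it is missing (hypothetical helper).

    Safe to call repeatedly: an existing directory is left untouched.
    Returns True when the directory exists afterwards.
    """
    os.makedirs(path, exist_ok=True)
    return os.path.isdir(path)


if __name__ == "__main__":
    # Try it on a throwaway directory first, so no root access is needed.
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = os.path.join(tmp, "qemu-server")
        print(ensure_lock_dir(demo_path))
```

On the server itself, calling ensure_lock_dir() with no arguments as root should have the same effect as creating the folder by hand. If the directory keeps disappearing after reboots, the call would have to run at startup before the VMs are launched.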

And finally, everything at home is up and running again. We’ll see tomorrow morning how things go at work. David stayed late today on the phone with Dell EqualLogic specialists, so fingers crossed!
