@CarolineS I don't think I'm eligible to win any swag, but I feel compelled to share this anyway! This is a bit of a long story, but hang in there with me, it's worth it. It goes back many years, but it's one of my favorite network troubleshooting stories. I was working for a major publishing company, and one of the remote printing plants had 3 RIP (Raster Image Processor) servers (Windows PCs) that took in PostScript files and rendered them into the raster bitmaps sent to other systems, which burned the actual plates that get mounted on the printing press drums. The plant had been having all kinds of "network problems" for many weeks, had spent many hours troubleshooting locally, and had to re-send many failed jobs through these servers, some of them multiple times, impacting their ability to actually print the paper on time. After another couple weeks of remote network troubleshooting and even code debugging, they identified a lucky engineer (me) to go on site, which for me meant flying from the east coast to the west coast of the US. I brought a whole bag (literally) of network troubleshooting tools: cables, meters, sniffers, etc.

I spent a whole day on site getting everything set up and documented, ran test jobs to establish a baseline, and got ready for that evening's production run. As the run got going, I was running around confirming everything looked normal, even though a few jobs had already failed. I took note that no jobs were failing on RIP server 1, but there were multiple failures on servers 2 and 3. Seemed odd. The printing plant admin was using the console on RIP server 1 like usual, and since servers 2 and 3 were processing jobs but nobody was actually sitting in front of those PCs, I started watching their CPU, memory, network, and disk activity. I noticed that while I was actively looking at these PCs, no more jobs were failing.
After I stepped away for a bit, jobs started failing again, but only on servers 2 and 3, so I went back to those PCs' perf meters. Jobs stopped failing, just like before. Then it all became so clear! It was the screensaver! The local printing plant admin had personalized these 3 PCs (which had terrible video cards) with a graphics-intensive screensaver, and it totally killed performance, to the point that jobs were failing. The reason nothing ever failed on server 1 was that it was the console the admin always used, so its screensaver never kicked in. And when failed jobs had to be re-sent multiple times, they eventually got through because they eventually load balanced to server 1. No other printing plant around the country ever had this issue, because they all kept their default screensavers. So in the end, this could have been a Dilbert comic strip! After a month of troubleshooting, frustration, escalations, and all kinds of blame-game antics, they flew an engineer across the country to fix it... by turning off the screensaver!