2012-02 France Survey Server Damage

Between February and September 2012, Fugro LADS conducted what was, until our 2015-2016 Saudi Arabia survey, the largest LIDAR survey ever undertaken. The client in this case was SHOM, the French government’s hydrographic agency.

Below are some pics of the server for this job in our build area in Adelaide. The green shell is a military-grade transit case with an internal server rack mounted on rubber shock absorbers. The server is a Dell R710 with twelve 2TB SAS disks, giving 20TB of RAID6 storage. The tape library is a Dell TL2000 with dual LTO5 drives. At the bottom is an awful 1U HP UPS, and the back of the rack also housed a 24-port switch:
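To make the capacity arithmetic explicit: RAID6 spends two disks' worth of space on parity, so usable capacity is (number of disks - 2) × disk size. Here is a minimal Python sketch of that calculation. It assumes decimal terabytes as printed on drive labels and ignores controller and filesystem overhead. The function name `raid6_usable_tb` is hypothetical.

```python
def raid6_usable_tb(disk_count: int, disk_size_tb: float) -> float:
    """Usable RAID6 capacity: two disks' worth of space is consumed by parity."""
    # Most controllers (and Linux md) require at least 4 members for RAID6.
    if disk_count < 4:
        raise ValueError("RAID6 needs at least 4 disks")
    return (disk_count - 2) * disk_size_tb


usable_tb = raid6_usable_tb(12, 2.0)
# Drive makers use decimal TB; operating systems often report binary TiB.
usable_tib = usable_tb * 10**12 / 2**40
print(f"{usable_tb:.0f} TB usable (~{usable_tib:.1f} TiB)")
```

For the R710's twelve 2TB disks this prints `20 TB usable (~18.2 TiB)`, which is why the OS reports noticeably less than the nominal 20TB.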

Upon arriving at a survey location, there’s always a rush to get the system up & running so we can make sure the equipment is all OK, and so surveyors can check they’re ready to plan flights and get the survey underway. The photos below show our field office in a conference room at our hotel in Brest, with the system freshly set up & looking somewhat untidy:

The main server is in the middle of the room in its military-grade portable rack. Cables are a bit of a mess in these pics, but were of course taped down to the floor as soon as people were happy with the office layout.

Although the system was basically running fine, it wasn’t long before I discovered that the mil-spec, indestructible, portable server transit case had taken a hell of a jolt on the way over. The Dell TL2000 tape library was refusing to load tapes into its top drive, and closer inspection revealed that part of the internal bracing had bent downwards, blocking the tape drive bay.

There wasn’t any noticeable damage to the transit case beyond a few relatively minor scuffs. But the whole case must have been mightily squeezed in some way. This had caused the transit case to flex & bend inwards, squashing the tape library up like an accordion. When the pressure was released, this flimsy part of the tape library chassis was the only thing that stayed bent instead of flexing back into its proper shape.

Damage to Dell TL2000 LTO5 tape library in transit to France:

Bending the chassis back into line was easy, and the system operated perfectly for the remaining 6 months of the deployment.

LTO library chassis bracing bent back into shape, and library fixed:

But the transit gremlins were not yet satisfied with their mischief on this survey. At the end of the 6-month survey, in September 2012, the system was freighted back to Adelaide. It arrived in Adelaide looking like it had come back via Syria. It was scuffed & scratched all over and dented in several places. This is a steel case, not aluminium, and the two largest dents were on corners where the curvature makes it stronger than anywhere else. The force required to cause the larger dent must have been enormous. My guess is someone drove a vehicle or forklift into it:

The pics above were taken after removing the server & racking it safely in our main server room. It was then that we discovered that, this time, the tape library was fine but the R710 server was in serious trouble.

Of the 12 drives, 2 had failed completely, and 4 more were damaged, with bad sector counts that kept climbing as we read data from the drives and accessed more of their sectors. With two disks completely offline, the RAID6 had no redundancy left, so every sector of every remaining disk was required, yet 4 of the remaining 10 disks were damaged. Every attempt to read a bad sector therefore logged a message along the lines of “hole punched in RAID, sector XXXXXXX”. This is something you don’t ever expect to see on a RAID6…
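Why did two dead disks turn every bad sector into a hole? RAID6 keeps two parity chunks per stripe, so a stripe can lose at most two chunks and still be rebuilt. The sketch below is a simplified toy model, not how the Dell controller actually tracks punctures. It assumes every stripe has one chunk on every member disk, which holds for RAID6 even with rotating parity. The function `unrecoverable_stripes` and the disk and stripe indices in the example are hypothetical.

```python
from typing import Dict, List, Set

PARITY_CHUNKS = 2  # RAID6 can rebuild a stripe missing up to two chunks


def unrecoverable_stripes(
    disk_count: int,
    stripe_count: int,
    failed_disks: Set[int],
    bad_stripes_by_disk: Dict[int, Set[int]],
) -> List[int]:
    """Return indices of stripes that have lost more chunks than parity can rebuild.

    A chunk is lost if its disk has failed outright, or if that disk has a
    bad sector inside the stripe (each disk is counted at most once).
    """
    lost = []
    for stripe in range(stripe_count):
        missing = sum(
            1
            for disk in range(disk_count)
            if disk in failed_disks or stripe in bad_stripes_by_disk.get(disk, set())
        )
        if missing > PARITY_CHUNKS:
            lost.append(stripe)
    return lost


# Hypothetical bad sectors: disk 2 in stripe 3, disks 5 and 8 in stripe 7.
bad = {2: {3}, 5: {7}, 8: {7}}
print(unrecoverable_stripes(12, 10, set(), bad))   # [] (healthy array rebuilds them)
print(unrecoverable_stripes(12, 10, {0, 1}, bad))  # [3, 7] (degraded array cannot)
```

The same bad sectors that a healthy 12-disk RAID6 absorbs become unrecoverable stripes once two disks are already offline, because each stripe has then used up both of its parity chunks.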

In spite of all this damage, I was able to get the most recent data off the server without hitting any bad blocks in important files. I was then able to merge this new data with older data already transferred back to the office, & surveyors were quickly working on a fully updated database – without having to wait for restoration from one of the two complete LTO tape backups done just prior to transit.

The server eventually made a full recovery after the 6 damaged disks were replaced, the RAID volumes recreated & the filesystem reformatted. No data was lost; even the not-so-important damaged files were eventually recovered from LTO tape.