Customers praise Microsoft’s ‘no BS’ explanation of cloud service outage

Microsoft has apologised for a several-hour outage of its cloud-based Visual Studio Online service in what customers are describing as a “refreshingly-direct” and “no BS, straight-talking” explanation of the failure.

In a blog post on August 22nd, Brian Harry, a Microsoft Technical Fellow, corporate Vice President and Product Unit Manager for Team Foundation Server, detailed the events of the August 14th outage of Visual Studio Online, the team server designed to complement complex projects and collaboration, and gave his own view of what went wrong.

In some regions Visual Studio Online was offline the previous day, but troubles mounted on Thursday morning until a total outage was announced, which lasted over five hours. “This duration and severity makes this one of the worst incidents we’ve ever had on VS Online,” said Harry.

In his blog post, Harry apologises for the outage, then dives straight into a technical explanation of what triggered the blackout, and follows it up with what the company is doing to prevent it from happening again.

“We’ve gotten sloppy. Sloppy is probably too harsh. As with any team, we are pulled in the tension between eating our Wheaties and adding capabilities that customers are asking for,” said Harry. “In the drive toward rapid cadence, value every sprint, etc., we’ve allowed some of the engineering rigor that we had put in place back then to atrophy — or more precisely, not carried it forward to new code that we’ve been writing. This, I believe, is the root cause.”

Historically, Microsoft has patted itself on the back for things like backwards compatibility and support for older technologies and software, even as the rapid pace of advancement continues. In the eyes of customers, this has unfortunately only contributed to the failure.

More information, including the blog post itself, can be found on Microsoft’s MSDN developer site.

Photo by flickr/incredibleguy