Comment on the Atlassian cloud outage: This is not what professional looks like

That went terribly wrong, Mike and Scott*: the partial outage of the cloud versions of Confluence and Jira, tools that many developers find indispensable in their daily work. And this at a time when you have been urging your customers to switch to the cloud versions since early 2021: the prices of the on-premises versions have gone up, and from 2024 they are no longer to be offered at all. At the time, iX presented alternatives to Jira and Confluence for users who did not want to move to the cloud with you.

There are certainly good reasons to move to the cloud. Users no longer have to deal with installing and operating the software, and the risk of an outage or data loss is probably greater in their own data center than in the cloud. But it is a bit like cars and planes: objectively, the risk may be higher in your own data center, but you are helplessly at the mercy of a cloud outage, whereas in your own data center you can still do something yourself. At least it feels that way.

Some Atlassian customers experienced this pain this spring: from April 5, they had no access to Confluence and Jira in the cloud, in some cases for weeks. Thanks to what could charitably be called reserved communication from the provider, it was unclear for days how many customers were affected, whether there was a risk of data loss, and how long the outage would last.

No data stored in the Atlassian cloud appears to have been lost in the process. A great deal of customer trust, however, was. According to Atlassian, the cause of the outage was internal miscommunication and a maintenance script that accidentally deleted customer data and inactive accounts. Also according to Atlassian, in the days following the incident, “hundreds of engineers worked round the clock” to fix the problem.

This does not reflect well on Atlassian. What is the state of procedures, rules and compliance in a company when misunderstandings among employees can trigger such a catastrophe? When a maintenance script can do such damage? When there is no working disaster recovery and no emergency management, and instead the entire development team has to fight the breakdown for days under high pressure? When, instead of restoring a single backup, hundreds of accounts have to be restored manually in direct communication with the affected customers?

On top of that, there was no proper communication from the provider for days. Was the hope that the unaffected customers (99.6 percent, according to Atlassian) would not notice anything about the outage? Or were you so overwhelmed by the situation that everyone was completely at a loss?

Breakdowns can happen. Contingency plans can fail. Crisis communication can fail. But a company that asks more than 200,000 customers to hand over their data should be able to handle this kind of situation better, and should not brag in a post-incident review that it had an uptime of 99.9 percent until this outage, as the founders and owners of Atlassian did.

In any case, this is not what professional looks like.

* Mike Cannon-Brookes and Scott Farquhar, founders and co-CEOs of Atlassian

more from iX Magazine

(OD)

on home page

LEAVE A REPLY

Please enter your comment!
Please enter your name here