TLDR: Afrihost took 180 hours, or 7.5 days, to recover from a datacenter disaster
At 18:46 on Wednesday, 29 June 2022, Afrihost cloud services (storage and virtualisation) encountered a catastrophic failure. All shared hosting, cloud hosting and dedicated hosting went down.
The message from Afrihost’s status page:
Due to extended load shedding there was a cooling system failure at MTN’s Gallo Manor data centre where we host some of our web servers.
This unfortunately damaged certain Afrihost hosting servers – We have successfully recovered 8% of the affected servers.
Please be assured that our team is working tirelessly to recover all services as quickly and as safely as possible.
Using MTN’s Datacenter
In 2016 MTN announced it would dispose of its controlling stake in Afrihost. I am guessing that there was previously a deal under which Afrihost made use of MTN’s datacenters.
However, with MTN exiting Afrihost, management and executives had the option to move to a specialist datacenter company, one focused on one thing only: the management and maintenance of a datacenter. That would avoid the kind of mishaps we have seen at companies that are not datacenter specialists, like the South Africa Azure Datacenter Flood.
One such company is Teraco, which many global internet companies (Google, Facebook, Netflix) and local ISPs like Vox Telecom use. Vox has used Teraco as its datacenter since migrating in 2018 from its onsite, self-run Waverley datacenter.
Note: Teraco’s JB1 and JB3 datacenters in Isando are on the same electricity supply as OR Tambo and are hence (apparently) not affected by load shedding.
Another factor in the decision not to move away from MTN’s Gallo Manor datacenter is proximity: Gallo Manor is only 4km from Afrihost’s head office in Rivonia. Gallo Manor hosting is also 25% cheaper than Teraco, according to Cloud Africa.
The failure to move to a more reliable option has now cost Afrihost its reputation and damaged its clients’ businesses.
All to cut costs by 25% and save some petrol money for datacenter staff. It is a testament to why managing risk is so important in business.
How Fast is Afrihost Recovering?
Afrihost was posting updates about the speed of recovery every 2 hours and leaving previous notifications up. However, they have started removing the older messages, so customers don’t really know how fast the recovery is going.
Luckily I have been screenshotting the recovery over time and will calculate the rate now.
Time | Percent Recovered | Hours since outage | Rate (% per hour) |
---|---|---|---|
18:46 29 June 2022 | 0% | 0 | 0 |
11:05 1 July 2022 | 8% | 40 hours | 0.2 |
23:46 1 July 2022 | 15.3% | 52 hours | 0.29 |
10:35 2 July 2022 | 22.6% | 63 hours | 0.36 |
14:42 2 July 2022 | 24% | 68 hours | 0.35 |
16:39 2 July 2022 | 26% | 70 hours | 0.37 |
20:38 2 July 2022 | 28% | 75 hours | 0.37 |
13:44 3 July 2022 | 37% | 92 hours | 0.4 |
20:29 3 July 2022 | 38% | 99 hours | 0.38 |
13:48 4 July 2022 | 50% | 116 hours | 0.43 |
16:53 4 July 2022 | 52% | 119 hours | 0.43 |
21:38 4 July 2022 | 56% | 123 hours | 0.45 |
12:05 5 July 2022 | 67% | 138 hours | 0.48 |
14:05 5 July 2022 | 70% | 140 hours | 0.5 |
19:29 5 July 2022 | 71% | 145 hours | 0.48 |
23:06 5 July 2022 | 75% | 149 hours | 0.5 |
08:00 6 July 2022 | 88% | 158 hours | 0.56 |
14:42 6 July 2022 | 93% | 164 hours | 0.56 |
16:15 6 July 2022 | 95% | 166 hours | 0.57 |
20:17 6 July 2022 | 96% | 170 hours | 0.57 |
06:22 7 July 2022 | 100% | 180 hours | 0.55 |
Averaged over the full 180 hours, the rate of server recovery is ~0.55% of cloud servers per hour.
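For anyone who wants to check the arithmetic, here is a minimal Python sketch of how the "hours since" and "rate" columns can be derived: the rate is simply the cumulative percent recovered divided by the hours elapsed since the outage began. The names `OUTAGE_START`, `SNAPSHOTS`, `hours_since_outage` and `average_rate` are my own, chosen for illustration, and only three rows from the table are included.

```python
from datetime import datetime

# Start of the outage, as reported at the top of this post
OUTAGE_START = datetime(2022, 6, 29, 18, 46)

# (timestamp, percent recovered) pairs copied from three rows of the table
SNAPSHOTS = [
    (datetime(2022, 7, 1, 11, 5), 8.0),
    (datetime(2022, 7, 4, 13, 48), 50.0),
    (datetime(2022, 7, 7, 6, 22), 100.0),
]


def hours_since_outage(ts):
    """Exact (unrounded) hours elapsed between the outage start and ts."""
    return (ts - OUTAGE_START).total_seconds() / 3600


def average_rate(ts, percent):
    """Cumulative average recovery rate in percent of servers per hour."""
    return percent / hours_since_outage(ts)


for ts, pct in SNAPSHOTS:
    hours = hours_since_outage(ts)
    rate = average_rate(ts, pct)
    print(f"{ts:%H:%M %d %B %Y} | {pct:g}% | {hours:.1f} hours | {rate:.2f}")
```

Because this uses exact elapsed time rather than the whole-hour figures in the table, the results can differ slightly from the table, typically by up to an hour in the third column and by 0.01 in the rate.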
I don’t know the constraints or the full extent of the damage, but I will say that this is a slow recovery rate. Some customers would have had to wait 7 days for recovery. A week of no visits to your business’s website, shop or critical business web services could be disastrous.
Afrihost may even have broken some contracts and agreements, which would have legal and financial implications.
They will also have to refund their customers for a service not delivered.