Incident description
Incident severity: CRITICAL
Data loss: NO
Timeline
Time (CET) | Event |
---|---|
18 Jun, 23:16 | Issue reported by OC |
19 Jun, 07:35 | Picked up by Michael H the following morning |
19 Jun, 08:30 | Service restored by temporarily turning off SSL. Initial investigation suggested the certificate had expired, but this later turned out not to be the case. |
19 Jun, 10:30 | Further investigations were carried out to avoid such failures in future |
19 Jun, 16:08 | Actual cause of the failure identified: the certificates had been changed automatically as a result of IT patching. |
20 Jun, 09:30 | Proposal was discussed between IT and SWD to avoid such failures in future. |
20 Jun, 11:30 | Part one of Nagios check in the proposed solution implemented |
20 Jun, 16:30 | New certs provided by IT installed on crowd servers. SSL switched back on (crowd ↔ AD). |
Total downtime: 9 hours 14 minutes (18 Jun, 23:16 to 19 Jun, 08:30).
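
This report does not describe what the Nagios check does. Since the root cause was a certificate that changed without notice, one plausible shape for such a check is to compare the fingerprint of the certificate the AD server currently presents against a known-good value and raise an alert when they differ. The sketch below is an assumption for illustration only, not the check that was implemented. The function names, the host `ad.example.com` and the LDAPS port 636 are hypothetical.

```python
import hashlib
import ssl

# Standard Nagios plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def fingerprint(der_cert: bytes) -> str:
    """Return the SHA-256 fingerprint of a DER-encoded certificate as lowercase hex."""
    return hashlib.sha256(der_cert).hexdigest()


def fetch_der(host: str, port: int) -> bytes:
    """Fetch the certificate the server presents, without validating it."""
    pem = ssl.get_server_certificate((host, port))
    return ssl.PEM_cert_to_DER_cert(pem)


def evaluate(actual: str, expected: str) -> tuple[int, str]:
    """Compare fingerprints; accepts 'AB:CD:...' or plain hex for the expected value."""
    normalized = expected.replace(":", "").lower()
    if actual.lower() == normalized:
        return OK, "OK - certificate fingerprint unchanged"
    return CRITICAL, f"CRITICAL - certificate changed, now {actual}"


def main(argv: list[str]) -> int:
    """Entry point: argv is [script, HOST, PORT, EXPECTED_SHA256]."""
    if len(argv) != 4:
        print("UNKNOWN - usage: check_cert_pin.py HOST PORT SHA256_HEX")
        return UNKNOWN
    try:
        host, port, expected = argv[1], int(argv[2]), argv[3]
        actual = fingerprint(fetch_der(host, port))
    except (OSError, ValueError) as exc:
        # Covers connection/TLS failures (OSError) and a bad port or PEM (ValueError).
        print(f"UNKNOWN - could not fetch certificate: {exc}")
        return UNKNOWN
    code, message = evaluate(actual, expected)
    print(message)
    return code

# As a Nagios plugin the script would end with:
#   import sys; sys.exit(main(sys.argv))
```

Run as a plugin, this would be invoked along the lines of `check_cert_pin.py ad.example.com 636 <known-good fingerprint>`. The pinned fingerprint has to be updated whenever IT issues new certificates on purpose, otherwise the check will alert on every planned change.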