Disaster recovery

Loftware Cloud is stable and secure.

But if things go wrong, Loftware has comprehensive disaster recovery plans in place. Our teams work hard to minimize your downtime and help you get back to business as usual as quickly as possible.

Terms and definitions

NAME

DEFINITION

INCIDENT

A situation that might be, or could lead to, a disruption, loss, or disaster.

DISASTER

Any condition that results in a prolonged inability to access or use Loftware Cloud. A disaster requires recovery action to restore normal operation.

INCIDENT RESPONSE TEAM

Includes members of our support and application development teams who respond to incident support requests from customers. Incident response team members also receive alerts from our monitoring system. The team resolves incidents or escalates them to disasters.

PROVISIONING TEAM

Includes application development team members responsible for managing Loftware Cloud. Besides regular management, our provisioning team supports our incident response team in resolving incidents.

DISASTER RECOVERY TEAM

Assembles in the event of a disaster to recover the service. Includes provisioning team members.

DISASTER RECOVERY PROCESS MANAGEMENT TEAM

Monitors, reviews, and makes changes to disaster recovery processes to ensure effectiveness. This team is not directly involved in disaster response, but reviews each disaster scenario to improve processes.

Flow chart

[Figure: flow chart of the incident and disaster recovery process]

Incident start

Incidents begin when our incident response team receives information about issues affecting Loftware Cloud.

This information may come from:

  • Monitoring system alerts

  • Customer support requests (phone or email)

  • Other events indicating potential Loftware Cloud problems

We track incidents with support tickets following standard support procedures.

Incident response

Incident response teams handle incidents. Response includes:

  1. Incident assessment (reviewing alerts, customer reports).

  2. Decision point. After investigation, teams decide whether or not to escalate incidents to disasters.

    1. Incident response teams consult with the provisioning team as needed.

    2. If incidents do not require disaster responses, teams resolve incidents according to standard support processes.

Incident handling and response times follow standard support procedures determined by your SLA level.
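The decision point in step 2 can be read as a check against the definition of a disaster given under Terms and definitions. The following Python sketch is illustrative only: the `Assessment` fields and the `decide` function are hypothetical names, and the real decision is a judgment call made by the incident response team, not an automated rule.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    RESOLVE_VIA_SUPPORT = "resolve via standard support process"
    ESCALATE_TO_DISASTER = "escalate to disaster"


@dataclass
class Assessment:
    """Hypothetical summary of step 1 (incident assessment)."""
    prolonged_loss_of_access: bool  # service cannot be accessed or used for a prolonged time
    needs_recovery_action: bool     # normal operation cannot be restored without recovery action


def decide(assessment: Assessment) -> Outcome:
    """Step 2 (decision point): escalate only if both parts of the disaster definition hold."""
    if assessment.prolonged_loss_of_access and assessment.needs_recovery_action:
        return Outcome.ESCALATE_TO_DISASTER
    return Outcome.RESOLVE_VIA_SUPPORT
```

For example, `decide(Assessment(True, True))` returns `Outcome.ESCALATE_TO_DISASTER`, while any other combination stays with the standard support process.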

Escalation to disaster

Incident response teams contact the provisioning team to trigger disaster recovery responses. Our provisioning team assembles a disaster recovery team to oversee disaster recovery processes.

Disaster recovery

Teams log all status updates in our internal system to ensure visibility for all teams involved. Teams add the [Disaster] keyword to all related support requests to organize disaster logs.
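As a rough illustration of how a keyword keeps disaster records easy to find, the sketch below tags and filters ticket subjects. It assumes the keyword is placed at the start of the subject line and matched case-insensitively; the text above does not specify either detail, and the function names are hypothetical.

```python
DISASTER_KEYWORD = "[Disaster]"


def has_disaster_keyword(subject: str) -> bool:
    """Case-insensitive check for the keyword anywhere in the subject."""
    return DISASTER_KEYWORD.lower() in subject.lower()


def tag_disaster(subject: str) -> str:
    """Prefix the subject with the keyword unless it is already tagged."""
    if has_disaster_keyword(subject):
        return subject
    return f"{DISASTER_KEYWORD} {subject}"


def disaster_log(subjects: list[str]) -> list[str]:
    """Return only the subjects that belong to the disaster log, in order."""
    return [s for s in subjects if has_disaster_keyword(s)]
```

Tagging is idempotent, so applying `tag_disaster` to a subject that already carries the keyword leaves it unchanged.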

Our disaster recovery team analyzes problems and determines next steps by following our established disaster recovery procedures:

  1. Identify the scale, impact, and root cause of the problem.

  2. If the problem is due to underlying Azure cloud infrastructure, make sure Microsoft is solving the problem:

    1. Check Microsoft notifications in the Service Health section.

    2. Open support tickets as needed.

    3. Monitor Microsoft’s progress.

      If Microsoft resolves the problem in a timely manner, no additional recovery action is required.

  3. If Microsoft does not resolve the problem in a timely manner, begin recovery procedures following our recovery guide.
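The branching in steps 2 and 3 can be sketched as a small decision function. This is a minimal sketch under stated assumptions: the names are hypothetical, the `timely_limit_hours` threshold stands in for "timely manner" (no figure is given above), and the sketch assumes that problems not caused by Azure infrastructure go directly to the recovery guide, which the steps imply but do not state.

```python
from enum import Enum


class NextStep(Enum):
    KEEP_MONITORING = "keep monitoring Microsoft's progress"
    NO_FURTHER_ACTION = "no additional recovery action required"
    BEGIN_RECOVERY = "begin recovery procedures from the recovery guide"


def next_step(
    azure_caused: bool,
    resolved_by_microsoft: bool,
    hours_elapsed: float,
    timely_limit_hours: float,  # hypothetical threshold for "timely"
) -> NextStep:
    """Pick the next action once the root cause has been identified (step 1)."""
    if not azure_caused:
        # Assumption: non-Azure problems are handled with the recovery guide.
        return NextStep.BEGIN_RECOVERY
    if resolved_by_microsoft:
        return NextStep.NO_FURTHER_ACTION
    if hours_elapsed >= timely_limit_hours:
        # Microsoft has not resolved the problem in time (step 3).
        return NextStep.BEGIN_RECOVERY
    return NextStep.KEEP_MONITORING
```

In practice the disaster recovery team would re-evaluate this as Service Health notifications and support tickets are updated, rather than deciding once.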

While the disaster recovery team progresses through disaster recovery, we provide affected users with status updates and estimated resolution times.

Following recovery, the disaster recovery team analyzes root causes of outages and recommends improvements you can make to prevent future incidents. Your affected users receive reports including service credit notes when applicable.

Recovery times

We’re committed to restoring service as soon as possible. Recovery times may vary according to the nature and the scale of the problem. Loftware works with Microsoft to resolve issues related to underlying services provided by Microsoft Azure.

Process reviews

Our disaster recovery process management team reviews our recovery processes:

  • After each disaster scenario

  • Periodically (at least once per year)

  • As needed (during planned enhancements, or if deficiencies are found outside the scope of periodic testing)

Our disaster recovery process management team determines if our processes require changes and may delegate implementation to our provisioning team. We notify affected teams about all changes.

Periodic tests

Our disaster recovery process management team periodically tests our disaster recovery processes to ensure correct execution and measure effectiveness. Our teams schedule and perform periodic tests at least once per year following established plans. Teams analyze test results during process reviews.
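The "at least once per year" cadence can be checked mechanically. The sketch below is an illustration only: it interprets "once per year" as a maximum gap of 365 days between tests, which is an assumption, and `is_test_overdue` is a hypothetical helper name.

```python
from datetime import date, timedelta

# Assumption: "at least once per year" means no more than 365 days between tests.
MAX_INTERVAL = timedelta(days=365)


def is_test_overdue(last_test: date, today: date) -> bool:
    """Return True if more than MAX_INTERVAL has passed since the last periodic test."""
    return today - last_test > MAX_INTERVAL
```

For example, a last test on 1 January 2023 is overdue on 1 March 2024, while a last test on 1 January 2024 is not overdue on 1 June 2024.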