We are committed to delivering a continuous service that allows our clients and users to maintain their own operations when using Sitepass. Our business continuity plan outlines the:
- Measures we take to mitigate risks to our business
- Technical measures we take to protect our data, and
- Technical and administrative measures we take to recover from a disaster
Corporate management has approved the following policy statement:
- The company shall develop a comprehensive disaster recovery plan.
- A formal risk assessment shall be undertaken to determine the requirements for the disaster recovery plan.
- The disaster recovery plan should cover all essential and critical infrastructure elements, systems and networks, in accordance with key business activities.
- The disaster recovery plan should be periodically tested to ensure that it can be implemented in emergency situations and that the management and staff understand how it is to be executed.
- All staff must be made aware of the disaster recovery plan and their own respective roles.
- The disaster recovery plan is to be kept up to date to take into account changing circumstances.
The principal objective of the disaster recovery program is to develop, test and document a well-structured and easily understood plan which will help Sitepass recover as quickly and effectively as possible from an unforeseen disaster or emergency which interrupts information systems and business operations. Additional objectives include the following:
- The need to ensure that duties to implement such a plan are understood.
- The need to ensure that operational policies are adhered to within all planned activities.
- The need to ensure that proposed contingency arrangements are cost-effective.
- The need to consider implications on other company sites.
- Disaster recovery capabilities as applicable to key customers, vendors and others.
The Hosting Infrastructure
Sitepass has partnered with Amazon Web Services (AWS) to host and deploy its application. Amazon Elastic Compute Cloud (Amazon EC2) provides scalable computing capacity in the AWS cloud.
Amazon EC2 enables Sitepass to launch as many or as few virtual servers as required, configure security and networking, manage storage, and scale up or down to handle changes in forecast traffic.
Importantly, Amazon meets and exceeds security and compliance requirements, helping Sitepass maintain data protection for its clients. You can read more on Amazon’s compliance page.
Regions and availability zones
Amazon EC2 is hosted in multiple locations worldwide, known as regions. Sitepass is deployed in the AWS Sydney region. All data collected and processed by the Sitepass application is retained in Australia.
Each Amazon EC2 region is completely isolated from the other Amazon EC2 regions, and within each Amazon region there are multiple, isolated locations known as Availability Zones. Amazon EC2 allows Sitepass to place resources, such as instances (servers) and data, in multiple locations within that region, ensuring the highest level of failover and redundancy in the event of a disaster.
The Availability Zones in each region are connected through low-latency links. Sitepass utilises both the separation of Regions and Availability Zones within AWS to maintain data protection and implement both disaster recovery and backup.
Amazon enables Sitepass to deploy multiple instances (servers) to meet its business requirements and platform needs. The flexibility of Amazon EC2 enables both horizontal and vertical scaling, and allows Sitepass to implement solutions that support its performance, disaster recovery and backup requirements.
Horizontal scaling is the process of adding more nodes to a system, such as adding a new server to a distributed software application. Sitepass uses horizontal scaling to extend its platform by adding instances to meet its business requirements and, importantly, to meet usage demands.
Vertical scaling means adding more resources to a single node within a system, such as increasing the CPU or memory of a single server. Sitepass uses vertical scaling to align performance with forecast user traffic and to manage the performance of the application under load.
On Amazon, vertical scaling is achieved by changing the instance type (hardware performance) of an instance. Sitepass uses a variety of instance types, depending on the purpose and requirements of the application or software that runs on the instance.
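The difference between the two scaling approaches can be illustrated with a minimal sketch. The `Fleet` class, the function names and the capacity figures below are hypothetical and chosen only for illustration; they do not describe the actual Sitepass configuration.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class Fleet:
    instances: int              # number of servers (horizontal dimension)
    capacity_per_instance: int  # requests/sec one server can handle (vertical dimension)

    @property
    def total_capacity(self) -> int:
        return self.instances * self.capacity_per_instance


def scale_horizontally(fleet: Fleet, target_capacity: int) -> Fleet:
    """Add instances of the same size until the target capacity is met."""
    needed = ceil(target_capacity / fleet.capacity_per_instance)
    return Fleet(max(fleet.instances, needed), fleet.capacity_per_instance)


def scale_vertically(fleet: Fleet, target_capacity: int) -> Fleet:
    """Keep the instance count and raise the capacity of each instance."""
    needed = ceil(target_capacity / fleet.instances)
    return Fleet(fleet.instances, max(fleet.capacity_per_instance, needed))


# Hypothetical starting point: 3 servers handling 100 requests/sec each.
fleet = Fleet(instances=3, capacity_per_instance=100)
print(scale_horizontally(fleet, 450))  # Fleet(instances=5, capacity_per_instance=100)
print(scale_vertically(fleet, 450))    # Fleet(instances=3, capacity_per_instance=150)
```

Both results reach the same total capacity; horizontal scaling also adds redundancy because the load is shared by more servers.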
The platform consists of the following virtualised servers:
This is the cluster of primary production servers which host the application and database. All user traffic is directed to the platform hosted on these servers. Traffic is spread evenly across the primary production servers, which are located across all three Availability Zones, to ensure redundancy and stability.
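As a rough illustration of spreading traffic evenly across three Availability Zones, the sketch below assigns requests in round-robin order. The zone labels and the `distribute` function are hypothetical; in practice this work is done by a load balancer, and round-robin is only one possible strategy.

```python
from collections import Counter
from itertools import cycle

# Hypothetical labels for the three Availability Zones.
ZONES = ["az-a", "az-b", "az-c"]


def distribute(request_ids, zones):
    """Assign each request to a zone in round-robin order."""
    if not zones:
        raise ValueError("no zones available to receive traffic")
    rotation = cycle(zones)
    return {request_id: next(rotation) for request_id in request_ids}


# With all zones available, nine requests are split three ways.
all_zones = Counter(distribute(range(9), ZONES).values())
print(dict(all_zones))  # {'az-a': 3, 'az-b': 3, 'az-c': 3}

# If one zone is removed, the remaining zones absorb its share.
two_zones = Counter(distribute(range(9), ["az-a", "az-c"]).values())
print(dict(two_zones))  # {'az-a': 5, 'az-c': 4}
```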
Web application firewall (WAF)
The WAF helps secure the application by blocking common web application threats.
The platform uses Amazon’s S3 and CloudFront services to deliver static content such as video, audio and images to users.
Front end web servers
Front end web servers deliver the Sitepass application to the user.
Sitepass uses Amazon’s Aurora managed database platform to host the database, which provides scalability, redundancy and to-the-minute backups. All database data is encrypted at rest.
The three storage options provided by Amazon that are used by Sitepass to collect and store data are:
- Amazon EBS
- Amazon S3
- Amazon Aurora
This instance is accessible to Sitepass clients to conduct functional and integration development and testing, which is separate from their production (live) environment.
This instance is responsible for coordinating and storing all backups for the application and database. The Backup instance is hosted in a separate Availability Zone to both the Primary and Failover servers to ensure that the backups of customer data are logically and physically separate from production data and can be accessed in the case of a disaster in another Availability Zone.
This instance hosts the Sitepass website and the product updates site.
This instance is used to host our internal monitoring and logging systems. Metrics logged include:
- Application access logs
- Application uptime, and
- Platform performance and utilisation statistics.
Backup is an important aspect of the Sitepass hosted platform, and a multi-faceted backup solution has been implemented. Backups are stored both on the Amazon platform and in a secure location at the Adelaide head office, in case a disaster affects the entire Sydney region.
For the Australian platform, daily incremental backups of the following have been implemented:
- The application and its database
- Course content
- User uploaded files and assets
- Sitepass corporate website
- All site logs
- Uploaded SCORM courses
- Infrastructure and instance configuration
Sitepass uses horizontal scaling on Amazon to implement backups on dedicated infrastructure:
- A dedicated Amazon instance has been set up to manage incremental backups.
- The backup frequency is daily (every 24 hours).
- All backups are stored within Amazon AWS.
- Backup integrity is verified daily.
- All backups are encrypted at rest.
- Backups are retained for a period of 7 years.
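The retention and integrity rules above can be sketched as follows. Only the 7-year retention period comes from this document; the SHA-256 checksum is an assumed technique used here for illustration (the verification method is not specified in this list), and all function names are hypothetical.

```python
import hashlib
from datetime import date

RETENTION_YEARS = 7  # retention period stated in the policy above


def checksum(data: bytes) -> str:
    """Return the SHA-256 digest of a backup payload (assumed technique)."""
    return hashlib.sha256(data).hexdigest()


def is_intact(data: bytes, expected: str) -> bool:
    """Compare a backup against the checksum recorded when it was taken."""
    return checksum(data) == expected


def retention_end(taken_on: date) -> date:
    """Return the first date on which a backup may be deleted."""
    try:
        return taken_on.replace(year=taken_on.year + RETENTION_YEARS)
    except ValueError:
        # 29 February has no equivalent in a non-leap year: use 28 February.
        return taken_on.replace(year=taken_on.year + RETENTION_YEARS, day=28)


def may_be_deleted(taken_on: date, today: date) -> bool:
    return today >= retention_end(taken_on)


recorded = checksum(b"example backup payload")
print(is_intact(b"example backup payload", recorded))       # True
print(is_intact(b"corrupted backup payload", recorded))     # False
print(retention_end(date(2020, 1, 1)))                      # 2027-01-01
print(may_be_deleted(date(2020, 1, 1), date(2026, 12, 31)))  # False
```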
Adelaide head office
As AWS does not have a separate region within Australia, daily backups are copied from Amazon to the Sitepass head office in Adelaide, Australia. This ensures that, in the case of a disaster within Amazon’s Sydney region, a local and up-to-date backup is available.
- Backups are copied daily from Amazon using an encrypted connection.
- Backups at head office are stored in our access-controlled server room.
- The head office is alarmed, and security monitored out of business hours.
- Access to the server room is restricted, and
- The office and server room are audited as part of the ISO 27001 certification.
Sitepass has implemented a multi-stage disaster recovery process for use in the case of a disaster.
The Sitepass application is designed to be resilient in the face of an outage of one or more servers. If a disaster occurs in one Availability Zone hosting the Production servers, end-user traffic is automatically directed to redundant instances in the other Availability Zones. This provides a real-time failover solution in the case of a disaster within one of the Availability Zones within Amazon.
In the event of failure of one or more web servers, the Sitepass application can automatically provision additional redundant instances to handle end-user traffic.
The Sitepass application also runs multiple primary and secondary database nodes that are synchronised using Amazon Aurora’s replication technology. In the event of failure of one of the primary database nodes, a secondary database node will automatically be promoted to primary with no loss of data.
- The expected recovery time objective (RTO) is less than 1 hour.
- The expected recovery point objective (RPO) is less than 1 hour.
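The database promotion described above can be sketched in simplified form. This is an illustrative model with a single primary and hypothetical node names; it does not reproduce how Amazon Aurora performs failover internally.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    zone: str
    role: str            # "primary" or "secondary"
    healthy: bool = True


def promote_if_needed(nodes):
    """Return the active primary, promoting a healthy secondary if required."""
    primaries = [n for n in nodes if n.role == "primary" and n.healthy]
    if primaries:
        return primaries[0]

    candidates = [n for n in nodes if n.role == "secondary" and n.healthy]
    if not candidates:
        raise RuntimeError("no healthy database node available")

    # Demote the failed primary so that only one node holds the primary role.
    for node in nodes:
        if node.role == "primary":
            node.role = "secondary"
    candidates[0].role = "primary"
    return candidates[0]


# Hypothetical cluster with one node per Availability Zone.
nodes = [
    Node("db-1", "az-a", "primary"),
    Node("db-2", "az-b", "secondary"),
    Node("db-3", "az-c", "secondary"),
]
nodes[0].healthy = False          # simulate a failure in one zone
active = promote_if_needed(nodes)
print(active.name, active.zone)   # db-2 az-b
```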
In the unlikely case that a disaster has impacted both Availability Zones A and B in the Sydney region, backups will be used to restore to a non-impacted Availability Zone within AWS.
- The expected recovery time objective (RTO) is 48 hours.
- The expected recovery point objective (RPO) is up to 24 hours.
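To show how the two sets of objectives relate to an actual incident, the sketch below checks a hypothetical timeline against the stated RTO and RPO values. The incident times are invented for illustration, and the stage 1 "less than 1 hour" targets are treated as an inclusive 1-hour limit for simplicity.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Objectives:
    rto: timedelta  # maximum acceptable downtime
    rpo: timedelta  # maximum acceptable window of lost data


# Values taken from the two recovery stages described above.
STAGE_1 = Objectives(rto=timedelta(hours=1), rpo=timedelta(hours=1))
STAGE_2 = Objectives(rto=timedelta(hours=48), rpo=timedelta(hours=24))


def meets_objectives(objectives, last_good_data, outage_start, service_restored):
    """Check an incident timeline against the recovery objectives."""
    downtime = service_restored - outage_start
    data_loss_window = outage_start - last_good_data
    return downtime <= objectives.rto and data_loss_window <= objectives.rpo


# Hypothetical incident: last backup at 02:00, outage at 20:00,
# service restored from backup at 18:00 the following day.
last_backup = datetime(2024, 3, 1, 2, 0)
outage = datetime(2024, 3, 1, 20, 0)
restored = datetime(2024, 3, 2, 18, 0)

print(meets_objectives(STAGE_2, last_backup, outage, restored))  # True
print(meets_objectives(STAGE_1, last_backup, outage, restored))  # False
```

In this example 18 hours of data and 22 hours of availability are lost, which falls within the backup-restore objectives but not within the failover objectives.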
Disaster recovery process
In the event of a significant server failure, the following steps will be followed to address the issue:
- conduct an immediate assessment of the problem
- identify an action plan and develop initial estimate for the rectification
- send communication and an estimated time of rectification to client contacts
- log the issue to the live server status page
- implement the action plan. Depending on the circumstance this can include:
  - setup of an alternate server
  - installation of the application
  - transfer of files from backup
  - system testing – random account and operational procedures
  - re-delegation of domains
When there is a change to the circumstances, Sitepass will provide an update to the client contacts detailing the change.
Where domain re-delegation is required, there may be a 12 to 24-hour delay between restoration of system functionality and the system becoming available via the internet, due to the delay in DNS propagation.
Disaster recovery testing and validation
The following activities are performed in order to test the disaster recovery process and validate the integrity of backups:
- Daily – Backups are restored to a simulated environment to ensure integrity of data.
- Monthly – Backups are restored to the Beta environment to ensure availability of all data from backup and for verification.
- Quarterly – Stage 1 disaster recovery is tested. The production failover capability is tested by running the application on the Production (Failover) servers during a maintenance period.
- Yearly – Stage 2 disaster recovery is tested. A new simulated Production instance of the application is built from backup in order to test the processes and procedures required to rebuild the application and infrastructure in case of disaster.