Change Control Procedures
October 10, 2022 18:04:46 UTC
This Standard Operating Procedure ("SOP") document establishes change processes and procedures related to cloud infrastructure services provided by Stellar Technologies Inc. ("Stellar").
Types of Changes
Stellar categorizes changes to its internal cloud infrastructure as follows:
P0: Critical Change
Unplanned outages are rare, but they can happen. During such an event, changes made to troubleshoot or mitigate active problems or to restore services are categorized as a "Critical Change" or P0 Change, generally correlated to a P0 event. Due to the nature of major infrastructure outages, it is unlikely that a customer will be notified of a P0 Change until after the event has ended. We make every effort to document all changes made during such an event, in order to provide our customers with as much transparency as possible.
P0 events are tracked by our operations team as a change control ticket or group of change control tickets. Customers will be notified by phone as soon as possible, and a Reason for Outage document will be provided to all impacted customers after the event and any subsequent investigations have been completed.
P1: Emergency Change
Some unplanned outages can be avoided, and we make every effort to ensure they never happen. Occasionally, our internal monitoring and analytics systems alert our operations team to critical events, which we correlate to potential high-impact outcomes. If it is determined that an active alert or event has the potential to significantly impact our infrastructure or customer uptime, an "Emergency Change" or P1 Change may be internally approved and implemented without prior customer notification. While this is rare, it may be necessary to make a service-impacting change in order to prevent a much larger outage or event.
P1: Security Patch
In the current security landscape, critical security vulnerabilities are sometimes discovered and announced by infrastructure vendors whose equipment we use to deliver our services. If a security patch or update is released to address a critical vulnerability on one of our affected systems, our operations team may perform a "Security Patch" or P1 Change if it is deemed vital to the continued security posture of our services. Depending on the severity of the vulnerability and the potential impact of implementing the patch, the patch may be internally approved and implemented without prior customer notification.
P1 events are tracked by our operations team as a change control ticket or group of change control tickets. Customers will be notified by phone, and a Reason for Outage document will be provided to all impacted customers after the event and any subsequent investigations have been completed.
P2: Maintenance
To maintain our performance, uptime, and availability commitments to our customers, our infrastructure may require occasional maintenance. Maintenance events are usually related to important but non-critical fixes that need to be put in place in order to prevent a critical event in the future. We engineer our infrastructure to be as redundant and highly available as possible, so a "Maintenance" or P2 Change may be necessary to restore an element to a fully redundant state after a minor hardware failure or similar event.
P2 events are tracked by our operations team as a change control ticket. An email notification containing dates, times, and expected impact of the maintenance event will be sent to any potentially impacted customers at least 2 business days prior to the event start time.
P3: Improvement
As cloud engineers determined to offer the fastest, most reliable enterprise-native cloud platform, we are always trying to improve our services and give our customers the best cloud experience we can offer. We sometimes decide to implement new technologies, to apply what we are continuously learning from the industry to our existing architecture, or to perform a software upgrade on internal infrastructure in order to take advantage of new features. Once we decide, after rigorous testing, to move forward with a new technology or architectural change, our engineers meticulously plan the change deployment process and submit an "Improvement" or P3 Change to Stellar's senior engineering leadership for review, approval, and scheduling.
P3 events are tracked by our operations team as a change control ticket. An email notification containing dates, times, and expected impact of the change will be sent to any potentially impacted customers at least 7 business days prior to the event start time.
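The P2 and P3 notice periods above can be turned into a concrete deadline. The following is a minimal sketch, not Stellar's actual tooling: it assumes business days are Monday through Friday with no holiday calendar, and the names `NOTICE_BUSINESS_DAYS` and `latest_notification_date` are hypothetical.

```python
from datetime import date, timedelta

# Minimum notice in business days, taken from the P2 and P3 descriptions.
NOTICE_BUSINESS_DAYS = {"P2": 2, "P3": 7}


def latest_notification_date(event_start: date, priority: str) -> date:
    """Return the last date on which a notification still meets the minimum notice.

    Assumption: business days are Monday-Friday; holidays are ignored.
    """
    remaining = NOTICE_BUSINESS_DAYS[priority]
    day = event_start
    while remaining > 0:
        day -= timedelta(days=1)
        if day.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return day


if __name__ == "__main__":
    start = date(2022, 10, 12)  # a Wednesday
    print(latest_notification_date(start, "P2"))
    print(latest_notification_date(start, "P3"))
```

A real scheduler would also need a holiday calendar and a time-of-day cutoff, neither of which this procedure specifies.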
P4: Software Update
Many of our cloud applications and managed security offerings are self-updating, which means they follow a predefined schedule for implementing minor updates such as security definitions (for detecting vulnerabilities, malware, new applications, etc.) and application dependencies.
Security definitions are automatically downloaded and installed according to the following schedule:
Hourly, at 10 minutes past the hour
Daily, at 01:05 device time
Every 15 minutes
Daily, at 01:05 device time
For applications that support Semantic Versioning, we configure automatic upgrades to PATCH-level versions: via Continuous Integration (CI) tooling for internally developed applications, and via vendor-provided tooling for SaaS applications.
MINOR-level version changes are hand-reviewed by an engineer for any potential breaking changes, but are treated as a standard P4-level dependency update once the upgrade is approved in CI.
MAJOR-level version changes are treated as a P3 event and follow the guidelines described therein.
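The version-level rules above amount to a small classification step. The sketch below is illustrative only: it assumes plain `MAJOR.MINOR.PATCH` version strings without pre-release or build suffixes, and the function names and return labels are hypothetical rather than part of any Stellar tooling.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse a plain MAJOR.MINOR.PATCH string (suffixes are not handled)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def classify_upgrade(current: str, candidate: str) -> str:
    """Map a version bump to the change handling described in this section."""
    cur, new = parse_version(current), parse_version(candidate)
    if new <= cur:
        raise ValueError("candidate version must be newer than current version")
    if new[0] != cur[0]:
        return "P3"            # MAJOR: follows the P3 guidelines
    if new[1] != cur[1]:
        return "P4-reviewed"   # MINOR: hand-reviewed, then a standard P4 update
    return "P4-automatic"      # PATCH: upgraded automatically


if __name__ == "__main__":
    for candidate in ("1.4.3", "1.5.0", "2.0.0"):
        print(candidate, classify_upgrade("1.4.2", candidate))
```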
P4 changes are tracked as provisioning tasks in our service desk system. Customers are not notified of P4 events unless the Stellar operations team deems a notification necessary.
Non-provisioning changes are tracked via Stellar's service desk system as a change control ticket. Change control tickets requiring approval are not released to the engineering team for execution until a member of engineering leadership approves the change in the service desk system.
For changes requiring approval, the following process is followed:
- Requestor creates a change request ticket via the service desk system. Information required:
  - Summary of changes
  - Expected outcome
  - Expected scope and duration of impact
  - Task list to be performed during change window, including a point of no return
  - Testing procedures
  - Rollback procedures
  - Metrics of successful change
- Change request is reviewed by a member of the engineering leadership team. Additional criteria to be defined:
  - Specific customers or customer infrastructure elements expected to be impacted.
  - Proposed dates and times.
  - Coordination/confirmation with impacted customers, if necessary.
- Change request is approved or rejected via the service desk system.
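The information required on a change request can be modeled as a simple record with a completeness check. This is a sketch under assumptions: the `ChangeRequest` class, its field names, and the choice to store the point of no return as an index into the task list are all hypothetical and do not describe the actual service desk schema.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ChangeRequest:
    summary: str
    expected_outcome: str
    scope_and_duration: str
    task_list: List[str]
    point_of_no_return: Optional[int]  # index into task_list (assumed representation)
    testing_procedures: str
    rollback_procedures: str
    success_metrics: str


def missing_fields(req: ChangeRequest) -> List[str]:
    """Return the names of required fields that are empty or invalid."""
    missing = []
    for f in fields(req):
        value = getattr(req, f.name)
        if value is None or value == "" or value == []:
            missing.append(f.name)
    # The point of no return must refer to an actual task in the list.
    ponr = req.point_of_no_return
    if ponr is not None and not 0 <= ponr < len(req.task_list):
        missing.append("point_of_no_return")
    return missing


if __name__ == "__main__":
    req = ChangeRequest(
        summary="Replace failed power supply",
        expected_outcome="Element restored to full redundancy",
        scope_and_duration="No expected customer impact, 1 hour",
        task_list=["Drain traffic", "Swap hardware", "Restore traffic"],
        point_of_no_return=1,
        testing_procedures="Verify health checks",
        rollback_procedures="",
        success_metrics="All health checks green",
    )
    print(missing_fields(req))
```

A ticket would only move on to leadership review once `missing_fields` returns an empty list.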
Definition: Point of No Return
The "point of no return" is the point in time during a project or implementation after which rolling back a change would require more effort, or cause more impact, than finishing it, even if finishing means going beyond the planned change window.
For changes that afford prior customer notification, the following process is followed:
- General notifications are sent to all cloud service customers. Specific notifications are sent to the list of customers with expected impact, if necessary.
- 24 hours prior to the change start time, notifications are re-sent to the list of customers with expected impact.
- 30 minutes prior to the change start time, notifications are re-sent to the list of customers with expected impact.
- Following completion of the change, and after the change success criteria are confirmed to have been met, notifications are sent to all cloud service customers notifying them of the completion of the change.
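The two reminder steps in this process are fixed offsets from the change start time. A minimal sketch of that calculation follows; the names `REMINDER_OFFSETS` and `reminder_times` are hypothetical, and the sketch assumes timezone-aware UTC timestamps.

```python
from datetime import datetime, timedelta, timezone
from typing import List

# Reminder offsets before the change start time, per the process above.
REMINDER_OFFSETS = (timedelta(hours=24), timedelta(minutes=30))


def reminder_times(change_start: datetime) -> List[datetime]:
    """Return when reminders go to customers with expected impact, earliest first."""
    return [change_start - offset for offset in REMINDER_OFFSETS]


if __name__ == "__main__":
    start = datetime(2022, 10, 12, 2, 0, tzinfo=timezone.utc)
    for when in reminder_times(start):
        print(when.isoformat())
```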