Go Live Checklist

Here's a typical example of a “Go Live” checklist to run through before greenlighting a cut-over to a new production environment.

Preflight Checklist

  • Rollback Plan
    • Verify Backup Integrity and Recency
    • Test Restore Process
  • External Availability Monitoring
    • Enable “Real User Monitoring” (RUM). Establish a 1-2 week baseline before the cut-over.
  • Alert Escalations
    • Ensure on-call engineers' mobile devices are properly configured to receive alerts
    • Review on-call schedule to ensure continuity
    • Prepare runbooks and link them to alerts
  • Performance Load Tests
    • Replicate production workloads to ensure systems handle them as expected
  • Exception Logging
  • Reduce DNS TTLs
    • Set TTLs to 60 seconds on branded domains
  • Prepare Maintenance Page
    • Provide a means to display a maintenance page (if necessary)
    • Should be a static page (e.g. hosted on S3)
  • Schedule Cut-over
    • Identify all relevant parties and stakeholders
    • Communicate scope of migration and any expected downtime
  • Perform End-to-End Tests
    • Verify deployments are working
    • Verify software rollbacks are working
    • Verify autoscaling is working (pods and nodes)
    • Verify TLS certificates are in working order (non-staging)
    • Verify TLD redirects are working
  • Establish Status Page
    • Integrate with all status checks
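
The TLS item in the end-to-end tests above can be partly automated. The following is a minimal sketch in Python using only the standard library, not a definitive implementation. The function names, the 14-day threshold, and the host you pass in are assumptions for illustration. Because the default context verifies the certificate chain and hostname, a certificate from a staging CA would typically fail the handshake rather than pass silently.

```python
import socket
import ssl
import time
from typing import Optional


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect to host and return the certificate's notAfter string."""
    # The default context verifies the chain and the hostname, so an
    # untrusted certificate raises ssl.SSLError here.
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days from `now` (epoch seconds, default: current time) to notAfter."""
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires - now) / 86400.0


def certificate_ok(not_after: str, min_days: float = 14.0,
                   now: Optional[float] = None) -> bool:
    """True if the certificate stays valid for at least min_days (assumed threshold)."""
    return days_until_expiry(not_after, now) >= min_days
```

For example, `certificate_ok(fetch_not_after("www.example.com"))` checks one host, where `www.example.com` is a placeholder for a branded domain.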

Post-Cut-over Checklist

  • Review exception logs
  • Monitor customer support tickets
  • Monitor non-200 status codes for anomalies
    • Spikes in 404 could indicate missing assets
    • Spikes in 30x could indicate redirect problems
    • Spikes in 50x could indicate server misconfigurations
  • Ensure robots.txt permits indexing
    • Ensure all mainstream crawlers can crawl the site without errors
    • Ensure sitemap is reachable
  • Check Real End User Data
    • Verify that end-user experience has not degraded
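
The status-code monitoring item above can be sketched as a comparison between a pre-cut-over baseline and the current traffic. This is a rough illustration in Python, not a definitive implementation. The function names, the 2x `factor`, and the 1% `floor` are assumptions, and it treats the whole 3xx and 5xx classes as single buckets. In practice the status codes would come from access logs or a metrics system.

```python
from collections import Counter
from typing import Dict, Iterable

# Hints taken from the checklist; the bucket names are an assumption.
HINTS = {
    "404": "possible missing assets",
    "3xx": "possible redirect problems",
    "5xx": "possible server misconfiguration",
}


def bucket(status: int) -> str:
    """Map a status code to a bucket: 404 on its own, otherwise by class."""
    if status == 404:
        return "404"
    return f"{status // 100}xx"


def rates(statuses: Iterable[int]) -> Dict[str, float]:
    """Fraction of requests that fall into each bucket."""
    counts = Counter(bucket(s) for s in statuses)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: n / total for name, n in counts.items()}


def find_spikes(baseline: Iterable[int], current: Iterable[int],
                factor: float = 2.0, floor: float = 0.01) -> Dict[str, str]:
    """Return buckets whose current rate is at least `floor` and more
    than `factor` times the baseline rate, with the matching hint."""
    base = rates(baseline)
    now = rates(current)
    spikes = {}
    for name, hint in HINTS.items():
        rate = now.get(name, 0.0)
        if rate >= floor and rate > factor * base.get(name, 0.0):
            spikes[name] = hint
    return spikes
```

Calling `find_spikes(baseline_codes, current_codes)` with two lists of integer status codes returns a dictionary of the buckets worth investigating; an empty dictionary means nothing crossed the assumed thresholds.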