Cloud Architecture

Run an infrastructure that reduces your overhead while maximizing your upside.

What it looks like…

  • 01

    Scalable, to absorb sudden spikes in demand as well as long-term business growth. Capable of scaling both up (bigger machines) and out (more machines).

  • 02

    Fault-tolerant so that services won’t fall over and die if a component fails. Resilient by design, so that services self-heal without human intervention.

  • 03

    Highly available, so that components never go offline.

  • 04

    Flexible enough to support any class of application that you might need to run now or in the future.
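
As a rough illustration of scaling out, the sketch below applies the replica formula documented for the Kubernetes Horizontal Pod Autoscaler, desired = ceil(current * currentMetric / targetMetric), clamped to a minimum and maximum. The function name and default bounds are hypothetical, and the sketch ignores the tolerance and stabilization behavior the real controller adds.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Simplified scale-out rule modeled on the Kubernetes HPA replica formula."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    # Scale the replica count in proportion to how far the metric is from its target.
    raw = math.ceil(current_replicas * current_metric / target_metric)
    # Clamp to the configured bounds (hypothetical defaults).
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    print(desired_replicas(4, 90.0, 60.0))   # 4 * 90 / 60 = 6 -> scale out to 6
    print(desired_replicas(4, 30.0, 60.0))   # 4 * 30 / 60 = 2 -> scale in to 2
    print(desired_replicas(4, 300.0, 60.0))  # 20, clamped to max_replicas = 10
```

Scaling on a ratio to a target (rather than fixed steps) lets the same rule handle both an instantaneous spike and gradual growth.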

Automation

Write Infrastructure-as-Code and eliminate error-prone manual operations.

How it works…

  • 01

    Reproducible 100% of the time. Spin up as many environments as needed and treat them all the same.

  • 02

    Treat everything as “Infrastructure as Code” for more manageable environments. Reduce the human element wherever possible.

  • 03

    Lifecycle Management strategies ensure all components can be upgraded without major disruptions.

  • 04

    Simple – anyone should be able to do it.
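
A minimal sketch of reproducibility, assuming every environment is rendered from one shared definition: each environment starts from the same hypothetical `BASE` dictionary, and only an explicit allow-list of keys may differ. In practice this role is played by an Infrastructure-as-Code tool; all names and values here are illustrative.

```python
from copy import deepcopy

# Hypothetical base definition shared by every environment.
BASE = {"instance_type": "t3.medium", "replicas": 2, "image": "app:1.4.2", "region": "us-east-1"}

# Only these keys may differ between environments (assumed policy).
ALLOWED_OVERRIDES = {"replicas", "region"}


def render_environment(name, overrides=None):
    """Build an environment from BASE, rejecting any override outside the allow-list."""
    overrides = overrides or {}
    illegal = set(overrides) - ALLOWED_OVERRIDES
    if illegal:
        raise ValueError(f"{name}: overrides not allowed: {sorted(illegal)}")
    env = deepcopy(BASE)  # never mutate the shared definition
    env.update(overrides)
    env["name"] = name
    return env


envs = [render_environment("staging"), render_environment("production", {"replicas": 6})]

if __name__ == "__main__":
    for env in envs:
        print(env)
```

Because every environment is derived from the same source, staging and production can only drift in the ways the allow-list explicitly permits.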

Testing

Continually test every change made to your infrastructure and ensure all systems are go.

Confidence that it works…

  • 01

    Infrastructure as code means it can be tested as code.

  • 02

    Identify problems before they get into production. Run identical environments to eliminate headaches.

  • 03

    Enable more engineers to contribute to the infrastructure without risking instability.

  • 04

    Improve overall stability by catching problems early. Treat every problem as an opportunity to eliminate future headaches.
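
Because the infrastructure is defined as code (here, hypothetically, as plain Python data), policy checks can run like any other unit test. The resource shapes and the two rules below are assumptions chosen for illustration.

```python
# Hypothetical infrastructure definition, expressed as plain data.
RESOURCES = [
    {"type": "security_group", "name": "web", "ingress": [{"port": 443, "cidr": "0.0.0.0/0"}]},
    {"type": "security_group", "name": "admin", "ingress": [{"port": 22, "cidr": "10.0.0.0/8"}]},
    {"type": "bucket", "name": "logs", "encrypted": True},
]


def policy_violations(resources):
    """Return a human-readable list of rule violations (empty list means compliant)."""
    problems = []
    for r in resources:
        if r["type"] == "security_group":
            for rule in r.get("ingress", []):
                # Rule 1 (assumed): SSH must never be open to the whole internet.
                if rule["port"] == 22 and rule["cidr"] == "0.0.0.0/0":
                    problems.append(f"{r['name']}: SSH open to the internet")
        elif r["type"] == "bucket" and not r.get("encrypted", False):
            # Rule 2 (assumed): storage buckets must be encrypted.
            problems.append(f"{r['name']}: bucket is not encrypted")
    return problems


if __name__ == "__main__":
    print(policy_violations(RESOURCES))
```

Run in CI on every change, a check like this catches a misconfiguration before it is ever applied to a live environment.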

CI/CD

Continuous Integration and Continuous Delivery work seamlessly with Kubernetes to ensure that your software can be reliably released at any time and without downtime.

Easy deployments…

  • 01

    Leverage CircleCI or Jenkins to build and test every commit. Know exactly which commit broke the build every time.

  • 02

    Deploy exactly what was tested to any cluster using immutable containers.

  • 03

    Easy rollbacks when things don’t work as expected. Just revert to the previous deployment without bending over backwards.

  • 04

    Zero-downtime rolling deployments are handled automatically by Kubernetes.
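
The sketch below is a simplified model of a Kubernetes `RollingUpdate`, borrowing the real `maxSurge` and `maxUnavailable` settings but treating them as plain integers. It assumes new pods become ready instantly, which real clusters do not guarantee, and shows that capacity never drops below `replicas - max_unavailable` while old pods are replaced.

```python
def rolling_update(replicas, max_surge, max_unavailable):
    """Return (old, new) pod counts after each step of a simplified rolling update."""
    if max_surge == 0 and max_unavailable == 0:
        raise ValueError("max_surge and max_unavailable cannot both be 0")
    old, new = replicas, 0
    steps = [(old, new)]
    while old > 0 or new < replicas:
        # Scale up: add new pods without exceeding replicas + max_surge in total.
        new += max(0, min(replicas - new, replicas + max_surge - (old + new)))
        # Scale down: remove old pods while keeping at least replicas - max_unavailable
        # pods available (new pods are assumed ready immediately).
        old -= max(0, min(old, (old + new) - (replicas - max_unavailable)))
        steps.append((old, new))
    return steps


if __name__ == "__main__":
    for old, new in rolling_update(replicas=4, max_surge=1, max_unavailable=0):
        print(f"old={old} new={new} available={old + new}")
```

With `max_unavailable=0` the total never falls below the desired replica count, which is what makes the rollout zero-downtime; rolling back is the same procedure with the old and new versions swapped.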

Mentorship

Foster an engineering culture that fuses ops and dev by cross-training engineers to achieve maximum productivity and complete business continuity.

Foster a DevOps culture…

  • 01

    DevOps involves constant cross-training of engineers so that business continuity is achieved at the human level.

  • 02

    Peer review and code review spread knowledge across the team and catch mistakes before they reach production.

  • 03

    Best practices exist so that hard lessons can be learned the easy way.

  • 04

    Cloud Technologies are evolving at an astonishing rate. Get help staying on top of the latest & greatest tech without getting overwhelmed.

Security

Implement a strategy that is baked into the DNA of the organization and addresses both technological attack vectors and social engineering.

Protect your business…

  • 01

    Cloud security involves hardening all components, restricting access with SSO/MFA, and keeping a bird’s-eye view of everything going on so that any incident can be remediated quickly.

  • 02

    On-prem security is just as important as cloud security. Protect your intellectual property (IP) from being compromised. Lock down laptops, Wi-Fi, and physical access.

  • 03

    Safe practices for exchanging sensitive information are essential. A company is only as secure as its weakest link.

  • 04

    Secrets management ensures there’s a formal process for storing, securing and rotating passwords and keys. Well-designed solutions help ensure your company will not be tomorrow’s headline news.
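
A minimal sketch of one piece of secrets management, the rotation check. It assumes a hypothetical inventory recording when each secret was last rotated; the 90-day limit is an assumed policy.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=90)  # assumed rotation policy


def due_for_rotation(inventory, now):
    """Return the sorted names of secrets whose last rotation is older than MAX_AGE."""
    return sorted(name for name, rotated_at in inventory.items()
                  if now - rotated_at > MAX_AGE)


if __name__ == "__main__":
    now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    # Hypothetical inventory: secret name -> time of last rotation.
    inventory = {
        "db-password": datetime(2024, 1, 15, tzinfo=timezone.utc),  # 138 days old
        "api-key": datetime(2024, 5, 1, tzinfo=timezone.utc),       # 31 days old
    }
    print(due_for_rotation(inventory, now))  # ['db-password']
```

Running a check like this on a schedule turns rotation from something people remember to do into part of the formal process.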

Logging

Aggregate and report on logs collected from all services across all machines.

Visibility into all services…

  • 01

    Collect logs and ship them to a central store so they can be reported on.

  • 02

    Reporting on logs requires visualizing events, because that is the most practical way to make sense of mounds of data.

  • 03

    Auditing is the ongoing process of surfacing anomalous events across all systems by combing through logs centrally aggregated in a log store such as Splunk, Sumo Logic or Elasticsearch (with Kibana).

  • 04

    Integrate with monitoring and alerting so that critical events are not lost.
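
A minimal sketch of aggregation, assuming services emit one JSON object per log line with hypothetical `service`, `host` and `level` fields. Lines from every machine are pooled, and ERROR events are counted per service and host. Unparseable lines are counted rather than silently dropped, since they are themselves a signal worth reporting.

```python
import json
from collections import Counter


def count_errors(log_lines):
    """Count ERROR events per (service, host) across pooled JSON log lines."""
    counts = Counter()
    for line in log_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            event = None
        if not isinstance(event, dict):
            # Not a structured log event: track it instead of discarding it.
            counts[("unparseable", "unknown")] += 1
            continue
        if event.get("level") == "ERROR":
            counts[(event.get("service", "unknown"), event.get("host", "unknown"))] += 1
    return counts


if __name__ == "__main__":
    lines = [
        '{"service": "api", "host": "web-1", "level": "ERROR"}',
        '{"service": "api", "host": "web-2", "level": "INFO"}',
        "not json",
    ]
    print(count_errors(lines))
```

Counts like these are what a dashboard visualizes and what an alerting rule would consume.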

Monitoring

Monitor everything that your organization depends on to meet SLAs, which means keeping an eye on both internal and external services.

System-wide overview…

  • 01

    Dashboards provide an at-a-glance overview of everything and the necessary transparency across departments. Get everyone on the same page and working towards the same goals by giving them the insights they need.

  • 02

    KPIs provide the benchmarks for success. They give a concrete indication of whether things are working or broken. Alert on thresholds rather than on discrete events.

  • 03

    Internal services are monitored for both availability and correctness. Kill Nagios. Use Datadog and stop tearing your hair out trying to understand what’s going on.

  • 04

    External services are just as integral to the performance of your product as internal ones. Monitor all dependencies as if they were your own. Escalate before their problems become yours.
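
To illustrate alerting on a threshold rather than on discrete events, this sketch tracks the error rate over a sliding window of recent requests and fires only when that rate exceeds a limit. The class name, window size and threshold are hypothetical.

```python
from collections import deque


class ErrorRateMonitor:
    """Alert when the error rate over the last `window` requests exceeds `threshold`."""

    def __init__(self, window=100, threshold=0.05):
        self.results = deque(maxlen=window)  # oldest results fall off automatically
        self.threshold = threshold

    def record(self, ok: bool) -> bool:
        """Record one request outcome; return True if an alert should fire."""
        self.results.append(ok)
        if len(self.results) < self.results.maxlen:
            return False  # not enough data yet to judge the rate
        error_rate = self.results.count(False) / len(self.results)
        return error_rate > self.threshold


if __name__ == "__main__":
    monitor = ErrorRateMonitor(window=10, threshold=0.2)
    outcomes = [True] * 8 + [False] * 3
    print([monitor.record(ok) for ok in outcomes])  # only the last outcome alerts
```

A single failed request never pages anyone; only a sustained rise in the error rate does.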

Alerting

Generate actionable notifications that escalate to on-call engineers only when it matters.

What We’re Looking For

  • 01

    SLAs are formed from KPIs and are the basis for alerts. Internal SLAs ensure that incidents are resolved before they affect your customers.

  • 02

    Escalation involves notifying the appropriate people at the right time, in the way they want to be notified. It also means that when someone cannot be reached, contingencies kick in.

  • 03

    Remediation procedures should be documented alongside the alerts in a knowledge base. This ensures on-call engineers (OCEs) are never left hanging.

  • 04

    Relevant alerts go directly to key stakeholders. Avoid alert overload by tracking KPIs instead of individual checks.
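
A minimal sketch of an escalation chain with a contingency, assuming a hypothetical `acknowledge` callback that notifies a responder and reports whether they responded. Responders are tried in order. If nobody in the chain acknowledges, the alert falls through to a fallback contact.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Responder:
    name: str
    channel: str  # hypothetical preferred channel, e.g. "sms" or "phone"


def escalate(alert: str,
             chain: List[Responder],
             acknowledge: Callable[[Responder, str], bool],
             fallback: Optional[Responder] = None) -> Optional[str]:
    """Notify responders in order; return the name of whoever acknowledges first."""
    for responder in chain:
        if acknowledge(responder, alert):
            return responder.name
    # Contingency: nobody in the chain responded, so page the fallback contact.
    if fallback is not None and acknowledge(fallback, alert):
        return fallback.name
    return None


if __name__ == "__main__":
    chain = [Responder("primary", "sms"), Responder("secondary", "phone")]
    # Simulate the primary on-call engineer being unreachable.
    who = escalate("db latency SLA breached", chain, lambda r, a: r.name != "primary")
    print(who)  # secondary
```

A real paging system would add timeouts between steps and respect each person’s notification preferences; the ordering-plus-fallback structure is the part this sketch shows.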