DevOps Glossary

1

12-Factor App

The 12-Factor App is a methodology for building modern, scalable, and maintainable software-as-a-service applications with best practices for deployment and operations.

A

Account Vending

Account vending automates the provisioning of new cloud accounts with baseline security, networking, and compliance controls, enabling teams to get production-ready accounts in minutes instead of weeks.

Agile

Agile is an iterative approach to software development emphasizing short cycles, continuous feedback, and adapting to change over following rigid plans, forming the cultural foundation for DevOps practices.

Amazon CloudFront

Amazon CloudFront is AWS's content delivery network (CDN) that caches and serves content from edge locations worldwide, reducing latency for static assets and API responses.

Amazon DynamoDB

Amazon DynamoDB is a serverless NoSQL database commonly used for Terraform state locking, session stores, and high-throughput workloads that need single-digit millisecond latency.

Amazon EC2

Amazon Elastic Compute Cloud (EC2) provides resizable virtual machines in AWS, serving as the foundation for workloads that need full OS-level control beyond what containers or Lambda offer.

Amazon ECR

Amazon Elastic Container Registry (ECR) is a managed Docker container registry that stores, manages, and deploys container images, typically paired with ECS or EKS for production workloads.

Amazon ECS

Amazon Elastic Container Service (ECS) is AWS's managed container orchestration service that runs Docker containers without requiring you to operate your own Kubernetes control plane.

Amazon EKS

Amazon Elastic Kubernetes Service (EKS) is AWS's managed Kubernetes offering that handles the control plane so your team can focus on deploying workloads instead of managing etcd and API servers.

Amazon GuardDuty

Amazon GuardDuty is a managed threat detection service that continuously monitors your AWS accounts and workloads for malicious activity using machine learning and threat intelligence feeds.

Amazon Inspector

Amazon Inspector is an automated vulnerability management service that continuously scans AWS workloads for software vulnerabilities and unintended network exposure.

Amazon RDS

Amazon Relational Database Service (RDS) is a managed database service supporting PostgreSQL, MySQL, and other engines with automated backups, patching, and multi-AZ failover.

Amazon Route 53

Amazon Route 53 is AWS's DNS service that routes traffic globally with health checks and failover, serving as the entry point for multi-account architectures managed by Terraform.

Amazon S3

Amazon Simple Storage Service (S3) is AWS's object storage, commonly used for Terraform state backends, static assets, logs, and backup storage across multi-account architectures.

API Gateway

An API Gateway sits in front of your backend services to handle request routing, authentication, rate limiting, and protocol translation for REST, GraphQL, or WebSocket APIs.

Apply After Merge

Apply after merge is a GitOps deployment pattern where Terraform apply runs automatically after a pull request is merged, ensuring main always reflects the actual state of infrastructure.

Approval Gate

An approval gate is a checkpoint in a deployment pipeline that requires explicit human approval before proceeding, enforcing separation of duties for production infrastructure changes.

Atlantis

Atlantis is an open-source tool that automates Terraform plan and apply through pull request comments, enabling GitOps-style infrastructure workflows with team review and approval.

Atmos

Atmos is an open-source framework by Cloud Posse for managing Terraform at scale using stacks, components, and inheritance to organize complex multi-account AWS environments.

Audit Trail

An audit trail is an immutable record of who changed what, when, and why. Git commit history, CloudTrail logs, and Terraform plan outputs all contribute to the audit trail that compliance frameworks require.

Auto-Scaling

Auto-scaling automatically adjusts the number of compute resources based on demand, adding capacity during traffic spikes and removing it during quiet periods to optimize cost and performance.

Availability Zone

An availability zone is one or more physically separate data centers within a cloud region, with independent power and networking. Distributing resources across AZs provides resilience against facility failures.

AWS CloudFormation

AWS CloudFormation is AWS's native infrastructure as code service that provisions resources from JSON or YAML templates. It is often compared with Terraform, which also supports other cloud providers.

AWS CloudTrail

AWS CloudTrail records API activity in your AWS accounts, providing the audit trail that compliance frameworks like SOC 2 require to prove who did what and when.

AWS Config

AWS Config is a service that continuously records configuration changes to supported resources in your AWS accounts, providing the history your compliance auditor needs to verify that controls are operational.

AWS Control Tower

AWS Control Tower automates the setup of a multi-account AWS landing zone with guardrails, account provisioning, and centralized logging following AWS best practices.

AWS IAM

AWS Identity and Access Management (IAM) controls who can do what in your AWS accounts through users, roles, and policies that enforce least-privilege access at every layer.

AWS KMS

AWS Key Management Service (KMS) creates and controls encryption keys used to protect data at rest and in transit across your AWS services, a core requirement for SOC 2 and HIPAA.

AWS Lambda

AWS Lambda runs your code in response to events without provisioning servers, charging only for compute time consumed, making it ideal for event-driven automation and lightweight APIs.

AWS Multi-Account

AWS multi-account architecture is a strategy of using multiple AWS accounts to isolate workloads, enforce security boundaries, and implement governance at scale using AWS Organizations.

AWS Organizations

AWS Organizations lets you centrally manage multiple AWS accounts with consolidated billing, service control policies, and organizational units for governance at scale.

AWS Secrets Manager

AWS Secrets Manager stores, rotates, and retrieves database credentials, API keys, and other secrets, eliminating hardcoded secrets in your application code and Terraform configurations.

AWS Security Hub

AWS Security Hub aggregates security findings from GuardDuty, Config, Inspector, and third-party tools into a single pane of glass for centralized security posture management across accounts.

AWS Systems Manager Parameter Store

AWS Systems Manager Parameter Store provides hierarchical storage for configuration data and secrets, often used alongside Terraform to manage environment-specific values across accounts.

AWS Transit Gateway

AWS Transit Gateway acts as a central hub that connects VPCs and on-premises networks through a single gateway, simplifying network topology in multi-account AWS architectures.

AWS WAF

AWS Web Application Firewall (WAF) protects applications from common web exploits like SQL injection and cross-site scripting by filtering HTTP requests at the edge.

AWS Well-Architected Framework

The AWS Well-Architected Framework provides best practices across six pillars (operational excellence, security, reliability, performance efficiency, cost optimization, and sustainability) for evaluating cloud workloads.

B

Bare Metal

Bare metal refers to physical servers without a hypervisor or virtualization layer, offering maximum performance for workloads that require direct hardware access.

Bastion Host

A bastion host is a hardened server that provides secure SSH or RDP access to resources in private subnets, acting as a controlled entry point. SSM Session Manager is the modern alternative.

BeyondCorp

BeyondCorp is Google's zero-trust security model that shifts access controls from the network perimeter to individual users and devices, eliminating the need for traditional VPNs.

Blameless Culture

Blameless culture is the practice of focusing on systemic improvements rather than individual fault when incidents occur, creating psychological safety that leads to honest postmortems and better prevention.

Blast Radius

Blast radius is the scope of impact from a failed change — decomposing infrastructure into smaller components limits the blast radius so a bad apply only affects one service, not everything.

Blue-Green Deployment

A blue-green deployment runs two identical environments (blue and green), routing all traffic to one while the other is updated. Switching traffic provides instant rollback if the new version fails.

Branch Protection

Branch protection rules prevent direct pushes to critical branches and enforce requirements like pull request reviews, status checks, and signed commits before code can be merged.

Branching Strategy

A branching strategy defines how a team organizes Git branches for development, review, and releases. Common strategies include GitHub Flow, Git Flow, and trunk-based development.

C

Canary Deployment

A canary deployment routes a small percentage of traffic to the new version while monitoring for errors, gradually increasing traffic if metrics stay healthy before full rollout.

CDN

A CDN (Content Delivery Network) caches and serves content from geographically distributed edge locations, reducing latency and improving performance for users worldwide.

Change Freeze

A change freeze is a defined period during which no infrastructure or application changes are deployed, typically enforced during high-traffic seasons, audits, or after error budget exhaustion.

Change Management

Change management is the process of controlling, approving, and tracking infrastructure changes — when integrated with Git, every pull request becomes a change ticket with built-in approval workflows.

Chaos Engineering

Chaos engineering deliberately introduces controlled failures into production systems to verify resilience, exposing weaknesses before real outages do. Tools like Chaos Monkey pioneered this discipline.

Child Module

A child module is a reusable Terraform module called from a root module to encapsulate a specific set of resources, enabling composition and code reuse across configurations.

CI/CD

CI/CD (Continuous Integration/Continuous Delivery) is the combined practice of automatically building, testing, and preparing code for release through an automated pipeline.

CIDR

CIDR (Classless Inter-Domain Routing) notation defines IP address ranges for VPCs and subnets. Planning CIDR blocks carefully prevents address conflicts when VPCs need to peer across accounts.
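
As a rough illustration of checking for address conflicts before peering, the sketch below uses Python's standard ipaddress module. The function name find_overlaps and the three VPC ranges are hypothetical and chosen only for the example.

```python
import ipaddress


def find_overlaps(cidrs):
    """Return pairs of CIDR blocks whose address ranges overlap."""
    networks = [ipaddress.ip_network(c) for c in cidrs]
    overlaps = []
    for i, a in enumerate(networks):
        for b in networks[i + 1:]:
            if a.overlaps(b):
                overlaps.append((str(a), str(b)))
    return overlaps


# Hypothetical VPC ranges planned for three accounts.
vpc_cidrs = ["10.0.0.0/16", "10.1.0.0/16", "10.0.128.0/20"]
print(find_overlaps(vpc_cidrs))  # [('10.0.0.0/16', '10.0.128.0/20')]
```

Any pair reported here could not be peered without readdressing one side.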

Circuit Breaker

A circuit breaker is a resilience pattern that stops making calls to a failing downstream service, preventing cascading failures and giving the failing service time to recover.
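
A minimal sketch of the pattern, assuming a simple consecutive-failure threshold and a fixed reset timeout. The class and parameter names are illustrative, not taken from any particular library, and production implementations usually add per-endpoint state and metrics.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock  # injectable so the timeout can be tested
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; skipping call")
            # Timeout elapsed: fall through and allow one trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the circuit is open, callers fail fast with CircuitOpenError instead of waiting on the failing service, which is what gives that service room to recover.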

CIS Benchmark

A CIS Benchmark is a set of prescriptive security configuration guidelines published by the Center for Internet Security for hardening operating systems, cloud services, and applications.

Cloud Center

A cloud center (or cloud center of excellence) is an organizational team responsible for establishing cloud strategy, governance, best practices, and enabling cloud adoption across the enterprise.

Cloud Framework

A cloud framework is a structured set of guidelines, best practices, and tools—such as the AWS Well-Architected Framework—for designing and operating reliable cloud workloads.

Cloud Native

Cloud native is an approach to building and running applications that fully exploits cloud computing advantages, including containers, microservices, and declarative infrastructure.

Cloud Operating System

A cloud operating system is a platform layer that manages distributed cloud resources, providing abstraction over physical infrastructure for workload scheduling and resource allocation.

Cloud Platform

A cloud platform is a suite of cloud computing services offered by a provider—such as AWS, Azure, or GCP—that enables organizations to build, deploy, and manage applications and infrastructure.

Cloud Standards

Cloud standards are industry specifications and protocols that ensure interoperability, portability, and security across cloud computing services and providers.

Cloud Technology

Cloud technology encompasses the hardware, software, and services that enable on-demand delivery of computing resources over the internet, including IaaS, PaaS, and SaaS.

Cluster

A cluster is a group of interconnected computers or servers that work together as a unified system to provide high availability, load balancing, and scalable computing resources.

Code Review

Code review is the practice of having peers examine proposed changes before merging, catching bugs, enforcing standards, and spreading knowledge across the team through pull request discussions.

Codefresh

Codefresh is a CI/CD platform built for Kubernetes and GitOps workflows, providing Docker-native build pipelines and Argo CD integration for application and infrastructure delivery.

Compliance Automation

Compliance automation codifies regulatory controls into automated checks that run in CI/CD pipelines, replacing manual audit spreadsheets with continuous, provable compliance through infrastructure as code.

Component

A component in Atmos is an independently deployable unit of infrastructure — a root module with its own state file that a team can plan, apply, and manage without affecting other components.

Component Instance

A component instance is a specific deployment of an Atmos component in a particular stack — the same component can be deployed multiple times with different configurations across environments.

Configuration Drift

Configuration drift occurs when the actual state of infrastructure diverges from the desired state defined in code, caused by manual changes, failed applies, or out-of-band modifications.

Conformance Pack

A Conformance Pack is a collection of AWS Config rules and remediation actions bundled together to enforce a compliance framework like CIS, PCI DSS, or SOC 2 across your accounts.

Container Registry

A container registry stores and distributes Docker container images, with options ranging from managed services like ECR and Docker Hub to self-hosted registries like Harbor.

Containers as a Service (CaaS)

Containers as a Service (CaaS) is a cloud service model that provides container orchestration and management as a managed service, abstracting the underlying infrastructure.

Continuous Delivery

Continuous delivery (CD) is a software practice where code changes are automatically built, tested, and prepared for release to production, enabling deployments at any time.

Continuous Integration

Continuous integration (CI) is a development practice where developers frequently merge code changes into a shared repository, with each merge automatically triggering build and test pipelines.

D

DaemonSet

A DaemonSet ensures that a copy of a pod runs on every node (or a subset of nodes) in a Kubernetes cluster, commonly used for log collectors, monitoring agents, and network plugins.

Declarative

Declarative infrastructure means specifying the desired end state and letting the tool figure out how to get there — Terraform, CloudFormation, and Kubernetes manifests are all declarative.

Defense in Depth

Defense in depth layers multiple security controls so that if one fails, others still protect the system: network segmentation, IAM policies, encryption, and monitoring all working together.

Desired State

Desired state is the configuration that describes what your infrastructure should look like — in GitOps, Git is the system of record for the desired state, and tools reconcile reality to match.

Developer Experience

Developer experience (DX) encompasses the tools, processes, and workflows that developers interact with daily, aiming to minimize friction and maximize productivity.

DevOps

DevOps is a set of practices that combines software development and IT operations to shorten the development lifecycle while delivering features, fixes, and updates frequently and reliably.

DevSecOps

DevSecOps integrates security practices into every phase of the software development lifecycle, making security a shared responsibility rather than an afterthought.

Disaster Recovery

Disaster recovery (DR) is the strategy and processes for restoring systems after catastrophic failures, defined by RTO (how fast you recover) and RPO (how much data you can afford to lose).

DNS

DNS (Domain Name System) translates human-readable domain names into IP addresses, serving as the foundational layer for routing traffic to your applications and services.

Docker

Docker is a platform for building, shipping, and running applications in lightweight, isolated containers that package code with all its dependencies.

DORA Metrics

DORA (DevOps Research and Assessment) metrics are four key measures of software delivery performance: deployment frequency, lead time for changes, change failure rate, and mean time to recovery.

Drift Detection

Drift detection is the process of identifying when actual infrastructure state has diverged from the desired state defined in code, enabling remediation before issues occur.
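
Conceptually, drift detection is a comparison between two descriptions of the same resource. The sketch below assumes both are available as flat dictionaries. The attribute names and values are made up for illustration, and real tools such as terraform plan work against provider APIs rather than plain dicts.

```python
_ABSENT = "<absent>"  # placeholder for an attribute missing on one side


def detect_drift(desired, actual):
    """Return {attribute: (desired_value, actual_value)} for every mismatch."""
    drift = {}
    for key in sorted(set(desired) | set(actual)):
        want = desired.get(key, _ABSENT)
        have = actual.get(key, _ABSENT)
        if want != have:
            drift[key] = (want, have)
    return drift


# Hypothetical desired state (from code) and actual state (from the cloud).
desired = {"instance_type": "t3.micro", "encrypted": True}
actual = {"instance_type": "t3.large", "encrypted": True, "public_ip": "203.0.113.10"}
print(detect_drift(desired, actual))
```

An empty result means the resource matches its definition. Anything else is a candidate for remediation or for updating the code.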

E

Encryption

Encryption is the process of converting plaintext data into an unreadable format using cryptographic algorithms, protecting data confidentiality at rest and in transit.

Error Budget

An error budget is the amount of unreliability your SLO permits. If your SLO is 99.9% uptime, your error budget is 0.1% downtime. When the budget is exhausted, you freeze features and fix reliability.
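
The arithmetic can be sketched as follows, assuming an availability SLO measured over a rolling 30-day window. The window length and function names are assumptions for the example.

```python
def error_budget_minutes(slo_percent, window_days=30):
    """Minutes of downtime an availability SLO allows over the window."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (100 - slo_percent) / 100


def budget_remaining(slo_percent, downtime_minutes, window_days=30):
    """Minutes of budget left; a negative value means the budget is exhausted."""
    return error_budget_minutes(slo_percent, window_days) - downtime_minutes


# A 99.9% SLO over 30 days allows roughly 43.2 minutes of downtime.
print(round(error_budget_minutes(99.9), 1))
print(round(budget_remaining(99.9, downtime_minutes=50), 1))
```

With 50 minutes of downtime already spent, the remaining budget is negative, which is the signal to pause feature work and focus on reliability.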

Event-Driven Architecture

Event-driven architecture decouples services by communicating through events rather than direct calls, enabling asynchronous processing and loose coupling between system components.

F

Failover

Failover is the automatic switch from a failed primary system to a standby backup, ensuring continuity of service. Multi-AZ databases, DNS health checks, and load balancers all implement failover.

Fear, Uncertainty, and Doubt (FUD)

Fear, Uncertainty, and Doubt (FUD) is a strategy of spreading negative or misleading information to influence decision-making, often encountered in technology vendor evaluations.

Feature Branch

A feature branch isolates work on a specific feature or fix from the main branch, enabling parallel development and code review through pull requests before merging.

Feature Flag

A feature flag is a runtime toggle that controls whether a feature is active for users, enabling teams to deploy code to production without exposing it and to roll back without redeploying.
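
One common way to implement such a toggle is a percentage rollout keyed on a stable hash of the user ID. The sketch below assumes that approach. The flag name and the is_enabled helper are hypothetical, and hosted flag services typically add targeting rules and remote configuration on top.

```python
import hashlib


def is_enabled(flag, user_id, rollout_percent):
    """Return True if the flag is on for this user at the given rollout percentage."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 100  # stable bucket in the range 0..99
    return bucket < rollout_percent


# Hypothetical flag: the same user always lands in the same bucket.
for user in ["alice", "bob", "carol"]:
    print(user, is_enabled("new-checkout", user, rollout_percent=25))
```

Setting rollout_percent to 0 turns the feature off for everyone without a redeploy, and raising it gradually widens exposure.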

FedRAMP

FedRAMP (Federal Risk and Authorization Management Program) standardizes security assessment for cloud services used by U.S. federal agencies, requiring rigorous controls and continuous monitoring.

FinOps

FinOps is the practice of bringing financial accountability to cloud spending through real-time cost visibility, optimization, and collaboration between engineering, finance, and business teams.

G

GDPR

GDPR (General Data Protection Regulation) is the EU's data protection law that governs how organizations collect, store, and process personal data, with strict requirements for consent and cross-border data transfers.

Geodesic

Geodesic is Cloud Posse's Docker-based cloud automation shell that packages Terraform, AWS CLI, kubectl, and other tools into a consistent, reproducible development environment.

Git Flow

Git Flow is a branching model with long-lived develop and main branches plus feature, release, and hotfix branches. More structured than GitHub Flow, but adds complexity for teams doing continuous delivery.

GitHub Actions

GitHub Actions is a CI/CD platform built into GitHub that automates build, test, and deployment workflows using YAML-defined pipelines triggered by repository events like pushes and pull requests.

GitHub Flow

GitHub Flow is a lightweight branching model where you create a branch, open a pull request, get review, and merge to main. The simplicity makes it ideal for continuous deployment workflows.

GitOps

GitOps is an operational model that uses Git as the single source of truth for declarative infrastructure and application deployment, with automated reconciliation.

Golden Path

A golden path is an opinionated, pre-built workflow that makes the right way to do something the easiest way. Platform teams provide golden paths for common tasks like deploying a new service.

Golden Source of Truth (GSOT)

Golden Source of Truth (GSOT) is the authoritative, single reference point for a particular set of data or configuration, ensuring consistency across all systems.

H

Hardware as a Service (HaaS)

Hardware as a Service (HaaS) is a model where physical computing hardware is leased or rented from a provider rather than purchased, with maintenance and support included.

HCL

HCL (HashiCorp Configuration Language) is the declarative language used to write Terraform configurations, designed to be human-readable while remaining machine-parseable.

Helm

Helm is a package manager for Kubernetes that simplifies application deployment by bundling Kubernetes manifests into reusable, versioned charts.

Helmfile

Helmfile is a declarative tool for managing multiple Helm releases as code, letting you define all your Kubernetes applications and their values in version-controlled YAML files.

High Availability

High availability (HA) is the design of systems to remain operational with minimal downtime, achieved through redundancy, failover, and distributing workloads across multiple availability zones.

HIPAA

HIPAA (Health Insurance Portability and Accountability Act) is a U.S. federal law that governs the protection of patient health information and drives infrastructure architecture decisions around encryption, access control, and audit logging.

HITRUST

HITRUST is a certifiable security framework that harmonizes requirements from HIPAA, SOC 2, NIST, and other standards into a single assessment, commonly required in healthcare and fintech.

Horizontal Scaling

Horizontal scaling adds more instances or nodes to handle increased load, distributing work across multiple machines rather than making a single machine bigger.

Hybrid Cloud

Hybrid cloud combines on-premises infrastructure with public cloud services, connected through VPN or dedicated links, common in organizations migrating to cloud or with data residency constraints.

I

Idempotent

An idempotent operation produces the same result whether you run it once or multiple times. Terraform apply is idempotent: running it again when nothing changed makes no modifications.
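
The same property can be shown with a small "ensure" style function that only changes what differs from the requested state. The ensure_tags helper and the tag values are hypothetical and are not part of Terraform.

```python
def ensure_tags(resource_tags, wanted):
    """Make resource_tags contain every wanted tag; return True if anything changed."""
    changed = False
    for key, value in wanted.items():
        if resource_tags.get(key) != value:
            resource_tags[key] = value
            changed = True
    return changed


# Hypothetical resource tags, applied twice with the same input.
tags = {"env": "dev"}
print(ensure_tags(tags, {"env": "prod", "team": "platform"}))  # True: changes made
print(ensure_tags(tags, {"env": "prod", "team": "platform"}))  # False: nothing to do
```

The second call leaves the tags untouched, mirroring a plan that reports no changes.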

Identity Federation

Identity federation lets users authenticate once through an external identity provider (like Okta or Azure AD) and access multiple systems via SSO, SAML, or OIDC without separate credentials.

Identity-Aware Proxy (IAP)

Identity-Aware Proxy (IAP) is a security service that controls access to cloud applications by verifying user identity and context before granting access, replacing traditional VPNs.

Immutable Infrastructure

Immutable infrastructure is the practice of never modifying running systems — instead, you build new artifacts in CI/CD, promote them through environments, and replace the old ones entirely.

Imperative

Imperative infrastructure means writing step-by-step instructions that execute in order — scripts, AWS CDK with general-purpose languages, and manual runbooks are imperative approaches.

Incident Management

Incident management is the process of detecting, responding to, and resolving production incidents, including communication, escalation, and coordination to minimize user impact.

Infrastructure as a Service (IaaS)

IaaS provides virtualized computing resources (servers, storage, networking) on demand over the internet, with Amazon EC2 being the most common example. You manage everything above the hypervisor.

Infrastructure as Code

Infrastructure as Code (IaC) is the practice of managing and provisioning infrastructure through machine-readable configuration files rather than manual processes.

Ingress Controller

An ingress controller manages external access to Kubernetes services by processing Ingress resources, handling TLS termination, path-based routing, and load balancing at the cluster edge.

Internal Developer Platform (IDP)

An Internal Developer Platform is a self-service layer built on top of infrastructure that abstracts complexity, giving developers standardized workflows for deploying and managing applications.

Intracloud

Intracloud refers to networking and communication that occurs within a single cloud provider's infrastructure, as opposed to inter-cloud communication between different providers.

ISO 27001

ISO 27001 is an international standard for information security management systems (ISMS) that provides a systematic approach to managing sensitive information through risk assessment and controls.

J

Jenkins

Jenkins is an open-source CI/CD automation server with a massive plugin ecosystem, widely used for build, test, and deployment pipelines before the rise of cloud-native CI/CD platforms.

K

Key Performance Indicators (KPIs)

Key Performance Indicators (KPIs) are measurable values that demonstrate how effectively an organization or team is achieving key business or operational objectives.

Kibana

Kibana is an open-source data visualization and exploration tool for Elasticsearch, providing dashboards, charts, and search capabilities for log and metric analysis.

Kubernetes

Kubernetes is an open-source container orchestration platform that automates the deployment, scaling, and management of containerized applications across clusters of machines.

L

LAMP Stack

LAMP is a web application stack consisting of Linux, Apache, MySQL, and PHP/Python/Perl—a foundational open-source architecture for building and deploying web applications.

Landing Zone

A landing zone is a pre-configured, security-hardened cloud foundation with identity, networking, logging, and governance wired in before any workload deploys. AWS uses Control Tower, Azure has Landing Zone Accelerator, and GCP provides Cloud Foundation Toolkit.

Least Privilege

Least privilege is the principle of granting users, roles, and services only the minimum permissions needed to perform their function, reducing blast radius when credentials are compromised.

Linux

Linux is an open-source operating system kernel that powers the majority of cloud servers, containers, and embedded systems worldwide.

Load Balancer

A load balancer distributes incoming traffic across multiple backend targets, enabling horizontal scaling, high availability, and health-check-based routing for your applications.

M

Microservices

Microservices is an architectural pattern where an application is composed of small, independently deployable services that communicate via APIs and can be developed by autonomous teams.

Microsoft Azure

Microsoft Azure is a comprehensive cloud computing platform offering IaaS, PaaS, and SaaS services for building, deploying, and managing applications across Microsoft's global datacenter network.

Monorepo

A monorepo stores multiple projects, services, or packages in a single Git repository, enabling atomic cross-project changes, shared tooling, and consistent versioning across your codebase.

Multi-Cloud

Multi-cloud is the strategy of using services from multiple cloud providers (AWS, Azure, GCP) to avoid vendor lock-in, optimize for specific capabilities, or meet data residency requirements.

Multi-Factor Authentication (MFA)

MFA requires two or more verification methods to prove identity, combining something you know (password), something you have (device), or something you are (biometric).

Multi-Tenancy

Multi-tenancy is a software architecture where a single instance of an application serves multiple customers (tenants) while keeping their data and configurations isolated.

MySQL

MySQL is an open-source relational database management system widely used for web applications, offering reliable data storage with SQL query support.

N

Nagios

Nagios is an open-source monitoring system that provides comprehensive monitoring of hosts, services, and network infrastructure with alerting and reporting capabilities.

Namespace

A Kubernetes namespace provides logical isolation within a cluster, letting teams share infrastructure while maintaining separate resource quotas, RBAC policies, and network boundaries.

NAT Gateway

A NAT Gateway enables resources in private subnets to access the internet for updates and API calls while remaining unreachable from outside, a standard pattern in secure AWS VPC architectures.

Network Segmentation

Network segmentation divides your infrastructure into isolated network zones using VPCs, subnets, and security groups, limiting lateral movement if an attacker breaches one segment.

NIST

NIST (National Institute of Standards and Technology) publishes cybersecurity frameworks and guidelines that form the foundation for many compliance standards including SOC 2 and FedRAMP.

O

Observability

Observability is the ability to understand the internal state of a system from its external outputs: metrics, logs, and traces. It goes beyond monitoring by enabling you to ask new questions without deploying new code.

On-Call

On-call is a rotation system where engineers are designated to respond to production incidents and alerts, ensuring 24/7 system reliability and rapid incident response.

On-Call Engineer

An on-call engineer is a team member designated to respond to production incidents and alerts outside of normal working hours, ensuring system reliability and uptime.

Open Source

Open source is software with publicly available source code that anyone can inspect, modify, and distribute. Cloud Posse maintains 200+ open-source Terraform modules and tools.

OpenID Connect (OIDC)

OpenID Connect is an identity layer on top of OAuth 2.0 used for federated authentication, commonly configured in AWS for GitHub Actions to assume IAM roles without storing long-lived credentials.

OpenStack

OpenStack is an open-source cloud computing platform that provides infrastructure as a service (IaaS) for deploying and managing compute, storage, and networking resources in private clouds.

P

PCI DSS

PCI DSS (Payment Card Industry Data Security Standard) is a compliance standard that prescribes security controls for any organization that stores, processes, or transmits cardholder data.

Pingdom

Pingdom is a website monitoring service that checks availability and performance of websites and servers, providing alerts when downtime or performance degradation is detected.

Plan Before Merge

Plan before merge is a Terraform workflow pattern where terraform plan runs on every pull request so reviewers can see exactly what infrastructure changes will occur before approving.

Platform as a Service (PaaS)

Platform as a Service (PaaS) provides a managed environment for developing, running, and managing applications without the complexity of building and maintaining infrastructure.

Platform Engineering

Platform engineering is the discipline of designing and building self-service internal developer platforms that abstract away infrastructure complexity and accelerate software delivery.

Pod

A pod is the smallest deployable unit in Kubernetes, consisting of one or more containers that share networking and storage. Pods are ephemeral and managed by controllers like Deployments.

Policy as Code

Policy as code defines compliance rules and security guardrails in machine-readable files (using OPA, Sentinel, or Checkov) that automatically validate infrastructure changes in CI/CD.

Portability

Portability is the ability to move applications, data, and workloads between different cloud providers or environments with minimal effort and without vendor-specific dependencies.

Postmortem

A postmortem is a blameless analysis conducted after an incident to identify root causes, contributing factors, and action items that prevent recurrence. The emphasis on blamelessness encourages honest reporting.

Private Cloud

A private cloud is a cloud computing environment dedicated to a single organization, providing greater control, security, and customization than shared public cloud resources.

Progressive Delivery

Progressive delivery gradually exposes new releases to wider audiences using techniques like canary deployments, feature flags, and traffic shifting to reduce risk.
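
As a minimal sketch of the traffic-shifting idea, the snippet below assigns each user to a stable bucket from 0 to 99 and routes the user to the new release when the bucket falls under the rollout percentage. The function name `in_canary` and the hashing scheme are illustrative assumptions, not the behavior of any particular feature-flag or deployment tool.

```python
import hashlib


def in_canary(user_id: str, percent: int) -> bool:
    """Return True if this user should receive the new release.

    Hypothetical helper: hashes the user ID into a stable bucket (0-99)
    so the same user stays in or out of the canary between requests.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent


# Widening the rollout only ever adds users; nobody flips back to the old release.
users = [f"user-{i}" for i in range(1000)]
at_10 = {u for u in users if in_canary(u, 10)}
at_50 = {u for u in users if in_canary(u, 50)}
print(at_10 <= at_50)
```

Because the bucket depends only on the user ID, raising the percentage from 10 to 50 keeps every user who already had the new release on it.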

Public Cloud

A public cloud is a computing environment where cloud infrastructure and services are shared across multiple organizations and accessed over the public internet on a pay-per-use basis.

Pull Request

A pull request proposes changes from one branch to another, providing a review surface for code, infrastructure, and documentation changes before they merge into the main branch.

R

Recovery Point Objective (RPO)

Recovery Point Objective is the maximum acceptable data loss measured in time. A 1-hour RPO means you need backups no older than 1 hour to meet your recovery goals.
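
The 1-hour example can be checked with simple date arithmetic. This is a sketch only; `meets_rpo` is a hypothetical helper and the timestamps are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def meets_rpo(last_backup: datetime, now: datetime, rpo: timedelta) -> bool:
    """Return True if the newest backup is recent enough to satisfy the RPO."""
    return now - last_backup <= rpo


# Illustrative timestamps, not real data.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
one_hour = timedelta(hours=1)

print(meets_rpo(now - timedelta(minutes=45), now, one_hour))  # True
print(meets_rpo(now - timedelta(minutes=90), now, one_hour))  # False
```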

Recovery Time Objective (RTO)

Recovery Time Objective is the maximum acceptable time to restore a system after failure. A 4-hour RTO means you need your infrastructure back within 4 hours of an outage.

Release Engineering

Release engineering is the discipline of building, packaging, and delivering software reliably, encompassing CI/CD pipelines, artifact management, versioning, and deployment automation.

Remote State

Remote state stores Terraform state in a shared backend (S3, Terraform Cloud, etc.) rather than locally, enabling team collaboration and cross-component data sharing via state outputs.

Role-Based Access Control (RBAC)

RBAC assigns permissions to roles rather than individual users, so access is determined by what role you hold. In Kubernetes and AWS, roles map to specific API actions on resources.
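
The sketch below shows the shape of an RBAC check: users hold roles, roles hold (resource, action) pairs, and a request is allowed if any of the user's roles grants it. The role names, users, and the `is_allowed` helper are all hypothetical; real Kubernetes and AWS evaluation has many more rules.

```python
# Hypothetical role definitions: each role grants (resource, action) pairs.
ROLE_PERMISSIONS = {
    "viewer": {("pods", "get"), ("pods", "list")},
    "deployer": {("pods", "get"), ("pods", "list"), ("deployments", "update")},
}

# Hypothetical role bindings: which roles each user holds.
USER_ROLES = {
    "alice": {"deployer"},
    "bob": {"viewer"},
}


def is_allowed(user: str, resource: str, action: str) -> bool:
    """Allow the request if any role held by the user grants it."""
    return any(
        (resource, action) in ROLE_PERMISSIONS.get(role, set())
        for role in USER_ROLES.get(user, set())
    )


print(is_allowed("alice", "deployments", "update"))  # True
print(is_allowed("bob", "deployments", "update"))    # False
```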

Rollback

A rollback reverts a deployment to a previous known-good state when a release introduces bugs or failures, a critical capability for maintaining production reliability.

Root Module

A root module is the top-level Terraform configuration that serves as the entry point for terraform plan and apply — when a single root module manages everything, you have a Terralith.

Runbook

A runbook is a documented procedure for handling specific operational tasks or incidents, providing step-by-step instructions that reduce response time and enable consistent execution.

S

Scheduler

A scheduler is a system component that allocates resources and determines when and where workloads run, such as the Kubernetes scheduler placing pods on appropriate cluster nodes.

Secrets Management

Secrets management is the practice of securely storing, rotating, and accessing credentials, API keys, and certificates so they never appear in source code, environment variables, or logs.

Self-Service Infrastructure

Self-service infrastructure enables developers to provision environments, deploy applications, and manage resources without filing tickets or waiting for ops teams, a core goal of platform engineering.

Semantic Versioning

Semantic versioning (SemVer) uses MAJOR.MINOR.PATCH numbers to communicate the nature of changes: breaking changes bump major, new features bump minor, and bug fixes bump patch.
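
The bump rules can be sketched in a few lines. The `bump` helper and its change labels are assumptions made for illustration, and the sketch handles only plain MAJOR.MINOR.PATCH strings, not pre-release or build metadata.

```python
def bump(version: str, change: str) -> str:
    """Return the next version for a 'breaking', 'feature', or 'fix' change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "breaking":
        return f"{major + 1}.0.0"          # major bump resets minor and patch
    if change == "feature":
        return f"{major}.{minor + 1}.0"    # minor bump resets patch
    if change == "fix":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change}")


print(bump("1.4.2", "breaking"))  # 2.0.0
print(bump("1.4.2", "feature"))   # 1.5.0
print(bump("1.4.2", "fix"))       # 1.4.3
```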

Separation of Concerns

Separation of concerns is a design principle where each component handles one responsibility. In Atmos, this means each component manages one piece of infrastructure with its own state.

Separation of Duties

Separation of duties ensures that no single person can both make and approve a change, a control requirement in SOC 2 and SOX that maps naturally to pull request approval workflows.

Serverless

Serverless computing executes code on demand without provisioning servers, with the cloud provider handling scaling and infrastructure. AWS Lambda, API Gateway, and DynamoDB are common serverless building blocks.

Service Control Policy (SCP)

A Service Control Policy is an AWS Organizations guardrail that sets the maximum permissions for accounts in your organization, preventing actions even if IAM policies allow them.
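
One way to picture the guardrail is as a set intersection: an action is usable only if both the IAM policy and the SCP permit it. The sketch below is deliberately simplified and assumes plain allow lists; real AWS policy evaluation also involves explicit denies, wildcards, conditions, and other policy types.

```python
def effective_actions(iam_allowed: set[str], scp_allowed: set[str]) -> set[str]:
    """Actions permitted in practice: allowed by IAM and not blocked by the SCP."""
    return iam_allowed & scp_allowed


# Illustrative allow lists, not a real policy.
iam_allowed = {"s3:GetObject", "ec2:RunInstances", "iam:CreateUser"}
scp_allowed = {"s3:GetObject", "ec2:RunInstances"}

print(sorted(effective_actions(iam_allowed, scp_allowed)))
# ['ec2:RunInstances', 's3:GetObject']
```

Here `iam:CreateUser` is granted by IAM but still unavailable, because the SCP never permits it.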

Service Level Agreement (SLA)

A Service Level Agreement (SLA) is a formal contract between a service provider and customer that defines expected service levels, uptime guarantees, and remedies for failures.

Service Level Indicator (SLI)

A Service Level Indicator is a quantitative measure of a service's behavior, like request latency, error rate, or throughput. SLIs are the raw metrics that SLOs set targets against.

Service Level Objective (SLO)

A Service Level Objective is a target value for a service level indicator, like 99.9% availability or p99 latency under 200ms. The gap between the SLO and 100% defines your error budget.
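
The error budget implied by an availability SLO is the fraction of time the service may fail, that is, 1 minus the target. As a worked example, the sketch below converts a target into allowed downtime; the 30-day window and the helper name `error_budget_minutes` are assumptions chosen for illustration.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability SLO over the window."""
    total_minutes = window_days * 24 * 60
    return (1 - slo) * total_minutes


# 99.9% over 30 days: 0.1% of 43,200 minutes.
print(round(error_budget_minutes(0.999), 1))  # 43.2
```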

Service Mesh

A service mesh is an infrastructure layer that manages service-to-service communication with features like mutual TLS, traffic splitting, and observability, typically using sidecar proxies.

Service-Oriented Architecture

Service-oriented architecture decomposes a system into independent services that communicate over well-defined interfaces — the same principle applied to Terraform yields componentized infrastructure.

Shift Left

Shift left moves testing, security, and quality checks earlier in the development pipeline. Instead of finding problems in production, you catch them in CI, code review, or even at the IDE level.

Site Reliability Engineering (SRE)

Site Reliability Engineering applies software engineering principles to operations, using SLIs, SLOs, error budgets, and automation to maintain reliability while shipping features at velocity.

Slack

Slack is a cloud-based messaging platform for teams that supports channels, direct messages, integrations, and workflows to streamline workplace communication.

Snowflake Server

A snowflake server is a hand-configured machine that's impossible to reproduce reliably because its state diverged from any documented or automated configuration. Infrastructure as code eliminates snowflakes.

SOC 2 Compliance

SOC 2 is an attestation — not a certification — where an auditor verifies that your security controls are real, operational, and repeatable: you say what you do, and you do what you say.

Software as a Service (SaaS)

SaaS delivers fully managed software applications over the internet on a subscription basis. Users access the service without managing infrastructure, updates, or maintenance.

SOX Compliance

Sarbanes-Oxley (SOX) compliance requires publicly traded companies to maintain auditable internal controls over financial reporting, driving infrastructure requirements for change management and access control.

Spacelift

Spacelift is a Terraform CI/CD platform that provides plan previews, policy enforcement, drift detection, and approval workflows as a managed service alternative to self-hosted Atlantis.

Stack

A stack in Atmos is a YAML configuration file that declares which components to deploy, with what settings, in a specific environment and account — the glue between components and infrastructure.

Staging Environment

A staging environment mirrors production for testing changes before they go live, providing a safe space to validate infrastructure, application behavior, and integrations.

State Locking

State locking prevents concurrent Terraform operations from corrupting the state file by acquiring a lock before writes, typically using DynamoDB or a database backend.
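
The locking idea can be sketched with an in-memory table: a writer must create the lock record before touching state, and a second writer fails while the record exists. The class and method names here are hypothetical, and the sketch runs in a single process; it is not how any Terraform backend is implemented.

```python
class LockError(Exception):
    """Raised when the state is already locked by another operation."""


class InMemoryLockTable:
    """Hypothetical stand-in for a lock store such as a database table."""

    def __init__(self) -> None:
        self._locks: dict[str, str] = {}

    def acquire(self, state_key: str, owner: str) -> None:
        # Refuse the lock if any owner already holds it.
        if state_key in self._locks:
            raise LockError(f"{state_key} is locked by {self._locks[state_key]}")
        self._locks[state_key] = owner

    def release(self, state_key: str, owner: str) -> None:
        # Only the current holder may release the lock.
        if self._locks.get(state_key) == owner:
            del self._locks[state_key]


table = InMemoryLockTable()
table.acquire("prod/vpc.tfstate", "ci-run-1")
try:
    table.acquire("prod/vpc.tfstate", "ci-run-2")
except LockError as err:
    print(err)  # prod/vpc.tfstate is locked by ci-run-1
table.release("prod/vpc.tfstate", "ci-run-1")
table.acquire("prod/vpc.tfstate", "ci-run-2")  # succeeds once released
```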

StatefulSet

A StatefulSet is a Kubernetes controller for managing stateful applications that require stable network identities, persistent storage, and ordered deployment and scaling.

Subnet

A subnet is a range of IP addresses within a VPC. Public subnets route to an internet gateway, while private subnets have no direct inbound route from the internet; together they form the network architecture of your cloud environment.
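
To make the address-range idea concrete, the sketch below carves an example VPC range into four equal subnets using Python's standard `ipaddress` module. The 10.0.0.0/16 range and the /18 split are arbitrary choices for illustration.

```python
import ipaddress

# Example VPC range; the CIDR is an assumption for illustration.
vpc = ipaddress.ip_network("10.0.0.0/16")

# Split the /16 into four /18 subnets.
subnets = list(vpc.subnets(new_prefix=18))
for subnet in subnets:
    print(subnet, subnet.num_addresses)
# 10.0.0.0/18 16384
# 10.0.64.0/18 16384
# 10.0.128.0/18 16384
# 10.0.192.0/18 16384

# Membership test: which subnet holds a given address?
address = ipaddress.ip_address("10.0.70.5")
print([str(s) for s in subnets if address in s])  # ['10.0.64.0/18']
```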

Synthetic Monitoring

Synthetic monitoring uses scripted transactions to simulate user interactions with applications, proactively detecting performance issues and outages before real users are affected.

T

Tagging Strategy

A tagging strategy defines consistent labels applied to cloud resources for cost allocation, ownership tracking, compliance, and automation. Without tags, you cannot answer who owns what or what anything costs.
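
A tagging strategy is easiest to enforce when the required keys are checked automatically. The sketch below reports which required tags a resource lacks; the tag keys and the `missing_tags` helper are hypothetical examples, not a standard.

```python
# Hypothetical required tag keys for every resource.
REQUIRED_TAGS = {"Owner", "CostCenter", "Environment"}


def missing_tags(tags: dict[str, str]) -> set[str]:
    """Return the required tag keys that are absent from a resource's tags."""
    return REQUIRED_TAGS - set(tags)


resource_tags = {"Owner": "platform-team", "Environment": "prod"}
print(missing_tags(resource_tags))  # {'CostCenter'}
```

A check like this could run in CI against planned resources so untagged infrastructure never reaches production.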

Technical Debt

Technical debt is the accumulated cost of shortcuts and deferred improvements in code and infrastructure that slow future development and increase the risk of failures.

Terraform

Terraform is an open-source infrastructure as code tool by HashiCorp that lets you define and provision cloud infrastructure using a declarative configuration language called HCL.

Terraform Apply

Terraform apply executes the changes previewed in a plan, creating, updating, or destroying real infrastructure resources to match your declared configuration.

Terraform Backend

A Terraform backend determines where state is stored and how operations are executed. Remote backends like S3 enable team collaboration with state locking and encryption.

Terraform Cloud

Terraform Cloud (renamed HCP Terraform in 2024) is HashiCorp's managed platform for Terraform that provides remote state, plan/apply workflows, and team collaboration features with a SaaS delivery model.

Terraform Modules

Terraform modules are reusable, self-contained packages of Terraform configuration that encapsulate related resources into a single logical unit for consistent infrastructure provisioning.

Terraform Plan

Terraform plan previews what changes Terraform will make before applying them, showing resources to be created, modified, or destroyed. It is the foundation of safe infrastructure workflows.
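
Conceptually, a plan is a diff between the desired configuration and what currently exists. The sketch below models both as dictionaries keyed by resource address and groups the differences into the three kinds of change. It is an illustration of the idea only; the `plan` function and the resource data are hypothetical, and Terraform's real planning is far more involved.

```python
def plan(desired: dict[str, dict], actual: dict[str, dict]) -> dict[str, list[str]]:
    """Group resource addresses by the action needed to reach the desired state."""
    return {
        "create": sorted(desired.keys() - actual.keys()),
        "update": sorted(
            key for key in desired.keys() & actual.keys()
            if desired[key] != actual[key]
        ),
        "destroy": sorted(actual.keys() - desired.keys()),
    }


# Illustrative resources, not real infrastructure.
desired = {
    "aws_s3_bucket.logs": {"versioning": True},
    "aws_instance.web": {"type": "t3.small"},
}
actual = {
    "aws_instance.web": {"type": "t3.micro"},
    "aws_instance.old": {"type": "t2.micro"},
}

print(plan(desired, actual))
# {'create': ['aws_s3_bucket.logs'], 'update': ['aws_instance.web'], 'destroy': ['aws_instance.old']}
```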

Terraform Provider

A Terraform provider is a plugin that interfaces with a specific API (AWS, GitHub, Datadog, etc.), translating your HCL configuration into actual API calls to create and manage resources.

Terraform State

Terraform state is a file that maps real-world resources to your configuration, tracks metadata, and enables Terraform to determine what changes need to be applied to reach the desired state.

Terraform Workspace

A Terraform workspace provides a separate state file for each workspace name, enabling state isolation. Atmos components are a more scalable alternative for managing multiple environments.

Terragrunt

Terragrunt is a thin wrapper around Terraform that reduces code duplication and manages remote state configuration. Atmos is Cloud Posse's alternative with deeper component composition.

Terralith

A Terralith is a Terraform monolith — one giant root module that manages all your infrastructure. Simple to start, but hits scaling limits with long plan times, API rate limits, and governance challenges.

Toil

Toil is repetitive, manual, automatable work that scales linearly with service growth. SRE teams actively measure and reduce toil to free engineering time for projects that eliminate future toil.

Trunk-Based Development

Trunk-based development is a branching strategy where all developers integrate small changes into a single main branch, either directly or through short-lived feature branches, reducing merge conflicts and enabling continuous integration.

U

Unit Test

A unit test is an automated test that verifies the correctness of an individual function or component in isolation, providing fast feedback during development.

Z

Zero Trust

Zero Trust is a security model that assumes no implicit trust for any user, device, or network. Every request is authenticated and authorized regardless of where it originates.