To deepen my Infrastructure-as-Code (IaC) and automation skills, I moved beyond my initial projects with Terraform. While Terraform is foundational for provisioning infrastructure—like the Drupal container hosting this article—it's only part of the story. The next logical step was incorporating Ansible for configuration management and wrapping it all in a GitOps workflow.
This project was an opportunity to significantly level up my DevOps expertise by building an end-to-end CI/CD (continuous integration/continuous delivery) pipeline with GitHub Actions. I chose to deploy a full-featured, Grafana-based observability stack, which provides a practical, industry-relevant platform for service health monitoring and metrics analysis.
The Technology Stack
Instead of a managed container service, I provisioned a dedicated Azure Virtual Machine to have full control over the environment. All services run in Docker containers and are orchestrated to work together seamlessly from the moment of deployment.
The stack includes:
- Grafana: For data visualization and dashboards, with Microsoft Entra ID OAuth integration for Role-Based Access Control (RBAC).
- Prometheus: For metrics collection, using node-exporter for host metrics and cAdvisor for container metrics.
- Loki: For log aggregation, paired with Promtail to collect logs from all services.
- Traefik: A modern reverse proxy that handles SSL termination and automated certificate management via Let's Encrypt.
- Forward Auth: A single sign-on (SSO) middleware that secures the Traefik dashboard and Prometheus endpoints with Entra ID.
The result is a suite of services that are containerized, SSL-secured, accessible via unique subdomains, and protected by centralized SSO.
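To make the subdomain routing concrete, here is a minimal Python sketch of the Docker labels that Traefik's Docker provider reads to route a container. The label keys follow Traefik's v2/v3 conventions. The entrypoint name `websecure`, the certificate resolver `letsencrypt`, the middleware `forward-auth@docker`, the domain `example.com`, and the function name are hypothetical placeholders, not values taken from this project.

```python
# Hypothetical names: these must match whatever the Traefik static configuration
# and the forward-auth container actually define.
ENTRYPOINT = "websecure"
CERT_RESOLVER = "letsencrypt"
AUTH_MIDDLEWARE = "forward-auth@docker"


def traefik_labels(name: str, subdomain: str, domain: str, port: int,
                   protected: bool = False) -> dict[str, str]:
    """Build Docker labels that route https://<subdomain>.<domain> to one container."""
    router = f"traefik.http.routers.{name}"
    labels = {
        "traefik.enable": "true",
        # Route on the Host header so each service gets its own subdomain.
        f"{router}.rule": f"Host(`{subdomain}.{domain}`)",
        f"{router}.entrypoints": ENTRYPOINT,
        # Certificate obtained from Let's Encrypt through the named ACME resolver.
        f"{router}.tls.certresolver": CERT_RESOLVER,
        # Port the container listens on inside the Docker network.
        f"traefik.http.services.{name}.loadbalancer.server.port": str(port),
    }
    if protected:
        # Send every request through the SSO middleware before it reaches the service.
        labels[f"{router}.middlewares"] = AUTH_MIDDLEWARE
    return labels
```

For example, `traefik_labels("grafana", "grafana", "example.com", 3000)` leaves Grafana without the middleware because it authenticates users itself through OAuth, while `traefik_labels("prometheus", "prometheus", "example.com", 9090, protected=True)` puts Prometheus behind the SSO middleware.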
A True GitOps Workflow
The core of this project is a complete GitOps workflow where the main branch serves as the single source of truth for the entire infrastructure. This is managed by two complementary GitHub Actions pipelines:
Validation Pipeline (validate.yml)
- Trigger: Runs on every pull request targeting the main branch.
- Actions:
  - Performs Terraform formatting (fmt), validation (validate), and planning (plan).
  - Runs Trivy security scans to detect IaC misconfigurations.
  - Posts the detailed Terraform plan as a comment on the pull request for review.
- Environment: Uses a dedicated grafana-dev Terraform workspace to ensure isolation from production.
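For the plan-comment step, a sketch like the following could wrap the plan output in a collapsible Markdown block before it is posted to the pull request. It is an illustration under assumptions: the function name is hypothetical, and the 65,536-character cap reflects my understanding of GitHub's comment size limit rather than anything defined in this project.

```python
FENCE = "`" * 3                 # Markdown code fence used inside the comment body
MAX_COMMENT_CHARS = 65_536      # assumption: GitHub's maximum comment length
TRUNCATION_NOTE = "\n... plan truncated; see the workflow log for the full output"


def plan_comment(plan_text: str, workspace: str = "grafana-dev") -> str:
    """Wrap `terraform plan` output in a collapsible Markdown block for a PR comment."""
    head = (f"#### Terraform plan for workspace `{workspace}`\n\n"
            f"<details><summary>Show plan</summary>\n\n{FENCE}\n")
    tail = f"\n{FENCE}\n</details>\n"
    body = plan_text.rstrip()
    budget = MAX_COMMENT_CHARS - len(head) - len(tail)
    if len(body) > budget:
        # Keep the beginning of the plan and append a note so the total fits the cap.
        body = body[: budget - len(TRUNCATION_NOTE)] + TRUNCATION_NOTE
    return head + body + tail
```

Generating the plan with `terraform plan -no-color` keeps ANSI color codes out of the comment, and the collapsible block keeps long plans from swamping the pull request conversation.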
Deployment Pipeline (deploy.yml)
- Trigger: Runs automatically upon a merge to the main branch.
- Actions:
  - Provisions the infrastructure using Terraform (apply).
  - Configures the virtual machine and services using Ansible.
  - Deploys the application containers using Docker Compose.
- Environment: Uses the grafana-prod workspace.
This separation ensures that every change is validated in a development context before it is promoted to production, and it keeps the whole system declarative and version-controlled. Because both environments are built from the same codebase, they stay at parity: what you test is what you deploy.
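In GitHub Actions, this routing is normally declared in each workflow's `on:` trigger: `pull_request` targeting `main` for validation, and `push` to `main` for deployment, since a merged pull request lands on `main` as a push. The Python sketch below simply spells out that decision rule; `PipelineRun` and `route_event` are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PipelineRun:
    workflow: str          # workflow file that handles the event
    workspace: str         # Terraform workspace the run targets
    applies_changes: bool  # True only when infrastructure is actually changed


def route_event(event_name: str, base_ref: str = "", ref: str = "") -> Optional[PipelineRun]:
    """Map a GitHub event to the pipeline it should trigger, or None."""
    if event_name == "pull_request" and base_ref == "main":
        # Pull requests only format-check, validate, and plan against the dev workspace.
        return PipelineRun("validate.yml", "grafana-dev", applies_changes=False)
    if event_name == "push" and ref == "refs/heads/main":
        # A merged pull request arrives as a push to main: the only path to production.
        return PipelineRun("deploy.yml", "grafana-prod", applies_changes=True)
    return None
```

The arguments mirror GitHub's context values: `github.base_ref` is the target branch name on pull requests, while `github.ref` is the full ref (such as `refs/heads/main`) on pushes.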
Beyond Terraform: Configuration with Ansible
Opting for a VM over a managed container service created the perfect opportunity to leverage Ansible's strengths. While Terraform provisions the cloud resources (the VM, networking, firewall rules), Ansible handles the fine-grained configuration of the machine itself.
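Ansible does need one thing from Terraform: the address of the VM it just created. One way to bridge the two, sketched below in Python, is to turn the output of `terraform output -json` into an Ansible INI inventory. The output name `vm_public_ip`, the group `observability`, the SSH user `azureuser`, and the function name are hypothetical; the project may hand the address to Ansible differently.

```python
import json


def inventory_from_tf_output(tf_output_json: str,
                             group: str = "observability",
                             user: str = "azureuser") -> str:
    """Render a one-host Ansible INI inventory from `terraform output -json`."""
    outputs = json.loads(tf_output_json)
    # `terraform output -json` wraps each output as
    # {"value": ..., "type": ..., "sensitive": ...}; "vm_public_ip" is hypothetical.
    host = outputs["vm_public_ip"]["value"]
    return f"[{group}]\n{host} ansible_user={user}\n"
```

In a pipeline, the rendered string would be written to a file such as `inventory.ini` and passed to `ansible-playbook -i inventory.ini`, so the VM's address never has to be copied by hand.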
Key responsibilities for Ansible in this project included:
- Installing Docker and configuring the Docker daemon for optimal performance.
- Managing complex configuration files for each service using Jinja2 templates for environment-specific values.
- Deploying the multi-container application stack with Docker Compose.
- Handling service dependencies to ensure a correct and reliable startup order.
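Docker Compose expresses startup order through `depends_on`, optionally with `condition: service_healthy` to wait for health checks. The idea underneath is a topological sort, shown here with Python's standard-library `graphlib`. The dependency map is a plausible guess at how this stack is wired, not the project's actual Compose file.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each service lists what must start before it.
DEPENDS_ON = {
    "traefik": set(),
    "forward-auth": {"traefik"},
    "node-exporter": set(),
    "cadvisor": set(),
    "prometheus": {"node-exporter", "cadvisor"},
    "loki": set(),
    "promtail": {"loki"},
    "grafana": {"prometheus", "loki"},
}


def startup_order(deps: dict[str, set[str]]) -> list[str]:
    """Return an order in which every service starts after all of its dependencies."""
    # static_order() yields predecessors first and raises graphlib.CycleError
    # if the dependency graph contains a cycle.
    return list(TopologicalSorter(deps).static_order())
```

Several orders can be valid (node-exporter and loki, for instance, are independent), which is why the guarantee is only that each dependency precedes its dependents.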
Advanced Authentication Architecture
Securing multiple services with a single identity provider required a more sophisticated authentication design than a simple SSO setup.
Dual-Client Strategy
I implemented a dual-client approach within a single Microsoft Entra ID application registration to handle different authentication needs:
- Forward Auth Client: Secures access to entire applications at the network edge, protecting the Traefik dashboard and Prometheus UI.
- Grafana OAuth Client: Integrates directly with Grafana's native authentication system, enabling powerful, built-in RBAC.
This design provides layered security while maintaining a centralized and easily managed identity source.
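On the forward-auth side, the contract with Traefik is simple: the ForwardAuth middleware calls the auth service with `X-Forwarded-*` headers describing the original request, forwards the request only if the auth service answers with a 2xx status, and otherwise returns the auth service's response (typically a redirect to a login page) to the client. The toy Python handler below illustrates that contract. The cookie name, login URL, `rd` parameter, and in-memory session set are hypothetical stand-ins for the real OIDC flow against Entra ID.

```python
from http.cookies import SimpleCookie
from typing import Mapping, Optional
from urllib.parse import quote

SESSION_COOKIE = "_forward_auth"              # hypothetical cookie name
LOGIN_URL = "https://auth.example.com/login"  # hypothetical login endpoint


def session_id(cookie_header: str) -> Optional[str]:
    """Extract the session cookie value, if present."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    morsel = jar.get(SESSION_COOKIE)
    return morsel.value if morsel is not None else None


def forward_auth(headers: Mapping[str, str],
                 valid_sessions: set[str]) -> tuple[int, dict[str, str]]:
    """Return (status, response headers) the way a ForwardAuth endpoint would.

    Header names are matched exactly here; a real HTTP server should treat
    them case-insensitively.
    """
    if session_id(headers.get("Cookie", "")) in valid_sessions:
        # Any 2xx tells Traefik to pass the original request through.
        return 200, {}
    # Rebuild the URL the user asked for from Traefik's X-Forwarded-* headers.
    original = (headers.get("X-Forwarded-Proto", "https") + "://"
                + headers.get("X-Forwarded-Host", "")
                + headers.get("X-Forwarded-Uri", "/"))
    # Non-2xx responses go back to the client as-is, so this redirect sends the
    # browser to log in and records where to return afterwards.
    return 307, {"Location": f"{LOGIN_URL}?rd={quote(original, safe='')}"}
```

Grafana does not need this layer: its own OAuth client handles login, which is exactly why the two clients are kept separate.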
Role-Based Access Control (RBAC) in Grafana
To automatically assign roles, I used a JMESPath expression to map a user's Entra ID group membership to a Grafana role upon login.
GF_AUTH_GENERIC_OAUTH_ROLE_ATTRIBUTE_PATH: "contains(groups, '${GRAFANA_OAUTH_ADMIN_GROUP_ID}') && 'Admin' || 'Viewer'"
This expression grants a user the Admin role if they belong to the specified Entra ID group and the Viewer role otherwise, so access levels in Grafana are managed entirely through group membership in Entra ID.
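JMESPath's `&&` returns its right operand only when the left one is truthy, and `||` falls back to its right operand when the left one is falsy, which is what makes the expression behave like an if/else. A plain-Python mirror of that logic makes it easy to check; the function name is hypothetical, and the `jmespath` package could evaluate the real expression instead. It assumes the Entra ID app registration is configured to include a `groups` claim in the token, since Entra ID omits group claims unless told to emit them.

```python
from typing import Any, Mapping


def grafana_role(claims: Mapping[str, Any], admin_group_id: str) -> str:
    """Plain-Python mirror of: contains(groups, '<id>') && 'Admin' || 'Viewer'."""
    # Assumption: a token without a groups claim is treated as having no groups.
    groups = claims.get("groups") or []
    # JMESPath `a && b`: returns b if a is truthy, otherwise a (here, false).
    left = "Admin" if admin_group_id in groups else False
    # JMESPath `x || y`: returns x if x is truthy, otherwise y.
    return left or "Viewer"
```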
Foundational Principles: Modularity and Security
Two core principles guided the project's architecture:
- Modularity: The codebase is highly organized and reusable. The Terraform configuration is broken into modules for compute, network, and security components. The Ansible setup is similarly divided into distinct roles for setting up Docker and deploying the observability stack. This makes the system easier to maintain, scale, and troubleshoot.
- Security by Design: Security was a primary consideration from the start. All secrets, like API keys and client IDs, are managed securely through GitHub repository secrets and passed as environment variables—they are never hard-coded. Furthermore, by centralizing Identity and Access Management (IAM) with Entra ID, all services are protected behind a robust SSO barrier.
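Keeping secrets out of the code pairs well with failing fast when one is missing, so a misconfigured pipeline stops before it deploys half-configured services. Here is a minimal Python sketch: `GRAFANA_OAUTH_ADMIN_GROUP_ID` comes from the Grafana configuration above, while the function, exception, and other variable names are hypothetical.

```python
import os
from typing import Iterable, Mapping


class MissingSecretError(RuntimeError):
    """Raised when a required secret is absent from the environment."""


def load_secrets(names: Iterable[str],
                 env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Read required secrets from environment variables, failing fast if any are missing."""
    names = list(names)
    # An empty value counts as missing, since it usually means an unset repository secret.
    missing = sorted(n for n in names if not env.get(n))
    if missing:
        # Report only the variable names, never the values, so nothing leaks into logs.
        raise MissingSecretError("missing required secrets: " + ", ".join(missing))
    return {n: env[n] for n in names}
```

A call such as `load_secrets(["GRAFANA_OAUTH_CLIENT_ID", "GRAFANA_OAUTH_CLIENT_SECRET", "GRAFANA_OAUTH_ADMIN_GROUP_ID"])` would then either return all three values or stop the run with a message naming what is absent.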
From Code to Cloud: Key Takeaways
This project successfully integrates a diverse set of powerful DevOps tools into a single, cohesive system. The result is more than just a running application; it's a fully automated, observable, and secure platform where infrastructure and configuration are managed entirely through Git. This GitOps approach provides versioned, declarative configuration, peer review for every change, and a complete audit trail, which improves reliability and reduces the risk of manual error.
Building this stack was a fantastic learning experience that solidified my understanding of modern cloud-native operations. For a detailed look at the Terraform modules, Ansible playbooks, and GitHub Actions workflows, feel free to check out my GitHub repository.