From Zero to Production: Ansible Playbook Design Patterns & Best Practices

This guide walks you through building a production‑grade Ansible automation framework—from identifying common manual‑deployment pain points to defining layered architecture, directory conventions, reusable playbook patterns, high‑availability deployments, performance optimizations, monitoring, security hardening, CI/CD integration, and troubleshooting tips—empowering teams to achieve reliable, scalable operations.

Raymond Ops

Automation is no longer optional in the cloud‑native era; manual deployments cause night‑time emergencies, configuration drift, scaling delays, and risky rollbacks. This article provides a step‑by‑step guide to constructing a production‑ready Ansible automation system that eliminates those pain points.

Architecture Overview – Core Principles

Four principles shape the design:

Layered Decoupling – separate application, service, and infrastructure concerns.

Environment Isolation – distinct inventory directories for each stage.

Role‑Driven – reusable roles for common components.

Configuration Externalization – variables stored in group_vars and host_vars.

Application Layer    -> Application deployment
Service Layer        -> Middleware service management
Infrastructure Layer -> Infrastructure configuration

inventories/
├── production/   # production environment
├── staging/      # pre‑production
├── development/  # dev environment
└── testing/      # test environment

roles/
├── common/       # base tasks
├── nginx/        # web server
├── mysql/        # database
└── application/  # app‑specific tasks

Directory Structure Best Practice

ansible-ops/
├── ansible.cfg                # global config
├── site.yml                  # entry point
├── inventories/
│   ├── production/
│   │   ├── hosts
│   │   └── group_vars/
│   └── staging/
├── roles/
│   ├── common/
│   ├── nginx/
│   ├── mysql/
│   └── application/
├── playbooks/                # feature playbooks
├── filter_plugins/
├── callback_plugins/
└── vault/                    # encrypted secrets
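
To make the layout concrete, here is a minimal sketch of what the production inventory could look like in Ansible's YAML inventory format, as an alternative to an INI-style hosts file. The file name hosts.yml, the example.com host names, and the mysql_role host variable are assumptions for illustration; only the web_servers and db_servers group names come from the playbooks in this article.

```yaml
# inventories/production/hosts.yml (hypothetical file name and hosts)
all:
  children:
    web_servers:
      hosts:
        web1.example.com:
        web2.example.com:
        web3.example.com:
    db_servers:
      hosts:
        db1.example.com:
          mysql_role: primary   # hypothetical host variable
        db2.example.com:
          mysql_role: replica   # hypothetical host variable
```

Running ansible-inventory -i inventories/production --graph is a quick way to confirm that the groups resolve as expected.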

Core Design Patterns

Pattern 1 – Multi‑Environment Configuration

Problem: Different environments diverge dramatically.

Solution: Store environment‑specific variables in group_vars and select the inventory at runtime.

# inventories/production/group_vars/all/vars.yml
# "environment" is a reserved Ansible keyword, so the stage name is stored in deploy_env.
deploy_env: production
db_host: prod-db.example.com
redis_host: prod-redis.example.com
app_replicas: 3

# inventories/staging/group_vars/all/vars.yml
deploy_env: staging
db_host: staging-db.example.com
redis_host: staging-redis.example.com
app_replicas: 1

Sensitive data (passwords, API keys) is encrypted with ansible-vault. The vault file sits in the same group_vars/all/ directory so that it applies to every host; a file named group_vars/vault.yml would apply only to a group called vault.

# Create encrypted file
ansible-vault create inventories/production/group_vars/all/vault.yml

# Use in a playbook
- name: Deploy application
  template:
    src: app.conf.j2
    dest: /etc/app/app.conf
  vars:
    db_password: "{{ vault_db_password }}"

Pattern 2 – Role Composition

Combine reusable roles to model complex business logic.

# playbooks/web-cluster.yml
- hosts: web_servers
  roles:
    - common        # base setup
    - firewall
    - nginx
    - { role: ssl, when: use_ssl }
    - monitoring

- hosts: db_servers
  roles:
    - common
    - mysql
    - backup
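
Role composition can also be declared inside a role, so that a play does not have to remember the prerequisites. The following is a minimal sketch, assuming the nginx role needs the common and firewall roles to have run first; the dependency list itself is an assumption and is not taken from the original roles.

```yaml
# roles/nginx/meta/main.yml (hypothetical dependency declaration)
dependencies:
  - role: common     # runs before any nginx task
  - role: firewall   # runs after common, still before nginx
```

With default settings Ansible runs a role only once per play when it is requested several times with the same parameters, so listing common both here and in the play should not execute it twice.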

Pattern 3 – Idempotency Guarantee

Ensure repeated runs produce the same state.

- name: Ensure nginx is installed and configured
  block:
    - name: Install nginx
      yum:
        name: nginx
        state: present
    - name: Configure nginx
      template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
        backup: yes
      notify: restart nginx
    - name: Ensure nginx is running
      service:
        name: nginx
        state: started
        enabled: yes
  rescue:
    - name: Report the failure and stop this host
      fail:
        msg: "Nginx installation or configuration failed on {{ inventory_hostname }}"
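
Modules such as yum, template, and service are idempotent on their own, but command and shell tasks report a change on every run unless told otherwise. The sketch below shows two common guards; the init-db script and the marker file path are hypothetical, and the script is assumed to create the marker file itself.

```yaml
- name: Initialize the application database only once
  command: /opt/app/bin/init-db             # hypothetical script
  args:
    creates: /var/lib/app/.db_initialized   # hypothetical marker; the task is skipped once it exists

- name: Read the installed nginx version without reporting a change
  command: nginx -v
  register: nginx_version_output
  changed_when: false                       # a read-only command never changes state
```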

Production Case – High‑Availability Deployment

Scenario

3 web servers behind a load balancer

Database master‑slave replication

Redis Sentinel for HA

Automatic health checks and failover

Main Playbook (site.yml)

---
- import_playbook: playbooks/infrastructure.yml
- import_playbook: playbooks/database.yml
- import_playbook: playbooks/cache.yml
- import_playbook: playbooks/application.yml
- import_playbook: playbooks/loadbalancer.yml
- import_playbook: playbooks/monitoring.yml

Application Deployment with Rollback

# playbooks/application.yml
- hosts: web_servers
  serial: 1                # rolling update, one host at a time
  max_fail_percentage: 0   # zero tolerance: stop the rollout on the first failed host
  tasks:
    - name: Health check before deployment
      uri:
        url: "http://{{ inventory_hostname }}:{{ app_port }}/health"
        method: GET
        status_code: 200
      delegate_to: localhost   # task keyword, so it sits beside uri, not inside it

    # block/rescue must be a task; everything that can fail is inside the block
    - name: Deploy with automatic rollback
      block:
        - name: Backup current version
          command: cp -r {{ app_path }} {{ app_path }}.backup.{{ ansible_date_time.epoch }}
        - name: Deploy new version
          unarchive:
            src: "{{ app_package }}"
            dest: "{{ app_path }}"
        - name: Deploy application
          include_role:
            name: application
        - name: Restart services
          service:
            name: "{{ item }}"
            state: restarted
          loop: "{{ app_services }}"
        - name: Health check after deployment
          uri:
            url: "http://{{ inventory_hostname }}:{{ app_port }}/health"
            method: GET
            status_code: 200
          delegate_to: localhost
          register: post_deploy_health
          until: post_deploy_health is succeeded
          retries: 30
          delay: 10
      rescue:
        # command runs a single program, so the rollback is split into two tasks
        - name: Remove the failed release
          file:
            path: "{{ app_path }}"
            state: absent
        - name: Restore the backup
          command: mv {{ app_path }}.backup.{{ ansible_date_time.epoch }} {{ app_path }}
        - name: Restart services after rollback
          service:
            name: "{{ item }}"
            state: restarted
          loop: "{{ app_services }}"
        - name: Fail the play
          fail:
            msg: "Deployment failed, rolled back to previous version"

Performance Optimizations

Strategy 1 – Parallel Execution

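# ansible.cfg
[defaults]
# Number of parallel processes. Comments go on their own line:
# ansible.cfg does not accept inline "#" comments after a value.
forks = 50
# Skips SSH host key verification: convenient for short-lived test hosts, unsafe for production.
host_key_checking = False
pipelining = True
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts_cache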
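
Parallelism can also be tuned per play. The sketch below assumes a play where hosts do not need to stay in lockstep; the file name is hypothetical. It uses the free strategy so that fast hosts are not held back by slow ones, and it replaces the implicit full fact gathering with the minimal subset.

```yaml
# playbooks/facts-light.yml (hypothetical file name)
- hosts: web_servers
  strategy: free        # each host works through the task list without waiting for the others
  gather_facts: false   # skip the implicit full fact gathering
  tasks:
    - name: Gather only the minimal fact subset
      setup:
        gather_subset:
          - min

    - name: Show the distribution of each host
      debug:
        msg: "{{ inventory_hostname }} runs {{ ansible_facts['distribution'] }}"
```

The free strategy is a poor fit for rolling updates that rely on serial, because ordering between hosts is no longer guaranteed.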

Strategy 2 – Conditional Execution

- name: Install nginx only on RedHat when version differs
  yum:
    name: nginx
    state: present
  when:
    - ansible_os_family == "RedHat"
    - nginx_version is not defined or nginx_current_version | default('') != nginx_version
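
The condition above relies on nginx_current_version, which the article does not show being set. One possible way to populate it, sketched here under the assumption that nginx is installed from a system package, is the package_facts module.

```yaml
- name: Collect installed package versions
  package_facts:
    manager: auto

- name: Record the installed nginx version, if any
  set_fact:
    nginx_current_version: "{{ ansible_facts.packages['nginx'][0].version }}"
  when: "'nginx' in ansible_facts.packages"
```

When nginx is not installed the fact stays undefined, so the consuming condition should tolerate that case.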

Strategy 3 – Batch Operations

- name: Install multiple packages at once
  yum:
    name: "{{ packages }}"
    state: present
  vars:
    packages:
      - nginx
      - redis
      - mysql-server
      - git

Monitoring & Alerting – Observability Design

Prometheus Integration

# roles/monitoring/tasks/main.yml
- name: Install node_exporter
  get_url:
    url: "{{ node_exporter_url }}"
    dest: /tmp/node_exporter.tar.gz

- name: Configure Prometheus targets
  template:
    src: prometheus.yml.j2
    dest: /etc/prometheus/prometheus.yml
  notify: restart prometheus

- name: Setup alerting rules
  template:
    src: alert.rules.yml.j2
    dest: /etc/prometheus/alert.rules.yml
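
The notify: restart prometheus line only works if a handler with exactly that name exists. A minimal sketch of the matching handler file follows; the service name prometheus is an assumption about how Prometheus is installed on the target host.

```yaml
# roles/monitoring/handlers/main.yml
- name: restart prometheus   # must match the notify string exactly
  service:
    name: prometheus         # assumed service unit name
    state: restarted
```

Handlers run once at the end of the play, even if several tasks notify them.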

Custom Health Check

- name: Custom health check
  uri:
    url: "http://{{ inventory_hostname }}:{{ app_port }}/health"
    method: GET
    return_content: yes
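  register: health_check
  # Pair retries with until; older ansible-core releases ignore retries without it.
  until: health_check.json is defined and health_check.json.status == "ok"
  retries: 3
  delay: 5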

Security Best Practices

Vault‑Managed Secrets

# Use Ansible Vault for sensitive information
- name: Deploy with encrypted variables
  template:
    src: database.conf.j2
    dest: /etc/app/database.conf
    mode: '0600'
  vars:
    db_password: "{{ vault_db_password }}"
    api_key: "{{ vault_api_key }}"
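
Encrypting a value at rest does not stop it from appearing in task output at run time. As a sketch of one mitigation, the task below sets no_log on a call that carries the API key; the endpoint URL is hypothetical and only vault_api_key comes from the example above.

```yaml
- name: Register the service with an external API
  uri:
    url: "https://api.example.com/register"   # hypothetical endpoint
    method: POST
    headers:
      Authorization: "Bearer {{ vault_api_key }}"
    status_code: 200
  no_log: true   # keeps the header value out of console output and callback logs
```

The trade-off is that failures of this task are harder to debug, since its result is hidden as well.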

File Permission Hardening

- name: Ensure proper file permissions
  file:
    path: "{{ item.path }}"
    mode: "{{ item.mode }}"
    owner: "{{ item.owner }}"
    group: "{{ item.group }}"
  loop:
    - { path: "/etc/ssl/private", mode: "0700", owner: "root", group: "root" }
    - { path: "/var/log/app", mode: "0755", owner: "app", group: "app" }

CI/CD Integration – GitLab Example

# .gitlab-ci.yml
stages:
  - syntax-check
  - deploy-staging
  - deploy-production

ansible-syntax:
  stage: syntax-check
  script:
    - ansible-playbook --syntax-check site.yml
    - ansible-lint playbooks/

deploy-staging:
  stage: deploy-staging
  script:
    - ansible-playbook -i inventories/staging site.yml
  only:
    - develop

deploy-production:
  stage: deploy-production
  script:
    - ansible-playbook -i inventories/production site.yml
  only:
    - master
  when: manual
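
A dry run before the manual production job can surface surprises early. The job below is a sketch under the assumption that the runner can reach the production hosts and has access to the vault password; the job name is hypothetical, while --check and --diff are standard ansible-playbook options.

```yaml
# .gitlab-ci.yml (additional job, hypothetical name)
production-dry-run:
  stage: deploy-production
  script:
    # --check reports what would change without applying it; --diff shows file differences
    - ansible-playbook -i inventories/production site.yml --check --diff
  only:
    - master
```

Tasks built on command or shell are skipped in check mode by default, so a clean dry run is an indication, not a guarantee.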

Debugging Techniques

Run with increased verbosity (-vvv shows task details; -vvvv adds connection debugging):

ansible-playbook -vvv site.yml

Debug a specific variable:

- debug:
    var: ansible_facts

Pause execution for manual confirmation:

- pause:
    prompt: "Press enter to continue deployment"
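
Two further debugging aids are worth sketching: failing fast on bad input with assert, and limiting debug output to verbose runs. The variable names db_host and app_replicas come from the group_vars example earlier in this article; the specific checks are assumptions.

```yaml
- name: Validate required variables before deploying
  assert:
    that:
      - db_host is defined
      - app_replicas | int >= 1
    fail_msg: "db_host must be set and app_replicas must be at least 1"

- name: Print the resolved database host only at -vv or higher
  debug:
    var: db_host
    verbosity: 2
```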

Common Issues & Fixes

Issue 1 – SSH Connection Failure

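- name: Test connectivity
  ping:
  ignore_unreachable: yes   # ignore_errors alone does not cover unreachable hosts
  ignore_errors: yes
  register: ping_result

- debug:
    msg: "Host {{ inventory_hostname }} is unreachable"
  when: ping_result.unreachable | default(false) or ping_result is failed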

Issue 2 – Insufficient Privileges

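- name: Task requiring sudo
  service:                  # example action; become applies to whatever module the task runs
    name: nginx
    state: restarted
  become: yes
  become_user: root
  become_method: sudo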

Lessons Learned

Gradual Migration Strategy

Phase 1: Automate infrastructure provisioning.

Phase 2: Automate application deployment.

Phase 3: Automate monitoring and alerting.

Phase 4: Build a full CI/CD pipeline.

Team Collaboration Standards

# Recommended role directory layout
roles/<role_name>/
├── README.md          # role description
├── meta/main.yml      # metadata
├── defaults/main.yml  # default vars
├── vars/main.yml      # role vars
├── tasks/main.yml     # main tasks
├── handlers/main.yml  # handlers
├── templates/         # Jinja2 templates
├── files/             # static files
└── tests/             # role tests
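
The difference between defaults and vars is easy to get wrong, so here is a small sketch for a hypothetical nginx role. Values in defaults/main.yml have the lowest precedence and are meant to be overridden from inventory group_vars; values in vars/main.yml take precedence over inventory variables and suit settings that callers should not normally change. All variable names below are assumptions.

```yaml
# roles/nginx/defaults/main.yml (hypothetical, safe to override per environment)
nginx_worker_processes: auto
nginx_listen_port: 80

# roles/nginx/vars/main.yml (hypothetical, internal to the role)
nginx_config_path: /etc/nginx/nginx.conf
```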

Performance Benchmarking

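# Measure playbook execution time
time ansible-playbook site.yml

# Analyze task duration distribution (needs ansible-core 2.11+ and the ansible.posix collection)
ANSIBLE_CALLBACKS_ENABLED=ansible.posix.profile_tasks ansible-playbook site.yml

# Resume from a specific task instead of re-running everything
ansible-playbook site.yml --start-at-task="Deploy application"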

Future Outlook – Next‑Generation Automation

Ansible Operator: Kubernetes‑native automation.

Event‑Driven Ansible: Reactive automation based on system events.

Ansible Content Collections: Modular distribution of roles, plugins, and modules.

References

GitHub: https://github.com/raymond999999

Gitee: https://gitee.com/raymond9
