Network Automation Pipeline Playbook

Oliver White


Below is a concrete, production-ready playbook for building a network automation pipeline with Ansible, NetBox, and GitHub Actions. It is written as a hands-on blueprint you can follow step by step. It includes example files (playbooks, inventory plugin config, GitHub Actions workflows), best practices (testing, safety, rollbacks), and operational guidance (monitoring, RBAC, audits).

I’ll assume you manage modern network devices (Cisco/Arista/Juniper, Linux routers) and want a GitOps-style workflow where NetBox is the source-of-truth (SoT) and Ansible enforces state.



Stack: NetBox (SoT) → Git repo (configs & playbooks) → GitHub Actions CI → Ansible (NAPALM/Netmiko) → network devices
Goals: safe, auditable, testable, promotable changes across environments (dev → staging → prod)


Contents

  1. Architecture & components
  2. Repo layout (recommended)
  3. NetBox: modelling & integration (SoT)
  4. Ansible: config patterns & examples
  5. Inventory: NetBox dynamic inventory config
  6. GitHub Actions: CI/CD workflows (lint → plan/dry-run → apply)
  7. Testing & validation (ansible-lint, molecule, dry-run, check_mode)
  8. Safety & rollback strategies
  9. Secrets management & security
  10. Observability, auditing & compliance
  11. Deployment checklist & runbook

1 — Architecture (high level)

  • NetBox: authoritative inventory, IPAM, device roles, sites, structured variables (custom fields / device config snippets).
  • Git repo: infrastructure/network holds playbooks, roles, templates, and device config templates; PRs are the change mechanism.
  • GitHub Actions: CI runs lint/tests and produces plan/dry-run output; merging to main triggers staged deploy (with approvals).
  • Ansible Controller: runs in CI (or separate runner) using credentials (stored in Vault/GitHub Secrets); uses NAPALM/Netmiko modules for device config.
  • Monitoring: Prometheus + Grafana + SNMP exporter; NetBox webhooks for change notifications.
  • Audit/tracking: all changes pushed to Git & NetBox change logs; device config backups saved to repo or network config vault.
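The promotion rule in this architecture can be sketched as a small pure function. The rule is: a change reaches an environment only after it succeeded in the previous one, and production also needs approval. Python is used for the sketches in this playbook because the whole toolchain (Ansible, NAPALM, pynetbox) is Python. The `Deployment` record and `can_promote` helper are hypothetical. In the pipeline described here, GitHub Environments and required reviewers are what actually enforce the gate.

```python
from __future__ import annotations

from dataclasses import dataclass

# Promotion order taken from the dev -> staging -> prod flow described above.
ENV_ORDER = ["dev", "staging", "prod"]


@dataclass(frozen=True)
class Deployment:
    """Hypothetical record of one pipeline run of a commit in one environment."""
    commit: str
    env: str
    succeeded: bool


def can_promote(history: list[Deployment], commit: str, target: str,
                prod_approved: bool = False) -> bool:
    """Return True if `commit` may be deployed to `target`.

    The commit must have succeeded in the preceding environment, and
    production additionally requires an explicit approval.
    """
    if target not in ENV_ORDER:
        raise ValueError(f"unknown environment: {target}")
    position = ENV_ORDER.index(target)
    if position == 0:
        return True  # the first environment has no prerequisite
    if target == "prod" and not prod_approved:
        return False
    previous = ENV_ORDER[position - 1]
    return any(d.commit == commit and d.env == previous and d.succeeded
               for d in history)


if __name__ == "__main__":
    runs = [Deployment("abc123", "dev", True), Deployment("abc123", "staging", True)]
    print(can_promote(runs, "abc123", "prod"))                      # False: not approved
    print(can_promote(runs, "abc123", "prod", prod_approved=True))  # True
```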

2 — Repo layout (recommended)

```text
network-automation/
├── ansible.cfg
├── requirements.txt               # Python packages (ansible, napalm, pynetbox)
├── requirements.yml               # Ansible collections (netbox.netbox, napalm.napalm)
├── inventories/
│   └── netbox.yml                 # NetBox inventory plugin config (dynamic)
├── roles/
│   ├── base_config/
│   │   ├── tasks/
│   │   ├── templates/
│   │   └── vars/
│   └── vlan_provision/
│       ├── tasks/
│       ├── templates/
│       └── defaults/
├── playbooks/
│   ├── site.yml                   # main orchestration playbook
│   ├── vlan.yml                   # example change playbook
│   └── backup-configs.yml
├── templates/
│   └── device_config.j2
├── tests/
│   ├── molecule/                  # optional molecule tests
│   └── lint/                      # sample test cases
├── .github/
│   └── workflows/
│       ├── ci.yml                 # lint & dry-run
│       └── deploy.yml             # gated deploy
└── docs/
    └── runbook.md
```

3 — NetBox: modelling & integration (SoT)

NetBox is the single source of truth for:

  • Devices (name, device role, platform, site)
  • Interfaces
  • IPAM (prefixes, IP addresses)
  • VLANs, VRFs, prefix assignments
  • Custom fields for device-level config variables (e.g., device_config_template, os_version, management_ip)
  • Device roles: leaf, spine, router, firewall, switch
  • Platforms: cisco_ios, eos, junos, etc. (used to pick NAPALM driver)
  • Custom Fields (device-level, JSON): config_vars — JSON blob for role-specific vars
  • Tags for environment: env:dev, env:staging, env:prod
  • Secrets / credentials: do not store credentials in NetBox. Use Vault or GitHub Secrets.
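The platform and tag conventions above can be made concrete with a sketch that reduces a NetBox device record to the few values the automation needs. The dictionary shape loosely follows NetBox's REST representation of a device, but treat it as an assumption. The slug-to-driver table (`PLATFORM_TO_NAPALM`) and the `device_context` helper are illustrative and not part of NetBox.

```python
from __future__ import annotations

import json
from typing import Any

# Illustrative mapping from NetBox platform slugs to NAPALM driver names.
PLATFORM_TO_NAPALM = {
    "cisco_ios": "ios",
    "nxos": "nxos",
    "eos": "eos",
    "junos": "junos",
}


def device_context(device: dict[str, Any]) -> dict[str, Any]:
    """Extract driver, environment and config vars from a NetBox-style device dict."""
    slug = (device.get("platform") or {}).get("slug")
    driver = PLATFORM_TO_NAPALM.get(slug)
    if driver is None:
        raise ValueError(f"{device['name']}: no NAPALM driver for platform {slug!r}")

    # The environment comes from a tag named env:<name>; exactly one is expected.
    envs = [t["name"].split(":", 1)[1] for t in device.get("tags", [])
            if t["name"].startswith("env:")]
    if len(envs) != 1:
        raise ValueError(f"{device['name']}: expected one env:* tag, found {envs}")

    # A JSON custom field may arrive already decoded or as a string.
    raw = (device.get("custom_fields") or {}).get("config_vars") or {}
    config_vars = json.loads(raw) if isinstance(raw, str) else dict(raw)

    return {"name": device["name"], "napalm_driver": driver,
            "env": envs[0], "config_vars": config_vars}


if __name__ == "__main__":
    sample = {"name": "leaf01", "platform": {"slug": "eos"},
              "tags": [{"name": "env:staging"}],
              "custom_fields": {"config_vars": '{"mtu": 9214}'}}
    print(device_context(sample))
    # {'name': 'leaf01', 'napalm_driver': 'eos', 'env': 'staging', 'config_vars': {'mtu': 9214}}
```

Failing loudly on a missing driver or env tag keeps bad NetBox data from turning into a silent no-op deploy.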

Webhooks & automation

  • Configure a NetBox webhook to notify CI or run discovery jobs when devices are created or changed (from NetBox 3.7 onward, webhooks are triggered through Event Rules).
  • Use NetBox change logs for auditing.
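A webhook receiver should authenticate NetBox before it starts a CI run. When a webhook has a secret configured, NetBox sends an `X-Hook-Signature` header: the HMAC-SHA512 hex digest of the request body. The sketch below verifies that header and filters events by the payload's `event` and `model` fields. The `TRIGGER_MODELS` set and both helper names are assumptions. How you then start the CI job (for example, a GitHub `repository_dispatch` call) is left out.

```python
import hashlib
import hmac
import json

# Hypothetical choice of NetBox models whose changes should trigger CI.
TRIGGER_MODELS = {"device", "interface", "vlan", "ipaddress", "prefix"}
TRIGGER_EVENTS = {"created", "updated", "deleted"}


def signature_valid(body: bytes, secret: str, header_value: str) -> bool:
    """Check X-Hook-Signature: hex HMAC-SHA512 of the raw request body."""
    expected = hmac.new(secret.encode(), body, hashlib.sha512).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, header_value)


def should_trigger_ci(body: bytes) -> bool:
    """Decide from the webhook payload whether a CI run is warranted."""
    payload = json.loads(body)
    return (payload.get("model") in TRIGGER_MODELS
            and payload.get("event") in TRIGGER_EVENTS)
```

Verify the signature on the raw bytes before parsing. Re-serialising the JSON can change whitespace and break the comparison.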

4 — Ansible: patterns & example playbook

Key practices

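  • Use Ansible collections as needed: napalm.napalm, ansible.netcommon (which provides the network_cli connection plugin), vendor collections such as cisco.ios and arista.eos, and community.general.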
  • Prefer NAPALM or network_cli modules over raw CLI when possible.
  • Use idempotent templates (Jinja2) and named blocks.
  • Use check_mode: true for dry-run planning and NAPALM get_config + compare_config patterns where supported.
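Idempotency is what makes check mode meaningful: a task should report a change only when the desired lines are missing from the running config. The sketch below is a simplified illustration of how line-based config modules compare a `parents` block and its `lines` against the running config. Real modules handle indentation, ordering and match options in more detail. `plan_block` is a hypothetical helper.

```python
from __future__ import annotations


def plan_block(running: str, parent: str, lines: list[str]) -> list[str]:
    """Commands needed so that `parent` contains `lines`; [] means no change.

    Simplified model: a block starts at an unindented line equal to `parent`
    and contains the indented lines that follow it.
    """
    children: set[str] = set()
    found = in_block = False
    for raw in running.splitlines():
        if not raw.strip():
            continue
        if not raw[0].isspace():  # an unindented line starts a new stanza
            in_block = raw.strip() == parent
            found = found or in_block
        elif in_block:
            children.add(raw.strip())
    missing = [line for line in lines if line.strip() not in children]
    if found and not missing:
        return []  # already compliant: check mode would report "ok"
    return [parent] + missing


if __name__ == "__main__":
    running = "hostname leaf01\nvlan 100\n   name users\n"
    print(plan_block(running, "vlan 100", ["name users"]))  # []
    print(plan_block(running, "vlan 200", ["name voice"]))  # ['vlan 200', 'name voice']
```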

ansible.cfg (important defaults)

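```ini
[defaults]
inventory = ./inventories/netbox.yml
roles_path = ./roles
forks = 20
timeout = 30
retry_files_enabled = False
nocows = 1
# yaml callback from community.general; newer ansible-core can use callback_result_format = yaml
stdout_callback = yaml
deprecation_warnings = False
# Convenient in labs; enable host key checking (with managed known_hosts) in production.
host_key_checking = False

[persistent_connection]
# network_cli sessions are persistent connections; their timeouts live here.
connect_timeout = 30
command_timeout = 30

[ssh_connection]
# Speeds up SSH to Linux hosts; has no effect on network_cli sessions.
pipelining = True
```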

Example playbooks/vlan.yml (adds a VLAN across a set of switches)

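```yaml
---
- name: Provision VLAN across edge switches
  # Union of the platform groups (":"), intersected (":&") with the env tag group.
  # Group names are examples: match them to the groups your inventory generates.
  # target_env must match the env tag suffix (dev, staging, prod).
  hosts: "platform_eos:platform_cisco_ios:&tag_env_{{ target_env | default('staging') }}"
  gather_facts: false
  connection: ansible.netcommon.network_cli

  vars:
    vlan_id: 100
    vlan_name: "users"

  tasks:
    - name: Ensure VLAN exists (IOS/NX-OS) via NAPALM merge + commit
      napalm.napalm.napalm_install_config:
        # napalm_provider is a hypothetical var: {hostname, username, password, dev_os}
        provider: "{{ napalm_provider }}"
        config: |
          vlan {{ vlan_id }}
            name {{ vlan_name }}
        replace_config: false
        commit_changes: true
        get_diffs: true
      connection: local   # NAPALM opens its own session from the control node
      when: ansible_network_os in ['cisco_ios', 'nxos']

    - name: Ensure VLAN exists (Arista EOS)
      arista.eos.eos_config:
        parents: "vlan {{ vlan_id }}"
        lines:
          - "name {{ vlan_name }}"
      when: ansible_network_os == 'eos'
```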

Note: napalm_install_config merges the snippet into a candidate configuration and commits it. With commit_changes: false it only reports the diff. Many devices support candidate/commit workflows, so prefer them where available.

Playbook for config backup (backup-configs.yml)

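```yaml
- name: Backup device configs
  hosts: all
  gather_facts: false
  connection: local   # NAPALM modules open their own device sessions from the controller
  tasks:
    - name: Ensure the local backups/ directory exists
      ansible.builtin.file:
        path: backups
        state: directory
        mode: "0750"
      run_once: true

    - name: Get running-config via NAPALM
      napalm.napalm.napalm_get_facts:
        provider: "{{ napalm_provider }}"   # hypothetical var: {hostname, username, password, dev_os}
        filter:
          - config
      register: result

    - name: Save running-config to a timestamped file
      ansible.builtin.copy:
        content: "{{ result.ansible_facts.napalm_config.running }}"
        # gather_facts is off, so use now() rather than ansible_date_time; no colons in filenames.
        dest: "backups/{{ inventory_hostname }}-{{ now(utc=true, fmt='%Y%m%dT%H%M%SZ') }}.cfg"
        mode: "0640"
```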
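Backup naming matters once you need to restore, because the restore step has to find the newest snapshot for a host. The sketch assumes the `backups/<host>-<UTC timestamp>.cfg` layout with a colon-free, fixed-width timestamp. `write_backup` and `latest_backup` are hypothetical helpers, for example for a wrapper that picks the `backup_file` passed to the restore command in the appendix.

```python
from __future__ import annotations

import re
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

STAMP = "%Y%m%dT%H%M%SZ"  # fixed width, so lexical order equals time order


def write_backup(root: Path, host: str, config: str,
                 now: Optional[datetime] = None) -> Path:
    """Write `config` to <root>/<host>-<UTC timestamp>.cfg and return the path."""
    now = now or datetime.now(timezone.utc)
    root.mkdir(parents=True, exist_ok=True)
    path = root / f"{host}-{now.strftime(STAMP)}.cfg"
    path.write_text(config)
    return path


def latest_backup(root: Path, host: str) -> Optional[Path]:
    """Newest backup for exactly `host` (so 'leaf1' never matches 'leaf1-b')."""
    pattern = re.compile(re.escape(host) + r"-\d{8}T\d{6}Z\.cfg")
    matches = sorted(p for p in root.glob("*.cfg") if pattern.fullmatch(p.name))
    return matches[-1] if matches else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "backups"
        write_backup(root, "leaf01", "hostname leaf01\n")
        print(latest_backup(root, "leaf01").name)
```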

5 — Inventory: NetBox dynamic inventory plugin

Use the NetBox inventory plugin (netbox.netbox.nb_inventory, shipped in the netbox.netbox collection). Example inventories/netbox.yml:

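```yaml
plugin: netbox.netbox.nb_inventory
# Base URL of NetBox; the plugin appends the /api/ paths itself.
api_endpoint: https://netbox.example.com
token: "{{ lookup('env', 'NETBOX_API_TOKEN') }}"
# Set to true once NetBox presents a trusted certificate.
validate_certs: false
group_by:
  - site
  - device_role
  - platform
  - tag
compose:
  # The plugin already sets ansible_host from the device's primary IP.
  # Inspect the hostvars it exposes with: ansible-inventory -i inventories/netbox.yml --list
  ansible_network_os: platform.slug
  management_ip: primary_ip.address
query_filters:
  # only include managed devices
  - status: active
```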

Secrets: store NETBOX_API_TOKEN in GitHub Secrets or runner environment — never in repo.
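In rough terms, the inventory plugin turns each NetBox device into an Ansible host with group memberships and composed variables. The sketch below models that idea in simplified form and is not the plugin's code. The device dictionaries are flattened, assumed shapes. The group prefixes (`site_`, `platform_`, `tag_`) mirror the names used in this playbook's host patterns. `build_inventory` is a hypothetical helper.

```python
from __future__ import annotations

import ipaddress
import re
from collections import defaultdict


def _safe(name: str) -> str:
    """Make a string usable as an Ansible group name (letters, digits, _)."""
    return re.sub(r"[^A-Za-z0-9_]", "_", name)


def build_inventory(devices: list[dict]) -> tuple[dict[str, list[str]], dict[str, dict]]:
    """Return (groups, hostvars) for active devices."""
    groups: defaultdict[str, set[str]] = defaultdict(set)
    hostvars: dict[str, dict] = {}
    for dev in devices:
        if dev.get("status") != "active":
            continue  # mirrors the status filter in netbox.yml
        name = dev["name"]
        groups[f"site_{_safe(dev['site'])}"].add(name)
        groups[f"platform_{_safe(dev['platform'])}"].add(name)
        for tag in dev.get("tags", []):
            groups[f"tag_{_safe(tag)}"].add(name)  # env:prod -> tag_env_prod
        # NetBox stores IPs with a prefix length; Ansible needs the bare address.
        primary = dev.get("primary_ip")
        hostvars[name] = {
            "ansible_host": str(ipaddress.ip_interface(primary).ip) if primary else None,
            "ansible_network_os": dev["platform"],
        }
    return {g: sorted(hosts) for g, hosts in groups.items()}, hostvars
```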


6 — GitHub Actions: CI & CD workflows

CI workflow: lint + dry-run (.github/workflows/ci.yml)

  • Run on PR to main.
  • Steps: checkout, setup python, install deps, ansible-lint, yamllint, run ansible-playbook --syntax-check, run ansible-playbook in --check (dry-run) and capture changed/failed hosts.
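
```yaml
name: Network CI

on:
  pull_request:
    branches: [ main ]

jobs:
  lint_and_dryrun:
    # GitHub-hosted runners rarely reach device management networks;
    # a self-hosted runner is the usual choice for the dry-run.
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install tools
        run: |
          python -m pip install --upgrade pip
          pip install ansible ansible-lint yamllint pynetbox napalm
          ansible-galaxy collection install netbox.netbox napalm.napalm

      - name: ansible-lint
        run: ansible-lint -v

      - name: yamllint
        run: yamllint .

      - name: Syntax check
        run: ansible-playbook playbooks/vlan.yml --syntax-check

      - name: Dry-run (check mode with diffs)
        env:
          NETBOX_API_TOKEN: ${{ secrets.NETBOX_API_TOKEN }}
          NE_ANSIBLE_USER: ${{ secrets.NE_ANSIBLE_USER }}
          NE_ANSIBLE_PASS: ${{ secrets.NE_ANSIBLE_PASS }}
          ANSIBLE_NOCOLOR: "1"
        run: |
          set -o pipefail
          ansible-playbook playbooks/vlan.yml -i inventories/netbox.yml --check --diff \
            -e ansible_user="$NE_ANSIBLE_USER" -e ansible_password="$NE_ANSIBLE_PASS" \
            | tee dryrun.log

      - name: Upload dry-run output
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: dryrun-log
          path: dryrun.log
```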
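The CI outline asks to "capture changed/failed hosts". One way to do that is to parse the PLAY RECAP lines that ansible-playbook prints at the end of a run and fail the job when any host failed or was unreachable. This assumes uncoloured output, for example with ANSIBLE_NOCOLOR=1. The script and its invocation on the saved dry-run log are hypothetical.

```python
from __future__ import annotations

import re
import sys

_RECAP_LINE = re.compile(r"(?P<host>\S+)\s+:\s+(?P<stats>(?:\w+=\d+\s*)+)")


def parse_recap(output: str) -> dict[str, dict[str, int]]:
    """Per-host counters from the PLAY RECAP section of ansible-playbook output."""
    stats: dict[str, dict[str, int]] = {}
    in_recap = False
    for line in output.splitlines():
        if line.startswith("PLAY RECAP"):
            in_recap = True
            continue
        match = _RECAP_LINE.fullmatch(line.strip()) if in_recap else None
        if match:
            stats[match["host"]] = {k: int(v) for k, v in
                                    re.findall(r"(\w+)=(\d+)", match["stats"])}
    return stats


def summarize(stats: dict[str, dict[str, int]]) -> tuple[list[str], list[str]]:
    """Return (changed hosts, failed-or-unreachable hosts)."""
    changed = sorted(h for h, s in stats.items() if s.get("changed", 0))
    failed = sorted(h for h, s in stats.items()
                    if s.get("failed", 0) or s.get("unreachable", 0))
    return changed, failed


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as fh:
        changed, failed = summarize(parse_recap(fh.read()))
    print("changed:", ", ".join(changed) or "none")
    print("failed/unreachable:", ", ".join(failed) or "none")
    sys.exit(1 if failed else 0)
```

Run it as, for example, `python recap.py dryrun.log` after the dry-run step. The changed list is also a useful summary to post on the PR.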

Deploy workflow: gated deploy to staging/prod (.github/workflows/deploy.yml)

  • Trigger: merge to main or manual workflow_dispatch.
  • Environments: staging auto deploy, production requires approval (GitHub Environments).
  • Steps: checkout, set up Python and Ansible, back up running configs, run the playbook (no --check), and upload the config backups as build artifacts.
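
```yaml
name: Deploy Network Changes

on:
  push:
    branches: [ main ]          # merge to main deploys to staging
  workflow_dispatch:
    inputs:
      environment:
        description: 'target env'
        required: true
        default: 'staging'
        type: choice
        options:
          - staging
          - production

jobs:
  deploy:
    runs-on: ubuntu-latest
    # Push events carry no inputs, so default to staging.
    environment: ${{ inputs.environment || 'staging' }}
    env:
      NETBOX_API_TOKEN: ${{ secrets.NETBOX_API_TOKEN }}
      NE_ANSIBLE_USER: ${{ secrets.NE_ANSIBLE_USER }}
      NE_ANSIBLE_PASS: ${{ secrets.NE_ANSIBLE_PASS }}

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install packages
        run: |
          pip install ansible napalm pynetbox
          ansible-galaxy collection install netbox.netbox napalm.napalm

      - name: Backup configs (pre-change)
        run: |
          ansible-playbook playbooks/backup-configs.yml -i inventories/netbox.yml \
            -e ansible_user="$NE_ANSIBLE_USER" -e ansible_password="$NE_ANSIBLE_PASS"

      - name: Apply changes (live run)
        run: |
          # NetBox env tags are env:staging / env:prod, so map "production" to "prod".
          ansible-playbook playbooks/vlan.yml -i inventories/netbox.yml \
            -e target_env="${{ inputs.environment == 'production' && 'prod' || 'staging' }}" \
            -e ansible_user="$NE_ANSIBLE_USER" -e ansible_password="$NE_ANSIBLE_PASS"

      - name: Upload config backups
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: config-backups-${{ github.run_id }}
          path: backups/
```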

Production approval: configure the production GitHub Environment with required reviewers.


7 — Testing & validation

  • Static tests: ansible-lint, yamllint, and molecule for roles (e.g. with the Docker driver).
  • Dry-run: always run ansible-playbook --check --diff and capture changed/failed hosts from the PLAY RECAP.
  • Plan output: for NAPALM-capable devices, run napalm_install_config with commit_changes: false (with get_diffs or diff_file) to produce diffs; store them as a CI artifact or attach them to the PR.
  • Unit tests: For Jinja templates, test with jinja2-cli + sample data.
  • Staging gate: apply to staging first, monitor for a fixed soak period (the runbook below uses 15–30 minutes), then promote to prod with manual approval.
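The "monitor, then promote" step is easier to enforce when it is an explicit comparison rather than a judgment call. A sketch, assuming per-interface error counters collected before and after the soak window (for example from the SNMP exporter). The `soak_check` helper and its `max_new_errors` threshold are hypothetical.

```python
from __future__ import annotations


def soak_check(before: dict[str, int], after: dict[str, int],
               max_new_errors: int = 0) -> tuple[bool, dict[str, str]]:
    """Compare interface error counters across the soak window.

    Returns (passed, problems). An interface fails if it disappeared, if its
    counter went backwards (likely a reload), or if errors grew past the limit.
    """
    problems: dict[str, str] = {}
    for ifname, start in before.items():
        if ifname not in after:
            problems[ifname] = "missing after change"
            continue
        delta = after[ifname] - start
        if delta < 0:
            problems[ifname] = "counter reset (device reload?)"
        elif delta > max_new_errors:
            problems[ifname] = f"+{delta} errors"
    return not problems, problems
```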

8 — Safety & rollback strategies

  • Pre-change backup: Always capture running-config and store snapshot in backups/ (or S3/Vault).

  • Transaction support: use device features (candidate/commit/rollback) when supported (Junos, IOS-XR, NX-OS candidate config). Example with NAPALM:

    • fetch get_config before change
    • load_replace_candidate -> compare_config -> commit_config or discard_config
  • Rollback plan:

    • If failed, run restore playbook to push previous config from backup.
    • Keep rollback playbooks and test them in staging.
  • Change windows & maintenance mode: enforce maintenance window for production changes and notify stakeholders.

  • Rate limiting & ramp-up: limit parallelism via Ansible serial parameter (e.g., serial: 5) for gradual rollout.
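The candidate/commit flow and the rollback plan fit together in one sequence. You load a candidate and inspect the diff. You then commit or discard, and roll back if post-change checks fail. The sketch follows NAPALM's driver method names (`load_merge_candidate`, `compare_config`, `commit_config`, `discard_config`, `rollback`). It runs against a hypothetical in-memory `FakeDevice`, so it can be tried without hardware. `apply_change` and its `verify` callback are illustrative.

```python
from __future__ import annotations

import difflib
from typing import Callable, Optional


class FakeDevice:
    """Hypothetical stand-in exposing a subset of NAPALM's driver methods."""

    def __init__(self, running: str):
        self.running = running
        self._previous = running
        self._candidate: Optional[str] = None

    def load_merge_candidate(self, config: str) -> None:
        # Naive merge: append lines that are not already present.
        current = self.running.splitlines()
        extra = [line for line in config.splitlines() if line not in current]
        self._candidate = "\n".join(current + extra)

    def compare_config(self) -> str:
        if self._candidate is None:
            return ""
        return "\n".join(difflib.unified_diff(
            self.running.splitlines(), self._candidate.splitlines(), lineterm=""))

    def commit_config(self) -> None:
        self._previous, self.running = self.running, self._candidate
        self._candidate = None

    def discard_config(self) -> None:
        self._candidate = None

    def rollback(self) -> None:
        self.running = self._previous


def apply_change(dev: FakeDevice, config: str, verify: Callable[[], bool]) -> str:
    """Return 'no-op', 'committed' or 'rolled-back'."""
    dev.load_merge_candidate(config=config)
    if not dev.compare_config():
        dev.discard_config()  # nothing to do: leave the device untouched
        return "no-op"
    dev.commit_config()
    if verify():              # post-change health check
        return "committed"
    dev.rollback()            # restore the pre-change configuration
    return "rolled-back"
```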


9 — Secrets & security

  • Do NOT store credentials in repo.

  • Options:

    • HashiCorp Vault via the community.hashi_vault lookup plugin (distinct from Ansible Vault, which encrypts files inside the repo).
    • GitHub Secrets for GitHub Actions runners.
    • Secrets manager (AWS Secrets Manager / Azure Key Vault) with role-based access.
  • Use per-device credentials when possible (service accounts with least privileges).

  • Use SSH certificates or centralized TACACS+/RADIUS for device access.

  • Secure APIs: NetBox token stored as secret, rotate periodically.

  • Audit access: restrict who can merge to main and require reviews.
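In CI, keeping secrets out of the repo means jobs read them from the environment at runtime and fail fast when one is missing. They must also never echo the values. A minimal sketch, assuming the secret names used in the workflows (`NETBOX_API_TOKEN`, `NE_ANSIBLE_USER`, `NE_ANSIBLE_PASS`). `load_secrets` and `redact` are hypothetical helpers. GitHub Actions also masks registered secrets in its own logs.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

REQUIRED = ("NETBOX_API_TOKEN", "NE_ANSIBLE_USER", "NE_ANSIBLE_PASS")


def load_secrets(env: Mapping[str, str] = os.environ,
                 names: tuple[str, ...] = REQUIRED) -> dict[str, str]:
    """Return the required secrets or raise, naming (never printing) what is missing."""
    missing = [n for n in names if not env.get(n)]
    if missing:
        raise RuntimeError(f"missing required secrets: {', '.join(missing)}")
    return {n: env[n] for n in names}


def redact(text: str, secrets: Mapping[str, str]) -> str:
    """Mask secret values in text before it is logged."""
    # Longest first, so a value that contains another is masked whole.
    for value in sorted(secrets.values(), key=len, reverse=True):
        if value:
            text = text.replace(value, "***")
    return text
```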


10 — Observability, auditing & compliance

  • NetBox: Source-of-truth + change-log history.

  • Device config backups: store snapshots (S3 with lifecycle rules), link to PRs.

  • Prometheus & Grafana:

    • SNMP exporter for device metrics (CPU, memory, interface counters).
    • Dashboards for interface errors, CPU, and temperature, annotated with deployments so that expected changes are visible.
  • Alerting: Alertmanager for thresholds (interface errors > x, down devices).

  • Runbooks & Incident docs: store in docs/runbook.md and mount to cluster or as ConfigMap for quick access.

  • Audit logs: Git history + NetBox audit + CI logs form audit chain.


11 — Deployment checklist & runbook (operational)

Before PR

  • Update NetBox (create/modify object)
  • Update playbook or vars in feature branch
  • Add tests (ansible-lint fixes + unit tests)

Pull Request

  • CI: ansible-lint, yamllint, syntax-check
  • CI: ansible-playbook --check dry-run produced; review plan diffs
  • Peer review: network engineer signs off
  • Merge to main triggers staging deploy

Staging

  • Backup current configs (backup-configs.yml)
  • Apply change to staging
  • Monitor for 15–30 minutes (interface flaps, CPU)
  • If OK, request production approval

Production

  • Schedule maintenance window (if needed)
  • Pre-backup production configs
  • Deploy in small batches (serial: 3 in the play), or use --limit to target a subset first
  • Monitor for issues
  • If failure: run restore playbook from backups & open incident

Post-change

  • Tag release (e.g., network-change-2025-10-03)
  • Document in change log & NetBox notes
  • Remove temporary credentials and move rollout logs to secure storage

Concrete snippets & helpers

Example: Ansible role task roles/base_config/tasks/main.yml

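```yaml
- name: Render configuration template on the controller
  ansible.builtin.template:
    src: device_config.j2
    dest: "/tmp/{{ inventory_hostname }}.cfg"
    mode: "0600"
  delegate_to: localhost

- name: Check connectivity (NAPALM facts)
  napalm.napalm.napalm_get_facts:
    provider: "{{ napalm_provider }}"   # hypothetical var: {hostname, username, password, dev_os}
    filter:
      - facts
  register: facts

- name: Push configuration (merge into candidate, then commit)
  napalm.napalm.napalm_install_config:
    provider: "{{ napalm_provider }}"
    config_file: "/tmp/{{ inventory_hostname }}.cfg"
    replace_config: false
    commit_changes: true
```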

Example: Using serial in playbook to ramp changes

```yaml
- name: Apply VLAN changes in batches of three
  hosts: tag_env_prod
  gather_facts: false
  serial: 3
  tasks:
    - name: Apply VLAN
      ansible.builtin.import_role:
        name: vlan_provision
```
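To see what `serial: 3` does to a rollout, the sketch below splits hosts into batches and halts after any batch with a failure. This is a simplified model. By default, Ansible stops a serial play only when every host in a batch fails. The stricter "stop on any failure" policy modelled here is roughly what `max_fail_percentage: 0` gives. `rollout` and its `apply` callback are hypothetical.

```python
from __future__ import annotations

from typing import Callable, Iterator


def batches(hosts: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of `hosts`, `size` at a time."""
    for i in range(0, len(hosts), size):
        yield hosts[i:i + size]


def rollout(hosts: list[str], size: int,
            apply: Callable[[str], bool]) -> tuple[list[str], list[str], list[str]]:
    """Apply batch by batch; stop after the first batch containing a failure.

    Returns (succeeded, failed, untouched) host lists.
    """
    succeeded: list[str] = []
    failed: list[str] = []
    remaining = list(hosts)
    for batch in batches(list(hosts), size):
        results = {host: apply(host) for host in batch}
        succeeded += [h for h, ok in results.items() if ok]
        failed += [h for h, ok in results.items() if not ok]
        remaining = remaining[len(batch):]
        if failed:
            break  # later batches are never touched
    return succeeded, failed, remaining
```

The untouched list is what limits the blast radius: hosts after the failing batch keep their old config while you investigate.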

Example: Generate plan/diff using NAPALM

Where the platform's NAPALM driver supports it, run napalm_install_config with commit_changes: false and get_diffs: true, register the result, and upload the diff as a GitHub Actions artifact. Driver support varies by device, so confirm it before relying on the output.
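Where the device cannot produce a diff itself, you can approximate the plan on the controller. Diff the backed-up running config against the rendered candidate, then save the result for upload as a CI artifact. A sketch with Python's difflib; the `plans/` directory and the `config_diff`/`write_plan` helpers are assumptions. A text diff is only a preview: it cannot show how the device will interpret a merge.

```python
from __future__ import annotations

import difflib
import tempfile
from pathlib import Path
from typing import Optional


def _lines(text: str) -> list[str]:
    # Ensure a trailing newline so the last line diffs cleanly.
    return (text if text.endswith("\n") else text + "\n").splitlines(keepends=True)


def config_diff(host: str, running: str, candidate: str) -> str:
    """Unified diff from running to candidate config ('' when identical)."""
    return "".join(difflib.unified_diff(
        _lines(running), _lines(candidate),
        fromfile=f"{host}/running", tofile=f"{host}/candidate"))


def write_plan(host: str, running: str, candidate: str, out_dir: Path) -> Optional[Path]:
    """Write <out_dir>/<host>.diff if there is a change; return its path or None."""
    diff = config_diff(host, running, candidate)
    if not diff:
        return None  # nothing to change, nothing to upload
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{host}.diff"
    path.write_text(diff)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        plan = write_plan("leaf01", "vlan 100\n name users\n",
                          "vlan 100\n name users\nvlan 200\n name voice\n",
                          Path(tmp) / "plans")
        print(plan.read_text())
```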


Appendix: Useful commands for operators

  • Dry run: ansible-playbook playbooks/vlan.yml -i inventories/netbox.yml --check
  • Run live (staging): ansible-playbook playbooks/vlan.yml -i inventories/netbox.yml -l tag_env_staging
  • Backup configs: ansible-playbook playbooks/backup-configs.yml -i inventories/netbox.yml
  • Restore from backup (example): ansible-playbook playbooks/restore-config.yml -e backup_file=backups/host-2025-10-03.cfg

Final notes — best practices & tips

  • Start small: automate low-risk tasks first (backups, audits, VLANs) then move to routing changes.
  • Enforce reviews: require two reviewers for production changes.
  • Document rollback: every change PR must include rollback steps.
  • Visibility: post CI dry-run diffs to PR for reviewers to inspect.
  • Test fixtures: keep simulated devices (containers, virtual lab) for testing changes (Cisco VIRL, EVE-NG, or vendor simulators).
  • Compliance: tag changes with ticket IDs, maintain CSRs in NetBox custom fields.