RBAC in Multi-Cluster CI/CD: Case Study | Hokstad Consulting

RBAC in Multi-Cluster CI/CD: Case Study

Role-Based Access Control (RBAC) is crucial for securing Kubernetes clusters in multi-cluster CI/CD environments. Without proper RBAC policies, organisations risk security vulnerabilities, compliance failures, and operational inefficiencies. This case study explores how a UK financial services company resolved RBAC challenges across eight clusters with help from Hokstad Consulting.

Key Takeaways:

  • Challenges Faced:

    • Over-permissive service accounts (e.g., cluster-admin roles used by CI/CD pipelines).
    • Inconsistent RBAC configurations across clusters.
    • Difficulty maintaining audit trails for compliance.
  • Solutions Implemented:

    • Designed a least-privilege RBAC model tailored to user roles and tasks.
    • Centralised RBAC management using GitOps workflows (Argo CD).
    • Phased rollout starting with non-production environments.
    • Automated tools for auditing and monitoring permissions.
  • Results:

    • Improved security by eliminating privilege escalation risks.
    • Simplified compliance with detailed access logs and audit trails.
    • Empowered developers with self-service access to specific namespaces.

This structured approach, emphasising security and operational efficiency, transformed their CI/CD workflows while ensuring regulatory compliance.

Figure: Multi-Cluster RBAC Implementation Framework: Challenges, Solutions, and Results

Challenges in Managing Access Across Multi-Cluster CI/CD

Inconsistent RBAC Across Clusters

The financial services company faced a growing challenge as its cluster environment expanded. Each of its eight clusters had developed its own set of access rules. Development teams had created RBAC policies by hand as needs arose, so permissions varied widely between environments. For example, a grant expressed as a namespaced RoleBinding in the staging cluster might appear as a ClusterRoleBinding in production, giving the same subject cluster-wide access rather than access to a single namespace.

This inconsistency not only caused confusion among engineers but also opened the door to unintended access. A developer with permissions limited to a namespace in one environment could inadvertently gain cluster-wide access in another. Tracking these differences manually became impossible as the organisation scaled. Research highlights this issue, with over 40% of organisations reporting Kubernetes misconfigurations in the past year, and more than 25% identifying access control issues as a high-risk security concern [3]. These irregularities posed serious risks, especially in the context of automated deployments.
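The kind of drift described above can be detected mechanically. The following is a minimal sketch, assuming binding records have already been exported from each cluster into a flat list; the record shape, cluster names, and group names are hypothetical and not taken from the company's environment.

```python
from collections import defaultdict

# Hypothetical inventory: one record per binding, as might be assembled
# from each cluster's RoleBindings and ClusterRoleBindings.
BINDINGS = [
    {"cluster": "staging", "kind": "RoleBinding", "namespace": "payments",
     "subject": "group:payments-devs", "role": "edit"},
    {"cluster": "production", "kind": "ClusterRoleBinding", "namespace": None,
     "subject": "group:payments-devs", "role": "edit"},
    {"cluster": "staging", "kind": "RoleBinding", "namespace": "ledger",
     "subject": "group:ledger-devs", "role": "view"},
    {"cluster": "production", "kind": "RoleBinding", "namespace": "ledger",
     "subject": "group:ledger-devs", "role": "view"},
]


def find_scope_drift(bindings):
    """Return grants that appear both as a RoleBinding and as a
    ClusterRoleBinding somewhere in the fleet, keyed by (subject, role)."""
    kinds_by_grant = defaultdict(lambda: defaultdict(set))
    for b in bindings:
        kinds_by_grant[(b["subject"], b["role"])][b["cluster"]].add(b["kind"])

    drift = {}
    for grant, per_cluster in kinds_by_grant.items():
        all_kinds = set().union(*per_cluster.values())
        if {"RoleBinding", "ClusterRoleBinding"} <= all_kinds:
            drift[grant] = {c: sorted(k) for c, k in per_cluster.items()}
    return drift


for (subject, role), per_cluster in find_scope_drift(BINDINGS).items():
    print(f"{subject} -> {role}: {per_cluster}")
```

A report like this only highlights candidates for review; whether a cluster-wide grant is intentional still needs a human decision.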

Over-Privileged Service Accounts

The company's CI/CD pipelines relied on service accounts with cluster-admin privileges - a practice that introduced serious vulnerabilities. These overly broad permissions directly undermined the security principles of a multi-cluster CI/CD setup. As Kubernetes documentation points out:

Granting permission to create workloads also implicitly grants the API access levels of any service account in that namespace [1].

In practical terms, a compromised Jenkins agent running under a cluster-admin service account could read and modify every resource in the cluster, including Secrets and ConfigMaps in all namespaces, not just its own. The risk didn’t stop there - even a much narrower grant can be dangerous, because any account permitted to create PersistentVolumes can define hostPath volumes and thereby expose the underlying host filesystem. This highlighted the urgent need for more restrictive, least-privilege configurations to minimise exposure.
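A first step towards removing such grants is simply listing them. The sketch below assumes the ClusterRoleBindings of one cluster have been exported as JSON in the standard list shape; the binding and account names are hypothetical sample data.

```python
# Hypothetical sample in the shape of a ClusterRoleBinding list export.
CLUSTER_ROLE_BINDINGS = {
    "items": [
        {
            "metadata": {"name": "jenkins-admin"},
            "roleRef": {"kind": "ClusterRole", "name": "cluster-admin"},
            "subjects": [
                {"kind": "ServiceAccount", "name": "jenkins", "namespace": "ci"}
            ],
        },
        {
            "metadata": {"name": "platform-admins"},
            "roleRef": {"kind": "ClusterRole", "name": "cluster-admin"},
            "subjects": [{"kind": "Group", "name": "sre-team"}],
        },
        {
            "metadata": {"name": "metrics-reader"},
            "roleRef": {"kind": "ClusterRole", "name": "view"},
            "subjects": [
                {"kind": "ServiceAccount", "name": "prometheus",
                 "namespace": "monitoring"}
            ],
        },
    ]
}


def cluster_admin_service_accounts(binding_list):
    """Return (binding name, namespace/name) for every service account
    that is bound to the cluster-admin ClusterRole."""
    findings = []
    for binding in binding_list.get("items", []):
        if binding["roleRef"]["name"] != "cluster-admin":
            continue
        # `subjects` is optional on a binding, so fall back to an empty list.
        for subject in binding.get("subjects") or []:
            if subject.get("kind") == "ServiceAccount":
                account = f'{subject.get("namespace")}/{subject["name"]}'
                findings.append((binding["metadata"]["name"], account))
    return findings


for binding_name, account in cluster_admin_service_accounts(CLUSTER_ROLE_BINDINGS):
    print(f"{account} holds cluster-admin via {binding_name}")
```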

Compliance and Auditability Gaps

The challenges didn’t end with misconfigurations or over-permissioned accounts. Ensuring compliance with financial regulations added another layer of complexity. The organisation struggled to maintain a reliable record of who accessed which resources and when. The manual creation of RBAC rules made it difficult to establish consistent audit trails. This lack of oversight allowed zombie permissions to persist - when a user account was deleted, its RBAC bindings often remained active. As a result, any new user created with the same name would automatically inherit the old permissions [1].
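Lingering bindings of this kind can be surfaced by comparing binding subjects against the current list of known users. This is a minimal sketch under the assumption that both the bindings and the directory's user list are available as plain Python data; all names shown are hypothetical.

```python
def orphaned_user_bindings(bindings, directory_users):
    """Return (binding name, user name) pairs whose User subject no longer
    exists in the identity directory."""
    orphans = []
    for binding in bindings:
        for subject in binding.get("subjects") or []:
            if (subject.get("kind") == "User"
                    and subject.get("name") not in directory_users):
                orphans.append((binding["metadata"]["name"], subject["name"]))
    return orphans


# Hypothetical data: two bindings and the set of users still in the directory.
SAMPLE_BINDINGS = [
    {"metadata": {"name": "ledger-edit"},
     "subjects": [{"kind": "User", "name": "alice@example.com"},
                  {"kind": "User", "name": "bob@example.com"}]},
    {"metadata": {"name": "ci-deploy"},
     "subjects": [{"kind": "ServiceAccount", "name": "deployer",
                   "namespace": "ci"}]},
]
CURRENT_USERS = {"alice@example.com"}

for binding_name, user in orphaned_user_bindings(SAMPLE_BINDINGS, CURRENT_USERS):
    print(f"{binding_name} still grants access to departed user {user}")
```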

To make matters worse, the audit team discovered that some high-privilege configurations bypassed logging mechanisms, making it nearly impossible to track administrative actions. Without centralised identity management, the organisation faced delays in revoking access when employees switched roles or left the company. One industry report summed it up perfectly:

Manual configuration of service accounts and RBAC rules is error-prone and time-consuming, leading to security vulnerabilities [5].

Designing and Implementing RBAC for Multi-Cluster CI/CD

To address inconsistencies and overly permissive configurations, the company adopted a structured, principle-based approach to Role-Based Access Control (RBAC) across its clusters.

Principles and Requirements

The company established four key principles to guide RBAC implementation. Least privilege was the cornerstone - every user and service account was granted only the permissions necessary for their specific tasks. Separation of duties was another critical principle, ensuring developers could deploy to non-production clusters, while operators managed production rollouts [8]. Environment-specific permissions further distinguished access between development and production environments. Lastly, centralised identity management simplified user provisioning and access control.

A significant step involved securing service accounts. The team disabled the automatic mounting of service account tokens by setting automountServiceAccountToken: false for pods that didn’t require API access [1]. This measure reduced the risk of exposing credentials within running containers.
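As an illustration of that setting, the sketch below builds a ServiceAccount and a Pod manifest with token automounting switched off and prints them as a JSON list that could be reviewed or applied. The names, namespace, and image are hypothetical; only the automountServiceAccountToken field comes from the text above.

```python
import json


def service_account(name, namespace):
    """ServiceAccount whose token is not mounted into pods by default."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {"name": name, "namespace": namespace},
        "automountServiceAccountToken": False,
    }


def pod(name, namespace, image, service_account_name):
    """Pod that opts out of token mounting explicitly; the pod-level
    setting takes precedence over the ServiceAccount's value."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "serviceAccountName": service_account_name,
            "automountServiceAccountToken": False,
            "containers": [{"name": name, "image": image}],
        },
    }


# Hypothetical workload that never calls the Kubernetes API.
manifests = {
    "apiVersion": "v1",
    "kind": "List",
    "items": [
        service_account("report-renderer", "reports"),
        pod("report-renderer", "reports",
            "registry.example.com/report-renderer:1.0", "report-renderer"),
    ],
}

print(json.dumps(manifests, indent=2))
```

Setting the field in both places is redundant but makes the intent visible to anyone reading either manifest.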

These principles formed the foundation for the company’s multi-cluster RBAC architecture.

RBAC Architecture and Role Design

The company implemented a two-tier GitOps architecture using Argo CD. A Management GitOps instance was responsible for cluster-wide configurations such as Namespaces, NetworkPolicies, Quotas, and RBAC objects. This instance was restricted to the Site Reliability Engineering (SRE) team. Separate Application GitOps instances empowered developers to deploy workloads within predefined boundaries [8].

Argo CD’s AppProject resources enabled multi-tenancy. Each project defined restrictions on valid Git repositories, deployment destinations (clusters and namespaces), and permissible Kubernetes resources [8][10]. The team set the default policy to role:none, ensuring authenticated users had no permissions unless explicitly granted [7][8]. To streamline access, corporate Active Directory groups were mapped to custom Argo CD roles, each defining specific permissions for applications, clusters, and projects. Integration with AWS Identity Center supported up to 1,000 identities per Argo CD instance, which comfortably met the organisation’s needs [6][9].
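To make the project and role model more concrete, here is a hedged sketch that generates an AppProject and an argocd-rbac-cm ConfigMap as JSON. The project name, repository URL, cluster URL, directory group, and the argocd namespace are all assumptions for illustration; the company's actual definitions are not shown in the source.

```python
import json


def app_project(name, repo_urls, cluster_url, namespaces, allowed_kinds):
    """AppProject limiting source repos, destinations, and resource kinds."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "AppProject",
        "metadata": {"name": name, "namespace": "argocd"},  # assumed namespace
        "spec": {
            "sourceRepos": list(repo_urls),
            "destinations": [
                {"server": cluster_url, "namespace": ns} for ns in namespaces
            ],
            # Empty list: the project may not manage cluster-scoped resources.
            "clusterResourceWhitelist": [],
            "namespaceResourceWhitelist": [
                {"group": group, "kind": kind} for group, kind in allowed_kinds
            ],
        },
    }


def rbac_config_map(role, project, actions, directory_groups):
    """argocd-rbac-cm with a deny-by-default policy and one custom role."""
    lines = [
        f"p, role:{role}, applications, {action}, {project}/*, allow"
        for action in actions
    ]
    lines += [f"g, {group}, role:{role}" for group in directory_groups]
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": "argocd-rbac-cm", "namespace": "argocd"},
        "data": {
            "policy.default": "role:none",
            "policy.csv": "\n".join(lines) + "\n",
        },
    }


# Hypothetical tenant definition.
project = app_project(
    "payments",
    ["https://git.example.com/payments/deploy.git"],
    "https://kubernetes.default.svc",
    ["payments-dev", "payments-staging"],
    [("apps", "Deployment"), ("", "Service"), ("", "ConfigMap")],
)
rbac = rbac_config_map("payments-deployer", "payments",
                       ["get", "sync"], ["AD-Payments-Developers"])

print(json.dumps([project, rbac], indent=2))
```

Each policy line names an explicit action and project, so members of the mapped group can view and sync applications in the payments project and nothing else.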

With this architecture in place, GitOps became the backbone for managing and enforcing RBAC policies.

GitOps-Based Rollout

All RBAC manifests - including Roles, RoleBindings, ClusterRoles, and AppProjects - were stored in Git repositories, serving as the single source of truth. This approach ensured automatic versioning, peer-reviewed changes via pull requests, and a complete audit trail of all modifications. The team used Kustomize to manage modular policy files, maintaining base configurations with environment-specific overlays [10][11].

Before rolling out changes to production, engineers validated policies with argocd admin settings rbac validate and used argocd admin settings rbac can to test specific permissions against local policy files [12]. To minimise risks, they avoided using wildcards in policy definitions, explicitly specifying resources and actions to prevent privilege escalation as clusters evolved. By default, Argo CD polled Git repositories every six minutes, but webhook integration reduced synchronisation times to seconds for critical updates [6].
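The no-wildcards rule is easy to enforce as a pre-merge check on Kubernetes Role and ClusterRole manifests. The following is a small sketch of such a lint, assuming manifests have already been parsed into dictionaries; the role shown is a hypothetical example, and this check is separate from the Argo CD CLI validation mentioned above.

```python
def wildcard_findings(role):
    """Return (role name, rule index, field) for every rule field that
    contains the "*" wildcard."""
    findings = []
    for index, rule in enumerate(role.get("rules") or []):
        for field in ("apiGroups", "resources", "verbs"):
            if "*" in (rule.get(field) or []):
                findings.append((role["metadata"]["name"], index, field))
    return findings


# Hypothetical role: the first rule is explicit, the second uses a wildcard.
CI_DEPLOYER = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "Role",
    "metadata": {"name": "ci-deployer", "namespace": "payments"},
    "rules": [
        {"apiGroups": ["apps"], "resources": ["deployments"],
         "verbs": ["get", "list", "patch"]},
        {"apiGroups": [""], "resources": ["configmaps"], "verbs": ["*"]},
    ],
}

for name, index, field in wildcard_findings(CI_DEPLOYER):
    print(f"{name}: rule {index} uses a wildcard in {field}")
```

Run in a pull-request pipeline, a non-empty result could fail the build so that wildcard grants never reach the main branch.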

Implementation Journey and Key Solutions

To address the identified gaps, the company overhauled its Role-Based Access Control (RBAC) for multi-cluster CI/CD. The process began with a thorough audit of existing permissions. Engineers reviewed service accounts and bindings across all clusters and mapped out access patterns. The audit revealed several over-privileged service accounts with cluster-admin rights being used for routine deployment tasks. These were replaced with custom roles whose scopes were narrowed to match each task.

Access Inventory and Risk Assessment

Automated tools played a crucial role in analysing role bindings and generating detailed reports. These reports flagged accounts with excessive permissions, providing the team with a clear picture of the access landscape. Service accounts were categorised based on their activity - active versus dormant - and actual usage patterns were studied. This analysis enabled the team to craft custom roles that better matched operational requirements. The findings were then validated through a targeted pilot deployment, ensuring the new RBAC configurations worked effectively in practice.
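The active-versus-dormant split can be expressed as a simple classification over last-use timestamps. The sketch below assumes those timestamps have already been derived from audit data; the 90-day threshold and the account names are illustrative assumptions, not figures from the case study.

```python
from datetime import datetime, timedelta, timezone


def classify_service_accounts(last_seen, now, dormant_after=timedelta(days=90)):
    """Split accounts into active and dormant based on their last recorded
    API activity. Accounts never seen (None) count as dormant."""
    result = {"active": [], "dormant": []}
    for account in sorted(last_seen):
        seen = last_seen[account]
        if seen is not None and now - seen <= dormant_after:
            result["active"].append(account)
        else:
            result["dormant"].append(account)
    return result


# Hypothetical activity data, keyed by namespace/name.
NOW = datetime(2024, 6, 1, tzinfo=timezone.utc)
LAST_SEEN = {
    "ci/build": NOW - timedelta(days=2),
    "ci/legacy-deploy": NOW - timedelta(days=200),
    "ops/never-used": None,
}

report = classify_service_accounts(LAST_SEEN, NOW)
print("active: ", report["active"])
print("dormant:", report["dormant"])
```

Dormant accounts are candidates for removal, while active ones are the input for designing narrower custom roles.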

Pilot Deployment in Non-Production Environments

The company opted for a phased approach, starting with a pilot programme in non-production environments. Separate namespaces were established for development, staging, and production, creating clear access boundaries. Developers had full access in the development environment, but their permissions were restricted in staging and production to safeguard those environments. Service accounts were further segmented by CI/CD pipeline stages. For instance, a build account was restricted to pulling code, while a deployment account was limited to updating specific resources. Broad roles like cluster-admin were eliminated in favour of custom roles that explicitly defined required resources and actions. To ensure consistency and maintain an audit trail, RBAC manifests were stored in Git.
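A deployment account scoped in this way might look like the following sketch, which generates a namespaced Role and RoleBinding as JSON. The namespace, account name, and exact verb list are assumptions chosen for illustration; the source does not list the roles the company actually defined.

```python
import json

RBAC_API = "rbac.authorization.k8s.io"


def deployer_role(namespace, name="ci-deployer"):
    """Role that can read and update Deployments in one namespace only."""
    return {
        "apiVersion": f"{RBAC_API}/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [
            {
                "apiGroups": ["apps"],
                "resources": ["deployments"],
                "verbs": ["get", "list", "watch", "patch", "update"],
            }
        ],
    }


def bind_service_account(role, account_name, account_namespace):
    """RoleBinding that grants the given Role to a single service account."""
    return {
        "apiVersion": f"{RBAC_API}/v1",
        "kind": "RoleBinding",
        "metadata": {
            "name": f'{role["metadata"]["name"]}-binding',
            "namespace": role["metadata"]["namespace"],
        },
        "subjects": [
            {"kind": "ServiceAccount", "name": account_name,
             "namespace": account_namespace}
        ],
        "roleRef": {
            "apiGroup": RBAC_API,
            "kind": "Role",
            "name": role["metadata"]["name"],
        },
    }


# Hypothetical pipeline stage: the deploy account lives in the "ci" namespace
# and may only touch Deployments in "payments-staging".
role = deployer_role("payments-staging")
binding = bind_service_account(role, "pipeline-deploy", "ci")

print(json.dumps([role, binding], indent=2))
```

Because the grant is a Role and RoleBinding rather than their cluster-wide counterparts, it has no effect outside the target namespace.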

Monitoring and Continuous Improvement

After the pilot, the company implemented continuous monitoring to fine-tune the RBAC policies. Regular audits and automated scans were conducted to identify over-privileged or unused service accounts, reducing security risks. This ongoing process ensured the RBAC policies stayed aligned with operational changes and evolving needs. By balancing strong security measures with developer efficiency, the organisation maintained a dynamic yet secure environment for its multi-cluster CI/CD operations.

Outcomes and Lessons Learned

The RBAC overhaul brought tangible improvements in security, compliance, and efficiency. By eliminating over-privileged cluster-admin service accounts from routine deployment tasks and introducing comprehensive audit trails, the company significantly reduced its exposure to security risks while simplifying compliance reviews.

Security and Compliance Gains

The updated RBAC model introduced stricter controls by restricting the escalate and bind verbs, effectively preventing privilege escalation. This ensured users couldn't bypass safeguards to grant themselves higher permissions [2][1]. Broad, insecure group bindings were disabled, and custom RBAC policies were implemented to limit default group access, such as system:authenticated [2]. With these changes, the organisation gained complete visibility into resource access, making it far easier to meet regulatory requirements with minimal manual intervention. This robust security framework also gave development teams the confidence to operate within a safer environment.
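One way to verify that escalate and bind are not granted unintentionally is to scan role definitions for them. This is a minimal sketch, assuming roles are available as parsed dictionaries; the sample role is hypothetical, and wildcards are treated as matches because they implicitly include these verbs.

```python
RBAC_GROUP = "rbac.authorization.k8s.io"
ESCALATION_VERBS = {"escalate", "bind"}
ROLE_RESOURCES = {"roles", "clusterroles"}


def _matches(values, wanted):
    """True if the rule field contains a wildcard or any wanted value."""
    values = set(values or [])
    return "*" in values or bool(values & wanted)


def escalation_rules(role):
    """Return indexes of rules that allow escalate or bind on Roles or
    ClusterRoles, either explicitly or through wildcards."""
    hits = []
    for index, rule in enumerate(role.get("rules") or []):
        if (_matches(rule.get("apiGroups"), {RBAC_GROUP})
                and _matches(rule.get("resources"), ROLE_RESOURCES)
                and _matches(rule.get("verbs"), ESCALATION_VERBS)):
            hits.append(index)
    return hits


# Hypothetical ClusterRole with one safe rule and two risky ones.
SAMPLE_ROLE = {
    "metadata": {"name": "team-lead"},
    "rules": [
        {"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list"]},
        {"apiGroups": [RBAC_GROUP], "resources": ["clusterroles"],
         "verbs": ["bind"]},
        {"apiGroups": ["*"], "resources": ["*"], "verbs": ["*"]},
    ],
}

print(f'{SAMPLE_ROLE["metadata"]["name"]}: risky rules at {escalation_rules(SAMPLE_ROLE)}')
```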

Efficiency and Developer Autonomy

The new system empowered developers with self-service access to their designated namespaces, allowing them to work independently in development environments without waiting for administrative approvals. At the same time, production environments remained tightly controlled, with roles tailored to prevent accidental changes while still enabling essential deployment actions. A centralised RBAC dashboard streamlined tracking across all clusters, cutting down troubleshooting time. Additionally, automated gates and visual promotion policies replaced manual scripts, ensuring strong security measures without slowing delivery [4]. These changes highlight best practices for managing multi-cluster CI/CD environments.

Key Takeaways for Similar Projects

Collaboration across teams was crucial. Security, platform, and development teams worked together to define roles that balanced safeguarding resources with maintaining productivity. Treating RBAC policies as code - storing manifests in Git and managing them via GitOps - ensured consistency and created an auditable record of changes. A phased rollout combined with continuous audits validated the RBAC model's effectiveness, ensuring policies stayed aligned with operational demands while maintaining a solid security framework.

Conclusion

This case study highlights how a structured approach to Role-Based Access Control (RBAC) can turn multi-cluster CI/CD environments from potential security risks into well-organised and efficient systems. The organisation addressed several common challenges - such as inconsistent cluster configurations, overly permissive service accounts, and compliance gaps - by adopting GitOps workflows, creating custom namespace-specific roles, and enforcing centralised policies. A phased implementation ensured the solution was thoroughly tested before being rolled out to critical systems.

By following this strategy, the organisation not only reduced risks but also established a new standard for secure and efficient CI/CD operations. With stricter security controls, privilege escalation risks were eliminated, while comprehensive audit trails made regulatory compliance more straightforward. Additionally, self-service access empowered developers to work independently without compromising the stability of production systems. The RBAC framework, managed as code and supported by automated checks and regular audits, provided a robust combination of security, transparency, and operational efficiency.

A major factor in this success was the collaboration between security, platform, and development teams. Together, they defined roles that maintained a balance between strong protection and developer productivity. Storing RBAC policies in Git and managing them through GitOps ensured an auditable history of all changes, keeping policies aligned with the organisation's evolving needs.

Hokstad Consulting played a pivotal role in guiding this transformation. They specialise in helping organisations navigate complex DevOps challenges, with services ranging from cloud cost engineering to strategic migrations and custom automation. Their No-Savings-No-Fee model guarantees businesses only pay when measurable improvements are achieved. Whether you're managing multi-cloud environments or streamlining CI/CD workflows, partnering with experts like Hokstad Consulting can turn security and compliance hurdles into opportunities for competitive growth.

FAQs

How does using a least-privilege RBAC model improve security in multi-cluster CI/CD workflows?

Implementing a least-privilege RBAC model ensures that users, service accounts, and automation tools are granted only the permissions essential for their tasks. By doing so, it reduces the likelihood of unauthorised access, curbs the chances of privilege escalation, and helps shrink the overall attack surface.

On top of that, this model establishes clear and auditable access controls across all clusters in your CI/CD pipeline. This makes it simpler to oversee and manage permissions while upholding strong security practices.

What are the advantages of using GitOps workflows for managing RBAC across multiple clusters?

GitOps workflows let you handle RBAC policies as declarative, version-controlled code stored in Git. By doing so, you create a single source of truth, ensuring permissions remain consistent across all clusters. Automating these tasks helps minimise configuration drift while simplifying the management of compliance and security.

Another advantage of GitOps is the auditable change history it provides. This makes tracking and reviewing updates to RBAC policies straightforward. The result? Operations become more efficient, and your CI/CD workflows gain greater transparency and accountability.

How can organisations maintain compliance and track activity in a multi-cluster CI/CD environment?

To ensure compliance and effectively monitor activity in a multi-cluster CI/CD environment, organisations should set up a centralised Kubernetes audit logging system. This system should enforce consistent logging policies across all clusters. Encrypting these logs and storing them in a tamper-proof solution ensures they remain secure and unaltered.

Incorporating policy-as-code tools, such as OPA or Kyverno, into the CI/CD pipelines adds another layer of automated compliance checks. Tools like Falco or Trivy can also be integrated to scan deployments and identify vulnerabilities in real time. Together, these measures create a detailed and reliable audit trail for every action within the system.