
Enterprise AI Tooling Governance: RBAC, MCP Configs, and SSO for GitHub Copilot, Claude Code, and Cursor

Most enterprise AI tooling rollouts are unprepared for the governance questions that follow deployment. Here is the framework I use: RBAC tiers, MCP server configurations, Keycloak SSO integration, and LLM acceptance criteria for engineering workflows.

Most enterprise AI developer tool rollouts follow the same pattern: a few teams get GitHub Copilot access, someone installs Cursor, someone else starts using Claude Code, and six months later security asks "wait, what data is going to which models?" The governance question arrives after the deployment, not before it.

I have been building the governance layer for AI developer tooling at Oracle -- RBAC policies, MCP server configurations, SSO via Keycloak, and LLM acceptance criteria for engineering workflows. This is what that framework looks like and where I think most approaches are getting it wrong.

The governance gap that matters

The security conversation about AI developer tools usually focuses on data exfiltration: code snippets leaving the organization, prompts containing credentials, completions suggesting insecure patterns. These are real concerns. They are also the obvious concerns -- most enterprise security teams have at least a partial answer for them.

The concern that gets less attention: access control is binary in most deployments. You either have GitHub Copilot or you do not. You either can configure MCP servers or you cannot. The intermediate states -- different access tiers based on role, data classification, or team -- are either absent or enforced only by a policy document rather than by technical controls.

Binary access creates two problems. First, high-privilege capabilities (MCP servers that can query production databases, agents that can create PRs, tools with broad file system access) are accessible to everyone who has the base tool. Second, when you do need to restrict or audit a specific capability, you have no mechanism for it.

RBAC tiers that map to actual risk

The tier model I use maps access to risk level rather than to job title:

Tier 1: Completions and chat only. Inline code completions, chat within the IDE, no tool use, no external connections. This tier covers the broadest set of engineers and carries the lowest risk. GitHub Copilot's base license and Claude Code in a mode without tool permissions both fit here.

Tier 2: Local tool use. File reads and writes, shell execution in a sandboxed environment, no network access beyond the model API. Engineers who need agents to refactor, test, or document code at the file or project level. Claude Code with bash and read/write permissions but no external API tools.

Tier 3: Integrated tool use. MCP servers with connections to internal systems: code search, documentation, CI/CD status. Engineers who need the model to have context from internal tooling to be useful. This is where most of the governance complexity lives.

Tier 4: Agentic with external writes. MCP servers or agents that can create issues, open PRs, post to Slack, trigger deployments. A small set of users -- senior engineers who have demonstrated they understand the blast radius of an autonomous action going wrong.

The tiers are enforced technically, not just stated in policy. Tier 1 users cannot install MCP server configurations because the configuration file location is managed by the provisioning system. Tier 3 users can install MCP servers from an approved catalog. Tier 4 access requires an approval workflow.
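
What that enforcement looks like in practice is a capability map keyed by tier, which the provisioning system consults before it writes (or refuses to write) the managed configuration files. A minimal sketch, with illustrative names rather than the exact schema we use:

TIER_CAPABILITIES = {
    1: {"completions", "chat"},
    2: {"completions", "chat", "local_tools"},  # file read/write, sandboxed shell
    3: {"completions", "chat", "local_tools", "catalog_mcp"},  # MCP servers from the approved catalog
    4: {"completions", "chat", "local_tools", "catalog_mcp", "external_writes"},  # PRs, issues, deploys
}

def can_use(tier: int, capability: str) -> bool:
    """True if the given tier grants the capability."""
    return capability in TIER_CAPABILITIES.get(tier, set())

assert can_use(3, "catalog_mcp")
assert not can_use(1, "catalog_mcp")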

MCP server governance

The Model Context Protocol is now the standard way to extend AI coding tools with tool access. Every major AI IDE -- Claude Code, Cursor, Cline, Continue -- supports it. The governance problem is that MCP servers are essentially arbitrary code running on the developer's machine with whatever permissions the developer has.

The controls I have implemented:

Approved catalog with pinned versions. Internal tooling manages an approved-mcp-servers.json catalog. Engineers at Tier 3+ can install from the catalog but cannot configure arbitrary MCP servers. Each catalog entry pins the server version and the tool set available:

{
  "catalog_version": "1.4",
  "servers": [
    {
      "name": "internal-code-search",
      "source": "internal-registry/mcp-code-search",
      "version": "2.1.0",
      "allowed_tools": ["search_code", "get_file", "list_repos"],
      "denied_tools": ["write_file", "delete_file", "create_repo"],
      "data_classification": "internal",
      "tier_required": 3
    },
    {
      "name": "jira-readonly",
      "source": "internal-registry/mcp-jira",
      "version": "1.3.2",
      "allowed_tools": ["get_issue", "search_issues", "get_project"],
      "denied_tools": ["create_issue", "update_issue", "delete_issue"],
      "data_classification": "internal",
      "tier_required": 3
    }
  ]
}
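
On the provisioning side, the catalog gets filtered by the user's tier and rendered into the configuration the tool actually reads. A rough sketch, assuming a hypothetical internal launcher that resolves pinned versions from the registry; the output shape loosely follows the mcpServers format common MCP clients use, and the names here are illustrative:

import json

def render_user_mcp_config(catalog: dict, user_tier: int) -> dict:
    """Filter the approved catalog by tier and emit a per-user MCP client config."""
    servers = {}
    for entry in catalog["servers"]:
        if user_tier < entry["tier_required"]:
            continue  # the user's tier does not grant this server
        servers[entry["name"]] = {
            # Hypothetical internal launcher that pulls the pinned version
            # and applies the allow/deny tool lists at runtime.
            "command": "mcp-launcher",
            "args": [
                "--source", entry["source"],
                "--version", entry["version"],
                "--allow-tools", ",".join(entry["allowed_tools"]),
            ],
        }
    return {"mcpServers": servers}

with open("approved-mcp-servers.json") as f:
    catalog = json.load(f)

print(json.dumps(render_user_mcp_config(catalog, user_tier=3), indent=2))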

Scoped credentials per server. MCP servers that connect to internal systems use service accounts scoped to exactly the operations the catalog entry allows. The Jira MCP server's service account can read issues but cannot create them, regardless of whether the MCP server code tries to.

Audit logging. Every MCP tool invocation is logged: timestamp, user, server name, tool name, input parameters (truncated to avoid logging sensitive content), and response size. These logs feed into OCI Logging and are queryable by security. This is where you catch unexpected usage -- a user who is supposed to be at Tier 3 but whose audit log looks like Tier 4 agentic use.
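
The shape of one audit record, with the truncation rule applied before anything leaves the developer's machine (field names are illustrative; the real records land in OCI Logging):

import hashlib
import json
from datetime import datetime, timezone

MAX_PARAM_CHARS = 256  # truncate inputs so sensitive content is not logged verbatim

def audit_record(user: str, server: str, tool: str, params: dict, response: bytes) -> dict:
    """Build one MCP tool-invocation audit record (illustrative field names)."""
    raw = json.dumps(params, sort_keys=True)
    return {
        "ts": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "server": server,
        "tool": tool,
        "params_truncated": raw[:MAX_PARAM_CHARS],
        "params_sha256": hashlib.sha256(raw.encode()).hexdigest(),  # fingerprint of the full input for correlation
        "response_bytes": len(response),
    }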

SSO integration with Keycloak

Keycloak 26.x handles centralized identity across the AI tooling stack. For CLI tools, the OAuth 2.0 device authorization flow is the natural fit -- Claude Code and the Cline VSCode extension both support it. SAML integration is available for tools that require it (GitHub Copilot in some enterprise configurations), but the OAuth path is simpler to manage.

The Keycloak realm configuration that matters for AI tooling:

{
  "realm": "ai-developer-tools",
  "clients": [
    {
      "clientId": "claude-code-enterprise",
      "protocol": "openid-connect",
      "publicClient": false,
      "standardFlowEnabled": false,
      "deviceAuthorizationGrantEnabled": true,
      "attributes": {
        "access.token.lifespan": "3600",
        "client.session.max.lifespan": "28800"
      }
    }
  ],
  "roles": {
    "realm": [
      {"name": "ai-tools-tier1"},
      {"name": "ai-tools-tier2"},
      {"name": "ai-tools-tier3"},
      {"name": "ai-tools-tier4"}
    ]
  }
}
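
For the CLI side, the device authorization flow against this realm looks roughly like the following. The endpoint paths are Keycloak's standard OpenID Connect paths; the host, client secret handling, and error handling are simplified for illustration:

import time
import requests

BASE = "https://keycloak.example.com/realms/ai-developer-tools/protocol/openid-connect"
CLIENT = {"client_id": "claude-code-enterprise", "client_secret": "<from-secret-store>"}

# Step 1: request a device code and tell the user where to approve it.
dev = requests.post(f"{BASE}/auth/device", data=CLIENT).json()
print(f"Visit {dev['verification_uri_complete']} to authorize this CLI")

# Step 2: poll the token endpoint until the user approves or the code expires.
while True:
    time.sleep(dev.get("interval", 5))
    resp = requests.post(f"{BASE}/token", data={
        **CLIENT,
        "grant_type": "urn:ietf:params:oauth:grant-type:device_code",
        "device_code": dev["device_code"],
    })
    body = resp.json()
    if resp.ok:
        access_token = body["access_token"]  # carries the ai-tools-tier* realm roles
        break
    if body.get("error") != "authorization_pending":
        raise RuntimeError(body)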

Role assignment flows from the HR system via SCIM provisioning. When an engineer's role changes or they leave, their AI tool access is revoked automatically within the SCIM sync interval (currently 15 minutes) rather than requiring a manual ticket.

The session lifetime configuration is a deliberate decision. The 8-hour client session cap means developers re-authenticate at the start of each workday. This is slightly more friction than a 30-day session, and it is the right trade-off: AI tools with write permissions should require re-authentication frequently enough that a stolen session token has a short window.

LLM acceptance criteria for engineering workflows

This is the part most governance frameworks skip. They define who can access the tools but not what "good" output looks like from those tools for the organization's specific workflows.

I define acceptance criteria as explicit rubrics for each workflow where AI output feeds into a production decision. Two examples:

Code review assistance. When Claude Code or Copilot Chat is used to summarize a PR for review, the output is evaluated against: Does it correctly identify the change type (feature, refactor, bug fix)? Does it surface security-relevant changes (new dependencies, credential handling, input validation)? Does it miss any changed files? The acceptance criteria for this workflow are evaluated quarterly on a sample of 50 AI-assisted reviews, with human-reviewed ground truth.
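
Encoded as a rubric, that workflow looks something like the sketch below; criterion names and the scoring scheme are illustrative, and the scores come from a human reviewer comparing each summary against ground truth:

from dataclasses import dataclass

# Illustrative PR-summary rubric; each criterion is judged against
# human-reviewed ground truth for a sampled AI-assisted review.
PR_SUMMARY_RUBRIC = [
    "identifies_change_type",      # feature / refactor / bug fix
    "surfaces_security_changes",   # new dependencies, credential handling, input validation
    "covers_all_changed_files",
]

@dataclass
class SampleResult:
    pr_id: str
    scores: dict[str, bool]  # criterion -> met / not met

def pass_rate(results: list[SampleResult], criterion: str) -> float:
    """Fraction of sampled reviews where the summary met one criterion."""
    return sum(r.scores[criterion] for r in results) / len(results)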

Infrastructure plan review. When an agent is used to summarize a terraform plan output before human approval, the output is evaluated against: Does it correctly identify resource creates, updates, and destroys? Does it flag destructive operations? Does it identify resources that are outside the normal change pattern (unexpected deletions, new IAM policies)? False negatives on this workflow have higher cost than false positives, so the rubric is tuned toward recall.
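
Tuning toward recall means the score that gates this workflow is recall over destructive operations, not overall accuracy. A minimal sketch with hypothetical resource addresses:

def destructive_recall(ground_truth: set[str], flagged_by_model: set[str]) -> float:
    """Recall over the destructive operations a human identified in the plan."""
    if not ground_truth:
        return 1.0
    return len(ground_truth & flagged_by_model) / len(ground_truth)

# The summary missed one of three destructive changes -> recall 0.67,
# which fails a rubric that requires catching every destroy.
print(destructive_recall(
    {"aws_db_instance.prod", "aws_iam_policy.legacy", "aws_s3_bucket.logs"},
    {"aws_db_instance.prod", "aws_s3_bucket.logs"},
))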

These acceptance criteria serve two purposes. First, they give you a measurable signal when a model update changes behavior in a way that affects your workflows -- you run the eval before and after the upgrade. Second, they give the security team a concrete answer to "how do you know the AI tool is producing correct output" rather than a vague "the engineers review everything."

Where most approaches go wrong

Most enterprise AI tooling governance I have seen has the same gap: it is a policy document that no technical control enforces. "Engineers should not input confidential data into external AI tools" is a policy. A network proxy that inspects requests to external LLM APIs and blocks patterns matching internal IP ranges and credentials is a control.
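
A minimal sketch of the kind of pattern check such a proxy applies to outbound request bodies; the patterns here are deliberately simple examples, not a production rule set:

import re

# Illustrative block patterns: RFC 1918 addresses and two common credential shapes.
BLOCK_PATTERNS = [
    re.compile(r"\b10\.\d{1,3}\.\d{1,3}\.\d{1,3}\b"),             # 10.0.0.0/8
    re.compile(r"\b192\.168\.\d{1,3}\.\d{1,3}\b"),                # 192.168.0.0/16
    re.compile(r"\b172\.(1[6-9]|2\d|3[01])\.\d{1,3}\.\d{1,3}\b"), # 172.16.0.0/12
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),            # PEM private keys
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),                          # AWS access key ID shape
]

def should_block(body: str) -> bool:
    """True if an outbound LLM API request body matches any block pattern."""
    return any(p.search(body) for p in BLOCK_PATTERNS)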

I am not claiming the policy-only approach is useless. It establishes accountability and it is auditable in the sense that you can point to it when something goes wrong. But it does not prevent the wrong thing from happening -- it just creates a paper trail after it does.

The controls that actually reduce risk are: provisioning-managed configuration files (prevent unauthorized MCP servers), catalog-enforced approved servers (limit the blast radius of any one server), scoped service account credentials (limit what an MCP server can do regardless of what it tries), and session lifetime constraints (limit the window of a compromised credential).

The acceptance criteria work is less about preventing the bad thing and more about building the organizational confidence to use these tools at higher-leverage points. You do not let an AI tool summarize a terraform plan for a production change without first knowing that its summaries are accurate on your infrastructure patterns. The eval work is what lets you extend trust responsibly rather than either blocking the tool entirely or using it without knowing how it behaves on your specific workflows.

What I have not solved

MCP server sandboxing is the unresolved problem. The current model is that MCP servers run as the developer's local process with the developer's file system access. An approved catalog limits which servers run, but it does not limit what an approved server can do outside of its defined tool set at the API level. The right solution is container isolation for MCP server processes -- the server runs in a container with a limited set of mounts and no network access beyond its defined upstream. I have a prototype of this working for the Jira MCP server. Generalizing it to the full catalog is on the roadmap.
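
The prototype's shape, roughly: the launcher runs the approved server as a container with a read-only filesystem and a network that only routes to its defined upstream, with stdio wired through to the MCP client. The image name, network name, and flags below are illustrative:

import subprocess

# Sketch: run an approved MCP server inside a container instead of as a
# local process. "mcp-jira-egress" stands in for a Docker network whose
# only route is the Jira upstream; the image tag matches the catalog pin.
subprocess.run([
    "docker", "run", "--rm", "-i",    # -i keeps stdin open for the stdio transport
    "--read-only",                    # no writable container filesystem
    "--network", "mcp-jira-egress",   # restricted egress instead of the host network
    "internal-registry/mcp-jira:1.3.2",
])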


The AI platform governance work described here is one of the services I offer through Optivulnix. If you are designing an enterprise AI tooling governance framework and want to compare approaches, my calendar is open.