Kubernetes · May 27, 2023 · 6 min read

Why I Set Kubernetes CPU Requests Equal to Limits (And You Should Too)


The 8-Core Mystery That Changed My Approach

Our production workload was configured for 8 CPU cores. The request was set to 8, limits were set to 8. Everything should have been perfect.

But it wasn't getting 8 cores.

After weeks of investigation, I discovered something that fundamentally changed how I think about Kubernetes resource management. The problem wasn't our configuration—it was everyone else's.

The Noisy Neighbor Problem (And Why It's Worse Than You Think)

Here's what was happening on our node: Our workload had both CPU requests and limits set to 8 cores, following what seemed like good practice. But the other workloads on the same node? They had 2 CPU requests with 16 CPU limits—the "typical best practice" we'd all been taught.

Those other workloads were consuming 12-14 CPUs each, way more than their 2-CPU requests. The scheduler saw "plenty of room" based on requests, but reality on the node was very different. Our "guaranteed" 8 cores shrank to 1-2 cores.

The revelation: Setting requests lower than limits creates a scheduling lie.

Understanding What Requests and Limits Actually Do

Before diving into the solution, let's clarify what these settings actually control. Requests tell the Kubernetes scheduler how much resource to reserve when placing pods—it's the scheduler's primary decision-making input. Limits, on the other hand, define the maximum resources a container can consume before being throttled or killed.

Here's the crucial part: the scheduler only looks at requests when deciding where to place pods. It assumes your workload will use what it requests. When workloads consistently exceed their requests, the scheduler makes bad decisions.
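
To make the mismatch concrete, here's a sketch of the "typical" pattern our neighbors were running (illustrative values, not their exact manifests). The scheduler reserves only the request; the limit is how far the container is allowed to burst:

resources:
  requests:
    cpu: 2            # the scheduler reserves only this when placing the pod
    memory: "4Gi"
  limits:
    cpu: 16           # but the container may burst all the way up to this
    memory: "16Gi"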

The Counter-Intuitive Solution: Requests = Limits

resources:
  limits:
    cpu: 8
    memory: "16Gi"
  requests:
    cpu: 8            # same as limits
    memory: "16Gi"    # same as limits

"But that's wasteful!"

That's what I thought too. Here's why it's not:

Resource Efficiency Through Honesty

When you set requests equal to limits, you're not wasting resources—you're creating honest resource accounting. Combined with the Horizontal Pod Autoscaler, this approach actually becomes more efficient. The HPA can make accurate scaling decisions because it's working with real resource usage data, not wishful thinking. Performance becomes predictable because there's no surprise resource starvation. Capacity planning becomes honest because you know exactly what each pod consumes. Even the scheduler's bin packing improves because it's making informed placement decisions based on reality.
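
Here's a minimal HPA sketch to show what I mean (the Deployment name and target values are placeholders, not a recommendation). Because requests equal limits, "70% of the requested CPU" is also 70% of the limit, so the utilization target maps directly onto real headroom:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-workload              # placeholder: your Deployment's name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-workload
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # 70% of the request, which is also 70% of the limit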

When Horizontal Scaling Has Limitations

This approach becomes even more critical when horizontal scaling is constrained. Think about database-heavy applications where connection pools limit how many instances you can run, or enterprise software where licensing costs scale with every new instance. Sometimes shared state makes scaling complex, or you're dealing with legacy applications that simply can't scale horizontally.

In these cases, getting your requested resources becomes crucial. You can't just "scale out" when performance degrades.

What Causes Resource Starvation?

The root cause is simple: resource allocation dishonesty at scale.

When multiple workloads on a node have higher limits than requests, they compete for resources the scheduler doesn't know about. The node reaches capacity, but the scheduler continues placing pods because the "requested" resources look available.

This creates a cascade of performance issues. Existing workloads start getting throttled as they compete for resources. New workloads can't get their requested resources because the node is secretly oversubscribed. Performance becomes unpredictable, and debugging becomes a nightmare because the symptoms don't match your configuration.

Being a Good Citizen Isn't Enough

Here's the harsh reality: setting your own requests equal to limits makes you a "good citizen" in the cluster, but it doesn't guarantee you'll get your requested resources.

Why? Because being good doesn't protect you from bad neighbors.

Our 8-core workload was the perfect citizen—it requested exactly what it needed and never tried to consume more. But the other workloads on the node were requesting 2 CPUs while actually consuming 12-14 CPUs each. They were lying to the scheduler.

The real guarantee comes from everyone being a good citizen. When every workload on a node promises to only consume what it requests, then and only then can the scheduler make honest decisions. When everyone follows the requests=limits rule, no one can steal resources that weren't allocated to them.

This is why cluster-wide adoption is crucial—it's not about your individual workload's behavior, it's about creating an environment where resource promises actually mean something.

The Hard Truth About Implementation

Here's what no one wants to hear: you can't solve this problem halfway.

Starting with just your critical workloads won't protect them from noisy neighbors. Making your team a good citizen while other teams continue to lie to the scheduler? Your workloads will still get starved.

This isn't a gradual rollout strategy—it's a collective action problem. Either your entire cluster adopts honest resource allocation, or the scheduler continues making decisions based on lies.

A Practical Migration Path: The Good Citizens Nodepool

But there is a way to migrate incrementally while maintaining the guarantees. Create a dedicated nodepool exclusively for workloads that follow the requests=limits rule.

Here's how it works: All workloads start on the existing "wild west" nodepools where noisy neighbors continue their resource grabbing. When a team commits to being good citizens—setting requests equal to limits—their workloads get labeled and gain access to the protected nodepool.

# Good citizen workload gets the special label
metadata:
  labels:
    resource-policy: "honest"

# Node selector ensures it only runs with other good citizens
spec:
  nodeSelector:
    nodepool: "good-citizens"
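
One detail the snippet above doesn't cover: a nodeSelector only steers good citizens toward the pool, it doesn't keep everyone else out. If you want the pool to stay exclusive, tainting those nodes and adding a matching toleration is one way to do it (the key and value below are illustrative, reusing the label from this example):

# Taint the nodepool so only tolerating workloads can schedule there, e.g.:
#   kubectl taint nodes -l nodepool=good-citizens resource-policy=honest:NoSchedule
spec:
  tolerations:
    - key: "resource-policy"
      operator: "Equal"
      value: "honest"
      effect: "NoSchedule"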

This creates positive incentives. Teams that adopt honest resource allocation get guaranteed performance. Teams that continue gaming the system stay in the chaotic environment they created. Over time, the good citizens nodepool grows while the wild west shrinks.

The beauty: Good citizens are immediately protected from noisy neighbors, and the scheduler can make honest decisions within the protected nodepool because everyone there follows the same rules.

When Everyone Follows This Rule, Magic Happens

The benefits compound when adopted cluster-wide. Resource allocation becomes honest, so scheduler decisions actually reflect reality. Performance becomes predictable—no more mystery slowdowns that take hours to debug. Scaling decisions become accurate because the HPA is working with real data instead of guesses. Capacity planning improves dramatically because you know your actual resource needs. Most importantly, multi-tenancy actually works because teams can't steal each other's resources anymore.

The Real Best Practice

Stop setting requests lower than limits unless you have a specific reason. The traditional advice creates resource contention in multi-tenant clusters.

New rule: Default to requests = limits, deviate only when you understand the tradeoffs.
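
If you want to nudge a whole namespace toward that default, a LimitRange is one option (values below are illustrative; note it only fills in fields a container omits, it won't rewrite requests and limits a team sets explicitly):

apiVersion: v1
kind: LimitRange
metadata:
  name: requests-equal-limits    # applied per namespace
spec:
  limits:
    - type: Container
      default:                   # used when a container omits limits
        cpu: "1"
        memory: "1Gi"
      defaultRequest:            # used when a container omits requests
        cpu: "1"
        memory: "1Gi"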

Conclusion: Embrace Resource Honesty

The lesson from our 8-core mystery is simple: honesty in resource allocation prevents chaos.

When every workload accurately declares its resource needs through equal requests and limits, Kubernetes can do what it does best—make intelligent scheduling decisions based on real data.

Your future self (and your on-call teammates) will thank you for choosing predictable performance over theoretical efficiency. In production environments, guaranteed resources beat optimistic resource sharing every time.

Stop letting noisy neighbors steal your CPU cycles. Set requests = limits, let the HPA handle scaling, and watch your performance issues disappear.
