Managing Actions per Second (APS) Limits in Temporal Cloud

If you're running Workflows on Temporal Cloud, you've probably noticed that each Namespace comes with an Actions Per Second (APS) limit. But what exactly does that mean, and why does it matter?

In Temporal, an "action" is any operation that modifies Workflow state or interacts with the Temporal service. Your Namespace's APS limit controls how many of these operations can happen per second across all Workflows within that Namespace. When the APS limit is reached, Temporal begins to throttle requests. Depending on the business priority of the Workflow, this may be fine or it may have significant impact.

The difficulty is that APS consumption isn't always intuitive. A single Workflow Execution generates multiple actions from the moment it starts, and use cases that fit nicely within APS limits at small scale can exhaust those limits as they grow. Many customers are surprised to find they're hitting APS constraints well before they expected to based on their Workflow count alone.

This guide will help you understand why customers hit APS limits, how to design Workflows that use actions efficiently, and what to do when you're approaching capacity. When design changes aren't enough, Temporal Cloud offers a Provisioned Capacity Mode that lets you reserve additional capacity using Temporal Resource Units (TRUs) for spiky or unpredictable workloads.

Whether you're just getting started with Temporal Cloud or optimizing an existing deployment, managing APS effectively is key to building scalable, reliable applications.

Understanding Actions in Temporal

Before we dive into why customers hit APS limits, let's talk about what actions are.

What Counts as an Action?

In Temporal, actions are the fundamental operations that drive your Workflows forward. Here's an overview of what counts, with the full list in our documentation.

  • Workflows: Starting, completing, resetting. Also starting Child Workflows, as well as Schedules and Timers
  • Activities: Starting, retrying, Heartbeating
  • Signals, Updates, and Queries

Actions that count toward an APS limit are, with a few exemptions, the same as actions that are billable. The key insight here is that nearly everything that happens in Temporal--state changes, decision points, interactions--is counted as an action.

The Action Multiplier Effect

When you start a single Workflow, you're performing far more than one action as far as APS is concerned, because a Workflow isn't a single atomic operation; it's a series of events that Temporal orchestrates. Each Activity is an Action, so a Workflow that kicks off several Activities up front produces a burst of Actions right at the start. On top of that, there are often business reasons to start multiple Workflows at the same time.

These can all contribute to the multiplier effect.
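
To make the multiplier concrete, here's a minimal sketch using the Temporal Python SDK. The Workflow, Activities, and timeouts are illustrative placeholders rather than a recommended design; the point is simply how quickly Actions accumulate.

```python
from datetime import timedelta

from temporalio import activity, workflow


@activity.defn
async def reserve_inventory(order_id: str) -> None:
    ...  # call your inventory service


@activity.defn
async def charge_payment(order_id: str) -> None:
    ...  # call your payment provider


@workflow.defn
class OrderWorkflow:
    @workflow.run
    async def run(self, order_id: str) -> None:
        # Starting this Workflow is already an Action; each Activity start
        # (plus any retries and Heartbeats) adds more. Even this small
        # Workflow produces a burst of Actions right at the beginning,
        # and starting many of these at once multiplies that burst.
        await workflow.execute_activity(
            reserve_inventory, order_id,
            start_to_close_timeout=timedelta(seconds=30),
        )
        await workflow.execute_activity(
            charge_payment, order_id,
            start_to_close_timeout=timedelta(seconds=30),
        )
```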

The Effect of Rate Limiting

In Temporal Cloud, the effect of rate limiting is increased latency, not lost work. Workers might take longer to complete Workflows.

Common Reasons Customers Hit APS Limits

Now that you understand how actions are defined and how they count toward APS limits, let's look at the patterns that most commonly push customers into APS constraints.

Bursty Traffic

Most businesses don't operate at constant velocity—they have rhythms, cycles, and spikes. These patterns can create APS challenges because Temporal Cloud enforces limits at the per-second level.

Common bursty patterns include:

  • Calendar-driven spikes: Month-end financial close processes, quarterly reporting Workflows, payroll that runs on the 1st and 15th, scheduled batch jobs that kick off at midnight. These create predictable but intense load concentrations.
  • Event-driven surges: Product launches, marketing campaigns, flash sales, breaking news, or seasonal events like Black Friday.
  • Recovery scenarios: When a downstream dependency fails and then recovers, you often get a thundering herd effect—hundreds or thousands of Workflows that were waiting all suddenly resume execution simultaneously, creating an artificial spike in APS consumption.
  • Geographic/business hours concentration: Global applications often see load follow the sun, with peak activity during business hours in each region. If your business concentrates in specific markets, you'll see daily peaks rather than even 24/7 distribution.
  • Retry Storms: When a large number of Workflows are stuck on a failing Activity and the retry interval is very short, the retries alone can cause a spike in Actions.
  • Timer Storms: A large number of Workflows all set a Timer for the exact same time, triggering a spike as those Timers fire and their follow-on Activities run, generating many actions at once.

These types of processes can result in your Namespace averaging 200 APS over a day, but spiking to 800 APS or more during a peak hour, day, or event.

How to Mitigate

You can’t change the patterns of how customers interact with your systems, but there are some adjustments you can make to your Workflows to make traffic patterns more consistent, especially for use cases where immediate response isn’t necessary.

These adjustments include:

  • Implement application-level queuing or rate limiting to smooth out predictable spikes.
  • For scheduled batch operations, stagger start times rather than triggering everything at once--implement jitter in your high-volume Schedules.
  • Implement jitter when starting Workflows, such as with Start Delay (see the sketch after this list).
  • Accept the rate limiting, if the affected Workflows can tolerate the added latency.
  • Use Provisioned Capacity (covered later in this guide) to raise the limit for the spike.
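
As one example of jitter, here's a hedged sketch of the Start Delay approach using the Temporal Python SDK (the start_delay option requires a recent SDK version; the endpoint, Workflow name, and Task Queue are placeholders). Each start is delayed by a random amount so a batch of starts spreads over a window instead of landing in the same second.

```python
import random
from datetime import timedelta

from temporalio.client import Client


async def start_orders_with_jitter(order_ids: list[str]) -> None:
    # Connection details are placeholders; add your mTLS or API key config.
    client = await Client.connect(
        "<namespace>.<account>.tmprl.cloud:7233",
        namespace="<namespace>.<account>",
    )
    for order_id in order_ids:
        # Spread starts across a 10-minute window instead of all at once.
        jitter = timedelta(seconds=random.uniform(0, 600))
        await client.start_workflow(
            "OrderWorkflow",        # illustrative Workflow name
            order_id,
            id=f"order-{order_id}",
            task_queue="orders",    # illustrative Task Queue
            start_delay=jitter,     # server delays the first Workflow Task
        )
```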

Cascading Workflows and Fan-Out Patterns

Decomposing complex processes into parent and Child Workflows (or with Nexus) is a common and often appropriate pattern, but the APS costs multiply dramatically with depth and fan-out.

Consider an order fulfillment Workflow that spawns Child Workflows for payment processing, inventory management, shipping, and customer notifications. Each Child Workflow goes through its full action lifecycle (start, tasks, activities, completion), and all of those actions count toward the APS limits on your Namespace.

This pattern appears frequently in:

  • Batch processing: A parent Workflow processes a file with 1,000 records, spawning a Child Workflow for each record. Batch processing also tends to be bursty at the moment each batch begins.
  • Map-reduce patterns: Data processing Workflows that fan out to process partitions in parallel, then aggregate results.

The challenge compounds further when you have multiple levels of nesting--parent Workflows that create children, which create their own children.

How to Mitigate

  • Evaluate whether Child Workflows are necessary--other options include Activities or Workflows in another Namespace (via Nexus)
  • When you do use Child Workflows, limit fan-out size--design a Child Workflow to process its work in batches rather than one Child per work item (see the sketch after this list). This sample application shows more detail.
  • Consider flattening deeply nested hierarchies into shallower structures.
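
Here's a minimal sketch of that batching idea in the Temporal Python SDK. The chunk size, Workflow names, and record type are illustrative; the point is that 1,000 records become 10 Child Workflows instead of 1,000.

```python
import asyncio

from temporalio import workflow


@workflow.defn
class ProcessRecordBatch:
    @workflow.run
    async def run(self, records: list[str]) -> None:
        ...  # process the whole batch, e.g. one Activity per chunk of records


@workflow.defn
class ProcessFileWorkflow:
    @workflow.run
    async def run(self, records: list[str]) -> None:
        # One Child Workflow per chunk of 100 records instead of one per record.
        chunk_size = 100
        children = [
            workflow.execute_child_workflow(
                ProcessRecordBatch.run,
                records[i : i + chunk_size],
                id=f"{workflow.info().workflow_id}-batch-{i // chunk_size}",
            )
            for i in range(0, len(records), chunk_size)
        ]
        await asyncio.gather(*children)
```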

Human-in-the-Loop Processes at Scale

Workflows that incorporate human decision-making--approvals, reviews, manual data entry, quality checks--tend to be long-running and interaction-intensive, which creates sustained APS load.

These Workflows typically receive Signals for each human decision and serve Queries from UIs that display current state and pending tasks.

At small scale, this is manageable. But when you're running thousands of them at the same time--like a content moderation queue with pending reviews, or a loan approval system processing applications, or a support ticket system managing thousands of open cases--the cumulative APS load from all of those long-running Workflows adds up.

How to Mitigate

  • Avoid polling patterns where UIs constantly Query Workflow state. Instead, push state changes to a database that UIs can read, as in the sketch below.
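
As a sketch of that push model, using the Temporal Python SDK: the Workflow publishes each state change once through an Activity that writes to whatever store your UI already reads, and a Signal carries the human decision in. The Activity body, store, and names are placeholders.

```python
from datetime import timedelta
from typing import Optional

from temporalio import activity, workflow


@activity.defn
async def publish_status(case_id: str, status: str) -> None:
    ...  # write to the database or cache your UI reads from


@workflow.defn
class ReviewCaseWorkflow:
    def __init__(self) -> None:
        self.decision: Optional[str] = None

    @workflow.signal
    def submit_decision(self, decision: str) -> None:
        self.decision = decision

    @workflow.run
    async def run(self, case_id: str) -> str:
        # Push each state change out once instead of having UIs poll Queries.
        await workflow.execute_activity(
            publish_status, args=[case_id, "awaiting_review"],
            start_to_close_timeout=timedelta(seconds=10),
        )
        # Wait durably for the human decision; no polling Timers involved.
        await workflow.wait_condition(lambda: self.decision is not None)
        await workflow.execute_activity(
            publish_status, args=[case_id, f"decided:{self.decision}"],
            start_to_close_timeout=timedelta(seconds=10),
        )
        return self.decision
```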

Real-Time SLAs and Deadline Management

Businesses with strict service level agreements often implement active monitoring and escalation in their Workflows. This is commonly done by setting a Timer every [x] minutes to check whether an SLA deadline is approaching, allowing the Workflow to trigger escalations or alerts.

Each of these Timers and monitoring actions counts toward APS. When you have thousands of in-flight Workflows all actively monitoring their own SLAs, the background load becomes significant: you're consuming substantial APS capacity even when Workflows aren't doing their primary work.

How to Mitigate

  • Use longer monitoring intervals where possible. For example, check SLAs every 30 minutes rather than every 1 minute.
  • Where possible, consolidate Timers. Rather than 10 Timers that check 10 tasks, have 1 Timer and then check those 10 tasks.
  • Where possible, have an external system Signal your Workflow rather than using short-lived Timers to poll (see the sketch after this list).
  • For retries, use exponential backoff with reasonable initial intervals.
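
A minimal sketch of the consolidated approach in the Temporal Python SDK: a single Timer covers the whole SLA window, and an external Signal resolves the wait early, instead of a loop of short polling Timers. The Workflow, Signal, and Activity names are illustrative, as is the assumption that the timed-out wait raises asyncio.TimeoutError.

```python
import asyncio
from datetime import timedelta

from temporalio import workflow


@workflow.defn
class TicketWorkflow:
    def __init__(self) -> None:
        self.resolved = False

    @workflow.signal
    def mark_resolved(self) -> None:
        self.resolved = True

    @workflow.run
    async def run(self, ticket_id: str, sla: timedelta) -> None:
        # One Timer for the whole SLA window instead of checking every minute:
        # wake up only when the ticket is resolved or the deadline passes.
        try:
            await workflow.wait_condition(lambda: self.resolved, timeout=sla)
        except asyncio.TimeoutError:
            # Deadline passed without resolution; escalate once.
            await workflow.execute_activity(
                "escalate_ticket",  # illustrative Activity name
                ticket_id,
                start_to_close_timeout=timedelta(seconds=30),
            )
```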

Additional Design Patterns

Some design patterns that lead to high APS consumption show up consistently across many different types of business use cases.

Many Small Activities

Consider two approaches to processing 1,000 records:

  • Approach A: Create a Workflow that spawns 1,000 separate activities, one per record.
  • Approach B: Create a Workflow that spawns 10 activities, each processing 100 records in a batch.

Approach B clearly results in far lower APS consumption, as sketched below. This is a simple example, but the pattern shows up everywhere: processing individual transactions versus batches, sending individual notifications versus bulk operations, or making separate API calls versus batch endpoints. Each separate Activity adds Action overhead.
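
Here's a minimal sketch of Approach B with the Temporal Python SDK; the chunk size, Activity body, and names are illustrative.

```python
from datetime import timedelta

from temporalio import activity, workflow


@activity.defn
async def process_records(records: list[str]) -> None:
    ...  # handle the whole batch in one call


@workflow.defn
class BatchWorkflow:
    @workflow.run
    async def run(self, records: list[str]) -> None:
        # Approach B: 10 Activities of 100 records each rather than 1,000
        # single-record Activities, so far fewer Actions for the same work.
        chunk_size = 100
        for i in range(0, len(records), chunk_size):
            await workflow.execute_activity(
                process_records,
                records[i : i + chunk_size],
                start_to_close_timeout=timedelta(minutes=5),
            )
```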

How to Mitigate

  • Consider whether you can combine multiple external calls within a single Activity.
  • If you're processing a large amount of data, process it in chunks rather than one record per Activity.

Multiple Use Cases in One Namespace

Often when starting with Temporal, the first use case is implemented in a single Namespace, generally one per logical environment. When the second use case is implemented, it runs in the same Namespace, and the same happens for the third, fourth, and so on.

An APS limit is set per Namespace, so multiple use cases with multiple traffic patterns in the same Namespace can exhaust this limit quickly.

How to Mitigate

Plan for a set of Namespaces (one per environment) per use case. See Managing a Namespace for more details.

Provisioned Capacity and TRUs

The strategies above help you design Workflows that use actions efficiently. But sometimes you need more capacity than the on-demand model provides, especially for spiky or unpredictable workloads.

Temporal Cloud offers two Capacity Modes:

  • On-Demand mode (default): Your Namespace automatically scales based on your trailing 7-day usage. This works well for steady, predictable workloads.
  • Provisioned mode: You reserve capacity by adding Temporal Resource Units (TRUs), giving you guaranteed headroom for traffic spikes.

See Capacity Modes for complete details on TRUs, available increments, and how to manage capacity via UI, CLI, or API.

Choosing the Right Approach

Use Provisioned capacity when the on-demand model can't respond quickly enough:

  • Planned spikes (promotions, holiday traffic, product launches): Pre-provision TRUs before the event starts.
  • Unplanned spikes (sudden traffic surges, viral events): React instantly via UI, CLI, or API when you see throttling.
  • Load testing (validating new services at scale): Provision TRUs for the test, deprovision after.
  • Batch jobs (scheduled high-throughput jobs): Automate TRU scaling via API around job schedules.
  • Migrations (onboarding a new workload faster than on-demand adjusts): Bridge with TRUs for approximately 7 days while the on-demand envelope catches up.

Note: When switching back to on-demand mode, your APS limit resets to the running average from the last 7 days. Plan for this if your workload is sensitive to the transition.

Cost Optimization Tips

See Capacity Modes Pricing for billing details. To minimize costs:

  • Provision only when you need extra capacity
  • Deprovision promptly after spikes end
  • For predictable patterns, automate scaling to minimize time in provisioned mode

Automation Best Practices

Since you understand your workload patterns better than any auto-scaling system, consider building your own TRU automation:

  • Use the Cloud Ops API, Terraform Provider, or tcld CLI to programmatically scale capacity based on your application's signals
  • Set utilization thresholds: For example, scale up when hitting 70-80% of your limit, scale down after sustained low usage
  • Schedule capacity changes: Use Temporal Schedules or Workflows to increase TRUs before known events (see the sketch after this list)
  • React to leading indicators: If your application has upstream signals (incoming order queue depth, marketing campaign start), use those to trigger capacity changes proactively
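
One possible shape for the scheduled approach, sketched with the Temporal Python SDK. Everything here is an assumption for illustration: the ScaleUpCapacityWorkflow (which would call the Cloud Ops API, Terraform Provider, or tcld to add TRUs), the Schedule ID, Task Queue, and the cron expression for the event. A companion Schedule would scale back down afterwards.

```python
from temporalio.client import (
    Client,
    Schedule,
    ScheduleActionStartWorkflow,
    ScheduleSpec,
)


async def schedule_capacity_bump(client: Client) -> None:
    # Start a hypothetical ScaleUpCapacityWorkflow shortly before a known
    # traffic event; that Workflow is where you'd call the Cloud Ops API
    # or tcld to provision additional TRUs.
    await client.create_schedule(
        "scale-up-before-peak-event",
        Schedule(
            action=ScheduleActionStartWorkflow(
                "ScaleUpCapacityWorkflow",   # illustrative Workflow name
                id="scale-up-before-peak-event-run",
                task_queue="capacity-ops",   # illustrative Task Queue
            ),
            spec=ScheduleSpec(cron_expressions=["0 5 28 11 *"]),  # illustrative timing
        ),
    )
```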

Knowing if You're Hitting APS Limits

In addition to understanding the patterns that can affect APS limits on a Temporal Namespace, it's also important to know if you're approaching (or exceeding) these limits. Temporal Cloud provides several metrics that, if tracked, will tell you if you're being rate limited due to APS. See the documentation on detecting resource exhaustion for an explanation of those metrics as well as a sample Grafana dashboard that shows how they could be viewed.

Monitoring for TRU Decisions

If you're considering Provisioned capacity, set up monitoring to understand your usage patterns:

  • Use OpenMetrics: For real-time visibility into APS consumption, integrate Temporal Cloud metrics with your observability stack
  • Track APS usage vs. limits: Monitor temporal_cloud_v0_resource_exhausted_errors to detect throttling events
  • Set alerts at 70-80% utilization: This gives you time to provision TRUs before hitting limits
  • Analyze historical patterns: Understanding your traffic patterns helps you decide between reactive TRU provisioning and proactive automation

Key Takeaways

Let's recap the main reasons customers hit APS limits and how to address them:

  • Bursty Traffic: Implement application-level queuing or rate limiting to smooth spikes; stagger start times for scheduled batch operations.
  • Cascading Workflows and Fan-Out Patterns: Evaluate whether Child Workflows are necessary (consider Activities or another Namespace), limit fan-out size by processing work in batches within a Child Workflow, and consider flattening deeply nested hierarchies.
  • Human-in-the-Loop Processes at Scale: Design long-running Workflows to minimize sustained APS load from interaction (avoid polling where UIs constantly Query state; use Signals only for key human inputs).
  • Many small Activities: Consider whether you can combine multiple external calls within a single Activity. If processing a large amount of data, process it in chunks.
  • Multiple use cases in one Namespace: Plan for a set of Namespaces (one per environment) per use case.
  • Planned traffic spikes: Pre-provision TRUs before the event, then deprovision after.
  • Unpredictable spikes requiring instant response: Switch to Provisioned mode for self-service capacity scaling via UI, CLI, or API.
  • Load testing at scale: Provision TRUs for the test duration, deprovision when complete.
  • New workload onboarding: Bridge with TRUs while the on-demand envelope adjusts (approximately 7 days).

General guidance

When designing Temporal Workflows with an eye toward APS limits, ask yourself the following questions:

  • How many actions will a single execution of this Workflow consume?
  • How many Workflows will typically be running at the same time?
  • What happens to APS consumption when the number of Actions * number of active Workflows scales to 100x current volume?
  • Are there natural opportunities to combine operations, such as merging multiple Activities or processing chunks of data together?
  • Am I polling when I could be using Signals?
  • Does this Workflow need to run continuously, or can it be event-driven?

A few hours spent optimizing Workflow design can save you from capacity crunches, emergency limit increases, and potentially significant cost increases down the road.