API rate limits

Last updated July 14th at 11:57am

Centra’s Integration API Rate Limits

To keep our platform stable for all clients, we limit how often some APIs can be used. We use different methods to enforce these limits. In addition, as part of our fair usage policy, we ask all developers to follow industry best practices: avoid unnecessary API calls and unnecessary query complexity, cache results, and retry requests in a responsible way.

Rate Limits in Centra’s Integration API

Centra’s Integration API uses several different methods for rate limiting. They are further described below. The key standard limits enforced per Centra environment are:

| Limit | Description | Standard limit per 10 seconds | Standard limit per 1 hour |
| --- | --- | --- | --- |
| Request limit | Number of HTTP requests in a time period | 40 requests | 3600 requests |
| Mutation limit | Number of mutations in a time period | 20 mutations | 1800 mutations |
| Query complexity limit | The aggregated complexity of queries in a time period | 200 000 complexity points | 25 000 000 complexity points |

Centra clients may have higher rate limits defined in their agreements with Centra. It is possible to query the Integration API to see what rate limits are enforced for a particular client.

Rate limits of the Integration API apply per Centra environment, so all integrations connected to the same environment share the same limits. One integration can use up a limit and cause the integration you are building to be temporarily rate-limited.

We may temporarily lower rate limits to ensure the stability of the platform and its performance for all users.

Your integration must be built to handle lowered rate limits gracefully and to behave as a good citizen in an environment where multiple integrations share the same rate limits for the same client. See Avoiding hitting the Rate Limits.

Integration API token bucket rate limit algorithm

The rate limit implementation in Centra’s Integration API is based on the token bucket algorithm. A short explanation of the token bucket algorithm concept:

There is a bucket with a fixed capacity that contains tokens.
The bucket is full at the beginning.
An allowed request consumes one or more tokens.
If the bucket contains enough tokens for a particular request, the request is allowed and tokens are consumed. Otherwise, the request is denied and no tokens are consumed.
The bucket is replenished with new tokens at a constant rate (e.g. 30 tokens every second).
Once the bucket is fully replenished, it stays full until the next request consumes some tokens.
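
The rules above can be sketched in a few lines of Python. This is only an illustration of the general token bucket concept, not Centra's actual implementation: the class name, the method names, and the choice of continuous (rather than stepwise) replenishment are all assumptions. Time is passed in explicitly so that the behavior is easy to follow.

```python
class TokenBucket:
    """Illustrative token bucket; names and refill granularity are assumptions."""

    def __init__(self, capacity: float, refill_per_second: float, now: float = 0.0) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = capacity  # the bucket is full at the beginning
        self.updated_at = now

    def try_consume(self, cost: float, now: float) -> bool:
        """Consume `cost` tokens if the bucket holds enough; otherwise consume nothing."""
        elapsed = max(0.0, now - self.updated_at)
        # Replenish at a constant rate, but never beyond the fixed capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = max(self.updated_at, now)
        if cost > self.tokens:
            return False  # denied: the fill level is left untouched
        self.tokens -= cost
        return True


# A bucket of 40 tokens that refills in 10 seconds, i.e. 4 tokens per second.
bucket = TokenBucket(capacity=40, refill_per_second=4)
print(bucket.try_consume(40, now=0.0))   # True: the bucket was full
print(bucket.try_consume(1, now=0.0))    # False: nothing left yet
print(bucket.try_consume(1, now=0.25))   # True: one token was replenished
```
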

The diagram below shows a token bucket of size 4 that is refilled at a rate of one token every 15 minutes:

Gray boxes represent available tokens that were not consumed, blue boxes represent available tokens that were consumed, and red boxes represent an attempt to consume a non-existing token (rate limit exceeded). Reading the diagram from left to right:

At 10:00, the token bucket was full.
At 10:15 there was one request, which consumed one token; that token was replenished in the next time frame.
At 10:45 there were two requests, consuming one token each, so the next window has the two remaining tokens plus one replenished token.
At 11:00 all three remaining tokens were used, and one token was replenished at 11:15.
At 11:30, one token was used and one was replenished.
At 11:45, another token was added to the bucket (for two in total), but a third request was denied because the bucket did not have enough tokens at this point.
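
The walkthrough above can be replayed with a small simulation. The request counts per 15-minute slot are read off the description (with three attempts at 11:45, of which the third is denied). The assumption that the one-token refill happens at the start of each slot, before that slot's requests, is ours; the function and variable names are illustrative.

```python
def replay(capacity: int, requests_per_slot: list[int]) -> list[tuple[int, int, int]]:
    """Return (tokens_available, allowed, denied) for each 15-minute slot."""
    tokens = capacity  # the bucket starts full
    timeline = []
    for index, wanted in enumerate(requests_per_slot):
        if index > 0:
            tokens = min(capacity, tokens + 1)  # one token per slot, capped at capacity
        allowed = min(wanted, tokens)
        timeline.append((tokens, allowed, wanted - allowed))
        tokens -= allowed
    return timeline


slots = ["10:00", "10:15", "10:30", "10:45", "11:00", "11:15", "11:30", "11:45"]
requests = [0, 1, 0, 2, 3, 0, 1, 3]
timeline = replay(4, requests)
for label, (available, allowed, denied) in zip(slots, timeline):
    print(f"{label}: {available} available, {allowed} allowed, {denied} denied")
```

Under these assumptions the only denied request is the third one at 11:45, when the bucket holds two tokens.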

Token buckets used for Centra’s Integration API rate limits

The Integration API uses six token buckets for rate limiting:

Burst request limit: Number of HTTP requests in 10 seconds.
Burst mutation limit: Number of mutations in 10 seconds.
Burst query complexity limit: The aggregated complexity of queries in 10 seconds.
Sustained request limit: Number of HTTP requests in 1 hour.
Sustained mutation limit: Number of mutations in 1 hour.
Sustained query complexity limit: The aggregated complexity of queries in 1 hour.

Tokens in each of these buckets are consumed and replenished independently. If at least one of the buckets contains insufficient tokens for a given request, the request is denied.
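
A minimal sketch of the "all buckets must have enough tokens" rule follows. It tracks only the remaining tokens per bucket and leaves replenishment out. The data layout and function name are illustrative, and the all-or-nothing behavior (a denied request consumes nothing from any bucket) is an assumption extended from the single-bucket description above.

```python
Bucket = tuple[str, str]  # (limit type, interval), e.g. ("REQUEST_COUNT", "TEN_SECONDS")


def try_request(remaining: dict[Bucket, int], cost: dict[str, int]) -> bool:
    """Deduct `cost` from every bucket, or from none if any bucket is short."""
    # First pass: check all buckets without changing anything.
    for (limit_type, _interval), tokens in remaining.items():
        if cost.get(limit_type, 0) > tokens:
            return False
    # Second pass: every bucket had enough, so consume from all of them.
    for limit_type, interval in remaining:
        remaining[(limit_type, interval)] -= cost.get(limit_type, 0)
    return True


# Full buckets, using the standard limits from the table earlier on this page.
remaining = {
    ("REQUEST_COUNT", "TEN_SECONDS"): 40,
    ("REQUEST_COUNT", "ONE_HOUR"): 3600,
    ("MUTATION_COUNT", "TEN_SECONDS"): 20,
    ("MUTATION_COUNT", "ONE_HOUR"): 1800,
    ("QUERY_COMPLEXITY", "TEN_SECONDS"): 200_000,
    ("QUERY_COMPLEXITY", "ONE_HOUR"): 25_000_000,
}

# One request with 21 mutations exceeds the burst mutation bucket, so it is denied.
print(try_request(remaining, {"REQUEST_COUNT": 1, "MUTATION_COUNT": 21, "QUERY_COMPLEXITY": 21}))
```
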

Token calculation in Centra’s Integration API

The token calculation logic is straightforward:

For the request limits (burst and sustained), 1 HTTP request costs 1 token.
For the mutation limits (burst and sustained), 1 mutation costs 1 token.
For the query complexity limits (burst and sustained), 1 complexity point costs 1 token. See how we calculate the query complexity here.

What counts as one request? And what about batched mutations?

Technically speaking, one HTTP request can contain multiple GraphQL operations, and each operation can include multiple top-level fields, each of which roughly corresponds to one REST call. For example:

query threeInOne {
    viewer { name, integrationName }

    counters {
        orders(where: {status: [PENDING]})
    }

    rateLimits {
        type
        intervalSeconds
        quota
        usedQuota
        remainingQuota
    }
}

mutation twoInOne {
    captureShipment(id: 345) {
        userErrors { message, path }
        userWarnings { message, path }
    }

    addOrderNote(input: {
        order: {externalId: "my-id-123"}
        message: "Hello world"
    }) {
        userErrors { message, path }
        userWarnings { message, path }
    }
}

If you send such a document in your request, the GraphQL server needs to know which of the two operations to run; hence the JSON body must also include the operationName parameter. Here's the official specification: link.

Avoid sending extra (not executed) operations in your requests, as it's an inefficient use of bandwidth and server resources.

So, with a JSON body like this

{
    "query": "(as above)",
    "operationName": "twoInOne",
    "variables": {}
}

then, in terms of rate limits, this counts as:

One request for the REQUEST_COUNT buckets.
Two mutations for the MUTATION_COUNT buckets.
Two points for QUERY_COMPLEXITY.
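
One way to follow the advice about not sending operations that will not run is to keep each operation as its own document and send only the one you intend to execute. The sketch below builds such a JSON body in Python. The `OPERATIONS` registry and the `build_body` helper are hypothetical names, and sending the request (endpoint, authentication) is deliberately left out.

```python
import json
from typing import Optional

# Each operation is stored as a separate document, keyed by its operation name.
OPERATIONS = {
    "twoInOne": """
mutation twoInOne {
    captureShipment(id: 345) {
        userErrors { message, path }
        userWarnings { message, path }
    }
    addOrderNote(input: {order: {externalId: "my-id-123"}, message: "Hello world"}) {
        userErrors { message, path }
        userWarnings { message, path }
    }
}
""",
}


def build_body(operation_name: str, variables: Optional[dict] = None) -> str:
    """Serialize a request body that contains only the operation to execute."""
    return json.dumps({
        "query": OPERATIONS[operation_name],
        "operationName": operation_name,
        "variables": variables or {},
    })


body = build_body("twoInOne")
print(json.loads(body)["operationName"])
```
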

Some GraphQL servers allow multiple independent operations to be executed in one batch by wrapping them in a JSON array (link). This approach is not supported by the Integration API.

Checking the Integration API rate limit status

The following query can be used to get information about currently enforced rate limits, and available tokens in each token bucket:

query {
    rateLimits {
        type
        intervalSeconds
        quota
        usedQuota
        remainingQuota
    }
}

The result returned is a list of six objects, which represent the six token buckets:

type – type of the rate limit managed by this token bucket, represented as an enum RateLimitType that can have 3 possible values:
- REQUEST_COUNT
- QUERY_COMPLEXITY
- MUTATION_COUNT
intervalSeconds – the time it takes to replenish an empty bucket, one of:
- TEN_SECONDS
- ONE_HOUR
quota – size of the bucket, the maximum amount of tokens that will ever fit in the bucket
usedQuota – how many tokens have been consumed by requests and not yet replenished
remainingQuota – how many tokens remain in the bucket and are available for consumption

The query consumes 1 request and 10 complexity points.
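
As a rough illustration of how an integration might use this result, the sketch below picks the bucket that is closest to running out. It assumes the usual GraphQL response envelope (`data.rateLimits`) and the field names from the query above. The sample numbers are invented, and only three of the six buckets are shown for brevity.

```python
def tightest_bucket(response: dict) -> dict:
    """Return the bucket with the smallest remaining share of its quota."""
    buckets = response["data"]["rateLimits"]
    return min(buckets, key=lambda b: b["remainingQuota"] / b["quota"])


# Hypothetical response; the values are made up for illustration.
sample = {
    "data": {
        "rateLimits": [
            {"type": "REQUEST_COUNT", "intervalSeconds": "TEN_SECONDS",
             "quota": 40, "usedQuota": 30, "remainingQuota": 10},
            {"type": "MUTATION_COUNT", "intervalSeconds": "ONE_HOUR",
             "quota": 1800, "usedQuota": 1710, "remainingQuota": 90},
            {"type": "QUERY_COMPLEXITY", "intervalSeconds": "TEN_SECONDS",
             "quota": 200000, "usedQuota": 10, "remainingQuota": 199990},
        ]
    }
}

worst = tightest_bucket(sample)
print(worst["type"], worst["intervalSeconds"], worst["remainingQuota"])
```

Since the status query itself costs one request and 10 complexity points, it is worth checking it occasionally rather than before every call.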

Testing Rate Limits

No matter how good your integration is, it can still encounter the HTTP "429 Too Many Requests" status code sometimes and must handle it correctly. To see what such a response looks like and to simplify testing these scenarios, you can include a special header in your request: X-Trigger-Rate-Limit-Error: true. The response will contain a "Retry-After" header with a date in the RFC 2822 format (https://www.rfc-editor.org/rfc/rfc2822.html), 10 seconds into the future. For example: Fri, 21 Mar 2025 19:15:55 GMT.
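
A date in this format can be parsed with Python's standard library. The sketch below turns the Retry-After value into a number of seconds to wait; the function name is illustrative, and the current time is passed in as a timezone-aware datetime so the calculation is easy to test.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def seconds_to_wait(retry_after: str, now: datetime) -> float:
    """Seconds from `now` until the Retry-After timestamp; never negative."""
    resume_at = parsedate_to_datetime(retry_after)
    return max(0.0, (resume_at - now).total_seconds())


# Ten seconds before the example timestamp given above.
now = datetime(2025, 3, 21, 19, 15, 45, tzinfo=timezone.utc)
print(seconds_to_wait("Fri, 21 Mar 2025 19:15:55 GMT", now))  # 10.0
```
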

Avoiding hitting the Rate Limits

To avoid hitting rate limits, it is critical to follow industry best practices for efficient API usage. This means avoiding any unnecessary and unnecessarily complex requests, caching data, monitoring error messages, and backing off when requested by the API.

Architecture
- Use the Integration API for asynchronous backend integrations only
  Never use the Integration API for serving data to a frontend website. Serving data to a frontend (even with a proxy) means that the rate limits will very likely be hit at periods of high website traffic, with user-visible errors as a result. Use the DTC API for serving frontends instead, which offers several orders of magnitude more throughput and lower latency.
- Ensure the data model is efficient
  A custom data model set up in Centra that is inefficient may lead to a need to mutate the same data multiple times (e.g. updating a "hand wash only" icon file that’s attached to each of 10,000 products separately as opposed to using a single file shared between the products).
  Do not use Dynamic Attributes for data that could be normalized by using Mapped Attributes. This is especially important for translatable attributes. See more about attributes in Centra.
- Use caching for data that your app uses often
  If you need to access some data frequently, cache it. Some data changes very seldom (e.g., markets, stores, countries, pricelists, product catalog).
- Subscribe to events to update your cached data
  Subscribe to events for cache invalidation rather than polling the API repeatedly. See more about the events.
- Only mutate data that has changed
  Keep track of mutations that have already been made. Don’t attempt to mutate data that is already up to date in Centra.
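
One possible way to keep track of what has already been synced is to store a fingerprint of each payload after a successful mutation and skip records whose fingerprint has not changed. The sketch below is only an illustration of that idea: the function names, the use of SHA-256, and the in-memory dictionaries are assumptions, and a real integration would persist the fingerprints.

```python
import hashlib
import json


def fingerprint(payload: dict) -> str:
    """Stable hash of a payload; key order does not affect the result."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def pending_updates(desired: dict[str, dict], last_synced: dict[str, str]) -> dict[str, dict]:
    """Return only the records whose content differs from the last successful sync."""
    return {
        key: payload
        for key, payload in desired.items()
        if last_synced.get(key) != fingerprint(payload)
    }


# Hypothetical product data keyed by an external id.
desired = {
    "sku-1": {"name": "Shirt", "price": 100},
    "sku-2": {"name": "Hat", "price": 50},
}
last_synced = {"sku-1": fingerprint({"price": 100, "name": "Shirt"})}

print(sorted(pending_updates(desired, last_synced)))  # ['sku-2']
```
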
Craft requests
- Optimize your code to get only data it needs
  Select only necessary fields, especially for nested objects and lists. Sometimes, you can use a different query to simplify the structure. Avoid deep nesting: even if GraphQL is flexible enough to be able to query "everything" in one go, sometimes it's more efficient to issue additional queries instead.
- Use the most comprehensive mutation for your task
  For some common tasks, the Integration API offers mutations that conveniently carry out multiple activities in just one mutation. For example, if syncing a product with a variant and sizes, use the mutation for that, rather than multiple mutations in sequence.
- For batch jobs, use batch operations where available
  The Integration API offers some batch operations, geared for larger import jobs. Use those if available for batch tasks, and monitor the status.
Handle limiting
- Smooth out the rate of requests
  Regulate the rate of requests to ensure a smooth distribution. This especially applies if you send requests asynchronously, which enables sharper load spikes (sudden spikes are more likely to get rate-limited).
- Give users a good experience while they wait for large operations
  Syncing a large amount of data to or from the Integration API will take time, whether running multiple mutations in sequence or a batch job. Give your users clear information about the progress and status of large jobs, such as adding a new collection of products to Centra or loading an empty data warehouse with historical data.
- Handle errors appropriately
  Requests that result in user errors (and warnings) should be handled appropriately in order to prevent spamming the API and to ensure your integration can recover gracefully after having been rate-limited.
- Respect the backoff time
  When your integration gets rate-limited, the response will be returned with HTTP status code 429. It will contain the Retry-After HTTP header with the timestamp of when you should resume your requests. Your integration should wait until that time passes to make any further requests.
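
The last point can be sketched as a small retry wrapper. It is a minimal illustration under several assumptions: `send` is a hypothetical callable supplied by your integration that performs one HTTP request and returns a `(status, headers, body)` tuple, the headers dictionary uses the exact key "Retry-After", and the attempt cap of 5 is an arbitrary choice. The sleep function and clock are injectable so the logic can be tested without waiting.

```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Callable

Response = tuple[int, dict[str, str], str]  # (status code, headers, body)


def utc_now() -> datetime:
    return datetime.now(timezone.utc)


def send_with_backoff(
    send: Callable[[], Response],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
    now: Callable[[], datetime] = utc_now,
) -> Response:
    """Call `send`, waiting until the Retry-After time after each 429 response."""
    response = send()
    for _ in range(max_attempts - 1):
        status, headers, _body = response
        if status != 429:
            break
        # Wait until the timestamp the API asked for before trying again.
        resume_at = parsedate_to_datetime(headers["Retry-After"])
        sleep(max(0.0, (resume_at - now()).total_seconds()))
        response = send()
    return response
```

If the final attempt is still rate-limited, the 429 response is returned to the caller, which can then report the delay to the user or reschedule the job.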

Are you still hitting Rate Limits?

The rate limits have been designed with careful consideration, and we believe they are sufficient for the vast majority of use cases. Please get in touch with our support for guidance if you are struggling to stay within the rate limits.

Are you still in need of higher rate limits, despite following the guidelines? We are able to offer higher rate limits as an additional (paid) service. Please get in touch with us for more details.