API Design Best Practices for long-running Operations: GraphQL vs REST


Featured on Hashnode

I recently read an article where the author stated that GraphQL is "not appropriate for long-running operations". I'd like to show that GraphQL works very well for long-running operations.

We'll also take a look at how such a problem can be solved using a traditional REST API and compare the two approaches. What we'll end up seeing is that a GraphQL Schema makes it a lot easier to reason about long-running operations from a developer's perspective. The REST approach on the other hand is a lot easier to implement.

This is not a marketing post but a purely technical one, so just a few words upfront on what we do: WunderGraph puts a JSON RPC layer in front of all your APIs (GraphQL, REST, Databases, etc.). Combined with a generated client, this makes using APIs as easy, secure and performant as possible. It's like all your services become a single GraphQL Server, secured by a JSON RPC Layer. With that, let's focus on schema design.

Let's now use an example to illustrate and compare the two approaches to designing an API for long-running operations.

What actually is a long-running operation?

Let's imagine we're building a SaaS that uses machine learning to detect the sentiment of a blog post. You'll submit a link to the blog post and the service will analyze the sentiment of the post and return it to you.

This operation might take a few seconds or even a minute to complete. From your own perspective, when you click a button on the web, how long until you get nervous when watching the progress bar? You will probably allow the service to run for a few seconds if you understand the complexity of the task. However, we're used to seeing some kind of progress after less than 5 seconds, at least when we're using a desktop browser.

Let's just assume that our "sentiment analysis" service takes more than 5 seconds to complete.

If we're not able to respond within 5 seconds, how can we still provide a good user experience?

If there's simply a loading indicator, and we're waiting for the request to finish, the user might cancel the request at any time, close the browser, or even just leave the page. They might also just hit escape to cancel the request and try it again.

What we need is an API that understands the notion of a long-running operation. Instead of just calling this long-running operation and waiting for a synchronous response, we should design an asynchronous API that allows for easy monitoring of the progress of the operation and even cancellation.

Designing a synchronous REST API for long-running operations

If we were to design our API as a synchronous API, it would probably look like this:

curl -X POST -H "Content-Type: application/json" -d '{"url": "https://www.wundergraph.com/blog/long_running_operations"}' http://localhost:3000/api/v1/analyze_sentiment

This is what the response could look like:

{
  "status": "success",
  "data": {
    "url": "https://www.wundergraph.com/blog/long_running_operations",
    "sentiment": "positive"
  }
}

However, as we've said earlier, this operation might take a long time to complete, and the user might cancel it before the response arrives.

A better approach would be to design our API as an asynchronous API.

Designing an asynchronous REST API for long-running operations

Let's now turn the synchronous API into an asynchronous API.

Instead of making the client wait for the result, we should immediately return a response with a unique identifier so that the client can poll the server for the result.

The proper way to design such an API is by returning the 202 Accepted status code.

The request has been received but not yet acted upon. It is noncommittal, since there is no way in HTTP to later send an asynchronous response indicating the outcome of the request. It is intended for cases where another process or server handles the request, or for batch processing.

Source: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status

So, in this case, our API response could look like this. It'll be a response with the status code 202 and the following body.

{
  "data": {
    "url": "https://www.wundergraph.com/blog/long_running_operations",
    "id": 1
  },
  "links": [
    {
      "rel": "status",
      "href": "http://localhost:3000/api/v1/analyze_sentiment/1/status"
    },
    {
      "rel": "cancel",
      "href": "http://localhost:3000/api/v1/analyze_sentiment/1/cancel"
    }
  ]
}

Instead of returning the result immediately, we're returning a list of links so that the caller of the API can get the current status of the job or cancel it.
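To make the server side of this more concrete, here's a minimal sketch in Python. It's not tied to any web framework: each function simply returns a status code and a body. The in-memory `_jobs` store, the function names and the shape of the status response are assumptions made for illustration; only the 202 body with its links mirrors the example above.

```python
import itertools

BASE_URL = "http://localhost:3000/api/v1/analyze_sentiment"

# Hypothetical in-memory job store; a real service would use a queue or a database.
_jobs = {}
_next_id = itertools.count(1)


def submit_job(url):
    """Register a job and return (status_code, body) without waiting for the result."""
    job_id = next(_next_id)
    _jobs[job_id] = {"url": url, "status": "queued", "sentiment": None}
    body = {
        "data": {"url": url, "id": job_id},
        "links": [
            {"rel": "status", "href": f"{BASE_URL}/{job_id}/status"},
            {"rel": "cancel", "href": f"{BASE_URL}/{job_id}/cancel"},
        ],
    }
    return 202, body  # 202 Accepted: the work has not been done yet


def job_status(job_id):
    """Handler behind the "status" link (response shape is an assumption)."""
    job = _jobs.get(job_id)
    if job is None:
        return 404, {"error": "job not found"}
    return 200, {"data": {"id": job_id, **job}}


def cancel_job(job_id):
    """Handler behind the "cancel" link; only unfinished jobs can be cancelled."""
    job = _jobs.get(job_id)
    if job is None:
        return 404, {"error": "job not found"}
    if job["status"] in ("queued", "processing"):
        job["status"] = "cancelled"
    return 200, {"data": {"id": job_id, **job}}
```

The important design choice is that `submit_job` never touches the actual sentiment analysis. It only records the job and hands out the links, so the request can return right away.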

The client can then use the following command to get the status of the job:

curl -X GET http://localhost:3000/api/v1/analyze_sentiment/1/status
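In practice, the client repeats this call until the job is done. Here's a rough sketch of such a polling loop in Python. The `fetch_status` callable stands in for the GET request above, and the assumption is that it returns a dict with a `status` key. The status response isn't specified in this post, so treat both the key and the set of terminal states as hypothetical.

```python
import time

# Assumed terminal states; a job in one of these will not change anymore.
TERMINAL_STATES = {"finished", "cancelled", "failed"}


def poll_until_done(fetch_status, interval=1.0, max_attempts=30, sleep=time.sleep):
    """Call fetch_status() until the job reaches a terminal state.

    fetch_status is any callable returning a dict with a "status" key, for
    example a function that GETs the status link and decodes the JSON body.
    """
    for _ in range(max_attempts):
        job = fetch_status()
        if job["status"] in TERMINAL_STATES:
            return job
        sleep(interval)  # wait before asking again
    raise TimeoutError(f"job still running after {max_attempts} attempts")
```

Passing `fetch_status` and `sleep` in as arguments keeps the loop independent of any HTTP library and makes it easy to try out without a running server.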

Excellent, we've now designed our API to be asynchronous.

Next, we'll look at how GraphQL can be used to design a similar API.

Designing an asynchronous GraphQL API for long-running operations

Similar to REST, we could implement an asynchronous API with GraphQL that lets the client poll the server for the status of the job. However, GraphQL doesn't just support Queries and Mutations but also Subscriptions. This means we've got a better way of designing our API without forcing the client to poll the server.

Here's our GraphQL schema:

type Job {
  id: ID!
  url: String!
  status: Status!
  sentiment: Sentiment
}

enum Status {
  queued
  processing
  finished
  cancelled
  failed
}

enum Sentiment {
  positive
  negative
  neutral
}

type Query {
  jobStatus(id: ID!): Job
}

type Mutation {
  createJob(url: String!): Job
  cancelJob(id: ID!): Job
}

type Subscription {
  jobStatus(id: ID!): Job
}

Creating the Job would look like this:

mutation ($url: String!) {
  createJob(url: $url) {
    id
    url
    status
  }
}

Once we've got the id back, we can subscribe to Job changes:

subscription ($id: ID!) {
  jobStatus(id: $id) {
    id
    url
    status
    sentiment
  }
}

It's a good start, but we can even improve this schema further. In its current state, we have to make "sentiment" nullable because the field will only have a value if the job is finished.

Let's make our API more intuitive:

interface Job {
  id: ID!
  url: String!
}

type SuccessfulJob implements Job {
  id: ID!
  url: String!
  sentiment: Sentiment!
}

type QueuedJob implements Job {
  id: ID!
  url: String!
}

type FailedJob implements Job {
  id: ID!
  url: String!
  reason: String!
}

# Time is not a built-in GraphQL scalar, so it has to be declared.
scalar Time

type CancelledJob implements Job {
  id: ID!
  url: String!
  time: Time!
}

enum Sentiment {
  positive
  negative
  neutral
}

type Query {
  jobStatus(id: ID!): Job
}

type Mutation {
  createJob(url: String!): Job
  cancelJob(id: ID!): Job
}

type Subscription {
  jobStatus(id: ID!): Job
}

Turning the Job into an interface makes our API much more explicit. We can now subscribe to the job status using the following subscription:

subscription ($id: ID!) {
  jobStatus(id: $id) {
    __typename
    ... on SuccessfulJob {
      id
      url
      sentiment
    }
    ... on QueuedJob {
      id
      url
    }
    ... on FailedJob {
      id
      url
      reason
    }
    ... on CancelledJob {
      id
      url
      time
    }
  }
}

Only if the __typename field is set to "SuccessfulJob" will the sentiment field be returned.
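On the client side, this means we can branch on `__typename` and know exactly which fields are available in each case. Here's a small sketch in Python, assuming each subscription event has already been decoded into a dict shaped like the selection set above. The function name and the messages are made up for illustration.

```python
def describe_job(job):
    """Turn one jobStatus payload into a human-readable message."""
    typename = job["__typename"]
    if typename == "SuccessfulJob":
        # sentiment is non-nullable on SuccessfulJob, so it is safe to read here
        return f"Job {job['id']} finished: sentiment is {job['sentiment']}"
    if typename == "QueuedJob":
        return f"Job {job['id']} is waiting in the queue"
    if typename == "FailedJob":
        return f"Job {job['id']} failed: {job['reason']}"
    if typename == "CancelledJob":
        return f"Job {job['id']} was cancelled at {job['time']}"
    # A new implementation of Job was added to the schema that we don't know yet
    raise ValueError(f"unexpected job type: {typename}")
```

Compare this to the first version of the schema, where we'd have to check the status and then hope that the nullable sentiment field is actually set.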

Comparing REST and GraphQL for long-running operations

As we can see from the above examples, both REST and GraphQL can be used to design asynchronous APIs. Let's now open up a discussion on the pros and cons of each approach.

To start this off, let me say that it should be very clear that both approaches to asynchronous APIs are better than their synchronous counterparts. Independent of your choice of using REST or GraphQL, when an operation takes more than a few seconds to complete, I'd always suggest that you design your API in an asynchronous way.

Now let's look into the tiny little details that make the difference.

Hypermedia controls vs. GraphQL type definitions

One huge benefit of the REST approach is that everything is a resource, and we can leverage hypermedia controls. Let me translate this "jargon" to simple words:

One of the core concepts of REST APIs is that every "thing" can be accessed through a unique URL. If you submit a Job to the API, you'll get back a URL that you can use to check the status of the job.

In comparison, GraphQL has only one endpoint. If you submit the Job via a GraphQL mutation, what you get back is an id of type ID!. If you want to check the status of the job, you have to use the id as an argument on the correct root field of the Query or Subscription type. As a developer, how do you know the relationship between the Job id and the root fields of the Query or Subscription type? Unfortunately, you don't!

If you want to be nice when designing your GraphQL schema, you could put this information into the description of the fields. However, the GraphQL specification doesn't allow us to make these "links" explicit, like in REST.

The same rule applies for the cancellation of a job. With the REST API, you can return a URL to the client which can be called to cancel the job.

With GraphQL, you have to know that you have to use the cancelJob mutation to cancel the job and pass the id as an argument.

This might sound a bit like exaggeration, as we're using a very small schema, but imagine if we had a few hundred root fields on both our Query and Mutation types. It might become very hard to find the correct root field to use.

REST APIs can have a Schema, GraphQL APIs must have a Schema

The lack of resources and unique URLs seems to be a weakness of GraphQL. However, we can also make an argument the other way around.

It's possible to return links with actions in REST APIs. That said, it's not obvious what links we're getting back from submitting the Job. Additionally, we also don't know e.g. if the cancellation URL should be called using POST or GET. Or maybe we should just DELETE the job?

There exists additional tooling to help with this. One such tool/specification is Siren by Kevin Swiber.

If you want to design good REST APIs, you should definitely look into solutions like Siren.
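To give you a rough idea, here's a sketch in Python that builds a Siren-style entity for our Job. The top-level keys (`class`, `properties`, `links`, `actions`) and the action fields (`name`, `title`, `method`, `href`) follow the Siren format. Everything else is an assumption made for this example: the `job_entity` helper, the "self" URL of the job, and the decision to cancel via POST.

```python
BASE_URL = "http://localhost:3000/api/v1/analyze_sentiment"


def job_entity(job_id, url, status):
    """Build a Siren-style representation of a job (illustrative only)."""
    job_url = f"{BASE_URL}/{job_id}"  # assumed URL of the job resource itself
    entity = {
        "class": ["job"],
        "properties": {"id": job_id, "url": url, "status": status},
        "links": [
            {"rel": ["self"], "href": job_url},
            {"rel": ["status"], "href": f"{job_url}/status"},
        ],
        "actions": [],
    }
    # Only offer the cancel action while cancelling still makes sense.
    if status in ("queued", "processing"):
        entity["actions"].append(
            {
                "name": "cancel-job",
                "title": "Cancel Job",
                "method": "POST",  # assumption: the action spells out the HTTP method
                "href": f"{job_url}/cancel",
            }
        )
    return entity
```

The client no longer has to guess whether to use POST, GET or DELETE, because the action says so. And if the action is missing from the response, the job can't be cancelled anymore.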

That said, considering that REST is the dominant API style, the fact that Siren is more than 5 years old and only has 1.2k stars indicates a problem.

For me, it seems like good (REST) API design is optional. Most developers build simple CRUD-style APIs instead of leveraging the power of Hypermedia.

GraphQL on the other hand doesn't allow you to build Hypermedia APIs due to its lack of resources. However, the Schema is mandatory in GraphQL, forcing developers to make their CRUD-style APIs more explicit and type-safe.

In my personal opinion, Hypermedia APIs are a lot more powerful than CRUD-style APIs, but this power comes at a cost and adds complexity. It's this complexity that makes GraphQL a better choice for most developers.

As a developer of REST APIs, you "can" use Siren, but most developers just don't care. As a developer of GraphQL APIs, you "must" have a Schema, there's no way around it.

If you look at the second version of our GraphQL schema, the use of interfaces helps us make the API very explicit and type-safe. It's not perfect, we're still lacking "links", but it's a very good trade-off.

Polling vs. Subscriptions

Subscribing to the status of a Job is clearly more elegant than polling for it. For the API user, subscribing to an event stream is a much more intuitive mental model than repeatedly asking for the status.

That said, nothing comes for free.

To be able to use Subscriptions, you usually have to use WebSockets. WebSockets, being stateful, come with a cost. I wrote about this topic extensively in another blog post.

Adding WebSockets to your stack also means a lot more complexity for the API backend. Does your hosting provider support WebSockets? Some Serverless environments don't allow long-running operations or simply deny HTTP Upgrade requests. WebSocket connections also scale differently than short-lived HTTP connections.

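WebSockets were also designed as an HTTP/1.1 feature: the connection starts as an HTTP/1.1 Upgrade request. RFC 8441 describes how to bootstrap WebSockets over HTTP/2, but you can't rely on every client, server and proxy supporting it. If your website is using HTTP/2 for all endpoints, clients will often have to open up another TCP connection for the WebSocket.

Additionally, WebSockets might not work in all environments, e.g. if a reverse proxy or load balancer in front of your service is not configured to forward Upgrade requests.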

HTTP polling on the other hand is a very simple and boring solution. So, while GraphQL Subscriptions offer a simpler mental model to the API user, they come with a big cost in terms of implementation.

Keep in mind that you're not forced to use Subscriptions with GraphQL. You can still use HTTP polling for the status of a Job by using a Query instead of a Subscription.
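Here's a sketch of what polling with a Query could look like in Python, based on the second version of our schema. It assumes the common convention of sending a JSON body with `query` and `variables` via HTTP POST. The helper names are made up, and treating every type other than QueuedJob as "done" is a simplification for this example.

```python
import json

# Same selection as the subscription, but as a plain Query that can be polled.
JOB_STATUS_QUERY = """
query ($id: ID!) {
  jobStatus(id: $id) {
    __typename
    id
    url
    ... on SuccessfulJob { sentiment }
    ... on FailedJob { reason }
    ... on CancelledJob { time }
  }
}
"""


def build_poll_request(job_id):
    """Return the JSON body to POST to the GraphQL endpoint."""
    return json.dumps({"query": JOB_STATUS_QUERY, "variables": {"id": str(job_id)}})


def is_done(response_body):
    """Decide from one JSON response whether we can stop polling."""
    job = json.loads(response_body)["data"]["jobStatus"]
    if job is None:
        # jobStatus is nullable in the schema; here we treat null as "unknown job"
        raise LookupError("job not found")
    return job["__typename"] != "QueuedJob"
```

The request body stays the same for every poll, so the client only has to resend it until `is_done` returns true.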

Conclusion

Both REST and GraphQL API styles are great tools to design synchronous and asynchronous APIs. Each of them has its own strengths and weaknesses.

GraphQL is more explicit about the Schema and its type system. REST, on the other hand, can be a lot more powerful thanks to unique URLs and Hypermedia controls.

I personally like the approach of Siren a lot. However, the lack of an explicit Schema for REST APIs leaves too much room for interpretation to the average developer.

With the right tooling and good API governance, you should be able to design great REST APIs.

One could argue that GraphQL comes with more features out of the box and needs less governance, but I don't think this is true. As you can see from the two versions of our GraphQL schema, there's a lot of flexibility in the design of a GraphQL Schema. Even the second version of the Schema could be improved further.

In the end, I don't see how one solution is much better than another. It's a lot more important to put effort into your API design than choosing between REST or GraphQL.

Talk to your users and figure out how they want to use your API. Are they used to REST APIs or GraphQL APIs? Would they benefit from Subscriptions over WebSockets or do they prefer simple boring polling?

Maybe you don't even have to choose between REST and GraphQL. If you can build a great REST API, you can easily wrap it with GraphQL, or the other way around. This way, you can offer two API styles to your users, if that brings value to them.

Your takeaway should be that good API design and talking to your users is a lot more important than choosing cool fancy tech.

Cheers!

What to read next

This is a curated list of articles that I think you'll find interesting.

In the WunderHub Announcement, I talk about how WunderHub will change the way we share and collaborate on APIs. It allows you to share APIs like npm packages.

How automating API integrations benefits your business is dedicated to C-level executives who want to learn more about the business benefits of automating API integrations.

Another interesting topic is to JOIN APIs without Schema Stitching or Federation, just by using a single GraphQL Operation.

For those interested in the most common GraphQL Security vulnerabilities, I suggest reading about them and how WunderGraph helps you to avoid them.

A classic post but still relevant is I believe that GraphQL is not meant to be exposed over the Internet. It's a controversial topic and many misunderstand it. But think about it, why is HTTP not mentioned a single time in the GraphQL specification?

One very common problem of using GraphQL is the Double Declaration Problem, the problem of declaring your types over and over again. This post explains that it's even more complicated than just double declaration and how we can solve it.

The Fusion of GraphQL REST and HTTP/2 is a very long post, probably too long for a blog post. But if you're interested in a deep dive on the motivations behind creating WunderGraph, this is the post for you.
