Problems with GraphQL

  • Shem
  • February 25, 2021

GraphQL is overrated

GraphQL is an incredible piece of technology that has captured a lot of attention since I first started using it in production in 2018. You don’t have to scroll far back on this (rather inactive) blog to see I was once a big fan. After building many a React SPA on top of a hodge-podge of untyped JSON REST APIs, I found GraphQL a breath of fresh air. I was truly on the GraphQL hype train.

However, as the years have gone on and I’ve deployed to environments where non-functional requirements like security, performance, and maintainability were more of a concern, my perspective has changed. In this article, I’ll take you through why today, I would not recommend GraphQL to most people, and what I think are better alternatives.

I’ll use Ruby code with the excellent graphql-ruby library for examples, but I believe many of these problems are ubiquitous across different languages and GraphQL libraries. If you know of better solutions and mitigations, please do leave a comment. Now, let’s begin…

Attack Surface 🚨

It was obvious from GraphQL’s beginning that exposing a query language to untrusted clients increases the attack surface of the application. Nevertheless, the variety of attacks to consider was even broader than I imagined, and mitigating them is quite a burden. Here’s the worst I’ve had to deal with over the years…

Authorisation 🚫

Authorisation is where GraphQL's flexibility starts to bite. Initially, authorising objects seems like enough, but this quickly becomes insufficient. For example, say we are the Facebook API:

query {
  user(id: 789) {
    name # ✅ I am allowed to view Users' public info
    email  # 🛑 I shouldn't be able to see their PII just because I can view the User
  }
  user(id: 456) {
    blockedUsers {
      # 🛑 And sometimes I shouldn't even be able to see their public info,
      # because context matters!
      name
    }
  }
}

One wonders how much GraphQL holds responsibility for Broken Access Control climbing to the OWASP Top 10’s #1 spot. One mitigation here is to make your API secure by default by integrating with your GraphQL library’s authorisation framework: for every object returned and every field resolved, your authorisation system is called to confirm that the current user has access.
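To make the "secure by default" idea concrete, here is a minimal plain-Ruby sketch of per-field authorisation. `UserPolicy` and `resolve_field` are hypothetical stand-ins for whatever your library provides (e.g. GraphQL-Ruby's `authorized?` hook):

```ruby
# Hypothetical policy object, standing in for your real authorisation
# framework (e.g. a Pundit policy or GraphQL-Ruby's authorized? hook).
class UserPolicy
  def initialize(current_user, user)
    @current_user = current_user
    @user = user
  end

  # Public info is visible to any viewer.
  def view_public_info?
    true
  end

  # PII is only visible to the user themselves.
  def view_private_info?
    @current_user == @user
  end
end

# The key point: the policy is consulted for *every field resolved*,
# not once per request as with a REST endpoint.
def resolve_field(current_user, user, field)
  policy = UserPolicy.new(current_user, user)
  case field
  when :name  then policy.view_public_info? ? user[:name] : nil
  when :email then policy.view_private_info? ? user[:email] : nil
  end
end

alice = { name: "Alice", email: "alice@example.com" }
bob   = { name: "Bob",   email: "bob@example.com" }

resolve_field(alice, alice, :email)  # => "alice@example.com"
resolve_field(bob, alice, :email)    # => nil (PII hidden from others)
resolve_field(bob, alice, :name)     # => "Alice" (public info still visible)
```

Multiply this by every type and field in a real schema and the per-field cost of "secure by default" becomes clear.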

Compare this to the REST world where, generally speaking, you would authorise every endpoint, a far smaller task.

Rate Limiting 🚦

With GraphQL, we cannot assume that all requests are equally hard on the server. There is no limit to how big a query can be. Even in a completely empty schema, the types exposed for introspection are cyclical, so it’s possible to craft a valid query that returns megabytes of JSON:

query {
  __schema {
    types {
      __typename
      interfaces {
        possibleTypes {
          interfaces {
            possibleTypes {
              name
            }
          }
        }
      }
    }
  }
}

I recently tested this attack against a very popular website’s GraphQL API explorer and got a 500 response back after 10 seconds. That’s 10 seconds of someone’s CPU time eaten by this (whitespace-removed) 128-byte query, and it doesn’t even require me to be logged in.

A common mitigation for this attack is to:

  1. Estimate the complexity of resolving every single field in the schema, and abandon queries that exceed some maximum complexity value.
  2. Capture the actual complexity of the executed query and deduct it from a bucket of credits that resets at some interval.

This calculation is a delicate affair to get right. It gets particularly tricky when you are returning list fields whose length is not known prior to execution. You can make an assumption about the complexity of these, but if you are wrong, you may end up rate limiting valid queries or not rate limiting invalid queries.
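Sketched in plain Ruby, step 2 might look like the following. `ComplexityBucket`, the capacity, and the interval are illustrative assumptions, not a real library API:

```ruby
# A per-user bucket of "complexity credits" that refills every interval.
class ComplexityBucket
  def initialize(capacity:, interval:, clock: -> { Time.now.to_f })
    @capacity = capacity
    @interval = interval
    @clock = clock
    @credits = capacity
    @window_start = clock.call
  end

  # Spend the *actual* complexity of an executed query.
  # Returns false if the user has run out of credits in this window.
  def spend(cost)
    now = @clock.call
    if now - @window_start >= @interval
      @credits = @capacity        # window elapsed: refill the bucket
      @window_start = now
    end
    return false if cost > @credits
    @credits -= cost
    true
  end
end

bucket = ComplexityBucket.new(capacity: 10_000, interval: 60)
bucket.spend(3_000)  # cheap query: allowed, 7,000 credits remain
bucket.spend(8_000)  # exceeds remaining credits: rejected
```

The hard part, as described below, is not the bucket but computing an honest `cost` for each query.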

To make matters worse, it’s common for the graph that makes up the schema to contain cycles. Let’s say you run a blog whose Categories each link to related Categories:

type Blog {
  title: String
  categories: [Category]
}
type Category {
  name: String
  relatedCategories: [Category]
}

When estimating the complexity of Category.relatedCategories, you might assume that a blog will never have more than 5 categories, so you set this field’s complexity to 5 (or 5 × its children’s complexity). The problem here is that Category.relatedCategories can be its own child, so your estimate’s inaccuracy compounds exponentially: a query nested d levels deep really costs N^d, not 5^d. So given this query:

query {
  category(name: "technology") {
    relatedCategories {
      relatedCategories {
        relatedCategories {
          relatedCategories {
            relatedCategories { name }
          }
        }
      }
    }
  }
}

You’d expect a complexity of 5^5 = 3,125. But if an attacker can find a Category with 10 relatedCategories, they can trigger a query with a “true” complexity of 10^5 = 100,000, 32x greater than estimated.
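The gap is easy to verify with a couple of lines of arithmetic (5 and 10 being the assumed and attacker-found branching factors from above):

```ruby
depth = 5                 # five nested relatedCategories selections
estimated = 5**depth      # assumed: at most 5 related categories per level
actual    = 10**depth     # attacker-found: 10 related categories per level

estimated          # => 3125
actual             # => 100000
actual / estimated # => 32
```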

A partial mitigation here is to prevent deeply nested queries. However, the example above demonstrates that this is not really a defence, as it’s not an unusually deep query: GraphQL Ruby’s default maximum depth is 13, and this query is just 7 deep.

Compare this to rate limiting REST endpoints, which generally have comparable response times. In this case, all you need is a bucketed rate limiter that prevents a user from exceeding, say, 200 requests per minute across all endpoints. If you do have slower endpoints (say, a CSV report or PDF generator) you can define more aggressive rate limits for these. With some HTTP middleware, this is pretty trivial:

Rack::Attack.throttle('API v1', limit: 200, period: 60) do |req|
  if req.path.start_with?('/api/v1/')
    req.env['rack.session']['session_id']
  end
end

Query Parsing 📜

Before a query is executed, it is first parsed and validated. We once received a pen-test report showing that it’s possible to craft an invalid query string that OOMs the server. For example:

query {
  __typename @a @b @c @d @e ... # imagine 1k+ more of these
}

This is a syntactically valid query, but invalid for our schema. A spec-compliant server will parse this and start building an errors response containing thousands of errors which we found consumed 2,000x more memory than the query string itself. Because of this memory amplification, it’s not enough to just limit the payload size, as you will have valid queries that are larger than the smallest dangerous malicious query.

If your server exposes a setting for the maximum number of errors to accrue before abandoning validation, this can be mitigated. If not, you’ll have to roll your own solution. There is no REST equivalent of this attack, at least not at this severity.
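The mitigation itself is simple to sketch in plain Ruby. Everything here (`validate_directives`, `KNOWN_DIRECTIVES`, the cap of 100) is a hypothetical stand-in for your library's validation pass, not a real API:

```ruby
# Hypothetical sketch: a validator pass that abandons work once too many
# errors have accrued, instead of building an unbounded error response.
MAX_ERRORS = 100
KNOWN_DIRECTIVES = %w[include skip deprecated].freeze

def validate_directives(directives)
  errors = []
  directives.each do |name|
    next if KNOWN_DIRECTIVES.include?(name)
    errors << "Directive @#{name} is not defined"
    # Bail out early: this caps the memory amplification.
    return { aborted: true, errors: errors } if errors.size >= MAX_ERRORS
  end
  { aborted: false, errors: errors }
end

# A malicious query with thousands of unknown directives stops at 100 errors.
result = validate_directives((1..10_000).map { |i| "a#{i}" })
result[:errors].size  # => 100
result[:aborted]      # => true
```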

Performance 🚀

When it comes to performance in GraphQL, people often talk about its incompatibility with HTTP caching. For me personally, this has not been an issue. For SaaS applications, data is usually highly user-specific, and serving stale data is unacceptable, so I have not found myself missing response caches (or the cache invalidation bugs they cause…).

The major performance problems I did find myself dealing with were…

Data Fetching and the N+1 Problem 🔄

I think the N+1 problem is pretty widely understood nowadays. TL;DR: if a field resolver hits an external data source such as a DB or HTTP API, and it is nested in a list containing N items, it will make those calls N times.

This problem is not unique to GraphQL, and in fact GraphQL’s strict resolution algorithm has allowed most libraries to share a common solution: the Dataloader pattern. What is unique to GraphQL is that, since it is a query language, a client can introduce this problem with no backend changes simply by modifying a query. As a result, I found you end up defensively introducing the Dataloader abstraction everywhere, just in case a client ends up fetching a field in a list context in the future. That is a lot of boilerplate to write and maintain.
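The pattern itself is simple: queue up keys during resolution, then fetch them all in one batch. Here is a stripped-down plain-Ruby sketch; real implementations such as graphql-ruby's GraphQL::Dataloader also juggle fibers/promises, which is exactly the machinery you end up maintaining everywhere:

```ruby
# Minimal batch loader: queue up IDs, run one query for all of them.
class BatchLoader
  def initialize(&batch_fn)
    @batch_fn = batch_fn   # e.g. ->(ids) { Author.where(id: ids).index_by(&:id) }
    @queue = []
    @cache = {}
  end

  def load(key)
    @queue << key
    -> { flush; @cache[key] }  # lazy: the value is only fetched on demand
  end

  private

  def flush
    return if @queue.empty?
    @cache.merge!(@batch_fn.call(@queue.uniq))
    @queue.clear
  end
end

db_calls = 0
loader = BatchLoader.new do |ids|
  db_calls += 1  # one "query" regardless of how many keys were requested
  ids.to_h { |id| [id, "author-#{id}"] }
end

lazy_values = [1, 2, 3].map { |id| loader.load(id) }
lazy_values.map(&:call)  # resolves all three with a single batch call
db_calls                 # => 1
```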

Meanwhile, in REST, we can generally hoist nested N+1 queries up to the controller, which I think is a pattern much easier to wrap your head around:

class ArticlesController < ApplicationController
  def index
    @recent_articles = Article.limit(25).includes(:author, :tags)
    render json: ArticleSerializer.render(@recent_articles)
  end

  def show
    # No prefetching necessary here since N=1
    @article = Article.find(params[:id])
    render json: ArticleSerializer.render(@article)
  end
end

Authorisation and the N+1 Problem 🛂

But wait, there’s more N+1s! If you followed the advice earlier of integrating with your library’s authorisation framework, you’ve now got a whole new category of N+1 problems to deal with. Let’s continue with our Facebook API example from earlier:

class UserType < GraphQL::BaseObject
  field :name, String
  field :address, String, authorize_with: :view_private_info
end

class QueryType < GraphQL::BaseObject
  field :me, UserType
end

Given the query:

query {
  me {
    friends { # returns N Users
      handle
      address # runs UserPolicy#view_private_info? N times
    }
  }
}

This is actually trickier to deal with than our previous example because authorisation code is not always run in a GraphQL context. It may, for example, be run in a background job or an HTML endpoint. That means we can’t just reach for a Dataloader naively, because Dataloaders expect to be run from within GraphQL (in the Ruby implementation anyway).

In my experience, this is actually the biggest source of performance issues. We would regularly find that our queries were spending more time authorising data than anything else. Again, this problem simply does not exist in the REST world.

I have mitigated this using nasty things like request-level globals to memoise data across policy calls, but it’s never felt great.
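For illustration, the request-level-global workaround I mean looks roughly like this. `RequestCache` is hypothetical; in Rails you might reach for ActiveSupport::CurrentAttributes or the request_store gem instead:

```ruby
# A per-request cache that policy objects can share, so the same
# authorisation lookup is only computed once per request.
class RequestCache
  def self.store
    Thread.current[:request_cache] ||= {}
  end

  def self.fetch(key)
    store.key?(key) ? store[key] : (store[key] = yield)
  end

  # Must be called at the end of every request (e.g. in middleware),
  # or stale authorisation results leak into the next request.
  def self.clear
    Thread.current[:request_cache] = nil
  end
end

policy_calls = 0
can_view = lambda do |viewer_id, target_id|
  RequestCache.fetch([:view_private_info, viewer_id, target_id]) do
    policy_calls += 1   # the expensive lookup (e.g. a DB hit) runs once
    viewer_id == target_id
  end
end

100.times { can_view.call(1, 2) }  # 100 policy checks...
policy_calls                       # => 1 ...but only one real lookup
RequestCache.clear                 # middleware would do this per request
```

The "never felt great" part: correctness now depends on that `clear` call, and on policies only ever running one-request-per-thread.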

Coupling 🧶

In my experience, in a mature GraphQL codebase, your business logic is gradually forced into the transport layer. This happens through a number of mechanisms, some of which we’ve already talked about:

  • Solving data authorisation leads to peppering authorisation rules throughout your GraphQL types.
  • Solving mutation/argument authorisation leads to peppering authorisation rules throughout your GraphQL arguments.
  • Solving resolver data fetching N+1s leads to moving this logic into GraphQL specific dataloaders.
  • Leveraging the (lovely) Relay Connection pattern leads to moving data fetching logic into GraphQL specific custom connection objects.

The net effect of all of this is that to meaningfully test your application, you must test extensively at the integration layer, i.e., by running GraphQL queries. I have found this makes for a painful experience: any errors encountered are captured by the framework, leading to the fun task of reading stack traces in JSON GraphQL error responses. And since so much around authorisation and Dataloaders happens inside the framework, debugging is often much harder, as the breakpoint you want is not in application code.

And of course, again, since it’s a query language, you’re going to be writing a lot more tests to confirm that all those argument and field-level behaviours we mentioned are working correctly.

Complexity 🧩

Taken in aggregate, the various mitigations to security and performance issues we’ve gone through add significant complexity to a codebase. It’s not that REST does not have these problems (though it certainly has fewer), it’s just that the REST solutions are generally much simpler for a backend developer to implement and understand.

And More… 📜

So those are the major reasons I am, for the most part, over GraphQL. I have a few more peeves, but to keep this article from growing further I’ll summarise them here:

  • GraphQL discourages breaking changes and provides no tools to deal with them. This adds needless complexity for those who control all their clients, who will have to find workarounds.
  • Reliance on HTTP response codes turns up everywhere in tooling, so dealing with the fact that 200 can mean everything from everything is OK through to everything is down can be quite annoying.
  • Fetching all your data in one query in the HTTP 2+ age is often not beneficial to response time; in fact, it will worsen it if your server is not parallelised, versus sending separate requests to separate servers to process in parallel.

Alternatives 🚀

OK, end of the rant. What would I recommend instead? To be upfront, I am definitely early in the hype cycle here, but right now my view is that if you:

  • Control all your clients
  • Have ≤3 clients
  • Have a client written in a statically typed language
  • Are using >1 language across the server and clients

You are probably better off exposing an OpenAPI 3.0+ compliant JSON REST API. If, as in my experience, the main thing your frontend devs like about GraphQL is its self-documenting, type-safe nature, I think this will work well for you. Tooling in this area has improved a lot since GraphQL came on the scene; there are many options for generating typed client code, even down to framework-specific data fetching libraries. My experience so far is pretty close to “the best parts of what I used GraphQL for, without the complexity Facebook needed.”
