Python & Graphql. Tips, tricks and performance improvements.


I recently finished another GraphQL back-end, this time in Python. In this article I describe the difficulties I faced and the bottlenecks that can affect performance.

Technology stack: graphene + flask and sqlalchemy integration. Here is a piece of requirements.txt:

graphene
graphene_sqlalchemy
flask
flask-graphql
flask-sqlalchemy
flask-cors
injector
flask-injector

This allows me to map my database entities directly to GraphQL.

It looks like this:

The model:

class Color(db.Model):
  """color table"""
  __tablename__ = 'colors'

  color_id = Column(BigInteger().with_variant(sqlite.INTEGER(), 'sqlite'), primary_key=True)
  color_name = Column(String(50), nullable=False)
  color_r = Column(SmallInteger)
  color_g = Column(SmallInteger)
  color_b = Column(SmallInteger)

The node:

class ColorNode(SQLAlchemyObjectType):
  class Meta:
    model = colours.Color
    interfaces = (relay.Node,)

  color_id = graphene.Field(BigInt)

Everything is simple and nice.

But what are the problems?

Flask context.

At the time of writing this article I was unable to send my context to the GraphQL.

app.add_url_rule('/graphql',
                 view_func=GraphQLView.as_view('graphql',
                                               schema=schema.schema,
                                               graphiql=True,
                                               context_value={'session': db.session}))

This didn’t work for me, because the flask-graphql view replaced the context with the Flask request.

Maybe this is fixed now, but I had to subclass GraphQLView to keep the context:

class ContexedView(GraphQLView):
  context_value = None

  def get_context(self):
    context = super().get_context()
    if self.context_value:
      for k, v in self.context_value.items():
        setattr(context, k, v)
    return context

CORS support

It is always a thing I forget to add 🙂

For Python Flask just add flask-cors in your requirements and set it up in your create_app method via CORS(app). That’s all.

Bigint type

I had to create my own BigInt type, because some of my tables use a big integer as the primary key and graphene raised errors when I tried to send such values as its Int type.

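from graphene.types import Scalar
from graphql.language import ast


class BigInt(Scalar):
  """Integer scalar without the 32-bit limit of the built-in Int."""

  @staticmethod
  def serialize(num):
    return num

  @staticmethod
  def parse_literal(node):
    # Accept both string and integer literals from the query text.
    if isinstance(node, (ast.StringValue, ast.IntValue)):
      return int(node.value)

  @staticmethod
  def parse_value(value):
    return int(value)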
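
The errors are most likely explained by the GraphQL specification, which defines Int as a signed 32-bit integer, so values from a BIGINT column can fall outside its range. A minimal sketch of that range check in plain Python (the helper name fits_graphql_int is hypothetical and not part of graphene):

```python
# Bounds of the GraphQL Int type: a signed 32-bit integer per the specification.
GRAPHQL_INT_MIN = -2**31
GRAPHQL_INT_MAX = 2**31 - 1


def fits_graphql_int(value: int) -> bool:
    """Return True if value can be carried by the built-in GraphQL Int."""
    return GRAPHQL_INT_MIN <= value <= GRAPHQL_INT_MAX
```

Any primary key for which this check fails needs a custom scalar such as the BigInt above.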

Compound primary key

Also, graphene_sqlalchemy doesn’t support compound primary keys out of the box. I had one table with an (Int, Int, Date) primary key. To make it resolvable by id via Relay’s Node interface I had to override the get_node method:

@classmethod
def get_node(cls, info, id):
  import datetime
  return super().get_node(info, eval(id))

The datetime import and eval are essential here: without them the date field stays a plain string and querying the database will not work.
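
Keep in mind that eval executes whatever string the client sends as the id. If that is a concern, the key can be parsed explicitly instead. The sketch below assumes the decoded id looks like the repr of the key tuple, for example "(1, 2, datetime.date(2018, 5, 1))"; that format and the name parse_compound_key are assumptions for illustration, not something graphene_sqlalchemy provides:

```python
import datetime
import re

# Matches "(int, int, datetime.date(year, month, day))" and nothing else.
_KEY_PATTERN = re.compile(
    r"^\(\s*(\d+)\s*,\s*(\d+)\s*,"
    r"\s*datetime\.date\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\)\s*\)$"
)


def parse_compound_key(raw: str):
    """Turn the textual key into an (int, int, date) tuple without eval."""
    match = _KEY_PATTERN.match(raw)
    if match is None:
        raise ValueError(f"Unexpected compound key format: {raw!r}")
    first, second, year, month, day = (int(group) for group in match.groups())
    return first, second, datetime.date(year, month, day)
```

In get_node the call would then be super().get_node(info, parse_compound_key(id)), and anything that is not a well-formed key is rejected with a ValueError.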

Mutations with authorization

It was really easy to add authorization for queries: all I needed was to add a Viewer object and write the get_token and get_by_token methods, as I had done many times in Java before.

But mutations are called without going through Viewer, which is natural for GraphQL.

I didn’t want to add authorization code at the top of every mutation, as that leads to code duplication and is a bit dangerous: I could create a backdoor simply by forgetting to add it.

So I subclassed the mutation and reimplemented its mutate_and_get_payload like this:

class AuthorizedMutation(relay.ClientIDMutation):
  class Meta:
    abstract = True

  @classmethod
  @abstractmethod
  def mutate_authorized(cls, root, info, **kwargs):
    pass

  @classmethod
  def mutate_and_get_payload(cls, root, info, **kwargs):
    # authorize user using info.context.headers.get('Authorization')
    return cls.mutate_authorized(root, info, **kwargs)

All my mutations subclass AuthorizedMutation and just implement their business logic in mutate_authorized. It is called only if the user is authorized.
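
The same pattern can be tried without graphene. The sketch below is only an illustration: the token store, the Unauthorized exception, the RenameColor class and the make_info helper are all hypothetical, and passing the resolved user into mutate_authorized is my assumption rather than something the original code does.

```python
from types import SimpleNamespace

VALID_TOKENS = {"secret-token": "alice"}  # hypothetical token store


class Unauthorized(Exception):
    """Raised when the Authorization header is missing or unknown."""


class AuthorizedAction:
    """Stand-in for AuthorizedMutation without the graphene machinery."""

    @classmethod
    def mutate_authorized(cls, root, info, **kwargs):
        raise NotImplementedError

    @classmethod
    def mutate_and_get_payload(cls, root, info, **kwargs):
        # The check lives in one place, so a subclass cannot forget it.
        token = info.context.headers.get("Authorization")
        user = VALID_TOKENS.get(token)
        if user is None:
            raise Unauthorized("invalid or missing token")
        return cls.mutate_authorized(root, info, user=user, **kwargs)


class RenameColor(AuthorizedAction):
    @classmethod
    def mutate_authorized(cls, root, info, user, **kwargs):
        # Business logic only; authorization has already happened.
        return {"renamed_by": user, "name": kwargs["name"]}


def make_info(headers):
    """Build a minimal object shaped like info.context.headers."""
    return SimpleNamespace(context=SimpleNamespace(headers=headers))
```

Calling RenameColor.mutate_and_get_payload with a known token runs the business logic, while a missing or unknown token raises Unauthorized before mutate_authorized is reached.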

Sortable and Filterable connections

To have my data automatically sorted via query in connection (with sorted options added to the schema) I had to subclass relay’s connection and implement get_query method (it is called in graphene_sqlalchemy).

class SortedRelayConnection(relay.Connection):
  class Meta:
    abstract = True

  @classmethod
  def get_query(cls, info, **kwargs):
    return SQLAlchemyConnectionField.get_query(cls._meta.node._meta.model, info, **kwargs)

Then I decided to add dynamic filtering on every field, again extending the schema.

Out of the box graphene can’t do it, so I had to open a PR https://github.com/graphql-python/graphene-sqlalchemy/pull/164 and subclass the connection once again:

class FilteredRelayConnection(relay.Connection):
  class Meta:
    abstract = True

  @classmethod
  def get_query(cls, info, **kwargs):
    return FilterableConnectionField.get_query(cls._meta.node._meta.model, info, **kwargs)

FilterableConnectionField is the class introduced in that PR.

Sentry middleware

We use Sentry as our error notification system, and it was hard to make it work with graphene. Sentry has a good Flask integration, but the problem with graphene is that it swallows exceptions and returns them as errors in the response.

I had to use my own middleware:

import traceback


class SentryMiddleware(object):

  def __init__(self, sentry) -> None:
    self.sentry = sentry

  def resolve(self, next, root, info, **args):
    promise = next(root, info, **args)
    if promise.is_rejected:
      promise.catch(self.log_and_return)
    return promise

  def log_and_return(self, e):
    try:
      raise e
    except Exception:
      traceback.print_exc()
      # Report only when Sentry is configured and the error is worth reporting.
      if self.sentry.is_configured and not isinstance(e, NotImportantUserError):
        self.sentry.captureException()
    return e

It is registered on GraphQL route creation:

app.add_url_rule('/graphql',
                 view_func=ContexedView.as_view('graphql',
                                                schema=schema.schema,
                                                graphiql=True,
                                                context_value={'session': db.session},
                                                middleware=[SentryMiddleware(sentry)]))

Low performance with relations

Everything went well, the tests were green and I was happy until my application reached the dev environment with realistic amounts of data. Everything was super slow.

The problem was in SQLAlchemy’s relationships. They are lazy by default. https://docs.sqlalchemy.org/en/latest/orm/loading_relationships.html

This means that if you have a graph of three related entities, Master -> Pet -> Food, and query them all, the first query fetches all masters (select * from masters). Say you receive 20. Then for each master there is a separate query (select * from pets where master_id = ?), which makes 20 queries. Finally there are N food queries, one per pet returned.

My advice: if you have complex relations and lots of data (I was writing a back-end for the big data world), make your relations eager. The query itself becomes heavier, but there is only one, which reduces response time dramatically.
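
To make the difference in query count concrete, here is a small sketch that uses only the standard sqlite3 module instead of SQLAlchemy. The tables and helper names are made up for the illustration; it counts the statements issued by the lazy pattern and by a single join over 20 masters with one pet each.

```python
import sqlite3


def build_db():
    """Create an in-memory database with 20 masters and one pet each."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE masters (master_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE pets (pet_id INTEGER PRIMARY KEY, master_id INTEGER, name TEXT);
        """
    )
    conn.executemany("INSERT INTO masters VALUES (?, ?)",
                     [(i, f"master{i}") for i in range(1, 21)])
    conn.executemany("INSERT INTO pets VALUES (?, ?, ?)",
                     [(i, i, f"pet{i}") for i in range(1, 21)])
    return conn


def load_lazy(conn):
    """One query for masters, then one query per master (the N+1 pattern)."""
    queries = 1
    masters = conn.execute("SELECT master_id FROM masters").fetchall()
    result = {}
    for (master_id,) in masters:
        pets = conn.execute(
            "SELECT name FROM pets WHERE master_id = ?", (master_id,)
        ).fetchall()
        queries += 1
        result[master_id] = [name for (name,) in pets]
    return result, queries


def load_eager(conn):
    """A single JOIN that returns masters and their pets together."""
    rows = conn.execute(
        "SELECT m.master_id, p.name FROM masters m "
        "LEFT JOIN pets p ON p.master_id = m.master_id"
    ).fetchall()
    result = {}
    for master_id, pet_name in rows:
        result.setdefault(master_id, [])
        if pet_name is not None:
            result[master_id].append(pet_name)
    return result, 1
```

Both functions return the same data, but load_lazy issues 21 statements and load_eager issues one. In SQLAlchemy the equivalent switch is made with relationship(..., lazy='joined') on the model or with the joinedload() query option.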

Performance improvement with custom queries

After I made my critical relations eager (not all of them; I had to study the front-end app to understand what it queries and how), everything worked faster, but not fast enough. I looked at the generated queries and was a bit frightened: they were monstrous! I had to write my own optimized queries for some nodes.

For example, say I have a PlanMonthly entity with several OrderColorDistributions, each of which has one Order.

I can use subqueries to limit the data (remember, I am writing a back-end for big data) and populate the relations with data that is already there (I had this data in the query anyway, so there was no need for the eager joins generated by the ORM). This makes the request lighter.

Steps:

  1. Mark subqueries with with_labels=True
  2. Use the root entity of this request as the returned one:
    Order.query \
      .filter(<low level filtering here>) \
      .join(<join another table, which you can use later>) \
      .join(ocr_query, Order.order_id == ocr_query.c.order_color_distribution_order_id) \
      .join(date_limit_query,
            and_(ocr_query.c.order_color_distribution_color_id == date_limit_query.c.plans_monthly_color_id,
                 ocr_query.c.order_color_distribution_date == date_limit_query.c.plans_monthly_date,
                 <another table joined previously> == date_limit_query.c.plans_monthly_group_id))
  3. Use contains_eager on all first level relations.
    query = query.options(contains_eager(Order.color_distributions, alias=ocr_query))
  4. If you have second layer of relations (Order -> OrderColorDistribution -> PlanMonthly) chain contains_eager:
    query = query.options(contains_eager(Order.color_distributions, alias=ocr_query)
                 .contains_eager(OrderColorDistribution.plan, alias=date_limit_query))

Reducing number of calls to the database

Besides the data rendering level I have a service layer, which knows nothing about GraphQL. And I am not going to introduce GraphQL there, as I don’t like tight coupling.

But each service needs the fetched months data. To fetch the data only once and have it available in all services, I use injector with the @request scope. Remember this scope, it is your friend in GraphQL.

It works like a singleton, but only within one request to /graphql. In my connection I just populate it with plans, found via GraphQL query (including all custom filters and ranges from front-end):

app.injector.get(FutureMonthCache).set_months(found)

Then, in all services that need this data, I just use the cache:

@inject
def __init__(self,
             prediction_service: PredictionService,
             price_calculator: PriceCalculator,
             future_month_cache: FutureMonthCache) -> None:
  super().__init__(future_month_cache)
  self._prediction_service = prediction_service
  self._price_calculator = price_calculator

Another nice thing is that all my services which manipulate data and form the request also have the @request scope, so I don’t need to calculate predictions for every month. I take them all from the cache, do one query and store the results. Moreover, one service can rely on data calculated by another service. Request scope helps a lot here, as it allows me to calculate all data only once.
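
The behaviour of a request scope can be pictured with a few lines of plain Python. This is not how flask-injector implements it, just a sketch of the idea: one instance per class for the lifetime of one request. The RequestScope class and the get_months method are hypothetical; only FutureMonthCache.set_months appears in my code above.

```python
class FutureMonthCache:
    """Holds the months fetched for the current request."""

    def __init__(self):
        self._months = []

    def set_months(self, months):
        self._months = list(months)

    def get_months(self):
        return self._months


class RequestScope:
    """Creates at most one instance per class for one request."""

    def __init__(self):
        self._instances = {}

    def get(self, cls):
        # Reuse the instance created earlier in this request, if any.
        if cls not in self._instances:
            self._instances[cls] = cls()
        return self._instances[cls]
```

Within one RequestScope every caller gets the same FutureMonthCache, so data stored by the connection is visible to all services, while a new request starts with an empty cache.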

On the Node side I call my request scope services via resolver:

def resolve_predicted_pieces(self, _info):
  return app.injector.get(PredictionCalculator).get_original_future_value(self)

It allows me to run heavy calculations only if predicted_pieces were specified in the GraphQL query.

Summing up

Those are all the difficulties I’ve faced. I haven’t tried websocket subscriptions, but from what I’ve learned I can say that GraphQL in Python is more flexible than in Java, thanks to the flexibility of Python itself. But if I were to work on a high-load back-end, I would prefer not to use GraphQL, as it is harder to optimize.

GraphQL with Spring: Query and Pagination

In this article I’ll describe how to use GraphQL with Spring using the graphql-java-annotations library. A full example is available here.

Why annotations?

In my view a schema should not be written manually, simply because it is easy to make a mistake. The schema should be generated from code instead. Your IDE can help you here, checking types and catching typos in names.

A GraphQL schema nearly always has the same structure as the back-end data models, because the back-end is closer to the data. So it is much easier to annotate your data models and keep the existing schema than to write the schema manually (maybe on the front-end side) and then build bridges between this schema and the existing data models.

Add library and create core beans

The first thing to do is to add the library to your Spring Boot project. I assume you’ve already added web, so just add graphql to your build.gradle:

compile('io.github.graphql-java:graphql-java-annotations:5.2')

The GraphQL object is the starting point for executing GraphQL queries. To build it we need to provide a schema and an execution strategy.
Let’s create a schema bean in your Configuration class:

@Bean
public GraphQLSchema schema() {
    GraphQLAnnotations.register(new ZonedDateTimeTypeFunction());
    return newSchema()
            .query(GraphQLAnnotations.object(QueryDto.class))
            .mutation(GraphQLAnnotations.object(MutationDto.class))
            .subscription(GraphQLAnnotations.object(SubscriptionDto.class))
            .build();
}

Here we register a custom ZonedDateTime type function to convert ZonedDateTime from Java to a string with the format yyyy-MM-dd'T'HH:mm:ss.SSSZ and back.

Then we use a builder to create a new schema with query, mutation and subscription. This tutorial covers query only.

Building a schema is not cheap, so it should be done only once. GraphQLAnnotations will scan your source tree starting from QueryDto and go through its properties and methods, building a schema for you.

After schema is ready you can create a GraphQL bean:

@Bean
public GraphQL graphQL(GraphQLSchema schema) {
    return GraphQL.newGraphQL(schema)
            .queryExecutionStrategy(new EnhancedExecutionStrategy())
            .build();
}

According to the documentation, building a GraphQL object is cheap and can be done per request if required. I don’t need that, but you can put it in prototype scope.

I’ve used EnhancedExecutionStrategy to have clientMutationId inserted automatically to support Relay mutations.

Create controller with CORS support

You will receive your graphql request as ordinary POST request on /graphql:

@CrossOrigin
@RequestMapping(path = "/graphql", method = RequestMethod.POST)
public CompletableFuture<ResponseEntity<?>> getTransaction(@RequestBody String query) {
    CompletableFuture<?> respond = graphqlService.executeQuery(query);
    return respond.thenApply(r -> new ResponseEntity<>(r, HttpStatus.OK));
}

It should always return an ExecutionResult and HttpStatus.OK, even if there is an error!
It is also very important to support OPTIONS requests. Some front-end GraphQL frameworks send one before sending the POST with data.
In Spring all you need is to add the @CrossOrigin annotation.

Execute graphql with spring application context

You can receive your query in one of two formats. JSON with variables:

{"query":"query SomeQuery($pagination: InputPagination) { viewer { someMethod(pagination: $pagination) { data { inner data } } } }","variables":{"pagination":{"pageSize":50,"currentPage":1}}}

or plain GraphQL query:

query SomeQuery {
 viewer {
   someMethod(pagination: {pageSize:50, currentPage:1}) {
     data { inner data }
   }
 }
}

The best way to convert both formats to one is to use this inner class:

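private class InputQuery {
    String query;
    Map<String, Object> variables = new HashMap<>();
    InputQuery(String query) {
        ObjectMapper mapper = new ObjectMapper();
        try {
            // Try the JSON envelope first: {"query": "...", "variables": {...}}
            Map<String, Object> jsonBody = mapper.readValue(query, new TypeReference<Map<String, Object>>() {
            });
            this.query = (String) jsonBody.get("query");
            Object vars = jsonBody.get("variables");
            if (vars instanceof Map) {
                // Keep the empty default when "variables" is absent or null.
                this.variables = (Map<String, Object>) vars;
            }
        } catch (IOException ignored) {
            // Not JSON: treat the whole body as a plain GraphQL query.
            this.query = query;
        }
    }
}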

Here we try to parse JSON first. If that succeeds, we get the query with its variables. If not, we assume the input string is a plain query.
To execute your query you should construct GraphQL execution input and pass it to the execute method of your GraphQL object.

@Async
@Transactional
public CompletableFuture<ExecutionResult> executeQuery(String query) {
    InputQuery queryObject = new InputQuery(query);
    ExecutionInput executionInput = ExecutionInput.newExecutionInput()
            .query(queryObject.query)
            .context(appContext)
            .variables(queryObject.variables)
            .root(mutationDto)
            .build();
    return CompletableFuture.completedFuture(graphQL.execute(executionInput));
}

Where:

@Autowired
private ApplicationContext appContext;
@Autowired
private GraphQL graphQL;
@Autowired
private MutationDto mutationDto;

appContext is the Spring application context. It is used as the execution input context in order to access Spring beans in GraphQL objects.
graphQL is the bean created earlier.
mutationDto is your mutation. I’ll cover it in another tutorial.

The query

The query is the starting point for your GraphQL request.

@GraphQLName("Query")
public class QueryDto

I used Dto suffix for all GraphQL objects to separate them from data objects. However this suffix is redundant for schema, so @GraphQLName annotation is used.

@GraphQLField
public static TableDto getFreeTable(DataFetchingEnvironment environment) {
    ApplicationContext context = environment.getContext();
    DeskRepositoryService repositoryService = context.getBean(DeskRepositoryService.class);
    return repositoryService.getRandomFreeDesk().map(TableDto::new).orElse(null);
}

Every public static method in QueryDto annotated with @GraphQLField will be available for querying:

query {
 getFreeTable {
   tableId name
 }
}

GraphQL Objects

Your query returns TableDto which is your GraphQL object.

The difference between QueryDto and an ordinary TableDto is that the former is always static, while objects are instantiated. In the listing above it is created from a Desk.

To make fields and methods of created object visible for the query you should make them public and annotate with @GraphQLField.

In case of properties you can leave them private. GraphQL library will access them anyway:

@GraphQLNonNull
@GraphQLField
private Long tableId;
@GraphQLNonNull
@GraphQLField
private String name;

@GraphQLField
public String getWaiterName(DataFetchingEnvironment environment) {
     //TODO use context to retrieve waiter.
    return "default";
}

DataFetchingEnvironment will be filled in automatically by the GraphQL Annotations library if you add it to the method’s arguments. You can skip it if not needed:

@GraphQLField
public String getWaiterName() {
    return "default";
}

You can also use any other arguments including objects:

@GraphQLField
public String getWaiterName(DataFetchingEnvironment environment, String string, MealDto meal) {
     //TODO use context to retrieve waiter.
    return "default";
}

You can use @GraphQLNonNull to make any argument required.

Relay compatibility

Every object should implement Node interface, which has non null id:

@GraphQLTypeResolver(ClassTypeResolver.class)
public interface Node {
    @GraphQLField
    @GraphQLNonNull
    String id();
}

ClassTypeResolver allows GraphQL to include the interface in your schema.
I usually use class name + class id as the Node id. Every object extends AbstractId for this.

Then in TableDto constructor I will use: super(TableDto.class, desk.getDeskId().toString());

To get a Table by its Node id let’s use this:

public static TableDto getById(DataFetchingEnvironment environment, String id) {
    ApplicationContext context = environment.getContext();
    DeskRepositoryService repositoryService = context.getBean(DeskRepositoryService.class);
    return repositoryService.findById(Long.valueOf(id)).map(TableDto::new).orElse(null);
}

It is called from QueryDto:

@GraphQLField
public static Node node(DataFetchingEnvironment environment, @GraphQLNonNull String id) {
    String[] decoded = decodeId(id);
    if (decoded[0].equals(TableDto.class.getName()))
        return TableDto.getById(environment, decoded[1]);
    if (decoded[0].equals(ReservationDto.class.getName()))
        return ReservationDto.getById(environment, decoded[1]);
    log.error("Don't know how to get {}", decoded[0]);
    throw new RuntimeException("Don't know how to get " + decoded[0]);
}

by this query: query {node(id: "unique_graphql_id") {... on Table { reservations {edges {node {guest from to}} }}}}

The pagination

To support pagination your method should return PaginatedData<YourClass> and have additional annotation @GraphQLConnection:

@GraphQLField
@GraphQLConnection
@GraphQLName("allTables")
public static PaginatedData<TableDto> getAllTables(DataFetchingEnvironment environment) {
    ApplicationContext context = environment.getContext();
    DeskRepositoryService repositoryService = context.getBean(DeskRepositoryService.class);
    Page page = new Page(environment);
    List<Desk> allDesks;
    if(page.applyPagination()) {
        allDesks = repositoryService.findAll(); // TODO apply pagination!
    } else {
        allDesks = repositoryService.findAll();
    }
    List<TableDto> tables = allDesks.stream().map(TableDto::new).collect(Collectors.toList());
    return new AbstractPaginatedData<TableDto>(false, false, tables) {
        @Override
        public String getCursor(TableDto entity) {
            return entity.id();
        }
    };
}

Here I create Page using default GraphQL pagination variables. But you can use any other input for pagination you like.

To implement pagination you have two options:

  • In-memory pagination. You retrieve all objects from your repository and then paginate, filter and sort them. For this solution it is much better to create your own implementation of PaginatedData and pass the environment, as well as the pagination/sorting/filtering input, to it.
  • Actions in the repository. This is much better, as you won’t load lots of objects into memory, but it requires you to generate complex queries in the repository.

All GraphQL objects can have methods with pagination. In TableDto you can retrieve a paginated list of all reservations for current table with:

@GraphQLField
@GraphQLConnection
public PaginatedData<ReservationDto> reservations(DataFetchingEnvironment environment) {

Next steps

In the next article I will cover mutations and subscriptions.