Security is a key aspect in cloud and microservices deployments. However it also has some high impacts like a high CPU consumption due to the cryptographic computation algorithms.

JWT and cryptography

As a quick reminder, a JSON Web Token (JWT) is composed of three segments:

base64Url(jsonHeader).base64Url(jsonPayload).base64Url(signature)

Each segment is Base64 URL encoded. The segments composing the JWT are:

  1. The JSON header, it contains metadata about the payload and signature (exactly as HTTP headers contain metadata about the HTTP payload - content length, content-type etc…​). Generally it is what enables to verify the signature (algorithm, key id), what enables to read the payload (is it compressed or not) etc…​,

  2. The JSON payload which is an almost free JSON where you can put the data you want to share,

  3. The signature which guarantees the token was not modified. It is the signature of the first two segments (concatenated with a separating dot).

As you can see, the signature computation is what consumes the most of the CPU since other parts (base64 computation and JSON serialization) are really acceptable/simple.

The signature can be of multiple types but the most common are:

  • RSA (Rivest–Shamir–Adleman from the name of their creators),

  • EC (elliptic curve),

  • Hmac (hash based message authentication code),

  • PS (also known as RSASSA-PSS for RSA Probabilistic Signature Scheme).

Indeed, in the context of this post we only care about the performances but keep in mind you don’t always select the algorithm for its performances. A common example is that if you select Hmac, except being exposed to brute force hacking, you can’t share the key to verify the JWT since it would enable to create custom JWT which would be valid from a signature point of view. Then in terms of pure performance, depending if you prefer a fast encryption (signing) or decryption (verifying) you will pick RSA or EC depending other security constraints we will not detail in this particular post.

Microservice and JWT common usage

What we often see in microservice architectures is this kind of deployment:

Failed to generate image: PlantUML preprocessing failed: [From <input> (line 12) ]

@startuml
scale max 1024 width

skinparam nodesep 10
skinparam ranksep 10

left to right direction
!define KubernetesPuml https://raw.githubusercontent.com/dcasati/kubernetes-PlantUML/master/dist
!define AzurePuml https://raw.githubusercontent.com/RicardoNiepel/Azure-PlantUML/release/2-1/dist
!define CloudInsight https://raw.githubusercontent.com/plantuml/plantuml-stdlib/master/cloudinsight

!includeurl KubernetesPuml/kubernetes_Common.puml
^^^^^
 Cannot open URL

scale max 1024 width

skinparam nodesep 10
skinparam ranksep 10

left to right direction
!define KubernetesPuml https://raw.githubusercontent.com/dcasati/kubernetes-PlantUML/master/dist
!define AzurePuml https://raw.githubusercontent.com/RicardoNiepel/Azure-PlantUML/release/2-1/dist
!define CloudInsight https://raw.githubusercontent.com/plantuml/plantuml-stdlib/master/cloudinsight

!includeurl KubernetesPuml/kubernetes_Common.puml
!includeurl KubernetesPuml/kubernetes_Context.puml
!includeurl KubernetesPuml/kubernetes_Simplified.puml
!includeurl KubernetesPuml/OSS/KubernetesSvc.puml
!includeurl KubernetesPuml/OSS/KubernetesPod.puml
!includeurl AzurePuml/AzureCommon.puml
!includeurl AzurePuml/Databases/AzureSqlDatabase.puml
!includeurl CloudInsight/mysql.puml

Cluster_Boundary(cluster, "Kubernetes Cluster") {
  Namespace_Boundary(ns, "my-app") {
    KubernetesSvc(svc, "Service", "")
    KubernetesPod(gateway, "Gateway", "")
    KubernetesPod(pod1, "API1", "")
    KubernetesPod(pod2, "Frontend1", "")
    KubernetesPod(pod3, "Backend", "")
    database "<$mysql>" as sql #white
  }
}

Rel(svc,gateway,"GET /service* -H 'Authorization:Bearer xxx'")
Rel(gateway,pod1,"GET /service1 -H 'Authorization:Bearer xxx'")
Rel(gateway,pod2,"GET /service2 -H 'Authorization:Bearer xxx'")
Rel(pod2,pod3,"GET /service3 -H 'Authorization:Bearer xxx'")
Rel(pod3,sql,"SELECT *")

What is interesting to see is that for a quite common application with an API layer, a frontend and a backend service (in real deployments we often have N backend services and not a single one), we propagate the JWT at least 4 times.

in real applications the JWT is also propagated to topics/Kafka messages to be able to load the security context which makes it even worse.

Propagating the JWT is not really crazy since it is mainly propagating a header but before using a JWT you must ensure it is valid otherwise you can not trust its content so each service in this chain will have to:

  1. Ensure there is a JWT (cheap),

  2. Ensure the JWT is valid (cheap, mainly base64 and JSON validation on the header),

  3. Ensure the JWT signature is valid (expensive in terms of CPU),

  4. Ensure the JWT payload/content is valid (not expired, correct issuer etc…​),

  5. Use the JWT (read /groups from the claims/payload for example).

This list immediately shows two really impacting points:

  • On the overall cluster we will pay the token validation again and again on each service (instead of paying 1 we will pay 4 in the simplified example we had before),

  • Each service must have access to the validation data (valid issuer, signature key/certificate, time window validation - there is often some tolerance for expiry case).

The first point implies a high cloud/machine cost and the last one a high cost in terms of maintenance/deployment/ops.

Reduce CPU and maintenance cost for JWT based applications

The simplest solution to solve this cost is to ensure it is paid only once. This is done adding/relying on the gateway in front of all services:

Failed to generate image: PlantUML preprocessing failed: [From <input> (line 12) ]

@startuml
scale max 1024 width

skinparam nodesep 10
skinparam ranksep 10

left to right direction
!define KubernetesPuml https://raw.githubusercontent.com/dcasati/kubernetes-PlantUML/master/dist
!define AzurePuml https://raw.githubusercontent.com/RicardoNiepel/Azure-PlantUML/release/2-1/dist
!define CloudInsight https://raw.githubusercontent.com/plantuml/plantuml-stdlib/master/cloudinsight

!includeurl KubernetesPuml/kubernetes_Common.puml
^^^^^
 Cannot open URL

scale max 1024 width

skinparam nodesep 10
skinparam ranksep 10

left to right direction
!define KubernetesPuml https://raw.githubusercontent.com/dcasati/kubernetes-PlantUML/master/dist
!define AzurePuml https://raw.githubusercontent.com/RicardoNiepel/Azure-PlantUML/release/2-1/dist
!define CloudInsight https://raw.githubusercontent.com/plantuml/plantuml-stdlib/master/cloudinsight

!includeurl KubernetesPuml/kubernetes_Common.puml
!includeurl KubernetesPuml/kubernetes_Context.puml
!includeurl KubernetesPuml/kubernetes_Simplified.puml
!includeurl KubernetesPuml/OSS/KubernetesSvc.puml
!includeurl KubernetesPuml/OSS/KubernetesPod.puml
!includeurl AzurePuml/AzureCommon.puml
!includeurl AzurePuml/Databases/AzureSqlDatabase.puml
!includeurl CloudInsight/mysql.puml

Cluster_Boundary(cluster, "Kubernetes Cluster") {
  Namespace_Boundary(ns, "my-app") {
    KubernetesSvc(svc, "Service", "")
    KubernetesPod(gateway, "Gateway", "")
    KubernetesPod(pod1, "API1", "")
    KubernetesPod(pod2, "Frontend1", "")
    KubernetesPod(pod3, "Backend", "")
  }
}

Rel(svc,gateway,"GET /service* -H 'Authorization:Bearer xxx'")
Rel(gateway,pod1,"GET /service1")
Rel(gateway,pod2,"GET /service2")
Rel(pod2,pod3,"GET /service3")

With this new architecture, we only validate read the JWT once…​.but we loose the JWT claims - which is the only interesting part functionally. For some applications it will be ok-ish (for applications only requiring to be logged in) but generally it will not. This is where the gateway will add a real value because its roles now become:

  • to validate the JWT and ensure the call is valid,

  • to explode the JWT payload and propagate needed informations through headers.

Once this second step is done, the calls now look like:

Failed to generate image: PlantUML preprocessing failed: [From <input> (line 12) ]

@startuml
scale max 1024 width

skinparam nodesep 10
skinparam ranksep 10

left to right direction
!define KubernetesPuml https://raw.githubusercontent.com/dcasati/kubernetes-PlantUML/master/dist
!define AzurePuml https://raw.githubusercontent.com/RicardoNiepel/Azure-PlantUML/release/2-1/dist
!define CloudInsight https://raw.githubusercontent.com/plantuml/plantuml-stdlib/master/cloudinsight

!includeurl KubernetesPuml/kubernetes_Common.puml
^^^^^
 Cannot open URL

scale max 1024 width

skinparam nodesep 10
skinparam ranksep 10

left to right direction
!define KubernetesPuml https://raw.githubusercontent.com/dcasati/kubernetes-PlantUML/master/dist
!define AzurePuml https://raw.githubusercontent.com/RicardoNiepel/Azure-PlantUML/release/2-1/dist
!define CloudInsight https://raw.githubusercontent.com/plantuml/plantuml-stdlib/master/cloudinsight

!includeurl KubernetesPuml/kubernetes_Common.puml
!includeurl KubernetesPuml/kubernetes_Context.puml
!includeurl KubernetesPuml/kubernetes_Simplified.puml
!includeurl KubernetesPuml/OSS/KubernetesSvc.puml
!includeurl KubernetesPuml/OSS/KubernetesPod.puml
!includeurl AzurePuml/AzureCommon.puml
!includeurl AzurePuml/Databases/AzureSqlDatabase.puml
!includeurl CloudInsight/mysql.puml

Cluster_Boundary(cluster, "Kubernetes Cluster") {
  Namespace_Boundary(ns, "my-app") {
    KubernetesSvc(svc, "Service", "")
    KubernetesPod(gateway, "Gateway", "")
    KubernetesPod(pod1, "API1", "")
    KubernetesPod(pod2, "Frontend1", "")
    KubernetesPod(pod3, "Backend", "")
  }
}

Rel(svc,gateway,"GET /service* -H 'Authorization:Bearer xxx'")
Rel(gateway,pod1,"GET /service1 -H 'JWT-sub:user1' -H 'JWT-groups:admin,manager'")
Rel(gateway,pod2,"GET /service2 -H 'JWT-groups:admin,manager'")
Rel(pod2,pod3,"GET /service3 -H 'JWT-groups:admin,manager'")

With such solution, the important JWT information - here the sub which is the username/login and groups which contains the list of roles - are propagated through dedicated headers to services.

the JWT- prefix is a convention but you can use any header name, just ensure to normalize it in the company to enable a service to transparently forward them if needed by filtering them on the prefix.

The last step you can go through is to validate the role on the gateway when possible but often it will be linked to some business rules so will need to be done in the service.

Meecrogate example

Meecrogate gateway enables to implement this pattern by configuration. Using the JSON configuration - but it works exactly the same with the environment variables configuration, the route will be composed of 4 sections:

  1. The matcher section which specifies how to select this route configuration for an incoming request,

  2. The proxy section which specifies where to forward the request once validated/prepared,

  3. The shield section which specifies how to validate the request before starting to prepare it and forwarding it,

  4. The enricher section which enables to modify the incoming request before forwarding it to another service.

Assuming we are defining the route for the service 1 of previous diagram, the matcher section can look like:

{
  "shortName": "JWT validation+explosion",
  "matcherConfiguration": {
    "matcherType": "STARTS_WITH",
    "matcherValue": "/service1"
  },
  ...
}

Similarly, the proxy section will just target the service1 host:

{
  ...,
  "mode": "PROXY",
  "proxyConfiguration": {
    "proxyScheme": "https",
    "authorities": [ "service1.company.com" ]
  },
  ...
}

Now the core of the security (the shield layer) will validate our incoming JWT:

{
  ...,
  "shieldConfiguration": [{ (1)
    "jwt": {
      "typValidation": [{ (2)
        "expected": "JWT",
        "required": true
      }],
      "expiryValidation": [{ (3)
        "expiryRequired": true,
        "issuedAtRequired": true
      }],
      "claimValidations": [ (4)
        { "claimKey": "sub" },
        { "claimKey": "groups" }
      ],
      "issuerValidation": [{ (5)
          "expected": "https://app.company.com/oauth2/"
      }],
      "signatureValidation": [{ (6)
        "alg": "RS256",
        "kid": "app-key",
        "key": "xxxxxxx"
      }],
    }
  }],
  ...
}
1 We add a shield configuration which is basically our JWT validation,
2 We validate our JWT typ value,
3 We validate our validity window (expiry and issued at values),
4 We validate the presence of the needed claims (sub and groups there) - note that this block can also validate the actual value and not only the presence,
5 We validate the JWT was issued for the application we proxy - here again, if the application/service tolerates multiple issuers we can have multiple routes,
6 We validate the JWT signature to guarantee we can trust it giving the key to use to,

Now our JWT is validated, we can modify the incoming request to forward the needed information to the actual service:

{
  ...,
  "enricherConfiguration": {
    "requestAdditionalHeaders": [ (1)
      { (2)
        "action": "SET",
        "name": "JWT-sub",
        "value": "${request:jwt:claim:sub}"
      },
      { (2)
        "action": "SET",
        "name": "JWT-groups",
        "value": "${request:jwt:claim:groups?csv=,}"
      },
      { (3)
        "action": "REMOVE",
        "name": "Authorization"
      }
    ]
  }
}
1 We add now the configuration to mutate the incoming request headers to the proxied request headers,
2 We add our two JWT-x headers propagating the value from the JWT claims using request:jwt:claim:x interpolation mecanism - enabling to apply this pattern to any custom claims keys and values too. The csv=, option added to groups enables to convert the array of string to a CSV string enabling a simpler consumption by downstream services,
3 We drop the original Authorization header which is no more needed in the subsystem.

Impact of forwarding data into headers on services

If you migrate from a JWT parsing solution (microprofile-jwt-auth, spring-boot-starter-security, etc…​) to this header based solution you will still need to load the security context but you can now trust the incoming headers since the JWT was validated properly and information are properly forwarded.

In terms of (pseudo-)code, it means moving from this kind of implementation:

final var token = readJwtTokenFromHeader(request);
validateJwt(token);
final var jwt = extractJwtPayload(token);
//...
final var groups = jwt.getClaim("groups");
if (!isAllowed(groups)) {
    throw new WebApplicationException(...);
}

to

final var ctx = extractUserContext(request);
//...
final var groups = ctx.getClaim("groups");
if (!isAllowed(groups)) {
    throw new WebApplicationException(...);
}

The interesting part is that the change is mainly about the part before the comment line. This means that once integrated with your security framework (from a simple interceptor to the full security framework like Apache Shiro or the widely spread Spring security), it does not change your application much.

this is one of the reason to take time to properly setup your application stack and not take the shortcut to code transversal concerns like security ones in the business layer.

Now the JWT extracton is not that complex to implement, in particular since the JVM or most common languages contain all the needed bricks, but its validation can be tricky and if you do an error there, there is no more security in the system. On another side, the user context loading is only about reading headers and optionally parsing a json value for most complex cases so it is very very doable by any framework and developer.

To illustrate that I will just show what extractUserContext can look like in a JAX-RS based application in two steps:

  1. Define the UserContext model,

  2. Inject the user context instance in your endpoints.

Here is how tp define the user context we spoke about ealier:

@Data
public class UserContext {
   @HeaderParam("JWT-sub")
   private String sub;

   @HeaderParam("JWT-groups")
   private List<String> groups;
}

Once we have an user context we can simply inject it into any endpoint:

@RequestScoped
@Path("api")
public class MyResource {
    @GET
    public MyData get(@BeanParam final UserContext ctx) {
        if (!ctx.hasRole("admin")) {
            throw new WebApplicationException(Response.Status.FORBIDDEN);
        }
        return fetchData();
    }
}
making UserContext hosting some security helper method makes it generally easier to use.

Use a gateway, don’t do it in a service

It is natural to think that any service can play that gateway role and that you can keep microprofile-jwt-auth, spring-security etc to do that gateway role. Technically it is true but the key point here is the architecture: the service should be central (which does not mean a SPoF, previous Meecrogate example has no SPoF since it scales properly horizontally for example). Using a service, the risk - which is quite high - is to start adding other services bypassing it and being back to the original issue so the rule of thumb there is to use a gateway. Techically you can use whatever you want from a plain HTTPd/NGINX/Tomcat with custom glue code to a real gateway like Meecrogate one but ensuring the instances act as gateways is what will make this solution robust in time.

Another good reason to make some instances acting as a gateway is to enable them to use a cache. Once some instances are the "gateway", you can optimize them to have the relevant CPU and memory profile making your security layer very cheap for end user in terms of experience which is key to offer a good service.

Depending the size of your company you can also be concerned about security audits, and in such a case, having a gateway enables to justify only once about the security concerns whereas doing it in all services will require to provide such guarantees in all services which can be a lot of efforts and as much errors you can encounter.

Last point is that once you passed the gateway, everything is "user context" which means it will integrate smoothly with messaging system needing the user context and will not require Kafka topics to be replayed overloading the authentication system for example (using CQRS pattern it happens very easily when global architecture was not thought upfront).

Conclusion

We saw in this post that:

  • cloud and microservice deployments require some understanding of the overall system and not just of each piece compsing it alone,

  • using JWT token enables stateless deployments without having to imply an outstanding CPU/resource consumption,

  • using a gateway enables to focus on the business in services,

  • using a gateway enables concentrate the security effort and user context modelling in a single location,

  • using a gateway enables to optimize resource consumption,

  • using a gateway enables to have better guarantees about the security and simplify the audits.

So the conclusion of this post is: always think your system widely and not per service and if relevant use a real gateway!