Microservice communication
On this page we will take a look at the communication between the services, based on the state of development in April 2017. This should give an overview of the inner workings of the system. For this document it is important to differentiate between the terms "service" and "instance". A service is a set of similar instances, and because we will have a lot of them we call them "microservices". An instance is a single running copy of such a service.
Setup
In the diagram you can see a rough overview of the components that exist right now. I want to focus your attention on the fact that omscore and omsevents appear twice. This is to give an example of scaling these applications. Scaling means replicating the same service several times, which brings the advantages of fault tolerance, higher throughput, and the possibility to use more physical machines in our setup. Theoretically, each of the boxes with a double border can run on any machine; they just all need to be connected in a swarm cluster. The number of two replicas was arbitrarily chosen by me.
Then there are two components which are different: the swarm DNS and the Traefik reverse proxy. Both provide a way to perform service discovery inside our network. The swarm DNS is aware of the services defined in the docker-compose.yml and will return one of the existing instances of such a service, chosen by a scheduling mechanism like round robin. This is useful, as a service inside the swarm can just query an address like http://omscore and will be connected to one of the instances of omscore.

Traefik is also able to do load balancing; however, for that it is necessary to query Traefik with a specific URL, like /omsevents/frontend/all/eventsController.js. Traefik then matches this request to a backend by specific rules; such a rule set is called a "frontend". In this scenario, three frontends are active. "Backend" is Traefik's name for the set of instances of the service hooked to a frontend. Traefik reverse-proxies requests from the wild internet and performs its own load balancing and health checking. It can also be used for internal service discovery by simply querying one of the frontends from inside the network.
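To make this more concrete, here is a minimal docker-compose sketch of how a replicated service could be hooked to Traefik. This is not the project's actual configuration: the image name and container port are assumptions, only the service name and replica count are taken from this page, and the labels use Traefik 1.x rule syntax.

```yaml
# Hypothetical excerpt, not the project's actual docker-compose.yml.
version: "3"
services:
  omsevents:
    image: aegee/omsevents   # assumed image name
    deploy:
      replicas: 2            # two instances, as in the diagram
      labels:                # in swarm mode, Traefik 1.x reads the deploy labels
        - "traefik.enable=true"
        - "traefik.port=8080"  # assumed container port
        - "traefik.frontend.rule=PathPrefix:/omsevents"
```

The PathPrefix rule is what makes /omsevents/... requests end up at one of the two replicas, while the service name omsevents is what the swarm DNS resolves internally.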
Client request
So let's look roughly at what happens when a request from a client hits the system.
This is just a sample request, and in fact an API call is missing here, but it is enough to show the inner workings. What should be visible is that each service can serve its data with little or no involvement of other services. All requests from the outside are proxied through the Traefik reverse proxy, and each service is able to contact its own database. The core knows where its database lives through the address returned by the swarm DNS; it just connects to postgres://postgres and the address is resolved to the correct container.
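As a small illustration of that last point, here is a hedged TypeScript sketch of a service connecting to its database through the swarm DNS. It assumes a Node service using the pg driver; the database name and credentials are made up, only the hostname postgres comes from this page.

```typescript
import { Client } from "pg";

// Inside the swarm, "postgres" is not a real hostname but a service name
// that the swarm DNS resolves to one of the running postgres containers.
const client = new Client({
  host: "postgres",     // resolved by the swarm DNS
  port: 5432,
  user: "postgres",     // assumed credentials, not taken from this page
  database: "omscore",  // assumed database name
});

async function main(): Promise<void> {
  await client.connect();
  const result = await client.query("SELECT NOW()");
  console.log(result.rows[0]);
  await client.end();
}

main().catch(console.error);
```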
The concrete request here is an initial page load, as the client loads JS from a microservice frontend. That only happens once at the beginning; all subsequent requests are just API calls, as is common in a SPA design. But now to the interesting part:
Interservice Communication
This lays out the whole interservice communication process, though some of these steps can also happen at other times.
- Get Auth Token Internal communication can be authenticated in two ways. If one service wants something from another service on behalf of a user request, it can use the user's access token to authenticate against the other microservice. This is the preferred way of authenticating interservice requests, as the user's permissions are automatically applied everywhere in the service chain. However, sometimes it is impossible to provide a user token, such as in cron-executed requests, which is why the service registry provides auth tokens for internal services. A service can get one of those tokens by supplying the api-key to the registry as a proof of authenticity. The api-key, which is generated by the registry, sits in a shared docker volume and thus is accessible to all microservices. This request to get the auth token can happen at the startup of the service and is not necessarily part of the communication chain, but it has to happen before the rest of this communication, which is why it is shown here (see the requester sketch after this list).
- Query for service in category The registry holds service names by category, so when a service needs information about users, it can query the registry for a service which serves that category. For performance, this query can be performed at the startup of the service and the response cached for the whole lifetime of the application, as the mapping is not going to change often and currently changes would involve a restart of the whole system anyway.
- Name resolution The result of the registry query is a URL which the service can use to query other microservices. This URL is either a Traefik frontend rule which is directly proxied to an instance, or an internal domain name which can be resolved by the swarm DNS, as we already saw in the DB request from the core in the scenario above. This part is fairly obvious and is included only to raise your awareness of the two ways of resolving a name to another service inside the setup. Most of the time you will not have to care about this, as the registry currently only returns swarm-DNS names, and even if it didn't, the microservices wouldn't have to change any code to reflect that.
- Query The actual query for data from the other service.
- Validate token In case the requesting service used an auth token supplied by the registry, the queried service has to validate that token against the registry. In case it was a user token, the usual authentication method can be used (currently this is done only in the core, by means of middleware). A sketch of such a validation middleware follows after this list.
- Response On successful validation, the queried service will return the requested data to the requesting service. Note that tokens supplied by the registry automatically imply high access rights, as many endpoints are open to other services without limitations, so it is vital to protect those auth tokens inside the services.
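To tie the requesting side of this chain together, here is a hedged TypeScript sketch of steps one to four. The registry's endpoint paths, the field names, and the api-key location are all assumptions for illustration, as this page does not specify the registry's API; it also uses the global fetch of Node 18+ for brevity.

```typescript
import { readFileSync } from "fs";

const REGISTRY = "http://registry"; // assumed swarm-DNS name of the registry
const API_KEY = readFileSync("/shared/api-key", "utf-8").trim(); // assumed path

// Step 1: exchange the shared api-key for an internal auth token,
// typically once at service startup.
async function getAuthToken(): Promise<string> {
  const res = await fetch(`${REGISTRY}/gettoken`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ api_key: API_KEY }),
  });
  return (await res.json()).token;
}

// Step 2: ask the registry which service serves a category; the response
// can be cached for the whole lifetime of the application.
async function getServiceUrl(category: string): Promise<string> {
  const res = await fetch(`${REGISTRY}/services/${category}`);
  return (await res.json()).url; // e.g. "http://omscore", a swarm-DNS name
}

// Steps 3 and 4: name resolution happens implicitly when the returned URL
// is used, and the actual query carries the auth token in a header.
async function queryUsers(): Promise<unknown> {
  const token = await getAuthToken();
  const url = await getServiceUrl("users");
  const res = await fetch(`${url}/api/users`, {
    headers: { "X-Auth-Token": token },
  });
  return res.json();
}

queryUsers().then(console.log).catch(console.error);
```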
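And here is the matching validation middleware on the queried service's side, again as a hedged sketch in Express style: the registry's validation endpoint, the header name, and the sample route are assumptions.

```typescript
import express, { Request, Response, NextFunction } from "express";

const REGISTRY = "http://registry"; // assumed swarm-DNS name of the registry

// Step 5: validate incoming registry tokens against the registry before
// answering. Registry tokens imply high access rights, so this must not
// be skipped; user tokens would go through the usual auth middleware instead.
async function validateToken(req: Request, res: Response, next: NextFunction) {
  const token = req.header("X-Auth-Token"); // assumed header name
  if (!token) {
    return res.status(401).json({ error: "missing token" });
  }
  const check = await fetch(`${REGISTRY}/validate`, { // assumed endpoint
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ token }),
  });
  if (!check.ok) {
    return res.status(403).json({ error: "invalid token" });
  }
  next();
}

const app = express();
app.get("/api/users", validateToken, (_req, res) => {
  res.json([{ id: 1, name: "example user" }]); // placeholder payload
});
app.listen(8080);
```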