Got Tech? Will Hack.
A couple of years ago, I was part of a group of individuals working on defining different event-driven architectures during a weekend summit. A summary of the summit was later published by Martin Fowler, first as a blog post and then as a talk. The blog post takes a slightly different view from the explanation I needed, and so this post was created. This is a recreation of the contents of the talk; if you have watched it, you can skip reading this summary.
Event-driven architecture is a popular technique for avoiding coupling between systems. Such systems also tend to become good sources of data on which the business would like to build data platforms, insights and models.
This page exists to summarise these patterns and the trade-offs involved in using them.
An event is a message a system publishes to announce that something has happened, without saying what should be done about it. For example, a new insurance quote being generated is an event. It announces to the world that a quote has been generated, but not what should happen as a result.
A command is a message a system sends when it wants something done and is asking a specific system to do it. For example, an upstream system might ask the communications system to send an email with specific details; this is a command to the communications system.
Both of these are usually implemented as messages on a queue. The primary differences are how they are named and what the intent is.
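As a rough illustration (the field names and message shapes here are assumptions, not taken from any particular system), the two kinds of messages might look like this on a queue:

```python
# Illustrative message shapes; field names are assumptions.

# An event: past tense, announces a fact, carries no instruction.
quote_generated_event = {
    "type": "QuoteGenerated",          # named after what happened
    "quote_id": "q-123",
    "customer_id": "c-456",
    "occurred_at": "2020-01-15T10:00:00Z",
}

# A command: imperative, asks a specific system to do something.
send_email_command = {
    "type": "SendQuoteEmail",          # named after what should be done
    "to": "customer@example.com",
    "quote_id": "q-123",
    "template": "new-quote",
}
```

The event is named after what has already happened; the command is named after the action the communications system is expected to carry out.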
Let’s start with an example to help visualise the problem: a customer changes the address on their house insurance in an insurance provider’s system, which leads to a new quote being generated. This quote then needs to be sent back to the customer via email.
If the services are built as visualised, with calls being made directly across services, the services become tightly coupled in their flow (customer management needs to know of the existence of the quoting system, which in turn needs to know about the existence of the communications system and the need to communicate). Here is how that problem can be solved with event driven architectures.
In the event notification pattern, a source system sends a “notification” to all other systems that something has happened. Each consumer sets up an event listener and decides how to react to it. An example of this is the customer management system generating a customer-address-changed event whenever a customer updates their address.
Since the events do not carry any information about what has changed, the downstream systems still need to call the upstream system to find out the details of the change before they can act on it.
Here are a couple of versions of the customer changed event. The first version includes only the ID of the customer whose address has changed. For every other piece of information (including what has changed), the downstream systems need to contact the customer management service.
Of course, some of this additional information could be included in the event notification, because it is related to the core event itself. Even so, there will always be some fields that are not directly part of the event but that a downstream system requires.
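As an illustration (the field names and the API call are assumptions, not from any specific system), the thin and the slightly richer versions of the notification might look like this, along with the callback a downstream consumer has to make for the thin version:

```python
# Illustrative payloads; the field names and the upstream API call are assumptions.

# Version 1: a thin notification carrying only the ID of the customer
# whose address has changed.
address_changed_v1 = {
    "type": "CustomerAddressChanged",
    "customer_id": "c-456",
}

# Version 2: the notification also carries fields closely related to the event.
address_changed_v2 = {
    "type": "CustomerAddressChanged",
    "customer_id": "c-456",
    "old_address": "12 Old Street",
    "new_address": "34 New Avenue",
}

def on_address_changed(event, customer_api):
    """Downstream handler for the thin (v1) event: it has to call back to the
    customer management service for anything beyond the customer ID."""
    customer = customer_api.get_customer(event["customer_id"])  # extra call upstream
    print(f"Regenerating quote for {customer['id']} at {customer['address']}")
```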
Systems built this way are decoupled. When other actions need to happen as a result of an address change, it is easy to add another system that reacts to the event, without requiring any change on the customer management side.
The overall behaviour is no longer visible in any one place, and there is no easy way to trace what happens downstream. Looking at the source code alone will not reveal the full list of things that happen when the user changes their address.
Distributed tracing systems like Zipkin aim to address this by visualising flows in environments with a full tracing setup. Code can also be traced by using mono-repos and keeping event names the same across services. Both are techniques for dealing with the inability to trace code and flows across systems; neither is as effective as tracing usages within a single codebase, but they help strike a balance between decoupling and ease of use.
Even when all the information related to the event has been added to the event payload, downstream systems will still need further information at some point, which means additional API calls to the upstream system. As more downstream systems subscribe to a particular event, the upstream system comes under higher load to serve that information, and each downstream system’s availability is dependent on the upstream system.
Event-carried state transfer (or ECST, for short) sends all the information related to the domain object in the event, avoiding event notification’s need for callbacks to fetch additional information.
Downstream systems store the parts of the information they need for their use case. If a difference between the old and new data is required, the data structures chosen should make calculating that difference easy.
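A sketch of what such an event and its consumer might look like (the payload shape is an assumption; carrying the previous values is one way to make diffing easy):

```python
# Illustrative ECST payload: the event carries the state of the domain object,
# so the consumer never needs to call back to customer management.
address_changed_ecst = {
    "type": "CustomerAddressChanged",
    "customer_id": "c-456",
    "customer": {                      # current state of the customer
        "id": "c-456",
        "name": "Jane Doe",
        "email": "jane@example.com",
        "address": "34 New Avenue",
    },
    "previous": {                      # keeping old values makes diffing easy
        "address": "12 Old Street",
    },
}

# The quoting system keeps its own copy of just the fields it cares about.
local_customer_store = {}

def on_address_changed(event):
    customer = event["customer"]
    local_customer_store[customer["id"]] = {
        "address": customer["address"],
        "email": customer["email"],
    }
    # A new quote can now be generated purely from locally stored data.

on_address_changed(address_changed_ecst)
print(local_customer_store)  # {'c-456': {'address': '34 New Avenue', 'email': 'jane@example.com'}}
```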
Systems using this pattern have a lower dependence on their upstream services and thus have higher availability.
The higher availability comes at the cost of the system becoming eventually consistent. The data is also replicated across more systems.
An event sourced system is one where the events are stored in an event store/event log, and where the current application state can be completely recreated from that store.
The event store is an append-only log of the events that have occurred and, in the example, the customer DB is an example of a snapshot. A snapshot stores the current state of the system for quick access, so that the log does not have to be replayed on every read, which improves performance.
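A minimal sketch of the idea (the event names and fields are illustrative): events are only ever appended, and the current state, or a snapshot of it, is derived by replaying the log.

```python
# Minimal event-sourcing sketch; event names and fields are illustrative.

event_log = []   # the append-only log is the source of truth

def append(event):
    event_log.append(event)   # events are only ever appended, never updated or deleted

def rebuild_state(events):
    """Replay the log to recreate the current application state from scratch."""
    customers = {}
    for event in events:
        if event["type"] == "CustomerCreated":
            customers[event["customer_id"]] = {"address": None}
        elif event["type"] == "CustomerAddressChanged":
            customers[event["customer_id"]]["address"] = event["new_address"]
    return customers

append({"type": "CustomerCreated", "customer_id": "c-456"})
append({"type": "CustomerAddressChanged", "customer_id": "c-456", "new_address": "34 New Avenue"})

snapshot = rebuild_state(event_log)   # e.g. the customer DB snapshot for quick reads
print(snapshot)                       # {'c-456': {'address': '34 New Avenue'}}
```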
Both source control systems (like git, svn etc.) and financial accounting ledgers are good examples of event sourcing.
This makes audit, debuggability and replayability simple. Such systems are great for recreating issues and understanding the order in which things happened. The ability to time-travel with data on a production system is quite useful. Concepts like branching become possible with data, and what-if scenarios are easy to simulate to figure out the difference; the differences can then be applied through compensating actions.
Such a system makes event versioning mandatory. Interacting with external systems also becomes more complicated, since those calls are side effects, and an event sourced system should not trigger them again when events are replayed.
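One common way of handling this (sketched below under the assumption that the replayer can tell the handler whether it is replaying) is to gate the side effect behind a replay flag:

```python
# Sketch of guarding side effects during replay; the flag-based approach is
# one common option, not the only one.

def handle_quote_generated(event, replaying):
    update_quote_read_model(event)     # safe to repeat: only rebuilds local state
    if not replaying:
        send_quote_email(event)        # side effect: must not fire again on replay

def update_quote_read_model(event):
    print(f"Updating local state for quote {event['quote_id']}")

def send_quote_email(event):
    print(f"Sending email for quote {event['quote_id']}")

# During normal operation the email is sent:
handle_quote_generated({"quote_id": "q-123"}, replaying=False)
# During a replay of the event log the email is skipped:
handle_quote_generated({"quote_id": "q-123"}, replaying=True)
```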
Command Query Responsibility Segregation (or CQRS, for short) is a model in which reads and writes are separated. This allows reads and writes to be scaled and optimised separately, as requirements dictate.
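A minimal sketch of the separation (the class and field names are illustrative): commands go through a write model, while queries are served from a separately maintained read model.

```python
# Minimal CQRS sketch; class and field names are illustrative.

class QuoteWriteModel:
    """The command side: validates and records changes, optimised for writes."""
    def __init__(self):
        self.quotes = {}

    def handle_generate_quote(self, quote_id, customer_id, premium):
        self.quotes[quote_id] = {"customer_id": customer_id, "premium": premium}
        # In a fuller system this event would go onto a queue for the read side.
        return {"type": "QuoteGenerated", "quote_id": quote_id,
                "customer_id": customer_id, "premium": premium}

class QuoteReadModel:
    """The query side: a denormalised view optimised for the reads it serves."""
    def __init__(self):
        self.quotes_by_customer = {}

    def apply(self, event):
        if event["type"] == "QuoteGenerated":
            self.quotes_by_customer.setdefault(event["customer_id"], []).append(
                {"quote_id": event["quote_id"], "premium": event["premium"]})

    def quotes_for_customer(self, customer_id):
        return self.quotes_by_customer.get(customer_id, [])

write_side = QuoteWriteModel()
read_side = QuoteReadModel()

event = write_side.handle_generate_quote("q-123", "c-456", premium=420.0)
read_side.apply(event)                          # in practice the read model updates asynchronously
print(read_side.quotes_for_customer("c-456"))   # [{'quote_id': 'q-123', 'premium': 420.0}]
```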
This pattern is rarely necessary, but it is extremely useful when write-heavy and read-heavy workloads need to be scaled separately. The read side(s) can be optimised for the use cases they serve.
It adds significant complexity to building and maintaining a system.