Dev Process Improvement

The objective is to create a robust and agile integration platform that can expose and monetize APIs and assets. The architecture challenges include a lack of standardization, governance, security, a service mesh, and logging. The development challenges include dependency on individual developers' skills and the absence of performance testing, code reviews, and end-to-end testing. The operational challenges include log losses, a lack of monitoring and troubleshooting, and undefined archival and purging policies. The solution approach is to separate traffic paths, put services behind API gateways, and implement a secrets manager, an EAI layer, and SCAT logging. Code changes are planned based on DORA and APM metrics, microservices are standardized on the Saga, CQRS or API Composition patterns, and the caching strategy combines change data capture with a distributed cache.

We will share with you how we tackled some of the common challenges faced by integration platforms and delivered a stable, secure, scalable, agile and reliable solution that enabled better business outcomes at optimal cost. We also created a unified platform to discover, design and publish APIs, and provided a predictive and pre-emptive monitoring framework. Finally, we leveraged the integration platform to monetize our APIs and assets.

One of the main challenges we faced was the lack of standardized common capabilities that we could reuse across different microservices. We also had to deal with governance and security issues, as well as the complexity of building each microservice from scratch. Moreover, we did not have a service mesh to control and shape the traffic between microservices, nor did we have adequate audit and logging mechanisms within the solution.

To address these challenges, we adopted the following architectural principles:

– Separate external and internal ingress paths to segregate traffic. This way, we could apply different security policies and routing rules for different types of clients.
– All services masked behind API Gateways (internal and external). This enabled us to expose a consistent and unified interface for our APIs, as well as to enforce authentication, authorization, throttling, caching and other policies.
– Dedicated egress path to communicate with backend systems. This allowed us to isolate the backend systems from the external traffic and to apply transformation, validation and error handling logic.
– 3rd party secrets manager. This helped us to store and manage sensitive information such as credentials, keys and certificates in a secure and centralized way.
– EAI Layer with Pub/Sub Capability. This facilitated the integration of different microservices and backend systems using asynchronous messaging patterns such as publish/subscribe, request/reply and event-driven.
– Enforce logging using SCAT. This is a framework that we developed to standardize the logging format and content across all microservices. SCAT stands for Service Name, Correlation ID, Action Name and Transaction ID: the four mandatory fields that every log message must contain, along with optional fields such as timestamp, severity level, message text and payload. SCAT enables us to trace and troubleshoot transactions across multiple microservices and systems; a sketch of the format follows this list.
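
SCAT is our internal framework, so the snippet below is only an illustrative sketch of how its four mandatory fields might be emitted as structured JSON using Python's standard logging module; the field names and the order-service example are assumptions, not the actual implementation.

```python
import json
import logging
import uuid
from datetime import datetime, timezone

class ScatFormatter(logging.Formatter):
    """Illustrative formatter: emits the four mandatory SCAT fields
    (service name, correlation ID, action name, transaction ID) plus
    optional timestamp, severity and message as one JSON line."""

    def __init__(self, service_name: str):
        super().__init__()
        self.service_name = service_name

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "serviceName": self.service_name,
            "correlationId": getattr(record, "correlation_id", None),
            "actionName": getattr(record, "action_name", None),
            "transactionId": getattr(record, "transaction_id", None),
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "severity": record.levelname,
            "message": record.getMessage(),
        }
        return json.dumps(entry)

logger = logging.getLogger("order-service")
handler = logging.StreamHandler()
handler.setFormatter(ScatFormatter(service_name="order-service"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Pass the SCAT context per call; in practice it would come from request headers.
logger.info("Order accepted", extra={
    "correlation_id": str(uuid.uuid4()),
    "action_name": "createOrder",
    "transaction_id": str(uuid.uuid4()),
})
```

Emitting one JSON object per line keeps the entries straightforward for the log collector to parse downstream.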

Another set of challenges we faced was related to the development process of the microservices. We had to ensure that each microservice met the performance and quality standards, and that each was tested and tuned before deployment. We also had to conduct code reviews and measure code quality metrics such as defect density and defect seepage. Furthermore, we had to perform rigorous end-to-end testing across the board, including non-ESB deliveries.

To overcome these challenges, we adopted the following development practices:

– Plan code changes based on DORA and APM metrics. DORA (DevOps Research and Assessment) defines four key metrics that measure the performance of software delivery teams: deployment frequency, lead time for changes, change failure rate and mean time to restore service. APM (application performance monitoring) tools measure the performance of applications in terms of response time, throughput, error rate and availability. By using these metrics, we could identify the areas that needed improvement and prioritize our code changes accordingly (a sketch of deriving the DORA metrics follows this list).
– Use change data capture to cache static data for longer durations, and a distributed cache for short-lived data. This technique optimizes the performance of our microservices by reducing the number of calls to the backend systems. Change data capture captures the changes made to the data in the backend systems and propagates them to a cache layer, so we could cache static data such as configuration parameters or reference data for long periods without worrying about data inconsistency. For dynamic data that changes frequently or has a short lifespan, we used a distributed cache that could scale horizontally and provide high availability (a caching sketch follows this list).
– Standardize on Saga, CQRS or API Composition as the microservice pattern. These are the patterns we used to design our microservices based on their requirements and characteristics. Saga implements a long-running transaction across multiple microservices as a sequence of local steps with compensating actions in case of failure (sketched after this list). CQRS separates the read and write operations of a microservice into different models or services. API Composition aggregates data from multiple microservices into a single response.
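
As a rough illustration of how the DORA metrics can be derived, here is a minimal sketch that computes them from deployment and incident records; the data model and field names are assumptions for illustration, not our actual delivery tooling.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean

@dataclass
class Deployment:
    committed_at: datetime   # when the change was committed
    deployed_at: datetime    # when it reached production
    failed: bool             # did it cause a production failure?

@dataclass
class Incident:
    opened_at: datetime
    restored_at: datetime

def dora_metrics(deployments: list[Deployment], incidents: list[Incident], days: int) -> dict:
    """Compute the four DORA metrics over a reporting window of `days`."""
    frequency = len(deployments) / days                                   # deployments per day
    lead_time = mean((d.deployed_at - d.committed_at).total_seconds()
                     for d in deployments) / 3600                         # hours
    change_failure_rate = sum(d.failed for d in deployments) / len(deployments)
    mttr = mean((i.restored_at - i.opened_at).total_seconds()
                for i in incidents) / 3600                                # hours
    return {
        "deployment_frequency_per_day": frequency,
        "lead_time_hours": lead_time,
        "change_failure_rate": change_failure_rate,
        "mttr_hours": mttr,
    }
```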
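
The two-tier caching idea can be sketched as follows: a long-lived local cache for static data that is refreshed by change-data-capture events, and a short-TTL cache standing in for a distributed cache such as Redis. The event shape and class names are assumptions made for illustration.

```python
import time

class CdcBackedCache:
    """Long-lived cache for static/reference data. Entries stay valid until a
    CDC event for the same key arrives from the backend system."""

    def __init__(self, loader):
        self._loader = loader   # callable that fetches the value from the backend
        self._store = {}

    def get(self, key):
        if key not in self._store:
            self._store[key] = self._loader(key)
        return self._store[key]

    def on_cdc_event(self, event: dict):
        # Called by the CDC consumer (e.g. a Debezium/Kafka listener) when
        # the source row changes; refresh the cached copy in place.
        self._store[event["key"]] = event["new_value"]


class TtlCache:
    """Short-lived cache for dynamic data; a local stand-in for the
    distributed cache used in a real deployment."""

    def __init__(self, ttl_seconds: float):
        self._ttl = ttl_seconds
        self._store = {}

    def get(self, key):
        value, expires = self._store.get(key, (None, 0))
        return value if time.monotonic() < expires else None

    def put(self, key, value):
        self._store[key] = (value, time.monotonic() + self._ttl)
```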
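
For the Saga pattern, here is a minimal orchestration-style sketch with compensating actions that run in reverse order when a step fails; the order-processing steps are hypothetical.

```python
class Saga:
    """Runs a sequence of (action, compensation) steps. If any action fails,
    the compensations of the steps that already succeeded run in reverse order."""

    def __init__(self):
        self._steps = []

    def add_step(self, action, compensation):
        self._steps.append((action, compensation))
        return self

    def execute(self, ctx: dict):
        completed = []
        try:
            for action, compensation in self._steps:
                action(ctx)
                completed.append(compensation)
        except Exception:
            for compensation in reversed(completed):
                compensation(ctx)
            raise

# Hypothetical order flow: reserve stock, charge payment, create shipment.
saga = (Saga()
        .add_step(lambda c: c.update(stock="reserved"),   lambda c: c.update(stock="released"))
        .add_step(lambda c: c.update(payment="charged"),  lambda c: c.update(payment="refunded"))
        .add_step(lambda c: c.update(shipment="created"), lambda c: c.update(shipment="cancelled")))
saga.execute({})
```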

The last set of challenges we faced was related to the operational aspects of the integration platform. We had to deal with logs being lost by the log collector, a lack of operational metrics for monitoring, inadequate troubleshooting leading to inadequate root cause analysis (RCA), and a lack of defined archival and purging policies.

To solve these challenges, we implemented the following operational measures:

– Use the ELK stack for log aggregation and analysis. ELK stands for Elasticsearch, Logstash and Kibana, open-source tools that let us collect, store, search and visualize logs from different sources in a centralized way. Elasticsearch is a distributed search engine that indexes and queries the logs. Logstash is a data processing pipeline that ingests, transforms and sends the logs to Elasticsearch. Kibana is a web-based interface that provides dashboards and visualizations for the logs. By using ELK, we could avoid log losses and gain insight into the performance and behavior of our microservices and systems (a query sketch follows this list).
– Use Jaeger for distributed tracing and RCA. Jaeger is an open-source tracing system that tracks the flow of requests across multiple microservices and systems, using spans and traces to represent the individual units of work and the end-to-end transactions. With Jaeger, we could identify the bottlenecks, errors and dependencies of our microservices and systems, and perform RCA for any incidents or problems (instrumentation sketched after this list).
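
Once logs carry a SCAT correlation ID, pulling an end-to-end view out of Elasticsearch comes down to a single query against the standard `_search` endpoint. The sketch below uses `requests`; the host, index pattern and field names are assumptions for illustration.

```python
import requests

ES_URL = "http://elasticsearch:9200"   # assumed Elasticsearch endpoint
INDEX = "microservice-logs-*"          # assumed Logstash index pattern

def logs_for_correlation_id(correlation_id: str, size: int = 100) -> list[dict]:
    """Fetch all log entries that share one SCAT correlation ID, oldest first."""
    query = {
        "size": size,
        "sort": [{"@timestamp": "asc"}],
        "query": {"term": {"correlationId.keyword": correlation_id}},
    }
    resp = requests.post(f"{ES_URL}/{INDEX}/_search", json=query, timeout=10)
    resp.raise_for_status()
    return [hit["_source"] for hit in resp.json()["hits"]["hits"]]
```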
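
A minimal way to get spans into Jaeger from a Python microservice is via the OpenTelemetry SDK with an OTLP exporter, which recent Jaeger releases accept; the collector endpoint and span names below are assumptions for illustration, not our production configuration.

```python
# Requires: opentelemetry-sdk and opentelemetry-exporter-otlp-proto-grpc
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Assumed Jaeger collector address; recent Jaeger releases accept OTLP on port 4317.
exporter = OTLPSpanExporter(endpoint="http://jaeger-collector:4317", insecure=True)
provider = TracerProvider(resource=Resource.create({"service.name": "order-service"}))
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)

def create_order(order_id: str):
    # Each unit of work becomes a span; nested calls become child spans,
    # and Jaeger stitches them into an end-to-end trace for RCA.
    with tracer.start_as_current_span("createOrder") as span:
        span.set_attribute("order.id", order_id)
        with tracer.start_as_current_span("reserveStock"):
            pass  # call the inventory microservice here
        with tracer.start_as_current_span("chargePayment"):
            pass  # call the payment microservice here

create_order("ORD-1001")
```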

In conclusion, we successfully delivered a stable, secure, scalable, agile and reliable integration platform that met the business objectives and expectations. We also created a unified platform to discover, design and publish APIs, and provided a predictive and pre-emptive monitoring framework. Finally, we leveraged the integration platform to monetize our APIs and assets. We hope that this blog post has given you some insights into how we approached and solved some of the common challenges faced by integration platforms.
