Shahzad Bhatti Welcome to my ramblings and rants!

April 19, 2024

Effective Load Shedding and Throttling Strategies for Managing Traffic Spikes and DDoS Attacks

Filed under: Design — admin @ 9:48 pm

Online services experiencing rapid growth often encounter abrupt surges in traffic and may become targets of Distributed Denial of Service (DDoS) attacks, whether orchestrated by malicious actors or triggered inadvertently by self-induced bugs. Mitigating these challenges to ensure high availability requires meticulous architectural practices, including caching mechanisms, Content Delivery Networks (CDNs), Web Application Firewalls (WAFs), queuing systems, load balancing strategies, robust monitoring and alerting, and autoscaling capabilities. In this article, however, we will focus specifically on load shedding and throttling techniques for managing various traffic shapes effectively.

1. Traffic Patterns and Shapes

Traffic patterns describe how user requests or tasks interact with your online service over a given period. These requests or tasks can vary in characteristics, including the rate of requests (TPS), concurrency, and the patterns of request flow, such as bursts of traffic. These patterns must be analyzed to scale your service effectively and provide high availability.

Here’s a breakdown of some common traffic shapes:

  • Normal Traffic: the baseline traffic pattern that a service receives most of the time, based on regular user activity.
  • Peak Traffic: recurring periods of high traffic driven by daily or weekly user activity patterns. Auto-scaling rules can be set up to automatically allocate pre-provisioned additional resources in response to anticipated peaks in traffic.
  • Off-Peak Traffic: periods of low or minimal traffic, such as late-night hours or weekends. Auto-scaling rules can scale down or consolidate resources during periods of low demand, minimizing operational costs while maintaining adequate performance levels.
  • Burst Traffic: sudden, short-lived spikes in traffic that might be caused by viral content or promotional campaigns. Auto-scaling rules can be configured to allocate extra resources in reaction to burst traffic; however, scaling might not happen swiftly enough to match the duration of the burst, so it's typically recommended to maintain surplus capacity to handle burst traffic effectively.
  • Seasonal Traffic: traffic patterns tied to specific seasons, holidays or events such as Black Friday or back-to-school periods. This requires strategies similar to peak traffic for allocating pre-provisioned additional resources.
  • Steady Growth: a gradual, consistent increase in traffic over time based on organic growth or marketing campaigns. This requires proactive monitoring to ensure resources keep pace with demand.

Classifying Requests

Incoming requests or tasks can be identified and categorized based on various contextual factors, such as the identity of the requester, the specific operation being requested, or other relevant parameters. This classification enables the implementation of appropriate measures, such as throttling or load shedding policies, to manage the flow of requests effectively.
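For illustration, here is a minimal Rust sketch of request classification; the type and field names are hypothetical, and a real service would classify on many more attributes:

/// Hypothetical priority levels that throttling and shedding policies can act on.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Priority {
    Background, // batch jobs, prefetching
    Normal,     // regular user traffic
    Critical,   // health checks, payment flows
}

/// Hypothetical contextual attributes extracted from an incoming request.
struct RequestContext {
    client_id: Option<String>, // None for anonymous requests
    source_ip: String,
    operation: String,
    priority: Priority, // hint supplied by the caller or route
}

fn classify(ctx: &RequestContext) -> Priority {
    match (&ctx.client_id, ctx.operation.as_str()) {
        // Health checks must never be throttled away.
        (_, "health_check") => Priority::Critical,
        // Authenticated traffic keeps at least Normal priority.
        (Some(_), _) => Priority::Normal.max(ctx.priority),
        // Anonymous traffic is capped at Normal priority.
        (None, _) => ctx.priority.min(Priority::Normal),
    }
}

fn main() {
    let ctx = RequestContext {
        client_id: None,
        source_ip: "203.0.113.7".into(),
        operation: "search".into(),
        priority: Priority::Normal,
    };
    println!("{} -> {:?}", ctx.source_ip, classify(&ctx));
}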

Additional Considerations:

  • Traffic Patterns Can Combine: Real-world traffic patterns are often a combination of these shapes, requiring flexible and adaptable scaling strategies.
  • Monitoring and Alerting: Continuously monitor traffic patterns to identify trends early and proactively adjust your scaling strategy. Set up alerts and notifications to inform about sudden traffic surges or potential DDoS attacks so you can take timely action.
  • Incident Response Plan: Develop a well-defined incident response plan that outlines the steps for communication protocols, mitigation strategies, engaging stakeholders, and recovery procedures.
  • Cost-Effectiveness: Balance scaling needs with cost optimization to avoid over-provisioning resources during low traffic periods.

2. Throttling and Rate Limiting

Throttling controls the rate of traffic flow or resource consumption within a system to prevent overload or degradation of service. It enforces quota limits and protects against system overload by limiting the amount of resources (CPU, memory, network bandwidth) a single user or client can consume within a specific time frame. Throttling ensures efficient resource utilization, allowing the service to handle more users in a predictable manner, and improves fairness and stability by preventing the noisy neighbor problem, where heavy users cause unpredictable spikes or slowdowns for everyone else. Throttling can be implemented by rate-limiting the number of API requests a client can make within a given time window; by capping the maximum bandwidth allowed for various network traffic; by limiting the rate of reads/writes; or by limiting the number of concurrent connections a server accepts to prevent overload.

These throttling and rate limiting measures can be applied to both anonymous and authenticated requests as follows:

  • Anonymous Requests:
    • Rate limiting: Implement rate limiting based on client IP addresses or other identifiers within a specific time window, preventing clients from overwhelming the system.
    • Concurrency limits: Set limits on the maximum number of concurrent connections or requests that can be processed simultaneously.
    • Server-side throttling: Apply throttling mechanisms at the server level, such as queue-based rate limiting or token bucket algorithms, to control the overall throughput of incoming requests (a token bucket sketch follows this list).
  • Authenticated Requests:
    • User-based rate limiting: Implement rate limiting based on user identities or API keys, ensuring that authenticated users cannot exceed specified request limits.
    • Prioritized throttling: Apply different throttling rules or limits based on user roles, subscription tiers, or other criteria, allowing higher priority requests to be processed first during peak loads.
    • Circuit breakers: Implement circuit breakers to temporarily disable or throttle load from specific services or components that are experiencing high latency or failures, preventing cascading failures.
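The token bucket algorithm mentioned above can be sketched in a few lines. The following is a minimal single-threaded Rust illustration, assuming one bucket per client key; a production limiter would also need thread-safe shared state and eviction of idle buckets:

use std::time::Instant;

/// A minimal token bucket: `capacity` bounds the burst size and
/// `refill_per_sec` bounds the sustained request rate.
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last_refill: Instant,
}

impl TokenBucket {
    fn new(capacity: f64, refill_per_sec: f64) -> Self {
        Self { capacity, tokens: capacity, refill_per_sec, last_refill: Instant::now() }
    }

    /// Returns true if the request may proceed, or false if it should be
    /// throttled (typically mapped to an HTTP 429 response).
    fn try_acquire(&mut self) -> bool {
        let now = Instant::now();
        let elapsed = now.duration_since(self.last_refill).as_secs_f64();
        self.last_refill = now;
        // Refill proportionally to elapsed time, never exceeding capacity.
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    // Allow bursts of up to 5 requests while refilling 2 tokens per second.
    let mut bucket = TokenBucket::new(5.0, 2.0);
    for i in 0..8 {
        println!("request {}: allowed={}", i, bucket.try_acquire());
    }
}

The capacity absorbs short bursts while the refill rate caps the sustained TPS, which is why token buckets are a common fit for the burst traffic shape described earlier.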

2.1 Error Response and Headers

When a request exceeds the rate limit, the server typically returns a 429 HTTP status code indicating that the request has been throttled or rate-limited due to Too Many Requests. The server may also return HTTP headers such as Retry-After, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Used, X-RateLimit-Reset, and X-RateLimit-Resource.
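For illustration, a throttled response might look like the following; the header values shown are hypothetical, and exact header names vary across providers:

HTTP/1.1 429 Too Many Requests
Retry-After: 30
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Used: 100
X-RateLimit-Reset: 1713600000

{"error": "rate limit exceeded, retry after 30 seconds"}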

3. Load Shedding

Load shedding prioritizes and manages system resources during periods of high demand or overload. It may discard or defer non-critical tasks or requests to ensure the continued operation of essential functions. Load shedding helps maintain system stability and prevents cascading failures by reallocating resources to handle the most critical tasks first. Common unexpected events that can overload system resources and require shedding include:

  • Traffic Spikes: sudden and significant increases in the volume of incoming traffic due to various reasons, such as viral content, marketing campaigns, sudden popularity, or events.
  • DDoS (Distributed Denial of Service): deliberate attempts to disrupt the normal functioning of a targeted server, service, or network by overwhelming it with a flood of traffic. A DDoS attack can be orchestrated by an attacker who commands a vast botnet comprising thousands of compromised devices, including computers, IoT devices, or servers. Additionally, misconfigurations, software bugs, or unforeseen interactions among system components, such as excessive retries without exponential backoff, can lead to accidental, self-inflicted DDoS conditions.

Here is how excessive load for anonymous and authenticated requests can be shed:

  • Anonymous Requests: During extreme load conditions or when server capacity is reached, drop a percentage of incoming requests to protect the system from overload. This can be done randomly or based on specific criteria such as request types and headers. Alternatively, the service can temporarily degrade non-critical features or functionality to reduce the overall system load and prioritize essential services.
  • Authenticated Requests: Apply load shedding rules based on user roles, subscription tiers, or other criteria, prioritizing requests from high-value users or critical services.

3.1 Error Response

When a request is shed, the server typically returns a 503 HTTP status code (Service Unavailable), indicating that it is temporarily unable to handle the request. The server may also return the Retry-After header, but the other headers employed for throttling are less prevalent in the context of load shedding. Unlike throttling errors, which fall under user errors with 4XX status codes, load shedding is categorized as a server error with 5XX status codes. Consequently, load shedding requires more aggressive monitoring and alerting than throttling errors. Throttling errors, on the other hand, can be considered expected behavior, a means to address noisy neighbor problems and maintain high availability.
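As a minimal sketch of server-side shedding (assuming a simple in-flight request cap; the names are hypothetical, and real systems often combine this with priority-aware policies), the guard below admits requests up to a concurrency limit and rejects the rest, which the caller would map to a 503 response:

use std::sync::atomic::{AtomicUsize, Ordering};

/// Admit at most `max_in_flight` concurrent requests; shed the rest.
struct LoadShedder {
    in_flight: AtomicUsize,
    max_in_flight: usize,
}

impl LoadShedder {
    fn new(max_in_flight: usize) -> Self {
        Self { in_flight: AtomicUsize::new(0), max_in_flight }
    }

    /// Returns Ok(guard) if admitted; the caller maps Err to an HTTP 503.
    fn try_admit(&self) -> Result<InFlightGuard<'_>, ()> {
        let prev = self.in_flight.fetch_add(1, Ordering::SeqCst);
        if prev >= self.max_in_flight {
            // Over capacity: undo the increment and shed this request.
            self.in_flight.fetch_sub(1, Ordering::SeqCst);
            Err(())
        } else {
            Ok(InFlightGuard { shedder: self })
        }
    }
}

/// RAII guard that releases the slot when the request finishes.
struct InFlightGuard<'a> {
    shedder: &'a LoadShedder,
}

impl Drop for InFlightGuard<'_> {
    fn drop(&mut self) {
        self.shedder.in_flight.fetch_sub(1, Ordering::SeqCst);
    }
}

fn main() {
    let shedder = LoadShedder::new(2);
    let g1 = shedder.try_admit();
    let g2 = shedder.try_admit();
    let g3 = shedder.try_admit(); // shed: capacity reached
    println!("admitted: {} {} {}", g1.is_ok(), g2.is_ok(), g3.is_ok());
}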

4. Additional Techniques for Throttling and Load Shedding

The throttling, rate-limiting and load shedding measures described above can be used to handle high traffic and prevent resource exhaustion in distributed systems. Here are common techniques for implementing these measures:

  • Admission Control: Set up thresholds for maximum concurrent requests or resource utilization.
  • Request Classification and Prioritization: Classify requests based on priority, user type, or criticality, and then drop low-priority requests when capacity thresholds are exceeded.
  • Backpressure and Queue Management: Use fixed-length queues to buffer incoming requests during high loads and apply backpressure by rejecting requests when queues reach their limits.
  • Fault Isolation and Containment: Partition the system into isolated components or cells to limit the blast radius of failures.
  • Redundancy and Failover: Build redundancy into your infrastructure and implement failover mechanisms to ensure that your services remain available even if parts of your infrastructure are overwhelmed.
  • Simplicity and Modularity: Design systems with simple, modular components that can be easily understood, maintained, and replaced. Avoid complex dependencies and tight coupling between components.
  • Circuit Breaker: Monitor the health and performance of downstream services or components and stop forwarding requests if a service is overloaded or unresponsive. Periodically allow trial requests through (a half-open state) and close the circuit once the service recovers; a sketch follows this list.
  • Noisy Neighbors: Throttle and rate-limit customer traffic to prevent any single customer from consuming resources excessively, thereby ensuring fair access for all customers.
  • Capacity Planning and Scaling: Continuously monitor resource utilization and plan for capacity growth. Implement auto-scaling mechanisms to dynamically adjust resources based on demand.
  • Communication Optimization: Employ communication optimization techniques like compression and quantization to minimize network traffic and bandwidth requirements.
  • Privacy and Security Considerations: Incorporate privacy-preserving mechanisms like secure aggregation, differential privacy, and secure multi-party computation to ensure data privacy and model confidentiality.
  • Graceful Degradation: Identify and disable non-critical features or functionality during high loads.
  • Monitoring and Alerting: Monitor system metrics (CPU, memory, request rates, latency, etc.) to detect overload scenarios and send alerts when thresholds are exceeded.
  • Defense in Depth: Implement a multi-layered defense strategy to detect, mitigate, and protect customer workloads from malicious attacks. Blacklist IP addresses or employ geo-location filters at the edge layer using a CDN, load balancer, or API gateway; constrain network bandwidth and requests per second (RPS) for individual tenants at the network layer; apply resource quotas, prioritization and admission control at the application layer based on account information, request attributes and system metrics; and isolate tenants' data in separate partitions at the storage layer. Each dependent service may use a similar multi-layered defense to throttle based on its own usage patterns and resource constraints.
  • Adaptive Scaling: Automatically scale resources up or down based on demand and multi-tenant fairness policies. Employ predictive auto-scaling or load-based scaling.
  • Fault Tolerance and Checkpointing: Incorporate fault tolerance mechanisms, redundant computation and checkpointing to ensure reliable and resilient task processing in the face of potential resource failures. The fault tolerance mechanisms can be used to handle potential failures or stragglers (slow or unresponsive devices).
  • Web Application Firewall (WAF): Inspects incoming traffic and blocks malicious requests, including DDoS attacks, based on predefined rules and patterns.
  • Load Balancing: By distributing incoming traffic across multiple servers or instances, load balancing helps prevent any single server from becoming overwhelmed.
  • Content Delivery Network (CDN): Distribute your content across multiple geographic locations, reducing the strain on your origin servers.
  • Cost-Aware Scaling: Implement a cost-aware scaling strategy, using techniques like cost modeling and performance prediction, that considers the cost of different resource types.
  • Security Mechanisms: Incorporate various security mechanisms such as secure communication channels, code integrity verification, and runtime security monitoring to protect against potential vulnerabilities and attacks in multi-tenant environments.
  • SOPs and Runbooks: Develop well-defined procedures that outline the steps for detecting traffic spikes, pinpointing the source of a malicious attack, analyzing logs and monitoring metrics, applying mitigation strategies, engaging stakeholders, and recovering.
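The circuit breaker technique above can be modeled as a small state machine. The following is a minimal single-threaded Rust sketch with hypothetical thresholds, not a production implementation (real breakers also track failure rates over sliding windows and cap trial traffic):

use std::time::{Duration, Instant};

/// Breaker state: Closed counts consecutive failures; Open records when it tripped.
enum CircuitState {
    Closed { consecutive_failures: u32 },
    Open { since: Instant },
}

struct CircuitBreaker {
    state: CircuitState,
    failure_threshold: u32,
    cool_down: Duration,
}

impl CircuitBreaker {
    fn new(failure_threshold: u32, cool_down: Duration) -> Self {
        Self { state: CircuitState::Closed { consecutive_failures: 0 }, failure_threshold, cool_down }
    }

    /// Should this request be forwarded to the downstream service?
    fn allow_request(&self) -> bool {
        match &self.state {
            CircuitState::Closed { .. } => true,
            // After the cool-down, allow trial requests (half-open behavior).
            CircuitState::Open { since } => since.elapsed() >= self.cool_down,
        }
    }

    /// A successful call closes the circuit again.
    fn record_success(&mut self) {
        self.state = CircuitState::Closed { consecutive_failures: 0 };
    }

    /// Too many consecutive failures trip the circuit open.
    fn record_failure(&mut self) {
        let next = match &self.state {
            CircuitState::Closed { consecutive_failures }
                if consecutive_failures + 1 < self.failure_threshold =>
            {
                CircuitState::Closed { consecutive_failures: consecutive_failures + 1 }
            }
            _ => CircuitState::Open { since: Instant::now() },
        };
        self.state = next;
    }
}

fn main() {
    let mut cb = CircuitBreaker::new(3, Duration::from_secs(10));
    for _ in 0..3 {
        cb.record_failure();
    }
    // The circuit is now open, so requests are rejected until the cool-down elapses.
    println!("allow after 3 failures: {}", cb.allow_request());
    cb.record_success();
    println!("allow after recovery: {}", cb.allow_request());
}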

5. Pitfalls with Use of Throttling and Load Shedding

Here are some potential challenges to consider when implementing throttling and load shedding:

  • Autoscaling Failures: If your throttling policies are too aggressive, they may prevent your application from generating enough load to trigger autoscaling policies. This can lead to under-provisioning of resources and performance degradation. Conversely, if your throttling policies are too lenient, your application may scale up unnecessarily, leading to overspending.
  • Load Balancer Health Checks: Some load balancers use synthetic health checks to determine the health of backend instances. If your throttling policies block these health checks, it can cause instances to be marked as unhealthy and removed from the load balancer, even though they are still capable of serving traffic.
  • Unhealthy Instance Overload: When instances are marked as unhealthy by a load balancer, the remaining healthy instances may become overloaded if throttling policies are not properly configured. This can lead to a cascading failure scenario where more and more instances are marked as unhealthy due to the increased load.
  • Sticky Sessions: If your application uses sticky sessions (session affinity) for user sessions, and your throttling policies are not consistently applied across all instances, it can lead to inconsistent user experiences or session loss.
  • Cache Invalidation: Aggressive throttling or load shedding policies can lead to more frequent cache invalidations, which can impact performance and increase the load on your backend systems.
  • Upstream Service Overload: If your application relies on upstream services or APIs, and your throttling policies are not properly coordinated with those services, you may end up overloading those systems and causing cascading failures.
  • Insufficient capacity of the Failover: The failover servers must possess adequate capacity to manage the entire expected traffic load from the primary servers.
  • Monitoring Challenges: Throttling and load shedding policies can make it more difficult to monitor and troubleshoot performance issues, as the metrics you’re observing may be skewed by the throttling mechanisms.
  • Delays in Updating Throttling Policies: The policy adjustments for throttling and load shedding should be capable of being updated swiftly at runtime to adapt to changing traffic patterns.
  • Balancing Load Based on Number of Connections: When directing incoming traffic to the host with the fewest connections, there's a risk that unhealthy hosts will have fewer connections due to their quick error responses. Consequently, the load balancer may direct more traffic towards these hosts, resulting in a majority of requests failing. To counteract this, it's essential to employ robust Layer 7 health checks that comprehensively assess the application's functionality and dependencies; Layer 4 health checks, which are susceptible to false positives, should be avoided. Unhealthy hosts should be removed from the available pool as quickly as possible. Additionally, ensuring that error responses from the service have similar latency to successful responses can serve as another effective mitigation strategy.

To mitigate these issues, it’s essential to carefully coordinate your throttling and load shedding policies with the autoscaling, load balancing, caching, and monitoring strategies. This may involve tuning thresholds, implementing consistent policies across all components, and closely monitoring the interaction between these systems. Additionally, it’s crucial to thoroughly test your configurations under various load conditions to identify and address potential issues before they impact your production environment.

6. Monitoring Metrics and Notifications

Here are some common metrics and alarms to consider for throttling and load shedding:

6.1 Network Traffic Metrics:

  • Incoming/Outgoing Bandwidth: Monitor the total network bandwidth to detect abnormal traffic patterns.
  • Packets per Second (PPS): Track the number of packets processed per second to identify potential DDoS attacks or traffic bursts.
  • Connections per Second: Monitor the rate of new connections being established to detect potential connection exhaustion or DDoS attacks.

6.2 Application Metrics:

  • Request Rate: Track the number of requests per second to identify traffic spikes or bursts.
  • Error Rate: Monitor the rate of errors or failed requests, which can indicate overloading or application issues.
  • Response Time: Measure the application’s response time to detect performance degradation or latency issues.
  • Queue Saturation: Monitor the lengths of queues or buffers to identify potential bottlenecks or resource exhaustion.

6.3 System Metrics:

  • CPU Utilization: Monitor CPU usage to detect resource contention or overloading.
  • Memory Utilization: Track memory usage to identify potential memory leaks or resource exhaustion.
  • Disk I/O: Monitor disk read/write operations to detect storage bottlenecks or performance issues.

6.4 Load Balancer Metrics:

  • Active Connections: Monitor the number of active connections to the load balancer to identify potential connection exhaustion.
  • Unhealthy Hosts: Track the number of unhealthy or unresponsive hosts to ensure load balancing efficiency.
  • Request/Response Errors: Monitor errors related to requests or responses to identify issues with backend services.

6.5 Alarms and Notifications:

  • Set up alarms for critical metrics, such as high CPU utilization, memory exhaustion or excessive error rates. For example, trigger an alarm when the error rate exceeds 5% or 5XX response codes persist for 5 consecutive seconds or data points (a sketch of this rule follows this list).
  • Set up alarms for high latency, e.g., P90 latency exceeds 50ms for more than 30 seconds.
  • Establish fine-grained alarms for detecting breaches in customer service level agreements (SLAs). Configure the alarm thresholds to trigger below the customer SLAs and ensure they can identify the affected customers.
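As an illustration of the "breach for N consecutive data points" rule mentioned above, a minimal Rust evaluation function might look like this (the samples and thresholds are hypothetical):

/// Fire the alarm only when `threshold` is breached for `consecutive`
/// data points in a row anywhere in the sample window.
fn alarm_breached(error_rates: &[f64], threshold: f64, consecutive: usize) -> bool {
    error_rates
        .windows(consecutive)
        .any(|w| w.iter().all(|&r| r > threshold))
}

fn main() {
    // Per-second error-rate samples as fractions; 5% threshold, 5 consecutive points.
    let samples = [0.01, 0.06, 0.07, 0.08, 0.09, 0.10, 0.02];
    println!("alarm: {}", alarm_breached(&samples, 0.05, 5)); // true
}

Requiring consecutive breaches filters out transient single-point spikes that would otherwise page the on-call unnecessarily.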

6.6 Autoscaling Policies:

  • CPU Utilization-based Scaling: Scale out or in based on CPU usage thresholds to handle traffic bursts or DDoS attacks.
  • Memory Utilization-based Scaling: Scale resources based on memory usage to prevent memory exhaustion.
  • Network Traffic-based Scaling: Scale resources based on incoming or outgoing network traffic patterns to handle traffic spikes.
  • Request Rate-based Scaling: Scale resources based on the rate of incoming requests to maintain optimal performance.

6.7 Throttling / Load Shedding Overhead:

  • Monitor the processing time for throttling and load shedding, accounting for any communication overhead if the target host is unhealthy. Keep track of the time to ascertain priority, identify delays in processing, and ensure that high delays only impact denied requests.
  • Monitor the system’s utilization and identify when it reaches its capacity.
  • Monitor the observed target throughput at the time of the request.
  • Monitor the time taken to determine if load shedding is necessary and track when the percentage of denied traffic exceeds X% of incoming traffic.

It’s essential to tailor these metrics and alarms to your specific application, infrastructure, and traffic patterns.

7. Summary

Throttling and Load Shedding offer effective means for managing traffic for online services to maintain high availability. Traffic patterns may vary in characteristics like rate of requests, concurrency, and flow patterns. Understanding these shapes, including normal, peak, off-peak, burst, and seasonal traffic, is crucial for scaling and ensuring high availability. Requests can be classified based on contextual factors, enabling appropriate measures such as throttling or load shedding.

Throttling manages traffic flow or resource usage to avoid overload, whereas load shedding prioritizes tasks during periods of high demand. These methods can complement other strategies such as admission control, request classification, backpressure management, and redundancy. However, their implementation requires careful monitoring, notification, and thorough testing to ensure effectiveness.


May 9, 2023

Applying Domain-Driven Design and Clean/Onion/Hexagonal Architecture to MicroServices

Filed under: Computing,Design — admin @ 8:41 pm

1. Abstract

In software design, modular design facilitates building large systems by decomposing functionality into independent modules, where each module defines an interface for the behavior it implements. Modular design evolved into component-based design that emphasized separation of concerns, and then into distributed systems, which gave rise to web services, service-oriented architectures and event-driven architectures. This evolution led to the microservices architecture, in which each service defines a bounded context for the business domain of its functionality. Each service is autonomous, agile, loosely coupled, resilient, reliable, independently deployable and scalable. This architecture encourages use of the abstraction, single-responsibility, DRY, dependency-inversion, common-closure, common-reuse, release-equivalence and persistence-ignorance principles. Software development teams often use the Inverse Conway Maneuver to define clear ownership of a service, which improves developer velocity. As cloud computing gained wider adoption over the last 15 years, the microservice architecture was extended with the architecture of cloud native applications (CNA), which offer the properties of Isolated State, Distribution, Elasticity, Automation, and Loose Coupling (IDEAL). The extended benefits of CNA and microservice architecture include:

  • Fit for purpose
  • Rightsized and modular
  • Elasticity
  • Sovereign and tolerant
  • Resilient and protected
  • Controllable and adaptable
  • Workload-aware and resource-efficient
  • Agile and tool-supported
  • Observability including metric, tracing, and logging
  • Resilience
  • Availability
  • Independent, autonomous
  • Zero-Trust Security
  • Automation
  • Decentralized governance

2. Applying Domain-Driven Design

The following sections examine the primary concepts from domain-driven design:

2.1 Layers

Domain-driven design, introduced by Eric Evans, simplifies the architecture of microservices by building upon a layered architecture:

  • presentation layer for user-interface.
  • application layer for use-cases that define the behavior.
  • domain layer for representing business rules and domain model.
  • infrastructure layer for data access and persistence for the domain objects based on Persistence Ignorance and Infrastructure Ignorance principles.
DDD Layers

2.2 Domain Model

The software development team and domain experts define the model using workshop-based Event Storming. The domain layer defines the following types of models:

  • Entity is a mutable object defined not by its attributes, but rather by a thread of continuity and identity.
  • Value object is an immutable object defined by its attributes instead of an identifier.
  • Domain events notify other parts of the system about data updates.

Domain-driven design recommends rich behavior in entity objects in addition to the data attributes, and cautions against the AnemicDomainModel that only holds data attributes.
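For illustration, here is a minimal Rust sketch of the entity versus value object distinction; the types are hypothetical and not taken from the sample application later in this post:

use std::collections::HashSet;

/// Entity: identity (book_id) defines equality; attributes may change over time.
struct Book {
    book_id: String, // identity
    title: String,   // mutable attribute, irrelevant to identity
}

impl PartialEq for Book {
    fn eq(&self, other: &Self) -> bool {
        self.book_id == other.book_id // same identity => same entity
    }
}

/// Value object: immutable, and equality is defined by its attributes.
#[derive(PartialEq, Eq, Hash, Clone)]
struct Isbn(String);

fn main() {
    let a = Book { book_id: "b1".into(), title: "Old Title".into() };
    let b = Book { book_id: "b1".into(), title: "New Title".into() };
    assert!(a == b); // same entity despite different attributes
    println!("entity equality ignores attributes: {}", a.title != b.title);

    let mut isbns = HashSet::new();
    isbns.insert(Isbn("978-0321125217".into()));
    // Two Isbn values with the same attributes are interchangeable.
    assert!(isbns.contains(&Isbn("978-0321125217".into())));
}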

2.3 Aggregates

The entities and value objects can be clustered into aggregates, which become a unit for retrieving and persisting data together. An aggregate root entity controls the lifecycle of, and access to, the objects inside its boundary.

2.4 Ubiquitous Language

The domain model employs Ubiquitous Language to bring domain experts and software development team together and eliminate inaccuracies, contradictions and confusion from the model.

2.5 Services

The services define high-level business logic that doesn’t fit within the domain objects. The services are generally designed as stateless with clearly defined interfaces.

2.6 Repositories

The repositories implement data persistence logic for retrieving and persisting aggregate and entity objects.

2.7 Factories

The factories help create complex objects, values and aggregates.

2.8 Bounded Context

The Bounded Context defines the boundaries of the domain model, which may consist of sub-domains. It becomes the foundation for the boundaries of microservices, where each service is cohesive and loosely coupled, avoiding chatty communication between microservices.

2.9 Context Map

The context map helps define the boundaries of bounded contexts explicitly to prevent a Big Ball of Mud architecture, using the following patterns:

  • Shared Kernel shares a common domain model between teams.
  • Partnership with mutual dependency between teams.
  • Customer-Supplier defines an interface that supplier implements and consumer consumes it.
  • Open Host Service / Published Language relies on well documented or readily available information for integration.
  • Conformist where the downstream team conforms to the model of the upstream without any translation of models.
  • Anticorruption Layer isolates and abstracts the downstream’s models from external system’s models by translation.

3. Applying Hexagonal Architecture

The hexagonal, or ports & adapters, architecture by Alistair Cockburn defines ports to receive incoming requests, which an adapter then translates into an internal message or procedure call. Similarly, when the application needs to connect to external systems on the driven side, it sends a message through a port to an adapter. The ports and adapters architecture decouples the driver side and the driven side from the implementation technology.

Hexagonal Architecture

The port uses a protocol or an application programming interface (API) for communicating with the application, which the adapter translates for internal consumption. When the application needs to connect to an external system such as a database, it goes through a similar port or interface, and the adapter translates the call into the underlying database protocol. This architecture essentially applies the Dependency Inversion and Inversion of Control principles by depending only on the ports, decoupling external and internal components from the implementation technology. The application is the core of the system and defines the use-cases that can be triggered by a CLI or UI. The application layer internally contains commands, handlers and services, which receive commands or queries from ports and communicate with external systems via ports and adapters. The application layer may trigger application events as the outcome of a use-case. The domain layer defines the domain model and domain-specific services, which are used by the application layer. The driver (primary) side in hexagonal architecture allows users to initiate communication with the application core, while on the driven (secondary) side the application core initiates communication with external dependent systems.
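As a minimal Rust sketch of this idea (the trait and type names are hypothetical), the application core depends only on a port trait, while adapters implement it for concrete technologies:

/// Port on the driven side: the application core depends only on this trait.
trait NotificationPort {
    fn notify(&self, recipient: &str, message: &str) -> Result<(), String>;
}

/// Adapter: translates port calls into a concrete technology.
/// (An SNS- or email-backed adapter would implement the same trait.)
struct ConsoleAdapter;

impl NotificationPort for ConsoleAdapter {
    fn notify(&self, recipient: &str, message: &str) -> Result<(), String> {
        println!("to {}: {}", recipient, message);
        Ok(())
    }
}

/// Application core: use-case logic written purely against the port,
/// so swapping the adapter never touches the domain logic.
struct CheckoutUseCase<P: NotificationPort> {
    notifier: P,
}

impl<P: NotificationPort> CheckoutUseCase<P> {
    fn checkout(&self, patron: &str, book: &str) -> Result<(), String> {
        // ... domain logic would run here ...
        self.notifier.notify(patron, &format!("you checked out {}", book))
    }
}

fn main() {
    let use_case = CheckoutUseCase { notifier: ConsoleAdapter };
    use_case.checkout("patron-1", "book-42").unwrap();
}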

4. Applying Onion Architecture

The Onion Architecture defines concentric circles for layers, where all code can depend on layers more central, but code cannot depend on layers further out from the core. The Domain Model represents the combination of state and behavior that models truth for the organization. The number of layers in the application core will vary, but the domain model sits at the center. The repository interfaces for retrieving and persisting data surround the domain model and are defined in the application core. The Onion Architecture uses the Dependency Inversion principle to inject implementations for the interfaces defined in the application core.

Onion Architecture

5. Applying Clean Architecture

The Clean architecture defines concentric circles to represent different areas of software and uses dependency rule to point dependency inwards.

Clean Architecture
  • The entities encapsulate business rules with behavior and data structure.
  • The use-cases encapsulate application-specific use-cases and orchestrate the flow of data to and from the entities.
  • The interface adapters convert data between the use cases/entities and external systems, and are used by presenters, views and controllers.
  • The frameworks and drivers layer is composed of frameworks and tools such as the database and web framework.

The Clean Architecture uses the Dependency Inversion Principle to communicate across boundaries through interfaces, and an inner circle never depends on an outer circle.

6. Related Design Patterns

6.1 Model-driven architecture

Model-driven engineering and Model-driven architecture facilitate domain-driven design by generating source code, documentation, tests, etc. from the domain model.

6.2 Command Query Responsibility Segregation

Command Query Responsibility Segregation (CQRS), coined by Greg Young, generalizes Bertrand Meyer's Command Query Separation principle for message-driven and event-driven architectures by segregating the behavior for querying data from the behavior for updating data.

6.3 Event sourcing

Event sourcing tracks internal state changes by reading and committing events to an event store.

7. Putting it all together

The following sections describe how a library management system can be built with domain-driven design and hexagonal/clean architecture:

7.1 User Stories

Following is a list of the primary user stories that will be implemented for the library management system:

  • As a library administrator, I want to add a book to the collection so that patrons of the library may check it out.
  • As a library administrator, I want to remove a book from the collection so that it’s no longer available to borrow.
  • As a library administrator or a patron, I want to search books based on different criteria such as title, author, publisher, dates, etc. so that I may see details or use it to checkout the book later.
  • As a patron, I want to checkout a book so that I can read it and return later.
  • As a patron, I want to return a checked out book when I am done with reading.
  • As a patron, I want to hold a book that is not currently available so that I can check it out later.
  • As a patron, I want to cancel a hold that I previously made when I am no longer interested in the book.
  • As a patron, I want to check out a book that I previously put on hold so that I can read it.

7.1.1 Constraints and Validation Policies

In addition, the library may impose certain policies and restrictions on books and checkout/hold actions, such as allowing only researcher patrons to hold restricted books, or limiting the number of books that a patron can hold or check out at a time.

7.2 Layered Architecture

The library management application is divided into multiple domains such as patrons, catalog, checkout and hold. Each domain is then divided into the following layers:

7.2.1 Application-Service and Controller Layer

This layer defines remote APIs for communicating with the microservices defined in the library management application.

7.2.2 Command and Query Layer (CQRS)

This layer defines operations for commands and queries that are invoked by the controller layer and depend on the underlying domain service layer. The command layer also defines the scope of a transaction so that all changes are persisted atomically.

Note: This layer may use SAGA pattern to handle distributed transactions when you need to invoke multiple services or databases for performing an operation.

7.2.3 Domain Service Layer

This layer defines additional business behavior that is built upon the domain model layer and is used by the commands and queries layer.

7.2.4 Domain Model Layer

This layer defines data and behavior of the domain and defines entity, value, aggregates, factories, and interface to model the domain.

7.2.5 Infrastructure Layer

This layer defines repositories and gateways to persist domain entities and connect to external services such as messaging and logging.

7.3 Domain Model

The following domain model was defined as a result of the above user stories and an event-storming exercise:

Class Diagram

7.3.1 Party Pattern

The above design uses the party pattern to model patrons, library administrators and library branches because they share many common attributes describing people and organizations. The base Party class uses the Address class to store a physical address, so the Party class acts as an aggregate for all related data about people and organizations.

7.3.2 Book

The Book class models a library book that can be added to the library collection, queried by administrators or patrons, and then checked out or held by patrons.

7.3.3 Checkout

The Checkout class abstracts the data when checking out a book, which can be returned later.

7.3.4 Hold

The Hold class abstracts the data when holding a book that is not currently available so that it can be checked out later.

Note: Domain-driven design considers an anemic domain model without business behavior an antipattern, so the above domain model defines invariant business rules and behavior along with the data attributes.

7.4 Components and Modules

The library management application was divided into following modules:

Component Diagram

The core, utils and gateway modules are shared by the other modules. The parties and books modules define low-level modules, while the catalog, patrons, checkout and hold modules define high-level modules that also act as bounded contexts: catalog for managing the book catalog, patrons for managing library members, and checkout/hold for defining the behavior of the library's lending operations.

7.4.1 core module

The core module abstracts common domain model, domain events and interfaces for command pattern, repository and controllers.

7.4.2 parties module

The parties module defines domain model for the party class and data access methods for persisting and querying parties (people and organizations).

7.4.3 books module

The books module defines domain model and data transfer model for books as well as repository for persisting and querying books using AWS DynamoDB.

7.4.4 patrons module

The patrons module builds upon the parties module and defines a service sub-module with business logic for querying and persisting patrons. The patrons module also includes controller and command classes and a binary/main module to define microservices based on AWS Lambda.

7.4.5 checkout module

The checkout module implements services for checking out and returning books, which are then made available as microservices using the controller, command and binary sub-modules.

7.4.6 hold module

The hold module implements services for holding a book and canceling or checking out the hold later, which are then made available as microservices using the controller, command and binary sub-modules.

7.4.7 gateway module

The gateway module defines interfaces to connect to external services such as AWS CloudWatch for managing metrics and AWS SNS for publishing events from the domain and user-action changes.

7.5 Domain-Driven Design Patterns

The library management system applies the following design patterns from domain-driven design and hexagonal/clean architecture:

7.5.1 Ubiquitous Language

The domain model uses the same terminology as the stakeholders and experts of the underlying problem space, such as library patrons, books, checkout, hold, etc., so that the software development team can model the business problem as closely as possible.

7.5.2 Domain Events

The domain events capture data changes in the domain model, specifically in aggregate objects. This decouples domains in different bounded contexts, as other domains can listen to the domain events asynchronously and make local changes. Following is an example of a domain event in the library management system:

#[derive(Debug, PartialEq, Serialize, Deserialize)]
pub(crate) struct DomainEvent {
    pub event_id: String,
    pub name: String,
    pub group: String,
    pub key: String,
    pub kind: DomainEventType,
    pub metadata: HashMap<String, String>,
    pub json_data: String,
    #[serde(with = "serializer")]
    pub created_at: NaiveDateTime,
}

7.5.3 Aggregates and Event Stream

The modules and domain model communicate with each other using event streams, which are built upon the AWS SNS service underneath, e.g.,

impl EventPublisher for SESPublisher {
    async fn publish(&self, event: &DomainEvent) -> Result<(), LibraryError> {
        let topic = self.topics.get(event.name.as_str());
        if let Some(arn) = topic {
            let json = serde_json::to_string(event)?;
            self.client.publish().topic_arn(arn).message(json).send().await?;
            Ok(())
        } else {
            Err(LibraryError::runtime(format!("topic is not found {}", event.name).as_str(), None))
        }
    }
}

The following example depicts publishing an event as a result of checking out a book:

    async fn checkout(&self, patron_id: &str, book_id: &str) -> LibraryResult<CheckoutDto> {
        let patron = self.patron_service.find_patron_by_id(patron_id).await?;
        let book = self.catalog_service.find_book_by_id(book_id).await?;
        if book.status() != BookStatus::Available {
            return Err(LibraryError::validation(format!("book is not available {}",
                                                        book.id()).as_str(), Some("400".to_string())));
        }
        if book.is_restricted() && patron.is_regular() {
            return Err(LibraryError::validation(format!("patron {} cannot hold restricted books {}",
                                                        patron.id(), book.id()).as_str(), Some("400".to_string())));
        }
        let checkout = CheckoutDto::from_patron_book(self.branch_id.as_str(), &patron, &book);
        self.checkout_repository.create(&CheckoutEntity::from(&checkout)).await?;
        let _ = self.events_publisher.publish(&DomainEvent::added(
            "book_checkout", "checkout", checkout.checkout_id.as_str(), &HashMap::new(), &checkout.clone())?).await?;
        Ok(checkout)
    }

The events_publisher publishes the domain events upon checking out a book. Similar events are published for other user-actions or domain changes.

7.5.4 Domain Services

The domain services define high-level business logic and each bounded context defines a layer for domain services such as:

pub(crate) trait CatalogService: Sync + Send {
    async fn add_book(&self, book: &BookDto) -> LibraryResult<BookDto>;
    async fn remove_book(&self, id: &str) -> LibraryResult<()>;
    async fn update_book(&self, book: &BookDto) -> LibraryResult<BookDto>;
    async fn find_book_by_id(&self, id: &str) -> LibraryResult<BookDto>;
    async fn find_book_by_isbn(&self, isbn: &str) -> LibraryResult<Vec<BookDto>>;
}
pub(crate) trait CheckoutService: Sync + Send {
    async fn checkout(&self, patron_id: &str, book_id: &str) -> LibraryResult<CheckoutDto>;
    async fn returned(&self, patron_id: &str, book_id: &str) -> LibraryResult<CheckoutDto>;
    async fn query_overdue(&self, predicate: &HashMap<String, String>,
                           page: Option<&str>, page_size: usize) -> LibraryResult<PaginatedResult<CheckoutDto>>;
}
pub(crate) trait HoldService: Sync + Send {
    async fn hold(&self, patron_id: &str, book_id: &str) -> LibraryResult<HoldDto>;
    async fn cancel(&self, patron_id: &str, book_id: &str) -> LibraryResult<HoldDto>;
    async fn checkout(&self, patron_id: &str, book_id: &str) -> LibraryResult<HoldDto>;
    async fn query_expired(&self, predicate: &HashMap<String, String>,
                           page: Option<&str>, page_size: usize) -> LibraryResult<PaginatedResult<HoldDto>>;
}

7.5.5 Repositories

The library management application uses the repository pattern to persist and query data, which can be backed by any supported implementation (such as DynamoDB). It also uses polymorphic associations for managing inheritance, e.g., the parties DynamoDB table can store patrons, administrators and library branches. The repository implementation can be pointed to a local DynamoDB or the AWS-managed DynamoDB service, e.g.,

#[async_trait]
impl Repository<BookEntity> for DDBBookRepository {
    async fn create(&self, entity: &BookEntity) -> LibraryResult<usize> {
        let table_name: &str = self.table_name.as_ref();
        let val = serde_json::to_value(entity)?;
        self.client
            .put_item()
            .table_name(table_name)
            .condition_expression("attribute_not_exists(book_id)")
            .set_item(Some(parse_item(val)?))
            .send()
            .await.map(|_| 1).map_err(LibraryError::from)
    }

    async fn update(&self, entity: &BookEntity) -> LibraryResult<usize> {
        let now = Utc::now().naive_utc();
        let table_name: &str = self.table_name.as_ref();

        self.client
            .update_item()
            .table_name(table_name)
            .key("book_id", AttributeValue::S(entity.book_id.clone()))
            .update_expression("SET version = :version, title = :title, book_status = :book_status, dewey_decimal_id = :dewey_decimal_id, restricted = :restricted, updated_at = :updated_at")
            .expression_attribute_values(":old_version", AttributeValue::N(entity.version.to_string()))
            .expression_attribute_values(":version", AttributeValue::N((entity.version + 1).to_string()))
            .expression_attribute_values(":title", AttributeValue::S(entity.title.to_string()))
            .expression_attribute_values(":book_status", AttributeValue::S(entity.book_status.to_string()))
            .expression_attribute_values(":restricted", AttributeValue::Bool(entity.restricted))
            .expression_attribute_values(":dewey_decimal_id", AttributeValue::S(entity.dewey_decimal_id.to_string()))
            .expression_attribute_values(":updated_at", string_date(now))
            .condition_expression("attribute_exists(version) AND version = :old_version")
            .send()
            .await.map(|_| 1).map_err(LibraryError::from)
    }
  ...
}

7.5.6 Factories

The library management application uses factories to create instances of repositories, event publishers and services based on different implementations, e.g.,

pub(crate) async fn create_checkout_repository(store: RepositoryStore) -> Box<dyn CheckoutRepository> {
    match store {
        RepositoryStore::DynamoDB => {
            let client = build_db_client(store).await;
            Box::new(DDBCheckoutRepository::new(client, "checkout", "checkout_ndx"))
        }
        RepositoryStore::LocalDynamoDB => {
            let client = build_db_client(store).await;
            let _ = create_table(&client, "checkout", "checkout_id", "checkout_status", "patron_id").await;
            Box::new(DDBCheckoutRepository::new(client, "checkout", "checkout_ndx"))
        }
    }
}

pub(crate) async fn create_checkout_service(
  config: &Configuration, store: RepositoryStore) -> Box<dyn CheckoutService> {
    let checkout_repo = factory::create_checkout_repository(store).await;
    let catalog_svc = create_catalog_service(config, store).await;
    let patron_svc = create_patron_service(config, store).await;
    let publisher = create_publisher(store.gateway_publisher()).await;
    Box::new(CheckoutServiceImpl::new(config, checkout_repo,
                                      patron_svc, catalog_svc, publisher))
}

7.5.7 Data Transfer Objects

The library management application uses immutable data transfer objects when invoking a business service, a command, or a method on a controller so that these objects are free of side effects and can be safely shared with other modules in a concurrent environment.
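For illustration, a simplified immutable DTO might look like the following sketch; the actual BookDto in the sample application carries many more fields, so this is not its real definition:

/// A simplified immutable DTO: fields are private and set once at
/// construction, so instances are free of side effects when shared.
#[derive(Debug, Clone)]
pub struct BookSummaryDto {
    book_id: String,
    title: String,
}

impl BookSummaryDto {
    pub fn new(book_id: &str, title: &str) -> Self {
        Self { book_id: book_id.to_string(), title: title.to_string() }
    }
    pub fn book_id(&self) -> &str { &self.book_id }
    pub fn title(&self) -> &str { &self.title }
}

fn main() {
    let dto = BookSummaryDto::new("b1", "Domain-Driven Design");
    // Only getters are exposed; there is no way to mutate the DTO after creation.
    println!("{} -> {}", dto.book_id(), dto.title());
}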

7.5.8 CQRS Pattern

The library management application uses the command-query separation principle to bridge application services with the domain services. Each command handles a unique behavior implemented by the high-level modules for managing patrons, the book catalog, and checkout/hold behavior, e.g.,

#[derive(Debug, Deserialize)]
pub(crate) struct CheckoutBookCommandRequest {
    patron_id: String,
    book_id: String,
}

#[derive(Debug, Serialize)]
pub(crate) struct CheckoutBookCommandResponse {
    checkout: CheckoutDto,
}

#[async_trait]
impl Command<CheckoutBookCommandRequest, CheckoutBookCommandResponse> for CheckoutBookCommand {
    async fn execute(&self, req: CheckoutBookCommandRequest) -> Result<CheckoutBookCommandResponse, CommandError> {
        self.checkout_service.checkout(req.patron_id.as_str(), req.book_id.as_str())
            .await.map_err(CommandError::from).map(CheckoutBookCommandResponse::new)
    }
}

7.5.9 Application Services/Controller

The application services/controller layer defines remote APIs, which are built on top of AWS Lambda APIs, e.g.,

pub(crate) async fn checkout_book(
    State(state): State<AppState>,
    json: Json<Value>) -> Result<Json<CheckoutBookCommandResponse>, ServerError> {
    let req: CheckoutBookCommandRequest = serde_json::from_value(json.0).map_err(json_to_server_error)?;
    let svc = build_service(state).await;
    let res = CheckoutBookCommand::new(svc).execute(req).await?;
    Ok(Json(res))
}

7.5.10 Bounded Context

As the Bounded Context defines the boundaries of the domain model, the library system defines bounded contexts for managing library members (patrons), managing books (catalog), and the checkout and hold operations. In addition, each domain decomposes into subdomains that reflect the business processes within the problem space.

7.5.11 Monads and Error Handling

The library management application uses the Result monad for returning results from a service, command or controller so that the caller can handle errors properly. In addition, it uses the Option monad for defining optional data properties so that the compiler can enforce all type checking.
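As a simplified sketch of this error-handling style (the error and domain types below are illustrative, not the application's actual LibraryError or Patron definitions):

/// Simplified error type in the spirit of the application's LibraryError.
#[derive(Debug)]
enum LibraryError {
    NotFound(String),
}

type LibraryResult<T> = Result<T, LibraryError>;

struct Patron {
    email: String,
    cell_phone: Option<String>, // optionality enforced by the type system
}

fn find_patron(id: &str) -> LibraryResult<Patron> {
    if id == "p1" {
        Ok(Patron { email: "test-email@xyz.com".into(), cell_phone: None })
    } else {
        Err(LibraryError::NotFound(format!("patron {} not found", id)))
    }
}

fn patron_contact(id: &str) -> LibraryResult<String> {
    let patron = find_patron(id)?; // `?` propagates the error to the caller
    // The compiler forces an explicit decision about the missing-phone case.
    match patron.cell_phone {
        Some(phone) => Ok(phone),
        None => Ok(patron.email),
    }
}

fn main() {
    println!("{:?}", patron_contact("p1")); // Ok("test-email@xyz.com")
    println!("{:?}", patron_contact("p2")); // Err(NotFound(...))
}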

7.5.12 Main

The high-level modules define a main module, which instantiates the API controllers for remote invocation. AWS Lambda requires that Rust-based Lambda functions be deployed as a binary executable, which is spawned by the Lambda runtime, e.g.,

#[tokio::main]
async fn main() -> Result<(), Error> {
    setup_tracing();

    let state = if DEV_MODE {
        std::env::set_var("AWS_LAMBDA_FUNCTION_NAME", "_");
        std::env::set_var("AWS_LAMBDA_FUNCTION_MEMORY_SIZE", "4096"); // 200MB
        std::env::set_var("AWS_LAMBDA_FUNCTION_VERSION", "1");
        std::env::set_var("AWS_LAMBDA_RUNTIME_API", "http://[::]:9000/.rt");
        AppState::new("dev", RepositoryStore::LocalDynamoDB)
    } else {
        AppState::new("prod", RepositoryStore::DynamoDB)
    };

    let app = Router::new()
        .route("/catalog", post(controller::add_book))
        .route("/catalog/:id",
               get(controller::find_book_by_id).delete(controller::remove_book))
        .with_state(state);

    run(app).await
}

7.6 Code structure

The following tree structure shows the module and code structure of the library management application:

|--- books
|   |--- domain
|   |   |--- model.rs
|   |--- domain.rs
|   |--- dto.rs
|   |--- factory.rs
|   |--- repository
|   |   |--- ddb_book_repository.rs
|   |--- repository.rs
|--- books.rs
|--- catalog
|   |--- bin
|   |   |--- main.rs
|   |--- command
|   |   |--- add_book_cmd.rs
|   |   |--- get_book_cmd.rs
|   |   |--- remove_book_cmd.rs
|   |   |--- update_book_cmd.rs
|   |--- command.rs
|   |--- controller.rs
|   |--- domain
|   |   |--- service.rs
|   |--- domain.rs
|   |--- dto.rs
|   |--- factory.rs
|--- catalog.rs
|--- checkout
|   |--- bin
|   |   |--- main.rs
|   |--- command
|   |   |--- checkout_book_cmd.rs
|   |   |--- return_book_cmd.rs
|   |--- command.rs
|   |--- controller.rs
|   |--- domain
|   |   |--- model.rs
|   |   |--- service.rs
|   |--- domain.rs
|   |--- dto.rs
|   |--- factory.rs
|   |--- repository
|   |   |--- ddb_checkout_repository.rs
|   |--- repository.rs
|--- checkout.rs
|--- core
|   |--- command.rs
|   |--- controller.rs
|   |--- domain.rs
|   |--- events.rs
|   |--- library.rs
|   |--- repository.rs
|--- core.rs
|--- gateway
|   |--- ddb
|   |   |--- publisher.rs
|   |--- ddb.rs
|   |--- events.rs
|   |--- factory.rs
|   |--- logs.rs
|   |--- sns
|   |   |--- publisher.rs
|   |--- sns.rs
|--- gateway.rs
|--- hold
|   |--- bin
|   |   |--- main.rs
|   |--- command
|   |   |--- cancel_hold_book_cmd.rs
|   |   |--- checkout_hold_book_cmd.rs
|   |   |--- hold_book_cmd.rs
|   |--- command.rs
|   |--- controller.rs
|   |--- domain
|   |   |--- model.rs
|   |   |--- service.rs
|   |--- domain.rs
|   |--- dto.rs
|   |--- events.rs
|   |--- factory.rs
|   |--- repository
|   |   |--- ddb_hold_repository.rs
|   |--- repository.rs
|--- hold.rs
|--- lib.rs
|--- library.rs
|--- main.rs
|--- parties
|   |--- domain
|   |   |--- model.rs
|   |--- domain.rs
|   |--- events.rs
|   |--- factory.rs
|   |--- repository
|   |   |--- ddb_party_repository.rs
|   |--- repository.rs
|--- parties.rs
|--- patrons
|   |--- bin
|   |   |--- main.rs
|   |--- command
|   |   |--- add_patron_cmd.rs
|   |   |--- get_patron_cmd.rs
|   |   |--- remove_patron_cmd.rs
|   |   |--- update_patron_cmd.rs
|   |--- command.rs
|   |--- controller.rs
|   |--- domain
|   |   |--- service.rs
|   |--- domain.rs
|   |--- dto.rs
|   |--- factory.rs
|--- patrons.rs
|--- utils
|   |--- date.rs
|   |--- ddb.rs
|--- utils.rs

7.7 Building and Testing

7.7.1 Start locally

docker-compose -f ddb-docker-compose.yaml up

7.7.2 Start Lambda locally

cargo lambda watch

7.7.3 Build

cargo build --release
cargo lambda build --release

7.7.4 Testing catalog Lambdas

Add a book:

curl -H "Content-Type: application/json" http://localhost:9000/catalog -d '{"isbn": "123", "title": "my book"}'

which would return something like:

{
  "book": {
    "dewey_decimal_id": "749",
    "book_id": "a2b25506-2948-47bb-9c4a-cf9ad480c10b",
    "version": 0,
    "author_id": "623a01ca-8ba9-41cd-b8b6-85a5711f8453",
    "publisher_id": "f0cff296-9f6e-4b25-95e1-a783661bf91f",
    "language": "en",
    "isbn": "123",
    "title": "my book",
    "book_status": "Available",
    "restricted": false,
    "published_at": "2023-05-09T20:55:56.073008+00:00",
    "created_at": "2023-05-09T20:55:56.073027+00:00",
    "updated_at": "2023-05-09T20:55:56.073027+00:00"
  }
}

Finding the book by id:

curl -H "Content-Type: application/json" http://localhost:9000/catalog/f58ef32a-6f24-4314-8782-c7ebcad0ab59

that returns

{
  "book": {
    "dewey_decimal_id": "220",
    "book_id": "f58ef32a-6f24-4314-8782-c7ebcad0ab59",
    "version": 0,
    "author_id": "4c24b180-a146-410a-b68c-9d83c57adebc",
    "publisher_id": "88b47029-cad1-443f-8d67-aaf13863e924",
    "language": "en",
    "isbn": "123",
    "title": "my book",
    "book_status": "Available",
    "restricted": false,
    "published_at": "2023-05-09T22:18:25.436359+00:00",
    "created_at": "2023-05-09T22:18:25.436366+00:00",
    "updated_at": "2023-05-09T22:18:25.436371+00:00"
  }
}

7.7.5 Testing patrons Lambdas

Add a patron:

curl -v  -H "Content-Type: application/json" http://localhost:9000/patrons -d '{"email": "test-email@xyz.com"}'

that returns:

{
  "patron": {
    "patron_id": "cf49007e-e7fa-42c3-ac56-e15b9530597e",
    "version": 0,
    "first_name": "",
    "last_name": "",
    "email": "test-email@xyz.com",
    "under_13": false,
    "group_roles": [],
    "num_holds": 0,
    "num_overdue": 0,
    "home_phone": null,
    "cell_phone": null,
    "work_phone": null,
    "street_address": null,
    "city": null,
    "zip_code": null,
    "state": null,
    "country": null,
    "created_at": "2023-05-09T22:20:28.898831",
    "updated_at": "2023-05-09T22:20:28.898833"
  }
}

Getting patron:

curl -H "Content-Type: application/json" http://localhost:9000/patrons/cf49007e-e7fa-42c3-ac56-e15b9530597e|jq '.'

that returns:

{
  "patron": {
    "patron_id": "cf49007e-e7fa-42c3-ac56-e15b9530597e",
    "version": 0,
    "first_name": "",
    "last_name": "",
    "email": "test-email@xyz.com",
    "under_13": false,
    "group_roles": [],
    "num_holds": 0,
    "num_overdue": 0,
    "home_phone": "",
    "cell_phone": "",
    "work_phone": "",
    "street_address": null,
    "city": null,
    "zip_code": null,
    "state": null,
    "country": null,
    "created_at": "2023-05-09T22:21:35.142750",
    "updated_at": "2023-05-09T22:21:35.142757"
  }
}

7.7.6 Checkout book Lambda

Checkout a book:

curl -v  -H "Content-Type: application/json" http://localhost:9000/checkout -d '{"patron_id": "cf49007e-e7fa-42c3-ac56-e15b9530597e", "book_id": "f58ef32a-6f24-4314-8782-c7ebcad0ab59"}'

that returns:

{
  "checkout": {
    "checkout_id": "4a7ea5c5-939d-4934-8715-071c7ab5bc71",
    "version": 0,
    "branch_id": "dev",
    "book_id": "f58ef32a-6f24-4314-8782-c7ebcad0ab59",
    "patron_id": "cf49007e-e7fa-42c3-ac56-e15b9530597e",
    "checkout_status": "CheckedOut",
    "checkout_at": "2023-05-09T22:36:55.162807+00:00",
    "due_at": "2023-05-24T22:36:55.162808+00:00",
    "returned_at": null,
    "created_at": "2023-05-09T22:36:55.162812+00:00",
    "updated_at": "2023-05-09T22:36:55.162812+00:00"
  }
}

Returning a book:

curl -v  -H "Content-Type: application/json" http://localhost:9000/checkout/return -d '{"patron_id": "cf49007e-e7fa-42c3-ac56-e15b9530597e", "book_id": "f58ef32a-6f24-4314-8782-c7ebcad0ab59"}'

that returns:

{
  "checkout": {
    "checkout_id": "6b432212-8136-45a5-a8c4-953da73ee24f",
    "version": 0,
    "branch_id": "dev",
    "book_id": "f58ef32a-6f24-4314-8782-c7ebcad0ab59",
    "patron_id": "cf49007e-e7fa-42c3-ac56-e15b9530597e",
    "checkout_status": "Returned",
    "checkout_at": "1970-01-01T00:00:00+00:00",
    "due_at": "2023-05-09T22:36:59.145408+00:00",
    "returned_at": "2023-05-09T22:36:59.145607",
    "created_at": "2023-05-09T22:36:59.145415+00:00",
    "updated_at": "2023-05-09T22:36:59.145421+00:00"
  }
}

7.7.7 Hold book Lambda

Hold a book:

curl -v  -H "Content-Type: application/json" http://localhost:9000/hold -d '{"patron_id": "cf49007e-e7fa-42c3-ac56-e15b9530597e", "book_id": "f58ef32a-6f24-4314-8782-c7ebcad0ab59"}'

that returns

{
  "hold": {
    "hold_id": "b6cbff12-fe0b-4be0-9566-5e221e52c8c5",
    "version": 0,
    "branch_id": "dev",
    "book_id": "f58ef32a-6f24-4314-8782-c7ebcad0ab59",
    "patron_id": "cf49007e-e7fa-42c3-ac56-e15b9530597e",
    "hold_status": "OnHold",
    "hold_at": "2023-05-09T22:38:52.905822+00:00",
    "expires_at": "2023-05-24T22:38:52.905822+00:00",
    "canceled_at": null,
    "checked_out_at": null,
    "created_at": "2023-05-09T22:38:52.905825+00:00",
    "updated_at": "2023-05-09T22:38:52.905825+00:00"
  }
}

Canceling a hold:

curl -v  -H "Content-Type: application/json" http://localhost:9000/hold/cancel -d '{"patron_id": "cf49007e-e7fa-42c3-ac56-e15b9530597e", "book_id": "f58ef32a-6f24-4314-8782-c7ebcad0ab59"}'

that returns:

{
  "hold": {
    "hold_id": "b6cbff12-fe0b-4be0-9566-5e221e52c8c5",
    "version": 0,
    "branch_id": "dev",
    "book_id": "f58ef32a-6f24-4314-8782-c7ebcad0ab59",
    "patron_id": "cf49007e-e7fa-42c3-ac56-e15b9530597e",
    "hold_status": "Canceled",
    "hold_at": "2023-05-09T22:39:51.920045+00:00",
    "expires_at": "2023-05-09T22:39:51.920052+00:00",
    "canceled_at": "2023-05-09T22:39:51.920078",
    "checked_out_at": null,
    "created_at": "2023-05-09T22:39:51.920058+00:00",
    "updated_at": "2023-05-09T22:39:51.920063+00:00"
  }
}

Checking out a held book:

curl -v  -H "Content-Type: application/json" http://localhost:9000/hold/checkout -d '{"patron_id": "cf49007e-e7fa-42c3-ac56-e15b9530597e", "book_id": "f58ef32a-6f24-4314-8782-c7ebcad0ab59"}'

that returns:

{
  "hold": {
    "hold_id": "f5fdb835-5ea2-428d-af12-a81ffb1b3f35",
    "version": 0,
    "branch_id": "dev",
    "book_id": "f58ef32a-6f24-4314-8782-c7ebcad0ab59",
    "patron_id": "cf49007e-e7fa-42c3-ac56-e15b9530597e",
    "hold_status": "CheckedOut",
    "hold_at": "2023-05-09T22:40:54.705417+00:00",
    "expires_at": "2023-05-09T22:40:54.705424+00:00",
    "canceled_at": null,
    "checked_out_at": "2023-05-09T22:40:54.705443",
    "created_at": "2023-05-09T22:40:54.705430+00:00",
    "updated_at": "2023-05-09T22:40:54.705435+00:00"
  }
}

8. Deployment and Infrastructure as Code

In order to fully automate deployment, the library management system uses AWS CDK to provision DynamoDB tables, CloudWatch resources and AWS Lambda functions, along with the required security policies. You can deploy the infrastructure as follows:

8.1 Install CDK

npm install -g typescript
npm install aws-cdk-lib
npm install -g aws-cdk

8.2 Deploy

cd cdk
cdk deploy

8.3 Destroy

If you need to remove all infrastructure, simply run:

cd cdk
cdk destroy

9. Summary

A sample library management system demonstrates how to apply domain-driven design and hexagonal/clean architecture to build microservices. It is implemented in Rust and uses AWS DynamoDB, AWS SNS, AWS CloudWatch and AWS Lambda to build modern microservices. The sample domain-driven application also uses AWS CDK to manage infrastructure as code so that you can deploy services consistently across all environments. You can download the sample application from https://github.com/bhatti/ddd-sample-microservice.

PS: The library management system is a sample application to showcase domain-driven and hexagonal/clean architecture, but you can read Building a Secured Family-friendly Password Manager and Building a Hybrid Authorization System for Granular Access Control to learn these concepts on somewhat larger open-source applications available at https://github.com/bhatti/PlexPass and https://github.com/bhatti/PlexAuthZ.

February 14, 2020

Review of “Database Internals”

Filed under: Design,Technology — admin @ 8:06 pm

“Database Internals” is an excellent resource for a deep dive into storage engines and distributed systems. The first chapter introduces OLTP, OLAP and HTAP databases. It reviews database architecture and components including transport, query processor, storage engine, transaction manager, lock manager, access methods, buffer manager and recovery manager. The storage engine may use an in-memory store or a disk store, and some in-memory databases use disk for backups, which are updated asynchronously. The chapter reviews row-oriented and column-oriented databases along with data files and index files.

The second chapter covers B-Trees, which are often used with disk-based storage engines. The chapter introduces binary search trees (BST) and balanced trees. However, BSTs add elements in random order and are not optimized for disk storage because parent and child nodes can be stored in different regions of memory; even though the height of a balanced BST bounds search to O(log N) operations, its low fanout makes the tree too deep for disk access. The chapter reviews the architecture of storage media such as hard drives and SSDs, and the B-Tree data structure, where each node can hold up to N keys and N + 1 pointers to child nodes. Nodes are grouped into a root node, leaf nodes and internal nodes, where each node occupies a fixed-size page. Keys in B-Tree nodes are called index entries, separator keys or divider cells, and they split the tree into subtrees holding key ranges. B-Trees have a high fanout K, so there are K times more nodes on each new level, and a lookup fetches at most log_K(M) pages (where M is the total number of items) to find a searched key. To insert a value, the algorithm finds the target leaf and appends the key/value to it; the node may need to be split if there isn't enough room. Similarly, deletion finds the target leaf and removes the key/value, which may trigger node merges if neighboring nodes become underfilled.
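
To make the lookup path concrete, here is a minimal B-Tree lookup sketch in Rust (my own illustration, not code from the book), using binary search inside each node to pick either a matching entry or the child whose key range contains the searched key:

// Simplified B-Tree variant that stores values in every node.
struct Node {
    keys: Vec<u64>,        // sorted separator/index keys
    values: Vec<String>,   // value associated with each key
    children: Vec<Node>,   // empty for leaves, keys.len() + 1 otherwise
}

fn lookup<'a>(mut node: &'a Node, key: u64) -> Option<&'a String> {
    loop {
        // Insertion point: index of the first key greater than `key`.
        let idx = node.keys.partition_point(|&k| k <= key);
        if idx > 0 && node.keys[idx - 1] == key {
            return Some(&node.values[idx - 1]); // exact match in this node
        }
        if node.children.is_empty() {
            return None; // reached a leaf without finding the key
        }
        node = &node.children[idx]; // descend into the subtree for this range
    }
}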

The chapter three covers the on-disk file format of B-Trees. It reviews binary encoding of primitive types and strings, and general file-format principles such as header, page data and trailer. Pages can be fixed or variable size, but variable-size pages may incur more overhead: they must reclaim space when records are removed and reference records within the page without regard to their exact locations. Variable-size pages generally use a slotted page structure with a header, a list of pointers and a list of variable-size cells, where each cell stores flags, key/data sizes, a page ID and byte data. Removing an item may just mark the cell as deleted so the space can be reclaimed later, and insertion may use a first-fit or best-fit strategy to find free blocks. Headers may also store a version and a checksum for data validation.
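
As an illustration, here is a rough in-memory sketch of a slotted page layout in Rust; the field names and sizes are my own assumptions, not the book's exact format:

// Slot array grows from the front of the page, cells grow from the back,
// so cells can move inside the page without invalidating slot IDs.
struct PageHeader {
    version: u16,
    checksum: u32,          // validates page contents on read
    cell_count: u16,
    free_space_offset: u16, // new cells are written just before this offset
}

struct SlottedPage {
    header: PageHeader,
    slot_offsets: Vec<Option<u16>>, // slot -> cell offset; None marks a deleted cell
    data: Vec<u8>,                  // raw page bytes holding the variable-size cells
}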

The chapter four shows how to implement B-Trees; for example, the page header may store flags, the number of cells, a magic number, etc. Some implementations of B-Trees store sibling pointers (forward/backward) to locate neighboring nodes, but these add complexity to splits and merges. B-Trees also store one more child-page pointer than the number of keys:

             +------+------+------+
             |  K1  |  K2  |  K3  |
             +------+------+------+
            /       |      |       \
     Ks < K1  K1<=Ks<K2  K2<=Ks<K3  Ks >= K3

Alternatively, an implementation can store the rightmost pointer in the cell along with the high key. Each node in a B-Tree is designed to keep a specific number of items, and resizing would require copying data, so to avoid copying, implementations can use an extension/overflow page and link it from the original page. B-Trees keep keys in order so that binary search can be used; the insertion point is the index of the first element greater than the given key. Some implementations store parent pointers in nodes or use breadcrumbs to record the path to a leaf in case it needs to be split or merged. B-Tree implementations may postpone splits/merges, create a new right-most node, or use other algorithms to improve rebalancing. B-Trees may also apply compression at various granularity levels and perform maintenance to defragment data or garbage-collect non-addressable data (vacuum).

The chapter five reviews transaction processing and introduces the concepts of ACID and page caching so that modifications can be done in memory. Pages are brought in if they are not in memory and evicted/flushed to disk when there isn't enough memory (the O_DIRECT flag can bypass the kernel page cache). After a page is modified, it is marked as dirty so that it can be flushed for durability. These modifications are coordinated with the write-ahead log (WAL) so that data can be recovered if the server crashes; flushing all pages up to a known WAL position is referred to as a checkpoint. As splits/merges may require multiple writes, a B-Tree can keep pages with a high probability of being used locked in memory, called pinning, and I/O operations can be buffered to reduce disk I/O. Based on available memory, the B-Tree may need to evict old pages when new pages cannot fit, and there are a variety of eviction (page replacement) policies such as FIFO, LRU, CLOCK (references in a circular buffer), LFU, etc. Changes to the WAL are flushed with fsync, but due to certain error conditions fsync may not report errors if they were already cleared, which can result in data loss. B-Tree implementations may use steal/no-steal and force/no-force policies to determine when changes are flushed to disk, and these impact undo/redo behavior: a steal policy allows flushing a page without committing a transaction, and a force policy requires all pages modified by a transaction to be flushed before the transaction commits. The chapter explains the ARIES algorithm, a steal/no-force recovery algorithm that uses physical redo to improve performance and logical undo to improve concurrency, and uses WAL records to implement repeating history; ARIES uses LSNs (log sequence numbers) to identify log records and tracks pages in a dirty page table. The chapter reviews concurrency control schemes such as optimistic, multi-version and pessimistic concurrency control (with and without locks). It then reviews transaction isolation and read/write anomalies such as dirty reads (uncommitted updates), non-repeatable reads (querying again), phantom reads (range queries), lost updates (last writer wins), dirty writes (writes based on dirty reads) and write skew (double spending). The isolation levels include read uncommitted, which allows dirty, phantom and non-repeatable reads; read committed, which prevents dirty reads; repeatable read, which prevents non-repeatable reads but allows phantom reads; and serializable, which executes transactions serially and prevents phantom reads. Serializable isolation is difficult to implement, and some databases use snapshot isolation instead, where a transaction observes all transactions committed before its start time; snapshot isolation prevents lost updates but is still susceptible to write skew. Optimistic concurrency control validates a transaction before writing and works well when conflicts (and hence retries) are rare, but it still needs to manage a critical section. Multi-version concurrency control uses monotonically incremented transaction IDs or timestamps and prevents access to uncommitted values. Pessimistic concurrency control can use locks, or simple timestamps that it checks to ensure no other transaction has committed with a higher timestamp.
The database maintains max_read_timestamp and max_write_timestamp: read operations with a timestamp older than max_write_timestamp are aborted, and write operations with a timestamp lower than max_read_timestamp conflict, but write operations with a timestamp older than max_write_timestamp can simply be ignored (the Thomas Write Rule). Lock-based concurrency control typically uses two-phase locking, with a growing phase during which all locks are acquired and a shrinking phase during which locks are released after the transaction. Locks can lead to deadlocks, so you need timeouts to abort long-running transactions. The chapter describes the distinction between locks and latches: locks isolate and schedule overlapping transactions, while latches guard physical B-Tree contents (leaf/non-leaf pages). Latches can be implemented with reader-writer locks (busy-wait/CAS), and latch crabbing minimizes latch holding time.
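
Here is a minimal sketch (my own illustration, not the book's code) of these timestamp-ordering checks for a single object, including the Thomas Write Rule that silently drops obsolete writes:

struct ObjectTimestamps {
    max_read_ts: u64,
    max_write_ts: u64,
}

enum WriteOutcome {
    Applied, // write proceeds and advances max_write_ts
    Ignored, // Thomas Write Rule: an obsolete write is skipped, not aborted
    Aborted, // a newer transaction has already read the object
}

fn try_write(ts: &mut ObjectTimestamps, txn_ts: u64) -> WriteOutcome {
    if txn_ts < ts.max_read_ts {
        WriteOutcome::Aborted // a later reader would miss this write
    } else if txn_ts < ts.max_write_ts {
        WriteOutcome::Ignored // already overwritten by a newer write
    } else {
        ts.max_write_ts = txn_ts;
        WriteOutcome::Applied
    }
}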

The chapter six goes over different B-Tree designs and implementations. For example, some B-Trees use copy-on-write: instead of synchronization and latches, contents are copied into a new shadow tree, and the pointer to the topmost page is atomically updated after the update (as in LMDB). To update a page on disk, the in-memory representation is updated first using a cached version, native pointers (in unmanaged languages), language-specific structures, or a wrapper object that updates the disk as soon as the B-Tree is updated. Lazy B-Trees reduce the cost of updates, e.g. WiredTiger uses different formats for in-memory and on-disk pages, and updates are first saved in an update buffer to reduce I/O. Lazy-Adaptive Trees group nodes into subtrees and attach a buffer to each subtree for batching operations. FD-Trees append all changes to a small mutable head tree plus multiple immutable, logarithmically sized sorted runs, and use fractional cascading to maintain pointers between levels. To reduce write amplification, the Bw-Tree (Buzzword-Tree) batches updates using append-only storage and uses compare-and-swap operations instead of synchronization. Cache-oblivious B-Trees use structures that give asymptotically optimal performance regardless of the underlying memory hierarchy: they treat the page cache and disk as two levels of the hierarchy and partition the disk into blocks so that transfers between page cache and disk stay within a constant factor of optimal, without depending on platform parameters.
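
The atomic pointer swap at the heart of the copy-on-write style can be shown in a few lines of Rust; this is a minimal sketch of my own (tracking the root as a page number), not LMDB's actual code:

use std::sync::atomic::{AtomicU64, Ordering};

static ROOT_PAGE: AtomicU64 = AtomicU64::new(1); // page number of the current root

// Writers build a new shadow tree off to the side and then publish it by
// swapping the root; readers see either the old tree or the new one, never a mix.
fn publish_new_root(expected_root: u64, new_root: u64) -> bool {
    ROOT_PAGE
        .compare_exchange(expected_root, new_root, Ordering::AcqRel, Ordering::Acquire)
        .is_ok() // a failed CAS means another writer won; rebuild from the fresh root
}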

The chapter seven discusses log-structured storage such as immutable LSM Trees that use append-only storage and merge trees. Because B-Trees have high write amplification, LSM Trees provide an alternative by using buffering and append-only storage: they write immutable files and merge them together over time. An LSM Tree uses a small in-memory buffer (memtable) and larger disk-resident tables. A separate write-ahead log is appended to and committed before an in-memory operation is acknowledged to the client. After a disk flush, the memory and disk subtrees are discarded and replaced with the result of their merge. In LSM Trees, redundant records are reconciled during reads, and tombstones are used to mark deleted records; some implementations use predicate deletes over a range of keys to remove records. LSM Trees may use compaction to optimize access, such as the leveled compaction used by RocksDB, where level-0 tables are created by flushing memtable contents and their contents are later merged to create level 1; other LSM Trees use size-tiered compaction that groups disk tables by size, or time-windowed compaction (used by Cassandra). As opposed to B-Trees, which are read-optimized, LSM Trees do not require locating a record on disk during a write, but reads are more expensive in the default configuration. The chapter then reviews sorted string tables (SSTables), which are often used to implement disk-resident tables: an SSTable consists of an index file, which uses a B-Tree or hash table, and a data file, which holds data in key order to support lookups and range queries. During compaction, data files can be read sequentially, and merge iteration is order-preserving, so the merged table can be created in a single run. The chapter introduces bloom filters, which test whether an element is a member of a set (see the sketch below). It then reviews skip lists for keeping data sorted using probabilistic balancing: a skip list builds a hierarchy of linked lists at different heights, where a node can have more than one successor pointing to nodes at lower levels.
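
Here is a minimal bloom filter sketch of my own in Rust, simulating k hash functions by seeding a hasher with the probe index; membership tests can return false positives but never false negatives, which is why LSM trees use them to skip disk tables that definitely lack a key:

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

struct BloomFilter {
    bits: Vec<bool>,
    probes: u64, // number of simulated hash functions (k)
}

impl BloomFilter {
    fn new(size: usize, probes: u64) -> Self {
        Self { bits: vec![false; size], probes }
    }

    fn index(&self, key: &str, probe: u64) -> usize {
        let mut hasher = DefaultHasher::new();
        probe.hash(&mut hasher); // seed the hasher differently per probe
        key.hash(&mut hasher);
        (hasher.finish() as usize) % self.bits.len()
    }

    fn insert(&mut self, key: &str) {
        for p in 0..self.probes {
            let i = self.index(key, p);
            self.bits[i] = true;
        }
    }

    fn may_contain(&self, key: &str) -> bool {
        (0..self.probes).all(|p| self.bits[self.index(key, p)])
    }
}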

The chapter eight begins the second half of the book, which focuses on distributed systems. It introduces the concepts of concurrency and parallelism: concurrent executions can interleave and shared state must be protected, whereas parallel operations are executed by multiple processors. The chapter defines system reliability in terms of fault tolerance and discusses the fallacies of distributed computing (attributed to L. Peter Deutsch). In real applications, processing and latency are not instantaneous and queue capacity is not infinite, which also requires back-pressure; queue size can be determined by measuring task processing time and the average time each task spends in the queue. Distributed systems also have to deal with clock/time differences across machines and with state consistency, e.g. read-time data repair or eventually consistent systems. Detecting failures in distributed systems is hard and requires heartbeat protocols, and network partitions can result in partial failures. The chapter explains cascading failures that can propagate from one part of the system to another; you can use an exponential backoff strategy with jitter to avoid amplifying problems (see the sketch below). Messages can get lost, delayed or reordered in a distributed system, and a sender that retries does not know whether the message was already delivered. Link abstractions capture this: a fair-loss link eventually delivers a message if the sender keeps retrying; finite duplication guarantees that a message sent a finite number of times is not delivered infinitely often; and no-creation guarantees that a message is never delivered unless it was actually sent. Distributed systems use acknowledgments with sequence numbers to notify the sender, and the sender may retransmit in the absence of an ack (a stubborn link resends messages indefinitely). To prevent duplicate processing caused by retransmission, you can use idempotent operations. Messages can also arrive out of order, so the recipient may use sequence numbers to detect an out-of-order message and buffer it until earlier messages arrive. A perfect link guarantees reliable delivery without duplication and with no creation (it only delivers messages that were actually sent). Exactly-once delivery is very hard in distributed systems, and most real applications use at-least-once delivery (at-most-once is not reliable). The chapter describes the Two Generals' Problem to show that with asynchronous communication, even with perfect delivery, participants cannot be sure the other side is alive and in agreement: no matter how many ACKs you exchange, you can never be sure the last message was delivered to both parties. The FLP impossibility result goes further: with asynchronous communication, you can never guarantee consensus in bounded time. The chapter finally discusses failure models such as crash faults, omission faults (skipping execution of certain steps) and arbitrary (Byzantine) faults, and notes that process groups and redundancy can mask these failures from users.
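
Here is a minimal sketch of retries with capped exponential backoff and full jitter; the constants are my own choices, and it assumes the external rand crate since the standard library has no random number generator:

use rand::Rng; // external crate assumption: rand
use std::time::Duration;

fn backoff_delay(attempt: u32) -> Duration {
    let base_ms: u64 = 100;
    let cap_ms: u64 = 30_000;
    // Exponential growth, capped so delays don't grow without bound.
    let exp_ms = cap_ms.min(base_ms.saturating_mul(1u64 << attempt.min(20)));
    // Full jitter: sleep a random duration in [0, exp_ms) so that many
    // retrying clients don't synchronize and dogpile the recovering service.
    let jittered = rand::thread_rng().gen_range(0..exp_ms);
    Duration::from_millis(jittered)
}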

The chapter nine discusses failure detection, where a failure detector identifies failed or unreachable processes to exclude them from the algorithm, guaranteeing liveness while preserving safety. Most distributed systems use heartbeats to detect failures, where a process reports its status to peers in response to a heartbeat; each process maintains a list of other processes and records the last response time for each. Some systems use a deadline failure detector that reports a failure when a process misses a fixed heartbeat interval. Others use outsourced heartbeats to improve reliability with information from an external perspective. The phi-accrual failure detector computes a continuously scaled suspicion level (phi) of a process's crash from a sampled distribution of heartbeat arrival times, rather than a binary up/down verdict (see the sketch below). Other approaches gossip by maintaining a heartbeat counter and periodically sending it to a random neighbor. Yet another approach arranges active processes into groups, where a process failure is detected by the group's participants and propagated as a group failure.
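
Here is a simplified phi-accrual sketch of my own, assuming exponentially distributed heartbeat inter-arrival times rather than the normal distribution used by full implementations; phi grows the longer a heartbeat is overdue, and the process is suspected once phi crosses a threshold:

struct PhiAccrual {
    mean_interval_ms: f64, // running mean of heartbeat inter-arrival times
}

impl PhiAccrual {
    fn phi(&self, ms_since_last_heartbeat: f64) -> f64 {
        // Under an exponential model, P(no heartbeat yet) = e^(-t/mean),
        // so phi = -log10(P) = (t / mean) * log10(e).
        (ms_since_last_heartbeat / self.mean_interval_ms) * std::f64::consts::E.log10()
    }

    fn suspect(&self, ms_since_last_heartbeat: f64, threshold: f64) -> bool {
        self.phi(ms_since_last_heartbeat) > threshold
    }
}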

The chapter ten goes over leader election while maintaining liveness, stability and safety. It starts with the bully algorithm, which uses process rank (e.g. the highest IP address) to identify the new leader; however, it is susceptible to split brain and creates problems if the highest-ranked node is unstable. Next-in-line failover is an alternative where the leader provides a list of failover nodes, and the next highest-ranked node is selected on leader failure. The candidate/ordinary algorithm splits nodes into candidate and ordinary groups, where one of the candidate nodes becomes the leader (picking the highest-ranked live node). The invitation algorithm allows processes to invite other processes to join their groups, and smaller groups merge into bigger ones. The ring algorithm uses a ring topology where each process contacts its successor, passing along a set of live nodes until one of them responds; the highest-ranked node in the live set is chosen as the leader. Lastly, consensus algorithms paired with a failure detection algorithm can be used to elect a leader.
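
The core rule shared by the bully and ring algorithms fits in one line of Rust; a trivial sketch of my own, with ranks as plain integers (real systems often derive them from addresses):

// Among the processes currently believed to be alive, the highest rank wins.
fn elect_leader(live_ranks: &[u64]) -> Option<u64> {
    live_ranks.iter().copied().max()
}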

The chapter eleven examines replication and consistency properties such as availability, fault tolerance and redundancy. The chapter reviews the CAP theorem, where availability requires non-failing nodes to deliver results and linearizable consistency preserves the original operation order. In an asynchronous system, you cannot guarantee both consistency and availability in the presence of a network partition, so you either have to choose best-effort availability or best-effort consistency (or sacrifice latency). Also, the CAP theorem is concerned with network partitions, where a node may serve an incorrect response, not with node crashes, where a node does not respond at all. The chapter reviews the concepts of harvest and yield in the context of the CAP conjecture: harvest may return partial results, and yield compares the number of requests that succeeded against the number attempted; these properties focus on trade-offs as opposed to absolute guarantees. Distributed systems may abstract message passing and represent state as shared memory, where each unit of storage is called a register. Each operation is tracked with invocation and completion events, and an operation is considered failed if the process crashes before completing it; operations that overlap other operations are called concurrent. Registers can be categorized as safe (concurrent reads may return arbitrary values), regular (repeatable) and atomic (linearizable). Consistency models provide different semantics and guarantees from the perspective of state and operations in distributed/concurrent systems. For example, strict consistency provides complete replication transparency, as if you held a global lock, but it is impractical in real life. Linearizability guarantees visibility of writes to all readers exactly once without exposing partial state: if two operations overlap, any read that occurs after a write completes can observe its effect. It provides a total order of concurrently running operations, so every read of a shared value returns the latest value written to it; the linearization point is the instant at which the operation's effect atomically becomes visible. Linearizability is expensive to implement as it requires coordination and ordering, but you can use compare-and-swap: first prepare the result, then use CAS to swap pointers and publish the new state. Sequential consistency is a step below linearizability: operations execute in some sequential order in which each individual process's operations appear in their program order. In causal consistency, all processes see causally related operations in the same order; this can be implemented by attaching logical clocks to each message and processing an operation only after its preceding operations complete. The chapter defines the vector clock as a structure for establishing a partial order between events: processes maintain vectors of logical clocks, one clock per process, incremented every time a new event arrives. To resolve conflicts, you check for duplicate values with the same key, append a new version to the version vector and establish the causal relationships. The chapter then discusses session models that evaluate consistency from the perspective of a client and assume all client operations are sequential: the read-own-writes model; the monotonic reads model, which guarantees that you cannot read an old value once you have seen a new one; and the monotonic writes model, which guarantees that a write of v2 follows a write of v1.
The writes-follow-reads model ensures that writes are ordered after the writes whose effects were observed by previous read operations. Eventual consistency propagates updates asynchronously, and the latest value is resolved using last-write-wins or vector clocks. Eventually consistent systems provide parameters to trade off availability and consistency, such as replication factor (N), write consistency (W) and read consistency (R); you can guarantee reading the most recent value by choosing R + W > N (see the sketch below). You can optimize replication by grouping nodes into copy and witness subsets, where witness replicas may store updates if copy replicas are running behind. The chapter ends with a discussion of strong eventual consistency and CRDTs, specialized data structures that guarantee convergence regardless of operation order; however, the allowed operations have to be side-effect free, commutative and causally ordered.
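
A minimal sketch of the quorum condition mentioned above: with N replicas, write quorum W and read quorum R, every read set overlaps the latest write set whenever R + W > N, so the most recent value is always observed.

fn reads_see_latest_write(n: u32, r: u32, w: u32) -> bool {
    r + w > n // read and write quorums must intersect
}

fn main() {
    assert!(reads_see_latest_write(3, 2, 2));  // classic N=3, R=W=2 quorum
    assert!(!reads_see_latest_write(3, 1, 2)); // R=1 may hit a stale replica
}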

The chapter twelve discusses anti-entropy and the dissemination of updates via broadcast, peer-to-peer and cooperative broadcast. Broadcasting to all processes is expensive with a large number of nodes and unreliable from a single process. Entropy measures disorder in the system, and anti-entropy brings nodes back in sync when delivery fails. Read repair detects and eliminates inconsistencies and can be implemented as a blocking or asynchronous operation; blocking read repair ensures read monotonicity for quorum reads. Instead of issuing full read requests to each node, the coordinator can issue one full read, send digest requests to the other replicas, and repair them if inconsistencies are detected. Another alternative is hinted handoff, a write-side repair mechanism where the write coordinator stores a hint record and replays it to the target node when it comes back. Some databases use sloppy quorums along with hinted handoff, where write operations use additional healthy nodes that update the crashed node when it comes back. Merkle trees provide a compact hash representation of the local data: replicas compare root-level hashes to check for inconsistencies (see the sketch below). Bitmap version vectors can also be used to resolve data conflicts based on recency: a log of operations is kept on each node and compared with other nodes, and missing data is replicated to the target node. Gossip dissemination uses gossip protocols, probabilistic communication procedures modeled on how rumors or diseases spread. They use cooperative propagation to disseminate information, where an infective node spreads updates to susceptible nodes, which randomly update neighboring processes. The message redundancy metric captures the overhead of repeated delivery, and the amount of time to reach convergence is called latency. Push/lazy-push multicast trees make a trade-off between epidemic and tree-based broadcast primitives by creating a spanning-tree overlay of nodes to actively distribute messages with the least overhead: a node sends the full message to a small subset of peers and just the message ID to the rest, and a peer can request the payload if it doesn't have it.
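
Here is a minimal Merkle tree sketch of my own for anti-entropy: replicas exchange root hashes and only descend into subtrees whose hashes differ, so the cost of synchronization is proportional to the divergence, not the data size.

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn hash_pair(left: u64, right: u64) -> u64 {
    let mut h = DefaultHasher::new();
    (left, right).hash(&mut h);
    h.finish()
}

// Recursively combine leaf hashes into a single root hash.
fn merkle_root(leaf_hashes: &[u64]) -> u64 {
    match leaf_hashes {
        [] => 0,
        [single] => *single,
        _ => {
            let mid = leaf_hashes.len() / 2;
            hash_pair(merkle_root(&leaf_hashes[..mid]), merkle_root(&leaf_hashes[mid..]))
        }
    }
}

fn replicas_in_sync(local: &[u64], remote: &[u64]) -> bool {
    merkle_root(local) == merkle_root(remote) // equal roots => no repair needed
}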

The chapter thirteen reviews distributed transactions. To make operations appear atomic, you may use an atomic commitment algorithm that provides prepare, commit and rollback operations along with a transaction manager. For example, two-phase commit executes in two phases, prepare and commit/abort, where a coordinator collects votes and the remaining nodes, called cohorts, operate over disjoint datasets. In case of a cohort failure, the coordinator replicates decision values from its log; in case of coordinator failure, cohorts cannot learn the final decision (see the decision-rule sketch below). To make atomic commitment more robust against coordinator failure, three-phase commit adds an extra step: propose, prepare and commit/abort; the transaction is aborted on coordinator failure or operation timeout. Next, the chapter reviews the Calvin approach, which uses a deterministic transaction order to remove the need for coordination (as opposed to the non-deterministic transactions of most databases, which use two-phase or optimistic locking). Calvin uses a sequencer that determines the order of transactions and establishes a global transaction input sequence, and it may split time into epochs to minimize contention. The chapter also discusses data partitioning and consistent hashing, which maps hashes onto a ring where each node takes its own position and becomes responsible for a range of values. If serializability is not required, you may use snapshot isolation, which guarantees that all reads made within a transaction are consistent with a snapshot of the database, and only the first committer wins on a write-write conflict. Lastly, the chapter discusses mechanisms to avoid coordination while preserving data integrity constraints.
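
A minimal sketch of my own of the two-phase commit decision rule: the coordinator commits only if every cohort voted yes in the prepare phase; any no-vote or missing vote aborts.

enum Vote {
    Yes,
    No,
}

enum Decision {
    Commit,
    Abort,
}

fn decide(votes: &[Option<Vote>], expected_cohorts: usize) -> Decision {
    let all_yes = votes.len() == expected_cohorts
        && votes.iter().all(|v| matches!(v, Some(Vote::Yes)));
    if all_yes {
        Decision::Commit // phase two broadcasts Commit to all cohorts
    } else {
        Decision::Abort // a single No (or a timeout) aborts the transaction
    }
}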

The chapter fourteen discusses consensus, which focuses on agreement, validity and termination (reaching the decision). The chapter introduces broadcast communication; however, it may result in an inconsistent state if the coordinator crashes in the middle of a broadcast. Atomic broadcast guarantees reliable delivery (atomicity) and total order. For example, the virtual synchrony framework organizes processes into groups, and messages to all of a group's members are delivered in the same order. In ZooKeeper atomic broadcast, a process takes the role of leader or follower, and the protocol splits the timeline into epochs identified by monotonically increasing sequence numbers. Atomic broadcast is equivalent to consensus in asynchronous systems with crash failures. Paxos is a commonly used consensus algorithm that defines three roles: proposers, acceptors and learners. It is split into two phases: voting (proposers compete for leadership) and replication (the proposer distributes values to acceptors). When an acceptor receives a prepare request, it can accept the proposal, respond with a previously accepted message, or notify the proposer if its local sequence number is higher. During the replication phase, the proposer starts replication by sending an Accept message to all acceptors. Paxos uses quorums so that some participants can fail and the rest can still proceed, as long as the minimum number of votes required for the operation is available: liveness is guaranteed in the presence of f failed processes, so given 2f + 1 processes, f can fail and the remaining f + 1 can proceed. The Multi-Paxos algorithm introduces the role of a leader, a distinguished proposer, to improve efficiency; the leader periodically contacts the participants to notify them it is still alive, with a lease timeout so that participants won't elect another leader until the lease expires. Fast Paxos reduces the number of messages by letting any proposer contact acceptors directly rather than going through the leader, at the cost of requiring a total of 3f + 1 processes. Egalitarian Paxos partitions the system into smaller segments and uses a leader for the commit of a specific command. Flexible Paxos only requires that the quorums used in the propose and accept phases intersect: given N participants, Q1 nodes for the propose phase to succeed and Q2 nodes for the accept phase to succeed, it suffices that Q1 + Q2 > N, so Q2 can contain N/2 acceptors as long as Q1 = N - Q2 + 1. Next, the chapter discusses the Raft algorithm, which makes the concept of a leader a first-class citizen: a single leader coordinates state-machine manipulation and replication, similar to atomic broadcast and to Multi-Paxos replicating multiple values instead of just one (the leader makes atomic decisions and establishes message order). Each participant in Raft takes the role of candidate, leader (for a term) or follower (similar to acceptor/learner). Raft divides time into terms/epochs to guarantee a global partial order without relying on clock synchronization; terms are monotonically increasing, and each command is uniquely identified by its term number. During leader election, candidates send a RequestVote message to other processes, including the candidate's term and the ID of the last log entry it observed; after collecting a majority of votes, the candidate becomes the leader for the term (see the sketch below). Raft uses periodic heartbeats to ensure the liveness of participants, and a new election may start after an election timeout.
The leader repeatedly appends new values to the replicated log by sending AppendEntries messages that include the leader's term and the index and term of the preceding log entry. A follower grants its vote only to a candidate whose term is at least as high as its own and whose log is at least as up-to-date. In case of a split vote, Raft uses randomized election timers to reduce the probability of repeated elections ending in another split vote. The leader sends heartbeats to the followers to detect failures, and a new election can be initiated if the leader is down. The leader never removes or reorders its log contents; it only appends new messages.
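
A minimal sketch of Raft's election rule as summarized above: a candidate becomes leader for the term once it has collected votes from a strict majority of the cluster (its own vote included).

fn wins_election(votes_received: usize, cluster_size: usize) -> bool {
    votes_received > cluster_size / 2 // strict majority required
}

fn main() {
    assert!(wins_election(3, 5));  // 3 of 5 is a majority
    assert!(!wins_election(2, 4)); // a tie (split vote) forces a new election
}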

The chapter then reviews Byzantine consensus, where distributed systems are deployed in adversarial environments prone to Byzantine failures such as ill intent, bugs, misconfiguration and data corruption. Most Byzantine consensus algorithms require N^2 messages to complete an algorithm step, where N is the size of the quorum. The chapter discusses Practical Byzantine Fault Tolerance (PBFT), which assumes independent node failures, i.e. the entire system cannot be taken over at once. All communication is encrypted, and replicas know one another's public keys to verify identities. PBFT guarantees both safety and liveness as long as no more than (n - 1) / 3 replicas are faulty; for a system to sustain f compromised nodes, it requires at least n = 3f + 1 nodes. To distinguish between cluster configurations, PBFT uses views: in each view, one replica is the primary and the rest are backups. All nodes are numbered consecutively, and the index of the primary is v mod N, where v is the view ID and N is the number of nodes; the view changes when the primary fails. Clients execute their operations against the primary, which broadcasts each request to the backups; they execute the request and send responses back to the client. The client waits for f + 1 replicas to respond with the same result for an operation to succeed. Replicas save accepted messages in a stable log, kept until the messages have been executed by at least f + 1 nodes; this log can be used for recovery in case of a network partition, but it is verified to close off that attack vector. After every N requests, the primary makes a stable checkpoint, where it broadcasts the latest sequence number and waits for 2f + 1 replicas to respond, which constitutes a proof for the checkpoint (see the arithmetic sketch below).
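
The PBFT sizing arithmetic quoted above fits in a few lines: with n replicas, at most f = (n - 1) / 3 may be faulty, a client needs f + 1 matching replies, and a stable checkpoint needs 2f + 1 acknowledgments.

fn pbft_thresholds(n: u32) -> (u32, u32, u32) {
    let f = (n - 1) / 3; // maximum tolerated faulty replicas
    (f, f + 1, 2 * f + 1)
}

fn main() {
    // A 4-node cluster tolerates 1 fault, needs 2 matching client replies
    // and 3 checkpoint acknowledgments.
    assert_eq!(pbft_thresholds(4), (1, 2, 3));
}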

March 6, 2018

Tips from the second edition of “Release It!”

Filed under: Design,Methodologies,Technology — admin @ 4:19 pm

The first edition of “Release It!” has been one of the most influential books that I have read; it introduced a number of methods for writing fault-tolerant systems, such as the circuit-breaker and bulkhead patterns, so I was excited to read the second edition when it came out. Here are a few tips from the book:

In the first chapter, the author defines stability in terms of robustness, i.e., “A robust system keeps processing transactions, even when transient impulses, persistent stresses, or component failures disrupt normal processing.” He recommends focusing on longevity bugs such as resource leaks, e.g. setting timeouts when invoking network or database operations and detecting dead connections before reading/writing. The author cautions against tightly coupled systems that easily propagate failures to other parts of the system; similarly, a high number of integration points increases the probability of a failure in one of those dependencies. He suggests scrutinizing blocking calls that can lead to deadlocks when using multiple threads, and resource pools that get exhausted by blocking operations. Another cause of instability is a chain reaction, where the failure of one server increases load on the remaining servers; this can be remedied using the bulkhead or circuit-breaker patterns. High memory usage on the server can also constrain resources, and the author recommends using external caching systems such as Redis, memcached, etc. To monitor the health of the system, he suggests running a mock transaction to ensure it's working as expected and keeping metrics on errors, especially login errors and high-latency warnings. One common self-inflicted failure is the self-denial attack, e.g. a marketing campaign that overwhelms your own service; it can be mitigated by using a shared-nothing architecture or reducing fan-in on shared resources. Servers can also be subjected to the dogpile effect, where load spikes after an upgrade, cron job or config change as many clients act at once; this can be mitigated by using random clock skew, adding random jitter and using exponential backoff. Finally, the author recommends monitoring slow responses, removing unbounded result sets and failing fast.

Here is a summary of stability patterns that the author recommends:

  • Apply timeouts at integration points and use delayed retries when failures occur.
  • Apply circuit breakers (along with timeouts) to prevent cascading failures; a minimal sketch follows this list.
  • Apply the bulkhead pattern to partition the system in the event of a chain-reaction failure.
  • Maintain steady state: anything that accumulates (data, log files, caches) must also be purged or rotated.
  • Fail fast, restart fast and reintegrate.
  • Let it crash with limited granularity, e.g. at the boundary of an actor.
  • Use supervision for monitoring, restarts, etc.
  • Shed load when saturated.
  • Apply back-pressure: queues must be finite for response times to be finite.
  • Create a governor to slow the rate of automated actions so that humans can review them.
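
Here is a minimal circuit-breaker sketch in Rust (my own illustration of the pattern, not code from the book; names and thresholds are arbitrary): after too many consecutive failures the breaker opens and fails fast, and after a cool-down it lets a probe request through.

use std::time::{Duration, Instant};

enum State {
    Closed { consecutive_failures: u32 },
    Open { since: Instant },
    HalfOpen,
}

struct CircuitBreaker {
    state: State,
    failure_threshold: u32,
    cool_down: Duration,
}

impl CircuitBreaker {
    fn allow_request(&mut self) -> bool {
        match self.state {
            State::Closed { .. } | State::HalfOpen => true,
            State::Open { since } if since.elapsed() >= self.cool_down => {
                self.state = State::HalfOpen; // probe the dependency once
                true
            }
            State::Open { .. } => false, // fail fast; skip the remote call
        }
    }

    fn record(&mut self, success: bool) {
        if success {
            self.state = State::Closed { consecutive_failures: 0 };
            return;
        }
        let failures = match self.state {
            State::Closed { consecutive_failures } => consecutive_failures + 1,
            _ => self.failure_threshold, // a failed probe re-opens the breaker
        };
        self.state = if failures >= self.failure_threshold {
            State::Open { since: Instant::now() }
        } else {
            State::Closed { consecutive_failures: failures }
        };
    }
}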

The next few chapters focus on networks, machines and processes for building and deploying code. The author offers several mechanisms for scaling, such as load balancing with DNS, using a service registry for upgrades and fail-over, configuration, transparency, collecting logs and metrics, etc. He recommends shedding load when under high pressure, e.g. returning HTTP 503 to notify the load balancer. For example, a bounded queue length can be calculated as (max-wait-time / mean-processing-time + 1) * processing-threads * 1.5 (see the worked example below). You can also refuse connections once the listen queue is full and return a 503 error to prevent clients from immediately reconnecting. The control-plane chapter recommends tools for administration, including a postmortem template (what happened, an apology, a commitment to improvement) that emphasizes system failures as opposed to human errors. The author recommends adding indicators such as traffic indicators, business transactions, users, resource pool health, database connection health, data consumption, integration point health, cache health, etc.
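
A quick worked example of that queue-length bound, with illustrative numbers of my own: a 2-second max wait, 250 ms mean processing time and 10 worker threads bound the queue at (2.0 / 0.25 + 1) * 10 * 1.5 = 135 entries.

fn max_queue_length(max_wait_secs: f64, mean_processing_secs: f64, threads: f64) -> f64 {
    (max_wait_secs / mean_processing_secs + 1.0) * threads * 1.5
}

fn main() {
    assert_eq!(max_queue_length(2.0, 0.25, 10.0), 135.0);
}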

The security chapter offers standard best practices from OWASP, such as using parameterized queries to protect against SQL injection, and using high-entropy random session IDs stored in cookies to protect against session hijacking/fixation. To protect against XSS (where user input is rendered in HTML without escaping), it recommends filtering input and escaping it when rendering. The author recommends using a random nonce and a strict SameSite cookie policy to protect against CSRF, and following the principle of least privilege, access control, etc. Admin tools can offer ways to reset circuit breakers, adjust connection pool sizes, disable specific outbound integrations, reload configuration, stop accepting load and toggle feature flags.

For ease of deployment, the author recommends automation, immutable infrastructure, continuous deployment, and rolling changes incrementally.

The author offers several recommendations on process and organization, such as the OODA loop for fast learning, functional/autonomous teams, evolutionary architecture, asynchrony patterns, loose clustering and creating options for the future.

Lastly, the author presents chaos engineering as a way to test the resilience of your system, using the Simian Army or writing your own chaos monkey. In all, the new edition offers a few additional chapters on scaling, deployment, security and chaos engineering, along with more war stories from the author's consulting work.

January 25, 2009

Review of Clean Code

Filed under: Design — admin @ 8:54 pm

I just finished the Clean Code: A Handbook of Agile Software Craftsmanship book, a compilation of patterns for writing clean, maintainable code written by Uncle Bob and other folks from his consulting company Object Mentor, including Tim Ottinger and Michael Feathers (author of Working Effectively with Legacy Code). This book is similar to Implementation Patterns by Kent Beck, which I recently read, covering similar coding practices. This book is a lot thicker with seventeen chapters, though plenty of its pages are filled with tedious listings of Java code.

The first chapter shares thoughts on good code from a number of innovators and authors such as Kent Beck, Bjarne Stroustrup, Grady Booch and Dave Thomas. They mention various attributes of good code, such as being easy to read, efficient, DRY, focused, literate and minimal with proper error handling, and warn that bad code leads to messes and a broken-windows mentality.

The chapter 2 offers the golden advice of using intention-revealing names to improve readability and following the principle of least surprise. The chapter denounces Hungarian notation and member prefixes such as _ or m_.

The chapter 3 covers functions and methods and advises writing small, cohesive functions. It suggests keeping each function at one level of abstraction to facilitate reading code from top to bottom. It also recommends polymorphic methods as opposed to switch statements, if-else chains or functions that take flag/boolean arguments (see the sketch below). The chapter discourages functions with side effects or ones that create temporal coupling. It also restates the age-old advice of command-query separation, though it skips its roots in Bertrand Meyer's design by contract. Finally, the chapter urges the use of exceptions as opposed to error codes.
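
As an aside, here is what that advice looks like in code; a minimal sketch of my own in Rust (the book's examples are in Java), replacing a type switch with polymorphic dispatch:

trait Shape {
    fn area(&self) -> f64;
}

struct Circle { radius: f64 }
struct Square { side: f64 }

impl Shape for Circle {
    fn area(&self) -> f64 { std::f64::consts::PI * self.radius * self.radius }
}

impl Shape for Square {
    fn area(&self) -> f64 { self.side * self.side }
}

// Callers never match on a type tag; adding a new shape requires no edits here.
fn total_area(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}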

The chapter 4 describes writing good comments that focus on intent and advises against redundant or misleading comments.

The chapter 5 illustrates good formatting, such as indentation and horizontal/vertical spacing.

The chapter 6 describes the use of objects and data structures. It encompasses advice on polymorphism, the Law of Demeter, encapsulation, DTOs/value objects, etc.

The chapter 7 discusses error handling. Again, it encourages exceptions rather than return codes or error codes. It advises against checked exceptions, as they violate the open/closed principle, and discourages returning or passing null, recommending exceptions or empty collections instead.

The chapter 8 explains defining boundaries between components.

The chapter 9 offers advice on unit testing such as TDD, keeping tests clean, one assert per test, a single concept per test, etc. Such advice should be taken with caution, though, as one assert per test may not capture a unit of testing properly.

The chapter 10 is similar to chapter 6 and contains recommendations on writing classes: encapsulation, keeping classes small, the single responsibility principle, cohesion, the dependency inversion principle, etc.

The chapter 11 encompasses advice on building systems and managing complexity. It suggests dependency injection and the use of factories, and advocates AOP for managing cross-cutting concerns.

The chapter 12 talks about emergent design, a concept I first heard from Andy Hunt. It describes advice from Kent Beck: run all tests, refactor, avoid duplication, express the programmer's intent, and minimize the number of classes and methods. The chapter advocates template methods to remove duplication and command/visitor patterns to express the design to other developers.

The chapter 13 discusses concurrency. It suggests using the single-responsibility principle to keep concurrent code separate from other code, limiting the scope of shared data, keeping threads independent, and using immutable objects or copies of data. It recommends keeping critical sections small. The chapter also offers advice on testing threaded code and suggests making threaded code pluggable.

The chapter 14 walks through an exercise in incrementally improving code.

The chapter 15 describes the JUnit framework and walks the reader through improving its tests.

The chapter 16 walks the reader through another refactoring exercise.

The chapter 17 covers a number of smells, heuristics and anti-patterns. It warns against poorly written comments and against builds or tests that require more than one step. The chapter prohibits passing too many arguments to functions and the use of output or flag arguments. It encourages testing boundary conditions and respecting overridden safeties. It also proscribes writing code at the wrong level of abstraction, i.e. exposing low-level logic through an interface, as well as exposing knowledge of derivatives to base classes and putting too much information in an interface. Other smells include artificial coupling (sharing constants), feature envy and flag arguments. The chapter repeats Kent Beck's advice about using explanatory (local/temporary) variables, and recommends using constants, preferring structure over convention, encapsulating conditionals, avoiding temporal couplings, keeping functions at the same level of abstraction and keeping configurable data at high levels. It also contains Java-specific advice such as avoiding wildcard imports and avoiding interfaces used only to inherit constants. Finally, the chapter cautions against insufficient tests and recommends using a coverage tool and testing boundary conditions.

In conclusion, this book contains advice from a wide range of authors on writing good, maintainable code. As a coder with over 15 years of experience, I can attest that writing good code requires a lot of micro-decisions and close attention to detail, and you generally have to continually improve and refactor your code to keep a good design. Sometimes it's hard to maintain a good design when delivering features under tight deadlines, so you may have to take shortcuts. Kent Beck often talks about courage in XP, and programmers need courage to fight for writing maintainable code and to take the time to refactor existing code. Finally, I would caution against applying these practices arbitrarily, as another favorite rule of mine is that every practice or pattern depends on the situation. Unfortunately, there are a lot of people in the software industry, including Uncle Bob, who apply these rules in draconian fashion, such as always stressing 100% test coverage or interfaces separate from implementations (DIP). Such advice may bring more money from consulting work or publications, but is disingenuous for practical use in real projects.
