architecture « Shahzad Bhatti

March 24, 2025

K8 Highlander: Managing Stateful and Singleton Processes in Kubernetes

Filed under: Technology — Tags: architecture, distributed — admin @ 2:22 pm

Introduction

Kubernetes has revolutionized how we deploy, scale, and manage applications in the cloud. I’ve been using Kubernetes for many years to build scalable, resilient, and maintainable services. However, Kubernetes was primarily designed for stateless applications – services that can scale horizontally. While such shared-nothing architecture is must-have for most modern microservices but it presents challenges for use-cases such as:

Stateful/Singleton Processes: Applications that must run as a single instance across a cluster to avoid conflicts, race conditions, or data corruption. Examples include:
- Legacy applications not designed for distributed operation
- Batch processors that need exclusive access to resources
- Job schedulers that must ensure jobs run exactly once
- Applications with sequential ID generators
Active/Passive Disaster Recovery: High-availability setups where you need a primary instance running with hot standbys ready to take over instantly if the primary fails.

Traditional Kubernetes primitives like StatefulSets provide stable network identities and ordered deployment but don’t solve the “exactly-one-active” problem. DaemonSets ensure one pod per node, but don’t address the need for a single instance across the entire cluster. This gap led me to develop K8 Highlander – a solution that ensures “there can be only one” active instance of your workloads while maintaining high availability through automatic failover.

Architecture

K8 Highlander implements distributed leader election to ensure only one controller instance is active at any time, with others ready to take over if the leader fails. The name “Highlander” refers to the tagline from the 1980s movie & show: “There can be only one.”

Core Components

The system consists of several key components:

Leader Election: Uses distributed locking (via Redis or a database) to ensure only one controller is active at a time. The leader periodically renews its lock, and if it fails, another controller can acquire the lock and take over.
Workload Manager: Manages different types of workloads in Kubernetes, ensuring they’re running and healthy when this controller is the leader.
Monitoring Server: Provides real-time metrics and status information about the controller and its workloads.
HTTP Server: Serves a dashboard and API endpoints for monitoring and management.

How Leader Election Works

The leader election process follows these steps:

Each controller instance attempts to acquire a distributed lock with a TTL (Time-To-Live)
Only one instance succeeds and becomes the leader
The leader periodically renews its lock to maintain leadership
If the leader fails to renew (due to crash, network issues, etc.), the lock expires
Another instance acquires the lock and becomes the new leader
The new leader starts managing workloads

This approach ensures high availability while preventing split-brain scenarios where multiple instances might be active simultaneously.

Workload Types

K8 Highlander supports four types of workloads:

Process Workloads: Single-instance processes running in pods
CronJob Workloads: Scheduled tasks that run at specific intervals
Service Workloads: Continuously running services using Deployments
Persistent Workloads: Stateful applications with persistent storage using StatefulSets

Each workload type is managed to ensure exactly one instance is running across the cluster, with automatic recreation if terminated unexpectedly.

Deploying and Using K8 Highlander

Let me walk through how to deploy and use K8 Highlander for your singleton workloads.

Prerequisites

Kubernetes cluster (v1.16+)
Redis server or PostgreSQL database for leader state storage
kubectl configured to access your cluster

Installation Using Docker

The simplest way to install K8 Highlander is using the pre-built Docker image:

# Create a namespace for k8-highlander
kubectl create namespace k8-highlander

# Create a ConfigMap with your configuration
kubectl create configmap k8-highlander-config \
  --from-file=config.yaml=./config/config.yaml \
  -n k8-highlander

# Deploy k8-highlander
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: k8-highlander
  namespace: k8-highlander
spec:
  replicas: 2  # Run multiple instances for HA
  selector:
    matchLabels:
      app: k8-highlander
  template:
    metadata:
      labels:
        app: k8-highlander
    spec:
      containers:
      - name: controller
        image: plexobject/k8-highlander:latest
        env:
        - name: HIGHLANDER_REDIS_ADDR
          value: "redis:6379"
        - name: HIGHLANDER_TENANT
          value: "default"
        - name: HIGHLANDER_NAMESPACE
          value: "default"
        - name: CONFIG_PATH
          value: "/etc/k8-highlander/config.yaml"
        ports:
        - containerPort: 8080
          name: http
        volumeMounts:
        - name: config-volume
          mountPath: /etc/k8-highlander
      volumes:
      - name: config-volume
        configMap:
          name: k8-highlander-config
---
apiVersion: v1
kind: Service
metadata:
  name: k8-highlander
  namespace: k8-highlander
spec:
  selector:
    app: k8-highlander
  ports:
  - port: 8080
    targetPort: 8080
EOF

HTML
​x
 
# Create a namespace for k8-highlanderkubectl create namespace k8-highlander​# Create a ConfigMap with your configurationkubectl create configmap k8-highlander-config \  --from-file=config.yaml=./config/config.yaml \  -n k8-highlander​# Deploy k8-highlanderkubectl apply -f - <<EOFapiVersion: apps/v1kind: Deploymentmetadata:  name: k8-highlander  namespace: k8-highlanderspec:  replicas: 2  # Run multiple instances for HA  selector:    matchLabels:      app: k8-highlander  template:    metadata:      labels:        app: k8-highlander    spec:      containers:      - name: controller        image: plexobject/k8-highlander:latest        env:        - name: HIGHLANDER_REDIS_ADDR          value: "redis:6379"        - name: HIGHLANDER_TENANT          value: "default"        - name: HIGHLANDER_NAMESPACE          value: "default"        - name: CONFIG_PATH          value: "/etc/k8-highlander/config.yaml"        ports:        - containerPort: 8080          name: http        volumeMounts:        - name: config-volume          mountPath: /etc/k8-highlander      volumes:      - name: config-volume        configMap:          name: k8-highlander-config---apiVersion: v1kind: Servicemetadata:  name: k8-highlander  namespace: k8-highlanderspec:  selector:    app: k8-highlander  ports:  - port: 8080    targetPort: 8080EOF​

This deploys K8 Highlander with your configuration, ensuring high availability with multiple replicas while maintaining the singleton behavior for your workloads.

Using K8 Highlander Locally for Testing

You can also run K8 Highlander locally for testing:

docker run -d --name k8-highlander \
  -v $(pwd)/config.yaml:/etc/k8-highlander/config.yaml \
  -e HIGHLANDER_REDIS_ADDR=redis-host:6379 \
  -p 8080:8080 \
  plexobject/k8-highlander:latest

HTML
 
docker run -d --name k8-highlander \  -v $(pwd)/config.yaml:/etc/k8-highlander/config.yaml \  -e HIGHLANDER_REDIS_ADDR=redis-host:6379 \  -p 8080:8080 \  plexobject/k8-highlander:latest​

Basic Configuration

K8 Highlander uses a YAML configuration file to define its behavior and workloads. Here’s a simple example:

id: "controller-1"
tenant: "default"
port: 8080
namespace: "default"

# Storage configuration
storageType: "redis"
redis:
  addr: "redis:6379"
  password: ""
  db: 0

# Cluster configuration
cluster:
  name: "primary"
  kubeconfig: ""  # Uses in-cluster config if empty

# Workloads configuration
workloads:
  # Process workload example
  processes:
    - name: "data-processor"
      image: "mycompany/data-processor:latest"
      script:
        commands:
          - "echo 'Starting data processor'"
          - "/app/process-data.sh"
        shell: "/bin/sh"
      env:
        DB_HOST: "postgres.example.com"
      resources:
        cpuRequest: "200m"
        memoryRequest: "256Mi"
      restartPolicy: "OnFailure"

HTML
 
id: "controller-1"tenant: "default"port: 8080namespace: "default"​# Storage configurationstorageType: "redis"redis:  addr: "redis:6379"  password: ""  db: 0​# Cluster configurationcluster:  name: "primary"  kubeconfig: ""  # Uses in-cluster config if empty​# Workloads configurationworkloads:  # Process workload example  processes:    - name: "data-processor"      image: "mycompany/data-processor:latest"      script:        commands:          - "echo 'Starting data processor'"          - "/app/process-data.sh"        shell: "/bin/sh"      env:        DB_HOST: "postgres.example.com"      resources:        cpuRequest: "200m"        memoryRequest: "256Mi"      restartPolicy: "OnFailure"​

Example Workload Configurations

Let’s look at examples for each workload type:

Process Workload

Use this for single-instance processes that need to run continuously:

processes:
  - name: "sequential-id-generator"
    image: "mycompany/id-generator:latest"
    script:
      commands:
        - "echo 'Starting ID generator'"
        - "/app/run-id-generator.sh"
      shell: "/bin/sh"
    env:
      DB_HOST: "postgres.example.com"
    resources:
      cpuRequest: "200m"
      memoryRequest: "256Mi"
    restartPolicy: "OnFailure"

HTML
 
processes:  - name: "sequential-id-generator"    image: "mycompany/id-generator:latest"    script:      commands:        - "echo 'Starting ID generator'"        - "/app/run-id-generator.sh"      shell: "/bin/sh"    env:      DB_HOST: "postgres.example.com"    resources:      cpuRequest: "200m"      memoryRequest: "256Mi"    restartPolicy: "OnFailure"​

CronJob Workload

For scheduled tasks that should run exactly once at specified intervals:

cronJobs:
  - name: "daily-report"
    schedule: "0 0 * * *"  # Daily at midnight
    image: "mycompany/report-generator:latest"
    script:
      commands:
        - "echo 'Generating daily report'"
        - "/app/generate-report.sh"
      shell: "/bin/sh"
    env:
      REPORT_TYPE: "daily"
    restartPolicy: "OnFailure"

HTML
 
cronJobs:  - name: "daily-report"    schedule: "0 0 * * *"  # Daily at midnight    image: "mycompany/report-generator:latest"    script:      commands:        - "echo 'Generating daily report'"        - "/app/generate-report.sh"      shell: "/bin/sh"    env:      REPORT_TYPE: "daily"    restartPolicy: "OnFailure"​

Service Workload

For continuously running services that need to be singleton but highly available:

services:
  - name: "admin-api"
    image: "mycompany/admin-api:latest"
    replicas: 1
    ports:
      - name: "http"
        containerPort: 8080
        servicePort: 80
    env:
      LOG_LEVEL: "info"
    resources:
      cpuRequest: "100m"
      memoryRequest: "128Mi"

HTML
 
services:  - name: "admin-api"    image: "mycompany/admin-api:latest"    replicas: 1    ports:      - name: "http"        containerPort: 8080        servicePort: 80    env:      LOG_LEVEL: "info"    resources:      cpuRequest: "100m"      memoryRequest: "128Mi"​

Persistent Workload

For stateful applications with persistent storage:

persistentSets:
  - name: "message-queue"
    image: "ibmcom/mqadvanced-server:latest"
    replicas: 1
    ports:
      - name: "mq"
        containerPort: 1414
        servicePort: 1414
    persistentVolumes:
      - name: "data"
        mountPath: "/var/mqm"
        size: "10Gi"
    env:
      LICENSE: "accept"
      MQ_QMGR_NAME: "QM1"

HTML
 
persistentSets:  - name: "message-queue"    image: "ibmcom/mqadvanced-server:latest"    replicas: 1    ports:      - name: "mq"        containerPort: 1414        servicePort: 1414    persistentVolumes:      - name: "data"        mountPath: "/var/mqm"        size: "10Gi"    env:      LICENSE: "accept"      MQ_QMGR_NAME: "QM1"​

High Availability Setup

For production environments, run multiple instances of K8 Highlander to ensure high availability:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: k8-highlander
  namespace: k8-highlander
spec:
  replicas: 3  # Run multiple instances for HA
  selector:
    matchLabels:
      app: k8-highlander
  template:
    metadata:
      labels:
        app: k8-highlander
    spec:
      containers:
      - name: controller
        image: plexobject/k8-highlander:latest
        env:
        - name: HIGHLANDER_REDIS_ADDR
          value: "redis:6379"
        - name: HIGHLANDER_TENANT
          value: "production"

HTML
 
apiVersion: apps/v1kind: Deploymentmetadata:  name: k8-highlander  namespace: k8-highlanderspec:  replicas: 3  # Run multiple instances for HA  selector:    matchLabels:      app: k8-highlander  template:    metadata:      labels:        app: k8-highlander    spec:      containers:      - name: controller        image: plexobject/k8-highlander:latest        env:        - name: HIGHLANDER_REDIS_ADDR          value: "redis:6379"        - name: HIGHLANDER_TENANT          value: "production"​

API and Monitoring Capabilities

K8 Highlander provides comprehensive monitoring, metrics, and API endpoints for observability and management.

Dashboard

Access the built-in dashboard at http://<controller-address>:8080/ to see the status of the controller and its workloads in real-time.

The dashboard shows:

Current leader status
Workload health and status
Redis/database connectivity
Failover history
Resource usage

API Endpoints

K8 Highlander exposes several HTTP endpoints for monitoring and integration:

GET /status: Returns the current status of the controller
GET /api/workloads: Lists all managed workloads and their status
GET /api/workloads/{name}: Gets the status of a specific workload
GET /healthz: Liveness probe endpoint
GET /readyz: Readiness probe endpoint

Example API response:

{
  "status": "success",
  "data": {
    "isLeader": true,
    "leaderSince": "2023-05-01T12:34:56Z",
    "lastLeaderTransition": "2023-05-01T12:34:56Z",
    "uptime": "1h2m3s",
    "leaderID": "controller-1",
    "workloadStatus": {
      "processes": {
        "data-processor": {
          "active": true,
          "namespace": "default"
        }
      }
    }
  }
}

HTML
 
{  "status": "success",  "data": {    "isLeader": true,    "leaderSince": "2023-05-01T12:34:56Z",    "lastLeaderTransition": "2023-05-01T12:34:56Z",    "uptime": "1h2m3s",    "leaderID": "controller-1",    "workloadStatus": {      "processes": {        "data-processor": {          "active": true,          "namespace": "default"        }      }    }  }}​

Prometheus Metrics

K8 Highlander exposes Prometheus metrics at /metrics for monitoring and alerting:

# HELP k8_highlander_is_leader Indicates if this instance is currently the leader (1) or not (0)
# TYPE k8_highlander_is_leader gauge
k8_highlander_is_leader 1
# HELP k8_highlander_leadership_transitions_total Total number of leadership transitions
# TYPE k8_highlander_leadership_transitions_total counter
k8_highlander_leadership_transitions_total 1
# HELP k8_highlander_workload_status Status of managed workloads (1=active, 0=inactive)
# TYPE k8_highlander_workload_status gauge
k8_highlander_workload_status{name="data-processor",namespace="default",type="process"} 1

HTML
 
# HELP k8_highlander_is_leader Indicates if this instance is currently the leader (1) or not (0)# TYPE k8_highlander_is_leader gaugek8_highlander_is_leader 1# HELP k8_highlander_leadership_transitions_total Total number of leadership transitions# TYPE k8_highlander_leadership_transitions_total counterk8_highlander_leadership_transitions_total 1# HELP k8_highlander_workload_status Status of managed workloads (1=active, 0=inactive)# TYPE k8_highlander_workload_status gaugek8_highlander_workload_status{name="data-processor",namespace="default",type="process"} 1​

Key metrics include:

Leadership status and transitions
Workload health and status
Redis/database operations
Failover events and duration
System resource usage

Grafana Dashboard

A Grafana dashboard is available for visualizing K8 Highlander metrics. Import the dashboard from the dashboards directory in the repository.

Advanced Features

Multi-Tenant Support

K8 Highlander supports multi-tenant deployments, where different teams or environments can have their own isolated leader election and workload management:

# Tenant A configuration
id: "controller-1"
tenant: "tenant-a"
namespace: "tenant-a"

HTML
 
# Tenant A configurationid: "controller-1"tenant: "tenant-a"namespace: "tenant-a"​

# Tenant B configuration
id: "controller-2"
tenant: "tenant-b"
namespace: "tenant-b"

HTML
 
# Tenant B configurationid: "controller-2"tenant: "tenant-b"namespace: "tenant-b"​

Each tenant has its own leader election process, so one controller can be the leader for tenant A while another is the leader for tenant B.

Multi-Cluster Deployment

For disaster recovery scenarios, K8 Highlander can be deployed across multiple Kubernetes clusters with a shared Redis or database:

# Primary cluster
id: "controller-1"
tenant: "production"
cluster:
  name: "primary"
  kubeconfig: "/path/to/primary-kubeconfig"

HTML
 
# Primary clusterid: "controller-1"tenant: "production"cluster:  name: "primary"  kubeconfig: "/path/to/primary-kubeconfig"​

# Secondary cluster
id: "controller-2"
tenant: "production"
cluster:
  name: "secondary"
  kubeconfig: "/path/to/secondary-kubeconfig"

HTML
 
# Secondary clusterid: "controller-2"tenant: "production"cluster:  name: "secondary"  kubeconfig: "/path/to/secondary-kubeconfig"​

If the primary cluster fails, a controller in the secondary cluster can become the leader and take over workload management.

Summary

K8 Highlander fills a critical gap in Kubernetes’ capabilities by providing reliable singleton workload management with automatic failover. It’s ideal for:

Legacy applications that don’t support horizontal scaling
Processes that need exclusive access to resources
Scheduled jobs that should run exactly once
Active/passive high-availability setups

The solution ensures high availability without sacrificing the “exactly one active” constraint that many applications require. By handling the complexity of leader election and workload management, K8 Highlander allows you to run stateful workloads in Kubernetes with confidence.

Where to Go from Here

Check out the GitHub repository for the latest code and documentation
Read the API Reference for detailed endpoint information
Explore the Configuration Reference for all available options
Review the Metrics Reference for monitoring capabilities
See the Deployment Strategies for production deployment patterns

K8 Highlander is an open-source project with MIT license, and contributions are welcome! Feel free to submit issues, feature requests, or pull requests to help improve the project.

Comments (0)

April 19, 2024

Effective Load Shedding and Throttling Strategies for Managing Traffic Spikes and DDoS Attacks

Filed under: Design — Tags: architecture — admin @ 9:48 pm

Online services experiencing rapid growth often encounter abrupt surges in traffic and may become targets of Distributed Denial of Service (DDoS) attacks orchestrated by malicious actors or inadvertently due to self-induced bugs. Mitigating these challenges to ensure high availability requires meticulous architectural practices, including implementing caching mechanisms, leveraging Content Delivery Networks (CDNs), Web Application Firewalls (WAFs), deploying queuing systems, employing load balancing strategies, implementing robust monitoring and alerting systems, and incorporating autoscaling capabilities. However, in this context, we will focus specifically on techniques related to load shedding and throttling to manage various traffic shapes effectively.

1. Traffic Patterns and Shapes

Traffic patterns refer to the manner in which user requests or tasks interact with your online service throughout a given period. These requests or tasks can vary in characteristics, including the rate of requests (TPS), concurrency, and the patterns of request flow, such as bursts of traffic. These patterns must be analyzed for scaling your service effectively and providing high availability.

Here’s a breakdown of some common traffic shapes:

Normal Traffic: defines baseline level of traffic pattern that a service receives most of the time based on regular user activity.
Peak Traffic: defines recurring period of high traffic based on daily or weekly user activity patterns. Auto-scaling rules can be set up to automatically allocate pre-provisioned additional resources in response to anticipated peaks in traffic.
Off-Peak Traffic: refers to periods of low or minimal traffic, such as during late-night hours or weekends. Auto-scaling rules can be set to scale down or consolidating resources during periods of low demand help minimize operational costs while maintaining adequate performance levels.
Burst Traffic: defines sudden, short-lived spikes in traffic that might be caused by viral contents or promotional campaigns. Auto-scaling rules can be configured to allocate extra resources in reaction to burst traffic. However, scaling resources might not happen swiftly enough to match the duration of the burst traffic. Therefore, it’s typically recommended to maintain surplus capacity to effectively handle burst traffic situations.
Seasonal Traffic: defines traffic patterns based on specific seasons, holidays or events such as Black Friday or back-to-school periods. This requires strategies similar to peak traffic for allocating pre-provisioned additional resources.
Steady Growth: defines gradual and consistent increase in traffic over time based on organic growth or marketing campaigns. This requires proactive monitoring to ensure resources keep pace with demand.

Classifying Requests

Incoming requests or tasks can be identified and categorized based on various contextual factors, such as the identity of the requester, the specific operation being requested, or other relevant parameters. This classification enables the implementation of appropriate measures, such as throttling or load shedding policies, to manage the flow of requests effectively.

Additional Considerations:

Traffic Patterns Can Combine: Real-world traffic patterns are often a combination of these shapes, requiring flexible and adaptable scaling strategies.
Monitoring and Alerting: Continuously monitor traffic patterns to identify trends early and proactively adjust your scaling strategy. Set up alerts and notifications to inform about sudden traffic surges or potential DDoS attacks so you can take timely action.
Incident Response Plan: Develop a well-defined incident response plan that outlines the steps for communication protocols, mitigation strategies, engaging stakeholders, and recovery procedures.
Cost-Effectiveness: Balance scaling needs with cost optimization to avoid over-provisioning resources during low traffic periods.

2. Throttling and Rate Limiting

Throttling controls the rate of traffic flow or resource consumption within a system to prevent overload or degradation of service. Throttling enforces quota limits and protects system overload by limiting the amount of resources (CPU, memory, network bandwidth) a single user or client can consume within a specific time frame. Throttling ensures efficient resource utilization, allowing the service to handle more users in a predictable manner. This ensures better fairness and stability while preventing a noisy neighbor problem where unpredictable spikes or slowdowns caused by heavy users. Throttling can be implemented by API Rate Limiting on the number of API requests a client can make with a given time window; by limiting maximum bandwidth allowed for various network traffic; by limiting rate of read/write; or by limiting the number of concurrent connections for a server to prevent overload.

These throttling and rate limiting measures can be applied to both anonymous and authenticated requests as follows:

Anonymous Requests:
- Rate limiting: Implement rate limiting based on client IP addresses or other identifiers within a specific time window, preventing clients from overwhelming the system.
- Concurrency limits: Set limits on the maximum number of concurrent connections or requests that can be processed simultaneously.
- Server-side throttling: Apply throttling mechanisms at the server level, such as queue-based rate limiting or token bucket algorithms, to control the overall throughput of incoming requests.
Authenticated Requests:
- User-based rate limiting: Implement rate limiting based on user identities or API keys, ensuring that authenticated users cannot exceed specified request limits.
- Prioritized throttling: Apply different throttling rules or limits based on user roles, subscription tiers, or other criteria, allowing higher priority requests to be processed first during peak loads.
- Circuit breakers: Implement circuit breakers to temporarily disable or throttle load from specific services or components that are experiencing high latency or failures, preventing cascading failures.

2.1 Error Response and Headers

When a request exceeds the rate limit, the server typically returns a 429 HTTP status code indicating that the request has been throttled or rate-limited due to Too Many Requests. The server may also return HTTP headers such as Retry-After, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Used, X-RateLimit-Reset, and X-RateLimit-Resource.

3. Load Shedding

Load shedding is used to prioritize and manage system resources during periods of high demand or overload. It may discard or defer non-critical tasks or requests to ensure the continued operation of essential functions. Load shedding helps maintain system stability and prevents cascading failures by reallocating resources to handle the most critical tasks first. Common causes of unexpected events that require shedding to prevent overloading system resources include:

Traffic Spikes: sudden and significant increases in the volume of incoming traffic due to various reasons, such as viral content, marketing campaigns, sudden popularity, or events.
DDoS (Distributed Denial of Service): deliberate attempts to disrupt the normal functioning of a targeted server, service, or network by overwhelming it with a flood of traffic. A DDoS attack can be orchestrated by an attacker who commands a vast botnet comprising thousands of compromised devices, including computers, IoT devices, or servers. Additionally, misconfigurations, software bugs, or unforeseen interactions among system components such as excessive retries without exponential delays that can also lead to accidental DDoS attacks.

Here is how excessive load for anonymous and authenticated requests can be shed:

Anonymous Requests: Drop requests during extreme load conditions or when server capacity is reached, drop a percentage of incoming requests to protect the system from overload. This can be done randomly or based on specific criteria such as request types, and headers. Alternatively, service can degrade non-critical features or functionalities temporarily to reduce the overall system load and prioritize essential services.
Authenticated Requests: Apply load shedding rules based on user roles, subscription tiers, or other criteria, prioritizing requests from high-value users or critical services.

3.1 Error Response

When a request exceeds the rate limit, the server typically returns a 503 HTTP status code indicating that the request has been throttled or rate-limited due to Too Many Requests. The server may also return HTTP headers such as Retry-After, other headers specifically employed for throttling are less prevalent in the context of load shedding. Unlike throttling errors, which fall under user-errors with 4XX error codes, load shedding is categorized as a server error with 5XX error codes. Consequently, load shedding requires more aggressive monitoring and alerting compared to throttling errors. Throttling errors, on the other hand, can be considered expected behavior as a means to address noisy neighbor problems and maintain high availability.

4. Additional Techniques for Throttling and Load Shedding

Throttling, rate-limiting and load shedding measures described above can be used to handle high traffic and to prevent resource exhaustion in distributed systems. Here are common techniques that can be used to implement these measures:

Admission Control: Set up thresholds for maximum concurrent requests or resource utilization.
Request Classification and Prioritization: Classify requests based on priority, user type, or criticality and then dropping low-priority requests when the thresholds for capacity are exceeded.
Backpressure and Queue Management: Use a fixed-length queues to buffer incoming requests during high loads and applying back-pressure by rejecting requests when queues reach their limits.
Fault Isolation and Containment: Partition the system into isolated components or cells to limit the blast radius of failures.
Redundancy and Failover: Build redundancy into your infrastructure and implement failover mechanisms to ensure that your services remain available even if parts of your infrastructure are overwhelmed.
Simplicity and Modularity: Design systems with simple, modular components that can be easily understood, maintained, and replaced. Avoid complex dependencies and tight coupling between components.
Circuit Breaker: Monitor the health and performance of downstream services or components and stop forwarding requests if a service is overloaded or unresponsive. Periodically attempt to re-establish the connection (close the circuit breaker).
Noisy Neighbors: Throttle and apply rate limits to customer traffic to prevent them from consuming resources excessively, thereby ensuring fair access for all customers.
Capacity Planning and Scaling: Continuously monitor resource utilization and plan for capacity growth. Implement auto-scaling mechanisms to dynamically adjust resources based on demand.
Communication Optimization: Employ communication optimization techniques like compression, quantization to minimize network traffic and bandwidth requirements.
Privacy and Security Considerations: Incorporate privacy-preserving mechanisms like secure aggregation, differential privacy, and secure multi-party computation to ensure data privacy and model confidentiality.
Graceful Degradation: Identify and disable non-critical features or functionality during high loads.
Monitoring and Alerting: Monitor system metrics (CPU, memory, request rates, latency, etc.) to detect overload scenarios and sending alerts when thresholds are exceeded.
Defense in Depth: Implement multi-layered defense strategy to detect, mitigate, and protect customer workloads from malicious attacks, like blacklisting IP addresses or employing Geo-location filters, at the Edge Layer using CDN, Load Balancer, or API Gateway. Constrain network bandwidth and requests per second (RPS) for individual tenants at the Network Layer. Applying resource quota, prioritization and admission control at the Application Layer based on account information, request attributes and system metrics. Isolating tenants’ data in separate partitions at the Storage Layer. Each dependent service may use similar multi-layered defense to throttle based on the usage patterns and resource constraints.
Adaptive Scaling: Automatically scale resources up or down based on demand and multi-tenant fairness policies. Employ predictive auto-scaling or load-based scaling.
Fault Tolerance and Checkpointing: Incorporate fault tolerance mechanisms, redundant computation and checkpointing to ensure reliable and resilient task processing in the face of potential resource failures. The fault tolerance mechanisms can be used to handle potential failures or stragglers (slow or unresponsive devices).
Web Application Firewall (WAF): Inspects incoming traffic and blocks malicious requests, including DDoS attacks, based on predefined rules and patterns.
Load Balancing: By distributing incoming traffic across multiple servers or instances, load balancing helps prevent any single server from becoming overwhelmed.
Content Delivery Network (CDN): Distribute your content across multiple geographic locations, reducing the strain on your origin servers.
Cost-Aware Scaling: Implements a cost-aware scaling strategy like like cost modeling and performance prediction that considers the cost of different resource types.
Security Mechanisms: Incorporate various security mechanisms such as secure communication channels, code integrity verification, and runtime security monitoring to protect against potential vulnerabilities and attacks in multi-tenant environments.
SOPs and Run books: Develop well-defined procedures that outlines the steps for detecting traffic spikes, pinpointing source of malicious attack, analyzing the logs and monitoring metrics, mitigation strategies, engaging stakeholders, and recovery procedures.

5. Pitfalls with Use of Throttling and Load Shedding

Here are some potential challenges to consider when implementing throttling and load shedding:

Autoscaling Failures: If your throttling policies are too aggressive, they may prevent your application from generating enough load to trigger autoscaling policies. This can lead to under-provisioning of resources and performance degradation. Conversely, if your throttling policies are too lenient, your application may scale up unnecessarily, leading to overspending.
Load Balancer Health Checks: Some load balancers use synthetic health checks to determine the health of backend instances. If your throttling policies block these health checks, it can cause instances to be marked as unhealthy and removed from the load balancer, even though they are still capable of serving traffic.
Unhealthy Instance Overload: When instances are marked as unhealthy by a load balancer, the remaining healthy instances may become overloaded if throttling policies are not properly configured. This can lead to a cascading failure scenario where more and more instances are marked as unhealthy due to the increased load.
Sticky Sessions: If your application uses sticky sessions (session affinity) for user sessions, and your throttling policies are not consistently applied across all instances, it can lead to inconsistent user experiences or session loss.
Cache Invalidation: Aggressive throttling or load shedding policies can lead to more frequent cache invalidations, which can impact performance and increase the load on your backend systems.
Upstream Service Overload: If your application relies on upstream services or APIs, and your throttling policies are not properly coordinated with those services, you may end up overloading those systems and causing cascading failures.
Insufficient capacity of the Failover: The failover servers must possess adequate capacity to manage the entire expected traffic load from the primary servers.
Monitoring Challenges: Throttling and load shedding policies can make it more difficult to monitor and troubleshoot performance issues, as the metrics you’re observing may be skewed by the throttling mechanisms.
Delays in Updating Throttling Policies: The policy adjustments for throttling and load shedding should be capable of updating at runtime swiftly to adapt to various traffic patterns..
Balancing Load based on number of connections: When directing incoming traffic based on the host with the least number of connections, there’s a risk of unhealthy hosts will have fewer connections due to their quick error responses. Consequently, the load balancer may direct more traffic towards these hosts, resulting in a majority of requests failing. To counteract this, it’s essential to employ robust Layer 7 health checks that comprehensively assess the application’s functionality and dependencies. Layer 4 health checks, which are susceptible to false positives, should be avoided. The unhealthy host should be removed from the available pool as quickly as possible. Additionally, ensuring that error responses from the service have similar latency to successful responses can serve as another effective mitigation strategy.

To mitigate these issues, it’s essential to carefully coordinate your throttling and load shedding policies with the autoscaling, load balancing, caching, and monitoring strategies. This may involve tuning thresholds, implementing consistent policies across all components, and closely monitoring the interaction between these systems. Additionally, it’s crucial to thoroughly test your configurations under various load conditions to identify and address potential issues before they impact your production environment.

6. Monitoring Metrics and Notifications

Here are some common metrics and alarms to consider for throttling and load shedding:

6.1 Network Traffic Metrics:

Incoming/Outgoing Bandwidth: Monitor the total network bandwidth to detect abnormal traffic patterns.
Packets per Second (PPS): Track the number of packets processed per second to identify potential DDoS attacks or traffic bursts.
Connections per Second: Monitor the rate of new connections being established to detect potential connection exhaustion or DDoS attacks.

6.2 Application Metrics:

Request Rate: Track the number of requests per second to identify traffic spikes or bursts.
Error Rate: Monitor the rate of errors or failed requests, which can indicate overloading or application issues.
Response Time: Measure the application’s response time to detect performance degradation or latency issues.
Queue Saturation: Monitor the lengths of queues or buffers to identify potential bottlenecks or resource exhaustion.

6.3 System Metrics:

CPU Utilization: Monitor CPU usage to detect resource contention or overloading.
Memory Utilization: Track memory usage to identify potential memory leaks or resource exhaustion.
Disk I/O: Monitor disk read/write operations to detect storage bottlenecks or performance issues.

6.4 Load Balancer Metrics:

Active Connections: Monitor the number of active connections to the load balancer to identify potential connection exhaustion.
Unhealthy Hosts: Track the number of unhealthy or unresponsive hosts to ensure load balancing efficiency.
Request/Response Errors: Monitor errors related to requests or responses to identify issues with backend services.

6.5 Alarms and Notifications:

Set up alarms for critical metrics, such as high CPU utilization, memory exhaustion or excessive error rates. For example, send alarms when error rate > 5% or response code of 5XX for consecutive 5 seconds or data points.
Set up alarms for high latency, e.g., P90 latency exceeds 50ms for more than 30 seconds.
Establish fine-grained alarms for detecting breaches in customer service level agreements (SLAs). Configure the alarm thresholds to trigger below the customer SLAs and ensure they can identify the affected customers.

6.6 Autoscaling Policies:

CPU Utilization-based Scaling: Scale out or in based on CPU usage thresholds to handle traffic bursts or DDoS attacks.
Memory Utilization-based Scaling: Scale resources based on memory usage to prevent memory exhaustion.
Network Traffic-based Scaling: Scale resources based on incoming or outgoing network traffic patterns to handle traffic spikes.
Request Rate-based Scaling: Scale resources based on the rate of incoming requests to maintain optimal performance.

6.7 Throttling / Load Shedding Overhead:

Monitor the processing time for throttling and load shedding, accounting for any communication overhead if the target host is unhealthy. Keep track of the time to ascertain priority, identify delays in processing, and ensure that high delays only impact denied requests.
Monitor the system’s utilization and identify when it reaches its capacity.
Monitor the observed target throughput at the time of the request.
Monitor the time taken to determine if load shedding is necessary and track when the percentage of denied traffic exceeds X% of incoming traffic.

It’s essential to tailor these metrics and alarms to your specific application, infrastructure, and traffic patterns.

7. Summary

Throttling and Load Shedding offer effective means for managing traffic for online services to maintain high availability. Traffic patterns may vary in characteristics like rate of requests, concurrency, and flow patterns. Understanding these shapes, including normal, peak, off-peak, burst, and seasonal traffic, is crucial for scaling and ensuring high availability. Requests can be classified based on contextual factors, enabling appropriate measures such as throttling or load shedding.

Throttling manages traffic flow or resource usage to avoid overload, whereas load shedding prioritizes tasks during periods of high demand. These methods can complement other strategies such as admission control, request classification, backpressure management, and redundancy. However, their implementation requires careful monitoring, notification, and thorough testing to ensure effectiveness.

8. References

Comments (0)

March 26, 2023

Elegant Implementation Patterns

Filed under: Computing,Languages — Tags: architecture — admin @ 7:47 pm

Patterns are typical solutions to common problems in various phases of the software development lifecycle and you may find many books and resources on various types of patterns such as:

Analysis patterns
Design patterns
Integration patterns
Architecture, API, Cloud and other kind of patterns

However, you may also find low-level implementation patterns that developers often apply to common coding problems. Following are a few of these coding patterns that I have found particularly fascinating and pragmatic in my experience:

Functional Options Pattern

The options pattern is used to pass configuration options to a method, e.g. you can pass a struct that holds configuration settings. These configuration properties may have default values that are used when they are not explicitly defined. You may implement options patter using builder pattern to initialize config properties but it requires explicitly building the configuration options even when there is nothing to override. In addition, error handling with the builder pattern poses more complexity when chaining methods. The functional options pattern on the other hand defines each configuration option as a function (referenced in Functional Options in Go and 100 Go Mistakes and How to Avoid), which can validate the configuration option and return an error for invalid data. For example:

type options struct {
  port *int
  timeout *time.Duration
}
type Option func(opts *options) error

func WithPort(port int) Option {
  return func(opts *options) error {
    if port < 0 {
      return errors.New("port shold be positive")
    }
    options.port = &port
    return nil
  }
}

func NewServer(addr string, opts ...Option) (*http.Server, error) {
  var options options
  for _, opt := range opts {
    err := opt(&options)
    if err != nl {
      return nil, err
    }
  }
  var port int
  if options.port == nil {
    port = defaultHTTPPort
  } else {
    if *options.port == 0 {
      port = randomPort()
    } else {
      port = *options.port
    }
  }
  ...
}

Go
 
type options struct {  port *int  timeout *time.Duration}type Option func(opts *options) error​func WithPort(port int) Option {  return func(opts *options) error {    if port < 0 {      return errors.New("port shold be positive")    }    options.port = &port    return nil  }}​func NewServer(addr string, opts ...Option) (*http.Server, error) {  var options options  for _, opt := range opts {    err := opt(&options)    if err != nl {      return nil, err    }  }  var port int  if options.port == nil {    port = defaultHTTPPort  } else {    if *options.port == 0 {      port = randomPort()    } else {      port = *options.port    }  }  ...}​

You can then pass configuration options as follows:

server, err := NewServer(
  WithPort(8080),
  WithTimeout(time.Second)
  )

Go
 
server, err := NewServer(  WithPort(8080),  WithTimeout(time.Second)  )  

Above solution allows handling errors when overriding default values fails and passing empty list of options. In other languages where errors can be implicitly passed to the calling code may still use builder pattern for configuration such as:

const DefaultHttpPort: i32 = 8080;

#[derive(Debug, PartialEq)]
struct Options {
   port: i32,
   timeout: i32,
}

#[derive(Debug)]
pub enum OptionsError {
    Validation(String),
}

struct OptionsBuilder {
   port: Option<i32>,
   timeout: Option<i32>,
}

impl OptionsBuilder {
    fn new() -> Self {
        OptionsBuilder {
            port: Some(DefaultHttpPort),
            timeout: Some(1000),
        }
    }

    pub fn with_port(mut self, port: i32) -> Result<Self, OptionsError> {
        if port < 0 {
            return Err(OptionsError::Validation("port shold be positive".to_string()));
        }
        if port == 0 {
           self.port = Some(randomPort());
        } else {
           self.port = Some(port);
        }
        Ok(self)
    }

    pub fn with_timeout(mut self, timeout: i32) -> Result<Self, OptionsError> {
        if timeout <= 0 {
            return Err(OptionsError::Validation("timeout shold be positive".to_string()));
        }
        self.timeout = Some(timeout);
        Ok(self)
    }

    pub fn build(self) -> Options {
        Options { port: self.port.unwrap(), timeout: self.port.unwrap() }
    }
}

fn new_server(addr: &str, opts: Options) -> Result<Server, OptionsError> {
    Ok(Server::new(addr, opts.port, opts.timeout))
}

fn main() -> Result<(), OptionsError> {
    let _ = new_server("127.0.0.1", OptionsBuilder::new().build());
    let _ = new_server("127.0.0.1", OptionsBuilder::new().with_port(8000)?.with_timeout(2000)?.build());
    Ok(())
}

Rust
 
const DefaultHttpPort: i32 = 8080;​#[derive(Debug, PartialEq)]struct Options {   port: i32,   timeout: i32,}​#[derive(Debug)]pub enum OptionsError {    Validation(String),}​struct OptionsBuilder {   port: Option<i32>,   timeout: Option<i32>,}​impl OptionsBuilder {    fn new() -> Self {        OptionsBuilder {            port: Some(DefaultHttpPort),            timeout: Some(1000),        }    }​    pub fn with_port(mut self, port: i32) -> Result<Self, OptionsError> {        if port < 0 {            return Err(OptionsError::Validation("port shold be positive".to_string()));        }        if port == 0 {           self.port = Some(randomPort());        } else {           self.port = Some(port);        }        Ok(self)    }​    pub fn with_timeout(mut self, timeout: i32) -> Result<Self, OptionsError> {        if timeout <= 0 {            return Err(OptionsError::Validation("timeout shold be positive".to_string()));        }        self.timeout = Some(timeout);        Ok(self)    }​    pub fn build(self) -> Options {        Options { port: self.port.unwrap(), timeout: self.port.unwrap() }    }}​fn new_server(addr: &str, opts: Options) -> Result<Server, OptionsError> {    Ok(Server::new(addr, opts.port, opts.timeout))}​fn main() -> Result<(), OptionsError> {    let _ = new_server("127.0.0.1", OptionsBuilder::new().build());    let _ = new_server("127.0.0.1", OptionsBuilder::new().with_port(8000)?.with_timeout(2000)?.build());    Ok(())}

However, above solution still requires building config options even when no properties are overridden.

State Pattern with Enum

The state pattern is part of GoF design patterns, which is used to implement finite-state machines or strategy pattern. The state pattern can be easily implemented using “sum” (alternative) algebraic data types with use of union or enums constructs. For example, here is an implementation of state pattern in Rust:

use std::{error::Error, fmt};

#[derive(Debug)]
struct JobError {
    reason: String,
}

impl Error for JobError {}

impl fmt::Display for JobError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "state error {}", self.reason)
    }
}

enum JobState {
    Pending { timeout: std::time::Duration },
    Executing { percentage_completed: f32 },
    Completed { completion_time: std::time::Duration },
    Failed { cause: JobError },
}

struct JobStateMachine {
    state: JobState,
}

impl JobStateMachine {
    fn new(timeout: std::time::Duration) -> Self {
        JobStateMachine {
            state: JobState::Pending { timeout }
        }
    }
    fn to_executing(&mut self) {
        self.state = match self.state {
            JobState::Pending { .. } => JobState::Executing { percentage_completed: 0.0 },
            _ => panic!("Invalid state transition!"),
        }
    }
    fn to_succeeded(&mut self, completion_time: std::time::Duration) {
        self.state = match self.state {
            JobState::Executing { .. } => JobState::Completed { completion_time: completion_time },
            _ => panic!("Invalid state transition!"),
        }
    }
    // ...
}

fn main() {
    let mut job_state_machine = JobStateMachine::new(std::time::Duration::new(1000, 0));
    job_state_machine.to_executing();
}

Rust
 
use std::{error::Error, fmt};​#[derive(Debug)]struct JobError {    reason: String,}​impl Error for JobError {}​impl fmt::Display for JobError {    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {        write!(f, "state error {}", self.reason)    }}​enum JobState {    Pending { timeout: std::time::Duration },    Executing { percentage_completed: f32 },    Completed { completion_time: std::time::Duration },    Failed { cause: JobError },}​struct JobStateMachine {    state: JobState,}​impl JobStateMachine {    fn new(timeout: std::time::Duration) -> Self {        JobStateMachine {            state: JobState::Pending { timeout }        }    }    fn to_executing(&mut self) {        self.state = match self.state {            JobState::Pending { .. } => JobState::Executing { percentage_completed: 0.0 },            _ => panic!("Invalid state transition!"),        }    }    fn to_succeeded(&mut self, completion_time: std::time::Duration) {        self.state = match self.state {            JobState::Executing { .. } => JobState::Completed { completion_time: completion_time },            _ => panic!("Invalid state transition!"),        }    }    // ...}​fn main() {    let mut job_state_machine = JobStateMachine::new(std::time::Duration::new(1000, 0));    job_state_machine.to_executing();}

However, above implementation relies on runtime to validate state transitions. Alternatively, you can use struct to check valid transitions at compile time, e.g.,

struct Pending {
    timeout: std::time::Duration,
}

impl Pending {
    fn new(timeout: std::time::Duration) -> Self {
        Pending { timeout }
    }

    fn to_executing(self) -> Executing {
        Executing::new()
    }
}

struct Executing {
    percentage_completed: f32,
}

impl Executing {
    fn new() -> Self {
        Executing { percentage_completed: 0.0 }
    }

    fn to_succeeded(self, completion_time: std::time::Duration) -> Executing {
        Executing { percentage_completed: 0.0 }
    }
}

struct Succeeded {
    completion_time: std::time::Duration,
}

impl Succeeded {
    fn new(completion_time: std::time::Duration) -> Self {
        Succeeded { completion_time }
    }
}

// ...

fn main() {
    let pending = Pending::new(std::time::Duration::new(1000, 0));
    let executing = pending.to_executing();
}

Rust
 
struct Pending {    timeout: std::time::Duration,}​impl Pending {    fn new(timeout: std::time::Duration) -> Self {        Pending { timeout }    }​    fn to_executing(self) -> Executing {        Executing::new()    }}​struct Executing {    percentage_completed: f32,}​impl Executing {    fn new() -> Self {        Executing { percentage_completed: 0.0 }    }​    fn to_succeeded(self, completion_time: std::time::Duration) -> Executing {        Executing { percentage_completed: 0.0 }    }}​struct Succeeded {    completion_time: std::time::Duration,}​impl Succeeded {    fn new(completion_time: std::time::Duration) -> Self {        Succeeded { completion_time }    }}​// ...​fn main() {    let pending = Pending::new(std::time::Duration::new(1000, 0));    let executing = pending.to_executing();}

Tail Recursion with Trampolines and Thunks

The recursion uses divide and conquer to solve complex problems where a function calls itself to break down a problem into smaller problems. However, each recursion requires adding stack frame to the call stack so many functional languages converts recursive implementation into an iterative solution by eliminating tail-call where recursive call is the final action of a function. In languages that don’t support tail-call optimization, you can use thunks and trampolines to implement it. A thunk is a no-argument function that is evaluated lazily, which in turn may produce another thunk for next function call. Trampolines define a Computation data structure to return result of a computation. For example, following code illustrates an implementation of a Trampoline in Rust:

trait FnThunk {
    type Out;
    fn call(self: Box<Self>) -> Self::Out;
}

pub struct Thunk<'a, T> {
    fun: Box<dyn FnThunk<Out=T> + 'a>,
}

impl<T, F> FnThunk for F where F: FnOnce() -> T {
    type Out = T;
    fn call(self: Box<Self>) -> T { (*self)() }
}

impl<'a, T> Thunk<'a, T> {
    pub fn new(fun: impl FnOnce() -> T + 'a) -> Self {
        Self { fun: Box::new(fun) }
    }
    pub fn compute(self) -> T {
        self.fun.call()
    }
}

pub enum Computation<'a, T> {
    Done(T),
    Call(Thunk<'a, Computation<'a, T>>),
}

pub fn compute<T>(mut res: Computation<T>) -> T {
    loop {
        match res {
            Computation::Done(x) => break x,
            Computation::Call(thunk) => res = thunk.compute(),
        }
    }
}

fn factorial(n: u128) -> u128 {
    fn fac_with_acc(n: u128, acc: u128) -> Computation<'static, u128> {
        if n > 1 {
            Computation::Call(Thunk::new(move || fac_with_acc(n-1, acc * n)))
        } else {
            Computation::Done(acc)
        }
    }
    compute(fac_with_acc(n, 1))
}

fn main() {
    println!("factorial result {}", factorial(5));
}

Rust
 
trait FnThunk {    type Out;    fn call(self: Box<Self>) -> Self::Out;}​pub struct Thunk<'a, T> {    fun: Box<dyn FnThunk<Out=T> + 'a>,}​impl<T, F> FnThunk for F where F: FnOnce() -> T {    type Out = T;    fn call(self: Box<Self>) -> T { (*self)() }}​impl<'a, T> Thunk<'a, T> {    pub fn new(fun: impl FnOnce() -> T + 'a) -> Self {        Self { fun: Box::new(fun) }    }    pub fn compute(self) -> T {        self.fun.call()    }}​pub enum Computation<'a, T> {    Done(T),    Call(Thunk<'a, Computation<'a, T>>),}​pub fn compute<T>(mut res: Computation<T>) -> T {    loop {        match res {            Computation::Done(x) => break x,            Computation::Call(thunk) => res = thunk.compute(),        }    }}​fn factorial(n: u128) -> u128 {    fn fac_with_acc(n: u128, acc: u128) -> Computation<'static, u128> {        if n > 1 {            Computation::Call(Thunk::new(move || fac_with_acc(n-1, acc * n)))        } else {            Computation::Done(acc)        }    }    compute(fac_with_acc(n, 1))}​fn main() {    println!("factorial result {}", factorial(5));}

Memoization

The memoization allows caching results of expensive function calls so that repeated invocation of the same function returns the cached results when the same input is used. It can be implemented using thunk pattern described above. For example, following implementation shows a Rust based implementation:

use std::borrow::Borrow;
use std::marker::PhantomData;
use std::ops::{Deref, DerefMut};


enum Memoized<I: 'static, O: Clone, Func: Fn(I) -> O> {
    UnInitialized(PhantomData<&'static I>, Box<Func>),
    Processed(O),
}

impl<I: 'static, O: Clone, Func: Fn(I) -> O> Memoized<I, O, Func> {
    fn new(lambda: Func) -> Memoized<I, O, Func> {
        Memoized::UnInitialized(PhantomData, Box::new(lambda))
    }
    fn fetch(&mut self, data: I) -> O {
        let (flag, val) = match self {
            &mut Memoized::Processed(ref x) => (false, x.clone()),
            &mut Memoized::UnInitialized(_, ref z) => (true, z(data))
        };
        if flag {
            *self = Memoized::Processed(val.clone());
        }
        val
    }
    fn is_initialized(&self) -> bool {
        match self {
            &Memoized::Processed(_) => true,
            _ => false
        }
    }
}

impl<I: 'static, O: Clone, Func: Fn(I) -> O> Deref for Memoized<I, O, Func> {
    type Target = O;
    fn deref(&self) -> &Self::Target {
        match self {
            &Memoized::Processed(ref x) => x,
            _ => panic!("Attempted to derefence uninitalized memoized value")
        }
    }
}

impl<I: 'static, O: Clone, Func: Fn(I) -> O> DerefMut for Memoized<I, O, Func> {
    fn deref_mut(&mut self) -> &mut Self::Target {
        //self.get()
        if self.is_initialized() {
            match self {
                &mut Memoized::Processed(ref mut x) => return x,
                _ => unreachable!()
            };
        } else {
            *self = Memoized::Processed(unsafe { std::mem::zeroed() });
            match self {
                &mut Memoized::Processed(ref mut x) => return x,
                _ => unreachable!()
            };
        }
    }
}

impl<I: 'static, O: Clone, Func: Fn(I) -> O> Borrow<O> for Memoized<I, O, Func> {
    fn borrow(&self) -> &O {
        match self {
            &Memoized::Processed(ref x) => x,
            _ => panic!("Attempted to borrow uninitalized memoized value")
        }
    }
}


enum Memoized<I: 'static, O: Clone, Func: Fn(I) -> O> {
    UnInitialized(PhantomData<&'static I>, Box<Func>),
    Processed(O),
}

impl<I: 'static, O: Clone, Func: Fn(I) -> O> Memoized<I, O, Func> {
    fn new(lambda: Func) -> Memoized<I, O, Func> {
        Memoized::UnInitialized(PhantomData, Box::new(lambda))
    }
    fn fetch(&mut self, data: I) -> O {
        let (flag, val) = match self {
            &mut Memoized::Processed(ref x) => (false, x.clone()),
            &mut Memoized::UnInitialized(_, ref z) => (true, z(data))
        };
        if flag {
            *self = Memoized::Processed(val.clone());
        }
        val
    }
    fn is_initialized(&self) -> bool {
        match self {
            &Memoized::Processed(_) => true,
            _ => false
        }
    }
}

impl<I: 'static, O: Clone, Func: Fn(I) -> O> Deref for Memoized<I, O, Func> {
    type Target = O;
    fn deref(&self) -> &Self::Target {
        match self {
            &Memoized::Processed(ref x) => x,
            _ => panic!("Attempted to derefence uninitalized memoized value")
        }
    }
}

impl<I: 'static, O: Clone, Func: Fn(I) -> O> DerefMut for Memoized<I, O, Func> {
    fn deref_mut(&mut self) -> &mut Self::Target {
        //self.get()
        if self.is_initialized() {
            match self {
                &mut Memoized::Processed(ref mut x) => return x,
                _ => unreachable!()
            };
        } else {
            *self = Memoized::Processed(unsafe { std::mem::zeroed() });
            match self {
                &mut Memoized::Processed(ref mut x) => return x,
                _ => unreachable!()
            };
        }
    }
}

impl<I: 'static, O: Clone, Func: Fn(I) -> O> Borrow<O> for Memoized<I, O, Func> {
    fn borrow(&self) -> &O {
        match self {
            &Memoized::Processed(ref x) => x,
            _ => panic!("Attempted to borrow uninitalized memoized value")
        }
    }
}


mod test {
    use super::Memoized;

    #[test]
    fn test_memoized() {
        let lambda = |x: i32| -> String {
            x.to_string()
        };
        let mut dut = Memoized::new(lambda);
        assert_eq!(dut.is_initialized(), false);
        assert_eq!(&dut.fetch(5), "5");
        assert_eq!(dut.is_initialized(), true);
        assert_eq!(&dut.fetch(2000), "5");
        let x: &str = &dut;
        assert_eq!(x, "5");
    }
}

Rust
 
use std::borrow::Borrow;use std::marker::PhantomData;use std::ops::{Deref, DerefMut};​​enum Memoized<I: 'static, O: Clone, Func: Fn(I) -> O> {    UnInitialized(PhantomData<&'static I>, Box<Func>),    Processed(O),}​impl<I: 'static, O: Clone, Func: Fn(I) -> O> Memoized<I, O, Func> {    fn new(lambda: Func) -> Memoized<I, O, Func> {        Memoized::UnInitialized(PhantomData, Box::new(lambda))    }    fn fetch(&mut self, data: I) -> O {        let (flag, val) = match self {            &mut Memoized::Processed(ref x) => (false, x.clone()),            &mut Memoized::UnInitialized(_, ref z) => (true, z(data))        };        if flag {            *self = Memoized::Processed(val.clone());        }        val    }    fn is_initialized(&self) -> bool {        match self {            &Memoized::Processed(_) => true,            _ => false        }    }}​impl<I: 'static, O: Clone, Func: Fn(I) -> O> Deref for Memoized<I, O, Func> {    type Target = O;    fn deref(&self) -> &Self::Target {        match self {            &Memoized::Processed(ref x) => x,            _ => panic!("Attempted to derefence uninitalized memoized value")        }    }}​impl<I: 'static, O: Clone, Func: Fn(I) -> O> DerefMut for Memoized<I, O, Func> {    fn deref_mut(&mut self) -> &mut Self::Target {        //self.get()        if self.is_initialized() {            match self {                &mut Memoized::Processed(ref mut x) => return x,                _ => unreachable!()            };        } else {            *self = Memoized::Processed(unsafe { std::mem::zeroed() });            match self {                &mut Memoized::Processed(ref mut x) => return x,                _ => unreachable!()            };        }    }}​impl<I: 'static, O: Clone, Func: Fn(I) -> O> Borrow<O> for Memoized<I, O, Func> {    fn borrow(&self) -> &O {        match self {            &Memoized::Processed(ref x) => x,            _ => panic!("Attempted to borrow uninitalized memoized value")        }    }}​​enum Memoized<I: 'static, O: Clone, Func: Fn(I) -> O> {    UnInitialized(PhantomData<&'static I>, Box<Func>),    Processed(O),}​impl<I: 'static, O: Clone, Func: Fn(I) -> O> Memoized<I, O, Func> {    fn new(lambda: Func) -> Memoized<I, O, Func> {        Memoized::UnInitialized(PhantomData, Box::new(lambda))    }    fn fetch(&mut self, data: I) -> O {        let (flag, val) = match self {            &mut Memoized::Processed(ref x) => (false, x.clone()),            &mut Memoized::UnInitialized(_, ref z) => (true, z(data))        };        if flag {            *self = Memoized::Processed(val.clone());        }        val    }    fn is_initialized(&self) -> bool {        match self {            &Memoized::Processed(_) => true,            _ => false        }    }}​impl<I: 'static, O: Clone, Func: Fn(I) -> O> Deref for Memoized<I, O, Func> {    type Target = O;    fn deref(&self) -> &Self::Target {        match self {            &Memoized::Processed(ref x) => x,            _ => panic!("Attempted to derefence uninitalized memoized value")        }    }}​impl<I: 'static, O: Clone, Func: Fn(I) -> O> DerefMut for Memoized<I, O, Func> {    fn deref_mut(&mut self) -> &mut Self::Target {        //self.get()        if self.is_initialized() {            match self {                &mut Memoized::Processed(ref mut x) => return x,                _ => unreachable!()            };        } else {            *self = Memoized::Processed(unsafe { std::mem::zeroed() });            match self {                &mut Memoized::Processed(ref mut x) => return x,                _ => unreachable!()            };        }    }}​impl<I: 'static, O: Clone, Func: Fn(I) -> O> Borrow<O> for Memoized<I, O, Func> {    fn borrow(&self) -> &O {        match self {            &Memoized::Processed(ref x) => x,            _ => panic!("Attempted to borrow uninitalized memoized value")        }    }}​​mod test {    use super::Memoized;​    #[test]    fn test_memoized() {        let lambda = |x: i32| -> String {            x.to_string()        };        let mut dut = Memoized::new(lambda);        assert_eq!(dut.is_initialized(), false);        assert_eq!(&dut.fetch(5), "5");        assert_eq!(dut.is_initialized(), true);        assert_eq!(&dut.fetch(2000), "5");        let x: &str = &dut;        assert_eq!(x, "5");    }}​

Type Conversion

The type conversion allows converting an object from one type to another, e.g. following interface defined in Spring shows an example:

public interface Converter<S, T> {
	@Nullable
	T convert(S source);
	default <U> Converter<S, U> andThen(Converter<? super T, ? extends U> after) {
		Assert.notNull(after, "'after' Converter must not be null");
		return (S s) -> {
			T initialResult = convert(s);
			return (initialResult != null ? after.convert(initialResult) : null);
		};
	}
}

Java
 
public interface Converter<S, T> {    @Nullable    T convert(S source);    default <U> Converter<S, U> andThen(Converter<? super T, ? extends U> after) {        Assert.notNull(after, "'after' Converter must not be null");        return (S s) -> {            T initialResult = convert(s);            return (initialResult != null ? after.convert(initialResult) : null);        };    }}

This kind of type conversion looks very similar to the map/reduce primitives defined in functional programming languages, e.g. Java 8 added Function interface for such as transformation. In addition, Scala also supports implicit conversion from one type to another, e.g.,

object Conversions:
  given fromStringToUser: Conversion[String, User] = (name: String) => User(name)

Scala
 
object Conversions:  given fromStringToUser: Conversion[String, User] = (name: String) => User(name)

Rust also supports From and Into traits for converting types, e.g.,

use std::convert::From;

#[derive(Debug)]
struct Number {
    value: i32,
}

impl From<i32> for Number {
    fn from(item: i32) -> Self {
        Number { value: item }
    }
}

fn main() {
    let num1: Number = Number::from(10);
    let num2: Number = 20.into();
    
    println!("{:?} {:?}", num1, num2);
}

Rust
 
use std::convert::From;​#[derive(Debug)]struct Number {    value: i32,}​impl From<i32> for Number {    fn from(item: i32) -> Self {        Number { value: item }    }}​fn main() {    let num1: Number = Number::from(10);    let num2: Number = 20.into();        println!("{:?} {:?}", num1, num2);}

Comments (0)

March 24, 2022

Architecture Patterns and Practices for Sustainable Software Delivery Pipelines

Filed under: Project Management,Technology — Tags: agile, architecture, complexity, patterns, technical debt — admin @ 10:31 pm

Abstract

Software is eating the world and today’s businesses demand shipping software features at a higher velocity to enable learning at a greater pace without compromising the quality. However, each new feature increases viscosity of existing code, which can add more complexity and technical debt so the time to market for new features becomes longer. Managing a sustainable pace for software delivery requires continuous improvements to the software development architecture and practices.

Software Architecture

The Software Architecture defines guiding principles and structure of the software systems. It also includes quality attribute such as performance, sustainability, security, scalability, and resiliency. The software architecture is then continuously updated through iterative software development process and feedback cycle from the the actual use in production environment. The software architecture decays if itâ€™s ignored that results in a higher complexity and technical debt. In order to reduce technical debt, you can build a backlog of technical and architecture related changes so that you can prioritize along with the product development. In order to maintain consistent architecture throughout your organization, you can document the architecture principles to define high-level guidelines for best practices, documentation templates, review process and guidance for the architecture decisions.

Quality Attributes

Following are major quality attributes of the software architecture:

Availability â€” It defines percentage of time, the system is available, e.g. available-for-use-time / total-time. It is generally referred in percentiles such as P99.99, which indicates a down time of 52 minutes in a year. It can also be calculated in terms of as mean-time between failure (MTBF) and mean-time to recover (MTRR) using MTBF/(MTBF+MTRR). The availability will depend not only on the service that you are providing but also its dependent services, e.g. P-service * P-dep-service-1 * P-dep-service-2. You can improve availability with redundant services, which can be calculated as Max-availability - (100 - Service-availability) ** Redundancy-factor. In order to further improve availability, you can detect faults and use redundancy and state synchronization for fault recovery. The system should also handle exceptions gracefully so that it doesn’t crash or goes into a bad state.
Capacity â€” Capacity defines how the system scales by adding hardware resources.
Extensibility â€” Extensibility defines how the system meets future business requirements without significantly changing existing design and code.
Fault Tolerance â€” Fault tolerance prevents a single point of failure and allows the system to continue operating even when parts of the system fail.
Maintainability â€” Higher quality code allows building robust software with higher stability and availability. This improves software delivery due to modular and loosely coupled design.
Performance â€” It is defined in terms of latency of an operation under normal or peak load. A performance may degrade with consumptions of resources, which affects throughput and scalability of the system. You can measure user’s response-time, throughput and utilization of computational resources by stress testing the system. A number of tactics can be used to improve performance such as prioritization, reducing overhead, rate-limiting, asynchronicity, caching, etc. Performance testing can be integrated with continuous delivery process that use load and stress testing to measure performance metrics and resource utilization.
Resilience â€” Resilience accepts the fact that faults and failure will occur so instead system components resist them by retrying, restarting, limiting error propagation or other measures. A failure is when a system deviates from its expected behavior as a result of accidental fault, misconfigurations, transient network issues or programming error. Two metrics related to resilience are mean-time between failure (MTBF) and mean-time to recover (MTTR), however resilient systems pay more attention to recovery or a shorter MTTR for fast recovery.
Recovery â€” Recovery looks at system recover in relation with availability and resilience. Two metrics related to recovery are recovery point objective (RPO) and recovery time objective (RTO), where RPO determines data that can be lost in case of failure and RTO defines wait time for the system recovery.
Reliability â€” Reliability looks at the probability of failure or failure rate.
Reproducibility â€” Reproducibility uses version control for code, infrastructure, configuration so that you can track and audit changes easily.
Reusability â€” It encourages code reuse to improve reliability, productivity and cost savings from the duplicated effort.
Scalability â€” It defines ability of the system to handle increase in workload without performance degradation. It can be expressed in terms of vertical or horizontal scalability, where horizontal reduces impact of isolated failure and improves workload availability. Cloud computing offers elastic and auto-scaling features for adding additional hardware when higher request rate is detected by the load balancer.
Security â€” Security primarily looks at confidentiality, integrity, availability (CIA) and is critical in building distributed systems. Building secure systems depends on security practices such as a strong identity management, defense in depth, zero trust networks, auditing, ad protecting while data in motion or at rest. You can adopt DevSecOps that shifts security left to earlier in software development lifecycle with processes such as Security by Design (SbD), STRIDE (Spoofing, Tampering, Repudiation, Disclosure, Denial of Service, Elevation of privilege), PASTA (Process for Attack Simulation and Threat Analysis), VAST (Visual, Agile and Simple Threat), CAPEC (Common Attack pattern Enumeration and Classification), and OCTAGE (Operationally Critical Threat, and Vulnerability Evaluation).
Testability â€” It encourages building systems in a such way it’s easier to test them.
Usability â€” It defines user experience of user interface and information architecture.

Architecture Patterns

Following is a list of architecture patterns that help building a high quality software:

Asynchronicity

Synchronous services are difficult to scale and recover from failures because they require low-latency and can easily overwhelm the services. Messaging-based asynchronous communication based on point-to-point or publish/subscribe are more suitable for handling faults or high load. This improves resilience because service components can restart in case of failure while messages remain in the queue.

Admission Control

The admission control adds a authentication, authorization or validation check in front event queue so that service can handle the load and prevent overload when demand exceeds the server capacity.

Back Pressure

When a producer is generating workload faster than the server can process, it can result in long request queues. Back pressure signals clients that servers are overloaded and clients need to slow down. However, rogue clients may ignore these signals so servers often employ other tactics such as admission control, load shedding, rate limiting or throttling requests.

Big fleet in front of small fleet

You should look at all transitive dependencies when scaling a service with a large fleet of hosts so that you don’t drive a large network traffic that needs to invoke dependent services with a smaller fleet. You can use use load testing to find the bottlenecks and update SLAs for the dependent services so that they are aware of network load from your APIs.

Blast Radius

A blast radius defines impact of failure on overall system when an error occurs. In order to limit the blast radius, the system should eliminate a single point of failure, rolling deploy changes using canaries and stop cascading failures using circuit breakers, retry and timeout.

Bulkheads

Bulkheads isolate faults from one component to another, e.g. you may use different thread pool for different workloads or use multiple regions/availability-zones to isolate failures in a specific datacenter.

Caching

Caching can be implemented at a several layers to improve performance such as database-cache, application-cache, proxy/edge cache, pre-compute cache and client-side cache.

Circuit Breakers

The circuit breaker is defined as a state machine with three states: normal, checking and tripped. It can be used to detect persistent failures in a dependent service and trip its state to disable invocation of the service temporarily with some default behavior. It can be later changed to the checking state for detecting success, which changes its state to normal after a successful invocation of the dependent service.

CQRS / Event Sourcing

Command and Query Responsibility Segregation (CQRS) separates read and update operations in the database. It’s often implemented using event-sourcing that records changes in an append-only store for maintaining consistency and audit trails.

Default Values

Default values provide a simple way to provide limited or degraded behavior in case of failure in dependent configuration or control service.

Disaster Recovery

Disaster recovery (DR) enables business continuity in the event of large-scale failure of data centers. Based on cost, availability and RTO/RPO constraints, you can deploy services to multiple regions for hot site; only replicate data from one region to another region while keeping servers as standby for warm site; or use backup/restore for cold site. It is essential to periodically test and verify these DR procedures and processes.

Distributed Saga

Maintaining data consistency in a distributed system where data is stored in multiple databases can be hard and using 2-phase-commit may incur high complexity and performance. You can use distributed Saga for implementing long-running transactions. It maintains state of the transaction and applies compensating transactions in case of a failure.

Failing Fast

You can fail fast if the workload cannot serve the request due to unavailability of resources or dependent services. In some cases, you can queue requests, however it’s best to keep those queues short so that you are not spending resources to serve stale requests.

Function as a Service

Function as a service (FaaS) offers serverless computing to simplify managing physical resources. Cloud vendors offer APIs for AWS Lambda, Google Cloud Functions and Azure Functions to build serverless applications for scalable workloads. These functions can be easily scaled to handle load spikes, however you have to be careful scaling these functions so that any services that they depend on can support the workload. Each function should be designed with a single responsibility, idempotency and shared nothing principles that can be executed concurrently. The serverless applications generally use event-based architecture for triggering functions and as the serverless functions are more granular, they incur more communication overhead. In addition, chaining functions within the code can result in tightly coupled applications, instead use a state machine or a workflow to orchestrate the communication flow. There is also an open source support for FaaS based serverless computing such as OpenFaas and OpenWhisk on top of Kubernetes or OpenShift, which prevents locking into a specific cloud provider.

Graceful Degradation

Instead of failing a request when dependent components are unhealthy, a service may use circuit-breaker pattern to return a predefined or default response.

Health Checks

Health checks runs a dummy or synthetic transaction that performs the action without affecting real data to verify the system component and its dependencies.

Idempotency

Idempotent services completes API request only exactly once so resending same request due to retries has no side effect. Idempotent APIs typically uses a client-generated identifier or token and the idempotent service returns same response if duplicate request is received.

Layered Architecture

The layered architecture separates software into different concerns such as:

Presentation Layer
Business Logic Layer
Service Layer
Domain Model Layer
Data Access Layer

Load Balancer

Load balancer allows distributing traffic among groups of resources so that a single resource is not overloaded. These load balancers also monitors health of servers and you can setup a load balancer for each group of resources to ensure that requests are not routed to unhealthy or unavailable resources.

Load Shedding

Load shedding allows rejection the work at the edge when server side exceeds its capacity, e.g. a server may return HTTP 429 error to signal clients that they can retry at a slower rate.

Loosely coupled dependencies

Using queuing systems, streaming systems, and workflows isolate behavior of dependent components and increases resiliency with asynchronous communication.

MicroServices

Microservices evolved from service oriented architecture (SOA) and support both point to point protocols such as REST/gRPC and asynchronous protocols based on messaging/event bus. You can apply bounded-context of domain-driven design (DDD) to design loosely coupled services.

Model-View Controller

It decouples user interface from the data model and application functionality so that each component can be independently tested. Other variations of this pattern include modelâ€“viewâ€“presenter (MVP) and modelâ€“viewâ€“viewmodel (MVVM).

NoSQL

NoSQL database technology provide support for high availability and variable/write-heavy workloads that can be easily scaled with additional hardware. NoSQL optimizes CAP and PACELC tradeoffs of consistency, availability, partition tolerance and latency, A number of cloud vendors provide managed NoSQL database solutions, however they can create latency issues if services accessing these databases are not colocated.

No Single Point of Failure

In order to eliminate single-points of failures for providing high availability and failover, you can deploy redundant services to multiple regions and availability zones.

Ports and Adapters

Ports and Adapters (Hexagon) separates interface (ports) from implementation (adapters). The business logic is encapsulated in the Hexagon that is invoked by the implementation (adapters) when actors operate on capabilities offered by the interface (port).

Rate Limiting and Throttling

Rate-limiting defines the rate at which clients can access the services based on the license policy. The throttling can be used to restrict access as a result of unexpected increase in demand. For example, the server can return HTTP 429 to notify clients that they can backoff or retry at a slower rate.

Retries with Backoff and Jitter

A remote operation can be retried if it fails due to transient failure or a server overload, however each retry should use a capped exponential backoff so that retries don’t cause additional load on the server. In a layered architecture, retry should be performed at a single point to minimize multifold retries. Retries can use circuit-breakers and rate-limiting to throttle requests. In some cases, requests may timeout for clients but succeed on the server side so the APIs must be designed with idempotency so that they are safe to retry. In order to avoid retries at the same time, a small random jitter can be added with retries.

Rollbacks

The software should be designed with rollbacks in mind so that all code, database schema and configurations can be easily rolled back. A production environment might be running multiple versions of same service so care must be taken to design the APIs that are both backward and forward compatibles.

Stateless and Shared nothing

Shared nothing architecture helps building stateless and loosely decoupled services that can be easily horizontally scaled for providing high availability. This architecture allows recovering from isolated failures and support auto-scaling by shrinking or expanding resources based on the traffic patterns.

Startup dependencies

Upon start of services, they may need to connect to certain configuration or bootstrap services so care must be taken to avoid thundering herd problems that can overwhelm those dependent services in the event of a wide region outage.

Timeouts

Timeouts help building resilient systems by throttling invocation of external services and preventing the thundering herd problem. Timeouts can also be used when retrying a failed operation after a transient failure or a server overload. A timeout can also add a small jitter to randomly spread the load on the server. Jitter can also be applied to timers of scheduled jobs or delayed work.

Watchdogs and Alerts

A watchdogs monitors a system component for a specific action such as latency, traffic, errors, saturation and SLOs. It then send an alert based on the monitoring configuration that triggers an email, on-call paging or an escalation.

Virtualization and Containers

Virtualization allows abstracting computing resources using virtual machines or containers so that you don’t depend on physical implementation. A virtual machine is a complete operating system on top of hypervisors whereas container is an isolated, lightweight environment for running applications. Virtualization allows building immutable infrastructure that are specially design to meet application requirements and can be easily deployed on a variety of hardware resources.

Architecture Practices

Following are best practices for sustainable software delivery:

Automation

Automation builds pipelines for continuous integration, continuous testing and continuous delivery to improve speed and agility of the software delivery. Any kind of operation procedures for deployment and monitoring can be stored in version control and then automatically applied with CI/CD procedures. In addition, automated procedures can be defined to track failures based on key performance indicators and trigger recovery or repair for the erroneous components.

Automated Testing

Automated testing allows building software with a suite of unit, integration, functional, load and security tests that verify the behavior and ensures that it can meet production demand. These automated tests will run as part of CI/CD pipelines and will stop deployment if any of the tests fail. In order to run end-to-end and load tests, the deployment scripts will create a new environment and setup a tests data. These tests may replicate synthetic transactions based on production traffic and benchmark the performance metrics.

Capacity Planning

Using load testing, monitoring production traffic patterns and demand with workload utilization help forecast the resources needed for future growth. This can be further strengthened with a capacity model that calculates unit-price of resources and growth forecast so that you can automate addition or removal of resources based on demand.

Cloud Computing

Adopting cloud computing simplifies resource provisioning and its elasticity allows organizations to grow or shrink those resources based on the demand. You can also add automation to optimize utilization of the resources and reduce costs when allocating more resources.

Continuous Delivery

Continuous delivery automates production deployment of small and frequent changes by developers. Continuous delivery relies on continuous integration that runs automated tests and automated deployment without any manual interventions. During a software development process, a developer picks a feature, works on changes and then commits changes to the source control after peer code-review. The automated build system will run a pipeline to create a container image based on the commit and then deploy it to a test or QA environment. The test environment will run automated unit, integration and regression tests using a test data in the database. The code is then promoted to the main branch and the automated build system tags and build the image on the head commit of main-branch, which that is pushed to the container registry. The pre-prod environment pulls the image, restarts the pre-prod container and runs more comprehensive tests with a larger set of test data in the database including performance tests. You may need multiple stages of pre-prod deployment such as alpha, beta and gamma environments, where each environment may require deployment to a unique datacenter. After successful testing, the production systems are updated with the new image using rolling updates, blue/green deployments or canary deployments to minimize disruption to end users. The monitoring system watches for error rates at each stage of the deployment and automatically rollbacks changes if a problem occurs.

Deploy over Multiple Zones and Regions

In order to provide high availability, compliance and reduced latency, you can deploy to multiple availability zones and regions. Global load balancers can be used to route traffic based on geographic proximity to the closest region. This also helps implementing business continuity as applications can easily failover to another region with minimal data.

Service Mesh

In order to easily build distributed systems, a number of platforms based on service-mesh pattern have emerged to abstract a common set of problems such as network communication, security, observability, etc:

Dapr – Distributed Application Runtime

The Distributed Application Runtime (Dapr) provides a variety of communication protocols, encryption, observability and secret management for building secured and resilient distributed services.

Envoy

Envoy is a service proxy for building cloud native application with builtin support for networking protocols and observability.

Istio service mesh

Istio is built on top of Kubernetes and Envoy to build service mesh with builtin support for networking, traffic management, observability and security. A service mesh also addresses features such as A/B testing, canary deployments, rate limiting, access control, encryption, and end-to-end authentication.

Linkerd

Linkerd is a service mesh for Kubernetes and consists of control-plane and data-plane with builtin support for networking, observability and security. The control-plane allows controlling services and data-plane acts as a sidecar container that handles network traffic and communicate with the control-plane for configuration.

WebAssembly

The WebAssembly is a stack-based virtual machine that can run at the edge or in cloud. A number of WebAssembly platforms have adopted Actor model to build a platform for writing distributed applications such as wasmCloud and Lunatic.

Documentation

The architecture document defines goals and constraints of the software system and provides various perspectives such as use-cases, logical, data, processes, and physical deployment. It also includes non-functional or quality attributes such as performance, growth, scalability, etc. You can document these aspects using standards such as 4+1, C4, and ERD as well as document the broader enterprise architecture using methodologies like TOGAF, Zachman, and EA.

Incident management

Incident management defines process of root-cause analysis and actions that organization can take when an incident occurs affecting production environment. It defines best practices such as clear ownership, reducing time to detect/mitigate, blameless postmortems and prevention measures. The organization can then implement preventing measures and share lessons learned from all operational events and failures across teams. You can also use pre-mortem to identify potential areas that can be improved or mitigated. Another way to simulate potential problems is using chaos engineering or setting up game days to test the workloads for various scenarios and outage.

Infrastructure as Code

Infrastructure as code uses declarative language to define development, test and production environment, which is managed by the source code management software. These provisioning and configuration logic can be used by CI/CD pipelines to automatically deploy and test environments. Following is a list of frameworks for building infrastructure from code:

Azure Resource Manager

Azure cloud offer Azure Resource Manager (ARM) templates based on JSON format to declaratively define the infrastructure that you intend to deploy.

AWS Cloud Development Kit

The Cloud Development Kit (CDK) supports high-level programming languages to construct cloud resources on Amazon Web Services so that you can easily build cloud applications.

Hashicorp Terraform

Terraform uses HCL based configurations to describe computing resources that can be deployed to multiple cloud providers.

Monitoring

Monitoring measures key performance indicators (KPI) and service-level objectives (SLO) that are defined at the infrastructure, applications, services and end-to-end levels. These include both business and technical metrics such as number of errors, hot spots, call graphs, which are visible to the entire team for monitoring trends and reacting quickly to failures.

Multi-tenancy

If your system is consumed by a different groups or tenants of users, you will need to design your system and services so that it isolates data and computing resources for secure and reliable fashion. Each layer of the system can be designed to treat tenant context as a first-class construct, which is tied to the user identity. You can capture usage metrics per tenant to identify bottlenecks, estimate cost and analyze the resource utilization for capacity planning and growth projections. The operational dashboards can also use these metrics to construct tenant-based operational views and proactively respond to unexpected load.

Security Review

In order to minimize the security risk, the development teams can adopt shift-left on security and DevSecOps practices to closely collaborate with the InfoSec team and integrate security review into every phase of the software development lifecycle.

Version Control Systems

Version control systems such as Git or Mercurial help track code changes, configurations and scripts over time. You can adopt workflows such as gitflow or trunk-based development for check-in process. Other common practices include smaller commits, testing code and running static analysis or linters/profiling tools before checkin.

Summary

The software complexity is a major reason for missed deadlines and slow/buggy software. This complexity can be essential complexity within the business domain but it’s often result of accidental complexity as a result of technical debt, poor architecture and development practices. Another source of incidental complexity comes from distributed computing where you need handle security, rate-limiting, observability, etc. that needs to be applied consistently across the distributed systems. For example, virtualization helps building immutable infrastructures and adopting infrastructure as a code; functions as a service simplifies building micro-services; and distributed platforms such as Istio, Linkerd remove a lot of cruft such as security, observability, traffic management and communication protocols when building distributed systems. The goal of a good architecture is to simplify building, testing, deploying and operating a software. You need to continually improve the systems architecture and its practices to build sustainable software delivery pipelines that can meet both current and future demands of users.

Comments (0)

Shahzad Bhatti Welcome to my ramblings and rants!

March 24, 2025

K8 Highlander: Managing Stateful and Singleton Processes in Kubernetes

Introduction

Architecture

Core Components

How Leader Election Works

Workload Types

Deploying and Using K8 Highlander

Prerequisites

Installation Using Docker

Using K8 Highlander Locally for Testing

Basic Configuration

Example Workload Configurations

Process Workload

CronJob Workload

Service Workload

Persistent Workload

High Availability Setup

API and Monitoring Capabilities

Dashboard

API Endpoints

Prometheus Metrics

Grafana Dashboard

Advanced Features

Multi-Tenant Support

Multi-Cluster Deployment

Summary

Where to Go from Here

April 19, 2024

Effective Load Shedding and Throttling Strategies for Managing Traffic Spikes and DDoS Attacks

1. Traffic Patterns and Shapes

Classifying Requests

2. Throttling and Rate Limiting

2.1 Error Response and Headers

3. Load Shedding

3.1 Error Response

4. Additional Techniques for Throttling and Load Shedding

5. Pitfalls with Use of Throttling and Load Shedding

6. Monitoring Metrics and Notifications

6.1 Network Traffic Metrics:

6.2 Application Metrics:

6.3 System Metrics:

6.4 Load Balancer Metrics:

6.5 Alarms and Notifications:

6.6 Autoscaling Policies:

6.7 Throttling / Load Shedding Overhead:

7. Summary

8. References

March 26, 2023

Elegant Implementation Patterns

Functional Options Pattern

State Pattern with Enum

Tail Recursion with Trampolines and Thunks

Memoization

Type Conversion

March 24, 2022

Architecture Patterns and Practices for Sustainable Software Delivery Pipelines

Abstract

Software Architecture

Quality Attributes

Architecture Patterns

Asynchronicity

Admission Control

Back Pressure

Big fleet in front of small fleet

Blast Radius

Bulkheads

Caching

Circuit Breakers

CQRS / Event Sourcing

Default Values

Disaster Recovery

Distributed Saga

Failing Fast

Function as a Service

Graceful Degradation

Health Checks

Idempotency

Layered Architecture