Building Robust Error Handling with gRPC and REST APIs

August 15, 2025

Building Robust Error Handling with gRPC and REST APIs

Filed under: Computing,Web Services — admin @ 2:23 pm

Introduction

Error handling is often an afterthought in API development, yet it’s one of the most critical aspects of a good developer experience. For example, a cryptic error message like { "error": "An error occurred" } can lead to hours of frustrating debugging. In this guide, we will build a robust, production-grade error handling framework for a Go application that serves both gRPC and a REST/HTTP proxy based on industry standards like RFC9457 (Problem Details for HTTP APIs) and RFC7807 (obsoleted).

Tenets

Following are tenets of a great API error:

Structured: machine-readable, not just a string.
Actionable: explains the developer why the error occurred and, if possible, how to fix it.
Consistent: all errors, from validation to authentication to server faults, follow the same format.
Secure: never leaks sensitive internal information like stack traces or database schemas.

Our North Star for HTTP errors will be the Problem Details for HTTP APIs (RFC 9457/7807):

{
  "type": "https://example.com/docs/errors/validation-failed",
  "title": "Validation Failed",
  "status": 400,
  "detail": "The request body failed validation.",
  "instance": "/v1/todos",
  "invalid_params": [
    {
      "field": "title",
      "reason": "must not be empty"
    }
  ]
}

We will adapt this model for gRPC by embedding a similar structure in the gRPC status details, creating a single source of truth for all errors.

API Design

Let’s start by defining our TODO API in Protocol Buffers:

syntax = "proto3";

package todo.v1;

import "google/api/annotations.proto";
import "google/api/field_behavior.proto";
import "google/api/resource.proto";
import "google/protobuf/timestamp.proto";
import "google/protobuf/field_mask.proto";
import "buf/validate/validate.proto";

option go_package = "github.com/bhatti/todo-api-errors/api/proto/todo/v1;todo";

// TodoService provides task management operations
service TodoService {
  // CreateTask creates a new task
  rpc CreateTask(CreateTaskRequest) returns (Task) {
    option (google.api.http) = {
      post: "/v1/tasks"
      body: "*"
    };
  }

  // GetTask retrieves a specific task
  rpc GetTask(GetTaskRequest) returns (Task) {
    option (google.api.http) = {
      get: "/v1/{name=tasks/*}"
    };
  }

  // ListTasks retrieves all tasks
  rpc ListTasks(ListTasksRequest) returns (ListTasksResponse) {
    option (google.api.http) = {
      get: "/v1/tasks"
    };
  }

  // UpdateTask updates an existing task
  rpc UpdateTask(UpdateTaskRequest) returns (Task) {
    option (google.api.http) = {
      patch: "/v1/{task.name=tasks/*}"
      body: "task"
    };
  }

  // DeleteTask removes a task
  rpc DeleteTask(DeleteTaskRequest) returns (DeleteTaskResponse) {
    option (google.api.http) = {
      delete: "/v1/{name=tasks/*}"
    };
  }

  // BatchCreateTasks creates multiple tasks at once
  rpc BatchCreateTasks(BatchCreateTasksRequest) returns (BatchCreateTasksResponse) {
    option (google.api.http) = {
      post: "/v1/tasks:batchCreate"
      body: "*"
    };
  }
}

// Task represents a TODO item
message Task {
  option (google.api.resource) = {
    type: "todo.example.com/Task"
    pattern: "tasks/{task}"
    singular: "task"
    plural: "tasks"
  };

  // Resource name of the task
  string name = 1 [
    (google.api.field_behavior) = IDENTIFIER,
    (google.api.field_behavior) = OUTPUT_ONLY
  ];

  // Task title
  string title = 2 [
    (google.api.field_behavior) = REQUIRED,
    (buf.validate.field).string = {
      min_len: 1
      max_len: 200
    }
  ];

  // Task description
  string description = 3 [
    (google.api.field_behavior) = OPTIONAL,
    (buf.validate.field).string = {
      max_len: 1000
    }
  ];

  // Task status
  Status status = 4 [
    (google.api.field_behavior) = REQUIRED
  ];

  // Task priority
  Priority priority = 5 [
    (google.api.field_behavior) = OPTIONAL
  ];

  // Due date for the task
  google.protobuf.Timestamp due_date = 6 [
    (google.api.field_behavior) = OPTIONAL,
    (buf.validate.field).timestamp = {
      gt_now: true
    }
  ];

  // Task creation time
  google.protobuf.Timestamp create_time = 7 [
    (google.api.field_behavior) = OUTPUT_ONLY
  ];

  // Task last update time
  google.protobuf.Timestamp update_time = 8 [
    (google.api.field_behavior) = OUTPUT_ONLY
  ];

  // User who created the task
  string created_by = 9 [
    (google.api.field_behavior) = OUTPUT_ONLY
  ];

  // Tags associated with the task
  repeated string tags = 10 [
    (buf.validate.field).repeated = {
      max_items: 10
      items: {
        string: {
          pattern: "^[a-z0-9-]+$"
          max_len: 50
        }
      }
    }
  ];
}

// Task status enumeration
enum Status {
  STATUS_UNSPECIFIED = 0;
  STATUS_PENDING = 1;
  STATUS_IN_PROGRESS = 2;
  STATUS_COMPLETED = 3;
  STATUS_CANCELLED = 4;
}

// Task priority enumeration
enum Priority {
  PRIORITY_UNSPECIFIED = 0;
  PRIORITY_LOW = 1;
  PRIORITY_MEDIUM = 2;
  PRIORITY_HIGH = 3;
  PRIORITY_CRITICAL = 4;
}

// CreateTaskRequest message
message CreateTaskRequest {
  // Task to create
  Task task = 1 [
    (google.api.field_behavior) = REQUIRED,
    (buf.validate.field).required = true
  ];
}

// GetTaskRequest message
message GetTaskRequest {
  // Resource name of the task
  string name = 1 [
    (google.api.field_behavior) = REQUIRED,
    (google.api.resource_reference) = {
      type: "todo.example.com/Task"
    },
    (buf.validate.field).string = {
      pattern: "^tasks/[a-zA-Z0-9-]+$"
    }
  ];
}

// ListTasksRequest message
message ListTasksRequest {
  // Maximum number of tasks to return
  int32 page_size = 1 [
    (buf.validate.field).int32 = {
      gte: 0
      lte: 1000
    }
  ];

  // Page token for pagination
  string page_token = 2;

  // Filter expression
  string filter = 3;

  // Order by expression
  string order_by = 4;
}

// ListTasksResponse message
message ListTasksResponse {
  // List of tasks
  repeated Task tasks = 1;

  // Token for next page
  string next_page_token = 2;

  // Total number of tasks
  int32 total_size = 3;
}

// UpdateTaskRequest message
message UpdateTaskRequest {
  // Task to update
  Task task = 1 [
    (google.api.field_behavior) = REQUIRED,
    (buf.validate.field).required = true
  ];

  // Fields to update
  google.protobuf.FieldMask update_mask = 2 [
    (google.api.field_behavior) = REQUIRED,
    (buf.validate.field).required = true
  ];
}

// DeleteTaskRequest message
message DeleteTaskRequest {
  // Resource name of the task
  string name = 1 [
    (google.api.field_behavior) = REQUIRED,
    (google.api.resource_reference) = {
      type: "todo.example.com/Task"
    }
  ];
}

// DeleteTaskResponse message
message DeleteTaskResponse {
  // Confirmation message
  string message = 1;
}

// BatchCreateTasksRequest message
message BatchCreateTasksRequest {
  // Tasks to create
  repeated CreateTaskRequest requests = 1 [
    (google.api.field_behavior) = REQUIRED,
    (buf.validate.field).repeated = {
      min_items: 1
      max_items: 100
    }
  ];
}

// BatchCreateTasksResponse message
message BatchCreateTasksResponse {
  // Created tasks
  repeated Task tasks = 1;
}

syntax = "proto3";

package errors.v1;

import "google/protobuf/timestamp.proto";
import "google/protobuf/any.proto";

option go_package = "github.com/bhatti/todo-api-errors/api/proto/errors/v1;errors";

// ErrorDetail provides a structured, machine-readable error payload.
// It is designed to be embedded in the `details` field of a `google.rpc.Status` message.
message ErrorDetail {
  // A unique, application-specific error code.
  string code = 1;
  // A short, human-readable summary of the problem type.
  string title = 2;
  // A human-readable explanation specific to this occurrence of the problem.
  string detail = 3;
  // A list of validation errors, useful for INVALID_ARGUMENT responses.
  repeated FieldViolation field_violations = 4;
  // Optional trace ID for request correlation
  string trace_id = 5;
  // Optional timestamp when the error occurred
  google.protobuf.Timestamp timestamp = 6;
  // Optional instance path where the error occurred
  string instance = 7;
  // Optional extensions for additional error context
  map<string, google.protobuf.Any> extensions = 8;
}

// Describes a single validation failure.
message FieldViolation {
  // The path to the field that failed validation, e.g., "title".
  string field = 1;
  // A developer-facing description of the validation rule that failed.
  string description = 2;
  // Application-specific error code for this validation failure
  string code = 3;
}

// AppErrorCode defines a list of standardized, application-specific error codes.
enum AppErrorCode {
  APP_ERROR_CODE_UNSPECIFIED = 0;

  // Validation failures
  VALIDATION_FAILED = 1;
  REQUIRED_FIELD = 2;
  TOO_SHORT = 3;
  TOO_LONG = 4;
  INVALID_FORMAT = 5;
  MUST_BE_FUTURE = 6;
  INVALID_VALUE = 7;
  DUPLICATE_TAG = 8;
  INVALID_TAG_FORMAT = 9;
  OVERDUE_COMPLETION = 10;
  EMPTY_BATCH = 11;
  BATCH_TOO_LARGE = 12;
  DUPLICATE_TITLE = 13;

  // Resource errors
  RESOURCE_NOT_FOUND = 1001;
  RESOURCE_CONFLICT = 1002;

  // Authentication and authorization
  AUTHENTICATION_FAILED = 2001;
  PERMISSION_DENIED = 2002;

  // Rate limiting and service availability
  RATE_LIMIT_EXCEEDED = 3001;
  SERVICE_UNAVAILABLE = 3002;

  // Internal errors
  INTERNAL_ERROR = 9001;
}

Error Handling Implementation

Now let’s implement our error handling framework:

package errors

import (
	"fmt"

	errorspb "github.com/bhatti/todo-api-errors/api/proto/errors/v1"
	"google.golang.org/genproto/googleapis/rpc/errdetails"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
	"google.golang.org/protobuf/types/known/anypb"
	"google.golang.org/protobuf/types/known/timestamppb"
)

// AppError is our custom error type using protobuf definitions.
type AppError struct {
	GRPCCode        codes.Code
	AppCode         errorspb.AppErrorCode
	Title           string
	Detail          string
	FieldViolations []*errorspb.FieldViolation
	TraceID         string
	Instance        string
	Extensions      map[string]*anypb.Any
	CausedBy        error // For internal logging
}

func (e *AppError) Error() string {
	return fmt.Sprintf("gRPC Code: %s, App Code: %s, Title: %s, Detail: %s", e.GRPCCode, e.AppCode, e.Title, e.Detail)
}

// ToGRPCStatus converts our AppError into a gRPC status.Status.
func (e *AppError) ToGRPCStatus() *status.Status {
	st := status.New(e.GRPCCode, e.Title)

	errorDetail := &errorspb.ErrorDetail{
		Code:            e.AppCode.String(),
		Title:           e.Title,
		Detail:          e.Detail,
		FieldViolations: e.FieldViolations,
		TraceId:         e.TraceID,
		Timestamp:       timestamppb.Now(),
		Instance:        e.Instance,
		Extensions:      e.Extensions,
	}

	// For validation errors, we also attach the standard BadRequest detail
	// so that gRPC-Gateway and other standard tools can understand it.
	if e.GRPCCode == codes.InvalidArgument && len(e.FieldViolations) > 0 {
		br := &errdetails.BadRequest{}
		for _, fv := range e.FieldViolations {
			br.FieldViolations = append(br.FieldViolations, &errdetails.BadRequest_FieldViolation{
				Field:       fv.Field,
				Description: fv.Description,
			})
		}
		st, _ = st.WithDetails(br, errorDetail)
		return st
	}

	st, _ = st.WithDetails(errorDetail)
	return st
}

// Helper functions for creating common errors

func NewValidationFailed(violations []*errorspb.FieldViolation, traceID string) *AppError {
	return &AppError{
		GRPCCode:        codes.InvalidArgument,
		AppCode:         errorspb.AppErrorCode_VALIDATION_FAILED,
		Title:           "Validation Failed",
		Detail:          fmt.Sprintf("The request contains %d validation errors", len(violations)),
		FieldViolations: violations,
		TraceID:         traceID,
	}
}

func NewNotFound(resource string, id string, traceID string) *AppError {
	return &AppError{
		GRPCCode: codes.NotFound,
		AppCode:  errorspb.AppErrorCode_RESOURCE_NOT_FOUND,
		Title:    "Resource Not Found",
		Detail:   fmt.Sprintf("%s with ID '%s' was not found.", resource, id),
		TraceID:  traceID,
	}
}

func NewConflict(resource, reason string, traceID string) *AppError {
	return &AppError{
		GRPCCode: codes.AlreadyExists,
		AppCode:  errorspb.AppErrorCode_RESOURCE_CONFLICT,
		Title:    "Resource Conflict",
		Detail:   fmt.Sprintf("Conflict creating %s: %s", resource, reason),
		TraceID:  traceID,
	}
}

func NewInternal(message string, traceID string, causedBy error) *AppError {
	return &AppError{
		GRPCCode: codes.Internal,
		AppCode:  errorspb.AppErrorCode_INTERNAL_ERROR,
		Title:    "Internal Server Error",
		Detail:   message,
		TraceID:  traceID,
		CausedBy: causedBy,
	}
}

func NewPermissionDenied(resource, action string, traceID string) *AppError {
	return &AppError{
		GRPCCode: codes.PermissionDenied,
		AppCode:  errorspb.AppErrorCode_PERMISSION_DENIED,
		Title:    "Permission Denied",
		Detail:   fmt.Sprintf("You don't have permission to %s %s", action, resource),
		TraceID:  traceID,
	}
}

func NewServiceUnavailable(message string, traceID string) *AppError {
	return &AppError{
		GRPCCode: codes.Unavailable,
		AppCode:  errorspb.AppErrorCode_SERVICE_UNAVAILABLE,
		Title:    "Service Unavailable",
		Detail:   message,
		TraceID:  traceID,
	}
}

func NewRequiredField(field, message string, traceID string) *AppError {
	return &AppError{
		GRPCCode: codes.InvalidArgument,
		AppCode:  errorspb.AppErrorCode_VALIDATION_FAILED,
		Title:    "Validation Failed",
		Detail:   "The request contains validation errors",
		FieldViolations: []*errorspb.FieldViolation{
			{
				Field:       field,
				Code:        errorspb.AppErrorCode_REQUIRED_FIELD.String(),
				Description: message,
			},
		},
		TraceID: traceID,
	}
}

Validation Framework

Let’s implement validation that returns all errors at once:

package validation

import (
	"errors"
	"fmt"
	"regexp"
	"strings"

	"buf.build/gen/go/bufbuild/protovalidate/protocolbuffers/go/buf/validate"
	"buf.build/go/protovalidate"
	errorspb "github.com/bhatti/todo-api-errors/api/proto/errors/v1"
	todopb "github.com/bhatti/todo-api-errors/api/proto/todo/v1"
	apperrors "github.com/bhatti/todo-api-errors/internal/errors"
	"google.golang.org/protobuf/proto"
)

var pv protovalidate.Validator

func init() {
	var err error
	pv, err = protovalidate.New()
	if err != nil {
		panic(fmt.Sprintf("failed to initialize protovalidator: %v", err))
	}
}

// ValidateRequest checks a proto message and returns an AppError with all violations.
func ValidateRequest(req proto.Message, traceID string) error {
	if err := pv.Validate(req); err != nil {
		var validationErrs *protovalidate.ValidationError
		if errors.As(err, &validationErrs) {
			var violations []*errorspb.FieldViolation
			for _, violation := range validationErrs.Violations {
				fieldPath := ""
				if violation.Proto.GetField() != nil {
					fieldPath = formatFieldPath(violation.Proto.GetField())
				}

				ruleId := violation.Proto.GetRuleId()
				message := violation.Proto.GetMessage()

				violations = append(violations, &errorspb.FieldViolation{
					Field:       fieldPath,
					Description: message,
					Code:        mapConstraintToCode(ruleId),
				})
			}
			return apperrors.NewValidationFailed(violations, traceID)
		}
		return apperrors.NewInternal("Validation failed", traceID, err)
	}
	return nil
}

// ValidateTask performs additional business logic validation
func ValidateTask(task *todopb.Task, traceID string) error {
	var violations []*errorspb.FieldViolation

	// Proto validation first
	if err := ValidateRequest(task, traceID); err != nil {
		if appErr, ok := err.(*apperrors.AppError); ok {
			violations = append(violations, appErr.FieldViolations...)
		}
	}

	// Additional business rules
	if task.Status == todopb.Status_STATUS_COMPLETED && task.DueDate != nil {
		if task.UpdateTime != nil && task.UpdateTime.AsTime().After(task.DueDate.AsTime()) {
			violations = append(violations, &errorspb.FieldViolation{
				Field:       "due_date",
				Code:        errorspb.AppErrorCode_OVERDUE_COMPLETION.String(),
				Description: "Task was completed after the due date",
			})
		}
	}

	// Validate tags format
	for i, tag := range task.Tags {
		if !isValidTag(tag) {
			violations = append(violations, &errorspb.FieldViolation{
				Field:       fmt.Sprintf("tags[%d]", i),
				Code:        errorspb.AppErrorCode_INVALID_TAG_FORMAT.String(),
				Description: fmt.Sprintf("Tag '%s' must be lowercase letters, numbers, and hyphens only", tag),
			})
		}
	}

	// Check for duplicate tags
	tagMap := make(map[string]bool)
	for i, tag := range task.Tags {
		if tagMap[tag] {
			violations = append(violations, &errorspb.FieldViolation{
				Field:       fmt.Sprintf("tags[%d]", i),
				Code:        errorspb.AppErrorCode_DUPLICATE_TAG.String(),
				Description: fmt.Sprintf("Tag '%s' appears multiple times", tag),
			})
		}
		tagMap[tag] = true
	}

	if len(violations) > 0 {
		return apperrors.NewValidationFailed(violations, traceID)
	}

	return nil
}

// ValidateBatchCreateTasks validates batch operations
func ValidateBatchCreateTasks(req *todopb.BatchCreateTasksRequest, traceID string) error {
	var violations []*errorspb.FieldViolation

	// Check batch size
	if len(req.Requests) == 0 {
		violations = append(violations, &errorspb.FieldViolation{
			Field:       "requests",
			Code:        errorspb.AppErrorCode_EMPTY_BATCH.String(),
			Description: "Batch must contain at least one task",
		})
	}

	if len(req.Requests) > 100 {
		violations = append(violations, &errorspb.FieldViolation{
			Field:       "requests",
			Code:        errorspb.AppErrorCode_BATCH_TOO_LARGE.String(),
			Description: fmt.Sprintf("Batch size %d exceeds maximum of 100", len(req.Requests)),
		})
	}

	// Validate each task
	for i, createReq := range req.Requests {
		if createReq.Task == nil {
			violations = append(violations, &errorspb.FieldViolation{
				Field:       fmt.Sprintf("requests[%d].task", i),
				Code:        errorspb.AppErrorCode_REQUIRED_FIELD.String(),
				Description: "Task is required",
			})
			continue
		}

		// Validate task
		if err := ValidateTask(createReq.Task, traceID); err != nil {
			if appErr, ok := err.(*apperrors.AppError); ok {
				for _, violation := range appErr.FieldViolations {
					violation.Field = fmt.Sprintf("requests[%d].task.%s", i, violation.Field)
					violations = append(violations, violation)
				}
			}
		}
	}

	// Check for duplicate titles
	titleMap := make(map[string][]int)
	for i, createReq := range req.Requests {
		if createReq.Task != nil && createReq.Task.Title != "" {
			titleMap[createReq.Task.Title] = append(titleMap[createReq.Task.Title], i)
		}
	}

	for title, indices := range titleMap {
		if len(indices) > 1 {
			for _, idx := range indices {
				violations = append(violations, &errorspb.FieldViolation{
					Field:       fmt.Sprintf("requests[%d].task.title", idx),
					Code:        errorspb.AppErrorCode_DUPLICATE_TITLE.String(),
					Description: fmt.Sprintf("Title '%s' is used by multiple tasks in the batch", title),
				})
			}
		}
	}

	if len(violations) > 0 {
		return apperrors.NewValidationFailed(violations, traceID)
	}

	return nil
}

// Helper functions
func formatFieldPath(fieldPath *validate.FieldPath) string {
	if fieldPath == nil {
		return ""
	}

	// Build field path from elements
	var parts []string
	for _, element := range fieldPath.GetElements() {
		if element.GetFieldName() != "" {
			parts = append(parts, element.GetFieldName())
		} else if element.GetFieldNumber() != 0 {
			parts = append(parts, fmt.Sprintf("field_%d", element.GetFieldNumber()))
		}
	}

	return strings.Join(parts, ".")
}

func mapConstraintToCode(ruleId string) string {
	switch {
	case strings.Contains(ruleId, "required"):
		return errorspb.AppErrorCode_REQUIRED_FIELD.String()
	case strings.Contains(ruleId, "min_len"):
		return errorspb.AppErrorCode_TOO_SHORT.String()
	case strings.Contains(ruleId, "max_len"):
		return errorspb.AppErrorCode_TOO_LONG.String()
	case strings.Contains(ruleId, "pattern"):
		return errorspb.AppErrorCode_INVALID_FORMAT.String()
	case strings.Contains(ruleId, "gt_now"):
		return errorspb.AppErrorCode_MUST_BE_FUTURE.String()
	case ruleId == "":
		return errorspb.AppErrorCode_VALIDATION_FAILED.String()
	default:
		return errorspb.AppErrorCode_INVALID_VALUE.String()
	}
}

var validTagPattern = regexp.MustCompile(`^[a-z0-9-]+$`)

func isValidTag(tag string) bool {
	return len(tag) <= 50 && validTagPattern.MatchString(tag)
}

Error Handler Middleware

Now let’s create middleware to handle errors consistently:

package middleware

import (
	"context"
	"errors"
	"log"

	apperrors "github.com/bhatti/todo-api-errors/internal/errors"
	"google.golang.org/grpc"
	"google.golang.org/grpc/status"
)

// UnaryErrorInterceptor translates application errors into gRPC statuses.
func UnaryErrorInterceptor(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (interface{}, error) {
	resp, err := handler(ctx, req)
	if err == nil {
		return resp, nil
	}

	var appErr *apperrors.AppError
	if errors.As(err, &appErr) {
		if appErr.CausedBy != nil {
			log.Printf("ERROR: %s, Original cause: %v", appErr.Title, appErr.CausedBy)
		}
		return nil, appErr.ToGRPCStatus().Err()
	}

	if _, ok := status.FromError(err); ok {
		return nil, err // Already a gRPC status
	}

	log.Printf("UNEXPECTED ERROR: %v", err)
	return nil, apperrors.NewInternal("An unexpected error occurred", "", err).ToGRPCStatus().Err()
}

package middleware

import (
	"context"
	"encoding/json"
	"net/http"
	"runtime/debug"
	"time"

	errorspb "github.com/bhatti/todo-api-errors/api/proto/errors/v1"
	apperrors "github.com/bhatti/todo-api-errors/internal/errors"
	"github.com/google/uuid"
	"github.com/grpc-ecosystem/grpc-gateway/v2/runtime"
	"go.opentelemetry.io/otel/trace"
	"google.golang.org/grpc/status"
	"google.golang.org/protobuf/encoding/protojson"
)

// HTTPErrorHandler handles errors for HTTP endpoints
func HTTPErrorHandler(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Add trace ID to context
		traceID := r.Header.Get("X-Trace-ID")
		if traceID == "" {
			traceID = uuid.New().String()
		}
		ctx := context.WithValue(r.Context(), "traceID", traceID)
		r = r.WithContext(ctx)

		// Create response wrapper to intercept errors
		wrapped := &responseWriter{
			ResponseWriter: w,
			request:        r,
			traceID:        traceID,
		}

		// Handle panics
		defer func() {
			if err := recover(); err != nil {
				handlePanic(wrapped, err)
			}
		}()

		// Process request
		next.ServeHTTP(wrapped, r)
	})
}

// responseWriter wraps http.ResponseWriter to intercept errors
type responseWriter struct {
	http.ResponseWriter
	request    *http.Request
	traceID    string
	statusCode int
	written    bool
}

func (w *responseWriter) WriteHeader(code int) {
	if !w.written {
		w.statusCode = code
		w.ResponseWriter.WriteHeader(code)
		w.written = true
	}
}

func (w *responseWriter) Write(b []byte) (int, error) {
	if !w.written {
		w.WriteHeader(http.StatusOK)
	}
	return w.ResponseWriter.Write(b)
}

// handlePanic converts panics to proper error responses
func handlePanic(w *responseWriter, recovered interface{}) {
	// Log stack trace
	debug.PrintStack()

	appErr := apperrors.NewInternal("An unexpected error occurred. Please try again later.", w.traceID, nil)
	writeErrorResponse(w, appErr)
}

// CustomHTTPError handles gRPC gateway error responses
func CustomHTTPError(ctx context.Context, mux *runtime.ServeMux,
	marshaler runtime.Marshaler, w http.ResponseWriter, r *http.Request, err error) {

	// Extract trace ID
	traceID := r.Header.Get("X-Trace-ID")
	if traceID == "" {
		if span := trace.SpanFromContext(ctx); span.SpanContext().IsValid() {
			traceID = span.SpanContext().TraceID().String()
		} else {
			traceID = uuid.New().String()
		}
	}

	// Convert gRPC error to HTTP response
	st, _ := status.FromError(err)

	// Check if we have our custom error detail in status details
	for _, detail := range st.Details() {
		if errorDetail, ok := detail.(*errorspb.ErrorDetail); ok {
			// Update the error detail with current request context
			errorDetail.TraceId = traceID
			errorDetail.Instance = r.URL.Path

			// Convert to JSON and write response
			w.Header().Set("Content-Type", "application/problem+json")
			w.WriteHeader(runtime.HTTPStatusFromCode(st.Code()))

			// Create a simplified JSON response that matches RFC 7807
			response := map[string]interface{}{
				"type":      getTypeForCode(errorDetail.Code),
				"title":     errorDetail.Title,
				"status":    runtime.HTTPStatusFromCode(st.Code()),
				"detail":    errorDetail.Detail,
				"instance":  errorDetail.Instance,
				"traceId":   errorDetail.TraceId,
				"timestamp": errorDetail.Timestamp,
			}

			// Add field violations if present
			if len(errorDetail.FieldViolations) > 0 {
				violations := make([]map[string]interface{}, len(errorDetail.FieldViolations))
				for i, fv := range errorDetail.FieldViolations {
					violations[i] = map[string]interface{}{
						"field":   fv.Field,
						"code":    fv.Code,
						"message": fv.Description,
					}
				}
				response["errors"] = violations
			}

			// Add extensions if present
			if len(errorDetail.Extensions) > 0 {
				extensions := make(map[string]interface{})
				for k, v := range errorDetail.Extensions {
					// Convert Any to JSON
					if jsonBytes, err := protojson.Marshal(v); err == nil {
						var jsonData interface{}
						if err := json.Unmarshal(jsonBytes, &jsonData); err == nil {
							extensions[k] = jsonData
						}
					}
				}
				if len(extensions) > 0 {
					response["extensions"] = extensions
				}
			}

			if err := json.NewEncoder(w).Encode(response); err != nil {
				http.Error(w, `{"error": "Failed to encode error response"}`, 500)
			}
			return
		}
	}

	// Fallback: create new error response
	fallbackErr := apperrors.NewInternal(st.Message(), traceID, nil)
	fallbackErr.GRPCCode = st.Code()
	writeAppErrorResponse(w, fallbackErr, r.URL.Path)
}

// Helper functions
func getTypeForCode(code string) string {
	switch code {
	case errorspb.AppErrorCode_VALIDATION_FAILED.String():
		return "https://api.example.com/errors/validation-failed"
	case errorspb.AppErrorCode_RESOURCE_NOT_FOUND.String():
		return "https://api.example.com/errors/resource-not-found"
	case errorspb.AppErrorCode_RESOURCE_CONFLICT.String():
		return "https://api.example.com/errors/resource-conflict"
	case errorspb.AppErrorCode_PERMISSION_DENIED.String():
		return "https://api.example.com/errors/permission-denied"
	case errorspb.AppErrorCode_INTERNAL_ERROR.String():
		return "https://api.example.com/errors/internal-error"
	case errorspb.AppErrorCode_SERVICE_UNAVAILABLE.String():
		return "https://api.example.com/errors/service-unavailable"
	default:
		return "https://api.example.com/errors/unknown"
	}
}

func writeErrorResponse(w http.ResponseWriter, err error) {
	if appErr, ok := err.(*apperrors.AppError); ok {
		writeAppErrorResponse(w, appErr, "")
	} else {
		http.Error(w, err.Error(), http.StatusInternalServerError)
	}
}

func writeAppErrorResponse(w http.ResponseWriter, appErr *apperrors.AppError, instance string) {
	statusCode := runtime.HTTPStatusFromCode(appErr.GRPCCode)

	response := map[string]interface{}{
		"type":      getTypeForCode(appErr.AppCode.String()),
		"title":     appErr.Title,
		"status":    statusCode,
		"detail":    appErr.Detail,
		"traceId":   appErr.TraceID,
		"timestamp": time.Now(),
	}

	if instance != "" {
		response["instance"] = instance
	}

	if len(appErr.FieldViolations) > 0 {
		violations := make([]map[string]interface{}, len(appErr.FieldViolations))
		for i, fv := range appErr.FieldViolations {
			violations[i] = map[string]interface{}{
				"field":   fv.Field,
				"code":    fv.Code,
				"message": fv.Description,
			}
		}
		response["errors"] = violations
	}

	w.Header().Set("Content-Type", "application/problem+json")
	w.WriteHeader(statusCode)
	json.NewEncoder(w).Encode(response)
}

Service Implementation

Now let’s implement our TODO service with proper error handling:

package service

import (
	"context"
	"fmt"
	todopb "github.com/bhatti/todo-api-errors/api/proto/todo/v1"
	"github.com/bhatti/todo-api-errors/internal/errors"
	"github.com/bhatti/todo-api-errors/internal/repository"
	"github.com/bhatti/todo-api-errors/internal/validation"
	"github.com/google/uuid"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/trace"
	"google.golang.org/protobuf/types/known/fieldmaskpb"
	"google.golang.org/protobuf/types/known/timestamppb"
	"strings"
)

var tracer = otel.Tracer("todo-service")

// TodoService implements the TODO API
type TodoService struct {
	todopb.UnimplementedTodoServiceServer
	repo repository.TodoRepository
}

// NewTodoService creates a new TODO service
func NewTodoService(repo repository.TodoRepository) (*TodoService, error) {
	return &TodoService{
		repo: repo,
	}, nil
}

// CreateTask creates a new task
func (s *TodoService) CreateTask(ctx context.Context, req *todopb.CreateTaskRequest) (*todopb.Task, error) {
	ctx, span := tracer.Start(ctx, "CreateTask")
	defer span.End()

	// Get trace ID for error responses
	traceID := span.SpanContext().TraceID().String()

	// Validate request
	if req.Task == nil {
		return nil, errors.NewRequiredField("task", "Task object is required", traceID)
	}

	// Validate task fields using the new validation package
	if err := validation.ValidateTask(req.Task, traceID); err != nil {
		span.SetAttributes(attribute.String("validation.error", err.Error()))
		return nil, err
	}

	// Check for duplicate title
	existing, err := s.repo.GetTaskByTitle(ctx, req.Task.Title)
	if err != nil && !repository.IsNotFound(err) {
		span.RecordError(err)
		return nil, s.handleRepositoryError(err, traceID)
	}

	if existing != nil {
		return nil, errors.NewConflict("task", "A task with this title already exists", traceID)
	}

	// Generate task ID
	taskID := uuid.New().String()
	task := &todopb.Task{
		Name:        fmt.Sprintf("tasks/%s", taskID),
		Title:       req.Task.Title,
		Description: req.Task.Description,
		Status:      req.Task.Status,
		Priority:    req.Task.Priority,
		DueDate:     req.Task.DueDate,
		Tags:        req.Task.Tags,
		CreateTime:  timestamppb.Now(),
		UpdateTime:  timestamppb.Now(),
		CreatedBy:   s.getUserFromContext(ctx),
	}

	// Set defaults
	if task.Status == todopb.Status_STATUS_UNSPECIFIED {
		task.Status = todopb.Status_STATUS_PENDING
	}
	if task.Priority == todopb.Priority_PRIORITY_UNSPECIFIED {
		task.Priority = todopb.Priority_PRIORITY_MEDIUM
	}

	// Save to repository
	if err := s.repo.CreateTask(ctx, task); err != nil {
		span.RecordError(err)
		return nil, s.handleRepositoryError(err, traceID)
	}

	span.SetAttributes(
		attribute.String("task.id", taskID),
		attribute.String("task.title", task.Title),
	)

	return task, nil
}

// GetTask retrieves a specific task
func (s *TodoService) GetTask(ctx context.Context, req *todopb.GetTaskRequest) (*todopb.Task, error) {
	ctx, span := tracer.Start(ctx, "GetTask")
	defer span.End()

	traceID := span.SpanContext().TraceID().String()

	// Validate request using the new validation package
	if err := validation.ValidateRequest(req, traceID); err != nil {
		return nil, err
	}

	// Extract task ID
	parts := strings.Split(req.Name, "/")
	if len(parts) != 2 || parts[0] != "tasks" {
		return nil, errors.NewRequiredField("name", "Task name must be in format 'tasks/{id}'", traceID)
	}

	taskID := parts[1]
	span.SetAttributes(attribute.String("task.id", taskID))

	// Get from repository
	task, err := s.repo.GetTask(ctx, taskID)
	if err != nil {
		if repository.IsNotFound(err) {
			return nil, errors.NewNotFound("Task", taskID, traceID)
		}
		span.RecordError(err)
		return nil, s.handleRepositoryError(err, traceID)
	}

	// Check permissions
	if !s.canAccessTask(ctx, task) {
		return nil, errors.NewPermissionDenied("task", "read", traceID)
	}

	return task, nil
}

// ListTasks retrieves all tasks
func (s *TodoService) ListTasks(ctx context.Context, req *todopb.ListTasksRequest) (*todopb.ListTasksResponse, error) {
	ctx, span := tracer.Start(ctx, "ListTasks")
	defer span.End()

	traceID := span.SpanContext().TraceID().String()

	// Validate request using the new validation package
	if err := validation.ValidateRequest(req, traceID); err != nil {
		return nil, err
	}

	// Default page size
	pageSize := req.PageSize
	if pageSize == 0 {
		pageSize = 50
	}
	if pageSize > 1000 {
		pageSize = 1000
	}

	span.SetAttributes(
		attribute.Int("page.size", int(pageSize)),
		attribute.String("filter", req.Filter),
	)

	// Parse filter
	filter, err := s.parseFilter(req.Filter)
	if err != nil {
		return nil, errors.NewRequiredField("filter", fmt.Sprintf("Failed to parse filter: %v", err), traceID)
	}

	// Get tasks from repository
	tasks, nextPageToken, err := s.repo.ListTasks(ctx, repository.ListOptions{
		PageSize:  int(pageSize),
		PageToken: req.PageToken,
		Filter:    filter,
		OrderBy:   req.OrderBy,
		UserID:    s.getUserFromContext(ctx),
	})

	if err != nil {
		span.RecordError(err)
		return nil, s.handleRepositoryError(err, traceID)
	}

	// Get total count
	totalSize, err := s.repo.CountTasks(ctx, filter, s.getUserFromContext(ctx))
	if err != nil {
		// Log but don't fail the request
		span.RecordError(err)
		totalSize = -1
	}

	return &todopb.ListTasksResponse{
		Tasks:         tasks,
		NextPageToken: nextPageToken,
		TotalSize:     int32(totalSize),
	}, nil
}

// UpdateTask updates an existing task
func (s *TodoService) UpdateTask(ctx context.Context, req *todopb.UpdateTaskRequest) (*todopb.Task, error) {
	ctx, span := tracer.Start(ctx, "UpdateTask")
	defer span.End()

	traceID := span.SpanContext().TraceID().String()

	// Validate request
	if req.Task == nil {
		return nil, errors.NewRequiredField("task", "Task object is required", traceID)
	}

	if req.UpdateMask == nil || len(req.UpdateMask.Paths) == 0 {
		return nil, errors.NewRequiredField("update_mask", "Update mask must specify which fields to update", traceID)
	}

	// Extract task ID
	parts := strings.Split(req.Task.Name, "/")
	if len(parts) != 2 || parts[0] != "tasks" {
		return nil, errors.NewRequiredField("task.name", "Invalid task name format", traceID)
	}

	taskID := parts[1]
	span.SetAttributes(attribute.String("task.id", taskID))

	// Get existing task
	existing, err := s.repo.GetTask(ctx, taskID)
	if err != nil {
		if repository.IsNotFound(err) {
			return nil, errors.NewNotFound("Task", taskID, traceID)
		}
		return nil, s.handleRepositoryError(err, traceID)
	}

	// Check permissions
	if !s.canModifyTask(ctx, existing) {
		return nil, errors.NewPermissionDenied("task", "update", traceID)
	}

	// Apply updates based on field mask
	updated := s.applyFieldMask(existing, req.Task, req.UpdateMask)
	updated.UpdateTime = timestamppb.Now()

	// Validate updated task using the new validation package
	if err := validation.ValidateTask(updated, traceID); err != nil {
		return nil, err
	}

	// Save to repository
	if err := s.repo.UpdateTask(ctx, updated); err != nil {
		span.RecordError(err)
		return nil, s.handleRepositoryError(err, traceID)
	}

	return updated, nil
}

// DeleteTask removes a task
func (s *TodoService) DeleteTask(ctx context.Context, req *todopb.DeleteTaskRequest) (*todopb.DeleteTaskResponse, error) {
	ctx, span := tracer.Start(ctx, "DeleteTask")
	defer span.End()

	traceID := span.SpanContext().TraceID().String()

	// Validate request using the new validation package
	if err := validation.ValidateRequest(req, traceID); err != nil {
		return nil, err
	}

	// Extract task ID
	parts := strings.Split(req.Name, "/")
	if len(parts) != 2 || parts[0] != "tasks" {
		return nil, errors.NewRequiredField("name", "Invalid task name format", traceID)
	}

	taskID := parts[1]
	span.SetAttributes(attribute.String("task.id", taskID))

	// Get existing task to check permissions
	existing, err := s.repo.GetTask(ctx, taskID)
	if err != nil {
		if repository.IsNotFound(err) {
			return nil, errors.NewNotFound("Task", taskID, traceID)
		}
		return nil, s.handleRepositoryError(err, traceID)
	}

	// Check permissions
	if !s.canModifyTask(ctx, existing) {
		return nil, errors.NewPermissionDenied("task", "delete", traceID)
	}

	// Delete from repository
	if err := s.repo.DeleteTask(ctx, taskID); err != nil {
		span.RecordError(err)
		return nil, s.handleRepositoryError(err, traceID)
	}

	return &todopb.DeleteTaskResponse{
		Message: fmt.Sprintf("Task %s deleted successfully", req.Name),
	}, nil
}

// BatchCreateTasks creates multiple tasks at once
func (s *TodoService) BatchCreateTasks(ctx context.Context, req *todopb.BatchCreateTasksRequest) (*todopb.BatchCreateTasksResponse, error) {
	ctx, span := tracer.Start(ctx, "BatchCreateTasks")
	defer span.End()

	traceID := span.SpanContext().TraceID().String()

	// Validate batch request using the new validation package
	if err := validation.ValidateBatchCreateTasks(req, traceID); err != nil {
		span.SetAttributes(attribute.String("validation.error", err.Error()))
		return nil, err
	}

	// Process each task
	var created []*todopb.Task
	var batchErrors []string

	for i, createReq := range req.Requests {
		task, err := s.CreateTask(ctx, createReq)
		if err != nil {
			// Collect errors for batch response
			batchErrors = append(batchErrors, fmt.Sprintf("Task %d: %s", i, err.Error()))
			continue
		}
		created = append(created, task)
	}

	// If all tasks failed, return error
	if len(created) == 0 && len(batchErrors) > 0 {
		return nil, errors.NewInternal("All batch operations failed", traceID, nil)
	}

	// Return partial success
	response := &todopb.BatchCreateTasksResponse{
		Tasks: created,
	}

	// Add partial errors to response metadata if any
	if len(batchErrors) > 0 {
		span.SetAttributes(
			attribute.Int("batch.total", len(req.Requests)),
			attribute.Int("batch.success", len(created)),
			attribute.Int("batch.failed", len(batchErrors)),
		)
	}

	return response, nil
}

// Helper methods

func (s *TodoService) handleRepositoryError(err error, traceID string) error {
	if repository.IsConnectionError(err) {
		return errors.NewServiceUnavailable("Unable to connect to the database. Please try again later.", traceID)
	}

	// Log internal error details
	span := trace.SpanFromContext(context.Background())
	if span != nil {
		span.RecordError(err)
	}

	return errors.NewInternal("An unexpected error occurred while processing your request", traceID, err)
}

func (s *TodoService) getUserFromContext(ctx context.Context) string {
	// In a real implementation, this would extract user info from auth context
	if user, ok := ctx.Value("user").(string); ok {
		return user
	}
	return "anonymous"
}

func (s *TodoService) canAccessTask(ctx context.Context, task *todopb.Task) bool {
	// In a real implementation, check if user can access this task
	user := s.getUserFromContext(ctx)
	return user == task.CreatedBy || user == "admin"
}

func (s *TodoService) canModifyTask(ctx context.Context, task *todopb.Task) bool {
	// In a real implementation, check if user can modify this task
	user := s.getUserFromContext(ctx)
	return user == task.CreatedBy || user == "admin"
}

func (s *TodoService) parseFilter(filter string) (map[string]interface{}, error) {
	// Simple filter parser - in production, use a proper parser
	parsed := make(map[string]interface{})

	if filter == "" {
		return parsed, nil
	}

	// Example: "status=COMPLETED AND priority=HIGH"
	parts := strings.Split(filter, " AND ")
	for _, part := range parts {
		kv := strings.Split(strings.TrimSpace(part), "=")
		if len(kv) != 2 {
			return nil, fmt.Errorf("invalid filter expression: %s", part)
		}

		key := strings.TrimSpace(kv[0])
		value := strings.Trim(strings.TrimSpace(kv[1]), "'\"")

		// Validate filter keys
		switch key {
		case "status", "priority", "created_by":
			parsed[key] = value
		default:
			return nil, fmt.Errorf("unknown filter field: %s", key)
		}
	}

	return parsed, nil
}

func (s *TodoService) applyFieldMask(existing, update *todopb.Task, mask *fieldmaskpb.FieldMask) *todopb.Task {
	result := *existing

	for _, path := range mask.Paths {
		switch path {
		case "title":
			result.Title = update.Title
		case "description":
			result.Description = update.Description
		case "status":
			result.Status = update.Status
		case "priority":
			result.Priority = update.Priority
		case "due_date":
			result.DueDate = update.DueDate
		case "tags":
			result.Tags = update.Tags
		}
	}
	return &result
}

Server Implementation

Now let’s put it all together in our server:

package main

import (
	"context"
	"fmt"
	"log"
	"net"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"

	todopb "github.com/bhatti/todo-api-errors/api/proto/todo/v1"
	"github.com/bhatti/todo-api-errors/internal/middleware"
	"github.com/bhatti/todo-api-errors/internal/monitoring"
	"github.com/bhatti/todo-api-errors/internal/repository"
	"github.com/bhatti/todo-api-errors/internal/service"

	"github.com/grpc-ecosystem/grpc-gateway/v2/runtime"
	"github.com/prometheus/client_golang/prometheus/promhttp"
	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/reflection"
	"google.golang.org/grpc/status"
	"google.golang.org/protobuf/encoding/protojson"
)

func main() {
	// Initialize monitoring
	if err := monitoring.InitOpenTelemetryMetrics(); err != nil {
		log.Printf("Failed to initialize OpenTelemetry metrics: %v", err)
		// Continue without OpenTelemetry - Prometheus will still work
	}

	// Initialize repository
	repo := repository.NewInMemoryRepository()

	// Initialize service
	todoService, err := service.NewTodoService(repo)
	if err != nil {
		log.Fatalf("Failed to create service: %v", err)
	}

	// Start gRPC server
	grpcPort := ":50051"
	go func() {
		if err := startGRPCServer(grpcPort, todoService); err != nil {
			log.Fatalf("Failed to start gRPC server: %v", err)
		}
	}()

	// Start HTTP gateway
	httpPort := ":8080"
	go func() {
		if err := startHTTPGateway(httpPort, grpcPort); err != nil {
			log.Fatalf("Failed to start HTTP gateway: %v", err)
		}
	}()

	// Start metrics server
	go func() {
		http.Handle("/metrics", promhttp.Handler())
		if err := http.ListenAndServe(":9090", nil); err != nil {
			log.Printf("Failed to start metrics server: %v", err)
		}
	}()

	log.Printf("TODO API server started")
	log.Printf("gRPC server listening on %s", grpcPort)
	log.Printf("HTTP gateway listening on %s", httpPort)
	log.Printf("Metrics available at :9090/metrics")

	// Wait for interrupt signal
	sigCh := make(chan os.Signal, 1)
	signal.Notify(sigCh, syscall.SIGINT, syscall.SIGTERM)
	<-sigCh

	log.Println("Shutting down...")
}

func startGRPCServer(port string, todoService todopb.TodoServiceServer) error {
	lis, err := net.Listen("tcp", port)
	if err != nil {
		return fmt.Errorf("failed to listen: %w", err)
	}

	// Create gRPC server with interceptors - now using the new UnaryErrorInterceptor
	opts := []grpc.ServerOption{
		grpc.ChainUnaryInterceptor(
			middleware.UnaryErrorInterceptor, // Using new protobuf-based error interceptor
			loggingInterceptor(),
			recoveryInterceptor(),
		),
	}

	server := grpc.NewServer(opts...)

	// Register service
	todopb.RegisterTodoServiceServer(server, todoService)

	// Register reflection for debugging
	reflection.Register(server)

	return server.Serve(lis)
}

func startHTTPGateway(httpPort, grpcPort string) error {
	ctx := context.Background()

	// Create gRPC connection
	conn, err := grpc.DialContext(
		ctx,
		"localhost"+grpcPort,
		grpc.WithTransportCredentials(insecure.NewCredentials()),
	)
	if err != nil {
		return fmt.Errorf("failed to dial gRPC server: %w", err)
	}

	// Create gateway mux with custom error handler
	mux := runtime.NewServeMux(
		runtime.WithErrorHandler(middleware.CustomHTTPError), // Using new protobuf-based error handler
		runtime.WithMarshalerOption(runtime.MIMEWildcard, &runtime.JSONPb{
			MarshalOptions: protojson.MarshalOptions{
				UseProtoNames:   true,
				EmitUnpopulated: false,
			},
			UnmarshalOptions: protojson.UnmarshalOptions{
				DiscardUnknown: true,
			},
		}),
	)

	// Register service handler
	if err := todopb.RegisterTodoServiceHandler(ctx, mux, conn); err != nil {
		return fmt.Errorf("failed to register service handler: %w", err)
	}

	// Create HTTP server with middleware
	handler := middleware.HTTPErrorHandler( // Using new protobuf-based HTTP error handler
		corsMiddleware(
			authMiddleware(
				loggingHTTPMiddleware(mux),
			),
		),
	)

	server := &http.Server{
		Addr:         httpPort,
		Handler:      handler,
		ReadTimeout:  10 * time.Second,
		WriteTimeout: 10 * time.Second,
		IdleTimeout:  120 * time.Second,
	}

	return server.ListenAndServe()
}

// Middleware implementations

func loggingInterceptor() grpc.UnaryServerInterceptor {
	return func(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (interface{}, error) {
		start := time.Now()

		// Call handler
		resp, err := handler(ctx, req)

		// Log request
		duration := time.Since(start)
		statusCode := "OK"
		if err != nil {
			statusCode = status.Code(err).String()
		}

		log.Printf("gRPC: %s %s %s %v", info.FullMethod, statusCode, duration, err)

		return resp, err
	}
}

func recoveryInterceptor() grpc.UnaryServerInterceptor {
	return func(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (resp interface{}, err error) {
		defer func() {
			if r := recover(); r != nil {
				log.Printf("Recovered from panic: %v", r)
				monitoring.RecordPanicRecovery(ctx)
				err = status.Error(codes.Internal, "Internal server error")
			}
		}()

		return handler(ctx, req)
	}
}

func loggingHTTPMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()

		// Wrap response writer to capture status
		wrapped := &statusResponseWriter{ResponseWriter: w, statusCode: http.StatusOK}

		// Process request
		next.ServeHTTP(wrapped, r)

		// Log request
		duration := time.Since(start)
		log.Printf("HTTP: %s %s %d %v", r.Method, r.URL.Path, wrapped.statusCode, duration)
	})
}

func corsMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Access-Control-Allow-Origin", "*")
		w.Header().Set("Access-Control-Allow-Methods", "GET, POST, PUT, DELETE, OPTIONS, PATCH")
		w.Header().Set("Access-Control-Allow-Headers", "Content-Type, Authorization, X-Trace-ID")

		if r.Method == "OPTIONS" {
			w.WriteHeader(http.StatusOK)
			return
		}

		next.ServeHTTP(w, r)
	})
}

func authMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Simple auth for demo - in production use proper authentication
		authHeader := r.Header.Get("Authorization")
		if authHeader == "" {
			authHeader = "Bearer anonymous"
		}

		// Extract user from token
		user := "anonymous"
		if len(authHeader) > 7 && authHeader[:7] == "Bearer " {
			user = authHeader[7:]
		}

		// Add user to context
		ctx := context.WithValue(r.Context(), "user", user)
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

type statusResponseWriter struct {
	http.ResponseWriter
	statusCode int
}

func (w *statusResponseWriter) WriteHeader(code int) {
	w.statusCode = code
	w.ResponseWriter.WriteHeader(code)
}

Example API Usage

Let’s see our error handling in action with some example requests:

Example 1: Validation Error with Multiple Issues

Request with multiple validation errors

curl -X POST http://localhost:8080/v1/tasks \
-H "Content-Type: application/json" \
-d '{
"task": {
"title": "",
"description": "This description is wayyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy too long…",
"status": "INVALID_STATUS",
"tags": ["INVALID TAG", "tag-1", "tag-1"]
}
}'

Response

< HTTP/1.1 422 Unprocessable Entity
< Content-Type: application/problem+json
{
  "detail": "The request contains 5 validation errors",
  "errors": [
    {
      "code": "TOO_SHORT",
      "field": "title",
      "message": "value length must be at least 1 characters"
    },
    {
      "code": "TOO_LONG",
      "field": "description",
      "message": "value length must be at most 100 characters"
    },
    {
      "code": "INVALID_FORMAT",
      "field": "tags",
      "message": "value does not match regex pattern `^[a-z0-9-]+$`"
    },
    {
      "code": "INVALID_TAG_FORMAT",
      "field": "tags[0]",
      "message": "Tag 'INVALID TAG' must be lowercase letters, numbers, and hyphens only"
    },
    {
      "code": "DUPLICATE_TAG",
      "field": "tags[2]",
      "message": "Tag 'tag-1' appears multiple times"
    }
  ],
  "instance": "/v1/tasks",
  "status": 400,
  "timestamp": {
    "seconds": 1755288524,
    "nanos": 484865000
  },
  "title": "Validation Failed",
  "traceId": "eb4bfb3f-9397-4547-8618-ce9952a16067",
  "type": "https://api.example.com/errors/validation-failed"
}

Example 2: Not Found Error

Request for non-existent task

curl http://localhost:8080/v1/tasks/non-existent-id

Response

< HTTP/1.1 404 Not Found
< Content-Type: application/problem+json
{
  "detail": "Task with ID 'non-existent-id' was not found.",
  "instance": "/v1/tasks/non-existent-id",
  "status": 404,
  "timestamp": {
    "seconds": 1755288565,
    "nanos": 904607000
  },
  "title": "Resource Not Found",
  "traceId": "6ce00cd8-d0b7-47f1-b6f6-9fc1375c26a4",
  "type": "https://api.example.com/errors/resource-not-found"
}

Example 3: Conflict Error

curl -X POST http://localhost:8080/v1/tasks \
-H "Content-Type: application/json" \
-d '{
"task": {
"title": "Existing Task Title"
}
}'

curl -X POST http://localhost:8080/v1/tasks \
-H "Content-Type: application/json" \
-d '{
"task": {
"title": "Existing Task Title"
}
}'

Response

< HTTP/1.1 409 Conflict
< Content-Type: application/problem+json
{
  "detail": "Conflict creating task: A task with this title already exists",
  "instance": "/v1/tasks",
  "status": 409,
  "timestamp": {
    "seconds": 1755288593,
    "nanos": 594458000
  },
  "title": "Resource Conflict",
  "traceId": "ed2e78d2-591d-492a-8d71-6b6843ce86f7",
  "type": "https://api.example.com/errors/resource-conflict"
}

Example 4: Service Unavailable (Transient Error)

When database is down

curl http://localhost:8080/v1/tasks

Response

HTTP/1.1 503 Service Unavailable
Content-Type: application/problem+json
Retry-After: 30
{
  "type": "https://api.example.com/errors/service-unavailable",
  "title": "Service Unavailable",
  "status": 503,
  "detail": "Database connection pool exhausted. Please try again later.",
  "instance": "/v1/tasks",
  "traceId": "db-pool-001",
  "timestamp": "2025-08-15T10:30:00Z",
  "extensions": {
    "retryable": true,
    "retryAfter": "2025-08-15T10:30:30Z",
    "maxRetries": 3,
    "backoffType": "exponential",
    "backoffMs": 1000,
    "errorCategory": "database"
  }
}

Best Practices Summary

Our implementation demonstrates several key best practices:

1. Consistent Error Format

All errors follow RFC 9457 (Problem Details) format, providing:

Machine-readable type URIs
Human-readable titles and details
HTTP status codes
Request tracing
Extensible metadata

2. Comprehensive Validation

All validation errors are returned at once, not one by one
Clear field paths for nested objects
Descriptive error codes and messages
Support for batch operations with partial success

3. Security-Conscious Design

No sensitive information in error messages
Internal errors are logged but not exposed
Generic messages for authentication failures
Request IDs for support without exposing internals

4. Developer Experience

Clear, actionable error messages
Helpful suggestions for fixing issues
Consistent error codes across protocols
Rich metadata for debugging

5. Protocol Compatibility

Seamless translation between gRPC and HTTP
Proper status code mapping
Preservation of error details across protocols

6. Observability

Structured logging with trace IDs
Prometheus metrics for monitoring
OpenTelemetry integration
Error categorization for analysis

Conclusion

This comprehensive guide demonstrates how to build robust error handling for modern APIs. By treating errors as a first-class feature of our API, we’ve achieved several key benefits:

Consistency: All errors, regardless of their source, are presented to clients in a predictable format.
Clarity: Developers consuming our API get clear, actionable feedback, helping them debug and integrate faster.
Developer Ergonomics: Our internal service code is cleaner, as handlers focus on business logic while the middleware handles the boilerplate of error conversion.
Security: We have a clear separation between internal error details (for logging) and public error responses, preventing leaks.

Shahzad Bhatti Welcome to my ramblings and rants!

August 15, 2025