TL;DR: Infrastructure should be application-aware, statically typed, and predictive.
Generic orchestration fails because it ignores workload-specific dynamics.
I'm designing type-theory-based cloud primitives that let applications deploy themselves.
Long-Term Vision: Autonomous Infrastructure
I'm building toward distributed systems that manage themselves: self-aware,
self-regulating, self-scaling, self-healing, self-governing.
Norbert Wiener described cybernetic feedback systems in 1948.
IBM's Autonomic Computing manifesto (2001) outlined the same goal. Kubernetes got
partway there with reconciliation loops. DePIN adds economic self-governance.
The vision keeps recurring because it's the right direction. Execution is hard.
I'm executing it for a specific domain where the value is concrete and measurable.
Why Kubernetes Failed (And My Approach)
Problem 1: Centralized Control Plane
Kubernetes complexity is a direct consequence of client-server architecture. One control plane
(API server, etcd, scheduler, controller-manager) must coordinate all nodes. This creates:
single point of failure, HA complexity (etcd quorum, leader election), scaling bottlenecks
(all decisions through one brain), and operational burden (managing the control plane is a job).
My approach: P2P choreography. Each node runs its own orchestrator and
coordinates with peers through discovery. No central brain to fail, scale, or operate.
Like a flock of birds—no conductor, yet perfectly coordinated. The complexity of K8s
evaporates when you remove the requirement for a single point of coordination.
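As a rough sketch of what this looks like in code (the types and method names below are illustrative, not Synkti's actual API), each node runs a small local orchestrator that discovers peers and makes its own placement decisions:

```rust
use std::collections::HashMap;

// Illustrative only: every node runs this loop locally; there is no cluster-wide controller.
#[derive(Clone)]
struct Peer {
    id: String,
    free_gpu_mem_gb: u32,
}

struct NodeOrchestrator {
    id: String,
    peers: HashMap<String, Peer>,
}

impl NodeOrchestrator {
    fn new(id: &str) -> Self {
        Self { id: id.to_string(), peers: HashMap::new() }
    }

    // Discovery could be EC2 tags today or a libp2p DHT later; stubbed here.
    fn discover_peers(&mut self) {
        self.peers.insert(
            "node-b".into(),
            Peer { id: "node-b".into(), free_gpu_mem_gb: 40 },
        );
    }

    // Each node decides for itself, using only local state plus gossiped peer state.
    fn decide(&self) -> Option<&Peer> {
        self.peers.values().max_by_key(|p| p.free_gpu_mem_gb)
    }
}

fn main() {
    let mut node = NodeOrchestrator::new("node-a");
    node.discover_peers();
    if let Some(target) = node.decide() {
        println!("{} would migrate pressure to {}", node.id, target.id);
    }
}
```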
Problem 2: The Reconciliation Tax (State Drift)
Kubernetes maintains a model of cluster state in etcd—a centralized representation
of what the control plane believes is true. But the map is not the territory. Reality lives
at the nodes, and the model is always an approximation, always slightly stale, always drifting.
This creates the reconciliation tax: continuous CPU cycles diffing desired vs actual
state, network bandwidth syncing state to the center, exponential complexity handling edge cases
(what if reconciliation itself fails?). The control plane is perpetually playing catch-up with
a reality it can never fully observe. Network partitions turn drift into divergence. The model
becomes fiction.
My approach: truth at the edge. In P2P architecture, each node IS the authoritative
source of its own state. There is no central model to drift from reality—the state is distributed
where the work happens. When you need to know a node's state, you ask the node. No reconciliation
overhead. No stale cache. No fiction.
The deeper insight: if you encode all valid state transitions using type theory and mathematical
analysis, invalid states become unrepresentable at compile time. The "model" becomes the
code itself, which IS reality. No runtime reconciliation needed—the types prevent invalid states
from existing. K8s fixes drift at runtime; we prevent it at compile time.
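As a toy illustration of that idea (not Synkti's actual type definitions), Rust's typestate pattern makes an out-of-order lifecycle transition a compile error rather than a reconciliation case:

```rust
use std::marker::PhantomData;

// Lifecycle states as zero-sized types; names are illustrative.
struct Provisioned;
struct Deployed;
struct Serving;

struct Workload<State> {
    name: String,
    _state: PhantomData<State>,
}

impl Workload<Provisioned> {
    fn new(name: &str) -> Self {
        Workload { name: name.to_string(), _state: PhantomData }
    }
    // The only way to reach `Deployed` is through `deploy`.
    fn deploy(self) -> Workload<Deployed> {
        Workload { name: self.name, _state: PhantomData }
    }
}

impl Workload<Deployed> {
    fn serve(self) -> Workload<Serving> {
        Workload { name: self.name, _state: PhantomData }
    }
}

fn main() {
    let w = Workload::<Provisioned>::new("llm-inference");
    let w = w.deploy().serve(); // valid transition chain
    // Workload::<Provisioned>::new("x").serve(); // does not compile: no `serve` before `deploy`
    println!("{} is serving", w.name);
}
```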
Problem 3: Workload Blindness
Kubernetes treats all pods as identical boxes. It knows CPU% and memory%, but not
KV cache growth rates, request batch patterns, or token generation curves.
Generic orchestration is mediocre at everything.
In practice, this blindness shows up as OOM kills during peak token generation, or a 3am page
for memory exhaustion driven by gradual KV cache growth that K8s never saw coming.
My approach: application-aware autonomy. Every application type has
characteristic stochastic patterns, extractable via DSP/FFT signal processing.
Once you know the frequencies, you can predict ("memory pressure peaks in 47 minutes"),
preempt ("scale up before daily traffic spike"), and diagnose ("this oscillation is anomalous").
Capacity Expansion, Not Cost Reduction
The industry fixates on cost savings; I focus on capacity expansion.
By orchestrating spot instances and treating compute as a fluid resource, we expand the total
throughput available for inference and other heavy workloads while also cutting spend.
A 70% cost reduction is roughly a 3x capacity multiplier: at 30% of the unit price, the same budget buys 1 / 0.3 ≈ 3.3 times as much compute.
This reframe matters: we're not making infrastructure cheaper—we're making previously impossible workloads possible.
The constraint isn't money, it's capacity. Spot orchestration unlocks stranded compute that would otherwise sit idle.
Type-Theory-Based Cloud Primitives
Kubernetes uses untyped YAML. Errors surface at runtime, often at 3am.
My approach: design type-theory-based cloud primitives.
Each deployment pattern is a type. Workload requirements are types. Infrastructure capabilities are types.
Mismatches are caught statically, before deployment.
The orchestrator ships with the application as a library, not as a separate
system the application is deployed onto. Static analysis deduces workload
requirements from the code itself. Applied category theory provides the mathematical
foundation for mapping between workload types and infrastructure types.
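A deliberately tiny illustration of the direction (the real primitives are richer than marker traits): encode a requirement like "needs a GPU" as a trait bound, so a mismatched target is rejected by the compiler rather than by a 3am incident:

```rust
// Capability marker traits; names are illustrative, not Synkti's actual primitives.
trait HasGpu {}
trait HasNvme {}

struct GpuNode;      // e.g., a spot GPU instance
#[allow(dead_code)]
struct CheapCpuNode; // e.g., a burstable CPU instance

impl HasGpu for GpuNode {}
impl HasNvme for GpuNode {}

struct LlmInference;

// The workload's requirements are expressed as trait bounds on the deployment target.
fn deploy<N: HasGpu + HasNvme>(_workload: LlmInference, _target: N) {
    println!("requirements satisfied; deploying");
}

fn main() {
    deploy(LlmInference, GpuNode);          // compiles: capabilities match requirements
    // deploy(LlmInference, CheapCpuNode);  // rejected at compile time: `HasGpu` not implemented
}
```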
The meta-shift: Traditional infrastructure deploys applications (passive).
In this model, applications become autonomous agents that navigate decentralized infrastructure,
find suitable nodes, self-deploy, and self-serve. The application does not wait to be
orchestrated. It orchestrates itself.
The Endgame: Decentralized Autonomy
Self-governing systems need trustless coordination. If no single entity should control
scheduling decisions, then the settlement layer must be permissionless.
On-chain settlement (Solana) enables this: off-chain execution for fast operational
decisions, on-chain coordination for disputes, payments, and reputation. The DePIN model
applied to GPU orchestration.
TL;DR: Building toward decentralized autonomous infrastructure.
Phase 1 validates the algorithms. Phase 2 proves production viability.
Phase 3 removes the central operator via on-chain settlement.
The Vision
Infrastructure that manages itself: self-aware, self-healing, self-governing.
Not another Kubernetes—application-aware orchestration that understands workload-specific patterns.
The logical endpoint: permissionless, decentralized operation. No single entity controls
scheduling decisions. Trustless coordination via on-chain settlement.
Three-Phase Roadmap
Phase 1: Algorithm Validation
- Kuhn-Munkres optimal migration (7-46% improvement over naive)
- Stateless failover with graceful draining
- Discrete-event simulation engine
- 2,191 LOC Rust, 32 tests, 243-scenario validation
Phase 2: Production Viability
- AWS multi-region integration
- Prognostics engine (ARIMA + FFT/DSP prediction)
- Pilot validation with early adopters
- European provider adapters (Hetzner, OVHcloud)
Phase 3: Decentralized Operation
- Solana on-chain settlement layer
- Trustless compute verification
- Permissionless node participation
- Economic self-governance (DePIN model)
Why Solana
The Problem with Centralized Orchestration
Current orchestration systems require trust in a central operator.
Scheduling decisions, compute verification, and dispute resolution all flow through
a single point of control. This creates vendor lock-in at the orchestration layer—the
very infrastructure meant to provide flexibility.
P2P Architecture: No Central Controller
Synkti's architecture is peer-to-peer from day one. Each node runs its own orchestrator,
discovers peers dynamically, and makes autonomous decisions. Self-aware, self-monitoring,
self-healing. No central control plane to fail, scale, or pay for. This P2P foundation
makes the transition to decentralized networks natural—only the discovery layer changes
(EC2 tags → libp2p).
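One way to read "only the discovery layer changes": put discovery behind a small trait so swapping EC2 tags for a libp2p DHT never touches the orchestrator. The trait and both implementations below are illustrative stubs:

```rust
// Peer addresses as opaque strings, purely for illustration.
trait PeerDiscovery {
    fn peers(&self) -> Vec<String>;
}

// Phase 2: discovery via cloud metadata (e.g., EC2 instances sharing a tag).
struct Ec2TagDiscovery {
    tag: String,
}

impl PeerDiscovery for Ec2TagDiscovery {
    fn peers(&self) -> Vec<String> {
        // Would call the EC2 API filtered by `self.tag`; stubbed here.
        vec![format!("ip-10-0-0-12 ({})", self.tag)]
    }
}

// Phase 3: discovery via a libp2p Kademlia DHT; stubbed here.
struct DhtDiscovery;

impl PeerDiscovery for DhtDiscovery {
    fn peers(&self) -> Vec<String> {
        vec!["12D3KooW... (dht peer)".to_string()]
    }
}

// The orchestrator only sees the trait, so the swap is a one-line change at the call site.
fn run(discovery: &dyn PeerDiscovery) {
    for p in discovery.peers() {
        println!("discovered {p}");
    }
}

fn main() {
    run(&Ec2TagDiscovery { tag: "synkti-cluster".into() });
    run(&DhtDiscovery);
}
```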
The Solution: On-Chain Settlement
Separate execution from settlement. Off-chain execution handles fast
operational decisions—migrations, failovers, scaling. On-chain settlement
handles trustless coordination—disputes, payments, reputation. No single entity controls
the feedback loop.
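A rough sketch of that split, with invented event types: operational decisions stay off-chain and fast, and only settlement-relevant facts are batched to the chain.

```rust
// Off-chain: decided and executed locally in milliseconds; never touches the chain.
#[allow(dead_code)]
enum OperationalEvent {
    Migrate { workload: String, from: String, to: String },
    Failover { workload: String, to: String },
    Scale { workload: String, replicas: u32 },
}

// On-chain: the small set of facts that need trustless settlement.
#[allow(dead_code)]
enum SettlementEvent {
    UsageReceipt { node: String, gpu_seconds: u64 },
    Dispute { node: String, claim: String },
    ReputationUpdate { node: String, delta: i32 },
}

// Only settlement events are serialized and submitted (batched) to Solana;
// the submission itself is stubbed here.
fn settle(events: &[SettlementEvent]) {
    println!("submitting {} settlement events on-chain", events.len());
}

fn main() {
    let _fast_path = OperationalEvent::Migrate {
        workload: "llm-inference".into(),
        from: "node-a".into(),
        to: "node-b".into(),
    };
    settle(&[SettlementEvent::UsageReceipt {
        node: "node-b".into(),
        gpu_seconds: 3_600,
    }]);
}
```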
Why Solana Specifically
- 400ms block times — Spot preemptions need sub-second response
- Sealevel runtime — Parallel state access enables non-blocking settlements across thousands of concurrent nodes
- DePIN ecosystem — Helium, Render Network precedent for physical infrastructure
- Rust-native — Same language as Synkti core, minimal context switching
The DePIN Thesis
Decentralized Physical Infrastructure Networks coordinate real-world resources without
central operators. GPU compute is physical infrastructure with volatile supply.
Synkti brings orchestration intelligence—optimal migration, stateless failover, workload prediction.
Solana brings trustless coordination—permissionless participation, cryptographic verification,
economic alignment. Together: a permissionless GPU marketplace with production-grade reliability.
Technical Foundation
- P2P peer discovery — EC2 tags (Phase 2) → libp2p DHT (Phase 3)
- Provably optimal algorithms — Kuhn-Munkres vs. greedy heuristics (see the assignment sketch after this list)
- Domain-agnostic architecture — Same core for inference, training, batch
- DSP/FFT signal processing — Extract workload patterns for prediction
- Type-theoretic primitives — Compile-time guarantees for cloud operations
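To make the Kuhn-Munkres vs. greedy point concrete, here is a minimal sketch assuming the pathfinding crate's Hungarian-algorithm implementation; the cost matrix is invented for illustration and is not Synkti's placement model:

```rust
// Cargo.toml (assumed): pathfinding = "4"
use pathfinding::prelude::{kuhn_munkres_min, Matrix};

fn main() {
    // Hypothetical migration-cost matrix: rows are workloads, columns are candidate nodes;
    // entry (i, j) is the cost of placing workload i on node j.
    let costs = Matrix::from_rows(vec![
        vec![1, 2, 8],
        vec![1, 9, 8],
        vec![4, 5, 1],
    ])
    .expect("rows must have equal length");

    // Kuhn-Munkres finds the assignment minimizing total cost (4 here: w0->n1, w1->n0, w2->n2).
    // A greedy pass, where each workload in order grabs its cheapest remaining node, pays 14.
    let (total_cost, assignment) = kuhn_munkres_min(&costs);
    println!("total cost = {total_cost}, workload -> node = {assignment:?}");
}
```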