AI Engineering in Production
From RAG pipelines and vector databases to MCP servers and agent security — the operational patterns for shipping LLM-backed systems that survive contact with real traffic.
Articles in this series
Building Production RAG Pipelines: Chunking, Embeddings, and Retrieval at Scale
Build RAG systems that work in production: chunking strategies, embedding selection, pgvector ops, and retrieval quality evaluation.
Vector Databases Compared: pgvector vs Pinecone vs Weaviate
Compare pgvector, Pinecone, Weaviate, Qdrant, Milvus, and Chroma on performance, cost, and operational fit — with real code and each database's documented performance envelope.
LLM API Integration Patterns for Backend Engineers
Production LLM API patterns: streaming, function calling, retries, token budgets, cost optimization, and observability for backend engineers.
Spring AI in Production: RAG Pipelines, Reliability, and Observability for Java Backends
Spring AI 1.1 deep dive: a production RAG pipeline with PII scrubbing, circuit breakers, Micrometer observability, and answer evaluation.
Building an MCP Server in Go with Code Mode: From 1.17M Tokens to 1,000
Expose 2,500 API endpoints through one MCP server without overflowing the context window. The Code Mode pattern uses search + execute tools to cut token cost by more than 1,000x.
Securing AI Agent Infrastructure: MCP Servers, Tool Calls, and the Attack Surface You're Not Watching
AI agents calling tools via MCP create new attack surfaces: prompt injection through tool responses, credential leakage, and unauthorized execution.