Domain Fine-Tuning
QLoRA training runs across four domains: IT support, academic tutoring, financial analysis, and code generation — each producing a distinct LoRA adapter loadable without restarting the base model.
Sovereign AI infrastructure: fine-tune, serve, and optimize open-weight models entirely on your own hardware.
As open-weight AI models become increasingly capable, the question of how to run, adapt, and operationalize them on local hardware has become practically important — for sovereignty, for data sensitivity, and for applications where inference cannot cross a network boundary. This research project built a complete local AI infrastructure platform to investigate these questions from first principles.
The platform serves two related purposes: a fine-tuning laboratory for adapting open-weight models to specific domains using QLoRA at minimal hardware footprint, and a production-grade inference server that routes between multiple fine-tuned adapters simultaneously via vLLM's OpenAI-compatible API. Both components run on consumer hardware: two NVIDIA RTX 3080s in a Windows WSL2 environment.
The most substantive finding was the AI-guided optimization loop — a system where a fine-tuned local model analyzes the results of prior experiments and proposes parameter configurations for subsequent ones. The model acts as a domain-specific optimizer with knowledge baked into its weights, outperforming random and grid search at equivalent iteration counts.
Architecture
Fine-tuning runs through LLaMA-Factory with Unsloth's kernel optimizations — reducing memory requirements enough to fit useful 7B–14B parameter models on 10GB-VRAM consumer cards using 4-bit quantization. Multi-GPU training uses DeepSpeed ZeRO-2/3 configurations to distribute optimizer state across both cards without a managed cluster.
Training datasets are built through a naturalization pipeline: raw structured data is converted into the natural language format models reason over most effectively. A numerical record becomes a prose sentence — the format the model was pretrained to understand, and the format it will see at inference time.
The inference layer is vLLM, configured to serve multiple LoRA adapters simultaneously from a single loaded base model. One server, multiple specialized variants — domain-specific expertise available via a single OpenAI-compatible endpoint that any existing tooling can consume without modification. DuckDB handles columnar analytics over experiment histories, providing millisecond aggregate queries across thousands of recorded runs.
Research Areas
QLoRA training runs across four domains: IT support, academic tutoring, financial analysis, and code generation — each producing a distinct LoRA adapter loadable without restarting the base model.
A two-phase optimization loop: Latin Hypercube Sampling explores the parameter space broadly in the first 20% of iterations, then a fine-tuned model guides exploitation — analyzing Sharpe ratio, drawdown, and win rate to propose the next configuration.
vLLM configured to serve multiple LoRA adapters from a single base model instance — exposing each via a unique model ID on an OpenAI-compatible endpoint, with no client-side changes required.
A pipeline converting raw structured records into natural language training examples — the format models reason over most effectively at inference time, improving task-specific accuracy measurably over structured prompt injection alone.
DuckDB stores every training run, backtest result, and optimization iteration. Fast aggregate queries surface relationships across thousands of experiments — which configurations cluster, which metrics correlate, where the model's suggestions were wrong.
A Typer-based CLI covers server lifecycle management, adapter loading and routing, dataset preparation, training invocation, and backtest orchestration — the full research workflow accessible from a single interface.
Ongoing Research
The findings from this platform inform how we build AI-powered tools across the portfolio — particularly where data cannot leave a device or a controlled environment. The research continues.