Multi-Adapter Endpoints on AWS: Cost-Optimized Fine-Tuning with QLoRA for Multi-Customer Legal GenAI

Video by FINOS via YouTube

https://github.com/aws-samples/sagemaker-genai-hosting-examples/tree/main/06-examples/01-train-deploy-LoRA

Supreet, a senior GenAI solutions architect at AWS Startups (Frontier AI team), presents a case study on using SageMaker multi-adapter endpoints to serve a legal-tech SaaS with multiple customer datasets from a single model endpoint instead of one endpoint per customer. After limited success with RAG and prompt engineering, the startup moved to parameter-efficient fine-tuning with QLoRA on an open-weights model (tested with Mistral 7B) to reduce cost and training time; training on an initial ~100 MB dataset took about two hours. A single real-time endpoint hosts the base model, while multiple small adapters (about 50–200 MB each) are swapped in within milliseconds by routing logic (implemented with AWS Lambda, using hybrid keyword- and LLM-based routing) to meet sub-5-second latency targets. Supreet emphasizes adapter-level evaluation using NLP metrics and LLM-as-a-judge with SME input, and walks through an AWS architecture involving S3, SageMaker training and evaluation, a model registry, API Gateway, and monitoring.
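
The hybrid routing described above can be sketched roughly as follows. This is a minimal illustration, not the code from the talk: the adapter names, keyword lists, default adapter, and the {"inputs": ...} payload shape are all hypothetical. The only AWS-specific detail assumed here is that the SageMaker Runtime InvokeEndpoint call accepts EndpointName and InferenceComponentName, which is how a request can be directed to one adapter on a shared endpoint. The sketch only builds the request arguments, so it runs without AWS credentials.

```python
import json
from typing import Callable, Optional

# Hypothetical adapter names and trigger keywords (not from the talk).
ADAPTER_KEYWORDS = {
    "adapter-contracts": ("indemnity", "clause", "termination"),
    "adapter-litigation": ("plaintiff", "motion", "discovery"),
}
DEFAULT_ADAPTER = "adapter-contracts"  # assumed fallback


def route(prompt: str,
          llm_classifier: Optional[Callable[[str], Optional[str]]] = None) -> str:
    """Pick an adapter: keywords first, optional LLM classifier second."""
    text = prompt.lower()
    # Count keyword hits per adapter (simple substring matching).
    scores = {name: sum(kw in text for kw in kws)
              for name, kws in ADAPTER_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    ranked = sorted(scores.values(), reverse=True)
    # Accept the keyword result only when there is a clear winner.
    if ranked[0] > 0 and (len(ranked) == 1 or ranked[0] > ranked[1]):
        return best
    # Otherwise ask the (hypothetical) LLM classifier, if one was supplied.
    if llm_classifier is not None:
        choice = llm_classifier(prompt)
        if choice in ADAPTER_KEYWORDS:
            return choice
    return DEFAULT_ADAPTER


def build_invoke_kwargs(endpoint_name: str, prompt: str, adapter: str) -> dict:
    """Arguments one might pass to a SageMaker Runtime invoke_endpoint call."""
    return {
        "EndpointName": endpoint_name,
        "InferenceComponentName": adapter,  # selects the adapter
        "ContentType": "application/json",
        "Body": json.dumps({"inputs": prompt}),  # payload shape is an assumption
    }
```

In a Lambda handler, the result of build_invoke_kwargs could be unpacked into the boto3 sagemaker-runtime client's invoke_endpoint call. Running the cheap keyword pass first and falling back to an LLM classifier only on ambiguous prompts keeps routing overhead low for most requests, which matters under a sub-5-second latency budget.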

00:00 Welcome and Intro
01:06 Session Setup and Slides
01:59 Legal Tech Case Study
03:10 Requirements and Constraints
04:42 Choosing QLoRA and Model
07:13 Multi-Adapter Endpoints Explained
09:39 Routing and Adapter Switching
11:46 Latency Results and Benefits
12:50 Evaluations and Architecture
14:14 Q&A Model Registry and Benchmarks
16:57 Deployment Layers and Monitoring
20:26 More Q&A Fine-Tuning vs RAG
24:53 Wrap-Up and Next Steps
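
As a rough sanity check on the adapter sizes quoted in the summary (about 50–200 MB), the arithmetic below estimates the size of a LoRA adapter for a Mistral 7B-shaped model. The layer dimensions follow the published Mistral 7B configuration; the rank of 16, the choice of target projections, and 16-bit adapter weights are assumptions made for illustration, not settings reported in the talk.

```python
# Mistral 7B-style dimensions: hidden size, key/value width, MLP width, layers.
HIDDEN, KV_DIM, MLP_DIM, LAYERS = 4096, 1024, 14336, 32

# (in_features, out_features) of each linear projection in one layer.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP_DIM),
    "up_proj": (HIDDEN, MLP_DIM),
    "down_proj": (MLP_DIM, HIDDEN),
}


def lora_params(rank: int, targets: list) -> int:
    """LoRA adds two matrices per target: (in x rank) and (rank x out)."""
    per_layer = sum(rank * (PROJECTIONS[t][0] + PROJECTIONS[t][1])
                    for t in targets)
    return per_layer * LAYERS


def adapter_megabytes(rank: int, targets: list, bytes_per_param: int = 2) -> float:
    """Approximate adapter size in MB (2 bytes per parameter = 16-bit)."""
    return lora_params(rank, targets) * bytes_per_param / 1e6


if __name__ == "__main__":
    all_targets = list(PROJECTIONS)
    print(lora_params(16, all_targets))        # 41943040 trainable parameters
    print(adapter_megabytes(16, all_targets))  # about 83.9 MB at 16-bit
```

Under these assumptions the adapter holds roughly 42 million parameters, or about 84 MB at 16-bit precision, compared with several billion parameters in the base model. Larger ranks or 32-bit storage would push the figure toward the upper end of the quoted range.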

Learn more about Zenith: https://zenith.finos.org

🌐 More about FINOS: https://www.finos.org/
📧 Join our newsletter: https://www.finos.org/sign-up
📥 Download the State of Open Source in Financial Services report: https://www.finos.org/state-of-open-source-in-financial-services
🎙️ Listen to our Open Source in Finance Podcast: https://www.youtube.com/@FINOS/podcasts
🗣️ Attend the next Open Source in Finance Forum: https://hubs.ly/Q03z9D9D0
LinkedIn: https://www.linkedin.com/company/finosfoundation

