Multi-Adapter Endpoints on AWS: Cost-Optimized Fine-Tuning with QLoRA for Multi-Customer Legal GenAI

Video by FINOS via YouTube

https://github.com/aws-samples/sagemaker-genai-hosting-examples/tree/main/06-examples/01-train-deploy-LoRA

Supreet, a senior GenAI solutions architect at AWS Startups (Frontier AI team), presents a case study on using SageMaker multi-adapter endpoints to support a legal-tech SaaS with multiple customer datasets without deploying multiple model endpoints. After limited success with RAG and prompt engineering, the startup moved to parameter-efficient fine-tuning with QLoRA on an open-weights model (tested with Mistral 7B) to reduce cost and training time; training on an initial ~100 MB dataset took about two hours. A single real-time endpoint hosts the base model, while multiple small adapters (about 50–200 MB each) are swapped in within milliseconds by routing logic, implemented with AWS Lambda and a hybrid of keyword-based and LLM-based routing, to meet sub-5-second latency targets. Supreet emphasizes adapter-level evaluations using NLP metrics and LLM-as-a-judge with SME input, and describes an AWS architecture involving S3, SageMaker training and evaluation, a model registry, API Gateway, and monitoring.
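
The hybrid routing idea described above can be sketched roughly as follows. This is a minimal illustration and not the code from the talk or the linked repository: the adapter names, the keyword lists, the default adapter, and the `llm_classifier` callable are all hypothetical. The sketch tries a cheap keyword match first and falls back to an LLM-based classifier only when the keywords are absent or ambiguous. It returns only the adapter name; invoking the endpoint with that adapter is out of scope here.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical adapter names and keywords, for illustration only.
KEYWORD_ROUTES: Dict[str, List[str]] = {
    "contracts-adapter": ["indemnification", "termination clause", "non-disclosure"],
    "litigation-adapter": ["plaintiff", "motion to dismiss", "deposition"],
}
DEFAULT_ADAPTER = "general-legal-adapter"  # hypothetical fallback adapter


def route_by_keyword(query: str) -> Optional[str]:
    """Return the adapter with the most keyword hits, or None if none or tied."""
    text = query.lower()
    scores = {
        name: sum(keyword in text for keyword in keywords)
        for name, keywords in KEYWORD_ROUTES.items()
    }
    best = max(scores, key=lambda name: scores[name])
    top = scores[best]
    # No hits, or several adapters share the top score: treat as ambiguous.
    if top == 0 or list(scores.values()).count(top) > 1:
        return None
    return best


def route(query: str, llm_classifier: Optional[Callable[[str], str]] = None) -> str:
    """Hybrid routing: keyword match first, then an optional LLM-based fallback."""
    adapter = route_by_keyword(query)
    if adapter is not None:
        return adapter
    if llm_classifier is not None:
        candidate = llm_classifier(query)
        # Accept the classifier's answer only if it names a known adapter.
        if candidate in KEYWORD_ROUTES:
            return candidate
    return DEFAULT_ADAPTER


if __name__ == "__main__":
    print(route("Is the indemnification cap enforceable?"))
    print(route("What is the filing deadline?"))
```

Running the keyword pass first keeps the common case fast and inexpensive, which matters under a latency target; the LLM-based fallback is reserved for queries the keywords cannot decide.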

00:00 Welcome and Intro
01:06 Session Setup and Slides
01:59 Legal Tech Case Study
03:10 Requirements and Constraints
04:42 Choosing QLoRA and Model
07:13 Multi-Adapter Endpoints Explained
09:39 Routing and Adapter Switching
11:46 Latency Results and Benefits
12:50 Evaluations and Architecture
14:14 Q&A Model Registry and Benchmarks
16:57 Deployment Layers and Monitoring
20:26 More Q&A Fine-Tuning vs RAG
24:53 Wrap-Up and Next Steps

Learn more about Zenith: https://zenith.finos.org

🌐 More about FINOS: https://www.finos.org/
📧 Join our newsletter: https://www.finos.org/sign-up
📥 Download the State of Open Source in Financial Services report: https://www.finos.org/state-of-open-source-in-financial-services
🎙️ Listen to our Open Source in Finance Podcast: https://www.youtube.com/@FINOS/podcasts
🗣️ Attend the next Open Source in Finance Forum: https://hubs.ly/Q03z9D9D0
LinkedIn: https://www.linkedin.com/company/finosfoundation

