Tether Data announced the launch of QVAC Fabric LLM, a new LLM inference runtime and fine-tuning framework that makes it possible to execute, train, and personalize large language models on everyday hardware, including consumer GPUs, laptops, and smartphones. What once reportedly required high-end cloud servers or specialized NVIDIA systems “can now happen locally on devices people already own.”
High-performance LLM inference and fine-tuning have traditionally been reserved for firms with access to costly infrastructure, but QVAC Fabric reportedly upends that model completely.
Tether describes it as the first unified, portable, cross-platform, highly scalable system capable of full LLM inference, LoRA fine-tuning, and “instruction-tuning across mobile operating systems (iOS and Android) as well as all other laptop, desktop, and server environments (Windows, macOS, Linux), allowing developers and organizations to build, deploy, execute, and customize AI privately and independently.”
No cloud dependency, no vendor lock-in, and “no sensitive data leaving the device.”
A breakthrough in this release is the ability “to fine-tune models on mobile GPUs such as Qualcomm Adreno and ARM Mali.”
This is the first time a production-ready framework “has enabled modern LLM training on smartphone-class hardware.”
It opens the door to personalized AI that can learn “directly from users on their devices, preserving privacy and functioning even without an internet connection, and powering a new generation of highly resilient, anti-fragile, on-device AI applications.”
QVAC Fabric LLM also expands the capabilities of “the llama.cpp ecosystem by adding fine-tuning support for modern models such as Llama 3, Qwen3, and Gemma 3.”
These models, previously unsupported in this environment, “can now be fine-tuned through a simple, consistent workflow across all hardware types.”
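The LoRA technique mentioned above is what makes fine-tuning tractable on modest hardware: the base weights stay frozen, and only two small low-rank matrices are trained and then merged back in. The sketch below illustrates that merge arithmetic in plain Python. It is a conceptual illustration under the standard LoRA formulation, not QVAC Fabric LLM's actual implementation, and the function names are hypothetical.

```python
# Conceptual LoRA merge: W_eff = W + (alpha / r) * B @ A, where the frozen
# base weight W is d x k, and only B (d x r) and A (r x k) are trained.
# Because r is tiny, the trainable parameter count (and gradient memory)
# is a small fraction of the full model's, which is why adapter training
# can fit on smartphone-class GPUs. Illustrative sketch only.

def matmul(X, Y):
    """Naive matrix multiply for small lists-of-lists."""
    rows, inner, cols = len(X), len(Y), len(Y[0])
    return [[sum(X[i][t] * Y[t][j] for t in range(inner))
             for j in range(cols)] for i in range(rows)]

def lora_merge(W, A, B, alpha, r):
    """Return W + (alpha / r) * B @ A, leaving the frozen W untouched."""
    delta = matmul(B, A)
    scale = alpha / r
    return [[W[i][j] + scale * delta[i][j]
             for j in range(len(W[0]))] for i in range(len(W))]

# Tiny example: 2x2 base weight, rank-1 adapter (r = 1).
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]          # d x r
A = [[0.5, 0.5]]            # r x k
print(lora_merge(W, A, B, alpha=1.0, r=1))  # [[1.5, 0.5], [1.0, 2.0]]
```

For a real 4096 x 4096 weight matrix with rank r = 8, the adapter holds roughly 65K trainable values versus about 16.8M in the base matrix, which is the memory saving that puts adapter training within reach of mobile GPUs.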
By enabling training across a range of GPUs (AMD, Intel, NVIDIA, Apple Silicon, and mobile chips), QVAC Fabric LLM breaks “the assumption that meaningful AI development requires access to specialized, single-vendor hardware.”
Consumer GPUs now compete on equal footing, and “even mobile devices enter the conversation as legitimate training platforms.” This is a “significant step toward diversifying the hardware available for AI development.”
For enterprises, the implications extend “beyond convenience.”
Organizations can now fine-tune AI models in-house, “on secure hardware, without exposing sensitive data to external cloud providers.”
This makes it easier to meet privacy, regulatory, and cost requirements while still “deploying modern AI tailored to internal needs.”
It moves fine-tuning from centralized GPU clusters “to the broader device fleet companies already manage.”
Paolo Ardoino, CEO of Tether, said the release demonstrates the company’s commitment to “making AI more accessible and more resilient.”
He added that AI should not be something “controlled only by large cloud platforms.”
Ardoino also noted that QVAC Fabric LLM gives people and companies the ability to “execute inference and fine-tune powerful models on their own terms, on their own hardware, with full control of their data.”
Ardoino called this the future of privacy-preserving, “decentralized, hyper-scalable, and ubiquitous AI.”
Tether Data has released QVAC Fabric LLM “as open-source software under the Apache 2.0 license, along with multi-platform binaries and ready-to-use adapters on Hugging Face.”
Developers can begin fine-tuning with “a few commands, lowering the barrier to AI customization in a way that has not been possible before.”
QVAC Fabric LLM represents a shift toward “decentralized, user-controlled AI.”
Although much of the industry continues to focus on cloud-first solutions, Tether Data is making personalization “accessible on local edge hardware to ensure operational continuity in high-latency geographical areas (e.g., emerging markets), to provide an anti-fragile, privacy-first, highly-resilient, and scalable AI platform.”