AgentFlux: A Framework for Privacy-Preserving On-Device Agentic Systems
We introduce a framework for privacy-preserving, on-device AI agent workloads. By decoupling agentic tasks into function selection and argument generation, each handled by local LLM orchestration, our system delivers accuracy approaching that of cloud-based models while keeping user data out of third-party hands and enabling cost-efficient execution on consumer hardware.
Tool Selector Adapter
Classification: Which tool to invoke?
Argument Generator Adapter
Generation: Tool-specific arguments
On-Device Execution
Privacy-preserving orchestration
AgentFlux: Decoupled Post-Training Pipeline & Inference Framework
System Demonstration
Watch AgentFlux in Action
See how AgentFlux orchestrates privacy-preserving tool calling across multiple applications, performing complex tasks while keeping all sensitive data on your device.
Demo: AgentFlux performing file-operation tasks entirely on-device, compared with the Llama-3.1-8B model
Why It Matters
AgentFlux bridges the performance gap between frontier orchestration models and locally deployable systems.
Privacy-Preserving AI
Agents that run fully offline, protecting sensitive user data from third-party exposure
Efficient Orchestration
No reasoning-latency overhead and fast execution on consumer hardware
Scalable & Modular
Modular adapter training that scales with tool ecosystems as they evolve over time
By decoupling fine-tuning and introducing dynamic adapter loading, AgentFlux democratizes agentic AI, bringing practical autonomy to the edge.
Motivating Use Cases
Blockchains & Financial Applications
AgentFlux enables local consolidation, analysis, and reporting across blockchain and traditional finance, keeping all private data on-device. Only anonymized outputs leave the user's machine.
AI Browsers
Local models handle tasks that touch sensitive data, while collaborating with cloud models on large public workloads such as web-search summarization.
Developer Terminals & Coding Agents
AgentFlux addresses data leakage in coding assistants by executing portions of the workflow locally and cost-efficiently while maintaining access to the entire codebase.
Background
AI systems are rapidly expanding from chatbots and media generation to robotics and financial applications. Leading AI platforms run in the cloud, sending all user queries, often including sensitive context like code, preferences, and past interactions, to third-party providers.
🔒 Privacy Challenge
User data, including medical and financial records, is routinely exposed to cloud providers.
⚡ Cost & Latency
Cloud APIs charge per token and throttle requests, with true steady-state costs still unknown.
The Solution: Edge Computing with AgentFlux
AgentFlux introduces a new framework for edge computing that partitions agentic workloads into two distinct tasks: selecting which function to call, and generating its arguments. This partitioning enables a hierarchical architecture that achieves end-to-end accuracy comparable with state-of-the-art cloud models while preserving privacy and performance on consumer-grade GPUs.
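A minimal sketch of that two-task interface (names and types here are illustrative, not the framework's actual API):

```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    """Running context for one agentic task (hypothetical structure)."""
    goal: str                                     # the user's request
    history: list = field(default_factory=list)   # prior tool calls + observations

def select_tool(state: AgentState, toolset: list) -> str:
    """Classification: pick the single tool to invoke next."""
    raise NotImplementedError  # backed by the tool-selector LoRA adapter

def generate_args(state: AgentState, tool: str) -> dict:
    """Generation: produce arguments for the selected tool."""
    raise NotImplementedError  # backed by the argument-generator LoRA adapter
```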
Core Architecture
AgentFlux architecture showing the decoupled post-training pipeline and inference framework with specialized LoRA adapters
Agentic systems autonomously solve complex tasks through iterative cycles: decomposing goals into discrete steps, executing each by invoking external tools, and dynamically adjusting based on tool outputs. Success hinges on LLM orchestration: the system's ability to accurately select the right tool and generate correct arguments at each decision point.
AgentFlux fundamentally reimagines this orchestration. Rather than relying on a monolithic LLM orchestrator, it employs multiple specialized LoRA adapters trained through a decoupled post-training pipeline and coordinated by a novel inference framework.
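As a sketch of that iterative cycle, assuming the hypothetical select_tool / generate_args stubs above plus execute_tool and summarize stand-ins for the sandbox and the final answer:

```python
def execute_tool(tool: str, args: dict):
    """Stub: run the tool in a containerized sandbox and return its output."""
    ...

def summarize(state: AgentState) -> str:
    """Stub: produce the final user-facing answer from accumulated state."""
    ...

def run_agent(state: AgentState, toolset: list, max_steps: int = 20) -> str:
    """Iterative agentic cycle: select a tool, generate its arguments,
    execute it, and fold the observation back into the running state."""
    for _ in range(max_steps):
        tool = select_tool(state, toolset)        # classification sub-step
        if tool == "finish":                      # hypothetical terminal action
            break
        args = generate_args(state, tool)         # generation sub-step
        observation = execute_tool(tool, args)    # sandboxed execution
        state.history.append((tool, args, observation))
    return summarize(state)
```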
Post-Training Pipeline
1. Tool Selector Adapter
Functions as a classifier, identifying the optimal tool for each workflow step during inference.
2. Argument Generator Adapter
Produces precise, context-appropriate arguments for the selected tool at each step.
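To make the decoupling concrete, here is one plausible way a recorded tool-call trace could be split into the two training sets (field names are assumptions, not the paper's schema):

```python
def split_trace(trace):
    """Turn one agent trace into selector (classification) and
    argument-generator (generation) fine-tuning examples."""
    selector_data, arggen_data = [], []
    for step in trace:
        context = step["context"]  # task goal plus prior observations
        # Tool-selector example: context in, tool name out (classification).
        selector_data.append({"prompt": context, "label": step["tool"]})
        # Argument-generator example: context + chosen tool in, arguments out.
        arggen_data.append({
            "prompt": f"{context}\nTool: {step['tool']}",
            "target": step["arguments"],
        })
    return selector_data, arggen_data
```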
Decoupled Inference Framework
Classification Sub-Step
Dynamically loads the tool selector adapter to determine which tool to invoke.
Argument Generation Sub-Step
Dynamically loads the corresponding argument generator adapter to construct the tool's input parameters.
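A minimal sketch of this adapter swapping with Hugging Face PEFT, assuming both adapters were trained and saved locally (the model name and adapter paths are placeholders):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder base model
base = AutoModelForCausalLM.from_pretrained(BASE, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(BASE)

# Attach both LoRA adapters to the same resident base model.
model = PeftModel.from_pretrained(base, "adapters/tool_selector",
                                  adapter_name="tool_selector")
model.load_adapter("adapters/arg_generator", adapter_name="arg_generator")

def generate_with(adapter_name: str, prompt: str) -> str:
    model.set_adapter(adapter_name)  # swap adapters; base weights stay loaded
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)

# Classification sub-step, then generation sub-step (prompts elided):
# tool = generate_with("tool_selector", selection_prompt)
# args = generate_with("arg_generator", argument_prompt)
```

Because only the small LoRA weights change between sub-steps, the base model stays in GPU memory and the swap cost is negligible compared with reloading a full model.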
Inference Pipeline Flow
Toolset Selection
Base model routes the request to the relevant toolset (Filesystem, Notion, Monday.com)
Tool Selection (Classification)
Load Tool Selector LoRA adapter → Classify which specific tool to invoke
Argument Generation
Load Argument Generator LoRA adapter → Generate precise, structured arguments
Tool Execution
Execute in containerized sandbox → Return observation → Continue or summarize
Complete inference pipeline showing hierarchical orchestration with dynamic LoRA adapter loading
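Condensing the four steps above, a sketch of one hierarchical pipeline step (the toolset registry, router, and sandbox helpers are hypothetical stand-ins):

```python
# Hypothetical registry: each toolset maps to its tools and adapter pair.
TOOLSETS = {
    "filesystem": {"tools": ["read_file", "write_file", "list_dir"],
                   "selector": "fs_selector", "arggen": "fs_arggen"},
    "notion": {"tools": ["create_page", "query_database"],
               "selector": "notion_selector", "arggen": "notion_arggen"},
}

def route_toolset(query, toolsets): ...                   # stub: base-model routing
def classify_tool(adapter, query, history, tools): ...    # stub: selector adapter
def build_arguments(adapter, query, history, tool): ...   # stub: arg-gen adapter
def run_in_sandbox(tool, args): ...                       # stub: sandboxed execution

def pipeline_step(query: str, history: list) -> dict:
    toolset = route_toolset(query, list(TOOLSETS))        # 1. toolset selection
    spec = TOOLSETS[toolset]
    tool = classify_tool(spec["selector"], query,         # 2. tool selection
                         history, spec["tools"])
    args = build_arguments(spec["arggen"], query,         # 3. argument generation
                           history, tool)
    observation = run_in_sandbox(tool, args)              # 4. tool execution
    return {"tool": tool, "args": args, "observation": observation}
```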