Personalized large language models (LLMs) tailor content to individual preferences using user profiles or histories. However, existing parameter-efficient fine-tuning (PEFT) methods, such as the "One-PEFT-Per-User" (OPPU) paradigm, require training a separate adapter for each user, making them computationally expensive and impractical for real-time updates. We introduce Profile-to-PEFT, a scalable framework that employs a hypernetwork, trained end-to-end, to map a user's encoded profile directly to a full set of adapter parameters (e.g., LoRA), eliminating per-user training at deployment. This design enables instant adaptation, generalization to unseen users, and privacy-preserving local deployment. Experimental results demonstrate that our method outperforms both prompt-based personalization and OPPU while using substantially fewer computational resources at deployment. The framework exhibits strong generalization to out-of-distribution users and maintains robustness across varying user activity levels and different embedding backbones. The proposed Profile-to-PEFT framework enables efficient, scalable, and adaptive LLM personalization suitable for large-scale applications.
While large language models (LLMs) offer powerful "one-size-fits-all" capabilities, tailoring them to individual user preferences remains a critical research direction. Methods based on parameter-efficient fine-tuning (PEFT), particularly the "One-PEFT-Per-User" (OPPU) paradigm, have been effective at encoding user preferences into lightweight, user-specific parameters. Despite this success, their reliance on training a unique module from scratch for each user is a major limitation: the approach is computationally expensive, scales poorly to systems with millions of users, and is impractical for real-time updates as preferences evolve. This bottleneck motivates our work: to explore whether a hypernetwork can learn a direct mapping from a user's profile to a full set of personalized adapter parameters, enabling instant, scalable, and efficient LLM personalization without iterative, per-user fine-tuning at deployment.
Our proposed method, Profile-to-PEFT (P2P), leverages a hypernetwork to generate personalized adapter parameters instantly, enabling scalable, real-time LLM adaptation. Unlike the "One-PEFT-Per-User" (OPPU) paradigm, which depends on computationally intensive fine-tuning for every individual user, P2P trains a single hypernetwork to learn a direct mapping from a user's profile to that user's parameters. The end-to-end trained hypernetwork takes a user's encoded profile as input and directly generates a full set of adapter parameters (e.g., LoRA) in a single forward pass. This design eliminates per-user training at deployment entirely, making it a practical and efficient alternative that generalizes to unseen users and sharply reduces computational overhead.
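To make the mechanism concrete, here is a minimal PyTorch sketch of the core idea: a small hypernetwork that maps a fixed-size profile embedding to the LoRA matrices of a single target layer. The dimensions, module names, and single-layer scope are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of the P2P idea: a hypernetwork maps a frozen profile
# embedding to LoRA matrices for one target linear layer. All sizes are
# illustrative assumptions.
import torch
import torch.nn as nn

class LoRAHyperNetwork(nn.Module):
    def __init__(self, emb_dim=1024, hidden=512, d_model=768, rank=8):
        super().__init__()
        self.rank, self.d_model = rank, d_model
        self.trunk = nn.Sequential(nn.Linear(emb_dim, hidden), nn.GELU())
        # Separate heads emit the flattened LoRA A and B matrices.
        self.head_a = nn.Linear(hidden, rank * d_model)
        self.head_b = nn.Linear(hidden, d_model * rank)

    def forward(self, profile_emb):
        h = self.trunk(profile_emb)
        lora_a = self.head_a(h).view(self.rank, self.d_model)
        lora_b = self.head_b(h).view(self.d_model, self.rank)
        return lora_a, lora_b

# One forward pass replaces per-user fine-tuning: the generated matrices
# modify the frozen base weight as W + B @ A (with the usual LoRA scaling).
hypernet = LoRAHyperNetwork()
profile_emb = torch.randn(1024)          # encoded user profile
lora_a, lora_b = hypernet(profile_emb)   # instant per-user adapter
```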
Across the majority of tasks in the random split setting, P2P demonstrates superior or highly competitive performance. On average, it achieves the highest accuracy in classification tasks and the best ROUGE-L scores in generation tasks. Compared to the PEFT-based OPPU baseline, which requires expensive per-user training, P2P achieves better average performance in both classification and generation without any user-specific fine-tuning at deployment. The framework also demonstrates strong generalization to out-of-distribution users. In LLM-as-a-Judge evaluations on open-ended tasks, P2P not only achieves the highest average score (3.98) but also secures the best win-rate (58.4%) against the base model, confirming its superior personalization capabilities.
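As a small worked example of the judge-based comparison, the sketch below computes a win rate from per-example judge scores; the tie-handling convention is our assumption, not taken from the paper.

```python
# Hedged sketch of a win-rate computation for LLM-as-a-Judge evaluation:
# given per-example judge scores for P2P and the base model, count wins.
def win_rate(p2p_scores, base_scores):
    wins = sum(p > b for p, b in zip(p2p_scores, base_scores))
    ties = sum(p == b for p, b in zip(p2p_scores, base_scores))
    # Counting ties as half a win is one common convention (an assumption here).
    return (wins + 0.5 * ties) / len(p2p_scores)

print(win_rate([4, 5, 3, 4], [3, 4, 3, 2]))  # 0.875 on toy scores
```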
To analyze the trade-off between data diversity and volume, we compare P2P models trained on data from a varying number of users (diversity) versus a varying number of data points per user (quantity). Increasing user diversity yields a significant performance boost across all task types, whereas increasing the data quantity per user provides diminishing returns. This suggests the model benefits more from exposure to a wide range of user profiles than from additional data on a few users: what the hypernetwork must learn is a generalizable profile-to-parameter mapping, not individual users.
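The two regimes can be pictured as follows; `histories` (a mapping from user ID to that user's examples) and the sampling scheme are hypothetical, and the paper's exact protocol may differ.

```python
# Illustrative construction of the two training regimes compared above.
import random

def build_training_set(histories, n_users, samples_per_user, seed=0):
    rng = random.Random(seed)
    users = rng.sample(sorted(histories), n_users)  # pick distinct users
    return {u: histories[u][:samples_per_user] for u in users}

histories = {f"user{i}": [f"ex{i}-{j}" for j in range(100)] for i in range(1000)}
diverse = build_training_set(histories, n_users=800, samples_per_user=10)  # diversity axis
dense   = build_training_set(histories, n_users=100, samples_per_user=80)  # quantity axis
```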
We compare the time required to generate personalized PEFT parameters for each user at deployment. On average, OPPU (LoRA) takes 20.44s per user. In contrast, our proposed P2P requires only 0.57s per user, a roughly 36x speedup. The cumulative time for OPPU grows steeply and linearly with the user count, while P2P's cumulative cost grows so slowly that it remains near-zero by comparison. This highlights P2P's suitability for large-scale, real-time applications.
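The reported figures can be sanity-checked with simple arithmetic; the sketch below derives the per-user speedup and the cumulative cost curve from the two latencies quoted above.

```python
# Back-of-the-envelope check of the deployment-time comparison, using only
# the per-user latencies reported above.
oppu_s, p2p_s = 20.44, 0.57
print(f"per-user speedup: {oppu_s / p2p_s:.1f}x")  # ~35.9x

# Cumulative cost as the user base grows (hours of adapter generation).
for n_users in (1_000, 100_000, 1_000_000):
    print(f"{n_users:>9} users | OPPU {oppu_s * n_users / 3600:8.0f}h | "
          f"P2P {p2p_s * n_users / 3600:7.1f}h")
```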
The quality of user embeddings is critical for the hypernetwork. We test three different embedding backbones: Qwen3-Emb-4B (4B), GTE-large (0.3B), and GTE-base (0.07B). All backbones yield substantial improvements over the non-personalized baseline, indicating P2P is not overly sensitive to the embedding model. While the strongest model (Qwen3-Emb-4B) performs best, the smaller GTE models still achieve 95%-98% of its performance, demonstrating the framework's robustness and flexibility.
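Swapping the encoder is a one-line change in practice; below is a hedged sketch using the sentence-transformers library, where the exact Hugging Face model IDs are our best guesses for the backbones named above, not identifiers confirmed by the paper.

```python
# Sketch of swapping the profile-embedding backbone. Any encoder that
# returns a fixed-size vector can feed the hypernetwork; model IDs below
# are assumptions.
from sentence_transformers import SentenceTransformer

profile = "User history: prefers concise answers, writes formal emails, ..."
for model_id in ("Qwen/Qwen3-Embedding-4B",  # 4B
                 "thenlper/gte-large",       # 0.3B
                 "thenlper/gte-base"):       # 0.07B
    encoder = SentenceTransformer(model_id)
    emb = encoder.encode(profile)            # 1D numpy vector
    print(model_id, emb.shape)               # embedding dim varies per backbone
```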
We analyze P2P's performance across users with varying levels of activity (i.e., different amounts of historical data). Users are grouped into 'low' (1-10 samples), 'medium' (10-50 samples), and 'high' (50+ samples) activity levels. P2P consistently outperforms the non-personalized baseline across all groups, with performance improving as user activity increases. This demonstrates the framework's robustness and its ability to effectively personalize for both new users with sparse data and highly active users with rich histories.
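A minimal sketch of the grouping logic is below; since the stated ranges share the boundaries 10 and 50, the convention of assigning boundary cases to the lower bucket is our assumption.

```python
# Bucket users by history length into the activity groups described above.
def activity_group(n_samples):
    if n_samples <= 10:
        return "low"      # 1-10 samples
    if n_samples <= 50:
        return "medium"   # 11-50 samples
    return "high"         # 50+ samples

histories = {"u1": 3, "u2": 27, "u3": 120}   # user -> history length (toy data)
print({u: activity_group(n) for u, n in histories.items()})
# {'u1': 'low', 'u2': 'medium', 'u3': 'high'}
```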
@article{tan2025instant,
title={Instant Personalized Large Language Model Adaptation via Hypernetwork},
author={Tan, Zhaoxuan and Zhang, Zixuan and Wen, Haoyang and Li, Zheng and Zhang, Rongzhi and Chen, Pei and Mo, Fengran and Liu, Zheyuan and Zeng, Qingkai and Yin, Qingyu and others},
journal={arXiv preprint arXiv:2510.16282},
year={2025}
}