Instant Personalized Large Language Model Adaptation via Hypernetwork

1University of Notre Dame, 2Amazon.com Inc., 3Université de Montréal. Work done while an intern at Amazon.

Abstract

Personalized large language models (LLMs) tailor content to individual preferences using user profiles or histories. However, existing parameter-efficient fine-tuning (PEFT) methods, such as the "One-PEFT-Per-User" (OPPU) paradigm, require training a separate adapter for each user, making them computationally expensive and impractical for real-time updates. We introduce Profile-to-PEFT, a scalable framework that employs a hypernetwork, trained end-to-end, to map a user's encoded profile directly to a full set of adapter parameters (e.g., LoRA), eliminating per-user training at deployment. This design enables instant adaptation, generalization to unseen users, and privacy-preserving local deployment. Experimental results demonstrate that our method outperforms both prompt-based personalization and OPPU while using substantially fewer computational resources at deployment. The framework exhibits strong generalization to out-of-distribution users and maintains robustness across varying user activity levels and different embedding backbones. The proposed Profile-to-PEFT framework enables efficient, scalable, and adaptive LLM personalization suitable for large-scale applications.

Motivation


The "One-PEFT-Per-User" method uses computationally intensive fine-tuning to create personalizedparameters. In contrast, our proposed Profile-to-PEFT uses a hypernetwork to directly generate parameters from user history or profile in a single inference pass.

While large language models (LLMs) offer powerful "one-size-fits-all" capabilities, tailoring them to individual user preferences remains a critical research direction. Methods based on parameter-efficient fine-tuning (PEFT), particularly the "One-PEFT-Per-User" (OPPU) paradigm, have been effective in encoding user preferences into lightweight, user-specific parameters. Despite this success, their reliance on training a unique module from scratch for each user presents a major limitation: the approach is computationally expensive, faces significant scalability challenges in systems with millions of users, and is impractical for real-time updates as preferences evolve. This bottleneck motivates our work: to explore whether a hypernetwork can learn a direct mapping from a user's profile to a full set of personalized adapter parameters, enabling instant, scalable, and efficient LLM personalization without iterative, per-user fine-tuning at deployment.

Profile-to-PEFT Framework


Overview of the Profile-to-PEFT architecture, where the user history embedding, together with depth and module embeddings, is fed into the hypernetwork to generate personalized LoRA parameters. P2P is optimized in an end-to-end manner.

Our proposed method, Profile-to-PEFT (P2P), leverages a hypernetwork to generate personalized adapter parameters instantly, enabling scalable, real-time LLM adaptation. Unlike the "One-PEFT-Per-User" (OPPU) paradigm, which depends on computationally intensive fine-tuning for every individual user, P2P trains a single hypernetwork to learn a direct mapping from a user's profile to that user's parameters. The end-to-end trained hypernetwork takes a user's encoded profile as input and directly generates a full set of adapter parameters (e.g., LoRA) in a single forward pass. This design eliminates per-user training at deployment entirely, making it a practical and efficient alternative that generalizes to unseen users and significantly reduces computational overhead.
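
To make the mapping concrete, the sketch below shows one plausible way to realize such a hypernetwork in PyTorch. The MLP trunk, embedding sizes, and target dimensions are illustrative assumptions; only the overall flow (encoded profile plus depth and module embeddings in, LoRA A/B factors out in a single forward pass) follows the description above.

# Minimal sketch of a Profile-to-PEFT-style hypernetwork (assumed
# architecture, not the authors' exact implementation).
import torch
import torch.nn as nn

class LoRAHyperNetwork(nn.Module):
    def __init__(self, user_dim=1024, ctx_dim=64, hidden_dim=512,
                 in_features=4096, out_features=4096, rank=8,
                 num_layers=32, num_modules=2):
        super().__init__()
        self.rank, self.d_in, self.d_out = rank, in_features, out_features
        # Learned embeddings identifying which layer / module we generate for.
        self.depth_emb = nn.Embedding(num_layers, ctx_dim)
        self.module_emb = nn.Embedding(num_modules, ctx_dim)
        self.trunk = nn.Sequential(
            nn.Linear(user_dim + 2 * ctx_dim, hidden_dim),
            nn.ReLU(),
        )
        # Two heads emit the flattened low-rank factors A (r x d_in) and B (d_out x r).
        self.head_a = nn.Linear(hidden_dim, rank * in_features)
        self.head_b = nn.Linear(hidden_dim, out_features * rank)

    def forward(self, user_emb, layer_idx, module_idx):
        ctx = torch.cat([user_emb,
                         self.depth_emb(layer_idx),
                         self.module_emb(module_idx)], dim=-1)
        h = self.trunk(ctx)
        lora_a = self.head_a(h).view(-1, self.rank, self.d_in)
        lora_b = self.head_b(h).view(-1, self.d_out, self.rank)
        return lora_a, lora_b

# One forward pass yields per-user adapter weights -- no per-user training.
hyper = LoRAHyperNetwork()
user_emb = torch.randn(1, 1024)    # encoded user profile (placeholder)
layer = torch.tensor([5])          # generate for decoder layer 5 ...
module = torch.tensor([0])         # ... targeting, e.g., the query projection
A, B = hyper(user_emb, layer, module)
delta_w = B @ A                    # low-rank weight update: d_out x d_in

In practice the generated A/B pairs for every targeted layer and module would be loaded as a standard LoRA adapter on the frozen base model.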

Experimental Results

Random Split Results


Main experiment results on the LaMP and LongLaMP benchmarks under the Random split setting.

Out-of-Distribution Split Results


Main experiment results on the LaMP and LongLaMP benchmarks under the OOD split setting.

LLM-as-a-Judge Evaluation Results


LLM-as-a-Judge (GPT-4o) evaluation on Personal Reddit and Empathic Conversation datasets.

Across the majority of tasks in the random split setting, P2P demonstrates superior or highly competitive performance. On average, it achieves the highest accuracy in classification tasks and the best ROUGE-L scores in generation tasks. Compared to the PEFT-based OPPU baseline, which requires expensive per-user training, P2P achieves better average performance in both classification and generation without any user-specific fine-tuning at deployment. The framework also demonstrates strong generalization to out-of-distribution users. In LLM-as-a-Judge evaluations on open-ended tasks, P2P not only achieves the highest average score (3.98) but also secures the best win-rate (58.4%) against the base model, confirming its superior personalization capabilities.
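
For reference, the snippet below sketches how such a GPT-4o LLM-as-a-Judge call might look. The prompt wording and the 1-5 scale are our assumptions for illustration, not the paper's exact rubric.

# Hedged sketch of an LLM-as-a-Judge evaluation call (assumed prompt).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def judge_personalization(user_profile: str, query: str, response: str) -> str:
    prompt = (
        "You are evaluating how well a response is personalized to a user.\n"
        f"User profile:\n{user_profile}\n\n"
        f"Query:\n{query}\n\n"
        f"Response:\n{response}\n\n"
        "Rate the personalization quality on a 1-5 scale and answer with "
        "only the number."
    )
    result = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return result.choices[0].message.content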

Additional Analysis

Training Data Diversity > Quantity


To analyze the trade-off between data diversity and volume, we compare P2P models trained on data from a varying number of users (diversity) versus a varying number of data points per user (quantity). Results show that increasing user diversity yields a significant performance boost across all task types. In contrast, increasing the data quantity per user provides diminishing returns. This suggests that the model benefits more from exposure to a wider range of user profiles than from more data from a few users, confirming that learning a generalizable mapping is key.

Deployment Efficiency


We compare the time required to generate personalized PEFT parameters for each user at deployment. On average, OPPU (LoRA) takes 20.44 s per user, while our proposed P2P requires only 0.57 s per user, a speedup of roughly 36x. OPPU's cumulative time grows steeply and linearly with the user count, whereas P2P's cost remains near zero and essentially constant. This highlights P2P's suitability for large-scale, real-time applications.
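
The reported per-user times make the scaling argument easy to check; the toy cost model below (assuming fixed per-user cost and no parallelism) reproduces the cumulative comparison.

# Back-of-the-envelope deployment cost from the per-user times above.
OPPU_SEC_PER_USER = 20.44   # per-user LoRA fine-tuning (OPPU)
P2P_SEC_PER_USER = 0.57     # single hypernetwork forward pass (P2P)

for n_users in (1_000, 100_000, 1_000_000):
    oppu_hours = n_users * OPPU_SEC_PER_USER / 3600
    p2p_hours = n_users * P2P_SEC_PER_USER / 3600
    print(f"{n_users:>9,} users: OPPU {oppu_hours:10.1f} h | P2P {p2p_hours:7.1f} h")

print(f"speedup: {OPPU_SEC_PER_USER / P2P_SEC_PER_USER:.1f}x")  # ~35.9x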

Embedding Model Robustness


The quality of user embeddings is critical for the hypernetwork. We test three different embedding backbones: Qwen3-Emb-4B (4B), GTE-large (0.3B), and GTE-base (0.07B). All backbones yield substantial improvements over the non-personalized baseline, indicating P2P is not overly sensitive to the embedding model. While the strongest model (Qwen3-Emb-4B) performs best, the smaller GTE models still achieve 95%-98% of its performance, demonstrating the framework's robustness and flexibility.
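
As an illustration, swapping the profile encoder amounts to changing one checkpoint name. The Hugging Face identifiers below are our assumed matches for the backbones named above; note that the output dimension differs per backbone, so the hypernetwork's input layer must be sized to match.

# Sketch: the user-profile encoder is swappable (assumed checkpoint names).
from sentence_transformers import SentenceTransformer

BACKBONES = {
    "Qwen3-Emb-4B": "Qwen/Qwen3-Embedding-4B",  # 4B
    "GTE-large":    "thenlper/gte-large",       # 0.3B
    "GTE-base":     "thenlper/gte-base",        # 0.07B
}

profile = "Frequent posts about hiking gear; prefers concise, list-style answers."
for name, ckpt in BACKBONES.items():
    encoder = SentenceTransformer(ckpt)
    user_emb = encoder.encode(profile)   # fixed-size profile vector
    print(name, user_emb.shape)          # this vector feeds the hypernetwork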

P2P Robust to User Activity Level


We analyze P2P's performance across users with varying levels of activity (i.e., different amounts of historical data). Users are grouped into 'low' (1-10 samples), 'medium' (10-50 samples), and 'high' (50+ samples) activity levels. P2P consistently outperforms the non-personalized baseline across all groups, with performance improving as user activity increases. This demonstrates the framework's robustness and its ability to effectively personalize for both new users with sparse data and highly active users with rich histories.
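
A minimal sketch of this grouping is below; since the quoted bucket edges overlap (1-10, 10-50, 50+), assigning the boundary counts 10 and 50 to the lower bucket is our assumption.

# Activity-level bucketing used in this analysis (boundary handling assumed).
def activity_bucket(num_history_items: int) -> str:
    if num_history_items <= 10:
        return "low"      # 1-10 historical samples
    if num_history_items <= 50:
        return "medium"   # 11-50 historical samples
    return "high"         # 50+ historical samples

users = {"u1": 4, "u2": 27, "u3": 131}   # user -> history length (toy data)
print({uid: activity_bucket(n) for uid, n in users.items()})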

BibTeX

@article{tan2025instant,
  title={Instant Personalized Large Language Model Adaptation via Hypernetwork},
  author={Tan, Zhaoxuan and Zhang, Zixuan and Wen, Haoyang and Li, Zheng and Zhang, Rongzhi and Chen, Pei and Mo, Fengran and Liu, Zheyuan and Zeng, Qingkai and Yin, Qingyu and others},
  journal={arXiv preprint arXiv:2510.16282},
  year={2025}
}