Zhaoxuan Tan

How to pronounce my name?

Zhaoxuan -> jow sh-yen.
Tan -> tæn.
You can call me Josh, which is pronounced similarly to Zhaoxuan.

Hi there, thanks for visiting my website! I'm a (Now() - 08/2023).ceil().ordinal()-year CSE PhD student at the University of Notre Dame, where I am fortunate to be advised by Prof. Meng Jiang and affiliated with the DM2 lab. Prior to that, I obtained my bachelor's degree in computer science and technology at Xi'an Jiaotong University (2019-2023) and had a wonderful time conducting research at the LUD lab, where I worked closely with Shangbin Feng and Prof. Minnan Luo.

I play with user data, including user-generated content and user behavior data, to personalize and enhance Large Language Models, as well as to detect suspicious user behavior.

Please feel free to drop me an email for any form of communication or collaboration! ☘️

Email:  ztan3 [at] nd [dot] edu  /  tanzx9 [at] gmail [dot] com

CV  /  Google Scholar  /  Semantic Scholar  /  X (Twitter)  /  Github  /  LinkedIn

🔥 What's New
  • [2024.09] 4 papers were accepted to EMNLP 2024! Main: OPPU, Per-Pcs, ProCo. Findings: NLGift. See you in Miami!🏖️
  • [2024.07] Chain-of-Layer was accepted to CIKM 2024!
  • [2024.06] Per-Pcs is live on arXiv. Come check out our framework for personalizing LLMs through collaborative efforts!
  • [2024.05] 4 papers were accepted to ACL 2024! Main: BotSay. Findings: SKU, DELL, and K-Crosswords.
  • [2024.02] OPPU is live on arXiv. Check out our work!
  • [2024.01] KGQuiz was accepted to WWW 2024. Huge congrats to Yuyang!
  • [2024.01] I will join Amazon as an applied scientist intern this summer. See you in Palo Alto!
  • [2023.12] LLM-UM was accepted to DEBULL and is live on arXiv. Check out our work and the reading list!🤗
  • [2023.10] LMBot was accepted to WSDM 2024. Huge congrats to Zijian!
  • [2023.10] BotPercent and MVSD were accepted to EMNLP 2023. Congrats to co-authors!
  • [2023.09] NLGraph was accepted to NeurIPS 2023 as a spotlight. Huge congrats to Heng!
  • [2023.07] I graduated from XJTU and stepped down as the LUD lab director. Be the Light of the World!🎓
  • [2023.05] KALM was accepted to ACL 2023, kudos to coauthors!
  • [2023.04] MVSD is live on arXiv. Check out our work!
  • [2023.04] BotMoE was accepted to SIGIR 2023. Huge congrats to Yuhan!
  • [2023.03] I will join the University of Notre Dame to work with Prof. Meng Jiang this fall. Thank you for seeing my potential; I am looking forward to the upcoming PhD journey!🥳
  • [2023.02] BotPercent is live on arXiv. Check out our work!
  • [2023.01] KRACL was accepted by WWW 2023, cheers!🍻
  • [2022.11] Our team CogDL-kgTransformer won 4th place in the WikiKG90Mv2 track of the OGB-LSC@NeurIPS2022 competition!
Selected Publications (* indicates equal contribution) [Google Scholar]
2024
Personalized Pieces: Efficient Personalized Large Language Models through Collaborative Efforts
Zhaoxuan Tan, Zheyuan Liu, Meng Jiang
Proceedings of EMNLP 2024.

We proposed Personalized Pieces (Per-Pcs) for personalizing large language models, where users can safely share and assemble personalized PEFT modules efficiently through collaborative efforts. Per-Pcs outperforms non-personalized and PEFT retrieval baselines, offering performance comparable to OPPU with significantly lower resource use, promoting safe sharing and making LLM personalization more efficient, effective, and widely accessible.

Democratizing Large Language Models via Personalized Parameter-Efficient Fine-tuning
Zhaoxuan Tan, Qingkai Zeng, Yijun Tian, Zheyuan Liu, Bing Yin, Meng Jiang
Proceedings of EMNLP 2024.

We proposed One PEFT Per User (OPPU) for personalizing large language models, where each user is equipped with a personal PEFT module that can be plugged into the base LLM to obtain their personal LLM. OPPU exhibits model ownership and enhanced generalization in capturing user behavior patterns compared to existing prompt-based LLM personalization methods.

LMBot: Distilling Graph Knowledge into Language Model for Graph-less Deployment in Twitter Bot Detection
Zijian Cai, Zhaoxuan Tan, Zhenyu Lei, Zifeng Zhu, Hongrui Wang, Qinghua Zheng, Minnan Luo
Proceedings of WSDM, 2024.

We propose LMBot, which utilizes a language model with graph-aware knowledge distillation to act as a proxy for graph-less Twitter bot detection inference. This approach effectively resolves graph data dependency and sampling bias issues.

2023
User Modeling in the Era of Large Language Models: Current Research and Future Directions
Zhaoxuan Tan, Meng Jiang
IEEE Data Engineering Bulletin (DEBULL), 2023.
reading list

We summarize existing research on how and why LLMs are great tools for modeling and understanding user-generated content (UGC). We then review several categories of large language models for user modeling (LLM-UM) approaches that integrate LLMs with text and graph-based methods in different ways. Next, we introduce specific LLM-UM techniques for a variety of UM applications. Finally, we present remaining challenges and future directions in LLM-UM research.

BotPercent: Estimating Bot Populations in Twitter Communities
Zhaoxuan Tan*, Shangbin Feng*, Melanie Sclar, Herun Wan, Minnan Luo, Yejin Choi, Yulia Tsvetkov
Proceedings of EMNLP-Findings, 2023.
demo / tweet

We introduce the concept of community-level Twitter bot detection and develop BotPercent, a multi-dataset, multi-model Twitter bot detection pipeline. Utilizing BotPercent, we investigate the presence of bots in various Twitter communities and discover that bot distribution is heterogeneous in both space and time.

Can Language Models Solve Graph Problems in Natural Language?
Heng Wang*, Shangbin Feng*, Tianxing He, Zhaoxuan Tan, Xiaochuang Han, Yulia Tsvetkov
Proceedings of NeurIPS 2023 (Spotlight)
code

Are language models graph reasoners? We propose the NLGraph benchmark, a test bed for graph-based reasoning designed for language models in natural language. We find that LLMs are preliminary graph thinkers while the most advanced graph reasoning tasks remain an open research question.

HOFA: Twitter Bot Detection with Homophily-Oriented Augmentation and Frequency Adaptive Attention
Sen Ye, Zhaoxuan Tan, Zhenyu Lei, Ruijie He, Hongrui Wang, Qinghua Zheng, Minnan Luo
arXiv preprint 2023.

We identify the heterophilous disguise challenge in Twitter bot detection and propose HOFA, a novel framework equipped with Homophily-Oriented Augmentation and Frequency Adaptive Attention to demystify this challenge.

BotMoE: Twitter Bot Detection with Community-Aware Mixtures of Modal-Specific Experts
Yuhan Liu, Zhaoxuan Tan, Heng Wang, Shangbin Feng, Qinghua Zheng, Minnan Luo
Proceedings of SIGIR 2023.

We propose community-aware mixture-of-experts to address two challenges in detecting advanced Twitter bots: manipulated features and diverse communities.

KRACL: Contrastive Learning with Graph Context Modeling for Sparse Knowledge Graph Completion
Zhaoxuan Tan, Zilong Chen, Shangbin Feng, Qingyue Zhang, Qinghua Zheng, Jundong Li, Minnan Luo
Proceedings of The Web Conference (WWW), 2023.
code / talk

We adopt contrastive learning and a knowledge relational attention network to alleviate the widespread sparsity problem in knowledge graphs.

2022
TwiBot-22: Towards Graph-Based Twitter Bot Detection
Shangbin Feng*, Zhaoxuan Tan*, Herun Wan*, Ningnan Wang*, Zilong Chen*, Binchi Zhang*, Qinghua Zheng, Wenqian Zhang, Zhenyu Lei, Shujie Yang, Xinshun Feng, Qingyue Zhang, Hongrui Wang, Yuhan Liu, Yuyang Bai, Heng Wang, Zijian Cai, Yanbo Wang, Lijing Zheng, Zihan Ma, Jundong Li, Minnan Luo
Proceedings of NeurIPS, Datasets and Benchmarks Track, 2022.
website / GitHub / bibtex / poster

We present TwiBot-22, the largest graph-based Twitter bot detection benchmark to date, which provides diversified entities and relations in the Twittersphere and has considerably better annotation quality.

Heterogeneity-Aware Twitter Bot Detection with Relational Graph Transformers
Shangbin Feng, Zhaoxuan Tan, Rui Li, Minnan Luo
Proceedings of AAAI 2022.
slides / code / bibtex

We propose relational graph transformers, a GNN architecture that leverages the intrinsic relation heterogeneity and influence heterogeneity in the Twitter network.

Industrial Experience
Amazon Science
2024.05 - 2024.10

Applied Scientist Intern @ Rufus
Host: Dr. Zheng Li
Palo Alto, CA
Education
University of Notre Dame
2023.08 - present

Ph.D. in Computer Science and Engineering
Advisor: Prof. Meng Jiang
Xi'an Jiaotong University
2019.08 - 2023.07

B.E. in Computer Science and Technology
GPA: 89.1 (+3) / 100.0 [top 5%]
Advisor: Prof. Minnan Luo
Service
  • Reviewer: AISTATS (2025), COLM (2024), KDD (2024), ARR (Dec 2023-), WWW (2024, 2025), ICLR (2024, 2025), TKDE (2023-), TNNLS (2023-), ICWSM (2024), NeurIPS (2023, 2024), NeurIPS Datasets and Benchmarks Track (2022), LoG (2022, 2023, 2024), AGI@ICLR (2024), TGL@NeurIPS (2023), GCLR@AAAI (2024), KnowledgeNLP@ACL (2024), WiNLP@EMNLP (2024).
  • Volunteer: EMNLP 2023 (virtual), EMNLP 2022 (virtual).
  • Director of the LUD lab (promoting undergraduate research), 2022-2023.
Miscellaneous
  • I have had the good fortune to work with brilliant mentors, collaborators, and advisors throughout my research journey, and I am truly grateful for their guidance and help. If you feel I can be of some help to your research career, feel free to reach out!☕
  • My Chinese name is 谭兆轩 (Tan, Zhaoxuan).
  • I enjoy playing the trumpet🎺 and served as the principal trumpet player in my primary and high school bands🎼.
  • I am a big fan of Jules Verne, and I am especially fascinated by In Search of the Castaways, From the Earth to the Moon, and Five Weeks in a Balloon.
  • I also love jogging🏃 and playing table tennis🏓.

Template courtesy: Jon Barron.