Jie Huang

Institute of Computing Technology, Chinese Academy of Sciences
Beijing, China · 1650675829 [at] qq [dot] com

I am a master's student working on computer vision and vision-language models. My recent work studies multimodal model architectures and large VLM systems. I also enjoy contributing model support and inference infrastructure, especially for Qwen3-VL and Qwen3.5, to open-source communities.

Google Scholar · GitHub · Website

News

Jan. 2026 Our paper Revisiting Multimodal Positional Encoding in Vision-Language Models has been accepted to ICLR 2026. Paper · GitHub
Nov. 2025 The Qwen3-VL technical report has been released. Paper · GitHub
May 2025 Our paper RefHCM has been released and accepted by TMM. Paper · Code

Experience

Institute of Computing Technology, CAS

Master's student · 2024.9 - 2027.6 (expected)

Huazhong University of Science and Technology

B.Eng. · 2020.9 - 2024.6

Qwen Team, Alibaba Cloud

Research intern · 2025.4 - 2025.9

Core contributor to Qwen3-VL, including multimodal positional encoding, inference/training infrastructure, and open-source adaptation.

Publications

Revisiting Multimodal Positional Encoding in Vision-Language Models

Jie Huang*, Xuejing Liu*, Sibo Song, Ruibing Hou, Hong Chang, Junyang Lin, Shuai Bai

International Conference on Learning Representations (ICLR), 2026.

Paper · Code

Qwen3-VL Technical Report

Core Contributor

arXiv preprint, 2025.

Paper · Code

RefHCM: A Unified Model for Referring Perceptions in Human-Centric Scenarios

Jie Huang, Ruibing Hou, Jiahe Zhao, Hong Chang, Shiguang Shan

IEEE Transactions on Multimedia (TMM), 2026.

Paper · Code

Stealthy and Effective Physical Adversarial Attacks in Autonomous Driving

Man Zhou, Wenyu Zhou, Jie Huang, Junhui Yang, Minxin Du, Qi Li

IEEE Transactions on Information Forensics and Security (TIFS), 2024.

Paper

Main Open Source Efforts

Transformers: added support for fusion mapping, ZAYA1, Qwen3-VL, and Qwen3.5.
vLLM: added support for Qwen3-VL and Qwen3.5.
llama.cpp: added support for Qwen3-VL and Qwen3.5.
MLX community: contributed to mlx-vlm #722 and mlx-lm #869.