JJJYmmm

Jie Huang

Institute of Computing Technology, Chinese Academy of Sciences
Beijing, China · 1650675829 [at] qq [dot] com

I am a master's student working on computer vision and vision-language models. My recent work studies multimodal model architectures and large VLM systems. I also enjoy contributing model support and inference infrastructure to open-source communities, especially for Qwen3-VL and Qwen3.5.


News

Experience

Institute of Computing Technology, CAS
Master's student · 2024.9 - 2027.6 (expected)
Huazhong University of Science and Technology
B.Eng. · 2020.9 - 2024.6
Qwen Team, Alibaba Cloud
Research intern · 2025.4 - 2025.9
Core contributor to Qwen3-VL, working on multimodal positional encoding, training and inference infrastructure, and open-source adaptation.

Publications

  1. Revisiting Multimodal Positional Encoding in Vision-Language Models
    Jie Huang*, Xuejing Liu*, Sibo Song, Ruibing Hou, Hong Chang, Junyang Lin, Shuai Bai
    International Conference on Learning Representations (ICLR), 2026.
  2. Qwen3-VL Technical Report
    Core Contributor
    arXiv preprint, 2025.
  3. RefHCM: A Unified Model for Referring Perceptions in Human-Centric Scenarios
    Jie Huang, Ruibing Hou, Jiahe Zhao, Hong Chang, Shiguang Shan
    IEEE Transactions on Multimedia (TMM), 2026.
  4. Stealthy and Effective Physical Adversarial Attacks in Autonomous Driving
    Man Zhou, Wenyu Zhou, Jie Huang, Junhui Yang, Minxin Du, Qi Li
    IEEE Transactions on Information Forensics and Security (TIFS), 2024.

Main Open-Source Efforts