JJJYmmm
Jie Huang
Institute of Computing Technology, Chinese Academy of Sciences
Beijing, China · 1650675829 [at] qq [dot] com
I am a master's student working on computer vision and vision-language models. My recent work studies multimodal model architectures and large VLM systems. I also enjoy contributing model support and inference infrastructure, especially for Qwen3-VL and Qwen3.5, to open-source communities.
News
Experience
Institute of Computing Technology, CAS
Huazhong University of Science and Technology
Qwen Team, Alibaba Cloud
Publications
- Revisiting Multimodal Positional Encoding in Vision-Language Models. International Conference on Learning Representations (ICLR), 2026.
- Qwen3-VL Technical Report. arXiv preprint, 2025.
- RefHCM: A Unified Model for Referring Perceptions in Human-Centric Scenarios. IEEE Transactions on Multimedia (TMM), 2026.
- Stealthy and Effective Physical Adversarial Attacks in Autonomous Driving. IEEE Transactions on Information Forensics and Security (TIFS), 2024.
Main Open Source Efforts
- Transformers: added support for fusion mapping, ZAYA1, Qwen3-VL, and Qwen3.5.
- vLLM: added support for Qwen3-VL and Qwen3.5.
- llama.cpp: added support for Qwen3-VL and Qwen3.5.
- MLX community: contributed to mlx-vlm #722 and mlx-lm #869.