Published in IEEE Transactions on Information Forensics and Security (TIFS), 2024
We propose stealthy and effective physical adversarial attack methods targeting perception systems in autonomous driving.
Recommended citation: Mingfu Zhou, Wei Zhou, Jie Huang, Jianyuan Yang, Mingxing Du, Qiang Li. (2024). "Stealthy and Effective Physical Adversarial Attacks in Autonomous Driving." IEEE Transactions on Information Forensics and Security, vol. 19, pp. 6795-6809.
Download Paper
Published in IEEE Transactions on Multimedia (TMM), 2025
RefHCM is a unified framework that integrates a wide range of human-centric referring tasks into a sequence-to-sequence paradigm using a plain encoder-decoder transformer.
Recommended citation: Jie Huang, Ruibing Hou, Jiahe Zhao, Hong Chang, Shiguang Shan. (2025). "RefHCM: A Unified Model for Referring Perceptions in Human-Centric Scenarios." IEEE Transactions on Multimedia.
Download Paper
Published in arXiv preprint, 2025
We propose MHRoPE and MRoPE-I, two simple, plug-and-play positional encoding variants that consistently outperform existing approaches in vision-language models.
Recommended citation: Jie Huang, Xuejing Liu, Shijie Song, Ruibing Hou, Hong Chang, Jinlin Lin, Shuai Bai. (2025). "Revisiting Multimodal Positional Encoding in Vision-Language Models." arXiv preprint arXiv:2510.23095.
Download Paper
Published in arXiv preprint, 2025
Qwen3-VL is the most capable vision-language model in the Qwen series, supporting interleaved text, image, and video contexts of up to 256K tokens.
Recommended citation: Shuai Bai, ..., Jie Huang, ..., et al. (2025). "Qwen3-VL Technical Report." arXiv preprint arXiv:2511.21631.
Download Paper