Yupan Huang

Hi! I am currently a Senior Researcher at Microsoft Research, in the General Artificial Intelligence group and the Microsoft Research Asia – Vancouver lab. My research focuses on multimodal AI, in particular multimodal foundation models, generative AI techniques, and document intelligence.

I obtained my Ph.D. degree in Computer Science and Technology from Sun Yat-sen University in 2023 as a member of the joint Ph.D. program between Sun Yat-sen University and Microsoft Research, advised by Prof. Yutong Lu and Dr. Furu Wei. I had the privilege of visiting the University of Cambridge, where I was hosted by Prof. Nigel Collier.

I'm happy to connect and discuss research-related topics through email or over a coffee chat. If you are interested in a research internship, please check out this page.

Research Experience

Jun. 2024 – present, General Artificial Intelligence Group, Microsoft Research Asia - Vancouver, Senior Researcher

Topic: Multimodal AI, General AI and Large Foundation Models

Jan. 2023 – Jun. 2023, Language Technology Lab, University of Cambridge, Visiting Student

Advisor: Prof. Nigel Collier
Topic: multimodal instruction-following models

Jul. 2021 – Jun. 2024, Natural Language Computing (now GenAI) Group, Microsoft Research Asia - Beijing, Research Intern

Mentors: Dr. Lei Cui and Dr. Furu Wei
Topic: multimodal document foundation models; visual text rendering with diffusion models

Jun. 2019 – Jul. 2021, Multimedia Search and Mining Group, Microsoft Research Asia - Beijing, Research Intern

Mentors: Dr. Bei Liu and Dr. Jianlong Fu
Topic: vision-language pre-training; image-and-text generation

Jul. 2017 – Jul. 2018, Multimedia Search and Mining Group, Microsoft Research Asia - Beijing, Research Intern

Mentors: Dr. Qi Dai and Dr. Tao Mei
Topic: video action detection

Research Papers

	RedStone: Curating General, Code, Math, and QA Data for Large Language Models. Yaoyao Chang, Lei Cui, Li Dong, Shaohan Huang, Yangyu Huang, Yupan Huang, Scarlett Li, Tengchao Lv, Shuming Ma, Qinzheng Sun, Wenhui Wang, Furu Wei, Ying Xin, Mao Yang, Qiufeng Yin, Xingxing Zhang (alphabetical order by last names). arXiv preprint, 2024. pdf | code
	Kosmos-2.5: A Multimodal Literate Model. Tengchao Lv, Yupan Huang, Jingye Chen, Yuzhong Zhao, Yilin Jia, Lei Cui, Shuming Ma, Yaoyao Chang, Shaohan Huang, Wenhui Wang, Li Dong, Weiyao Luo, Shaoxiang Wu, Guoxin Wang, Cha Zhang, Furu Wei (Equal Contributions). arXiv preprint, 2023. pdf | code | news
	✨Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models. Yupan Huang, Zaiqiao Meng, Fangyu Liu, Yixuan Su, Nigel Collier, Yutong Lu. ICLR Workshop: Navigating and Addressing Data Problems for Foundation Models, 2024. pdf | code | resources | demo video | homepage | poster
	TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering. Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei. European Conference on Computer Vision (ECCV), 2024, oral presentation. Top 10 in the Hugging Face Space Trending List (December 2023). pdf | homepage | code | demo | discord | news
	TextDiffuser: Diffusion Models as Text Painters. Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei (Equal Contributions). Conference on Neural Information Processing Systems (NeurIPS), 2023. Top 10 in the Hugging Face Space Trending List (June 2023). pdf | homepage | code | demo | colab | news
	LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking. Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei. ACM International Conference on Multimedia (ACM Multimedia), 2022, oral presentation. Over 100 million downloads in two years and 12 million downloads in a single month (ranked 9th of 505,825 models, per Hugging Face model statistics, Feb. 2024). pdf | code | video | news in English and Chinese | Hugging Face docs and models
	Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning. Zhicheng Huang, Zhaoyang Zeng, Yupan Huang, Bei Liu, Dongmei Fu, Jianlong Fu (Equal Contributions). IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021, oral presentation. pdf | code
	Unifying Multimodal Transformer for Bi-directional Image and Text Generation. Yupan Huang, Hongwei Xue, Bei Liu, Yutong Lu. ACM International Conference on Multimedia (ACM Multimedia), 2021. pdf | code | video | slides | poster
	Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training. Hongwei Xue, Yupan Huang, Bei Liu, Houwen Peng, Jianlong Fu, Houqiang Li, Jiebo Luo. Conference on Neural Information Processing Systems (NeurIPS), 2021. pdf | video

Education

Ph.D. in Computer Science and Technology, Sun Yat-sen University, 2018-2023
B.E. in Software Engineering, Sun Yat-sen University, 2014-2018

Honors & Awards

Outstanding Doctoral Dissertation Award, Sun Yat-sen University (top 1%), 2024
China National Scholarship for Graduate Excellence, 2022
Best Paper Award, ACM International Conference on Multimedia Retrieval (ICMR) Workshop, 2021
Outstanding Undergraduate Graduate, Sun Yat-sen University (top 5%), 2018
Third Prize in the Tianchi FashionAI Global Challenge (rank 3/2950 teams), 2018
China National Scholarship for Undergraduate Excellence, 2017
First Place in the JD Discovery Global Challenge (rank 1/1386 teams), 2017
First Prize Scholarship, Sun Yat-sen University (top 5%), 2016 and 2017

Last updated: December 2024. Page inspired by Jon Barron