
Yupan Huang

Hi! I am currently a Senior Researcher at Microsoft working on General AI and Large Foundation Models. I obtained my Ph.D. degree in Computer Science and Technology from Sun Yat-sen University in 2023 as a member of the joint Ph.D. program between Sun Yat-sen University and Microsoft Research, advised by Prof. Yutong Lu and Dr. Furu Wei. I had the privilege of visiting the University of Cambridge, where I was hosted by Prof. Nigel Collier.

My research focuses on multimodal AI. I work on multimodal foundation models, generative AI techniques, and document intelligence.

Microsoft Research Homepage  |  Email  |  Google Scholar  |  Github  |  LinkedIn  |  Blog

Research Experience

Jan. 2023 – Jun. 2023, Language Technology Lab, University of Cambridge, visiting student

  • Advisor: Prof. Nigel Collier
  • Topic: multimodal instruction-following models

Jul. 2021 – Dec. 2023, Natural Language Computing Group, Microsoft Research Asia, research intern

  • Mentors: Dr. Lei Cui and Dr. Furu Wei
  • Topic: multimodal document foundation models; visual text rendering with diffusion models

Jun. 2019 – Jul. 2021, Multimedia Search and Mining Group, Microsoft Research Asia, research intern

  • Mentors: Dr. Bei Liu and Dr. Jianlong Fu
  • Topic: vision-language pre-training; image-and-text generation

Jul. 2017 – Jul. 2018, Multimedia Search and Mining Group, Microsoft Research Asia, research intern

  • Mentors: Dr. Qi Dai and Dr. Tao Mei
  • Topic: video action detection

Research Papers
Kosmos-2.5: A Multimodal Literate Model.
Tengchao Lv*, Yupan Huang*, Jingye Chen*, Lei Cui*, Shuming Ma, Yaoyao Chang, Shaohan Huang, Wenhui Wang, Li Dong, Weiyao Luo, Shaoxiang Wu, Guoxin Wang, Cha Zhang, Furu Wei (*Equal Contributions).
arXiv preprint, 2023.
pdf | code | news
✨Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models.
Yupan Huang, Zaiqiao Meng, Fangyu Liu, Yixuan Su, Nigel Collier, Yutong Lu.
ICLR Workshop: Navigating and Addressing Data Problems for Foundation Models, 2024.
pdf | code | resources | demo video | homepage | poster
TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering.
Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei.
European Conference on Computer Vision (ECCV), 2024.
Top 10 on the Hugging Face Spaces trending list (December 2023)
pdf | homepage | code | demo | discord | news
TextDiffuser: Diffusion Models as Text Painters.
Jingye Chen*, Yupan Huang*, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei (*Equal Contributions).
Conference on Neural Information Processing Systems (NeurIPS), 2023.
Top 10 on the Hugging Face Spaces trending list (June 2023)
pdf | homepage | code | demo | colab | news
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking.
Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei.
ACM International Conference on Multimedia (ACM Multimedia), 2022, oral presentation.
Over 100 million downloads in two years and 12 million downloads in a single month (rank 9 of 505,825 models), according to Hugging Face model statistics, February 2024
pdf | code | video | news in English and Chinese | Hugging Face docs and models
Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning.
Zhicheng Huang*, Zhaoyang Zeng*, Yupan Huang*, Bei Liu, Dongmei Fu, Jianlong Fu (*Equal Contributions).
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021, oral presentation.
pdf | code
Unifying Multimodal Transformer for Bi-directional Image and Text Generation.
Yupan Huang, Hongwei Xue, Bei Liu, Yutong Lu.
ACM International Conference on Multimedia (ACM Multimedia), 2021.
pdf | code | video | slides | poster

Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training.
Hongwei Xue, Yupan Huang, Bei Liu, Houwen Peng, Jianlong Fu, Houqiang Li, Jiebo Luo.
Conference on Neural Information Processing Systems (NeurIPS), 2021.
pdf | video

Education

Ph.D. in Computer Science and Technology, Sun Yat-sen University, 2018 – 2023
B.E. in Software Engineering, Sun Yat-sen University, 2014 – 2018

Honors & Awards

Outstanding Doctoral Dissertation Award at Sun Yat-sen University (top 1%), 2024
China National Scholarship for Graduate Excellence, 2022
Best Paper Award at the ACM International Conference on Multimedia Retrieval Workshop, 2021
Outstanding Undergraduate Graduate at Sun Yat-sen University (top 5%), 2018
The Third Prize in Tianchi FashionAI Global Challenge (rank 3/2950 teams), 2018
China National Scholarship for Undergraduate Excellence, 2017
The First Place in JD Discovery Global Challenge (rank 1/1386 teams), 2017
The First Prize Scholarship at Sun Yat-sen University (top 5%), 2016 and 2017

Last updated: January 2024. Page inspired by Jon Barron