
Yupan Huang

Hi! I obtained my Ph.D. degree in Computer Science and Technology from Sun Yat-sen University in 2023 as a member of the joint Ph.D. program between Sun Yat-sen University and Microsoft Research, advised by Prof. Yutong Lu and Dr. Furu Wei. I had the privilege of visiting the University of Cambridge, where I was hosted by Prof. Nigel Collier.

My research interests lie at the intersection of computer vision and natural language processing. I work on multimodal foundation models, generative AI techniques, and document intelligence.

Email  |  Google Scholar  |  GitHub  |  LinkedIn  |  Blog

Research Experience

Jan. 2023 – Jun. 2023, Language Technology Lab, University of Cambridge, visiting student

  • Advisor: Prof. Nigel Collier
  • Topic: multimodal instruction-following models (arXiv 2023)

Jul. 2021 – Dec. 2023, Natural Language Computing Group, Microsoft Research Asia, research intern

  • Mentors: Dr. Lei Cui and Dr. Furu Wei
  • Topic: multimodal document foundation models (ACM Multimedia 2022, arXiv 2023);
    diffusion models, visual text rendering (NeurIPS 2023)

Jun. 2019 – Jul. 2021, Multimedia Search and Mining Group, Microsoft Research Asia, research intern

  • Mentors: Dr. Bei Liu and Dr. Jianlong Fu
  • Topic: vision-language pre-training (CVPR 2021, NeurIPS 2021);
    image-and-text generation (ACM Multimedia 2021)

Jul. 2017 – Jul. 2018, Multimedia Search and Mining Group, Microsoft Research Asia, research intern

  • Mentors: Dr. Qi Dai and Dr. Tao Mei
  • Topic: video action detection (ICME 2019)

Research Papers
Kosmos-2.5: A Multimodal Literate Model.
Tengchao Lv*, Yupan Huang*, Jingye Chen*, Lei Cui*, Shuming Ma, Yaoyao Chang, Shaohan Huang, Wenhui Wang, Li Dong, Weiyao Luo, Shaoxiang Wu, Guoxin Wang, Cha Zhang, Furu Wei (*Equal).
arXiv preprint, 2023.
pdf | code | news
✨Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models.
Yupan Huang, Zaiqiao Meng, Fangyu Liu, Yixuan Su, Nigel Collier, Yutong Lu.
arXiv preprint, 2023.
pdf | code | resources | demo video
TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering.
Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei.
arXiv preprint, 2023.
Top 10 in the Hugging Face Space Trending List (December 2023)
pdf | homepage | code | demo | discord | news
TextDiffuser: Diffusion Models as Text Painters.
Jingye Chen*, Yupan Huang*, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei (*Equal Contributions).
Conference on Neural Information Processing Systems (NeurIPS), 2023.
Top 10 in the Hugging Face Space Trending List (June 2023)
pdf | homepage | code | demo | colab | news
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking.
Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei.
ACM International Conference on Multimedia (ACM Multimedia), 2022, oral presentation.
Over 100 million downloads in two years, 12 million downloads in a month (rank 9/505825 models) (Hugging Face Models Statistics in Feb 2024)
pdf | code | video | news in English and Chinese | Hugging Face docs and models
Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning.
Zhicheng Huang*, Zhaoyang Zeng*, Yupan Huang*, Bei Liu, Dongmei Fu, Jianlong Fu (*Equal Contributions).
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021, oral presentation.
pdf | code
Unifying Multimodal Transformer for Bi-directional Image and Text Generation.
Yupan Huang, Hongwei Xue, Bei Liu, Yutong Lu.
ACM International Conference on Multimedia (ACM Multimedia), 2021.
pdf | code | video | slides | poster

Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training.
Hongwei Xue, Yupan Huang, Bei Liu, Houwen Peng, Jianlong Fu, Houqiang Li, Jiebo Luo.
Conference on Neural Information Processing Systems (NeurIPS), 2021.
pdf | video

Education

Ph.D. in Computer Science and Technology, Sun Yat-sen University, 2018-2023
B.E. in Software Engineering, Sun Yat-sen University, 2014-2018

Honors & Awards

Best Paper Award at the ACM International Conference on Multimedia Retrieval Workshop, 2021
China National Scholarship for Graduate Excellence, 2022
China National Scholarship for Undergraduate Excellence, 2017
Outstanding Undergraduate Graduate of Sun Yat-sen University (top 5%), 2018
The First Prize Scholarship in Sun Yat-sen University (top 5%), 2016 and 2017
The First Place in JD Discovery Global Challenge (rank 1/1386 teams), 2017
The Third Prize in Tianchi FashionAI Global Challenge (rank 3/2950 teams), 2017

Last updated: January 2024. Page inspired by Jon Barron