|
Yupan Huang
Hi! I am a Senior Researcher at Microsoft Research, in the General Artificial Intelligence group at the Microsoft Research Asia – Vancouver lab. My research focuses on multimodal AI, in particular multimodal foundation models, generative AI techniques, and document intelligence.
I obtained my Ph.D. degree in Computer Science and Technology from Sun Yat-sen University in 2023 as a member of the joint Ph.D. program between Sun Yat-sen University and Microsoft Research, advised by Prof. Yutong Lu and Dr. Furu Wei. I had the privilege of visiting the University of Cambridge, where I was hosted by Prof. Nigel Collier.
I'm happy to connect and discuss research-related topics through email or over a coffee chat. If you are interested in a research internship, please check out this page.
Microsoft Research Homepage | Email | Google Scholar | GitHub | LinkedIn | Blog
|
Research Experience
Jun. 2024 – present, General Artificial Intelligence Group, Microsoft Research Asia - Vancouver, Senior Researcher
- Topic: Multimodal AI, General AI and Large Foundation Models
Jan. 2023 – Jun. 2023, Language Technology Lab, University of Cambridge, Visiting Student
- Advisor: Prof. Nigel Collier
- Topic: multimodal instruction-following models
Jul. 2021 – Jun. 2024, Natural Language Computing (now GenAI) Group, Microsoft Research Asia - Beijing, Research Intern
- Mentors: Dr. Lei Cui and Dr. Furu Wei
- Topic: multimodal document foundation models; visual text rendering with diffusion models
Jun. 2019 – Jul. 2021, Multimedia Search and Mining Group, Microsoft Research Asia - Beijing, Research Intern
- Mentors: Dr. Bei Liu and Dr. Jianlong Fu
- Topic: vision-language pre-training; image-and-text generation
Jul. 2017 – Jul. 2018, Multimedia Search and Mining Group, Microsoft Research Asia - Beijing, Research Intern
- Mentors: Dr. Qi Dai and Dr. Tao Mei
- Topic: video action detection
|
|
Kosmos-2.5: A Multimodal Literate Model.
Tengchao Lv*, Yupan Huang*, Jingye Chen*, Yuzhong Zhao, Yilin Jia, Lei Cui, Shuming Ma, Yaoyao Chang, Shaohan Huang, Wenhui Wang, Li Dong, Weiyao Luo, Shaoxiang Wu, Guoxin Wang, Cha Zhang, Furu Wei (*Equal Contributions).
arXiv preprint, 2023.
pdf | code | news
|
✨Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models.
Yupan Huang, Zaiqiao Meng, Fangyu Liu, Yixuan Su, Nigel Collier, Yutong Lu.
ICLR Workshop: Navigating and Addressing Data Problems for Foundation Models, 2024.
pdf | code | resources | demo video | homepage | poster
|
TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering.
Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei.
European Conference on Computer Vision (ECCV), 2024, oral presentation.
Top 10 in the Hugging Face Space Trending List (December 2023)
pdf | homepage | code | demo | discord | news
|
TextDiffuser: Diffusion Models as Text Painters.
Jingye Chen*, Yupan Huang*, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei (*Equal Contributions).
Conference on Neural Information Processing Systems (NeurIPS), 2023.
Top 10 in the Hugging Face Space Trending List (June 2023)
pdf | homepage | code | demo | colab | news
|
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking.
Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei.
ACM International Conference on Multimedia (ACM Multimedia), 2022, oral presentation.
Over 100 million downloads in two years and 12 million downloads in one month (rank 9 of 505,825 models), according to Hugging Face Models Statistics, Feb. 2024.
pdf | code | video | news in English and Chinese | Hugging Face docs and models
|
Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning.
Zhicheng Huang*, Zhaoyang Zeng*, Yupan Huang*, Bei Liu, Dongmei Fu, Jianlong Fu (*Equal Contributions).
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021, oral presentation.
pdf | code
|
Unifying Multimodal Transformer for Bi-directional Image and Text Generation.
Yupan Huang, Hongwei Xue, Bei Liu, Yutong Lu.
ACM International Conference on Multimedia (ACM Multimedia), 2021.
pdf | code | video | slides | poster
|
Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training.
Hongwei Xue, Yupan Huang, Bei Liu, Houwen Peng, Jianlong Fu, Houqiang Li, Jiebo Luo.
Conference on Neural Information Processing Systems (NeurIPS), 2021.
pdf | video
|
Education
Ph.D. in Computer Science and Technology, Sun Yat-sen University, 2018-2023
B.E. in Software Engineering, Sun Yat-sen University, 2014-2018
|
Honors & Awards
Outstanding Doctoral Dissertation Award in Sun Yat-sen University (top 1%), 2024
China National Scholarship for Graduate Excellence, 2022
Best Paper Award in ACM International Conference on Multimedia Retrieval Workshop, 2021
Outstanding Undergraduate Graduate of Sun Yat-sen University (top 5%), 2018
The Third Prize in Tianchi FashionAI Global Challenge (rank 3/2950 teams), 2018
China National Scholarship for Undergraduate Excellence, 2017
The First Place in JD Discovery Global Challenge (rank 1/1386 teams), 2017
The First Prize Scholarship in Sun Yat-sen University (top 5%), 2016 and 2017
|
Last updated: August 2024.
Page inspired by Jon Barron
|