Qitong Wang (王琦童)

Ph.D. Student

University of Delaware

Dept. of Computer & Information Sciences

Address:
Email | CV | Scholar | Github | X (Twitter)

I am currently pursuing my Ph.D. in the Department of Computer and Information Sciences (CIS) at the University of Delaware (UD), advised by Xi Peng. Previously, I collaborated with Julie Michelle Klinger on designing machine learning frameworks tailored for geospatial data analysis. My research centers on Computer Vision and Machine Learning. Specifically, I explore the application of trustworthy deep learning models. I also develop frameworks for video learning and understanding.

Prior to joining the University of Delaware, I completed my M.S. degree in the Department of Computer Science at Boston University, advised by Margrit Betke. During that period, my research focused on developing models for text detection and recognition. Before that, I received my B.Eng. degree from the Wuhan University of Technology.

In industry, I have been fortunate to intern or collaborate with Fan Du (Dolby Laboratories), Ting Liu (Google), Long Zhao (Google), Liangzhe Yuan (Google), R. Manmatha (Amazon), and Yusheng Xie (Amazon).


News


Publications

Beyond Accuracy: On the Effects of Fine-tuning Towards Vision-Language Model's Prediction Rationality

Qitong Wang, Tang Li, Kien X. Nguyen, Xi Peng

Association for the Advancement of Artificial Intelligence (AAAI), Philadelphia, Pennsylvania, USA, 2025.

paper code

@InProceedings{Wang_2025_Rationale,
 author = {Wang, Qitong and Li, Tang and Nguyen, Kien X. and Peng, Xi},
 title = {Beyond Accuracy: On the Effects of Fine-tuning Towards Vision-Language Model's Prediction Rationality},
 booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence (AAAI)},
 month = {February},
 year = {2025},
}

Learning from Semantic Alignment between Unpaired Multiviews for Egocentric Video Recognition

Qitong Wang, Long Zhao, Liangzhe Yuan, Ting Liu, Xi Peng

International Conference on Computer Vision (ICCV), Paris, France, 2023.

paper code blogpost

@InProceedings{Wang_2023_ICCV,
 author = {Wang, Qitong and Zhao, Long and Yuan, Liangzhe and Liu, Ting and Peng, Xi},
 title = {Learning from Semantic Alignment between Unpaired Multiviews for Egocentric Video Recognition},
 booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
 month = {October},
 year = {2023},
 pages = {3307-3317}
}

Learning Representational Invariances for Data-Efficient Action Recognition

Yuliang Zou, Jinwoo Choi, Qitong Wang, Jia-Bin Huang

Computer Vision and Image Understanding (CVIU), 2022.

paper code website

@article{zou2023learning,
 title={Learning representational invariances for data-efficient action recognition},
 author={Zou, Yuliang and Choi, Jinwoo and Wang, Qitong and Huang, Jia-Bin},
 journal={Computer Vision and Image Understanding},
 volume={227},
 pages={103597},
 year={2023},
 publisher={Elsevier}
}

Region-aware Arbitrary-shaped Text Detection with Progressive Fusion

Qitong Wang, Bin Fu, Ming Li, Junjun He, Xi Peng, Yu Qiao

IEEE Transactions on Multimedia (TMM), 2022.

paper

@article{wang2022region,
 title={Region-aware Arbitrary-shaped Text Detection with Progressive Fusion},
 author={Wang, Qitong and Fu, Bin and Li, Ming and He, Junjun and Peng, Xi and Qiao, Yu},
 journal={IEEE Transactions on Multimedia},
 year={2022},
 publisher={IEEE}
}

Semantic-Based Sentence Recognition in Images Using Bimodal Deep Learning

Yi Zheng, Qitong Wang, Margrit Betke

IEEE International Conference on Image Processing (ICIP), Anchorage, Alaska, USA, 2021.

paper data

@article{Zheng2021SemanticBasedSR,
 title={Semantic-Based Sentence Recognition in Images Using Bimodal Deep Learning},
 author={Zheng, Yi and Wang, Qitong and Betke, Margrit},
 journal={2021 IEEE International Conference on Image Processing (ICIP)},
 year={2021},
 pages={2753-2757},
 url={https://api.semanticscholar.org/CorpusID:238082348}
}

A Method for Detecting Text of Arbitrary Shapes in Natural Scenes That Improves Text Spotting

Qitong Wang, Yi Zheng, Margrit Betke

Workshop on Text and Documents in the Deep Learning Era (CVPR), Virtual, 2020.

paper code

@InProceedings{Wang_2020_CVPR_Workshops,
 author = {Wang, Qitong and Zheng, Yi and Betke, Margrit},
 title = {A Method for Detecting Text of Arbitrary Shapes in Natural Scenes That Improves Text Spotting},
 booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
 month = {June},
 year = {2020}
}


Service


Education



University of Delaware
May 2021 - Present
Doctor of Philosophy

Boston University
Sep 2018 - May 2020
Master of Science

Wuhan University of Technology
Sep 2014 - Jun 2018
Bachelor of Engineering


Intern & Collab



Dolby Laboratories
PhD Research Intern
Jun 2025 - Sep 2025

Google Research
Research Collaboration
Sep 2021 - Nov 2022

Amazon Web Services
Applied Science Intern
Jun 2021 - Aug 2021

Shenzhen Institute of Advanced Technology
Visiting Student
May 2020 - Aug 2020


MISC