Our lab aims to understand and implement human intelligence for the most common communication media: vision, natural language, and speech. Since these modalities are connected and correlated with each other, we develop effective and efficient machine learning models for multiple modalities.
In the Interactive Multimodal Machine Learning Lab, we are interested in machine learning and its applications to computer vision and language processing. Specifically, we work on multimodal learning, generative models, and deep learning, and our research topics include (but are not limited to) multimodal LLMs (vision-language models), embodied AI and vision-language-action models, text-to-image generation, multimodal conversational models, video understanding and question answering, and explainable AI. Additionally, we also work on AI for science.
Major Research Field
Machine Learning and applications to Computer Vision and Language Processing
Desired Field of Research
Multimodal Learning, Generative Models, Machine Learning, and Deep Learning
Research Keywords and Topics
· Multimodal LLM (Vision-Language Model)
· Embodied AI/Vision-Language Action Model
· Text-to-image/video generation
· Multi-modal conversational models
· Video understanding and question answering
Research Publications
· Thu Phuong Nguyen*, Duc M. Nguyen*, Hyotaek Jeon, Hyunwook Lee, Hyunmin Song, Sungahn Ko** and Taehwan Kim**, VEHME: A Vision Language Model For Evaluating Handwritten Mathematics Expressions, Empirical Methods in Natural Language Processing (EMNLP), 2025
· Taesoo Kim, Yongsik Jo, Hyunmin Song and Taehwan Kim, Towards Human-like Multimodal Conversational Agent by Generating Engaging Speech, Interspeech, 2025
· Taegyeong Lee, Jeonghun Kang, Hyeonyu Kim and Taehwan Kim, Generating Realistic Images from In-the-wild Sounds, IEEE/CVF International Conference on Computer Vision (ICCV), October 2023