Interactive Multimodal Machine Learning Lab, Prof. Taehwan Kim

Graduate School of Artificial Intelligence

Interactive Multimodal Machine Learning Lab

Our lab aims to understand and implement human intelligence through the most common and effective media of human communication: vision, natural language, and speech. Because these modalities are connected and correlated with one another, we work on developing effective and efficient machine learning models for multiple modalities.
In the Interactive Multimodal Machine Learning Lab, we are interested in Machine Learning and its applications to Computer Vision and Language Processing. Specifically, we work on Multimodal Learning, Generative Models, and Deep Learning, and our research topics include (but are not limited to) Multimodal LLM (Vision-Language Model), Embodied AI/Vision-Language Action Model, Text-to-image generation, Multi-modal conversational models, Video understanding and question answering, and Explainable AI. We also work on AI for Science, applying AI to the natural sciences.

Major research field

Machine Learning and its applications to Computer Vision and Language Processing

Desired field of research

Multimodal Learning, Generative Models, Machine Learning, and Deep Learning

Research Keywords and Topics

· Multimodal LLM (Vision-Language Model)
· Embodied AI/Vision-Language Action Model
· Text-to-image/video generation
· Multi-modal conversational models
· Video understanding and question answering

Research Publications

· Thu Phuong Nguyen*, Duc M. Nguyen*, Hyotaek Jeon, Hyunwook Lee, Hyunmin Song, Sungahn Ko** and Taehwan Kim**, VEHME: A Vision Language Model For Evaluating Handwritten Mathematics Expressions, Empirical Methods in Natural Language Processing (EMNLP), 2025
· Taesoo Kim, Yongsik Jo, Hyunmin Song and Taehwan Kim, Towards Human-like Multimodal Conversational Agent by Generating Engaging Speech, Interspeech, 2025
· Taegyeong Lee, Jeonghun Kang, Hyeonyu Kim and Taehwan Kim, Generating Realistic Images from In-the-wild Sounds, IEEE/CVF International Conference on Computer Vision (ICCV), October 2023

Taehwan Kim

