Statistical Decision Making
Our research focuses on statistical approaches to sequential decision problems. The multi-armed bandit (MAB) problem formalizes a setting in which a learner repeatedly faces a set of available actions, chooses one, and receives a random reward in response. The actions are often described as the arms of a bandit slot machine: choosing an action corresponds to pulling an arm, and different arms may yield different rewards. By repeatedly pulling arms and observing rewards, the learner accumulates information about the reward mechanism and, as time elapses, increasingly chooses arms that are close to optimal. In our lab, we integrate online learning and optimization techniques to develop algorithms that learn the reward model efficiently while maximizing rewards. We apply these algorithms to real tasks such as recommendation systems and mobile health apps, and we use causal inference to evaluate the performance of multi-armed bandit algorithms retrospectively.
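The pull-and-learn loop described above can be illustrated with a minimal simulation. The sketch below uses epsilon-greedy, a standard textbook baseline, and is not one of the algorithms developed in the lab. The function name `run_epsilon_greedy`, the Bernoulli reward model, and all parameter values are assumptions chosen purely for illustration.

```python
import random


def run_epsilon_greedy(true_means, n_rounds=5000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy learner on Bernoulli arms (illustrative only).

    true_means: hypothetical success probability of each arm, unknown to the learner.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    pulls = [0] * n_arms        # number of times each arm has been pulled
    estimates = [0.0] * n_arms  # running mean reward observed for each arm
    total_reward = 0

    for _ in range(n_rounds):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimated mean so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # The environment returns a random 0/1 reward for the chosen arm.
        reward = 1 if rng.random() < true_means[arm] else 0

        # Incremental update of the running mean for the pulled arm.
        pulls[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
        total_reward += reward

    return pulls, estimates, total_reward


if __name__ == "__main__":
    pulls, estimates, total = run_epsilon_greedy([0.2, 0.5, 0.8])
    print("pulls per arm:", pulls)
    print("estimated means:", [round(m, 3) for m in estimates])
    print("total reward:", total)
```

In this toy setting the learner should concentrate most of its pulls on the arm with the highest success probability as rounds accumulate. More refined strategies, for example ones that use context or a structured reward model, replace the fixed exploration rate with a data-driven one.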
Sequential Decision, Multi-armed bandit algorithms, Online learning, Causal inference, Policy evaluation
Reinforcement learning, Online deep learning
Research Keywords and Topics
Sequential Decision, Multi-armed bandit algorithms, Online learning, Causal inference, Policy evaluation, Missing data analysis
Kim, G.S., Kim, J.P., Yang, H.J. Robust tests in online decision-making. Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI 2022).
Kim, G.S., Paik, M.C. Doubly-robust Lasso bandit. Neural Information Processing Systems (NeurIPS 2019).
Kim, G.S., Paik, M.C. Contextual multi-armed bandit algorithm for semiparametric reward model. Proceedings of the 36th International Conference on Machine Learning (ICML 2019), 97:3389-3397, 2019.
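Apparatus and method for computing a solution to a graph-based contextual multi-armed bandit problem for multiple users (다중 사용자에 대한 그래프 기반 상황별 다중 슬롯 머신 문제의 해를 산출하는 장치 및 방법), 백명희조, 백승훈, 최영근, 김지수, August 2021.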
Method and Device for Reinforcement Learning Using Novel Centering Operation Based on Probability Distribution (신규한 가중치를 이용한 센터링 연산을 적용한 강화 학습 방법 및 장치), 백명희조, 김지수, March 2020.
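- EE. Information/Communication
- EE01. Information theory
- EE0102. Algorithms
- Other fields
- 060000. Other research not belonging to the 99 core technology categories of the National Technology Road Map (NTRM)
- Not a green-technology-related project
- Not a green-technology-related project
- 999. Not a green-technology-related project
- IT field
- Information processing systems and software
- 010316. Other information processing system and software technologies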