Data-Driven Decision Science Lab

Better Decisions through Better Data and Methodology

The D3SL conducts research at the intersection of statistics, data science, and privacy-preserving technologies. Our work advances statistical methodology for the safe and efficient use of data in the public and private sectors, with a particular focus on complex survey data, incomplete data, and privacy-constrained environments.

People

Faculty

Associate Professor

Department of Statistics and Data Science

Yonsei University

ijh38@yonsei.ac.kr

PhD Students

Jeong, D.

Synthetic Data & Data Privacy

Kim, C.

Imbalanced data & Imputation

Kim, I.

Survey Sampling / Political Engineering

Oh, H.

Missing Data Analysis

Kim, J.

Anomaly Detection

Master's Students

Lee, C.

Time-series data synthesis

Park, K.

Text data analysis

Park, J.

Data synthesis

Chun, S.

Data synthesis

Jeon, J.

Causal analysis

Research

Manuscripts (Submitted)

  1. A Survey on Tabular Data Synthesis: Generation, Evaluation, and Benchmarking Experiments
  2. Imputation-Based Causal Analysis for Observational Data
  3. Beyond the Privacy-Utility Trade-off: Cryptographic Federated Calibration for Enhanced Model Performance
  4. Auditing Alignment: A Framework for Measuring Value Drift in Public Sector AI Procurement
  5. Bias-corrected estimation in causal mediation analysis

Published Works (Statistics & Data Science, Other Domains)

  1. (2025) GAM-MIDAS: Generalized additive model-based mixed-data sampling regression with informal data, Applied Soft Computing
  2. (2025) Calibrated Mixup for Imbalanced Regression on Tabular Data, Pattern Recognition
  3. (2025) Re-sampling Calibrated SNN Loss: A Robust Approach to Non-IID Data in Federated Learning, Expert Systems
  4. (2025) Identification enhanced generalised linear model estimation with nonignorable missing outcomes, Journal of the Royal Statistical Society: Series A
  5. (2025) Optimizing Federated Learning: Addressing Key Challenges in Real-World Applications, IEEE Internet of Things Journal
  6. (2025) A Generalized Theory of Mixup for Structure-Preserving Synthetic Data, AISTATS
  7. (2025) An Analysis of the Quality and Representativeness of Election Polls: Focusing on the 20th and 21st National Assembly Elections, Survey Research
  8. (2024) Overcoming Data Imbalance in Federated Learning with Calibration Weighting, IEEE International Conference on Big Data
  9. (2024) Online News-Based Economic Sentiment Index, IEEE Transactions on Big Data
  10. (2024) Robust demand estimation with customer choice-based models for sales transaction data, Production and Operations Management
  11. (2024) Proposing Bayesian Hierarchical Growth Curve Models (BHGCMs) for Tourism and Hospitality Research, International Journal of Hospitality Management
  12. (2024) Relabeling & Raking Algorithm for Imbalanced Classification, Expert Systems with Applications
  13. (2023) Resampling Approach for One-Class Classification, Pattern Recognition
  14. (2023) Quantile regression with multiple proxy variables, STAT
  15. (2023) A new global measure to simultaneously evaluate data utility and privacy risk, IEEE Transactions on Information Forensics and Security
  16. (2023) Potential threat of microplastics to humans: toxicity prediction modeling by small data analysis, Environmental Science: Nano
  17. (2022) Confirmatory aspect-level opinion mining processes for tourism and hospitality research: a proposal of DissBUS, Current Issues in Tourism
  18. (2022) Data integration of National Dose Registry and survey data using multivariate imputation by chained equations, PLOS ONE
  19. (2022) Cosine-based variable bandwidth selection for nonparametric spectral density estimation under long-range dependence, Journal of Statistical Computation and Simulation
  20. (2022) A small-data-driven model for predicting adsorption properties in polymeric thin films, Chemical Communications
  21. (2021) A note on stationary bootstrap variance estimator under long-range dependence, Statistics & Probability Letters
  22. (2021) Integration of statistical and administrative agricultural data from Namibia, Statistical Journal of the IAOS
  23. (2021) A growth curve-based Bayesian hierarchical model for multi-building energy use data analysis, Building and Environment
  24. (2021) COVID-19, social distancing, and risk-averse actions of hospitality and tourism consumers: A case of South Korea, Journal of Destination Marketing & Management
  25. (2021) COVID-19: Were public health interventions and the disclosure of patients' contact history effective in upholding social distancing? Evidence from South Korea, Journal of Multidisciplinary Healthcare
  26. (2020) Does the written word matter? The role of uncovering and utilizing information from written comments in housing ads, Journal of Housing Research
  27. (2020) An assessment of opinions and perceptions of smart thermostats using aspect-based sentiment analysis of online reviews, Building and Environment
  28. (2020) A least squares-type density estimator using a polynomial function, Computational Statistics & Data Analysis
  29. (2019) Frequency domain bootstrap for ratio statistics under long-range dependence, Journal of the Korean Statistical Society
  30. (2019) Impacts of fractional hot-deck imputation on learning and prediction of engineering data, IEEE Transactions on Knowledge and Data Engineering
  31. (2019) Cost-effective extreme case-control design using a resampling method, Evolutionary Bioinformatics
  32. (2018) Proposing a missing data method for hospitality research on online customer reviews: An application of imputation approach, International Journal of Contemporary Hospitality Management
  33. (2018) FHDI: An R package for fractional hot deck imputation, The R Journal
  34. (2017) Multiple imputation for nonignorable missing data, Journal of the Korean Statistical Society
  35. (2017) Correlation estimation with singly truncated bivariate data, Statistics in Medicine
  36. (2017) Energy efficiency in US residential rental housing: Adoption rates and impact on rent, Applied Energy
  37. (2016) A post-hoc genome-wide association study using matched samples, International Journal of Data Mining and Bioinformatics
  38. (2016) A propensity-score-adjustment method for nonignorable nonresponse, Journal of Survey Statistics and Methodology
  39. (2015) Two-phase sampling approach to fractional hot deck imputation, Proceedings of the Survey Research Methods Section
  40. (2014) Propensity score adjustment with several followups, Biometrika

Projects

National R&D

  1. [Humanities and Social Sciences Convergence] Research on the Economic and Social Impact and Use of AI-Generated Synthetic Data (2025.06~2028.05)
  2. [ITRC] Data-Driven Energy System Innovation Research Center (2023.07~2030.12)
  3. [Outstanding Young Researcher] Research on Data Integration and Privacy Protection Methodology Using Missing-Data Imputation Methods (2021.03~2026.02)

Contract Research (Last 4 Years)

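  1. (2025) Korea Expressway Corporation - DSRC synthetic data generation
  2. (2025) Statistics Korea - Census of Agriculture, Forestry and Fisheries
  3. (2024~2025) KISDI - Construction of a semiconductor market risk index
  4. (2023) Statistics Korea - Register-based Census of Agriculture, Forestry and Fisheries
  5. (2023) Korea Institute of Nuclear Safety - Study on applying the latest technical standards for radiological impact assessment
  6. (2022) Hyundai Mobis - Enhancement of the service parts demand forecasting algorithm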
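  7. (2022) Statistics Research Institute - Development of statistical disclosure risk and control methodology at the individual-observation level (extreme values, etc.)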
  8. (2022~2025) KOICA project to strengthen early warning and crisis management capacity in Vietnam's financial sector, PMC services

Alumni (Last 5 Years)

Academia

  • Yonsei University (Postdoctoral Researcher)
  • Carnegie Mellon University (PhD Student)
  • University of Texas at Austin (PhD Student)

Industry

  • NICE Information Service
  • Korean Air
  • RIDI
  • DB Insurance
  • LG Electronics

Government & Public Sector

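  • Financial Supervisory Service
  • Korea Housing & Urban Guarantee Corporation
  • Korea Information Society Development Institute (KISDI)
  • Bank of Korea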