Data-driven learning optimal K values for K-nearest neighbour matching in causal inference
Posted by: 芦旭然 | Published: 2025-05-28

Abstract

A central task in causal inference is estimating causal effects from observational data in the presence of confounding variables. The K-Nearest Neighbour Matching (K-NNM) method is widely used to handle confounding bias, but it is usually applied with a single K value shared by all samples, which can lead to suboptimal results in practice. To overcome this limitation, this paper introduces a novel method for causal effect estimation called Dynamic K-Nearest Neighbour Matching (DK-NNM). DK-NNM uses a data-driven learning strategy to determine the optimal value of K for each sample. Specifically, it reconstructs a sparse coefficient matrix for all samples using sparse learning, while simultaneously learning a graph matrix that preserves local information and sample similarity; together these identify the most suitable K value for each sample. Additionally, DK-NNM uses joint propensity and prognostic scores to mitigate the confounding bias that arises from high-dimensional covariates during the K-NNM process. Experiments on synthetic, semi-synthetic, and real-world datasets show that DK-NNM outperforms baseline models, including traditional matching methods, in estimating causal effects from observational data.
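
To make the contrast between a uniform K and a per-sample K concrete, the following is a minimal sketch of plain nearest-neighbour matching for the average treatment effect on the treated (ATT). It is not the paper's DK-NNM algorithm: the function name `knn_matching_att` is hypothetical, distances are plain Euclidean distances on the raw covariates (an assumption; the abstract describes matching on joint propensity and prognostic scores), and the per-sample K values are simply passed in by the caller rather than learned through sparse and graph learning.

```python
import numpy as np

def knn_matching_att(X, t, y, k):
    """Estimate the ATT by matching each treated unit to its nearest controls.

    X : (n, d) covariates; t : (n,) treatment indicator (1 = treated);
    y : (n,) outcomes; k : an int (uniform K) or a sequence holding one K
    per treated unit, in the order the treated units appear in X.
    Hypothetical helper for illustration only, not the DK-NNM method.
    """
    X = np.asarray(X, dtype=float)
    t = np.asarray(t).astype(bool)
    y = np.asarray(y, dtype=float)

    X_treated, X_control = X[t], X[~t]
    y_treated, y_control = y[t], y[~t]

    # One K per treated unit; a scalar K is broadcast to every unit.
    ks = np.broadcast_to(np.asarray(k, dtype=int), (X_treated.shape[0],))

    # Pairwise Euclidean distances, shape (n_treated, n_control).
    dist = np.linalg.norm(X_treated[:, None, :] - X_control[None, :, :], axis=2)
    order = np.argsort(dist, axis=1)  # control indices, nearest first

    effects = np.empty(X_treated.shape[0])
    for i, k_i in enumerate(ks):
        matched = order[i, :k_i]  # the k_i nearest controls for unit i
        # Unit-level effect: observed outcome minus mean matched control outcome.
        effects[i] = y_treated[i] - y_control[matched].mean()
    return effects.mean()

# Toy data: four controls followed by two treated units.
X = [[0.0], [1.0], [2.0], [3.0], [0.0], [3.0]]
t = [0, 0, 0, 0, 1, 1]
y = [0.0, 1.0, 2.0, 3.0, 2.0, 5.0]

att_uniform = knn_matching_att(X, t, y, 1)          # same K for every treated unit
att_per_unit = knn_matching_att(X, t, y, [1, 2])    # a different K per treated unit
```

On this toy data the two calls return different estimates, which shows how sensitive the matching estimator is to the choice of K for individual samples. That sensitivity is the motivation the abstract gives for learning K per sample instead of fixing it globally.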