
Extending the Fellegi-Sunter record linkage model for mixed-type data with application to the French national health data system

Abstract: Probabilistic record linkage is the process of combining data from different sources when the records refer to common entities and identifying information is not available. Fellegi and Sunter proposed a probabilistic record linkage framework that takes into account multiple pieces of non-identifying information, but it is limited to simple binary comparisons between matching variables. In our work, we propose an extension of this model for mixed-type comparison vectors. We develop a mixture model for handling comparison values of low-prevalence categorical matching variables, and a mixture of hurdle gamma distributions for handling comparison values of continuous matching variables. The parameters are estimated by means of the Expectation Conditional Maximization (ECM) algorithm. Through a Monte Carlo simulation study, we evaluate both the estimation of the posterior probability that a record pair is a match and the prediction of matched record pairs. The simulation results indicate that the proposed methods outperform existing ones in most of the cases considered. The proposed methods are applied to a real dataset to link a registry of patients suffering from venous thromboembolism in the Brest district area (GETBO) with the French national health information system (SNDS).
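For readers unfamiliar with the baseline being extended, here is a minimal sketch of the classic Fellegi-Sunter model with binary comparison vectors, fitted by plain EM. It is not the authors' mixed-type model and does not use their ECM algorithm. It assumes the common conditional-independence formulation: each record pair has a vector of 0/1 agreement indicators, matches agree on variable j with probability m_j, non-matches with probability u_j, and a fraction p of pairs are matches. All function names (`fit_fellegi_sunter_em`, `match_posterior`, `match_weight`, `simulate_pairs`) and the simulated parameter values are hypothetical, chosen only for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple


def pattern_prob(gamma: Sequence[int], probs: Sequence[float]) -> float:
    """P(gamma | class), assuming the binary comparisons are conditionally independent."""
    out = 1.0
    for g, q in zip(gamma, probs):
        out *= q if g else 1.0 - q
    return out


def match_posterior(gamma: Sequence[int], p: float,
                    m: Sequence[float], u: Sequence[float]) -> float:
    """Posterior probability that a pair with comparison vector gamma is a match."""
    a = p * pattern_prob(gamma, m)
    b = (1.0 - p) * pattern_prob(gamma, u)
    return a / (a + b)


def match_weight(gamma: Sequence[int], m: Sequence[float], u: Sequence[float]) -> float:
    """Fellegi-Sunter log-likelihood ratio log P(gamma|M) / P(gamma|U)."""
    return sum(math.log(mk / uk) if g else math.log((1.0 - mk) / (1.0 - uk))
               for g, mk, uk in zip(gamma, m, u))


def fit_fellegi_sunter_em(vectors: List[List[int]], n_iter: int = 100, tol: float = 1e-7,
                          p0: float = 0.1, m0: float = 0.9, u0: float = 0.1,
                          eps: float = 1e-6) -> Tuple[float, List[float], List[float], List[float]]:
    """Two-class EM; starting with m0 > u0 keeps the 'match' label on the high-agreement class."""
    n, k = len(vectors), len(vectors[0])
    p, m, u = p0, [m0] * k, [u0] * k

    def clip(x: float) -> float:  # keep probabilities strictly inside (0, 1)
        return min(max(x, eps), 1.0 - eps)

    for _ in range(n_iter):
        # E-step: posterior match probability for every pair.
        post = [match_posterior(g, p, m, u) for g in vectors]
        # M-step: weighted agreement frequencies within each latent class.
        w_m = max(sum(post), 1e-12)
        w_u = max(sum(1.0 - w for w in post), 1e-12)
        sm, su = [0.0] * k, [0.0] * k
        for w, g in zip(post, vectors):
            for j in range(k):
                if g[j]:
                    sm[j] += w
                    su[j] += 1.0 - w
        new_p = clip(w_m / n)
        new_m = [clip(s / w_m) for s in sm]
        new_u = [clip(s / w_u) for s in su]
        delta = max([abs(new_p - p)]
                    + [abs(a - b) for a, b in zip(new_m, m)]
                    + [abs(a - b) for a, b in zip(new_u, u)])
        p, m, u = new_p, new_m, new_u
        if delta < tol:
            break
    post = [match_posterior(g, p, m, u) for g in vectors]
    return p, m, u, post


def simulate_pairs(n: int, p: float, m: Sequence[float], u: Sequence[float],
                   seed: int = 0) -> Tuple[List[List[int]], List[bool]]:
    """Draw synthetic binary comparison vectors from the two-class model."""
    rng = random.Random(seed)
    vectors, labels = [], []
    for _ in range(n):
        is_match = rng.random() < p
        probs = m if is_match else u
        vectors.append([1 if rng.random() < q else 0 for q in probs])
        labels.append(is_match)
    return vectors, labels


if __name__ == "__main__":
    vecs, _ = simulate_pairs(4000, 0.1, [0.95, 0.9, 0.9, 0.85], [0.05, 0.1, 0.02, 0.2])
    p_hat, m_hat, u_hat, _ = fit_fellegi_sunter_em(vecs)
    print(round(p_hat, 3), [round(x, 2) for x in m_hat], [round(x, 2) for x in u_hat])
```

The binary agreement indicators are exactly what the abstract identifies as the limitation: a continuous comparison (for example a difference in dates) or a rare categorical value has to be collapsed to agree/disagree before this model can use it. The authors' extension replaces these Bernoulli components with mixture and hurdle gamma components, which this sketch does not attempt.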
Document type: Preprints, Working Papers, ...

https://hal.archives-ouvertes.fr/hal-03290773
Contributor: Thanh Huan Vo
Submitted on: Monday, July 19, 2021 - 2:50:54 PM
Last modification on: Tuesday, January 4, 2022 - 10:14:04 AM
Long-term archiving on: Wednesday, October 20, 2021 - 6:54:13 PM

File

Record_linkage.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-03290773, version 1

Citation

Thanh Huan Vo, Guillaume Chauvet, André Happe, Emmanuel Oger, Stephane Paquelet, et al. Extending the Fellegi-Sunter record linkage model for mixed-type data with application to the French national health data system. 2021. ⟨hal-03290773⟩
