Semiparametric density estimation under a two-sample density ratio model

(Bernoulli, 10(4), 583-604, 2004)





A semiparametric density estimator is proposed under a two-sample density ratio model. This model, arising naturally from case–control studies and logistic discriminant analyses, can also be regarded as a biased sampling model. Our proposed density estimate is therefore an extension of the kernel density estimate suggested by Jones for length-biased data. We show that under the model considered the new density estimator not only is consistent but also has the "smallest" asymptotic variance among general nonparametric density estimators. We also show how to use the new estimate to define a procedure for testing the goodness of fit of the density ratio model. Such a test is consistent under very general alternatives. Finally, we present some results from simulations and from the analysis of two real data sets.

Keywords: asymptotic relative efficiency; biased sampling problem; case–control data; density estimation; goodness-of-fit test; logistic regression; semiparametric maximum likelihood estimation
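
The abstract names the density ratio model without writing it out. The sketch below gives one common formulation and shows why it arises from logistic regression on case–control data and why it counts as a biased sampling model. The notation is an assumption made for illustration and may differ from the paper's: f and g are the control and case densities, D is the case indicator, \pi = P(D = 1), h is the marginal density of x, and the log ratio is taken to be linear in x.

```latex
% Assumed form of the two-sample density ratio model (illustrative notation):
\[
  g(x) = \exp(\alpha + \beta^{\top} x)\, f(x).
\]

% Link to logistic regression. Suppose
\[
  P(D = 1 \mid x) = \frac{\exp(\alpha^{*} + \beta^{\top} x)}
                         {1 + \exp(\alpha^{*} + \beta^{\top} x)} .
\]
% Bayes' rule gives g(x) = P(D = 1 \mid x) h(x) / \pi and
% f(x) = P(D = 0 \mid x) h(x) / (1 - \pi), so
\[
  \frac{g(x)}{f(x)}
  = \frac{P(D = 1 \mid x)}{P(D = 0 \mid x)} \cdot \frac{1 - \pi}{\pi}
  = \exp(\alpha + \beta^{\top} x),
  \qquad
  \alpha = \alpha^{*} + \log\frac{1 - \pi}{\pi} .
\]

% Biased sampling view: with weight w(x) = \exp(\beta^{\top} x),
\[
  g(x) = \frac{w(x)\, f(x)}{\int w(t)\, f(t)\, dt},
  \qquad
  \exp(\alpha) = \Bigl( \int w(t)\, f(t)\, dt \Bigr)^{-1} .
\]
% Length-biased data correspond to the known weight w(x) = x.
```

Under this reading, the length-biased setting has a fully known weight function, whereas the density ratio model leaves \beta to be estimated from the two samples.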