This paper addresses a fundamental question in estimating KL divergence from samples. We argue that the estimate is unreliable unless the complexity of the discriminator is controlled during training. We show how to control it in a reproducing kernel Hilbert space (RKHS), with supporting theoretical analysis.
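
To make the idea concrete, here is a minimal sketch of a discriminator-based KL estimate in which the discriminator lives in an RKHS and its squared norm is penalized during training. It is an illustration under stated assumptions, not the paper's algorithm: the RBF kernel, the kernel logistic-regression loss, the function names `rbf_kernel` and `estimate_kl`, and the parameters `lam`, `bandwidth`, `steps`, and `lr` are all hypothetical choices. It also assumes equal sample sizes from P and Q, so that the discriminator's logit approximates log p(x)/q(x).

```python
import numpy as np


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def estimate_kl(xp, xq, lam=1e-2, bandwidth=1.0, steps=500, lr=1.0):
    """Estimate KL(P || Q) from samples xp ~ P and xq ~ Q (equal sizes assumed).

    The discriminator is f(x) = sum_i alpha_i * k(x_i, x). It is trained with
    logistic loss plus lam * ||f||^2, where ||f||^2 = alpha^T K alpha is the
    squared RKHS norm. Larger lam means a less complex discriminator.
    """
    x = np.vstack([xp, xq])
    y = np.concatenate([np.ones(len(xp)), np.zeros(len(xq))])  # 1 = from P, 0 = from Q
    K = rbf_kernel(x, x, bandwidth)
    alpha = np.zeros(len(x))

    # Functional gradient descent in the RKHS, written in the coefficients alpha.
    for _ in range(steps):
        logits = np.clip(K @ alpha, -30.0, 30.0)  # clip for a numerically safe sigmoid
        prob = 1.0 / (1.0 + np.exp(-logits))
        alpha -= lr * ((prob - y) / len(x) + 2.0 * lam * alpha)

    # With balanced classes the logit approximates log p/q, so its average
    # over the P samples is a plug-in estimate of KL(P || Q).
    f_on_p = K[: len(xp)] @ alpha
    rkhs_norm_sq = float(alpha @ K @ alpha)
    return float(np.mean(f_on_p)), rkhs_norm_sq


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xp = rng.normal(1.0, 1.0, size=(100, 1))  # samples from P = N(1, 1)
    xq = rng.normal(0.0, 1.0, size=(100, 1))  # samples from Q = N(0, 1)
    for lam in (1e-3, 1e-2, 1e-1):
        kl, norm_sq = estimate_kl(xp, xq, lam=lam)
        print(f"lam={lam:g}  KL estimate={kl:.3f}  ||f||^2={norm_sq:.3f}")
```

Running the script for several values of `lam` shows how the estimate and the discriminator's RKHS norm move together as the penalty changes; for the two Gaussians used here the true value is KL = 0.5, which gives a reference point for comparison.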