Several scholars have criticized the Freedom House (FH) democracy ratings as politically biased: do countries with stronger political ties to the United States indeed receive unduly favorable ratings? With the partial exception of contributions by Bollen [1993: Liberal Democracy: Validity and Method Factors in Cross-National Measures] and Bollen/Paxton [2000: Subjective Measures of Liberal Democracy], this claim has not been systematically tested to date. These existing studies focus on the monadic characteristics of individual countries to detect bias, rather than on more appropriate measures of bilateral relations with the U.S. In this contribution, I adopt a novel strategy to investigate the assertion that FH ratings are contaminated by systematic bias. I start from the assumption that other indices of democracy and/or latent summary measures derived from them can be used as a benchmark (after the democracy scales have been made comparable). I employ different estimation strategies to gauge whether differences between these indices and the FH ratings can be explained systematically by variables that record relationships between the U.S. and the countries under investigation. I consider, inter alia, data on diplomatic contacts, affinity measures based on voting behavior in the UN General Assembly, data on common membership in alliances, and data on U.S. foreign assistance. The underlying rationale is that, while differences between democracy measures are to be expected due to varying concepts and operationalizations, these differences should be orthogonal to the data on bilateral relations if there were no systematic bias. In addition to these statistical tests, possible mechanisms that may lead to bias in the ratings are discussed in detail. The question tackled may be considered substantively interesting in itself. 
Ultimately, detecting and modelling systematic biases in democracy scores might also prove helpful in constructing latent democracy measures (such as Pemstein et al.’s UDS) that, to date, build on the restrictive assumption of non-systematic rating errors.
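
The logic of the orthogonality test can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are simulated rather than real ratings, the variable names (`un_affinity`, `alliance`, `log_aid`) are hypothetical stand-ins for the bilateral measures listed above, z-standardization is only one way of making scales comparable, and plain OLS is only one of many possible estimation strategies. A bias of known size is injected into the simulated FH-style rating so that one can check whether the regression recovers it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500  # simulated country-years; NOT real data

# Hypothetical bilateral-relations covariates (all simulated).
un_affinity = rng.uniform(-1, 1, n)             # UN voting affinity with the U.S.
alliance = rng.integers(0, 2, n).astype(float)  # common alliance membership (0/1)
log_aid = rng.normal(0, 1, n)                   # standardized log U.S. assistance

# A simulated latent democracy level and two noisy ratings of it,
# recorded on deliberately different scales.
latent = rng.normal(0, 1, n)
benchmark_raw = 10 * latent + rng.normal(0, 2, n)
injected_bias = 0.3  # assumption: the "FH-style" rating favors allies
fh_raw = latent + injected_bias * alliance + rng.normal(0, 0.2, n)

def standardize(x):
    """Put a rating on a common scale (mean 0, standard deviation 1)."""
    return (x - x.mean()) / x.std()

# Difference between the FH-style rating and the benchmark.
diff = standardize(fh_raw) - standardize(benchmark_raw)

def ols(y, X):
    """OLS with an intercept; returns coefficients and classical standard errors."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    sigma2 = resid @ resid / (len(y) - X1.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X1.T @ X1)))
    return beta, se

X = np.column_stack([un_affinity, alliance, log_aid])
beta, se = ols(diff, X)

# Without systematic bias, all slope coefficients should be close to zero.
for name, b, s in zip(["intercept", "un_affinity", "alliance", "log_aid"], beta, se):
    print(f"{name:12s} coef={b: .3f}  se={s:.3f}  t={b / s: .2f}")
```

In this simulation only the `alliance` coefficient should differ clearly from zero, because that is where the bias was injected. With real ratings, a non-zero coefficient would still need to be weighed against alternative explanations, such as conceptual differences between the indices that happen to correlate with ties to the U.S.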