Interpersonal persuasion is central to democratic politics, yet much remains unknown about the determinants of persuasive influence. Large language models offer new opportunities to explore these complexities at a depth and scale previously impossible. We illustrate this potential by using generative AI to address key challenges in the study of gender in political persuasion. Previous work consistently reports gender gaps in political persuasion, but important dimensions of gender are typically observed together. Men's and women's voices, as signaled by vocal prosody (pitch, tone, speed, etc.), cue gendered stereotypes about expertise and authority that may disadvantage women. Men and women also typically rely on different rhetorical strategies (argument, morphology, syntax, semantics, etc.) in persuasion. This entanglement makes it difficult to causally identify the mechanisms driving the gender gap in persuasion. In a large-scale experiment, we placed 4,300 human respondents in dynamic phone conversations with lifelike AI agents that used the same prompt to try to persuade participants to change their views on an assigned political topic. These agents were randomly assigned either a male- or female-sounding voice. This approach allowed us to manipulate the perceived gender of the speaker while holding rhetorical strategy constant. The gender prosody of the voice had no aggregate effect on persuasiveness. However, consistent with gender role congruity theory, we observed substantial penalties for female-coded voices (but not male-coded voices) that took political positions incongruent with gender stereotypes.