Based on the assumptions, gender (a nominal variable) should not be included in the Pearson correlation coefficient. However, someone may inadvertently estimate it anyway, counting on the robustness of the measure.
Rather than asking why Pearson's r can be used, I'd ask why it is used. More importantly, what assumptions are violated by using Pearson's r for gender? Clearly gender can't constitute an interval or ratio variable. However, neither can Likert-type scale variables, which are analyzed using Pearson's r all the time. The extent to which linearity is violated given any dataset is specific to that dataset. Most research papers I read which rely on Pearson's r do not justify (and nowhere claim to have tested) the assumption of joint normal distributions, yet this too is a required assumption for Pearson's r. Basically, most uses of Pearson's r violate its required assumptions in some sense. The question is how, in what ways, and with what effect. One can easily model how Pearson's r can pose problems for dichotomous variables. But plug such a variable into SAS, SPSS, Statistica, MATLAB, etc., and lo and behold you will get an output. How robust that output is to the violated assumptions is, even for gender, unique to the dataset in question (although I can't imagine many real-world examples where Pearson's r could be justified in this case). My point is mainly that this, the most frequently used measure of correlation, is frequently used in violation of its underlying assumptions, and thus it is usually better to approach a question about when and why a statistical measure/metric/test should be used by looking at what assumptions underlie it, not whether a given application violates one or another of these.
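To make the "plug it in and get an output" point concrete, here is a minimal sketch with simulated data (my own illustration, not from any of the papers mentioned): the software happily computes a Pearson's r for a 0/1-coded gender variable without checking any of the distributional assumptions behind it.

```python
# Minimal sketch with simulated data: Pearson's r is computed for a dichotomous
# 0/1-coded variable just as readily as for a continuous one; nothing in the
# software checks the distributional assumptions.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(42)
gender = rng.integers(0, 2, size=100)                  # nominal/dichotomous, coded 0/1
outcome = rng.normal(10, 2, size=100) + 1.5 * gender   # continuous outcome with a group difference

r, p = pearsonr(gender, outcome)
print(f"r = {r:.3f}, p = {p:.4f}")                     # an output appears, assumptions or not
```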
I have been reading articles in relation to this, and in some social sciences studies gender is included in correlation analysis because it's a discrete variable.
Technically, an assumption underlying Pearson's r is joint normal distributions, and thus continuity. A probability function or a probability distribution function that is continuous over an interval (the latter would be over the interval [0,1]) isn't discrete; rather, it assumes that a variable can take infinitely many values between any two points, and thus every "point" in the interval is infinitesimal. For the most part, a variable need not remotely approximate a truly continuous variable in order to use Pearson's r (and, indeed, the degree to which any finite set can be said to approximate an uncountably infinite set better than some other finite set is not a trivial question). That said, just because a variable is discrete doesn't necessarily make Pearson's r any more appropriate than if it were nominal (and nominal variables are discrete; a set which isn't discrete is continuous). Again, I think it is important not to ask why one should or should not use Pearson's r or any statistical measure, but rather to ask what using such a measure entails. Whatever logic underlies some statistical technique, test, metric, etc., comes first. Only once this is taken into account can we ask whether or not its application is justified given some dataset, research question, etc.
All of the discussion above about the assumptions of Pearson's r is well and good. Like, I think, the OP, I am confused. I'm a neuroscientist, and I don't believe I have ever tried to correlate non-scale variables over my 30+ year career. However, I'm currently a graduate student in Forensics and have been required to take a course in quantitative analysis. The course uses the Field "Discovering Statistics Using IBM SPSS" textbook. Field insists that Pearson's r can be used when one variable is categorical, or even binary (i.e., both cases obviously being nominal). Also, I have seen the statistic used quite commonly in typical criminological analyses, for example, in determining the relationship between neighborhoods and crime rate (e.g., 10 neighborhoods arbitrarily given numbers [a nominal variable] correlated with crime rate [a continuous variable]). I guess this is OK, and I doubt I will seriously need to think about this again after the course is over, but in the same chapter, Field covers biserial and point-biserial correlations, which makes the matter all the more confusing. In the case of biserial correlations, one of the variables is truly dichotomous (e.g., dead or alive), and in point-biserial correlations there is continuity underlying the dichotomy (e.g., grade on a test that constitutes passing or failing, where, if the cutoff is 60%, a 59% is a lot closer to passing than a 38% is). Notwithstanding the obvious question about my point-biserial example (e.g., why not just use the actual exam score as the variable), I hope some social scientist who understands this issue chimes in here.
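On the neighborhood example, here is a hypothetical illustration (my own simulated data, not Field's example) of why arbitrary numeric codes for a nominal variable are troublesome: the value of Pearson's r depends on how the neighborhoods happen to be numbered, not just on the data.

```python
# Hypothetical illustration with simulated data: when a nominal variable such as
# "neighborhood" is given arbitrary numeric codes, Pearson's r depends on the
# labelling rather than on the data themselves.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n = 200
neighborhood = rng.integers(1, 11, size=n)             # 10 neighborhoods, arbitrary codes 1..10
means = rng.normal(50, 10, size=10)                     # each neighborhood has its own mean crime rate
crime_rate = means[neighborhood - 1] + rng.normal(0, 5, size=n)

r_original, _ = pearsonr(neighborhood, crime_rate)

# Re-number the same neighborhoods with a different, equally arbitrary coding.
relabel = dict(zip(range(1, 11), rng.permutation(np.arange(1, 11))))
recoded = np.array([relabel[k] for k in neighborhood])
r_recoded, _ = pearsonr(recoded, crime_rate)

print(f"r with original codes: {r_original:.3f}")
print(f"r with permuted codes: {r_recoded:.3f}")        # generally a different value for identical data
```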
I got the biserial and point-biserial correlations backward (RG should have some provision for editing answers). The point-biserial r is for correlating a truly dichotomous variable with a continuous variable, and the biserial r is for correlating an artificially dichotomized continuous variable with a continuous variable. I think that further reading on the point-biserial correlation would provide the answer the OP was looking for.
Actually, the point-biserial correlation is a special case of Pearson's r in which one correlates a dichotomous variable with a continuous one. Biserial coefficients are not true correlations but rather estimates that predict what the correlation between a dichotomized variable and a continuous variable would be if both variables had been continuous. This estimate overestimates the relationship, and the "real" correlation would then be somewhere between the point-biserial correlation and the biserial estimate (Nunnally & Bernstein, 1994, "Psychometric Theory").
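For what it's worth, the "special case" claim is easy to check numerically. A short sketch with simulated data (not from Nunnally & Bernstein): with the dichotomous variable coded 0/1, scipy's point-biserial routine and Pearson's r return the same value.

```python
# Checking the claim with simulated data: a point-biserial correlation on a 0/1-coded
# dichotomous variable is numerically identical to Pearson's r on the same columns.
import numpy as np
from scipy.stats import pearsonr, pointbiserialr

rng = np.random.default_rng(1)
group = rng.integers(0, 2, size=50)                    # true dichotomy, coded 0/1
score = rng.normal(70, 10, size=50) + 5 * group        # continuous variable

r_pearson, _ = pearsonr(group, score)
r_pb, _ = pointbiserialr(group, score)
print(f"Pearson r:        {r_pearson:.6f}")
print(f"point-biserial r: {r_pb:.6f}")                 # identical up to floating-point rounding
```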