You must be careful and clear about definitions here. There is a difference between false alarm probability (FAP) and false alarm rate (FAR).
FAP is the probability of registering a detection (just one detection) given that no target is present in a given segment of an image. The segment is a patch of the image that you must also define. Presumably a segment would be a small face-sized patch extracted from a larger image. If the segment is large enough to hold two or more faces (for instance, if you made the segment the entire image, in which many faces could fit), then FAP must generally be defined as the probability of registering one or more faces in regions where there are no faces. In any case, FAP is always a probability between 0 and 1, with 0 signifying high quality clutter rejection and 1 signifying dysfunctional clutter rejection.
FAR is the number of false positives (FPs) that are expected to occur in a given number of face-sized segments, or in a given entire image, taken from a given scene. In any case, the FAR is a number of FPs between 0 and infinity, with 0 being good and a high FAR being bad, of course.
FAP and FAR are related, but they remain very different quality metrics.
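To make the relation concrete, here is a minimal sketch under a simplifying assumption that is mine, not part of the question: every face-sized segment has the same FAP and the segments behave independently. The function names are hypothetical. Under that assumption the expected number of false positives per image grows without bound as the number of segments grows, while the image-level FAP stays between 0 and 1.

```python
def expected_far(segment_fap: float, n_segments: int) -> float:
    """Expected number of false positives over n_segments target-free segments.

    Assumes every segment has the same FAP (illustrative assumption).
    """
    return segment_fap * n_segments


def image_level_fap(segment_fap: float, n_segments: int) -> float:
    """Probability of one or more false positives anywhere in the image.

    Assumes the segments are statistically independent (illustrative assumption).
    """
    return 1.0 - (1.0 - segment_fap) ** n_segments


if __name__ == "__main__":
    p, n = 0.001, 1000  # made-up numbers, for illustration only
    print("expected FAR per image:", expected_far(p, n))
    print("image-level FAP:", image_level_fap(p, n))
```

With these made-up numbers the FAR is about one false positive per image, while the image-level FAP is roughly 0.63, which shows that the two metrics answer different questions.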
The subject line of your question asks explicitly for FAR. The first equation in your question, FAP=FP/(FP+TN), however, suggests that you want FAP. Note that I have changed the left side to FAP rather than FP, because the right side is an empirical estimate of FAP when FP = number of false positives, and TN = number of true negatives.
The second equation in your question is Q=FP/(total detections in the frame). Note that I have changed the left side again, this time to Q, because the right side cannot be interpreted as either an FP probability or an FP rate. Q is a quality metric with Q=1 for entirely false detections and Q=0 for entirely accurate detections, hence 0 being good and 1 being bad. You can use that metric, provided that its meaning is clearly defined, but it is an unconventional metric whose significance is not entirely clear.
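As a rough sketch of how the two equations differ in practice, the snippet below computes both from raw counts. The function names are hypothetical, and I am assuming that "total detections in the frame" means TP + FP, i.e. every detection is either a true or a false positive.

```python
def empirical_fap(fp: int, tn: int) -> float:
    """Empirical estimate FAP = FP / (FP + TN) over target-free segments."""
    negatives = fp + tn
    if negatives == 0:
        raise ValueError("need at least one target-free segment")
    return fp / negatives


def false_detection_fraction(fp: int, tp: int) -> float:
    """Q = FP / (total detections), assuming total detections = TP + FP."""
    detections = fp + tp
    if detections == 0:
        raise ValueError("Q is undefined when nothing was detected")
    return fp / detections


if __name__ == "__main__":
    # Made-up counts: 1000 target-free segments, 5 of them fired; 15 true detections.
    print("FAP:", empirical_fap(fp=5, tn=995))
    print("Q:  ", false_detection_fraction(fp=5, tp=15))
```

Note that FAP needs a count of true negatives, which requires you to define the segments, whereas Q only needs the detections themselves.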
I would suggest FAR = FP given one frame of imagery, or given T seconds of imagery consisting of many frames. That would be the rate of false detections made per frame, or per second of operation. It would be a number between 0 and infinity, with 0 being good and high being bad. But you must be careful when calculating an average (expected) value for this FAR by observing and averaging the number of FPs over a frame sequence, because adjacent frames will generally be correlated. The averaging must be done over frames (or durations of time) that are independent of each other. In other words, the scene must change significantly, across the range of its realistic variability, over the frames used to average FP into the expected FAR.
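One simple way to act on this, sketched below, is to average the per-frame FP counts over frames taken far enough apart that they can be treated as roughly independent. The fixed `stride` and the function names are my own illustrative assumptions; a suitable stride depends on how quickly your scene changes, and subsampling is only one possible way to reduce the correlation.

```python
from statistics import mean
from typing import Sequence


def far_per_frame(fp_counts: Sequence[int], stride: int = 1) -> float:
    """Average FP count per frame, using every `stride`-th frame only.

    `stride` is a hypothetical knob: pick it large enough that the kept
    frames show substantially different scene content.
    """
    if stride < 1:
        raise ValueError("stride must be >= 1")
    sampled = fp_counts[::stride]
    if not sampled:
        raise ValueError("no frames to average")
    return mean(sampled)


def far_per_second(per_frame_far: float, frames_per_second: float) -> float:
    """Convert a per-frame FAR into false positives per second of operation."""
    return per_frame_far * frames_per_second


if __name__ == "__main__":
    # Made-up FP counts; runs of equal values mimic correlated adjacent frames.
    counts = [0, 0, 0, 3, 3, 3, 0, 0, 0, 1, 1, 1]
    per_frame = far_per_frame(counts, stride=3)
    print("FAR per frame: ", per_frame)
    print("FAR per second:", far_per_second(per_frame, frames_per_second=30))
```

Subsampling does not change the expected value of the average, but it keeps you from overestimating how well that average is known when neighbouring frames repeat the same false positives.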
The statistical definition of FAR indeed involves TN, which is typically not computed for face detection. One can therefore consider using the false discovery rate instead, i.e.
FDR = FP/(FP+TP)
which one tends to minimize; or its dual quantity, precision: