Acoustic-phonetic production experiments often report relative rather than absolute segment durations, mostly because relative durations are less sensitive to differences in speaking rate.
Typical reference units for normalization in the literature are:
1) units that contain the target segment (e.g., the syllable, the word, the phrase)
2) units that are adjacent to the target segment (e.g., sounds or words to the right or left)
3) the average phone duration in the respective phrase
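To make the three options concrete, here is a minimal Python sketch of how each relative measure could be computed for one phrase. The function name `relative_durations`, the input layout (a list of label/duration pairs), the choice of the following phone as the "adjacent" unit, and the durations themselves are all assumptions made up for illustration, not measurements or an established procedure.

```python
from statistics import mean

def relative_durations(phones, target_idx, unit_span):
    """Return the target's duration relative to three reference units.

    phones: list of (label, duration_ms) tuples for one phrase.
    target_idx: index of the target segment in `phones`.
    unit_span: (start, end) indices, end exclusive, of the unit that
               contains the target (e.g., its syllable or word).
    """
    target = phones[target_idx][1]

    # 1) unit containing the target (here: sum of its phone durations)
    start, end = unit_span
    containing = sum(d for _, d in phones[start:end])

    # 2) adjacent unit (assumption: the following phone, or the
    #    preceding one if the target is phrase-final)
    if target_idx + 1 < len(phones):
        neighbour_idx = target_idx + 1
    else:
        neighbour_idx = target_idx - 1
    adjacent = phones[neighbour_idx][1]

    # 3) average phone duration in the phrase
    avg_phone = mean(d for _, d in phones)

    return {
        "containing_unit": target / containing,
        "adjacent_unit": target / adjacent,
        "mean_phone": target / avg_phone,
    }

# Invented durations (ms), for illustration only
phrase = [("b", 60), ("a", 120), ("t", 80), ("i", 70), ("s", 70)]
result = relative_durations(phrase, target_idx=1, unit_span=(0, 3))
print(result)
```

Note that the same target yields quite different values depending on the reference unit, which is exactly why the choice matters when comparing conditions.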
The choice of reference unit can change the apparent size of an effect. Depending on the structure of the utterance and the nature of the target segment (e.g., phonemically long vs. short), the duration of the reference unit may be positively or negatively correlated with the duration of the target. A negative correlation makes differences across experimental conditions appear larger in the relative measure; a positive correlation makes them appear smaller.
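A small numerical sketch of this point, using invented durations (not data): a long/short contrast of 150 ms vs. 100 ms is normalized once by a reference unit that shortens when the target lengthens, and once by a reference unit that lengthens along with the target. The helper name `ratio` and all values are hypothetical.

```python
def ratio(target_ms, reference_ms):
    """Relative duration of the target with respect to a reference unit."""
    return target_ms / reference_ms

# Invented target durations (ms): long/short contrast in absolute terms
short_v, long_v = 100.0, 150.0
absolute_contrast = long_v / short_v

# Reference unit negatively correlated with the target:
# it is 90 ms next to the short target but 70 ms next to the long one.
neg_contrast = ratio(long_v, 70.0) / ratio(short_v, 90.0)

# Reference unit positively correlated with the target:
# it is 90 ms next to the short target but 110 ms next to the long one.
pos_contrast = ratio(long_v, 110.0) / ratio(short_v, 90.0)

print(f"absolute: {absolute_contrast:.2f}")
print(f"negatively correlated reference: {neg_contrast:.2f}")
print(f"positively correlated reference: {pos_contrast:.2f}")
```

With these made-up numbers the contrast in the relative measure is larger than the absolute contrast in the first case and smaller in the second, although the target durations are identical in both.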
Are there theoretical considerations that speak for (or against) one of these reference units? Or do we need perception data to decide which relative measure listeners are sensitive to? Should we always collect recordings at different speech rates in order to identify the relative durations that are not (or least) influenced by the speaking rate manipulation?