The plagiarism check we run with Turnitin counts similarity in ordinary English words, even helping verbs and proverbs. What if only the similarity of specific terminology were counted, instead of entire ordinary English sentences?
A typical plagiarism checker focuses on individual words that occur in segments, using a sliding-window approach (the window can also be variable-sized). Other techniques can be used to increase the probability of a match, such as:
Stemming
Synonym matches
The greater the length of the match the higher the score that marks it as plagiarism.
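As a rough illustration of the window idea (this is not how Turnitin or any particular product works), here is a minimal Python sketch. The function names, the window size of 3, and the crude suffix-stripping "stemmer" are all my own assumptions; a real checker would use a proper stemmer and a much larger reference corpus.

```python
import re

def crude_stem(word):
    # Assumption: a toy stemmer that only strips a few common suffixes.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def tokens(text):
    # Lowercase, keep alphabetic words only, then stem each word.
    return [crude_stem(w) for w in re.findall(r"[a-z]+", text.lower())]

def window_matches(source, suspect, size=3):
    # Slide a fixed-size window over both texts and return the
    # suspect windows that also occur somewhere in the source.
    src, sus = tokens(source), tokens(suspect)
    src_windows = {tuple(src[i:i + size]) for i in range(len(src) - size + 1)}
    sus_windows = [tuple(sus[i:i + size]) for i in range(len(sus) - size + 1)]
    return [w for w in sus_windows if w in src_windows]

def longest_shared_run(source, suspect):
    # Length of the longest run of consecutive tokens common to both texts;
    # the longer the run, the stronger the hint of copying.
    a, b = tokens(source), tokens(suspect)
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

if __name__ == "__main__":
    s1 = "my cat is thirsty and is drinking water"
    s2 = "I have a cat. he is thirsty and drinking water"
    print(window_matches(s1, s2))      # [('is', 'thirsty', 'and')]
    print(longest_shared_run(s1, s2))  # 3
```

Synonym matching is left out here; it would need a thesaurus-like resource to map words onto a common form before the windows are compared.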
Not all plagiarism is easy to detect. For example, take these two fragments:
(1) my cat is thirsty and is drinking water
(2) I have a cat. he is thirsty and drinking water
Both express ownership but have different structures, which is why a window is used instead of sentence-level segments. Take the next example:
(3) I own a feline. he is in need of liquid and having it
Though it expresses the same thing, here I am using a hypernym tactic and sentence splitting to evade the checker.
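To make the effect concrete, here is a small Python sketch that scores the three fragments by the overlap of their word pairs (Jaccard similarity over two-word shingles). The metric and the function names are my own choice for illustration, not something a specific checker is known to use.

```python
import re

def shingles(text, size=2):
    # Set of consecutive word groups ("shingles") of the given size.
    words = re.findall(r"[a-z]+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def jaccard(a, b):
    # Shared shingles divided by all distinct shingles in either text.
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

f1 = "my cat is thirsty and is drinking water"
f2 = "I have a cat. he is thirsty and drinking water"
f3 = "I own a feline. he is in need of liquid and having it"

if __name__ == "__main__":
    print(round(jaccard(shingles(f1), shingles(f2)), 2))  # 0.23
    print(round(jaccard(shingles(f1), shingles(f3)), 2))  # 0.0
```

Fragment (2) still shares three word pairs with (1), but fragment (3) shares none, so a purely surface-level comparison like this one sees no match at all even though the meaning is the same.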
Other tactics used are:
Negating the antonym
Sentence fusion
Word elimination
Passive/active sentence exchange
Multiple fragment repositioning in paragraph
The list is not complete, but it should give you an idea of some of the challenges plagiarism checkers face.
I've found Turnitin of some use, but I also find it necessary to read the submission carefully, as Turnitin often suggests elements are plagiarised when that is not the case (I'm from a law background). I don't rely on Turnitin; I simply use it as an indicator, which then leads me to investigate further.