Shouldn't the following be excluded from plagiarism checking: title, author names, authors' affiliations, abstract keywords, test titles and subtitles, equipment names, figure legends, and the statistical analysis methodology?
Similarity in these parts will certainly contribute to a higher plagiarism index. However, I believe that some parts of any article nowadays cannot be made unique, no matter how good one's paraphrasing ability is. This is simply because of the great number of papers published every year in one's field of interest.
Some parts (like test names, equipment names, chemical names, the statistical analysis paragraph, etc.) should not be made unique. They cannot be altered, and usually no specific citation needs to be added for any of them.
I could not find any guideline on what to include in plagiarism checking, or on whether publishers check the full article for plagiarism or exclude some things. Moreover, do they consider matches with papers published in predatory journals? Does anyone have a full list of all predatory publishers (one that is updated in real time)?
I think that with more and more papers published year by year, plagiarism tools need to be tweaked, or a higher plagiarism index should be accepted. When there were only a few publishers online, I guess the system was somewhat efficient, but is it still efficient with hundreds of publishers publishing thousands of papers yearly, and even publishing conference proceedings online from hundreds of conferences worldwide (ALL in English)? There must come a time when sentences overlap, right?