I am new to NGS, so I have a question: why is it bad to have similar sequences in the fastq file? Wouldn't it be normal to have more than one copy of the same read?
It is normal to have overlapping reads, but the probability of two independent reads starting and stopping at exactly the same base is very low. So if multiple reads have identical start and end positions, this is taken as a sign that they arose from PCR duplication during library construction. In many analysis workflows these presumptive PCR duplicates are removed.
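To make the idea concrete, here is a minimal Python sketch. It assumes the reads have already been aligned and that each one is summarized by its reference name, start, end and strand. The `AlignedRead` class and `mark_duplicates` function are hypothetical names for illustration, and keeping the copy with the highest mean base quality is an assumed rule for choosing which read survives. Dedicated tools such as Picard MarkDuplicates or samtools markdup handle further details (soft-clipped bases, paired-end mates, optical duplicates) and are what you would use in practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedRead:
    """Hypothetical, simplified view of one aligned read."""
    name: str
    chrom: str
    start: int           # leftmost aligned base (0-based)
    end: int             # one past the rightmost aligned base
    strand: str          # "+" or "-"
    mean_quality: float  # mean base quality of the read


def mark_duplicates(reads):
    """Return the names of reads flagged as presumptive PCR duplicates.

    Reads sharing the same (chrom, start, end, strand) key are treated as
    copies of one original fragment. The copy with the highest mean quality
    is kept (assumed rule); ties keep the first read seen. All other copies
    are flagged. Read names are assumed to be unique.
    """
    best = {}
    for read in reads:
        key = (read.chrom, read.start, read.end, read.strand)
        if key not in best or read.mean_quality > best[key].mean_quality:
            best[key] = read
    kept = {r.name for r in best.values()}
    return {r.name for r in reads if r.name not in kept}


# Usage: r1 and r2 start and end at the same bases on the same strand,
# so only one of them survives; r3 (different end) and r4 (other strand) do not collide.
reads = [
    AlignedRead("r1", "chr1", 100, 250, "+", 35.0),
    AlignedRead("r2", "chr1", 100, 250, "+", 37.5),
    AlignedRead("r3", "chr1", 100, 251, "+", 36.0),
    AlignedRead("r4", "chr1", 100, 250, "-", 30.0),
]
print(sorted(mark_duplicates(reads)))  # ['r1']
```

Note that overlapping reads like r1 and r3 are not flagged: only an exact match of both ends (and strand) counts as a presumptive duplicate, which is the point made above.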