Dear Authors,
My compliments on your nice study:
Article Studying the difference between natural and programming lang...
I have been collecting software models, mostly UML (now about 25,000 designs), and more recently Software Architecture Documents (SADs), mostly from open source projects on GitHub (*1).
I suspect that there is almost no repetition in software designs. Would this suggest that the repetition in source code comes from the repetition of 'solution patterns'?
Our dataset is public, so I would be happy to compare it against your analyses or to study this together.
(*1) Hebig, Regina, Truong Ho Quang, Michel RV Chaudron, Gregorio Robles, and Miguel Angel Fernandez. "The quest for open source projects that use UML: mining GitHub." In Proceedings of the ACM/IEEE 19th International Conference on Model Driven Engineering Languages and Systems, pp. 173-183. ACM, 2016.