Presented below are two patterns created from the same data set: 1024 cities that follow Zipf’s law exactly, meaning that the largest city has size 1, the second largest has size 1/2, the third largest has size 1/3, and so on down to the smallest city, which has size 1/1024. The left pattern was created with head/tail breaks (https://en.wikipedia.org/wiki/Head/tail_Breaks), a new classification scheme for data with a heavy-tailed distribution (Jiang 2013). The scheme partitions the data around its average into two unbalanced parts: values above the average form the head (a minority), and values below the average form the tail (a majority). The partition then continues recursively on the head until the notion of far more small values than large ones is violated. For this particular data set, it ends up with 5 classes. The right pattern was created with natural breaks (Jenks 1967), which minimizes the variance within classes while maximizing the variance between classes. Natural breaks is in effect k-means clustering (https://en.wikipedia.org/wiki/K-means_clustering) applied to one-dimensional data.
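For readers who want to reproduce the 5 classes, here is a minimal Python sketch of the recursive partition described above. The function names and the `head_limit` parameter are my own for illustration. In particular, the 40% cutoff is an assumption used here to make "far more small values than large ones" testable; the description above does not fix a number.

```python
from bisect import bisect_left


def head_tail_breaks(values, head_limit=0.4):
    """Return the break values (means) found by recursive head/tail splitting.

    head_limit is an ASSUMED threshold: the recursion stops once the head
    would hold more than this fraction of the values being split.
    """
    breaks = []
    data = list(values)
    while len(data) > 1:
        mean = sum(data) / len(data)
        head = [v for v in data if v > mean]
        # Stop when there is no head, or the head is no longer a clear minority.
        if not head or len(head) / len(data) > head_limit:
            break
        breaks.append(mean)
        data = head  # recurse on the head only
    return breaks


def class_counts(values, breaks):
    """Count values per class; class 0 is the lowest tail."""
    counts = [0] * (len(breaks) + 1)
    for v in values:
        # Number of breaks strictly below v gives the class index.
        counts[bisect_left(breaks, v)] += 1
    return counts


# 1024 cities following Zipf's law exactly: sizes 1, 1/2, ..., 1/1024.
cities = [1 / rank for rank in range(1, 1025)]
breaks = head_tail_breaks(cities)

print(len(breaks) + 1)               # 5
print(class_counts(cities, breaks))  # [888, 112, 18, 4, 2]
```

Under that assumed cutoff the recursion stops at the fifth split, where the head would be 1 of the 2 remaining cities (50%), so four breaks and five classes result.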

Jenks G. F. (1967), The data model concept in statistical mapping, International Yearbook of Cartography, 7, 186–190.

Jiang B. (2013), Head/tail breaks: A new classification scheme for data with a heavy-tailed distribution, The Professional Geographer, 65(3), 482–494.

I have argued that head/tail breaks helps reveal the underlying scaling of far more small values than large ones for data with a heavy-tailed distribution (Jiang 2013). My question to you is: do you feel more alive with the left pattern, the one created by head/tail breaks? Alternatively, do you find the left pattern more beautiful? Do you feel more comfortable with the left pattern than with the right one?
