The paper "Apache Hadoop YARN: Yet Another Resource Negotiator" claims that YARN can scale the number of nodes from 4000 in Hadoop 1.x to over 7000. My question is: what is the key principle behind YARN's improved scalability?
Hadoop 1.x has a centralized NameNode and a centralized JobTracker. It seems that not much has changed in YARN, since it also has a centralized ResourceManager. The bottleneck in Hadoop is still the centralized control design.