I am not sure I understand the last part of the question: easier than what?
Generally speaking, the 57-64 GHz band is reserved for indoor communications, as you probably know, because of the attenuation. Longer distances seem to be assigned to the 71-76 GHz, 81-86 GHz and 92-95 GHz bands, where the usual challenges of attenuation and rain fading remain. At the moment, however, I see very directive antennas being developed for these applications, as this is basically free-space point-to-point data transfer.
If we are talking about indoor or urban-canyon environments, I would agree about the phase distribution variation. I am not sure whether this is a disadvantage, though. For MIMO, I assume you would need a very rich channel and electric field variation within the aperture of your antenna (whether this is an array of individual elements, a parasitic-switched design, or whatever it may be) to distinguish the spatial modes.
First you must define how far you wish to go. 5G is attainable, but in reality true 4G is much closer to realisation. Now, why would that be...
To get anywhere, you need to make a real link budget of your system, and the best way is to start from energy per bit over noise density (Eb/N0), because it makes your calculations scalable. You have mostly free-space propagation, but your antenna aperture is very small. Larger apertures are obtainable through more gain ... which ends in point-to-point links with very high-gain antennas. In a handheld you are limited to low-gain antennas.
Next is the choice of technology. This is needed to establish the margins for your link budget. Every technology comes with a margin of a few dB due to thermal noise etc. Some manufacturers are better than others...
At mm waves, modulation is somewhat less cooperative than at lower microwave frequencies. Primitive modulations require less margin but consume more bandwidth, which is not a problem at mm waves. More bandwidth means more noise, though, and there goes your gain in margin. Hence the suggestion to use energy per bit as the basis for calculations: it is always the same, and it is scalable.
This is a simple way of saying that the link budget calculation is largely invariant to signal bandwidth.
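To make the energy-per-bit argument concrete, here is a minimal sketch of a link budget expressed as Eb/N0. It assumes pure free-space (Friis) propagation and thermal noise at 290 K; the function names, the `margin_db` lump term and all the numbers in the example call are my own illustrative assumptions, not values for any real device. Note that the signal bandwidth never appears: only the bit rate does.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23
SPEED_OF_LIGHT_M_S = 299_792_458.0


def fspl_db(distance_m, freq_hz):
    """Free-space path loss in dB (Friis formula, isotropic antennas)."""
    return 20.0 * math.log10(
        4.0 * math.pi * distance_m * freq_hz / SPEED_OF_LIGHT_M_S
    )


def ebn0_db(tx_power_dbm, tx_gain_dbi, rx_gain_dbi, distance_m, freq_hz,
            bit_rate_bps, noise_figure_db=0.0, margin_db=0.0, temp_k=290.0):
    """Received Eb/N0 in dB for a free-space link.

    margin_db lumps together implementation losses (assumed value,
    technology dependent).
    """
    rx_power_dbw = ((tx_power_dbm - 30.0) + tx_gain_dbi + rx_gain_dbi
                    - fspl_db(distance_m, freq_hz))
    # Noise power spectral density N0 = k*T, raised by the receiver noise figure.
    n0_dbw_per_hz = 10.0 * math.log10(BOLTZMANN_J_PER_K * temp_k) + noise_figure_db
    # Eb = received power / bit rate, so subtract 10*log10(Rb) in dB.
    return rx_power_dbw - n0_dbw_per_hz - 10.0 * math.log10(bit_rate_bps) - margin_db


# Illustrative handheld-like numbers (assumptions): 10 dBm Tx, 5 dBi antennas,
# 100 m at 60 GHz, 8 dB noise figure, 3 dB implementation margin.
slow = ebn0_db(10, 5, 5, 100.0, 60e9, 1e9, noise_figure_db=8, margin_db=3)
fast = ebn0_db(10, 5, 5, 100.0, 60e9, 2e9, noise_figure_db=8, margin_db=3)
print(f"Eb/N0 at 1 Gbit/s: {slow:.2f} dB")
print(f"Eb/N0 at 2 Gbit/s: {fast:.2f} dB")
```

Doubling the bit rate costs about 3 dB of Eb/N0 regardless of which modulation or bandwidth carries those bits, which is why this form of the budget scales so easily.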
You'll find very few choices of modulation with modern mm wave devices, and frankly, you don't need anything fancy. More primitive modulations are also much more energy efficient, and that particular point is the demise of all 4G attempts in lower microwave - they are far too smart for their own good.
To get closer to the Shannon limit, coding is a better approach than modulation; turbo codes, say. It also relieves your Tx from having to adapt to dynamically changing radio link conditions.
Power is the final frontier. It sets your link budget ceiling. You'll soon find that each bit requires a certain amount of energy to get conveyed from Tx to Rx, and everything in between is a set of complications invented to get as close to the Shannon limit as possible. The only fair way of calculating is to express the link in terms of the Shannon limit, and to express every difficulty as an additional margin that is taken away from the link budget. In the end, you'll be able to squeeze as many bits into the air over a given link as the power of your Tx allows.
MIMO and other complications ... are fine for indoor. In systems without high-gain antennas, MIMO provides gain when multipath is available. No multipath -> MIMO=SISO.
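The "power sets the ceiling" point can be checked against the Shannon capacity formula for an AWGN channel, C = B*log2(1 + P/(N0*B)). A small sketch, with received power and noise density chosen purely as illustrative assumptions: as bandwidth grows, capacity approaches P/(N0*ln 2) and never exceeds it, which corresponds to a minimum Eb/N0 of about -1.59 dB.

```python
import math


def shannon_capacity_bps(rx_power_w, n0_w_per_hz, bandwidth_hz):
    """AWGN channel capacity C = B * log2(1 + P / (N0 * B))."""
    snr = rx_power_w / (n0_w_per_hz * bandwidth_hz)
    return bandwidth_hz * math.log1p(snr) / math.log(2.0)


def power_limited_ceiling_bps(rx_power_w, n0_w_per_hz):
    """Limit of the capacity as bandwidth goes to infinity: P / (N0 * ln 2)."""
    return rx_power_w / (n0_w_per_hz * math.log(2.0))


def min_ebn0_db():
    """Smallest Eb/N0 at which error-free transmission is possible (ln 2)."""
    return 10.0 * math.log10(math.log(2.0))


# Assumed values for illustration: 1 nW received, N0 = 4e-21 W/Hz (about 290 K).
P_RX_W = 1e-9
N0_W_PER_HZ = 4e-21

for bandwidth_hz in (1e9, 1e10, 1e11, 1e13):
    c = shannon_capacity_bps(P_RX_W, N0_W_PER_HZ, bandwidth_hz)
    print(f"B = {bandwidth_hz:.0e} Hz -> C = {c:.3e} bit/s")

print(f"ceiling = {power_limited_ceiling_bps(P_RX_W, N0_W_PER_HZ):.3e} bit/s")
print(f"minimum Eb/N0 = {min_ebn0_db():.2f} dB")
```

Every practical difficulty (noise figure, implementation loss, imperfect coding) can then be booked as dB of margin subtracted from this limit.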
------------------------------------
Long story short: the Tx power limits the available energy per bit, which in turn limits your link budget. Primitive modulations are your friends because they are energy efficient (more energy per bit). MIMO is fine for indoor and compensates for the lack of a high-gain antenna.
Well, I think the main problem would be attenuation at these higher frequencies, for example 15 dB/km at 60 GHz. So only smaller cells can be designed. In my opinion, attenuation could therefore be taken as an advantage, since new-generation mobile systems are moving towards smaller and denser networks and cells.
Furthermore, they are interference-limited. Higher attenuation means less interference. Also, the available bandwidth will allow higher data rates. I think 60 GHz will be for small-cell backhaul, and 39 GHz or 28 GHz will be for 5G mobile access. Frequencies higher than 60 GHz, such as 90 GHz, 120 GHz and 240 GHz, may be for WPAN; all of this is still under research.
The biggest challenge would be implementing different mm-wave frequency bands on a small terminal, mainly because of the RF and antenna. We will need a series of novel approaches to resolve issues with power consumption, packaging, frequency conversion, architecture, etc.
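As a rough illustration of how the extra attenuation shrinks cells, the sketch below solves for the largest distance that fits a given path-loss budget, once with free-space loss only and once with the 15 dB/km figure added on top. The 130 dB budget and the function names are assumptions for illustration; rain and blockage are ignored.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0


def path_loss_db(distance_m, freq_hz, gas_db_per_km=0.0):
    """Free-space loss plus a constant per-km absorption term, in dB."""
    fspl = 20.0 * math.log10(
        4.0 * math.pi * distance_m * freq_hz / SPEED_OF_LIGHT_M_S
    )
    return fspl + gas_db_per_km * distance_m / 1000.0


def max_range_m(budget_db, freq_hz, gas_db_per_km=0.0, lo=1.0, hi=1e6):
    """Largest distance whose path loss stays within budget_db (bisection).

    Works because path loss increases monotonically with distance.
    """
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if path_loss_db(mid, freq_hz, gas_db_per_km) > budget_db:
            hi = mid
        else:
            lo = mid
    return lo


BUDGET_DB = 130.0  # assumed allowable path loss, illustrative only
FREQ_HZ = 60e9

range_free_space = max_range_m(BUDGET_DB, FREQ_HZ)
range_with_oxygen = max_range_m(BUDGET_DB, FREQ_HZ, gas_db_per_km=15.0)
print(f"free space only : {range_free_space:.0f} m")
print(f"with 15 dB/km   : {range_with_oxygen:.0f} m")
```

With these assumed numbers the usable radius drops to well under half of the free-space value, which is the "smaller cells" effect described above.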
60 GHz is attenuated by oxygen, and that is in fact beneficial for complex networks with high reuse. You see, in the low microwave/UHF/VHF bands most communication happens over LOS/NLOS links, not in free space, and therefore with attenuation over 34*log(distance), and that is the mechanism which ensures high frequency reuse. High reuse -> more data per square km. Constant attenuation per km works even better.
Links at 70 GHz and 80 GHz are also free, but they do not benefit from this attenuation. That is not a problem for point-to-point links over a few km using high-gain antennas.
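The reuse argument can be sketched by comparing how strongly a co-channel interferer is suppressed relative to the serving link under the two propagation laws. This assumes a single interferer with the same Tx power and antenna gains as the serving transmitter, reads the 34*log(distance) law as 34 dB per decade, and takes 15 dB/km as an assumed oxygen absorption figure; the function names are hypothetical.

```python
import math


def sir_power_law_db(d_serving_m, d_interferer_m, slope_db_per_decade=34.0):
    """Signal-to-interference ratio when loss grows as slope*log10(distance)."""
    return slope_db_per_decade * math.log10(d_interferer_m / d_serving_m)


def sir_free_space_plus_gas_db(d_serving_m, d_interferer_m, gas_db_per_km=15.0):
    """SIR for free-space spreading plus a constant per-km absorption."""
    geometric = 20.0 * math.log10(d_interferer_m / d_serving_m)
    absorption = gas_db_per_km * (d_interferer_m - d_serving_m) / 1000.0
    return geometric + absorption


# Serving link of 100 m, interferer at increasing reuse distances (assumed).
for d_int in (500.0, 1000.0, 2000.0, 4000.0):
    a = sir_power_law_db(100.0, d_int)
    b = sir_free_space_plus_gas_db(100.0, d_int)
    print(f"interferer at {d_int:6.0f} m: "
          f"power law {a:5.1f} dB, free space + absorption {b:5.1f} dB")
```

The absorption term grows linearly in dB with separation rather than logarithmically, so under these assumptions it overtakes the 34 dB/decade law once the interferer is beyond roughly 1 km.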