Load balancing in the cloud is usually driven by machine utilization and by the goal of reducing access latencies. Systems collect performance feedback such as core utilization, bandwidth usage, and contention, and then schedule applications onto resources that are idle or lightly loaded.
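As a minimal sketch of this idea, the snippet below picks the least-loaded machine using a weighted score over the feedback signals mentioned above. The weights, the `Machine` fields, and the machine names are illustrative assumptions, not values from any specific system.

```python
# Hypothetical utilization-based load balancer: score each machine by a
# weighted combination of core utilization, bandwidth usage, and contention,
# then place work on the machine with the lowest score.
from dataclasses import dataclass

@dataclass
class Machine:
    name: str
    core_util: float   # fraction of cores busy, 0.0-1.0
    bw_util: float     # fraction of memory bandwidth in use, 0.0-1.0
    contention: float  # normalized contention estimate, 0.0-1.0

def load_score(m: Machine, w_core=0.5, w_bw=0.3, w_cont=0.2) -> float:
    # Lower score means less loaded; the weights are illustrative.
    return w_core * m.core_util + w_bw * m.bw_util + w_cont * m.contention

def pick_machine(machines: list[Machine]) -> Machine:
    return min(machines, key=load_score)

machines = [
    Machine("node-a", core_util=0.9, bw_util=0.4, contention=0.2),
    Machine("node-b", core_util=0.3, bw_util=0.5, contention=0.1),
]
print(pick_machine(machines).name)  # node-b (lower combined load)
```

A real scheduler would refresh these metrics continuously and may also weigh data-placement latency, but the least-loaded-first choice above is the core of the feedback loop.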
The attached works below show how applications can be scheduled both within and across machines. A predictor selects which machine to use (GPU vs. multicore vs. single-core CPU) from application characteristics, and then selects the threading and scheduling configuration within the chosen machine.
Preprint HeteroMap: A Runtime Performance Predictor for Efficient Pro...
Article Efficient Situational Scheduling of Graph Workloads on Singl...
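To make the two-level decision concrete, here is a hedged sketch of such a predictor: a coarse device choice from application features, followed by a per-device threading/scheduling configuration. The thresholds, feature names (`n_edges`, `irregularity`), and configurations are hypothetical placeholders, not the actual model from the attached works.

```python
# Hypothetical two-level predictor sketch: (1) pick a device class from
# coarse application features, (2) pick a threading/scheduling configuration
# within the chosen device. Thresholds and options are illustrative only.

def pick_device(n_edges: int, irregularity: float) -> str:
    # Illustrative rule: large, regular workloads favor the GPU; mid-size
    # workloads favor the multicore; small ones stay on a single CPU core.
    if n_edges > 10_000_000 and irregularity < 0.5:
        return "gpu"
    if n_edges > 100_000:
        return "multicore"
    return "cpu"

def pick_config(device: str, n_cores: int = 16) -> dict:
    # Within-machine choice: threading and scheduling policy per device.
    if device == "gpu":
        return {"threads_per_block": 256, "schedule": "vertex-parallel"}
    if device == "multicore":
        return {"threads": n_cores, "schedule": "dynamic"}
    return {"threads": 1, "schedule": "static"}

device = pick_device(n_edges=50_000_000, irregularity=0.2)
print(device, pick_config(device))  # gpu {...}
```

In practice the predictor would be trained on profiled runs rather than hand-set thresholds, but the structure (device choice first, then intra-machine configuration) matches the scheduling split described above.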