Dynamic voltage and frequency scaling (DVFS), clock gating, and task migration, combined with on-die power and temperature monitoring, have been used in centralized feedback systems with programmed power and thermal constraints. On certain Intel processor families, for example, the processor switches between modes such as burst, HFM, and LFM in response to TDP, Tskin, and Tjmax limits. The techniques may be any combination of open-loop, predictive, or closed-loop control.
However, there are scalability concerns (computational complexity) as the number of cores, homogeneous or heterogeneous, keeps rising, even when dedicated cores or microcontrollers collect the data and orchestrate the power/thermal actions. A second concern is that a single point of failure is undesirable in robust many-core systems. A third is the significant delay between a temperature/power event, its measurement/detection, and the triggered corrective action (along with the difference between transient and steady-state effects).
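A toy version of such a centralized loop makes the delay concern concrete. This is a minimal Python sketch under purely hypothetical assumptions: the frequency table, the first-order thermal model, and all constants are illustrative, not values from any real Intel part or firmware. One controller reads every core's temperature each period and moves a single shared frequency level up or down by one step.

```python
"""Toy centralized thermal controller (hypothetical model, not a real
firmware algorithm). One controller reads every core's sensor each
period and adjusts a single shared frequency level by one step."""
from dataclasses import dataclass

FREQ_LEVELS_GHZ = [0.8, 1.2, 1.6, 2.0, 2.4, 2.8]  # hypothetical P-state table
T_AMBIENT_C = 45.0
T_LIMIT_C = 95.0     # stand-in for a Tjmax-style limit
HYSTERESIS_C = 3.0   # step up only when this far below the limit


@dataclass
class Core:
    temp_c: float
    activity: float  # utilization, 0..1


def step_thermal(core, freq_ghz, dt=0.1, r_c_per_w=4.0, tau_s=1.0):
    """First-order RC model: temperature relaxes toward
    ambient + R * power, with power ~ activity * f^3 (V scales with f)."""
    power_w = core.activity * 2.0 * freq_ghz ** 3
    steady_c = T_AMBIENT_C + r_c_per_w * power_w
    core.temp_c += (steady_c - core.temp_c) * (dt / tau_s)


def central_controller(cores, level):
    """Pick the next shared frequency level from the hottest sensor."""
    hottest = max(c.temp_c for c in cores)
    if hottest > T_LIMIT_C and level > 0:
        return level - 1
    if hottest < T_LIMIT_C - HYSTERESIS_C and level < len(FREQ_LEVELS_GHZ) - 1:
        return level + 1
    return level


def simulate(steps=200):
    """Return the hottest-core temperature after each control period."""
    cores = [Core(T_AMBIENT_C, a) for a in (1.0, 0.9, 0.5, 0.3)]
    level = len(FREQ_LEVELS_GHZ) - 1  # start at the highest level
    trace = []
    for _ in range(steps):
        level = central_controller(cores, level)  # act on the last reading
        for c in cores:
            step_thermal(c, FREQ_LEVELS_GHZ[level])
        trace.append(max(c.temp_c for c in cores))
    return trace


trace = simulate()
print(f"peak {max(trace):.1f} C, last 50 periods max {max(trace[-50:]):.1f} C")
```

In this toy model, because the controller only reacts after a reading has crossed the limit and moves one level per period, the hottest core overshoots the limit before settling into a small oscillation around it.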
There are two broad approaches: (1) loosely coupled, distributed thermal management and (2) closely coupled, distributed control. In the former, the tasks and domains (such as core versus uncore) managed by the distributed controllers are disjoint, while the latter often focuses on distributed task migration and throttling across cores.
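For the closely coupled case, here is a minimal Python sketch of distributed task migration on a hypothetical ring of cores. The steady-state thermal model (a fixed temperature rise per task), the thresholds, and the ring topology are illustrative assumptions. Each core's controller sees only its own temperature and its two neighbors', and pushes one task to the cooler neighbor when it is above a local trigger; there is no central coordinator.

```python
"""Toy closely coupled distributed control: neighbor-to-neighbor task
migration on a ring of cores (hypothetical model and constants)."""

T_AMBIENT_C = 45.0
K_C_PER_TASK = 6.0            # hypothetical steady-state rise per task
T_TRIGGER_C = 80.0            # local threshold for considering migration
MIN_GAP_C = 2 * K_C_PER_TASK  # require a 2-task gap so a move cannot just swap


def temperature(load):
    return T_AMBIENT_C + K_C_PER_TASK * load


def local_decision(i, loads):
    """Core i's controller: return a neighbor to push one task to, or None.
    It only reads its own load and those of its two ring neighbors."""
    n = len(loads)
    me = temperature(loads[i])
    if me < T_TRIGGER_C or loads[i] == 0:
        return None
    neighbors = [(i - 1) % n, (i + 1) % n]
    coolest = min(neighbors, key=lambda j: loads[j])
    if me - temperature(loads[coolest]) >= MIN_GAP_C:
        return coolest
    return None


def run_rounds(loads, max_rounds=100):
    """Cores act one after another (a simplification of asynchronous
    local controllers) until no controller wants to migrate."""
    loads = list(loads)
    for _ in range(max_rounds):
        moved = False
        for i in range(len(loads)):
            j = local_decision(i, loads)
            if j is not None:
                loads[i] -= 1
                loads[j] += 1
                moved = True
        if not moved:
            break
    return loads


final = run_rounds([10, 0, 0, 0, 0, 0, 0, 0])
print(final, max(temperature(l) for l in final))
```

Each move shifts a task across a gap of at least two tasks, which lowers the sum of squared loads by at least 2, so the loop always terminates. The result is only "good enough" (every core ends below the trigger) rather than globally balanced, which is typical of purely local decisions.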
A few titles from the public literature:
- Thermal and Energy Management of High-Performance Multicores: Distributed and Self-Calibrating Model-Predictive Controller
- Techniques for multicore thermal management: Classification and new exploration
- Central vs. distributed dynamic thermal management for multicore processors: which one is better?
DVFS (P-States), putting cores into idle states (C-States), and power capping (using Intel's RAPL) can be used to control processor temperature. Since a multicore processor has multiple cores on the same socket, core temperatures can vary depending on each core's proximity to the caches, memory controller, etc. Currently, it is difficult to enforce thermal constraints on a per-core basis because DVFS and power capping can only be applied per socket, i.e., all cores run at the same frequency. You can still use C-States to enforce per-core temperature constraints, but changing C-states can carry a much higher overhead (performance penalty).
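To make the per-socket limitation concrete, here is a minimal Python sketch under a hypothetical steady-state model (T = T_ambient + R_i * P, idle power treated as zero, made-up constants). All cores share one frequency, and each core gets its own run fraction (the share of time it is not parked in a C-state) chosen so it meets the same limit. Cores with a larger R_i stand in for cores in hotter spots on the die. The sketch deliberately ignores C-state entry/exit latency, which is the overhead mentioned above.

```python
"""Toy model: one socket-wide frequency plus per-core C-state residency.
Hypothetical steady state: T_i = T_AMB + R_i * duty_i * C_EFF * f^3,
where duty_i is the fraction of time core i is running (not idle)."""

T_AMBIENT_C = 45.0
T_LIMIT_C = 90.0
C_EFF_W_PER_GHZ3 = 2.0  # hypothetical dynamic-power coefficient


def steady_temp(r_c_per_w, freq_ghz, duty):
    return T_AMBIENT_C + r_c_per_w * duty * C_EFF_W_PER_GHZ3 * freq_ghz ** 3


def per_core_duty(r_values, freq_ghz):
    """Largest run fraction per core that keeps its steady-state
    temperature at or below T_LIMIT_C at the shared frequency."""
    headroom_c = T_LIMIT_C - T_AMBIENT_C
    duties = []
    for r in r_values:
        rise_at_full_duty = r * C_EFF_W_PER_GHZ3 * freq_ghz ** 3
        duties.append(min(1.0, headroom_c / rise_at_full_duty))
    return duties


def socket_freq_for_hottest(r_values):
    """DVFS-only alternative: the shared frequency at which the hottest
    core (largest R) just meets the limit while running all the time."""
    headroom_c = T_LIMIT_C - T_AMBIENT_C
    return (headroom_c / (max(r_values) * C_EFF_W_PER_GHZ3)) ** (1.0 / 3.0)


r_values = [2.0, 2.5, 3.0, 4.0]  # hypothetical per-core thermal resistance, C/W
duties = per_core_duty(r_values, freq_ghz=2.0)
print(duties)                                       # run fraction per core
print(round(socket_freq_for_hottest(r_values), 3))  # DVFS-only frequency, GHz
```

With these made-up numbers, only the two hottest cores need C-state residency (about 6% and 30% idle time), whereas the DVFS-only alternative would lower the shared frequency, and therefore the speed of all four cores, to roughly 1.78 GHz.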
I have been working in this area for the last 4 years or so. You might be interested in reading the following papers:
- A 'cool' load balancer: http://charm.cs.illinois.edu/newPapers/11-18/paper.pdf
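As a generic illustration of why load balancing matters once thermal control has made core speeds unequal (this is not the algorithm from the linked paper), here is a Python sketch of greedy longest-processing-time assignment onto processors running at different frequencies, compared with a speed-oblivious round-robin split. Task sizes and frequencies are hypothetical.

```python
"""Speed-aware greedy assignment (LPT-style) vs. round-robin, for
processors whose frequencies differ because some were throttled."""


def speed_aware_assign(task_work, freqs_ghz):
    """Place each task, largest first, on the processor where it would
    finish earliest; return (assignment, makespan)."""
    loads = [0.0] * len(freqs_ghz)
    assignment = [[] for _ in freqs_ghz]
    order = sorted(range(len(task_work)), key=lambda t: -task_work[t])
    for t in order:
        best = min(range(len(freqs_ghz)),
                   key=lambda p: (loads[p] + task_work[t]) / freqs_ghz[p])
        loads[best] += task_work[t]
        assignment[best].append(t)
    makespan = max(l / f for l, f in zip(loads, freqs_ghz))
    return assignment, makespan


def round_robin_makespan(task_work, freqs_ghz):
    """Speed-oblivious baseline: deal tasks out in turn."""
    loads = [0.0] * len(freqs_ghz)
    for i, w in enumerate(task_work):
        loads[i % len(freqs_ghz)] += w
    return max(l / f for l, f in zip(loads, freqs_ghz))


tasks = [1.0] * 8              # hypothetical work per task (giga-cycles)
freqs = [2.0, 2.0, 1.0, 1.0]   # two cores throttled to half speed
assignment, makespan = speed_aware_assign(tasks, freqs)
print(assignment, makespan, round_robin_makespan(tasks, freqs))
```

Here the speed-aware split finishes in 1.5 time units versus 2.0 for round-robin, because the throttled cores receive proportionally less work.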
You may also want to look at power-management domains other than CPU cores. Modern SoCs have graphics, imaging, memory, high-speed I/O, and clusters of cores as separate domains, and can be designed for per-domain control.
Often some features are not enabled because pre-silicon or post-silicon validation is too complex (which also makes them interesting!).
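There are many silicon constraints as well; the inverse temperature effect is one of them (at low supply voltages, transistor delay can decrease as temperature rises, because the drop in threshold voltage outweighs the loss in carrier mobility).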
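The per-domain control mentioned above can be sketched as splitting one SoC power budget across domains. This is a hypothetical Python illustration: the domain names, limits, and weights are made up, and the policy (give each domain its minimum, then share the rest in proportion to a weight, capping each domain at its maximum and redistributing the excess, a water-filling style scheme) is just one plausible choice, not how any particular SoC does it.

```python
"""Hypothetical per-domain power budgeting: minimums first, then the
remainder shared by weight, capped at each domain's maximum."""


def allocate_budget(budget_w, domains):
    """domains maps name -> (min_w, max_w, weight); returns name -> watts."""
    alloc = {name: lo for name, (lo, hi, w) in domains.items()}
    remaining = budget_w - sum(alloc.values())
    if remaining < 0:
        raise ValueError("budget is below the sum of domain minimums")
    active = {name for name, (lo, hi, w) in domains.items() if hi > lo and w > 0}
    while remaining > 1e-9 and active:
        total_weight = sum(domains[n][2] for n in active)
        capped = set()
        spent = 0.0
        for n in active:
            share = remaining * domains[n][2] / total_weight
            room = domains[n][1] - alloc[n]
            give = min(share, room)
            alloc[n] += give
            spent += give
            if give >= room:  # domain hit its maximum
                capped.add(n)
        remaining -= spent
        if not capped:  # every share fit, so the remainder is used up
            break
        active -= capped  # redistribute leftover among uncapped domains
    return alloc


domains = {
    "cpu_cluster": (1.0, 8.0, 3.0),  # (min W, max W, weight), all made up
    "gpu": (0.5, 6.0, 2.0),
    "imaging": (0.2, 1.0, 1.0),
    "memory": (1.0, 2.0, 1.0),
}
alloc = allocate_budget(15.0, domains)
print({k: round(v, 3) for k, v in alloc.items()})
```

With these numbers, the imaging and memory domains hit their caps in the first pass, and their unused share is redistributed to the CPU cluster and GPU in the second pass.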