I was wondering whether anyone knows of an automated tool for collecting GPU kernel features, e.g., stencil dimension, size, operations, etc. Such tools are widely available for CPU kernels.
If you are using NVIDIA GPUs and want to collect these metrics during execution, you can profile the kernels with nvprof, which ships with the CUDA Toolkit. nvprof exposes metrics and events that count the number of instructions executed.
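To turn that into per-kernel features, one option is to have nvprof write its metrics as CSV, e.g. `nvprof --csv --log-file metrics.csv --metrics inst_executed,flop_count_sp ./app` (`./app` and the file name are placeholders), and post-process the log. Below is a minimal Python sketch of that post-processing. The sample log, the kernel name, and the exact column layout are assumptions on my side, so check them against what your nvprof version actually prints.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample log. The column layout is an ASSUMPTION about what
# `nvprof --csv --metrics ...` writes; verify it against your own output.
SAMPLE_LOG = '''==1234== NVPROF is profiling process 1234, command: ./app
==1234== Profiling application: ./app
==1234== Profiling result:
==1234== Metric result:
"Device","Kernel","Invocations","Metric Name","Metric Description","Min","Max","Avg"
"Tesla K40m (0)","stencil3d(float*, float*, int)",10,"inst_executed","Instructions Executed",52000,52000,52000
"Tesla K40m (0)","stencil3d(float*, float*, int)",10,"flop_count_sp","Floating Point Operations(Single Precision)",180000,180000,180000
'''


def parse_nvprof_metrics(text):
    """Return {kernel name: {metric name: average value}} from a CSV log."""
    # Drop blank lines and the "==PID==" status lines, keep the CSV table.
    rows = [ln for ln in text.splitlines() if ln and not ln.startswith("==")]
    reader = csv.DictReader(io.StringIO("\n".join(rows)))
    features = defaultdict(dict)
    for row in reader:
        value = row["Avg"]
        try:
            value = float(value)
        except ValueError:
            pass  # some metrics carry units (e.g. "%"); keep those as text
        features[row["Kernel"]][row["Metric Name"]] = value
    return dict(features)


if __name__ == "__main__":
    for kernel, metrics in parse_nvprof_metrics(SAMPLE_LOG).items():
        print(kernel, metrics)
```

For a real run you would read `metrics.csv` and pass its contents to `parse_nvprof_metrics` instead of `SAMPLE_LOG`. Note that this only gives you dynamic counts; it will not tell you the stencil shape.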
Well, I only work with CUDA GPUs for servers, so I'm not aware of how things work on embedded/mobile platforms.
But basically you would need a profiler. If you are dealing with smartphones, you could try the Snapdragon/Adreno profiler:
https://developer.qualcomm.com/download/software
Also, looking at their software, I see that they provide an LLVM compiler. If there is an open source version available, you could modify it to extract the features you want.
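Before modifying the compiler itself, a cheaper first step might be to dump the kernel as textual LLVM IR and count opcodes in it. With upstream clang that is `-S -emit-llvm`, plus `--cuda-device-only` for CUDA sources; whether the Qualcomm toolchain accepts the same flags is an assumption I have not checked. Here is a rough Python sketch, where the sample IR and the `axpy` kernel are made up for illustration. It is a crude line-based scan, not a real IR parser.

```python
from collections import Counter

# Hypothetical IR for a made-up kernel, only here to exercise the counter.
SAMPLE_IR = '''
define void @axpy(float* %x, float* %y, float %a) {
entry:
  %0 = load float, float* %x
  %1 = load float, float* %y
  %2 = fmul float %a, %0
  %3 = fadd float %2, %1
  store float %3, float* %y
  ret void
}
'''


def count_opcodes(ir_text):
    """Count instruction opcodes inside function bodies of textual LLVM IR."""
    counts = Counter()
    in_function = False
    for raw in ir_text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop trailing IR comments
        if line.startswith("define "):
            in_function = True
            continue
        if line == "}":
            in_function = False
            continue
        # Skip everything outside a body, blank lines, and block labels.
        if not in_function or not line or line.endswith(":"):
            continue
        # Instructions look like "%r = opcode ..." or just "opcode ...".
        rhs = line.split(" = ", 1)[1] if " = " in line else line
        tokens = rhs.split()
        # "tail call" and friends: skip the marker so we count "call".
        while tokens and tokens[0] in ("tail", "musttail", "notail"):
            tokens.pop(0)
        if tokens:
            counts[tokens[0]] += 1
    return counts


if __name__ == "__main__":
    print(count_opcodes(SAMPLE_IR))
```

This gives a static operation mix (loads, stores, floating-point ops) per file. Features like stencil dimension would need real analysis of the address computations, which is where a custom LLVM pass would come in.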