I have some 2D images (100*100 pixels) of gradient fields, and the target for each 2D-image is a scalar. Is there some method to do this regression in machine learning?
Convolutional neural networks (CNN) are typically the best way to address prediction tasks with images as input. For a regression case, a CNN can be trained to predict the desired scalar output for a given 2D input.
One thing to keep in mind is that CNNs are typically data-hungry: a large dataset is needed to train the network (the needed data size is dependent on the complexity of the network architecture)
One common regression problem based on images is crowd-estimation which maps an image to a scalar (number of people in image). There are many papers and resources available online for this particular problem.
Here is one implementation example using Keras in python (code not mine):
While I did NOT try running the code, I think the network structure shows a good example of a CNN regression. Compared to CNN classification, the main difference is having a single output neuron with, typically, mean squared error loss.