There is no definite rule; it depends on the case under consideration. For example, to classify images of digits from the MNIST database, which are 28-by-28-pixel black-and-white images, a good choice is 20 filters of size 9 by 9 (reference: MATLAB Deep Learning by P. Kim). The number of filters equals the number of feature maps produced by the first convolutional layer. Other types of images may require more or fewer feature maps, depending on how structured the images are.
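For that MNIST example, you can check the resulting feature-map sizes with a few lines of Python. This is a minimal sketch of the standard "valid" convolution output arithmetic, assuming stride 1 and no padding (the book may use a different convention):

```python
def conv_output_shape(in_h, in_w, k_h, k_w, n_filters, stride=1, padding=0):
    # Standard convolution output size: (in - kernel + 2*padding) / stride + 1
    out_h = (in_h - k_h + 2 * padding) // stride + 1
    out_w = (in_w - k_w + 2 * padding) // stride + 1
    return n_filters, out_h, out_w

# MNIST: 28x28 images, 20 filters of size 9x9, stride 1, no padding
print(conv_output_shape(28, 28, 9, 9, 20))  # → (20, 20, 20)
```

So the first layer yields 20 feature maps, each 20 by 20 pixels.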
The filter size depends on your data and the patterns you expect each layer to recognize. Visual inspection of the learned filters can help build intuition about what the network has learned, although the filters are sometimes difficult to interpret this way. A practical starting point is to adopt architectures, and in particular the filter sizes, that have proven to work for a similar task. From there, you can tune the filter size by monitoring performance on your training and test sets; for 2-D signals, you can systematically search the space in width and height and see how the results vary.

The number of filters relates to how much variation in your data the layer must capture. Again, start from known architectures, then adjust the number of filters while monitoring your training and test sets. You can systematically increase the width and depth of your network in the same way.
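That systematic search can be sketched as a plain grid search over filter sizes and counts. This is only an illustration: the candidate values are hypothetical, and the `evaluate` function here returns a dummy score so the snippet runs; in practice it would train the network with each configuration and return accuracy on a held-out set:

```python
from itertools import product

# Hypothetical search space; these values are illustrative, not prescriptive.
filter_sizes = [(3, 3), (5, 5), (7, 7), (9, 9)]  # (height, width)
filter_counts = [10, 20, 40]

def evaluate(size, count):
    """Stand-in for a real train/evaluate cycle that monitors the
    training and test sets. The formula below is a dummy score, NOT
    real data; replace it with actual held-out performance."""
    h, w = size
    return 1.0 / (1 + abs(h - 5) + abs(count - 20))

best_config, best_score = None, float("-inf")
for size, count in product(filter_sizes, filter_counts):
    score = evaluate(size, count)
    if score > best_score:
        best_config, best_score = (size, count), score

print(best_config)  # with the dummy score above, selects ((5, 5), 20)
```

The same loop extends naturally to other hyperparameters (stride, depth, number of layers); just keep comparing configurations on data the network was not trained on.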