I assume your algorithm detects objects in images (e.g., faces, ...).
You can count the true positives (TP), false positives (FP) and false negatives (FN).
Here, TP is the number of correct detections, i.e., those that coincide with a ground-truth object; FP is the number of false detections (your method thinks there is an object where there is none); and FN is the number of ground-truth objects that your method has missed.
You can consider an automated detection a TP if it has high overlap with the ground-truth object (you can use the Dice ratio as the overlap measure).
You can simply report the TP/FP/FN counts, or plot a ROC-style curve of true positives against false positives.
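To make the counting concrete, here is a minimal sketch in Python. The box format, the greedy matching, and the 0.5 threshold are illustrative assumptions, not a fixed standard:

```python
# A minimal sketch: boxes are (x1, y1, x2, y2) tuples; the 0.5
# threshold and the greedy matching strategy are assumptions.

def dice_overlap(a, b):
    """Dice ratio between two axis-aligned boxes: 2*|A∩B| / (|A| + |B|)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return 2.0 * inter / (area_a + area_b) if (area_a + area_b) > 0 else 0.0

def count_tp_fp_fn(detections, ground_truth, threshold=0.5):
    """Greedily match each detection to an unused ground-truth box."""
    matched = set()
    tp = fp = 0
    for det in detections:
        best, best_idx = 0.0, None
        for i, gt in enumerate(ground_truth):
            if i in matched:
                continue
            d = dice_overlap(det, gt)
            if d > best:
                best, best_idx = d, i
        if best >= threshold:
            tp += 1
            matched.add(best_idx)  # each ground-truth box counts only once
        else:
            fp += 1                # no sufficiently overlapping object
    fn = len(ground_truth) - len(matched)
    return tp, fp, fn
```

Sweeping the threshold (or the detector's confidence cutoff) over a range of values is what produces the curve mentioned above.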
The most appropriate way is to compare the detections of your algorithm with a ground truth. The ground truth is usually the delineation or detection done manually by experts (observers). So if you have a number of images, you may ask some experts to delineate them; then you can compare the manual tracings with your automated segmentation tracings. There are a number of metrics that can be used towards this direction, among them the true positives, true negatives, false positives and false negatives, as Gerard said before. There are also other metrics that can be used, such as the overlap, the Williams index, the mean square error, and many more. Please have a look at a recent publication of ours where we compare manual and automated tracings in order to evaluate the performance of a video segmentation algorithm:
C.P. Loizou, S. Petroudi, C.S. Pattichis, M. Pantziaris, A.N. Nicolaides, “An integrated system for the segmentation of atherosclerotic carotid plaque in ultrasound video,” IEEE Trans. Ultrason. Ferroelectr. Freq. Control, vol. 61, no. 1, pp. 86-101, 2014.
You may also have a look at another publication of ours where a similar problem was addressed:
C.P. Loizou, C.S. Pattichis, M. Pantziaris, A. Nicolaides, “An integrated system for the segmentation of atherosclerotic carotid plaque,” IEEE Trans. Inf. Technol. Biomed., vol. 11, no. 6, pp. 661-667, 2007.
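As a rough illustration of the pixel-level comparison described above, here is a minimal sketch in Python; the array-based mask format is an assumption, and the Williams index and the other metrics used in the papers are not reproduced here:

```python
import numpy as np

def mask_metrics(auto_mask, manual_mask):
    """Pixel-wise agreement between an automated and a manual binary mask."""
    a = auto_mask.astype(bool)
    m = manual_mask.astype(bool)
    tp = np.sum(a & m)    # pixels both tracings mark as object
    tn = np.sum(~a & ~m)  # pixels both tracings mark as background
    fp = np.sum(a & ~m)   # automated-only pixels
    fn = np.sum(~a & m)   # manual-only pixels
    dice = 2.0 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 1.0
    overlap = tp / (tp + fp + fn) if (tp + fp + fn) else 1.0  # Jaccard
    return {"TP": int(tp), "TN": int(tn), "FP": int(fp), "FN": int(fn),
            "Dice": dice, "Overlap": overlap}
```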
The guidelines are practically an industry standard: they describe good practice for the manual labelling of data for use in an evaluation benchmark, and they also outline the evaluation protocol. Pay careful attention to the details, such as: if two or more predictions overlap a ground-truth object by more than 50%, only the one with the highest score counts as a true positive and the rest are false positives; and you may tag some of your ground truth with meta-labels such as "difficult" or "truncated", allowing you to evaluate on different subsets and get a better picture of where your approach succeeds and fails. A sketch of this matching rule follows below.
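This is a sketch of the matching rule just described, in Python rather than the benchmark's own tooling: predictions are processed in descending score order, each ground-truth box may be matched at most once, and boxes flagged "difficult" neither count as true positives nor penalize as false positives. The field names and the IoU helper are illustrative assumptions:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def voc_style_match(predictions, ground_truth, min_overlap=0.5):
    """predictions: list of (score, box); ground_truth: list of (box, difficult)."""
    used = set()
    tp = fp = 0
    # Highest-scoring predictions get first claim on each ground-truth box.
    for score, box in sorted(predictions, key=lambda p: p[0], reverse=True):
        best, best_idx = 0.0, None
        for i, (gt_box, _) in enumerate(ground_truth):
            o = iou(box, gt_box)
            if o > best:
                best, best_idx = o, i
        if best >= min_overlap:
            if ground_truth[best_idx][1]:
                continue       # "difficult" object: ignored entirely
            if best_idx in used:
                fp += 1        # duplicate detection of an already-matched object
            else:
                used.add(best_idx)
                tp += 1
        else:
            fp += 1            # no sufficiently overlapping object
    return tp, fp
```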
If you have Matlab, then you can try their evaluation toolkit:
The workshops centered around the challenges have stopped running since the passing of Mark Everingham (R.I.P.), one of the key people behind the benchmark.