Monitors vary in body pattern, so you can recognize individuals by details of their colouration. Why not use camera traps, or an initial photo-identification pass (photographing every individual in the area), and then monitor them electronically? By that I mean installing traps and using "face recognition" software to compare the individuals that appear in the photos.
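To make the photo-matching idea concrete, here is a minimal sketch in Python. It assumes each photo has already been reduced to a numeric pattern descriptor by some image-processing step that is not shown; the names `catalogue`, `best_match` and the 0.95 threshold are made up for illustration and do not come from any particular recognition software.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero descriptor cannot be matched to anything
    return dot / (norm_a * norm_b)


def best_match(catalogue, query, threshold=0.95):
    """Return (animal_id, score) for the most similar catalogued animal.

    catalogue: dict mapping a hypothetical animal ID to its pattern descriptor.
    query: descriptor extracted from a new camera-trap photo.
    threshold: assumed cut-off; below it the photo is treated as a new animal
    and the returned ID is None.
    """
    best_id, best_score = None, -1.0
    for animal_id, descriptor in catalogue.items():
        score = cosine_similarity(descriptor, query)
        if score > best_score:
            best_id, best_score = animal_id, score
    if best_score >= threshold:
        return best_id, best_score
    return None, best_score
```

In practice the threshold would have to be tuned on photos of animals whose identity is already known, and doubtful matches checked by eye.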
Camera traps sound like the best method to me too, unless you have the resources to use genetic means of identification. Whatever method you use, the first step is probably to talk to community leaders, explain that your work can be used to help the monitors, and hopefully get some public support from them. People can be very protective of animals, which is usually a good thing, but even here in Britain I have had people threaten me for disturbing "their" amphibians during a survey.
If the animals have cultural significance and you are not allowed to handle them, then I agree with D. O'Brien about camera traps. If your budget does not stretch to these expensive traps, I suggest Visual Encounter Surveys (VES) along transects, with a reasonable pair of binoculars to ID specific individuals based on pattern and territory. The exact way to run the VES transects really depends on your question, but for population estimation I would recommend VES. Good luck!
See Welbourne, D. (2014), Chapter 20 in Meek & Fleming, Camera Trapping: Wildlife Management and Research, for some ideas. See link.
If you place camera traps in a grid or at randomised locations, you can estimate density with spatially explicit mark-recapture or spatially explicit occupancy models, depending on how detailed your photos are: mark-recapture if you can tell individuals apart, occupancy models if you only know the species. See Ramsey et al. 2015 (link).
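As a rough illustration of what photo-identified individuals let you do, here is a small Python sketch of a two-session mark-recapture estimate. Note that this is the simple, non-spatial Chapman-corrected Lincoln-Petersen estimator, not the spatially explicit models mentioned above, and it assumes a closed population and equal detectability. The function name and the animal IDs are invented for the example.

```python
def chapman_estimate(session1_ids, session2_ids):
    """Chapman-corrected Lincoln-Petersen abundance estimate.

    session1_ids: IDs of individuals photographed in the first session.
    session2_ids: IDs of individuals photographed in the second session.
    Returns (estimated population size, approximate variance).
    """
    first = set(session1_ids)   # duplicates within a session count once
    second = set(session2_ids)
    n1 = len(first)             # animals "marked" (identified) in session 1
    n2 = len(second)            # animals seen in session 2
    m2 = len(first & second)    # animals seen in both sessions (recaptures)

    n_hat = (n1 + 1) * (n2 + 1) / (m2 + 1) - 1
    variance = ((n1 + 1) * (n2 + 1) * (n1 - m2) * (n2 - m2)) / (
        (m2 + 1) ** 2 * (m2 + 2)
    )
    return n_hat, variance


# Hypothetical example: 10 animals in session 1, 8 in session 2, 4 in both.
session1 = ["m01", "m02", "m03", "m04", "m05", "m06", "m07", "m08", "m09", "m10"]
session2 = ["m01", "m02", "m03", "m04", "m11", "m12", "m13", "m14"]
estimate, variance = chapman_estimate(session1, session2)
```

For a real survey with many camera stations and sessions, a dedicated spatially explicit capture-recapture package would be the better tool; this only shows the basic logic of using re-sightings to estimate numbers.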