I think that while it is not unusual to test through a tool, one should take care to understand the algorithm it uses. Parsing is easy, but the results depend on the tool's internal architecture and algorithm. Perhaps benchmarking the algorithms would help.