I have a dataset with over 20,000 rows.

1) I am going to run a tool on this data and record the error rate it reports.

2) Then I will randomly select a sample of rows and count how many errors the sample contains. (These rows will be checked manually, one by one, not by the tool from 1).)

3) Finally, I want to compare the error rates from 1) and 2) to check whether the tool used in 1) calculated the error rate correctly.

However, I don't know what sample size is appropriate for this validation. The dataset is fairly large, and checking rows one by one in 2) takes too much time. Is there a methodology for deciding the sample size in this case?
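
For context, one standard approach that might apply here is the sample-size formula for estimating a proportion (Cochran's formula) with a finite population correction. The sketch below is only an illustration under assumed settings: the function name `sample_size_for_proportion`, the 95% confidence level, the 3-percentage-point margin of error, and the worst-case expected error rate of 0.5 are all assumptions chosen for the example, not values from my data. The population of 20,000 is a rounded stand-in for "over 20,000 rows".

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(population, margin, confidence=0.95, expected_rate=0.5):
    """Rows to check manually so the sampled error rate is within
    +/- margin of the true rate at the given confidence level.

    expected_rate=0.5 is the most conservative guess (largest sample).
    """
    # Two-sided z critical value for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Cochran's formula for an effectively infinite population.
    n0 = (z ** 2) * expected_rate * (1 - expected_rate) / (margin ** 2)
    # Finite population correction: the dataset has a known, limited size.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


# Illustrative values only (assumptions, not taken from the real dataset).
n = sample_size_for_proportion(population=20000, margin=0.03)
print(n)
```

With these assumed settings the result is on the order of a thousand rows. A wider margin of error, or a rough prior estimate of the error rate that is far from 0.5, reduces the required sample considerably.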
