I am trying to figure out ways to label click logs from advertising log data for ad-fraud detection. The labels are Fraud and Not Fraud.

One approach that comes to mind is rule-based: I manually analyze a few hundred records and flag a record as fraud if certain conditions are met.
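A minimal sketch of what such rule-based labeling could look like. The field names (ip, ad_id, seconds_since_impression, user_agent), the rules, and the thresholds are all hypothetical; in practice they would come from the patterns found while analyzing the hand-checked records.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Click:
    # Hypothetical fields; real click logs will have their own schema.
    ip: str
    ad_id: str
    seconds_since_impression: float  # delay between ad impression and click
    user_agent: str


# Hypothetical thresholds, to be tuned on the manually analyzed records.
MAX_CLICKS_PER_IP_AND_AD = 3
MIN_SECONDS_TO_CLICK = 1.0
BOT_MARKERS = ("bot", "crawler", "spider")


def label_clicks(clicks):
    """Return 'Fraud' or 'Not Fraud' for each click using simple OR-combined rules."""
    # Count how often each (ip, ad) pair occurs in the batch.
    clicks_per_ip_ad = Counter((c.ip, c.ad_id) for c in clicks)
    labels = []
    for c in clicks:
        too_many = clicks_per_ip_ad[(c.ip, c.ad_id)] > MAX_CLICKS_PER_IP_AND_AD
        too_fast = c.seconds_since_impression < MIN_SECONDS_TO_CLICK
        bot_agent = any(m in c.user_agent.lower() for m in BOT_MARKERS)
        labels.append("Fraud" if (too_many or too_fast or bot_agent) else "Not Fraud")
    return labels


clicks = [
    Click("1.1.1.1", "ad1", 5.0, "Mozilla/5.0"),
    Click("2.2.2.2", "ad1", 0.2, "Mozilla/5.0"),    # clicked implausibly fast
    Click("3.3.3.3", "ad2", 4.0, "Googlebot/2.1"),  # self-declared bot
]
print(label_clicks(clicks))  # ['Not Fraud', 'Fraud', 'Fraud']
```

Any single rule firing marks the click as Fraud; the rule-labeled records could then serve as training data for a supervised model, with the usual caveat that the model can only learn what the rules already encode.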

Second, I could use clustering algorithms. I tried this (K-means and K-medoids) but couldn't get it to work, because my data mixes categorical and numeric features. The data is also very sparse, and with one-hot encoding the number of categorical features keeps growing over time.
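One commonly used way to cluster mixed data without one-hot encoding is Gower distance, which is a swapped-in technique not mentioned above. Numeric fields contribute a range-normalized absolute difference and categorical fields contribute 0 (equal) or 1 (different), so new category values never add dimensions. The sketch below pairs it with a simple alternating k-medoids, assuming each record is a tuple and the caller says which positions are numeric and which are categorical. It builds a full n-by-n distance matrix, so it is only practical for small samples such as a few hundred records.

```python
import random


def gower_distance(a, b, numeric_idx, categorical_idx, ranges):
    """Mean per-field distance in [0, 1] for records with mixed field types."""
    total = 0.0
    for i in numeric_idx:
        r = ranges[i]
        # Range-normalized absolute difference; constant columns contribute 0.
        total += abs(a[i] - b[i]) / r if r > 0 else 0.0
    for i in categorical_idx:
        # Simple matching: 0 if the categories agree, 1 otherwise.
        total += 0.0 if a[i] == b[i] else 1.0
    return total / (len(numeric_idx) + len(categorical_idx))


def k_medoids(records, k, numeric_idx, categorical_idx, max_iter=100, seed=0):
    """Alternating k-medoids on a precomputed Gower distance matrix."""
    ranges = {i: max(r[i] for r in records) - min(r[i] for r in records)
              for i in numeric_idx}
    n = len(records)
    d = [[gower_distance(records[p], records[q], numeric_idx, categorical_idx, ranges)
          for q in range(n)] for p in range(n)]
    medoids = random.Random(seed).sample(range(n), k)

    def assign_all(meds):
        # Each record joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda j: d[p][meds[j]]) for p in range(n)]

    for _ in range(max_iter):
        assign = assign_all(medoids)
        new_medoids = []
        for j in range(k):
            members = [p for p in range(n) if assign[p] == j]
            if not members:  # keep the old medoid if a cluster empties out
                new_medoids.append(medoids[j])
                continue
            # Pick the member with the smallest total distance to its cluster.
            new_medoids.append(min(members, key=lambda m: sum(d[m][p] for p in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign_all(medoids)


# Hypothetical records: (clicks_per_minute, device_type, country)
records = [(1.0, "mobile", "US"), (1.2, "mobile", "US"),
           (50.0, "desktop", "RU"), (52.0, "desktop", "RU")]
medoids, assign = k_medoids(records, k=2, numeric_idx=[0], categorical_idx=[1, 2])
print(assign)
```

Clustering only groups similar clicks; someone still has to inspect each cluster (for example, its medoid) to decide whether it represents Fraud or Not Fraud.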

Are there any ideas for labeling the data automatically or algorithmically?

Thanks,
