My topic is "fraud detection in the banking sector using data mining techniques", so I am looking for a banking data set and for guidance on how to use it.
I think Support Vector Machine (SVM) based classification would be an appropriate ML tool in this context. The class values are fraud and genuine, so it is a binary classification problem.
From the data repository, collect the instances (data points). For each data point, the feature variables should form as relevant and complete a set as possible of variables related to the output variable, which is the class variable. Here the class variable is the nature of the transaction, with class values of, say, +1 for a genuine transaction and -1 for a fraudulent one.
Split the collected dataset into a training set, a validation set and a test set. Usually about 70% of the data points go into training and validation together, and the remaining 30% go into the test set.
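As a rough illustration of such a split, here is a minimal sketch using only the Python standard library. The function name `split_dataset`, the toy `transactions` list, and the choice to take 20% of the non-test data as the validation set are my own assumptions for the example, not something fixed by the method.

```python
import random

def split_dataset(rows, test_fraction=0.30, validation_fraction=0.20, seed=0):
    """Shuffle the rows, hold out test_fraction for testing, then carve a
    validation set out of the remaining (training + validation) data."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # fixed seed so the split is repeatable
    n_test = round(len(rows) * test_fraction)
    test, development = rows[:n_test], rows[n_test:]
    # validation_fraction is an assumed value; it applies to the non-test part only
    n_val = round(len(development) * validation_fraction)
    validation, train = development[:n_val], development[n_val:]
    return train, validation, test

# Toy stand-in data: (features, label) pairs, label +1 = genuine, -1 = fraudulent.
transactions = [([float(i), float(i % 7)], -1 if i % 10 == 0 else +1)
                for i in range(100)]
train, validation, test = split_dataset(transactions)
print(len(train), len(validation), len(test))
```

Fraud data is usually very imbalanced, so in practice you would probably want a stratified split that keeps the fraud/genuine ratio similar in all three sets.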
Depending on the nature of the data, the penalty coefficient (often denoted C) can be set or varied until a reasonably accurate linear SVM classifier is obtained.
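One possible way to vary the penalty coefficient is to try a few values and keep the one that does best on the validation set. The sketch below assumes scikit-learn is available and uses synthetic data from `make_classification` as a stand-in for real transactions. The candidate values of C, the balanced-accuracy metric and the `class_weight="balanced"` setting are my own choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic, imbalanced stand-in data (about 5% "fraud"); not real banking data.
X, y01 = make_classification(n_samples=2000, n_features=10,
                             weights=[0.95, 0.05], random_state=0)
y = 1 - 2 * y01  # map class 0 -> +1 (genuine), class 1 -> -1 (fraudulent)

# 30% test; the remaining 70% is split again into training and validation.
X_dev, X_test, y_dev, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_dev, y_dev, test_size=0.20, stratify=y_dev, random_state=0)

candidate_C = [0.01, 0.1, 1.0, 10.0, 100.0]  # assumed grid of penalty values
best_C, best_score, best_model = None, -1.0, None
for C in candidate_C:
    # Scale the features, then fit a linear SVM with this penalty coefficient.
    model = make_pipeline(
        StandardScaler(),
        LinearSVC(C=C, class_weight="balanced", dual=False))
    model.fit(X_train, y_train)
    score = balanced_accuracy_score(y_val, model.predict(X_val))
    if score > best_score:
        best_C, best_score, best_model = C, score, model

# The test set is touched only once, after C has been chosen.
test_score = balanced_accuracy_score(y_test, best_model.predict(X_test))
print("chosen C:", best_C)
print("validation balanced accuracy:", round(best_score, 3))
print("test balanced accuracy:", round(test_score, 3))
```

Plain accuracy can be misleading when frauds are rare, which is why a class-balanced metric is used here; precision and recall on the fraud class would be reasonable alternatives.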
In some situations it may also be necessary to apply a kernel transformation to the data, so that a linear classifier can be fitted in the transformed space.
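To see why a kernel can help, here is a small sketch (again assuming scikit-learn) on a toy dataset where the two classes form concentric rings, so no straight line separates them. The data from `make_circles` and the RBF kernel are assumptions chosen for the example. Real transaction data may or may not need a kernel.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy data: one class forms an inner ring, the other an outer ring.
X, y01 = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
y = 1 - 2 * y01  # relabel the classes as +1 / -1 to match the convention above

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)

# A linear SVM in the original feature space.
linear_svm = SVC(kernel="linear", C=1.0).fit(X_train, y_train)
# The same kind of classifier, but fitted in the space induced by an RBF kernel.
rbf_svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)

linear_acc = linear_svm.score(X_test, y_test)
rbf_acc = rbf_svm.score(X_test, y_test)
print("linear kernel accuracy:", round(linear_acc, 3))
print("RBF kernel accuracy:", round(rbf_acc, 3))
```

On data like this the linear classifier should do little better than guessing, while the kernel version separates the rings; comparing both on the validation set is a simple way to decide whether a kernel is worth using.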
A machine learning dataset is a collection of data that is used to train the model. A dataset acts as an example to teach the machine learning algorithm how to make predictions. ... The common types of data include: