16 October 2023

I have two datasets with the same features: one for training and one for testing. The training dataset has around 13 lakh (1.3 million) records and the test dataset has around 5 lakh (500,000) records. With this much data, the algorithms are difficult to run and take a very long time. Is there any way to make the training and test datasets smaller while maintaining the ratio of each feature's values with respect to the target variable? Is there any free tool, online software, or website available that can help me with this?

I tried to implement this in Python, but my code doesn't maintain the same ratio. For example, suppose feature x has two values, x1 and x2. Among records where the target variable is 1, x1 occurs in 55% and x2 in 45%. I am trying to create a new dataset from my training dataset, reducing it from 13 lakh records to 3 lakh records. After running my code, the proportions of x1 and x2 come out at around 53% and 47%, respectively. From what I found online, it is difficult to maintain the ratio, and the proportions after sampling may not match the original ones.
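
To make the goal concrete, here is a minimal sketch of stratified downsampling with pandas. It is only an illustration, not the code I ran: the function name `stratified_downsample`, the column names `x` and `target`, and the synthetic data are all assumptions. The idea is to sample the same fraction from every (feature value, target value) group, so the group proportions in the smaller dataset match the original up to rounding.

```python
import numpy as np
import pandas as pd


def stratified_downsample(df, strata_cols, frac, seed=0):
    """Sample the same fraction of rows from every group defined by strata_cols.

    Hypothetical helper: because each group shrinks by the same factor,
    the share of each group in the result stays (almost) unchanged.
    """
    return df.groupby(strata_cols).sample(frac=frac, random_state=seed)


# Synthetic stand-in for the real data (assumed column names "x" and "target").
rng = np.random.default_rng(0)
n = 13_000  # scaled-down placeholder for the 13 lakh training records
demo = pd.DataFrame({
    "x": rng.choice(["x1", "x2"], size=n, p=[0.55, 0.45]),
    "target": rng.integers(0, 2, size=n),
})

# Keep 3/13 of the rows, mirroring the 13 lakh -> 3 lakh reduction.
small = stratified_downsample(demo, ["x", "target"], frac=3 / 13)

# Compare the joint proportions of (x, target) before and after sampling.
before = demo.groupby(["x", "target"]).size() / len(demo)
after = small.groupby(["x", "target"]).size() / len(small)
print(pd.DataFrame({"before": before, "after": after}))
```

This preserves the ratio only for the columns listed in `strata_cols`. Stratifying on many features at once can produce very small groups, so the result may still drift for features that are not part of the strata. scikit-learn's `train_test_split` also accepts a `stratify` argument, which may be worth trying as an alternative.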

Can anyone help with this?
