I think you have to consider what types of data will be used: genomic, proteomic, metabolomic, images, or even the patient's medical records. Each type of data behaves differently and needs its own specific treatment before it is processed by a machine learning model.
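As a rough illustration of "specific treatment" (one common choice, not the only one), count-like omics features are often log-transformed and standardized, while categorical clinical fields such as smoking status are one-hot encoded. This is a minimal sketch in plain Python. The function names and example values are hypothetical, and real pipelines usually rely on a library such as scikit-learn for these steps.

```python
import math
from statistics import mean, pstdev


def preprocess_omics(counts):
    """Hypothetical helper: log2(x + 1)-transform raw counts for ONE feature
    across samples, then z-score them (mean 0, population std 1)."""
    logged = [math.log2(c + 1) for c in counts]
    mu = mean(logged)
    sd = pstdev(logged)
    if sd == 0:
        # Constant feature: it carries no information, so return zeros
        return [0.0 for _ in logged]
    return [(x - mu) / sd for x in logged]


def one_hot(values):
    """Hypothetical helper: one-hot encode a categorical clinical field.
    Returns the sorted category list and one 0/1 row per sample."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Example usage with made-up values
gene_counts = [0, 1, 3, 7]
smoking = ["never", "current", "never", "former"]
print(preprocess_omics(gene_counts))
print(one_hot(smoking))
```

Images would need yet another treatment (resizing, intensity normalization, and so on), which is why it helps to decide on the data types first.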
It depends on which cancer you are going to predict. Try to find some parameters (I mean the risk factors for that particular cancer) by reading some articles.