Healthcare data analysis makes sense only if the healthcare business problem is clearly stated. Data collection, data organization, and the choice of analysis method or algorithm make sense only within the healthcare business context or the specific problem to be solved.
See, for example, the attached chapter "Management Science for Healthcare Applications". You could replace 'management science' with 'data analytics', 'business analytics', 'management engineering', or 'operations management'. All these terms have a similar meaning.
1. Ideally, the data should be arranged as records, with identifiers that allow the source to be recognized. Consider any variable that allows traceability.
As for how it is collected, if you mean how it is recorded (paper or digital), I think it depends on the sector (geographic area) where it is located. Digital media is better, though: it reduces the possibility of error, and that is relevant.
2. What are you looking for? What is the question, or what is the problem to be solved?
Once you answer that, you will be able to define what to do.
3 & 4 depend on the previous point.
Outliers can make sense in context. For example, when a prevalence analysis is made and an event appears that alters the prevalence, what do you do with the outlier?
Remember that there are probability analyses for the occurrence of outliers, such as the Fréchet distribution (also known as the inverse Weibull distribution).
This is why a question is required to know what to analyze and how.
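
To make the point about outlier probabilities concrete, here is a minimal sketch of how a Fréchet (inverse Weibull) model could be used to ask how likely an extreme value is, instead of simply deleting it. The function names, the parameter values, and the "weekly case count" scenario are assumptions made up for illustration; in practice the shape and scale would have to be estimated from your own data.

```python
import math


def frechet_cdf(x, alpha, scale=1.0, loc=0.0):
    """P(X <= x) for a Frechet distribution with shape alpha, scale, and location loc."""
    if alpha <= 0 or scale <= 0:
        raise ValueError("alpha and scale must be positive")
    if x <= loc:
        # The distribution has no mass at or below its location parameter.
        return 0.0
    return math.exp(-(((x - loc) / scale) ** (-alpha)))


def exceedance_probability(x, alpha, scale=1.0, loc=0.0):
    """P(X > x): the chance of seeing a value more extreme than x."""
    return 1.0 - frechet_cdf(x, alpha, scale, loc)


if __name__ == "__main__":
    # Hypothetical parameters, NOT fitted to real data.
    alpha, scale = 2.0, 10.0
    threshold = 40.0  # e.g. a weekly case count that looks like an outlier
    p = exceedance_probability(threshold, alpha, scale)
    print(f"P(X > {threshold}) = {p:.4f}")
```

If the exceedance probability under a reasonable model is not negligible, the "outlier" may be a legitimate extreme event that belongs in the prevalence analysis. Whether it should stay still depends on the question being asked.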