A principal components model by definition accounts for all of the variance in the data, since the number of principal components generated is always equal to the number of starting variables. Each principal component contains a loading from all the starting variables. Common heuristics used to identify "useful" principal components include (i) those having eigenvalues > 1, and (ii) those contributing to the first ca. 80% of cumulative variance explained.
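A minimal NumPy sketch of these points, on synthetic data (the array names and the toy data-generating step are my own, for illustration): a standardized dataset with p variables yields p components whose eigenvalues sum to p, and the two heuristics can be read directly off the eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: 50 samples, 5 correlated variables (synthetic, for illustration)
X = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 5)) + 0.3 * rng.normal(size=(50, 5))

# Standardize, then eigendecompose the correlation matrix
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(Z, rowvar=False)
eigvals = np.linalg.eigvalsh(R)[::-1]   # sorted descending

# As many components as starting variables; together they account for all the variance
print(len(eigvals))    # 5, same as the number of variables
print(eigvals.sum())   # ~5.0 = total variance of the z-scored data

# Heuristic (i): keep components with eigenvalue > 1
print(np.sum(eigvals > 1))

# Heuristic (ii): keep components up to ~80% cumulative variance explained
cum = np.cumsum(eigvals) / eigvals.sum()
print(np.argmax(cum >= 0.80) + 1)
```

Because the correlation matrix has unit diagonal, the eigenvalues always sum to the number of variables, which is why the full model accounts for all of the variance.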
It is important that the variables be on similar scales and free of extreme values. It is common to transform the data so that each variable is approximately normally distributed (which also tames extreme values), and to scale (e.g. to z-scores) so that no variable dominates the components merely because of its units.
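A short sketch of that preprocessing, assuming two hypothetical variables on very different scales (the names `income` and `age` and the log-transform choice are my own illustration, not a general prescription):

```python
import numpy as np

rng = np.random.default_rng(1)
# Two synthetic variables on wildly different scales; the first is right-skewed
income = rng.lognormal(mean=10, sigma=1, size=200)   # ~e^10 scale, skewed
age = rng.normal(40, 10, size=200)                   # ~tens

X = np.column_stack([income, age])

# A log transform brings the skewed variable closer to normality
# (and pulls in its extreme values)
X[:, 0] = np.log(X[:, 0])

# z-score scaling puts both variables on a comparable footing
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
print(Z.mean(axis=0))          # ~[0, 0]
print(Z.std(axis=0, ddof=1))   # [1, 1]
```

Without this step, a PCA on the raw matrix would be dominated by whichever variable happens to have the largest numeric spread.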
(see Reimann, C., Filzmoser, P., Garrett, R. G., & Dutter, R. (2008). Statistical Data Analysis Explained: Applied Environmental Statistics with R (First ed.). Chichester, England: John Wiley & Sons.).
The answer to the above question is yes: PCA maximizes the variance of the principal components. The optimization problem is formulated, for example, in David J. Hand, Heikki Mannila, and Padhraic Smyth, Principles of Data Mining, MIT Press, 2001. I presented it in my own work: https://www.researchgate.net/publication/319469038_New_Interpretation_of_Principal_Components_Analysis.
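That maximization can be checked numerically: the first PC direction is the unit vector w maximizing Var(Xw), and the maximum equals the top eigenvalue of the covariance matrix. A sketch on synthetic data (variable names my own):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.multivariate_normal([0, 0], [[3.0, 1.2], [1.2, 1.0]], size=5000)
Xc = X - X.mean(axis=0)
S = np.cov(Xc, rowvar=False)

# The first PC direction w1 maximizes Var(X w) subject to ||w|| = 1;
# the maximizer is the top eigenvector of the covariance matrix S.
eigvals, eigvecs = np.linalg.eigh(S)   # ascending order
w1 = eigvecs[:, -1]

var_pc1 = np.var(Xc @ w1, ddof=1)

# No random unit vector does better
for _ in range(1000):
    w = rng.normal(size=2)
    w /= np.linalg.norm(w)
    assert np.var(Xc @ w, ddof=1) <= var_pc1 + 1e-9

print(var_pc1, eigvals[-1])   # the two agree: max variance = top eigenvalue
```

Subsequent components solve the same problem restricted to directions orthogonal to the ones already found.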
Yes. More generally, the first k principal components (k = 1, 2, 3, …) explain more variance than any other k orthonormal linear combinations of the variables, and the last k principal components explain the least.
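This extremal property (the sums of the top-k and bottom-k eigenvalues bound the variance captured by any k orthonormal directions) can be verified numerically; a sketch on synthetic data, with my own variable names:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 6)) @ rng.normal(size=(6, 6))
S = np.cov(X, rowvar=False)
eigvals = np.linalg.eigvalsh(S)[::-1]   # sorted descending

k = 2
top_k = eigvals[:k].sum()      # variance explained by the first k PCs
bottom_k = eigvals[-k:].sum()  # variance explained by the last k PCs

# Any other set of k orthonormal directions falls between these two bounds
for _ in range(500):
    Q, _ = np.linalg.qr(rng.normal(size=(6, k)))   # random orthonormal k-frame
    explained = np.trace(Q.T @ S @ Q)
    assert bottom_k - 1e-9 <= explained <= top_k + 1e-9

print(bottom_k, top_k)
```

This is why truncating to the first k components is the best rank-k summary of the data in the variance-explained sense.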