First, you could filter for low-expressed genes (e.g., set a cutoff of average 10 reads across samples, remove genes with zero counts etc). Then, you could select the top n most variable genes across samples.
If your experimental design includes distinctive groups (e.g., treated vs control) you could estimate differential expression with packages like DESeq2 or EdgeR in R.
Genes cooperate in gene expression networks, thus the main point is that the 'actual dimensionality' of the system is drastically reduced with respect to the number of genes. The classical way (applicable in any situation with a very high number of concurring elemnts) is like this:
1. Consider samples as variables and genes as statistical units, then compute PCA on the resulting matrix, if the samples refer to the same tissue you will come out with a by far main first principal component in order of percent of explained variance (it is the so called 'batch effect', correspondent to the fact each tissue or celular kind has an 'ideal profile' of gene eexpression levels). Consider the subsequent components (each other linear independent by construction and thus pointing to indpendent biological latent concepts), until you reach the 'noise floor' (only noise, i.e. very minor part of variance explained by minor components). let's imagine you end up with 3 principal components.
2. Each component has associated its loading profile. Loadings are the correlation coefficients between variables (samples in your case) and components. Higher loadings (in absoluye value) mark the leading variables for component interpretation. If you have, let's imagine, cancer and normal samples, the component in which cancer and normal samples have opposite sign loadings is 'cancer effect'. At this point look at the score of genes on different components, if you extracted the principal components from between samples correlation (not covariance) matrix your component scores will have the form of z-scores (zero mean and unit standard deviation), select those gens having a score > 2 (or < -2) and you will get what you want.
Here some (out of the many) examples:
Article Mining gene expression data by interpreting principal components
If your intention is to select the top genes before applying statical analysis, you need to remove genes with 0 counts and those with low counts across samples. After applying edgeR for example, you may sort genes by FDR value and Log2FC