As you surely know, the classical approach in one dimension is the chi-square test, where you divide the x-axis into "cells". The chi-square test itself does not tell you how to choose the cell limits. If you are going to have N cells, one possibility is to divide the support of the theoretical probability density function so that the probability (area) in each cell equals 1/N, which is a maximum-entropy (a fair) procedure. Recall that in the chi-square test you are testing a hypothesis.
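A minimal sketch of the equiprobable-cell idea in Python/SciPy. The standard-normal hypothesis, the sample size, and N = 10 are illustrative assumptions; the cell edges come from the theoretical quantile function, so each cell carries probability 1/N under the hypothesis.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.standard_normal(500)   # sample to test (assumed for illustration)
N = 10                            # number of cells (assumed)

# Cell edges chosen so each cell has probability 1/N under the
# hypothesized standard-normal distribution (outer edges are +/- inf).
edges = stats.norm.ppf(np.linspace(0.0, 1.0, N + 1))

observed, _ = np.histogram(data, bins=edges)
expected = np.full(N, len(data) / N)   # equal expected counts by construction

chi2, p_value = stats.chisquare(observed, expected)
print(chi2, p_value)
```

With equiprobable cells the expected counts are all equal, which avoids the usual warning about cells with very small expected frequencies.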
I detailed the above to point out that you can generalize it to several dimensions: just divide the M-dimensional space into N cells of equal hyper-volume.
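A sketch of the multidimensional version, assuming (for illustration) that the data have already been mapped to the unit square, so a k x k grid gives N = k^2 cells of equal hyper-volume:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = rng.uniform(size=(1000, 2))   # illustrative data on [0,1]^2

k = 4                                # k x k grid -> N = 16 equal-volume cells
observed, _ = np.histogramdd(data, bins=(k, k), range=[(0, 1), (0, 1)])

# Under a uniform hypothesis, each cell expects len(data) / N counts.
expected = np.full((k, k), len(data) / k**2)

chi2, p = stats.chisquare(observed.ravel(), expected.ravel())
print(chi2, p)
```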
In case you have reasons to assume the p.d.f. is Gaussian, you just need to estimate the covariance matrix (and the mean) from the vectors.
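Estimating those parameters is a one-liner with NumPy; the true mean and covariance below are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(2)
# Illustrative 2-D Gaussian sample (true parameters are assumptions)
X = rng.multivariate_normal([0.0, 0.0], [[2.0, 0.5], [0.5, 1.0]], size=2000)

mu = X.mean(axis=0)              # sample mean vector
cov = np.cov(X, rowvar=False)    # unbiased sample covariance matrix
print(mu, cov, sep="\n")
```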
However, if you do not have any prior p.d.f. with which to test a hypothesis, a normalized histogram has the properties of a p.d.f., and it is a good approximation as long as you include enough data.
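NumPy normalizes a histogram directly with `density=True`, so the result integrates to 1 like a p.d.f. (the sample here is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
data = rng.standard_normal(5000)   # illustrative sample

# density=True divides counts by (total count * bin width),
# so the histogram integrates to 1 over its support.
density, edges = np.histogram(data, bins=30, density=True)
print(np.sum(density * np.diff(edges)))
```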
Some authors like to smooth the normalized histogram by means of a window (kernel). You may take a look at this: https://www.projectrhea.org/rhea/index.php/Parzen_Window_Density_Estimation.
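A ready-made kernel smoother of this kind is SciPy's `gaussian_kde` (a Gaussian-kernel Parzen estimator with Scott's-rule bandwidth by default); the sample and grid below are assumptions:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(4)
data = rng.standard_normal(1000)   # illustrative sample

kde = gaussian_kde(data)           # Gaussian kernel; bandwidth by Scott's rule
grid = np.linspace(-4.0, 4.0, 200)
smoothed = kde(grid)               # smooth density estimate on the grid
print(smoothed.max())
```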
Just one more point: "best" is not an analytic mathematics concept. Instead, one always speaks of maxima or minima, in order to find the optimum of some specified measure or quantity. "Simplest" and "fastest" are other common goals.