The idea is very simple. A thermodynamic system can normally occupy a number of competing states, each with a different arrangement of atoms. Monte Carlo (MC) is a statistical method that generates a sample of these states and AVERAGES some property of interest over them. The accuracy of the result depends on the number of generated states and on the correctness of the model. 1) One needs a procedure that generates different states (for example, by randomly shifting, inserting, or removing atoms one at a time). 2) In this procedure the states should appear with the same probability as they do in reality (this is governed by special selection rules, such as the popular Metropolis rule, and it is the most subtle point to learn). 3) One averages the property of interest over these states. For example, one can simulate a nanoparticle in vacuum and measure its average density by generating different atom packings. In the simulation these states should appear with a probability proportional to the Boltzmann factor of the potential energy of the given packing. An important question is how to find the potential energy. It can be computed as a sum of special (empirical) functions, called potentials, whose values depend only on the spatial distribution of the atoms. The choice of potentials is the key point, because it determines the physical behavior of the system.
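To make the three steps concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: a tiny four-atom cluster, a Lennard-Jones pair potential in reduced units (so the Boltzmann constant is 1), single-atom random shifts as the only move, and the average potential energy as the measured property. The names `lj_energy`, `metropolis_average`, and `max_shift` are hypothetical and not taken from any package.

```python
import math
import random

def lj_energy(positions):
    """Potential energy as a sum of Lennard-Jones pair terms (reduced units)."""
    total = 0.0
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            r2 = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
            inv6 = 1.0 / r2 ** 3
            total += 4.0 * (inv6 * inv6 - inv6)
    return total

def metropolis_average(positions, temperature, n_steps, max_shift=0.1, seed=0):
    """Return (average potential energy, acceptance ratio) from Metropolis sampling."""
    rng = random.Random(seed)
    positions = [list(p) for p in positions]
    energy = lj_energy(positions)
    energy_sum = 0.0
    accepted = 0
    for _ in range(n_steps):
        # Step 1: generate a trial state by shifting one randomly chosen atom.
        i = rng.randrange(len(positions))
        old = positions[i]
        positions[i] = [x + rng.uniform(-max_shift, max_shift) for x in old]
        new_energy = lj_energy(positions)
        delta = new_energy - energy
        # Step 2: Metropolis rule, accept with probability min(1, exp(-dE/T)).
        if delta <= 0.0 or rng.random() < math.exp(-delta / temperature):
            energy = new_energy
            accepted += 1
        else:
            positions[i] = old  # rejected: keep the old state
        # Step 3: accumulate the property (the old state counts again if rejected).
        energy_sum += energy
    return energy_sum / n_steps, accepted / n_steps

# Assumed starting packing: roughly a regular tetrahedron with edge 1.12.
start = [(0.0, 0.0, 0.0), (1.12, 0.0, 0.0), (0.56, 0.97, 0.0), (0.56, 0.32, 0.91)]
avg_energy, acc_ratio = metropolis_average(start, temperature=0.1, n_steps=3000)
print(avg_energy, acc_ratio)
```

A real study would discard an equilibration period, tune `max_shift` for a sensible acceptance ratio, and use a potential fitted to the material in question.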
The best way to understand how to apply a Monte Carlo simulation is to find a ready-made program, on the Internet or in a relevant book, that applies it to a physical model, e.g. the Ising model.
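For readers who want something to start from, below is a rough sketch of such a program for the 2D Ising model. The assumptions are mine: a square lattice with periodic boundaries, coupling J = 1, units where the Boltzmann constant is 1, single-spin-flip Metropolis moves, and no equilibration period discarded. The function name `ising_metropolis` and its parameters are hypothetical.

```python
import math
import random

def ising_metropolis(size=8, temperature=2.0, sweeps=200, seed=1):
    """Return the average absolute magnetization per spin of a size x size lattice."""
    rng = random.Random(seed)
    spins = [[1] * size for _ in range(size)]  # start from all spins up
    mag_sum = 0.0
    for _ in range(sweeps):
        for _ in range(size * size):  # one sweep = size*size attempted flips
            i, j = rng.randrange(size), rng.randrange(size)
            neighbours = (spins[(i + 1) % size][j] + spins[(i - 1) % size][j]
                          + spins[i][(j + 1) % size] + spins[i][(j - 1) % size])
            delta = 2.0 * spins[i][j] * neighbours  # energy change if this spin flips
            if delta <= 0.0 or rng.random() < math.exp(-delta / temperature):
                spins[i][j] = -spins[i][j]
        mag = sum(sum(row) for row in spins) / (size * size)
        mag_sum += abs(mag)  # measure once per sweep
    return mag_sum / sweeps

print(ising_metropolis(temperature=1.0), ising_metropolis(temperature=5.0))
```

Running it at a low and a high temperature should show an ordered (magnetization near 1) and a disordered (small magnetization) regime, which is the usual first check on such a code.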
I can suggest the REMC (reaction ensemble Monte Carlo) method, developed by Lisal and Smith. You can find an application in the attached paper. The method is quite simple: it randomly samples different configurations of the phase space using the Metropolis algorithm, evaluating the probability of each transition according to the grand partition function. It also includes the interaction energy, so that non-ideal effects are taken into account.
It depends on whether you are thinking of a system in equilibrium or out of equilibrium. The latter is a more complex problem, and MC simulation is normally used for the former. It is an established method, and the idea is to number all the configurations and pick one at random by drawing a random number. The configuration is given a probability proportional to its Boltzmann weight, which is then used to calculate its contribution to any thermodynamic average.
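Taken literally, this numbering scheme can be tried on a system small enough to enumerate. The sketch below assumes an open Ising chain of a few spins with J = 1 and the Boltzmann constant set to 1; each configuration is simply the integer whose bits encode the spins. It compares the exact Boltzmann-weighted average with an estimate from randomly drawn configuration numbers. All function names are hypothetical.

```python
import math
import random

def chain_energy(index, n_spins):
    """Energy of an open Ising chain (J = 1) whose spins are the bits of index."""
    spins = [1 if (index >> k) & 1 else -1 for k in range(n_spins)]
    return -sum(spins[k] * spins[k + 1] for k in range(n_spins - 1))

def exact_average_energy(n_spins, temperature):
    """Boltzmann-weighted average over every configuration number."""
    energies = [chain_energy(c, n_spins) for c in range(2 ** n_spins)]
    weights = [math.exp(-e / temperature) for e in energies]
    return sum(w * e for w, e in zip(weights, energies)) / sum(weights)

def sampled_average_energy(n_spins, temperature, n_samples, seed=0):
    """Same average, estimated from uniformly drawn configuration numbers."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n_samples):
        c = rng.randrange(2 ** n_spins)   # pick a configuration by its number
        e = chain_energy(c, n_spins)
        w = math.exp(-e / temperature)    # its Boltzmann weight
        num += w * e
        den += w
    return num / den

print(exact_average_energy(6, 2.0), sampled_average_energy(6, 2.0, 20000))
```

For realistic system sizes almost all uniformly drawn configurations have negligible weight, which is why practical codes draw configurations with a probability proportional to the weight (for instance with the Metropolis rule) rather than weighting them afterwards.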
There are several textbooks that offer extensive expositions of the topic and set things straight. There is not just one "Monte Carlo": the particular choices you make (moves, growth of molecules, rotations, acceptance criteria) are all interconnected. Particular care must be taken to maintain detailed balance when you do MC, otherwise you are not sampling equilibrium.
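Detailed balance means that, for every pair of states i and j, pi(i) * P(i -> j) = pi(j) * P(j -> i), where pi is the target (here Boltzmann) distribution and P is the one-step transition probability. As a small numerical illustration, the sketch below builds the Metropolis transition matrix for a toy system with a handful of discrete energy levels, assuming every other state is proposed with equal probability, and reports the largest violation of that condition. The energy values and function names are made up for the example.

```python
import math

def metropolis_matrix(energies, temperature):
    """One-step transition probabilities for uniform proposals + Metropolis acceptance."""
    n = len(energies)
    p = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                accept = min(1.0, math.exp(-(energies[j] - energies[i]) / temperature))
                p[i][j] = accept / (n - 1)  # propose each other state with equal chance
        p[i][i] = 1.0 - sum(p[i])           # rejected moves stay in state i
    return p

def boltzmann(energies, temperature):
    """Normalized Boltzmann probabilities of the states."""
    weights = [math.exp(-e / temperature) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

def max_detailed_balance_violation(energies, temperature):
    """Largest |pi_i P_ij - pi_j P_ji| over all pairs of states."""
    p = metropolis_matrix(energies, temperature)
    pi = boltzmann(energies, temperature)
    n = len(energies)
    return max(abs(pi[i] * p[i][j] - pi[j] * p[j][i])
               for i in range(n) for j in range(n))

print(max_detailed_balance_violation([0.0, 0.5, 2.0], temperature=1.0))
```

The same kind of check is useful when a new move type is introduced: if the proposal probabilities are not symmetric, the acceptance rule has to be corrected for that, and a test like this would reveal a mistake.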
I suggest the book of Frenkel and Smit, "Understanding Molecular Simulation".
You can read some of the chapters of Gould and Tobochnik, "An Introduction to Computer Simulation Methods". This book has a lot of examples and code that can help you understand the method.
The Monte Carlo method is based on the probability distribution of events. We know that a tossed coin has a probability of 0.5 of landing HEADS and 0.5 of landing TAILS. If you toss the coin 10 times, that does not mean that HEADS will come up exactly 50% of the time. However, as you increase the number of tosses, the observed fraction converges to 50%. That is why Monte Carlo works for explaining the laws of statistical thermodynamics: the systems contain a large number of particles.
As explained by Anatoly and Gujrati, it predicts the AVERAGE properties of the system and works well for explaining equilibrium properties.
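The coin-toss argument is easy to try out. The short sketch below assumes a fair coin simulated with Python's standard `random` module; the function name `heads_fraction` is just for illustration. It prints the fraction of heads for an increasing number of tosses.

```python
import random

def heads_fraction(n_tosses, seed=0):
    """Fraction of heads in n_tosses simulated tosses of a fair coin."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

for n in (10, 1000, 100000):
    print(n, heads_fraction(n))
```

The statistical error of such an estimate shrinks roughly as one over the square root of the number of tosses, and the same scaling applies to averages taken over independent Monte Carlo samples.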