The mice package in R (cran.r-project.org/web/packages/mice/mice.pdf) should do the trick. Otherwise, have a look at the paper by Horton and Kleinman, which compares missing data methods and software for fitting incomplete data regression models (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1839993/).
In general, most software can handle ignorable missing data. A missing data mechanism that depends only on the observed data (missing at random, MAR) is manageable, and using imputation techniques will generally give more robust results than analyzing only the complete cases.
With non-monotone missing patterns the chained equations method is the easiest to use. It may be possible to use the EM algorithm, or in some cases you may find a full-information maximum likelihood method, but the chained equations approach is easier.
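To make the chained equations idea concrete, here is a minimal sketch in plain Python. It is not how mice implements it: it assumes just two numeric variables, uses a simple linear regression plus normal noise for each imputation step, and skips the draw of the regression parameters that a proper implementation would include. The function names (`fit_line`, `chained_imputation`) and the toy data are made up for illustration.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of ys on xs; returns (intercept, slope, residual_sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, (rss / (len(xs) - 2)) ** 0.5


def chained_imputation(rows, n_iter=10, seed=0):
    """Return one completed copy of rows ([x, y] pairs, None = missing)."""
    rng = random.Random(seed)
    missing = [[v is None for v in row] for row in rows]
    data = [list(row) for row in rows]
    # Starting values: fill each gap with the observed mean of its column.
    for j in (0, 1):
        col_mean = statistics.fmean([r[j] for r in rows if r[j] is not None])
        for row in data:
            if row[j] is None:
                row[j] = col_mean
    # Cycle through the variables; n_iter plays the role of the burn-in length.
    for _ in range(n_iter):
        for j in (0, 1):
            k = 1 - j  # the other variable acts as predictor
            train = [(r[k], r[j]) for r, m in zip(data, missing) if not m[j]]
            a, b, sd = fit_line([t[0] for t in train], [t[1] for t in train])
            for r, m in zip(data, missing):
                if m[j]:
                    # Simplification: prediction plus noise, no parameter draw.
                    r[j] = a + b * r[k] + rng.gauss(0.0, sd)
    return data


# Toy data with a non-monotone pattern: some rows lack x, other rows lack y.
gen = random.Random(42)
toy = []
for i in range(40):
    x = gen.gauss(0.0, 1.0)
    y = 2.0 * x + gen.gauss(0.0, 0.5)
    if i % 5 == 0:
        x = None
    elif i % 5 == 3:
        y = None
    toy.append([x, y])

completed = chained_imputation(toy, n_iter=10, seed=1)
```

Calling `chained_imputation` several times with different seeds gives the multiple completed data sets that MI needs, and changing `n_iter` is the knob for checking sensitivity to the burn-in length.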
There are some recommendations in the literature regarding the number of MI replications and the number of burn-in iterations needed. These are general recommendations and, although they may hold in general, they should be checked in your particular situation. The sensitivity analysis is quite easy: you run some simulations with longer burn-in periods and study the effect on the characteristics of your analysis. You can also repeat the analysis with more imputations and study the effect of that.
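As a rough sketch of the "more imputations" check, the snippet below pools per-imputation results with Rubin's rules and compares the pooled result for different numbers of imputations. The helper names (`pool_rubin`, `pooled_by_m`) and the numbers are invented for illustration; in practice the estimates and squared standard errors would come from fitting your analysis model to each completed data set.

```python
import statistics


def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors (Rubin's rules)."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    u_bar = statistics.fmean(variances)   # within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance
    total = u_bar + (1 + 1 / m) * b       # total variance of the pooled estimate
    return q_bar, total


def pooled_by_m(estimates, variances, m_values):
    """Pool using only the first m imputations, for each m in m_values."""
    return {m: pool_rubin(estimates[:m], variances[:m]) for m in m_values}


# Hypothetical per-imputation results (made-up numbers, illustration only).
est = [1.0, 1.2, 0.8, 1.1, 0.9, 1.05]
var = [0.04, 0.05, 0.03, 0.04, 0.04, 0.04]
by_m = pooled_by_m(est, var, [3, 6])
```

If the pooled estimate and its total variance barely move as m grows, the number of imputations is probably adequate for that analysis.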
With non-ignorable missing data the situation is more complex. In order to make good imputations you will have to do some modeling of the missing data mechanism. The assumptions you make during imputation are not testable with non-ignorable data, so a sensitivity analysis is much more important than in the ignorable case. Here you should run analyses in which you alter your model for the missing data and examine the effect of those alterations on your results.
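One simple way to alter the model for the missing data is a shift (sometimes called delta adjustment): impute as if the data were ignorable, then add an offset delta to the imputed values and see how the result changes over a range of plausible offsets. The sketch below does this for a plain mean; the function name and the numbers are hypothetical, and the range of deltas is something you would have to justify from subject-matter knowledge.

```python
import statistics


def delta_adjusted_means(observed, imputed_mar, deltas):
    """Mean of the completed data after shifting imputed values by each delta."""
    results = {}
    for delta in deltas:
        # delta = 0 reproduces the ignorable (MAR) imputation unchanged.
        shifted = [v + delta for v in imputed_mar]
        results[delta] = statistics.fmean(list(observed) + shifted)
    return results


# Made-up values: four observed cases and two cases imputed under MAR.
observed = [10.0, 12.0, 11.0, 13.0]
imputed_mar = [11.0, 12.0]
sensitivity = delta_adjusted_means(observed, imputed_mar, [-2.0, -1.0, 0.0, 1.0, 2.0])
```

With multiple imputation you would apply the same shift within each imputed data set and pool afterwards; the point is only to see how far delta has to move before your conclusions change.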
In some sense sensitivity analysis is easy: you vary some parameters to see whether the analysis results are stable.
In the ignorable case these parameters have to do with the convergence of the Markov chain, and less to do with your analysis model. As I understand it, the results of your analysis will depend only weakly (or perhaps not at all) on your imputation technique.
With non-ignorable missing data the sensitivity analysis becomes more complex, since it also has to cover the model for the missing data mechanism. The analysis results will also depend on your model for the missing data mechanism. It is a rather different situation from the ignorable case.
I hope you find these considerations useful. Regardless of which package you use and the extra effort, I think you will find the results interesting.
My view on this type of analysis is purely statistical, and it may be possible to make other arguments within certain research fields. However, from a probabilistic/mathematical viewpoint this is how MI works.