This kind of task is not trivial, and several techniques have been proposed in the literature, although I personally don't know of any commercial or research software that can accomplish it efficiently. The best method to apply usually depends on the a priori knowledge you have about the mixed audio signal, and the processing is usually done on a time-frequency representation of the mixture.

In the field of Blind Audio Source Separation (BASS), where we assume no knowledge about the input sources, techniques such as Independent Component Analysis (ICA), Non-Negative Matrix Factorization (NMF), or statistical approaches like Hidden Markov Models (HMM) are generally employed. If you do have some knowledge of the recorded voice signal, that can help you design specific spectral masking/subtraction techniques that separate the spectral content of the voice within the time-frequency representation of the mixture, which is modeled as a sparse matrix of pitched events and their related harmonic patterns.

However, since real sounds consist of a comb-like pattern of harmonics at (approximately) integer multiples of the fundamental frequency, it is (still) almost impossible to fully resolve the harmonic overlap between sources in the mixture, so some distortion will inevitably be introduced into the separated signals after processing.
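To make the NMF-plus-masking idea a bit more concrete, here is a minimal sketch of it, assuming a mono mixture `x` sampled at `fs`. The function name `separate_nmf` and its parameters are my own choices for illustration, and the hard part (deciding which components belong to the voice and which to the background, i.e. where your prior knowledge would come in) is deliberately left out; it simply returns one resynthesized signal per NMF component.

```python
import numpy as np
from scipy.signal import stft, istft
from sklearn.decomposition import NMF


def separate_nmf(x, fs, n_components=8, nperseg=1024):
    """Decompose a mono mixture into NMF components and resynthesize
    one estimated source per component via soft time-frequency masking."""
    # Time-frequency representation of the mixture
    _, _, Z = stft(x, fs, nperseg=nperseg)
    mag, phase = np.abs(Z), np.angle(Z)

    # Factorize the magnitude spectrogram: mag ~= W @ H
    model = NMF(n_components=n_components, init='nndsvda',
                max_iter=500, random_state=0)
    W = model.fit_transform(mag)   # spectral templates (freq x K)
    H = model.components_          # activations        (K x time)

    approx = W @ H + 1e-12         # avoid division by zero
    sources = []
    for k in range(n_components):
        # Wiener-style soft mask for component k
        mask = np.outer(W[:, k], H[k]) / approx
        # Apply the mask to the mixture magnitude, reuse the mixture phase
        _, s_k = istft(mask * mag * np.exp(1j * phase), fs, nperseg=nperseg)
        sources.append(s_k)
    return sources
```

Note that every estimated source reuses the mixture's phase and that overlapping harmonics are simply split proportionally by the soft mask; this is exactly the harmonic-overlap limitation described above, and it is the main reason the separated signals come out distorted.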