In NOMA, to detect its own message from the composite signal, the low-power user must first detect all the higher-power users' signals and cancel them. Why doesn't this low-power user instead detect its message by cancelling all the other messages together (jointly)?
In the basic NOMA concept, SIC was initially proposed for the cell-center (low-power signal) user, which decodes and cancels all the high-power signals of its paired users. Most research therefore focuses on SIC, but that doesn't mean other methods are not being evaluated. And you are exactly right about the computational complexity issue of SIC.
As for joint modulation/demodulation, which you asked about, this method has also been evaluated in a couple of papers. For example, the paper titled "Receiver Design for Downlink Non-Orthogonal Multiple Access (NOMA)" gives a good comparative analysis of ideal SIC, codeword-level SIC, symbol-level SIC, and joint modulation/demodulation (what you mentioned) based interference cancellation.
Similarly, there are other methods (Gray labeling, maximum likelihood, triangular SIC, etc.) for interference cancellation at the cell-center (low-power signal) user. Gray-labeling-based decoding is one interesting approach mentioned in some papers; see the paper titled "On Gray Labeling for Downlink Non-Orthogonal Multiple Access Without SIC".
This is known as the strong or very strong interference channel model. In this model, the destination first detects the strong signal, which can then be removed easily from the received signal, and finally decodes its intended signal. In information theory, joint decoding can be used; in practice, however, successive interference cancellation is normally used.
SIC is one good option, and there are other options in the literature. NOMA is not a standard yet, and many researchers are still trying to find the best approach; for example, whether to use NOMA with or without OMA is a big debate.
I would like to stress that successive interference cancellation is a way to separate signals in the amplitude domain. It applies when one signal is strong and the other is weak, for example two multiplexed signals radiated from a base station: one with high power S2 intended for user 2 at the cell edge, and the other with power S1 intended for user 1 near the base station.
So, the combined power will be S = S2 + S1.
Assuming the channel power gain of the near user is H1^2 and that of the far user is H2^2,
then the received power at the near user = H1^2 S + N = H1^2 S2 + H1^2 S1 + N.
Since S2 is much larger than S1, S1 can be treated as additional noise that effectively raises the noise power level, provided it is random like noise. So, as long as S2 is much greater than the interference plus noise, one can decode S2 and obtain an estimate of it. This estimate becomes more accurate once the signal-to-noise ratio exceeds a certain threshold. This is the basis for successive interference cancellation.
After estimating S2, one can subtract it from the received signal and then estimate S1. The mathematical complexity is moderate because the method is based on approximations; an exact solution would take more mathematical effort than successive interference cancellation.
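The decode, subtract, decode sequence at the near user can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the discussion above: both signals are real-valued BPSK, the transmitted amplitudes are sqrt(S2) and sqrt(S1), and the receiver knows its channel amplitude gain h1 (assumed positive). The function names and the default values are hypothetical.

```python
import math
import random

def bpsk_decide(y):
    """Hard decision for a real BPSK symbol: +1.0 or -1.0."""
    return 1.0 if y >= 0 else -1.0

def sic_near_user(y, h1, s1, s2):
    """Two-stage SIC at the near user (illustrative sketch).

    y  : received sample, y = h1*(sqrt(s2)*x2 + sqrt(s1)*x1) + noise
    h1 : channel amplitude gain of the near user (power gain is h1**2)
    s1 : power allocated to the near user's signal (the weak one)
    s2 : power allocated to the far user's signal (the strong one)
    Returns the pair of estimates (x2_hat, x1_hat).
    """
    # Stage 1: decode the strong signal, treating the weak one as noise.
    x2_hat = bpsk_decide(y)
    # Stage 2: subtract the reconstructed strong signal from the sample.
    residual = y - h1 * math.sqrt(s2) * x2_hat
    # Stage 3: decode the weak signal from what is left.
    x1_hat = bpsk_decide(residual)
    return x2_hat, x1_hat

def count_symbol_errors(n=2000, h1=1.0, s1=0.2, s2=0.8, noise_std=0.05, seed=0):
    """Monte Carlo count of symbol errors (both streams) at the near user."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(n):
        x1 = rng.choice([-1.0, 1.0])
        x2 = rng.choice([-1.0, 1.0])
        noise = rng.gauss(0.0, noise_std)
        y = h1 * (math.sqrt(s2) * x2 + math.sqrt(s1) * x1) + noise
        x2_hat, x1_hat = sic_near_user(y, h1, s1, s2)
        errors += int(x2_hat != x2) + int(x1_hat != x1)
    return errors
```

With the default values the strong signal stands well clear of the weak signal plus noise, so both stages decide correctly. Moving s1 closer to s2 or raising noise_std shows how a wrong first-stage decision corrupts the second stage.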
At the far user,
the received power = H2^2 S1 + H2^2 S2 + N. Since S1 is much smaller than S2, S1 appears as additional noise on top of N, and one can get a good estimate of S2 provided S2 exceeds the threshold set by the noise and interference.
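The "much larger than interference plus noise" condition can be made concrete by computing the signal-to-interference-plus-noise ratios from the received-power expressions above. The numbers below (H1^2 = 1, H2^2 = 0.1, S1 = 0.2, S2 = 0.8, N = 0.01) are assumed purely for illustration, and the function names are hypothetical.

```python
def sinr_strong_signal(h_sq, s1, s2, n):
    """SINR for decoding S2 at a user with channel power gain h_sq,
    with S1 treated as additional noise."""
    return h_sq * s2 / (h_sq * s1 + n)

def snr_weak_signal_after_sic(h1_sq, s1, n):
    """SNR for decoding S1 at the near user, assuming S2 was cancelled perfectly."""
    return h1_sq * s1 / n

# Assumed example values, not taken from the discussion.
H1_SQ, H2_SQ = 1.0, 0.1   # near-user and far-user channel power gains
S1, S2, N = 0.2, 0.8, 0.01

far_sinr_s2 = sinr_strong_signal(H2_SQ, S1, S2, N)    # far user decoding S2
near_sinr_s2 = sinr_strong_signal(H1_SQ, S1, S2, N)   # near user decoding S2 first
near_snr_s1 = snr_weak_signal_after_sic(H1_SQ, S1, N) # near user after cancellation
```

Note that the SINR for S2 can never exceed S2/S1 at either user, however small N is, which is why the power split matters as much as the channel gains.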
SIC introduces additional complexity and delay. By Shannon's theory, accepting more complexity and delay can yield a better system.
1. Direct demodulation results in error propagation: errors occur in pairs in both signals. With SIC, when the CNR is sufficient to demodulate/decode the less robust signal, the robust signal should have no errors at all, so error propagation does not occur.
2. With SIC, the power levels of the two signals can be quite close; the constellations can even overlap and cross quadrants, and the signals can still be retrieved successfully. With direct demodulation, the constellations cannot overlap unless a very complicated demodulation process is used, so the powers allocated to the two signals must differ more. SIC therefore allows more flexible power allocation between the two signals.
Another point on complexity: when the SNR is sufficient to demodulate/decode both signals, error correction for the robust signal should be very easy. For Turbo or LDPC codes with iterative decoding, only 3 to 5 iterations are needed, rather than the 20 to 50 iterations needed close to the CNR threshold. So decoding two signals does not double the FEC complexity: it needs only about 10% more computation power and about 15% more memory than a single-layer system.
In NOMA, many users can share the same frequency band, and they are distinguished by the power allocated to each of them. When decoding, say, user k, the decoder first detects and removes by SIC the signals of the users allocated higher power, say users 1 to (k-1). The signals of the users allocated lower power are treated as noise.
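A minimal multi-user sketch of this ordering follows. It assumes the usual downlink convention that signals are decoded from the highest allocated power downward, so user k cancels the stronger signals ahead of it and leaves the weaker ones in the residual as noise. It also assumes real BPSK symbols, a known positive channel amplitude gain h, and no receiver noise; all names and values are hypothetical.

```python
import math

def bpsk_decide(y):
    """Hard decision for a real BPSK symbol: +1.0 or -1.0."""
    return 1.0 if y >= 0 else -1.0

def superpose(symbols, powers, h):
    """Noise-free received sample for superposed BPSK symbols."""
    return h * sum(math.sqrt(p) * x for p, x in zip(powers, symbols))

def sic_decode_user(y, h, powers, k):
    """Decode the symbol of user k (zero-based) from sample y.

    powers must be sorted in decreasing order (powers[0] is the largest).
    Signals 0..k-1 are decoded and cancelled one after another; signals
    k+1.. stay in the residual and act as noise for the final decision.
    """
    residual = y
    for j in range(k):
        xj_hat = bpsk_decide(residual)                  # decode stronger signal j
        residual -= h * math.sqrt(powers[j]) * xj_hat   # cancel it
    return bpsk_decide(residual)

# Assumed example: three users with amplitudes 0.8, 0.4 and 0.2.
powers = [0.64, 0.16, 0.04]
symbols = [1.0, -1.0, 1.0]
y = superpose(symbols, powers, h=1.0)
decoded = [sic_decode_user(y, 1.0, powers, k) for k in range(3)]
```

In this noise-free example each amplitude exceeds the sum of all weaker ones, so every stage decides correctly; with closer power levels the first stages start to fail and the errors propagate down the chain.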
In 5G, because users can share the same frequency band, such a scheme can greatly enhance spectral efficiency (SE), which matters as 5G calls for ever higher transmission rates.
Information-theoretically, both JD and SIC achieve capacity for the MAC channel (meaning that both can achieve the same performance). However, SIC is easy to implement (SC at the Tx and SIC at the Rx), whereas JD requires complex encoding and decoding (joint codebook design and joint ML decoding).
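For the two-user Gaussian MAC this equivalence can be checked numerically: the rate of the user decoded first (with the other treated as noise) plus the rate of the user decoded after cancellation equals the joint-decoding sum capacity. The sketch below assumes p1 and p2 are received powers, n is the noise power, and rates are in bits per channel use; the function names are hypothetical.

```python
import math

def sic_rates(p1, p2, n):
    """Rates with SIC: user 1 decoded first, user 2 after cancellation."""
    r1 = math.log2(1 + p1 / (p2 + n))  # user 2 still present as interference
    r2 = math.log2(1 + p2 / n)         # interference-free after cancellation
    return r1, r2

def sum_capacity(p1, p2, n):
    """Sum capacity of the two-user Gaussian MAC (joint decoding)."""
    return math.log2(1 + (p1 + p2) / n)

# Assumed example values.
r1, r2 = sic_rates(3.0, 1.0, 0.5)
gap = abs((r1 + r2) - sum_capacity(3.0, 1.0, 0.5))
```

The gap is zero up to floating-point rounding because (1 + p1/(p2 + n)) * (1 + p2/n) = 1 + (p1 + p2)/n. Swapping the decoding order gives the other corner point of the capacity region with the same sum rate.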