For the last few days I have been trying to design a modified radix-4 Booth multiplier. I want to use compressors to reduce the delay of adding the partial products, but I am having a great deal of difficulty understanding the sign extension. If I use conventional sign extension, where every partial product is extended to the full width of the product, the hardware becomes quite large, which in turn increases the delay of the system. I have tried looking this up on Google, but the method I found described doesn't seem to work. Does anyone know a proven, efficient sign-extension scheme that reduces the delay and works alongside the compressors and a parallel prefix adder?
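To make the question concrete, here is a small Python model of one commonly described alternative to full sign extension: each partial product keeps only its own bits, its sign bit is inverted, and the constant terms that this introduces are summed once into a single correction word. This is only a sketch under assumptions, not a verified hardware design. The function names are hypothetical, the operand width `n` is assumed even, and each partial product is computed arithmetically as `d * x` rather than with the invert-and-add-one logic a real Booth selector would use.

```python
def booth_radix4_digits(y, n):
    """Recode an n-bit two's-complement multiplier y into n/2 digits in {-2..2}."""
    assert n % 2 == 0, "sketch assumes an even operand width"
    u = y & ((1 << n) - 1)
    # bits[0] is the implicit y_{-1} = 0; bits[k + 1] is y_k
    bits = [0] + [(u >> k) & 1 for k in range(n)]
    # digit i = y_{2i-1} + y_{2i} - 2 * y_{2i+1}
    return [bits[2 * i] + bits[2 * i + 1] - 2 * bits[2 * i + 2]
            for i in range(n // 2)]


def multiply_no_sign_extension(x, y, n):
    """Signed n-bit by n-bit multiply without sign-extending the rows."""
    m = n + 2                        # row width: enough for +/-2x plus a sign bit
    low_mask = (1 << (m - 1)) - 1    # the row bits below the sign bit
    total = 0
    correction = 0                   # depends only on n, so it is a constant
    for i, d in enumerate(booth_radix4_digits(y, n)):
        pp = d * x                   # arithmetic stand-in for the Booth selector
        s = 1 if pp < 0 else 0
        # pp = low - s * 2^(m-1) = low + (1 - s) * 2^(m-1) - 2^(m-1),
        # so place the INVERTED sign bit in the row and defer the -2^(m-1).
        row = (pp & low_mask) | ((1 - s) << (m - 1))
        total += row << (2 * i)
        correction -= 1 << (m - 1 + 2 * i)
    result = (total + correction) & ((1 << (2 * n)) - 1)
    if result >> (2 * n - 1):        # reinterpret the 2n-bit sum as signed
        result -= 1 << (2 * n)
    return result
```

In this model every row is only `n + 2` bits wide, and `correction` could be precomputed for a given `n` and fed into the compressor tree as one extra constant row. Whether this maps well onto a particular compressor and parallel prefix adder arrangement would still need to be checked in the actual design.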