How low is low? Are we talking .70, .60, .50, or something even lower?
You may not have done anything wrong. Let's see:
1. The negative scores should have no impact on your results. If you want to be 100% sure, you could simply add 4 points to every score & re-run the calculations. But it won't make any difference.
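If you want to see this for yourself, here is a minimal numpy sketch (the score matrix is hypothetical) showing that adding a constant to every score leaves alpha untouched, since alpha depends only on variances and covariances:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(0)
# Hypothetical responses on a -3..+3 rating scale, 100 people, 12 items
scores = rng.integers(-3, 4, size=(100, 12)).astype(float)

# Shifting every score by +4 changes no variance and no covariance,
# so the two alphas are identical.
print(cronbach_alpha(scores))
print(cronbach_alpha(scores + 4))
```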
2. I presume you're using some sort of statistical software package to run Cronbach's alpha. If you are doing it by hand, there is a possibility that you've made an error somewhere along the line. Forgive me for saying so, but many of us slip up when handling lots of negative numbers and so forth in a cumbersome equation.
3. The Barrett-Lennard is usually either a 64-item or a 40-item scale, but I gather that he introduced this 24-item scale a couple of years ago? You are correct: there are just 12 scored items, and these constitute the empathy scale. The others are just "filler." They are drawn from the other scales of the complete instrument. They are supposed to give the new short test the "feel" of the complete one. The author acknowledges, in his 2015 book (published by Wiley), that he doesn't have actual data regarding the new scale, but he offers the working assumption that it will be comparable to the original test. This is a perfectly reasonable working assumption - unless and until evidence proves it erroneous.
My guess, then, is that one of two things has happened. Either you've been unlucky in your choice of participants & gotten some random responders who messed things up, or you've collected evidence that the empathy scale can't "stand alone" without adversely affecting the reliability of the measure.
What does your sample look like? If your sample can be expected to be rather homogeneous with respect to empathy (say, a sample of psychology students), the variance of the true score will be low. Reliability (of which alpha is thought to be a lower bound if the classical measurement model holds) is high when the true-score variance is large relative to the random noise associated with the measurement process. By restricting the range of the true score (through a homogeneous population), the true-score variance is lowered too. If the random noise is unaffected, the reliability will therefore be lower.
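A quick simulation can make this range-restriction effect concrete. This is only a sketch under the classical model (parallel items with unit error variance; all numbers here are made up), comparing a heterogeneous sample to one with restricted true-score variance:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    k = items.shape[1]
    return k / (k - 1) * (1 - items.var(axis=0, ddof=1).sum()
                          / items.sum(axis=1).var(ddof=1))

rng = np.random.default_rng(1)
n, k = 5000, 12
noise = rng.normal(0.0, 1.0, size=(n, k))  # measurement error, variance 1 per item

# Heterogeneous sample: true empathy scores vary widely (variance 1).
t_wide = rng.normal(0.0, 1.0, size=(n, 1))
# Homogeneous sample (e.g. psychology students): true-score variance 0.2.
t_narrow = rng.normal(0.0, np.sqrt(0.2), size=(n, 1))

alpha_wide = cronbach_alpha(t_wide + noise)
alpha_narrow = cronbach_alpha(t_narrow + noise)
print(alpha_wide, alpha_narrow)  # the restricted sample yields the lower alpha
```

Same instrument, same error variance; only the spread of the true scores differs, and alpha drops with it.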
There is a debate whether reliability should be seen as a property of the instrument or of the sample. IMO it makes more sense to think of reliability as a property of the instrument. From this point of view it may be OK to use the instrument even if you have low alpha in your sample, if the instrument has been shown to have appropriate psychometric properties in studies designed to evaluate it.
It may, however, be that the item intercorrelations of the long form were low from the start.
Then the short form cannot be expected to have high alpha, as alpha is a function of item intercorrelation and number of items. You may use the Spearman-Brown prediction formula to estimate the effect of shortening the test. If the original test with 64 items had alpha 0.8, then your test with 12 items would be predicted to have 0.429, which seems to be exactly what you have. However, I don't know the properties of the test; you can check for yourself what to expect due to shortening.
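The Spearman-Brown calculation above can be sketched in a few lines (the 64-item/0.80 figures are the illustrative values from the text, not published properties of the test):

```python
def spearman_brown(alpha_full, n_full, n_short):
    """Predict alpha after changing test length (Spearman-Brown prophecy formula)."""
    k = n_short / n_full  # factor by which the test length changes
    return k * alpha_full / (1 + (k - 1) * alpha_full)

# A 64-item test with alpha = 0.80, cut down to 12 items:
print(round(spearman_brown(0.80, 64, 12), 3))  # 0.429
```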
Andrea's first point is quite good. You can't have correlation without variability, so if the scores don't vary much, then apparent internal consistency will be low. For example, if everyone in your sample rated an item the same way, it won't correlate with anything & will lower the alpha. That's easily checked by examining the descriptive statistics for the items, which can be generated as an option when you run a reliability analysis in SPSS.
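The same descriptive check is easy to run outside SPSS as well. A small numpy sketch with made-up data, where one item was answered identically by everyone:

```python
import numpy as np

rng = np.random.default_rng(2)
scores = rng.integers(1, 7, size=(50, 4)).astype(float)  # hypothetical 6-point items
scores[:, 0] = 5.0  # everyone rated item 1 the same way

# Any item with (near-)zero variance is a red flag.
print(scores.var(axis=0, ddof=1))  # first entry is 0.0

# A constant item has zero covariance with every other item,
# so it contributes nothing to internal consistency and drags alpha down.
print(np.cov(scores, rowvar=False)[0])  # first row of the covariance matrix: all zeros
```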
Her last suggestion (that it might be an issue relating to a shortened scale) is also excellent in principle, but I don't think it applies here. As best I understand it, the 12 empathy items you used are the only empathy items on the longer versions of the scale. The author just dropped the other scales. Is that correct? If so, then it can't be a scale-shortening artifact.
Besides, even if you took a 48-item scale and reduced it to 12 items randomly, to end up with reliability of .43 you'd have to have started at about .75. That's not terrible, but it's fairly low for a 48-item scale. And nobody drops items randomly these days - the authors would selectively retain the strongest correlates of the original, preserving much of the reliability.
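That "about .75" figure comes from running Spearman-Brown in reverse, i.e. asking what full-length alpha would shrink to .43 at 12 items. A quick check (the formula for lengthening is the same one, with a factor greater than 1):

```python
def spearman_brown_inverse(alpha_short, n_short, n_full):
    """Solve Spearman-Brown backwards: which full-length alpha would
    shrink to alpha_short when the test is cut to n_short items?"""
    k = n_full / n_short  # lengthening factor from short form back to full form
    return k * alpha_short / (1 + (k - 1) * alpha_short)

# A 12-item alpha of .43 corresponds to a 48-item alpha of roughly .75:
print(round(spearman_brown_inverse(0.43, 12, 48), 2))
```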
Cronbach's alpha is a function of the number of items. Using only a subset of items, you would expect (if the whole scale is unidimensional) a lower alpha than for the whole scale. Cronbach's alpha is not a measure of, for example, the average inter-item correlation. It is often misinterpreted.
Cronbach's alpha has become the default method for estimating test-score reliability, often without checking the required assumptions, such as essential tau-equivalence. Violating this assumption leads alpha to underestimate the true reliability.
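This underestimation can be illustrated with a simulated congeneric scale, where items load unequally on one common factor (all loadings here are invented for the demonstration). The analytic reliability of the sum score is (sum of loadings)^2 divided by the total variance, and alpha on the simulated data comes out below it:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    k = items.shape[1]
    return k / (k - 1) * (1 - items.var(axis=0, ddof=1).sum()
                          / items.sum(axis=1).var(ddof=1))

rng = np.random.default_rng(3)
n = 20000
loadings = np.array([0.3, 0.5, 1.0, 1.5])  # unequal loadings: congeneric, NOT tau-equivalent
t = rng.normal(size=(n, 1))                 # common true score
e = rng.normal(size=(n, loadings.size))     # unit-variance errors
x = t * loadings + e

# True reliability of the sum score: (sum of loadings)^2 / total variance.
true_rel = loadings.sum() ** 2 / (loadings.sum() ** 2 + loadings.size)
print(round(true_rel, 2), round(cronbach_alpha(x), 2))  # alpha comes out lower
```

Under essential tau-equivalence (equal loadings) the two values would coincide; the more unequal the loadings, the larger the gap.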
Make sure that you have properly cleaned your data before performing statistical analysis, and check for outliers. If everything is fine, then it is better to go for Exploratory Factor Analysis, so that you may learn the factor structure of the items of the scale.
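Before (or alongside) a full EFA, a quick first look at dimensionality is to inspect the eigenvalues of the inter-item correlation matrix: one dominant eigenvalue suggests a single factor, several comparable ones suggest multidimensionality. A sketch on simulated one-factor data (the data here are made up; with real data you would pass in your score matrix):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 300, 12
# Simulated one-factor data: a common factor plus unit-variance noise per item
scores = rng.normal(size=(n, 1)) + rng.normal(size=(n, k))

# Eigenvalues of the inter-item correlation matrix, largest first
eigvals = np.linalg.eigvalsh(np.corrcoef(scores, rowvar=False))[::-1]
print(eigvals.round(2))  # one large eigenvalue, the rest much smaller
```

For the factor analysis itself, dedicated tools (e.g. `sklearn.decomposition.FactorAnalysis`, or the `factor_analyzer` package with rotation options) are the usual next step.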