I am running an IRT analysis on an instrument in XCalibre, and the analysis reports substantially different means for the items than the means I calculated in Excel. Is there some weighting happening of which I am unaware?
I am not an expert in XCALIBRE, but I have decades of experience with IRT software, and I would consider it a bug if the item means were substantially different when calculated by XCALIBRE. My worry would be that XCALIBRE likely uses those means as starting points for estimating the item difficulties and could end up converging to an at least somewhat wrong solution.
In my experience, the number one cause of problems like this is user error, so I would go back and double-check. For example, in Excel I have occasionally managed to get the range of cells wrong, or had Excel do funny things with missing data. You should also check whatever output XCALIBRE provides about the data it actually read (e.g., BILOG prints the first two records for your review).
If there is no error in reading the data and no error in Excel, then I would look at missing-data handling or a software bug. For example, maybe XCALIBRE uses listwise deletion while Excel's AVERAGE simply uses whatever non-blank values are available (which I believe it does).
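To see how much that alone can move the numbers, here is a minimal sketch (Python with pandas, on made-up toy data; it only illustrates the two conventions, not either program's actual code):

import numpy as np
import pandas as pd

# Toy response matrix: 5 persons x 3 items, with one missing response (NaN).
data = pd.DataFrame({
    "item1": [1, 0, 1, 1, np.nan],
    "item2": [1, 1, 0, 1, 1],
    "item3": [0, 1, 1, 0, 1],
})

# Available-case means: blanks are simply skipped, item by item
# (this is what Excel's AVERAGE does with empty cells).
available_case = data.mean()

# Listwise means: drop any person with a missing response first,
# which is how some calibration programs prepare the data.
listwise = data.dropna().mean()

print(pd.DataFrame({"available_case": available_case, "listwise": listwise}))

With the toy data, item2 and item3 get different means under the two conventions, because the person with the missing response is dropped entirely under listwise deletion.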
If you cannot find the answer, I'd recommend that you shoot the publishers of XCALIBRE an email explaining that this seems odd.
How different are these item means? In standard-deviation terms, how large is the discrepancy?
Don't rule out the possibility that it's Excel giving the incorrect mean. You should consider calculating the means independently using SPSS to see which one differs from the other two.
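If SPSS is not at hand, any scripting tool will do for that third opinion. A minimal sketch in Python, assuming the scored responses are in a CSV called responses.csv with one row per examinee and one column per item (the file name and layout are placeholders; adjust them to your data):

import pandas as pd

# Read the scored response matrix; adjust the path and separator to your file.
data = pd.read_csv("responses.csv")

# Per-item summary: number of non-missing responses, the mean, and the
# observed min/max (handy for spotting stray missing-data codes like 9 or -99).
summary = pd.DataFrame({
    "n": data.count(),
    "mean": data.mean(),
    "min": data.min(),
    "max": data.max(),
})
print(summary.round(3))

The min/max columns are a quick way to catch stray missing-data codes that Excel would happily average in.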
As Alan suggested, there is indeed likely an algorithmic reason that Xcalibre might be calculating differently, and I know of one situation for sure. A few weeks ago I received a support email with the same question, and the issue was that the user was running a 5-point polytomous calibration on a sample of only 36 (!). Xcalibre automatically collapses response levels with N = 0. In that case there was, for example, an item where no one responded 1 or 2, only 3, 4, or 5. The 1 and 2 levels were dropped, the item was treated as a 3-option item, and since numbering starts at 0 or 1 (depending on whether the PCM or RSM approach is used), the remaining levels get renumbered. So the researcher expected an item mean of about 4, and it was reported as about 2.
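Just to make the renumbering arithmetic concrete, here is a tiny illustration (not Xcalibre's actual code, only the effect described above): an item that receives only responses of 3, 4, and 5 has those levels renumbered once the empty categories are dropped.

from statistics import mean

# Made-up responses to one item: no one chose 1 or 2.
responses = [3, 4, 5, 4, 4, 3, 5, 4]

observed_levels = sorted(set(responses))  # [3, 4, 5]

# Renumber the remaining levels starting at 0 or at 1,
# mirroring the "0 or 1 depending on the model" behaviour described above.
recode_from_0 = {level: i for i, level in enumerate(observed_levels)}
recode_from_1 = {level: i + 1 for i, level in enumerate(observed_levels)}

print("original mean:      ", mean(responses))                              # 4.0
print("recoded from 0 mean:", mean(recode_from_0[r] for r in responses))    # 1.0
print("recoded from 1 mean:", mean(recode_from_1[r] for r in responses))    # 2.0

So a mean near 4 on the original scale shows up near 1 or 2 after the collapse, which matches what that researcher saw.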
I'd encourage you to contact the support team about the issue.