It is generally accepted that an extrinsic matrix transforms a point in world coordinates into a point in camera coordinates.
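Let E be the extrinsic matrix:

$$E = \begin{bmatrix} R_{11} & R_{12} & R_{13} & t_1 \\ R_{21} & R_{22} & R_{23} & t_2 \\ R_{31} & R_{32} & R_{33} & t_3 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$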
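As a minimal sketch of the world-to-camera mapping, the snippet below builds E from a rotation R and a translation t and applies it to a world point in homogeneous coordinates. The rotation (90 degrees about z, written out exactly) and t = (1, 2, 3) are hypothetical values chosen for illustration, and the helper names `make_extrinsic` and `world_to_camera` are made up for this example. Note that the world origin lands exactly on t, i.e. t is the world origin expressed in camera coordinates.

```python
# Minimal sketch (pure Python, no numpy): apply a 4x4 extrinsic matrix
# E = [R | t; 0 0 0 1] to a world point given in homogeneous coordinates.

def make_extrinsic(R, t):
    """Build the 4x4 extrinsic matrix from a 3x3 rotation R and translation t."""
    return [
        [R[0][0], R[0][1], R[0][2], t[0]],
        [R[1][0], R[1][1], R[1][2], t[1]],
        [R[2][0], R[2][1], R[2][2], t[2]],
        [0.0, 0.0, 0.0, 1.0],
    ]

def world_to_camera(E, Xw):
    """Return Xc from E * [Xw; 1], dropping the trailing homogeneous 1."""
    p = [Xw[0], Xw[1], Xw[2], 1.0]
    out = [sum(E[i][j] * p[j] for j in range(4)) for i in range(4)]
    return out[:3]

# Hypothetical example: exact 90-degree rotation about z, plus a translation.
R = [[0.0, -1.0, 0.0],
     [1.0,  0.0, 0.0],
     [0.0,  0.0, 1.0]]
t = [1.0, 2.0, 3.0]
E = make_extrinsic(R, t)

print(world_to_camera(E, [0.0, 0.0, 0.0]))  # world origin -> [1.0, 2.0, 3.0]
print(world_to_camera(E, [1.0, 0.0, 0.0]))  # R*(1,0,0) + t -> [1.0, 3.0, 3.0]
```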
Intuitively, the extrinsic matrix gives the orientation of the world coordinate frame with respect to the camera coordinate frame, as well as the translation of the world frame with respect to the camera frame. So t1, t2 and t3 should be the distances from the camera origin to the world origin along the x, y and z directions, respectively. This would imply that the camera origin lies at -t1 along x, -t2 along y and -t3 along z from the world origin; that is, (-t1, -t2, -t3) would be the camera origin expressed in the world frame. Since the camera origin is the camera position, (-t1, -t2, -t3) should then be the camera position in world coordinates. However, this is not the case.
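Let Xw be the camera position in the world frame and Xc the camera position in the camera frame. We need to find Xw.
Then, in homogeneous coordinates:

$$E \begin{bmatrix} X_w \\ 1 \end{bmatrix} = \begin{bmatrix} X_c \\ 1 \end{bmatrix}, \quad \text{i.e.} \quad R X_w + t = X_c$$

Xc is (0, 0, 0), so R Xw + t = 0. Because R is a rotation matrix, its inverse is its transpose, and hence:

$$X_w = -R^\top t$$

where R (3×3) and t (3×1) are the rotation and translation components of E, respectively.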
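A quick numerical check of this derivation, as a sketch under assumed values: the rotation and translation below are hypothetical (the same exact 90-degree rotation about z and t = (1, 2, 3)), and `camera_center` is a made-up helper name. It computes Xw = -R^T t, compares it with the naive guess -t, and verifies that R Xw + t gives the camera origin.

```python
# Minimal sketch (pure Python): recover the camera position in world
# coordinates as Xw = -R^T t and compare it with the naive guess -t.

def transpose(M):
    """Transpose a 3x3 matrix given as a list of rows."""
    return [[M[j][i] for j in range(3)] for i in range(3)]

def matvec(M, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def camera_center(R, t):
    """Solve R*Xw + t = 0 for Xw, using R^-1 = R^T for a rotation."""
    return [-x for x in matvec(transpose(R), t)]

# Hypothetical example values.
R = [[0.0, -1.0, 0.0],
     [1.0,  0.0, 0.0],
     [0.0,  0.0, 1.0]]
t = [1.0, 2.0, 3.0]

Xw = camera_center(R, t)
naive = [-x for x in t]
# Map Xw back into camera coordinates: should be the camera origin.
Xc = [a + b for a, b in zip(matvec(R, Xw), t)]

print("Xw     =", Xw)     # [-2.0, 1.0, -3.0]
print("-t     =", naive)  # [-1.0, -2.0, -3.0]
print("R*Xw+t =", Xc)     # [0.0, 0.0, 0.0]
```

With these values the two candidates differ; they coincide only when R^T t = t, for example when R is the identity.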
Why is Xw not equal to (-t1, -t2, -t3)?