In a picture, we have enough corner points but don't know their corresponding real object points in the world. How can we get the camera calibration based on the above formula?
If you don't know the corresponding 3D coordinates, I doubt that this is possible at all. Since the world coordinate system is completely unknown in this case, there are infinitely many solutions: each known 2D point only tells you that its 3D point lies somewhere on the ray passing through the camera's center of projection and that pixel. If you do know the 3D coordinates, as shown in your formula, you can use a straightforward pose estimation algorithm to estimate the camera's position and orientation.
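To make the ambiguity concrete, here is a minimal sketch in plain Python. It assumes an ideal pinhole camera with no lens distortion; the intrinsics `FX`, `FY`, `CX`, `CY` and the helper names `project` and `backproject` are made up for illustration and are not taken from your formula. It shows that 3D points at very different depths along one ray all land on the same pixel, so the pixel alone cannot pin down the 3D point.

```python
# Hypothetical pinhole intrinsics: focal lengths and principal point, in pixels.
FX, FY, CX, CY = 800.0, 800.0, 320.0, 240.0


def project(point):
    """Project a 3D point given in camera coordinates (z > 0) to a pixel."""
    x, y, z = point
    return (FX * x / z + CX, FY * y / z + CY)


def backproject(pixel, depth):
    """Return the 3D point at the given depth on the ray through that pixel."""
    u, v = pixel
    return ((u - CX) * depth / FX, (v - CY) * depth / FY, depth)


pixel = (400.0, 300.0)

# Three different 3D points, all on the same viewing ray.
candidates = [backproject(pixel, depth) for depth in (1.0, 2.5, 10.0)]

# Every candidate reprojects to the original pixel.
reprojected = [project(point) for point in candidates]

for point, image_point in zip(candidates, reprojected):
    print(point, "->", image_point)
```

Once the 3D points are known, this ambiguity goes away, and a pose estimation routine such as OpenCV's `cv2.solvePnP` can recover the camera's rotation and translation from the 2D-3D correspondences, provided the intrinsics are also known.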