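The 1x1 convolution on the shortcut path is what makes the addition in the residual unit possible. Within each of these residual units, the first (separable) convolution increases the number of channels, and the unit ends with a max-pooling operation (3x3, stride 2) that halves the spatial dimensions. The element-wise addition requires both operands to have the same shape. The 1x1 convolution is therefore applied with stride 2 and with as many filters as the unit's output channels, so that the shortcut ends up with the same spatial dimensions and number of channels as the output of the residual unit.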
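Note that in the 'middle flow' residual units, the first convolution does not increase the number of channels and there is no max-pooling operation. The input therefore already has the same shape as the output, so no 1x1 convolution is needed and the shortcut is a plain identity connection.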
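As a rough sanity check, the shape bookkeeping can be sketched in plain Python. The function names are hypothetical. The example sizes (a 147x147x64 input to the first entry-flow unit, 19x19x728 in the middle flow) and the 'same' padding are assumptions taken from the Keras reference implementation of Xception, so treat this as an illustration of the argument rather than the actual model code.

```python
import math


def same_pad_out(size, stride):
    # Spatial size after a conv/pool layer with 'same' padding: ceil(size / stride)
    return math.ceil(size / stride)


def residual_branch_shape(h, w, c, out_channels, downsample):
    """Shape (H, W, C) after the unit's separable convs and optional max-pool."""
    # The separable convolutions use stride 1, so they only change the channel count.
    c = out_channels
    if downsample:
        # 3x3 max-pool with stride 2 ('same' padding) halves H and W (rounding up).
        h, w = same_pad_out(h, 2), same_pad_out(w, 2)
    return (h, w, c)


def shortcut_shape(h, w, c, out_channels, downsample):
    """Shape of the shortcut: a 1x1 stride-2 conv when downsampling, else identity."""
    if downsample:
        # The 1x1 conv with out_channels filters and stride 2 matches both
        # the channel count and the pooled spatial size of the branch.
        return (same_pad_out(h, 2), same_pad_out(w, 2), out_channels)
    return (h, w, c)  # identity shortcut: shape passes through unchanged


if __name__ == "__main__":
    cases = {
        "entry flow": (147, 147, 64, 128, True),
        "middle flow": (19, 19, 728, 728, False),
    }
    for name, (h, w, c, out_c, down) in cases.items():
        branch = residual_branch_shape(h, w, c, out_c, down)
        skip = shortcut_shape(h, w, c, out_c, down)
        print(name, "input:", (h, w, c), "branch:", branch, "shortcut:", skip)
        assert branch == skip  # element-wise addition needs identical shapes
```

For the entry-flow case this prints a branch and shortcut shape of (74, 74, 128). Adding the unmodified (147, 147, 64) input instead would fail on both the spatial size and the channel count, which is exactly the mismatch the 1x1 convolution fixes.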