You should look at the literature on content analysis for discussions of what is known as inter-rater reliability. The general consensus there is that you should compare independent codings of the same data.
Good sources include the books by Krippendorff and Neuendorf.
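For two coders and nominal categories, Cohen's kappa is a common agreement statistic (Krippendorff's alpha generalizes it to more coders and missing data). Below is a minimal sketch in Python; the coder names and labels are purely illustrative:

```python
# Minimal sketch: inter-rater reliability for two coders on the same items.
# Assumes nominal (categorical) labels; the data below is illustrative only.
from sklearn.metrics import cohen_kappa_score

# Each list holds one coder's label for the same N items, in the same order.
coder_a = ["toxic", "not_toxic", "toxic", "toxic", "not_toxic"]
coder_b = ["toxic", "not_toxic", "not_toxic", "toxic", "not_toxic"]

kappa = cohen_kappa_score(coder_a, coder_b)
print(f"Cohen's kappa: {kappa:.2f}")
```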
It depends. Generally, if the task is highly subjective (e.g., "how toxic is this social media comment?"), I would suggest the following: (1) each coder codes independently, (2) you look at the cases where there is disagreement and jointly agree on the final value, and possibly (3) you carry out another round of independent coding on a new sample to ensure your understandings are now aligned. Repeat until you reach a good inter-rater reliability score.
The above method is sometimes referred to as "Delphi coding", although the original idea of Delphi is to use external experts ("oracles") to evaluate the coding results and converge on a sort of expert consensus.
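As an illustration of step (2), the disagreement cases can be listed mechanically so the coders can discuss and adjudicate them before the next round. A rough Python sketch, with hypothetical item IDs and labels:

```python
# Sketch of step (2): collect the items where the two coders disagree,
# so they can be discussed and a final value agreed on jointly.
# Item IDs and labels are illustrative placeholders.
items   = ["c1", "c2", "c3", "c4", "c5"]
coder_a = ["toxic", "not_toxic", "toxic", "toxic", "not_toxic"]
coder_b = ["toxic", "not_toxic", "not_toxic", "toxic", "not_toxic"]

disagreements = [
    (item, a, b)
    for item, a, b in zip(items, coder_a, coder_b)
    if a != b
]

for item, a, b in disagreements:
    print(f"{item}: coder A said {a!r}, coder B said {b!r} -> discuss and agree")
```

After the joint discussion, a fresh sample would be coded independently and the agreement statistic recomputed, repeating until it is acceptable.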
If the expected subjectivity is low, meaning the categories require little interpretation, then the principle is simply to have the coders code independently.
Some good references include:
Alonso, O. (2015). Practical Lessons for Gathering Quality Labels at Scale. Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1089–1092. https://doi.org/10.1145/2766462.2776778
Salminen, J., Almerekhi, H., Dey, P., & Jansen, B. J. (2018). Inter-rater agreement for social computing studies. Proceedings of the Fifth International Conference on Social Networks Analysis, Management and Security (SNAMS 2018), Valencia, Spain.