I'm interested in knowing how well ChatGPT or other large language models (LLMs) can grade code on its merits, judging how well designed and well written it is the way a human programming teacher would. For instance, grading a student's answer in an introduction-to-programming course. Today we can measure time complexity, check syntax, and test whether the algorithm passes a given set of test cases; this is used to auto-correct and give feedback so the student can improve their answer. But consider a final exam, where feedback is given only after grading (let's assume a scale of A+, A, B, C, D, and E for fail). How closely could ChatGPT's grades match a human's? In more precise terms: how strong is the positive correlation between grades assigned by a human and grades assigned by an AI?
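For what it's worth, once you have paired human and AI grades for the same submissions, the correlation itself is easy to compute. Below is a minimal sketch using Spearman's rank correlation, which suits an ordinal letter scale better than Pearson's r; the grade data and the numeric mapping of the scale are hypothetical:

```python
# Minimal sketch: correlation between human and AI letter grades.
# The sample grades below are hypothetical, for illustration only.
from scipy.stats import spearmanr

# Ordinal encoding of the letter scale from the question (E = fail).
GRADE_RANK = {"A+": 5, "A": 4, "B": 3, "C": 2, "D": 1, "E": 0}

# Hypothetical grades assigned to the same ten submissions.
human_grades = ["A+", "A", "B", "E", "C", "A", "D", "B", "E", "C"]
ai_grades    = ["A",  "A", "B", "D", "C", "A+", "D", "C", "E", "B"]

human = [GRADE_RANK[g] for g in human_grades]
ai = [GRADE_RANK[g] for g in ai_grades]

# Spearman's rho compares rank orderings, so it does not assume the
# gaps between adjacent letter grades are equal.
rho, p_value = spearmanr(human, ai)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```

With real data, one would also want more than one human rater, so the AI-human correlation can be compared against the human-human (inter-rater) agreement as a baseline.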
