It all depends on your research question and objective. The important part is finding common concepts across different codes and rephrasing your categories so that different codes are grouped together. This should then be validated by at least one other independent researcher who has also coded the interviews. It is also important to find a hierarchy within your codes. For example, we could consider "lack of compliance" a first-level category covering "not buying medication", "forgetting to take medication", and "preventing side effects". In turn, "not buying medication" could be related to a third level such as "financial difficulties", "not trusting drugs", or "waiting for signs of the disease before buying medication". This structure can be found within the interviews.
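If it helps to see the shape of such a hierarchy, here is a minimal sketch in Python (purely illustrative; the labels are just the ones from the example above):

# Illustrative only: the three-level code hierarchy from the example above,
# represented as a nested structure (category -> codes -> sub-codes).
code_tree = {
    "lack of compliance": {
        "not buying medication": [
            "financial difficulties",
            "not trusting drugs",
            "waiting for signs of the disease before buying medication",
        ],
        "forgetting to take medication": [],
        "preventing side effects": [],
    }
}

def print_tree(node, depth=0):
    # Walk the hierarchy and print each code indented by its level.
    items = node.items() if isinstance(node, dict) else [(leaf, None) for leaf in node]
    for label, children in items:
        print("  " * depth + label)
        if children:
            print_tree(children, depth + 1)

print_tree(code_tree)

The point of the sketch is only that every lower-level code hangs off exactly one higher-level category, and that the whole tree should remain traceable back to the interview text.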
It is then important that your final coding is supported by the text and can be validated by participants.
A few thoughts, because as suggested earlier, it depends. You are probably right: 3,000 codes or nodes in a 12 semi-structured interview set-up means something went wrong. You need a good advisor or consultant.
Not sure whether you are using qualitative research software, but this over-coding is often the result of believing that counting is what counts in qualitative research. My advice would be to start thinking about to what extent your coding informs the research question you had at the start of the research, or the question that became clear after you collected the data. Since these are semi-structured interviews, you definitely have a set of categories. To what extent do some of those codes inform the categories? Who coded? As suggested earlier, find others to look at your data and your codes. Coding is just one piece of the process: thinking throughout about potential hypotheses is central in informing further coding. When you think about trustworthiness, think about triangulating your analysis. It is not about validity (an empirical term that often doesn't apply to interpretative methods).
One more thing: if you are thinking hermeneutics, think of coding as one piece of the circular process. Take distance and look at your coding as a new dataset, see what the patterns are in that coding, go for member checking to see where your biases are, etc.
Sounds like you've done a great micro-analysis and now you should build up to more macro levels (as Nelleke says, this will help with how your work progresses). When you do analysis with pen and paper (rather than software) this is the level analysis begins with, and I think it's indicative of good and thorough analysis.
I would shift your attention to coding your codes now (it sounds like the work you have done has tied your coding very thoroughly to the original text), so you should feel confident about this shift in analytic attention. It might feel a bit strange, but think about how your codes relate to each other.
I think analysis should be fluid and iterative and half the fun is exploring it and playing.
I agree with Sluisveld and Vaucher. I am sure you have not made mistakes. You now have to do the grouping based on the codes that you have previously made: similar codes are grouped into one group, and so on... I believe you can do it.
3,000 sounds like too many. I wouldn't say it's wrong as such, but rather that you need a number which allows you to make sense of the data and to understand how codes may interrelate or impact on each other. You need to find a way of breaking the transcripts down into workable categories.
Trustworthiness does sound nicer than validity! Thank you Gonzalo for pointing this out.
As I understand it, you mean that asking participants to comment on the final conceptualisation of the question is a form of new analysis that in turn can serve the model. As such, we then triangulate two different methods. Each participant should find his or her position within the constructed conceptual framework. Do you have any examples of how to formulate such an assessment? Is simply asking participants whether they recognize their position within the framework enough? Can this be done by asking them to write short answers to a written questionnaire?
Thank you to everyone who took the time to comment, really very much appreciated. I used semi-structured interviews and followed a 13-question interview schedule, with additional prompts. I am the only person analysing the data. I took the instruction literally and coded the 12 interviews (approx. 1.5 hours in length each) line by line, hence the amount of codes. Unfortunately I didn't know I could use generic codes that had surfaced from the data, such as 'education', so each code is personalised to the participant, although it will belong to a generic group. I am using HyperResearch software to code and analyse my data, so all the codes are tied specifically to the text - phew. I am now moving the codes into categories, although they currently stand at 30? I presume the categories will then form hierarchies within themes? Any advice gratefully received. Tara
There is an article about the temptation of over-coding with CAQDAS somewhere. Be careful: line-by-line coding is a bit like checking the atoms to understand your substance. In a semi-structured interview, often what matters most is how each individual makes meaning. Here you would have 12 cases; then you can look at patterns in the meaning-making process. A pile of codes for the things people say could make you miss what they really mean in their statements. Qualitative data analysis is much harder than quantitative precisely because what counts as data varies as you carry on the analysis. Have you read the practical advice provided by Saldaña's book on coding? That said, I find the book on analyzing social settings still much more useful. Without knowing your theoretical approach to the analysis, it is difficult to advise. You may actually try a different approach to the data analysis now, then compare with the initial coding and see what emerges. That would help you triangulate and would be the equivalent of having two views of the data.
Paul,
there are plenty of examples in several of the qualitative research journals, and some more methodological ones in journals like Qualitative Inquiry or Qualitative Research in Psychology, etc.
Yes, going back to participants and asking how they make sense of what you concluded is a good way of checking whether your analysis is grounded in people's meaning-making process rather than in some methodological assumption you make as the researcher (often of the kind "if I find a lot of these words about X then it must be Y"), when what really matters is what makes sense and is significant, and the latter is not related to the number of times something emerges; the indicators can be many others. Indeed, if you are not listening again to your interviews and are just staying with the transcripts, then you may really miss the analogical aspects. In this regard, Tara, did you write a fieldnote after each interview with your initial analysis? That is one of the most important practices, which anthropologists know well but others miss, thinking that everything will be in the transcript. (Remember, the transcript is just one piece of data and, actually, one piece of analysis too.)
After this micro coding, please take a step back and look at the data from a more macro perspective. See if your micro codes can be subsumed under larger categories. Also, please pay attention to not just what was said, but how it was said. You will probably be able to identify some major themes with micro codes describing variation in the major themes. Hope this makes sense. Is anyone else going to be coding the data? Perhaps think about inter-rater reliability.
Hey guys, to answer your questions, I understood that one coded meaningful segments, whether that be a word, sentence or paragraph. I have completed my interviews and have really rich and detailed data. I believe where I may have somewhat fallen down is that I could have applied generic codes to my data, so for instance 'education' 'childhood' 'adolescence', rather than personalising each interview.
I did not intend to code 3000 codes and by no means realised that this was not the norm until I blew my software. I am now beta testing the next version as it can cope with my codes.
I am using thematic analysis, so I have read Braun and Clarke and Howitt and Cramer.
Unfortunately going back to my participants is not an option.
I confirm that I have written field notes but not specifically with regard to interviews.
However, I also understand that I can apply a bottom up approach and a top down approach. So whilst I might have dealt with the minutiae, I can always gather the essence and compare the two.
In several of the CAQDAS packages you don't need to code words anymore: a couple of clicks and you can get the frequencies of all the words, organized, plus how they may connect to other words. The question then is how to make sense of all that. The aha moments are iterative, and I assume you have had a few, including the notion of going back to larger categories. I am curious about what your proposal contains in terms of data analysis strategy.
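For anyone curious what is behind those couple of clicks, the frequency part is easy to reproduce by hand; a minimal sketch in Python (the file name and the tiny stop-word list are placeholders, not anyone's actual data):

# A minimal sketch of the word-frequency view a CAQDAS package gives you.
# "interview_01.txt" and the stop-word list are placeholders; adapt to your data.
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "i", "you", "it", "that"}

with open("interview_01.txt", encoding="utf-8") as f:
    words = re.findall(r"[a-z']+", f.read().lower())

freq = Counter(w for w in words if w not in STOP_WORDS)
for word, count in freq.most_common(20):
    print(f"{word:20s} {count}")

As said above, the counts themselves are the easy part; what they mean is the real analytic question, and no script answers that.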
Hey Gonzalo, my research proposal stated IPA as this was what I really wanted to do, but time constraints have meant that I have to do Thematic Analysis instead. I haven't stopped having aha moments.
That is your first-level coding: open coding. Now move to your second level, axial coding. 3,000 codes is not that much; the second level will pave the way to your interpretation.
In my opinion, the real question is: what are your objectives? Research aims always give the first logical answers to questions like "how many codes". E.g., if one objective is "understand the relation between A and B", the logical answer is "I need to know the relation, so I have to pay attention every time the interviewees talk, in some sense, about that relation". Immediately you realize that there are several ways of approaching that relation (this comes from the data), so the logical move would be to divide the general topic "relation" into others such as "emotive relation", "family relation", "professional relation", and so on (yes, perhaps a too-simple example, sorry). Next, you will probably find that each of these divisions could be subdivided further, in order to fit the meaning of the data correctly and, finally, in order to understand "the relation...", that is, your objective. Please excuse me if I did not understand the question correctly.
If by "codes" you mean categories of analysis (a word or phrase such as "adolescence" which you use as a label to apply to sections of text -to a sentence or series of sentences-), then in my opinion 3,000 codes is overdoing it, especially if you have time constraints. I and many people I know often code in the following way:
1.) read all interviews, taking notes on ideas and topics they include;
2.) define a small number of general codes or "tracks", large themes (either a priori, based on your research question/interview guide/theoretical framework; or let the tracks or themes emerge from the data), usually something like 7-12 themes, perhaps a bit more, and apply to the interviews (label large chunks or passages of the interviews with these themes or tracks or large codes);
3.) then define more specific codes (divide larger themes into subtopics, sub-themes; again this can be done a priori based on a theoretical framework or a literature review or both, or the sub-themes can emerge from the data, or often both, defining a series of specific codes a priori and as these are applied, also creating new specific codes/themes that emerge from the data); the number of specific codes varies depending on your research question, the density or richness of your data, and (I think) your personal preference/style of coding (this is not a recommendation, but I find that I often end up with around 35 specific codes);
4.) after applying the specific codes, generate the outputs in a qualitative coding program such as Atlas.ti or NVivo: either just ask the program to generate a document with all the segments of text coded with each code, and read through each of these documents; or stratify by some characteristic, such as "all the interviews with adolescent girls" with "code: local construction of adolescence", then "all the interviews with adolescent boys" with "code: local construction of adolescence", etc. These documents (groups of text segments corresponding to each specific code) can then be interpreted. In some cases, depending on the material and how much time you have to continue analyzing, you may find that even more specific codes emerge from the text of these documents, such as a typology or some other subgrouping of the text segments, for example a list of healthcare needs perceived by adolescents: supply of condoms; supply of emergency contraception; family counseling for parent-child violence; treatment for drug addiction, etc. (These are actually examples from a real study I am still doing the analysis on.)
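Conceptually, the retrieval in step 4 is just a filter over records of (participant, attribute, code, text segment); here is a rough, purely illustrative Python sketch of that idea, outside any particular program (all names and data are invented):

# A rough sketch of step 4 outside Atlas.ti/NVivo: coded segments stored as
# records and filtered by code and participant attribute. All data are invented.
from dataclasses import dataclass

@dataclass
class Segment:
    participant: str   # e.g. "girl_03"
    group: str         # stratifying attribute, e.g. "adolescent girls"
    code: str          # the specific code applied to this text segment
    text: str          # the coded passage itself

segments = [
    Segment("girl_03", "adolescent girls", "local construction of adolescence", "..."),
    Segment("boy_07", "adolescent boys", "local construction of adolescence", "..."),
]

def retrieve(segments, code, group=None):
    # Return all segments tagged with `code`, optionally within one group.
    return [s for s in segments
            if s.code == code and (group is None or s.group == group)]

for s in retrieve(segments, "local construction of adolescence", group="adolescent girls"):
    print(s.participant, ":", s.text)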
Hope these suggestions help. I also always recommend Johnny Saldaña's The Coding Manual for Qualitative Researchers, from Sage Publications.
It may seem like semantics, but how many themes did you identify? If you have 3,000 codes you probably have a lot of overlap. For instance, I recently combined 12 codes such as "anger", "frustration", etc. under a theme of "emotional response", which could then be discussed.
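In software terms that merge is nothing more than a many-to-one mapping from codes to a theme; a toy sketch in Python (only "anger" and "frustration" come from the example above, the other code names are invented):

# Toy sketch of collapsing several codes into one theme via a many-to-one mapping.
# Only "anger" and "frustration" come from the comment above; the rest are invented.
code_to_theme = {
    "anger": "emotional response",
    "frustration": "emotional response",
    "sadness": "emotional response",
    "relief": "emotional response",
}

coded_segments = [("P01", "anger"), ("P02", "frustration"), ("P01", "relief")]

# Every segment now carries its theme instead of its raw code.
themed = [(pid, code_to_theme.get(code, code)) for pid, code in coded_segments]
print(themed)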
Hey Gayle, it doesn't sound like semantics at all. I am currently in the process of moving my 3,000 codes into meaningful groups. I will then address each group to ascertain whether a code belongs there, should be renamed, or belongs in multiple groups. Thank God for the tech support at HyperResearch, who have informed me that there is a simple way to collapse my codes should I choose to do so.
Also thank you to everyone for your comments and I apologise that I have not had the time to reply individually.
So it transpires that I need to collapse my codes to move forward with the software I am using so I am now in the process of applying generic codes generated from the data across the 12 interviews. Wish someone had put that in the research chapters. If anyone wants to consult with me from a dyslexic dyspraxic perspective I'd be happy to help.
Jochen, I would be very interested to know what you mean exactly. Nobody is using 'grounded theory' in this instance, so your answer is puzzling, but in any case, first-level coding is 'exploration' in action. Of course Tara needs to collapse her coding, and I get the feeling that she is on the way! How do YOU, Jochen, analyse your qualitative data? Or perhaps you do not go down the methodological path in question?
Coding was developed in the context of the grounded theory approach, and 'explorative' coding comes from there too. This is what I was actually aiming at. I am extremely uncomfortable with research that starts with exploration, i.e. that does not use existing knowledge about a subject matter systematically. Prior knowledge of a subject either gets ignored or gets used implicitly and unconsciously.
In the case at hand, one would expect codes derived from theory (which some textbooks recommend using in the initial stage of coding; see e.g. Miles and Huberman) to be used alongside codes constructed by the coder. These codes can be expected to do the job of reducing complexity, which is always important in qualitative data analysis.
And yes, the problem of using or not using theory can be traced back to grounded theory, which never resolved it.
I try to avoid coding - been there, done that, was unhappy. I use qualitative content analysis because it gives theory a stronger role in analyzing data but keeps the procedure open to unexpected information. But then, I already use theory when formulating my research question.
My understanding is that you have undertaken an in-depth, meticulous and thorough analysis of the data you collected during your interviews. While coding, I usually bear my research objectives in mind and code accordingly. It is a laborious process, but it does help in keeping you focused on the overall aim of the research you have undertaken.
Your number of codes for this many interviews is right on target--especially if your data were quite varied. There is no such thing as too many codes, and they should be as differentiated in the first coding as possible. You can always collapse them into broader categories and use the items you have as subsets of larger codes.
Depends on the length of your interviews (all twelve), but I agree with the other commentators that 3,000 appears to be too many. Group related codes into themes. I am using Atlas.ti and it works fine, since even after collapsing your codes into fewer themes you still have access to your individual 3,000 codes. Hope this helps.
Betania, I am interested in your answer. I am doing grounded theory analyses just now; would you be willing to give an example of how the analysis looks in schematic form, so that I can understand it more easily? I think this would be useful for others too.
Greetings, Tara--The first time I started using software for coding interviews, I also proliferated codes like crazy. In retrospect, part of the problem was that I chose the word or phrase as the coding "unit". Like you, it sounds like I was reading various sources and other books that were a bit fuzzy on the actual mechanics of coding, particularly the grounded theoretic literature, which emphasized "in vivo" codes, that is, codes generated from the actual words of participants. Eventually, I found that there were simply too many codes to manage a meaningful analysis practically. As a result of this very painful learning experience, I modified my approach in two ways. First, I started coding larger units. Depending on the project, I now code sentences or perhaps even paragraphs. Second, as soon as I collect the data, I force myself to sit down and write a short summary or reflection about important things I learned or ideas the data gave me. I then use that summary/reflection to help generate some preliminary codes for that data. As the project continues, after each datum is collected and each summary/reflection written, I compare the new data and reflections with the ones I already have, modifying old codes to include more types of data in more general ways, and creating new codes that capture new dimensions I hadn't noticed previously. One problem with adding new codes, particularly late in a project, is that it requires doubling back to earlier data to re-code for consistency. This is one drawback of a more inductive approach to coding. Overall, don't be discouraged. Everyone has to find her or his own way of dealing with qualitative data. Coding is a process that is supposed to HELP generate meaning that can then be reported and disseminated. If my experience taught me anything, it was that I must be patient and use other people's advice as simply that--advice. Relax and dig in. This is a process, and you're learning by doing, which is always the hardest. Good luck!
Hi Christopher, is the summary you mean the same as memo writing? Is it done after the whole coding process is finished for one participant? And can we take the coding process all the way to theoretical coding for one participant?
Imami--I think I understand your question. The idea of "memos" is typically discussed as preliminary to analysis, as proto-analysis, or as somehow formative for other aspects of the qualitative data management or qualitative data analysis process. I really like Kathy Charmaz's book "Constructing Grounded Theory" for her discussion of the use of memos in the analytic process. Note, however, that I am not so keen on her discussion of coding, as in my experience, when I followed that process the result was code proliferation and data fragmentation, which led to months of stalled progress. Rather, the process I outline above is one tailored to developing a specific and adequate code structure for a project. I typically mix inductive with deductive processes, especially in code development, and I am fiercely empirical, meaning that all codes must be grounded in actual data. Given those orientations, the notion of "theoretical coding" is not helpful for me. Rather, I use each datum as an opportunity to confirm and expand the coding structure overall. I aim to balance parsimony with specificity, but this can be challenging. Coding should help generate insight and be helpful for analysis, but too often coding becomes its own activity. It's so easy to lose sight of the forest for the trees!
Thank you Christopher, your explanation has helped me understand coding better. Would you mind describing what you have done, perhaps with a sample image or something similar, maybe not on the web but directly through email?
David Rennie discusses the ways and means of coding larger units in his piece on Grounded Theory:
Rennie, D. L. (2006). The Grounded Theory Method. In Fischer, C. T. (Ed.), Qualitative Research Methods for Psychologists: Introduction through Empirical Studies (pp. 59-78). Burlington, USA: Academic Press.
Yes, Tara, 3,000 codes is definitely far, far too many, especially in first-order analysis, no matter how many interviews. I assume that you're a relatively inexperienced qualitative researcher? When people start learning qualitative analysis they easily become obsessed with codes, themes, sub-themes, sub-sub-themes etc. This is a typical beginner's mistake. Remember, codes and themes are nothing but heuristic devices which help you to better understand your data. Now, what are 3,000 codes going to tell you? How will you manage further analysis? How will you look for the relationships between and within your themes (codes)? I personally believe that it's better to have wider categories, especially in the beginning, rather than a zillion fragmented excerpts. This is because the greatest danger of qualitative research is losing the holistic perspective on your data. Also, small fragments taken out of context very easily acquire meanings of their own that your interviewees never intended. Start widely, probably with the themes elicited by your questions and perhaps a few others that you know have arisen inductively, and then continue to refine your analysis.
3,000 codes!!!! You will never finish your write-up!!!! I agree with the first two commenters. You did a mini micro-analysis and you need to aggregate your codes as they suggested. You didn't do the coding wrongly, you just over-analysed, I guess. I did something similar when I started coding five years ago. Read over the titles of the codes, look for commonalities, and merge the codes till you have as few as you can. You should also be able to pick out major themes and minor themes that way.
Hi, coding is actually assigning a unique number to every variable and each of its options, which may best be laid out on an Excel sheet. After assigning codes to each and every question, the whole dataset can be imported into statistical software, e.g. SPSS, and as the data require we can apply the desired test.
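That kind of (quantitative) coding looks roughly like this in practice; a hedged Python sketch with invented variable names and code values:

# A sketch of quantitative-style coding: each answer option gets a unique number
# before export to SPSS or similar. Variable names and code values are invented.
import csv

SEX_CODES = {"male": 1, "female": 2}
YES_NO_CODES = {"no": 0, "yes": 1}

responses = [
    {"sex": "female", "smokes": "no"},
    {"sex": "male", "smokes": "yes"},
]

with open("coded_data.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["sex", "smokes"])
    writer.writeheader()
    for r in responses:
        writer.writerow({"sex": SEX_CODES[r["sex"]],
                         "smokes": YES_NO_CODES[r["smokes"]]})

Note that this is numeric coding for statistical analysis, which is a different activity from the qualitative coding discussed in the rest of this thread.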
Hi, codes should seek answers to the broad categories of your research questions. In qualitative analysis, the number of codes under each category should not be more than 4-5.
3,000 codes - that is one meticulous piece of coding. But I would not say it is too many, too few or just right; it depends on your number of respondents and the amount of interview data (transcriptions) you are looking at. I would suggest, though, that you reduce it to a few hundred at least, to move away from the very initial level of coding, which in NVivo we call the free nodes stage. Move up to the tree nodes level and reduce the codes into themes, which means reducing from 3,000 labels to fewer. This reduction will make your data make more sense. Then reduce it even further, to another level, until you find a comprehensive yet representative set of codes. This stage is called the data management stage; please refer to Ritchie, O'Connor and Spencer (2003). Secondly, move to the descriptive accounts stage, followed by the explanation stage (the last level of abstraction). Hope that helps.
Well, considering the number of interviews conducted (i.e. 12), I would say you are too detailed in your labelling: you are taking the words of the respondents literally, which is rather too much. You need to reduce the labels to fewer categories, then classify those into fewer themes. My suggestion: look at all the labels and put them in baskets (similar meanings in one basket of their own); the aim is to reduce the many labels into fewer categories (the so-called baskets). Next, repeat the process using the same procedure, aiming to reduce the baskets into a smaller number of baskets. Third, do the same until you have fewer than a hundred baskets. Next, classify the baskets into themes, the aim being to make sense of the categories of labels. Again use the basket concept, but this time take the classification to another level of abstraction in order to make sense of the categories. This is called data management: what you are doing is reducing the many labels to a small set of representative meanings, i.e. a classification of the data. Next, follow the suggestion of Ritchie et al., as outlined earlier, and move to descriptive accounts and then explanatory accounts (the detailed procedure can be found in Ritchie). These last two procedures are really processes of abstraction, but please reorganize your data before you take those two steps and make sense of your data. Good luck and be patient with your data; sometimes the data can get to you, but with a lot of patience it is worthwhile in the end.
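The basket procedure described above is essentially a repeated many-to-one mapping; a minimal sketch of two passes in Python (all labels and basket names are invented for illustration):

# A minimal sketch of the iterative "basket" reduction: labels -> categories -> themes.
# All labels and basket names below are invented for illustration.
first_pass = {   # pass 1: raw labels into category baskets
    "skips pills on weekends": "irregular intake",
    "forgets evening dose": "irregular intake",
    "pills too expensive": "cost barriers",
}
second_pass = {  # pass 2: category baskets into theme baskets
    "irregular intake": "non-adherence",
    "cost barriers": "non-adherence",
}

labels = list(first_pass)
categories = sorted({first_pass[label] for label in labels})
themes = sorted({second_pass[category] for category in categories})
print(len(labels), "labels ->", len(categories), "categories ->", len(themes), "theme(s)")

Each pass is the same operation at a higher level of abstraction, which is exactly why the procedure can be repeated until the number of baskets becomes manageable.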
It sounds like you have done a line by line coding system to have 3,000 codes from the 12 interviews. Whilst this isn't time wasted, it can and should be refined by going through a more thematic coding process (which can occur several times). This helps to condense the key themes into more manageable chunks. I'd refer back to your research aims and objectives too, as this will help you discard any irrelevant information and identify the key phrases, codes and points throughout your transcripts.
I guess it all depends on whether you are going about the coding in a very inductive way and remaining open to the different meanings in the text; in that sense, 3,000 is not too many. You can look at those codes and then use the literature and your research aims to help organise them, and as you do this you will see them collapse into more distinctive categories and/or themes. As others have indicated, it all depends on your methodology as well.
It is not uncommon to over-code, especially when you are doing line coding in your first round of open coding. You have a rich body of data, but it also means you have a major task ahead in your second round of axial coding: pulling your word and line codes together into categories and their relationships. If there are multiple coders, then come together to discuss and collapse codes into the categories that lend themselves to developing a set of themes and sub-themes. You will then be close to a final third round of selective coding of the themes with exemplars.
You have a rich data set and one that will yield substantive insight and a potential grounded theory. Hope this helps though there are many ways to approach this analysis.
Difficult to answer. Short answer: no, not too many, depending on a set of factors.
For example, what was the length of each interview? A 3-hour interview yields far more data and transcription than 2 hours, 1 hour, or less of interview data. Transcripts can also vary in length depending on whether you keep every literal word or leave out utterances without meaning or redundancy in the narrative. For example: "I have to think about that, uh, well, let me think, hmm, let me see... can you repeat the question...". That is irrelevant in transcribing but adds to the word count. However, if the exact narrative is essential to interpreting an informant's recall as an aim of the study, then that literal response is essential.
Other Factors to Consider Regarding the Code Count
How many coders? If several coders were used, was there any debriefing for concurrence and consensus building? Did you use Atlas.ti or NVivo, and if so, have you quantified your 3,000 words or lines? These are powerful qualitative programs where the number of codes is not an issue and can in fact give you a more robust analysis. Were you using line, word or category coding? How many rounds of coding? Did you reduce the code count after each round of coding, starting with open, then axial, then selective?
Other Considerations
Was this a semi-structured or unstructured open interview? Was there a framework with broad predetermined categories that provided guard rails to keep you within the bounds of your exploratory interview? Was there a model or qualitative approach you were using? This can be very relevant to coding. What was the approach and what are its parameters?
Lastly, what does the qualitative research literature tell us about coding?
So my view is that the count is not an issue or a problem in itself; it depends on the nature, purpose and aim of the study, the methodology, and the analysis protocol.
In my opinion, it's good at the first level to be verbose with codes, so your 3K codes are fine... just don't end up under-coded when you move to the next level and axial coding.