Qualitative Coding#
Qualitative coding is the process of assigning labels (codes) to segments of qualitative data (e.g., interview transcripts) to identify patterns, themes, or categories. It forms the foundation for thematic analysis, a method for interpreting meaning across a dataset.
This guide assumes a pragmatic approach to thematic analysis, compatible with either realist or constructivist perspectives.
WIP: This page is a work in progress. It could be extended with additional methods (e.g., grounded theory, content analysis); for now, it focuses on interview coding.
Terms#
- Codebook: Collection of all code definitions, often with short descriptions to help coders with coding.
- Code: A label assigned to a segment of data.
- Coding (in the context of a codebook): The process of applying codes to data segments (not to be confused with writing programming code).
- Coder: A person who assigns codes to qualitative data. Coding is often done by multiple coders to reduce bias.
- Theme: A coherent and meaningful pattern of codes relevant to the research question.
- Pattern: Recurring content across interviews that may inform themes.
- Conflict: Differences in codes assigned to a qualitative data point by different coders.
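The definition of a conflict above can be made concrete with a small sketch. It assumes each coder's work is stored as a mapping from a segment identifier to a set of code identifiers. The coder names, segment identifiers, and the `find_conflicts` helper are hypothetical and only serve as illustration.

```python
def find_conflicts(assignments):
    """Return the segments whose code sets differ between coders.

    `assignments` maps coder name -> {segment_id: set of code identifiers}.
    A segment that one coder left uncoded counts as an empty code set.
    """
    segment_ids = set()
    for coded in assignments.values():
        segment_ids.update(coded)

    conflicts = {}
    for segment_id in sorted(segment_ids):
        per_coder = {
            coder: frozenset(coded.get(segment_id, set()))
            for coder, coded in assignments.items()
        }
        # More than one distinct code set means the coders disagree.
        if len(set(per_coder.values())) > 1:
            conflicts[segment_id] = {c: sorted(v) for c, v in per_coder.items()}
    return conflicts


# Hypothetical coders and segment identifiers.
assignments = {
    "coder_a": {"P01-s1": {"C1.2-1"}, "P01-s2": {"C1.2-2"}},
    "coder_b": {"P01-s1": {"C1.2-1"}, "P01-s2": {"C1.2-3"}},
}
conflicts = find_conflicts(assignments)
print(conflicts)
```

A list like this is only a starting point for discussion between coders; it does not decide which code is correct.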
Qualitative Methods#
Thematic Analysis#
Thematic analysis is a flexible method for identifying and interpreting patterns across qualitative data.
While it can be conducted inductively or deductively, many researchers (including us) adopt a hybrid coding approach that blends both. For details on how hybrid coding integrates into this process, see the Hybrid Coding section below.
1. Familiarization#
- Read transcripts repeatedly.
- Note initial observations or interesting features.
- Capture first impressions in memos.
2. Generating Initial Codes#
- Assign short labels to data segments.
- Code inclusively (prefer longer, richer segments).
3. Searching for Themes#
- Group similar codes.
- Identify higher-level patterns across the data.
- Create candidate themes and sub-themes.
4. Reviewing Themes#
- Re-examine coded data for consistency within and across themes.
- Validate themes against the full dataset.
- Split, merge, or discard themes as needed.
5. Defining and Naming Themes#
- Refine each theme’s scope and boundaries.
- Write theme descriptions with supporting excerpts.
- Ensure themes collectively address the research questions.
6. Writing the Report#
- Present themes clearly and coherently.
- Use direct quotes to support interpretation.
- Integrate findings with existing literature.
Coding Approaches#
| Approach | Source of Codes |
|---|---|
| Top-down | Theoretical frameworks, prior research, hypotheses |
| Bottom-up | Participant language, in-situ patterns |
| Hybrid | Combines both, e.g., codes derived from the interview guide plus codes that emerge from the transcripts |
Hybrid Coding#
We most commonly use a hybrid coding approach for interviews. Hybrid coding integrates both top-down (deductive) and bottom-up (inductive) approaches to generate codes and themes.
Within thematic analysis, hybrid coding provides the operational mechanism for iteratively refining codes and themes as the dataset expands.
1. Framework Preparation#
- Define initial codes based on:
  - Research questions
  - Conceptual models
  - Prior literature or coding schema
- Document intended scope of each code
- Prepare a flexible initial codebook
2. Data Immersion#
- Read transcripts with minimal preconceptions
- Write margin notes to capture impressions
- Note contradictions with the initial framework
3. Initial Coding Pass#
- Code using both predefined and open codes
- Flag segments that resist clean categorization
- Mark emergent patterns for codebook expansion
4. Codebook Reconciliation#
- Evaluate emergent codes:
  - Collapse, rename, or refine existing codes
  - Add new inductive codes with definitions and examples
- Ensure coherence across data while retaining new insights
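If coded segments are kept in a machine-readable form, the collapse and rename decisions from reconciliation can be applied with a simple mapping. The sketch below assumes segments are stored as sets of code identifiers. The open codes ("hacking fear", "leak worry"), the segment identifiers, and the `reconcile` helper are hypothetical.

```python
def reconcile(coded_segments, merge_map):
    """Rewrite code identifiers according to `merge_map` (old -> new).

    Codes without an entry in `merge_map` are kept unchanged. Using a set
    per segment means two codes merged into one appear only once.
    """
    return {
        segment_id: {merge_map.get(code, code) for code in codes}
        for segment_id, codes in coded_segments.items()
    }


# Hypothetical open codes from a first pass, mapped onto codebook identifiers.
coded_segments = {
    "P01-s1": {"hacking fear", "C1.2-1"},
    "P02-s4": {"leak worry"},
}
merge_map = {"hacking fear": "C1.2-2", "leak worry": "C1.2-1"}

reconciled = reconcile(coded_segments, merge_map)
print(reconciled)
```

The function returns a new mapping and leaves the input untouched, so the pre-reconciliation coding remains available for comparison.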
5. Theme Development#
- Group codes into candidate themes (both expected and emergent)
- Compare thematic structure against both theory and data
- Identify tensions, absences, and expansions
6. Iterative Refinement#
- Validate themes with data excerpts
- Check for representativeness across participants
- Revisit definitions and refine boundaries iteratively
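One rough way to check representativeness across participants is to count how many distinct participants each code appears in. This is a minimal sketch, assuming coded segments are available as `(participant_id, code)` pairs; the participant identifiers and the `participants_per_code` helper are hypothetical.

```python
from collections import defaultdict


def participants_per_code(coded_segments):
    """Map each code to the sorted list of participants it was applied to.

    `coded_segments` is an iterable of (participant_id, code) pairs.
    Repeated use of a code within one participant is counted once.
    """
    seen = defaultdict(set)
    for participant, code in coded_segments:
        seen[code].add(participant)
    return {code: sorted(participants) for code, participants in seen.items()}


# Hypothetical coded segments.
segments = [
    ("P01", "C1.2-1"),
    ("P02", "C1.2-1"),
    ("P02", "C1.2-3"),
    ("P01", "C1.2-1"),
]
coverage = participants_per_code(segments)
for code, participants in sorted(coverage.items()):
    print(code, len(participants), participants)
```

Counts like these only indicate how widely a code is spread; whether a theme is meaningful remains an interpretive judgment.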
Codebook#
A codebook is a structured document that defines each code used in the analysis.
The high-level structure depends on the data collection approach, e.g., distinct question sections for a survey with multiple open-ended questions, or major topics for a recorded interview. At a lower level, codebooks are often structured hierarchically into sections, codes, and subcodes.
Research Artifact: Codebooks are considered research artifacts and usually need to be shared in publications or supplementary materials.
Your codebook does not need to be perfect during the research process, but it should be clear and well documented enough for others to follow by the time you submit your work for review.
Good practices:
- Store the codebook in a shared, versioned document (e.g., Google Doc).
- Include a version number and date in the header.
- Maintain a short changelog summarizing edits, merges, and additions.
- Highlight new or revised codes (commonly in yellow) for easy identification.
- Review changes collaboratively and/or announce them in a shared channel (e.g., Slack) to maintain consistency across coders.
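For teams that also keep a machine-readable copy of the codebook, the version number, date, and changelog from the list above could be tracked as follows. This is only a sketch under that assumption: the class names, version strings, and dates are hypothetical, and a shared document with a manually maintained header works just as well.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ChangelogEntry:
    version: str
    changed_on: date
    summary: str


@dataclass
class CodebookHeader:
    version: str
    updated_on: date
    changelog: list = field(default_factory=list)

    def record_change(self, new_version, changed_on, summary):
        """Append a changelog entry and bump the header version and date."""
        self.changelog.append(ChangelogEntry(new_version, changed_on, summary))
        self.version = new_version
        self.updated_on = changed_on


# Hypothetical versions and dates.
header = CodebookHeader(version="0.1", updated_on=date(2024, 1, 10))
header.record_change("0.2", date(2024, 2, 1), "Merged two overlapping codes; added C1.2-4")

print(header.version, header.updated_on.isoformat())
for entry in header.changelog:
    print(entry.version, entry.changed_on.isoformat(), entry.summary)
```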
Codebook Examples#
In the wild#
- Shared in the group drive
- OS Interviews: https://publications.teamusec.de/2022-oakland-sec-oss/files/code-book.pdf
- Industry Interviews: https://publications.teamusec.de/2023-oakland-oss-consumers/pdf/codebook.pdf
- Appendix of https://dwermke.com/pdf/conf-usenix-ramulu24.pdf
Code examples#
Minimal Example: Brief example for a general theme. At minimum you want:
- Section headers (so the codebook is easy to navigate)
- Code names and identifiers (makes discussion easier, especially with multiple coders and similarly named codes)
- Short descriptions (if not obvious from the code name)
- Subcodes (if applicable) and their identifiers
C1 Security Attitudes#
C1.2 Security Concerns
Subcodes:
- C1.2-1 data breaches
- C1.2-2 unauthorized access
- C1.2-3 surveillance
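The section, code, and subcode hierarchy of the minimal example can also be expressed as a small nested data structure. The sketch below assumes a single `Code` class is reused for all three levels; the class and the `flatten` helper are hypothetical, while the identifiers and names are taken from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class Code:
    identifier: str
    name: str
    description: str = ""
    subcodes: list = field(default_factory=list)


def flatten(code, depth=0):
    """Yield (depth, code) pairs for a code and all nested subcodes."""
    yield depth, code
    for sub in code.subcodes:
        yield from flatten(sub, depth + 1)


section = Code("C1", "Security Attitudes", subcodes=[
    Code("C1.2", "Security Concerns", subcodes=[
        Code("C1.2-1", "data breaches"),
        Code("C1.2-2", "unauthorized access"),
        Code("C1.2-3", "surveillance"),
    ]),
])

# Print the hierarchy with indentation per level.
for depth, code in flatten(section):
    print("  " * depth + f"{code.identifier} {code.name}")
```

Descriptions default to an empty string, which matches the advice that short descriptions are only needed when the code name is not self-explanatory.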
Full Example: A very detailed example for a specific theme. You likely don’t need this much detail for most codes; add it only if necessary (e.g., complex codes with many subcodes or confusing boundaries).
C1 Security Attitudes#
C1.2 Concerns
General code for security-related concerns expressed in responses to Q3, Q4, and Q5.
Subcodes currently include:
- C1.2-1 data breaches
- C1.2-2 unauthorized access
- C1.2-3 surveillance
- C1.2-4 malware and phishing
- Extend this list if you identify a new concern.
Source Questions: Q3, Q4, Q5 (the interview guide questions that map to this code)
Inclusion Criteria:
- Mentions of fear, unease, or distrust about security threats or vulnerabilities.
- Statements linking negative emotions to data handling, access control, or digital risks.
Exclusion Criteria:
- General privacy concerns without explicit security context (see C1.3).
- Positive trust statements about security systems.
- Abstract mentions of “risk” without emotional or personal framing.
Examples:
- “I’m concerned that my data could be leaked if the company gets hacked.” → C1.2-1 data breaches
- “I worry that someone could log into my account without my permission.” → C1.2-2 unauthorized access
- “I don’t trust apps that track everything I do online.” → C1.2-3 surveillance
Counterexample:
- “I think modern systems are generally secure enough for everyday use.” → Not coded here (see C1.3 Trust).
Coding: Add the corresponding subcode(s). If a concern involves multiple threat types (e.g., privacy and unauthorized access), assign all relevant subcodes.
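The coding rule above allows several subcodes per segment, so it can help to check assignments against the codebook before merging coders' work. This is a minimal sketch, assuming the C1.2 subcodes are kept in a dictionary keyed by identifier; the `validate_assignment` helper and the unknown identifier `C1.2-9` are hypothetical.

```python
# Subcodes of C1.2 as listed in the full example.
KNOWN_SUBCODES = {
    "C1.2-1": "data breaches",
    "C1.2-2": "unauthorized access",
    "C1.2-3": "surveillance",
    "C1.2-4": "malware and phishing",
}


def validate_assignment(codes):
    """Split assigned codes into (valid, unknown), each sorted.

    Unknown identifiers usually mean a typo or a new concern that should
    be added to the codebook first.
    """
    assigned = set(codes)
    known = set(KNOWN_SUBCODES)
    return sorted(assigned & known), sorted(assigned - known)


# A segment that touches two threat types receives both subcodes.
valid, unknown = validate_assignment(["C1.2-2", "C1.2-3"])
print(valid, unknown)

# "C1.2-9" is a hypothetical identifier that is not in the codebook.
valid_2, unknown_2 = validate_assignment(["C1.2-1", "C1.2-9"])
print(valid_2, unknown_2)
```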