PPQ#

A Problem, Purpose, (Research) Question (PPQ) document is an approach to writing a research plan. The document helps keep an overview of the project and provides a single source of truth for collaborators.

Often, it is the first non-draft document established for a project and serves as the seed for later, more detailed documents such as the Experiment Guide or the Data Analysis Plan.

Many of a PPQ’s sections correspond directly to sections in the resulting publication.

Problem#

The problem statement illustrates what’s wrong, what’s missing, what we don’t know or need to know better, and what needs to be done.

Example: Problem

The trust in open source software is enormous. In 2020, 95% of IT departments and companies considered open source software strategically important to their organization’s overall enterprise infrastructure software strategy. Open source has advantages compared to proprietary software; a strong argument for open source software is that everyone can, in theory, read the source code and analyze an OSS project’s security. Some OSS projects make the results of security audits public and deploy measures to improve code security. The Linux Foundation provides a guideline to improve trust and security in open source projects. However, incidents in the past illustrate that OSS may provide a false sense of security.

The Heartbleed bug in the OpenSSL library was introduced into the software in 2012 and publicly disclosed in April 2014. A buffer over-read vulnerability allowed attackers to access values in the memory of web servers all over the Internet. The consequences were disastrous and affected thousands of computer systems and the data of billions of users worldwide. However, it was likely caused by a lack of attention from a single developer and passed all security measures that were in place in the OpenSSL open source project.

An incident in 2018 involving the event-stream library, which is distributed through the npm package registry for the JavaScript programming language, demonstrated the feasibility of injecting malicious code into OSS projects. A malicious committer intentionally modified event-stream to depend on another malicious package that was specifically crafted for the attack.

The most recent episode is a questionable scientific experiment conducted by a team of computer security researchers from the University of Minnesota in the U.S. They intended to investigate the feasibility of introducing vulnerable commits to the Linux kernel. While they could demonstrate that it was possible to trick the Linux kernel developers into accepting their malicious code, their experimental design led to heated debates in the Linux kernel development and academic computer security research communities, ultimately resulting in a ban on contributions from the University of Minnesota to the Linux kernel and a withdrawn IEEE S&P’21 paper.

The above incidents led to vulnerable OSS projects and illustrate that blindly trusting OSS might lead to severe consequences. However, to date it is mostly unknown which security measures, formal and visible as well as hidden and invisible, OSS projects deploy to protect their users.

Purpose#

A purpose statement conveys the overall intent of your research in a single sentence. It indicates what you intend to accomplish and establishes the central direction for your research. The purpose statement should flow logically from your Problem Statement. Typically, the purpose is to do one of the following:

  • Achieve some set of objectives
  • Answer some set of questions
  • Test some set of hypotheses

Example: Purpose
We aim to shed light on trust and security practices in OSS projects by evaluating both visible and invisible security measures. We base our investigation on the guideline for “Improving Trust and Security in Open Source Projects” by the Linux Foundation.

Novelty#

The novelty statement is a subset of the purpose statement. It highlights why the intended direction is novel research.

Example: Novelty
We are the first to systematically investigate both technical and non-technical measures OSS projects employ to improve security and establish trust.

Research Questions#

Purpose statements are realized through a set of research objectives or questions. Clearly formulated questions will help focus your research and guide your methodology.

Research questions serve to narrow the purpose statement and are linked directly to the research findings. Research questions (or objectives) are the specific things you will achieve (or questions you will answer) in your research in order to accomplish your overall purpose.

Typically, publications in usable security & privacy consider no more than 2 to 3 main research questions.

Example: Research Questions
  1. “How are open source projects structured behind-the-scenes?”

    Due to their community-driven nature, open source projects include structures and processes that are not inherently visible on a repository level. We are interested in the why and how of behind-the-scenes interactions and decisions, especially in the context of security and trust.

  2. “Do open source projects provide guidance and policies, and if so, which ones?”

    Frequently changing contributors and loose team structures lead to challenges in distributing project-internal knowledge in open source projects. We want to examine guidance and (security) policies provided by open source projects of any size, as well as identify their established roles and responsibilities.

  3. “How do open source projects approach security and trust challenges?”

    Open source projects face unique challenges in terms of security and trust due to their open nature, including code submissions from mostly unknown entities. We are interested in what organizational and technical measures open source projects employ to establish trust between contributors and how they react or plan to react to arising security and trust challenges.

Additional Sections#

Some additional sections can appear in a PPQ, mostly as an extension of the PPQ or as an overlap with later documents such as the Alignment or Data Analysis Plan.

Methodology#

Example: Methodology
  • Mixed-methods:
    • Analyze OSS project repositories/commits
      • (SC-1) Accept Pull Requests only
      • (SC-2) Use of protected branches
      • (SC-3) Use digitally signed commits
      • (SC-4) Assign security issues immediately
      • (SC-5) Use private issues for security issues
      • (SC-8) Use a secrets management system
    • Analyze OSS project documentation
      • (SP-1) Security policy availability
      • (SP-2) Security readme files with security policy available
      • (KYC-4) Has a contributor license agreement
      • (KYC-5) Publish list of contributors and their contributions
    • Conduct interviews with project owners/leads
      • How does the OSS project establish trust for commits/committers?
        • (KYC-1) Verify identity of all contributors
        • (KYC-2) Require strong authentication for commits
        • (KYC-3) Role based access control / least privilege principle
        • (SC-1) Accept Pull Requests only
        • (SC-2) Use of protected branches
        • (SC-3) Use digitally signed commits
        • (SC-4) Assign security issues immediately
        • (SC-5) Use private issues for security issues
        • (SC-8) Use a secrets management system
      • How does the OSS project handle untrustworthy commits/committers?
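
The documentation criteria above (SP-1, SP-2, KYC-4) lend themselves to simple automated presence checks on a local repository checkout. The following is a minimal sketch under stated assumptions, not the study's actual tooling: the function names `find_doc` and `check_documentation`, the searched directories, the file stems (`SECURITY`, `README`, `CLA`), and the keyword heuristic for SP-2 are all hypothetical choices made for illustration. Criteria that depend on platform settings or interviews (for example SC-2 or KYC-1) are not covered.

```python
from __future__ import annotations

from pathlib import Path

# Assumed heuristics for illustration only; the guideline does not
# prescribe these file names, extensions, or locations.
DOC_EXTENSIONS = ("", ".md", ".txt", ".adoc", ".rst")
SEARCH_DIRS = ("", ".github", "docs")


def find_doc(repo_root: Path, stem: str) -> Path | None:
    """Return the first file named `stem` plus a known extension, or None.

    File names are compared case-insensitively.
    """
    wanted = {(stem + ext).lower() for ext in DOC_EXTENSIONS}
    for sub in SEARCH_DIRS:
        directory = repo_root / sub
        if not directory.is_dir():
            continue
        for entry in sorted(directory.iterdir()):
            if entry.is_file() and entry.name.lower() in wanted:
                return entry
    return None


def check_documentation(repo_root: Path) -> dict[str, bool]:
    """Run presence checks for the documentation-based criteria."""
    readme = find_doc(repo_root, "README")
    readme_text = ""
    if readme is not None:
        readme_text = readme.read_text(encoding="utf-8", errors="ignore").lower()
    return {
        # SP-1: a dedicated security policy file exists.
        "SP-1": find_doc(repo_root, "SECURITY") is not None,
        # SP-2 (rough proxy): the README mentions security at all.
        "SP-2": "security" in readme_text,
        # KYC-4 (rough proxy): a contributor license agreement file exists.
        "KYC-4": find_doc(repo_root, "CLA") is not None,
    }
```

Results from such checks are only a first pass; a positive or negative hit would still need manual confirmation, since projects may host policies outside the repository.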

Data/Sample#

Example: Data/Sample

OSS projects:

  • Get project list from GitHub/GitLab/OpenHub
    • random/stratified sampling
      • Activity
        • At least 20 committers in the last 6 months
        • At least 40 commits in the last 6 months
      • New member onboarding
        • At least one new committer last month
      • Popularity
        • stars/forks
      • Content:
        • Languages
        • Topics
      • Additionally for later? CONTRIBUTING.md, README.md (might need to consider also .txt, .adoc, .rst) → some ML/clustering?
    • Huge sample for all the things we can do automatically
    • Smaller sample for all the things that require manual analysis
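
The activity and onboarding thresholds above can be applied as a filter before drawing a stratified sample. The sketch below assumes project metadata has already been collected; the `Project` record, its field names, and the choice of programming language as the stratum are hypothetical and only illustrate the idea. Other strata (popularity, topics) would work the same way.

```python
from __future__ import annotations

import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    """Hypothetical metadata record for one OSS project."""
    name: str
    language: str
    committers_6m: int      # distinct committers in the last 6 months
    commits_6m: int         # commits in the last 6 months
    new_committers_1m: int  # first-time committers in the last month


def is_eligible(p: Project) -> bool:
    """Apply the activity and onboarding thresholds from the list above."""
    return (
        p.committers_6m >= 20
        and p.commits_6m >= 40
        and p.new_committers_1m >= 1
    )


def stratified_sample(
    projects: list[Project], per_stratum: int, seed: int = 0
) -> list[Project]:
    """Draw up to `per_stratum` eligible projects per language."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    strata: dict[str, list[Project]] = defaultdict(list)
    for p in projects:
        if is_eligible(p):
            strata[p.language].append(p)
    sample: list[Project] = []
    for language in sorted(strata):
        group = strata[language]
        sample.extend(rng.sample(group, min(per_stratum, len(group))))
    return sample
```

A large `per_stratum` value would yield the big sample for automated analysis, and a small one the subset for manual analysis.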

Interviewees:

  • Get project owners/leads from GitHub/GitLab projects
    • Goal: ~15 interviewees
    • Sample for diversity:
      • Parameters: OSS project age, # of contributors, popularity, security focus, ???
  • Initial contacts
    • [REDACTED]

Limitations#

Example: Limitations
  • Self-report biases
    • Over- & under-reporting
    • Sample bias
    • Social-desirability bias
  • Convenience sample
    • Participants not necessarily representative of OSS contributors
    • Participants that agree to speak with us could be more/less security-conscious than the average developer.
  • Languages: English and German
    • No insight into communities that use neither English nor German (negligible, as English is the “working language” in OSS)
  • Sensitive questions regarding security and trust incidents
    • Highlight to participants that we are not judging security, just interested in their opinion.
    • Emphasize that participants can skip or terminate at any time.

Related Work#

Example: Related Work

Security of OSS projects

  • On the Feasibility of Stealthily Introducing Vulnerabilities in Open-Source Software via Hypocrite Commits
  • A Large-Scale Study of Modern Code Review and Security in Open Source Projects
  • The Sound of Silence: Mining Security Vulnerabilities from Secret Integration Channels in Open-Source Projects
  • Learning secure programming in open source software communities: a socio-technical view
  • Why modern open source projects fail

Trust

  • How can contributors to open-source communities be trusted? On the assumption, inference, and substitution of trust
  • Trust in Open Source Software Development Communities: A Comprehensive Analysis
  • Trust issues in open source software development
  • The role of trust in OSS communities — Case Linux Kernel community

Template#

Generic Template
# PPQ

## Problem

## Purpose

### Novelty

## Research Question(s)

1.
2.
3.

## Methodology

## Data/Sample

## Limitations

## Related Work

Worked Example#

Example: The example below is for a hypothetical, exploratory study with IoT developers. Experiments would likely include scraping related GitHub repositories and a mix of a large-scale survey and in-depth interviews.

Problem#

Technology under the wide umbrella of IoT has seen a massive surge in the last few years (from smart speakers to climate control).

At the same time, entering the IoT developer job market is probably easier than ever before, thanks to a huge selection of taught courses and programming libraries available.

Security & privacy in IoT development is relevant because:

  • Ease of access through free online courses & generally high demand.
  • With ease of access comes a high impact on security and privacy: for IoT devices, selecting the infrastructure supplier and writing code within the platform’s constraints is the hard part, relegating security to a second-class concern (and privacy even lower).
  • Add in the generally low specifications of IoT devices and a “Just ship it” attitude common in many startups, and you end up with a dangerous mix of overconfidence and privacy violations.

Purpose#

We investigate how developers interact with IoT in the context of security and privacy (sec/priv). We aim to shed light on what developers know about IoT sec/priv, what they do to improve sec/priv for their platforms, and their experiences with sec/priv incidents for their shipped models.

Novelty#

In the field of traditional software security, developers are emerging as a main focus for researchers, as empowering developers can have a large effect on the security & privacy of their software.

For IoT platforms, the context changes: developers no longer focus on code security, but on data acquisition, staying within low specifications, and interfacing with out-of-date APIs. This research would be fairly novel, as not many researchers have looked at helping IoT developers with security & privacy, especially not from a usability viewpoint.

Questions#

Research Questions: The questions below include sub-questions & thoughts. For the actual publication, the 3 main questions will likely serve as the structural guide in intro and discussion.
  1. What do developers know about IoT code, and how do they interact with it? (“Who are they?”)
    1. Language/Library usage
    2. Tool usage
    3. (Enhance with GitHub demographics)
  2. Do developers try to develop secure & private IoT code, and if so, how? (“How do they do it?”)
    1. How do they interact with IoT platforms?
      1. Development practices
      2. What are their threat models?
    2. What do they do for security/privacy?
      1. Differential privacy?
      2. Data collection?
      3. Secure protocols?
  3. What are developers’ opinions and misconceptions about security & privacy on IoT platforms? (“What do they think?”)
    1. What do IoT developers think about security/privacy?
    2. What do they know about attacks against IoT platforms?
    3. Expected (future) development of IoT.