Sarah Scheffler

About me

I will be an assistant professor at Carnegie Mellon University starting in Fall 2024! I am jointly appointed between Software and Societal Systems (in the School of Computer Science) and Engineering and Public Policy (in the College of Engineering). If you are interested in joining me as a student in the Fall of 2024, please first poke around this page to see what kind of work I do, and apply to either department at CMU. Please also feel free to email me at sscheff AT csail DOT mit DOT edu. My ideal student has a strong background or interest in cryptography and its intersection with societal, legal, and policy issues.

I am currently a postdoctoral research associate at MIT's Internet Policy Research Initiative. Formerly, I was a postdoctoral research associate at Princeton University's Center for Information Technology Policy.

I am a studying applied cryptographer working at the intersection of computer science, policy, and law. My interdisciplinary work includes policy and technical analysis of end-to-end encrypted content moderation, compelled decryption, privacy policies, and autonomous weapon systems. I also do "pure" applied cryptography, including work on zero-knowledge proofs, multi-party computation, private set intersection, and hash combiners.

I obtained my Ph.D. from Boston University, advised by Prof. Mayank Varia, in 2021. During my time at BU I was a Ph.D. student in the BUsec group, I organized the BUsec Seminar for security, cryptography, and privacy, as well as the Multi-Party Computation Reading Group, and I was an active member of the Cyber Security, Law, and Society Alliance.



Publications

Group Moderation under End-to-End Encryption
Sarah Scheffler, Jonathan Mayer
ACM CS/Law 2024 (Forthcoming)

SoK: Content Moderation for End-to-End Encryption
Sarah Scheffler, Jonathan Mayer
PoPETS 2023

Public Verification for Private Hash Matching
Sarah Scheffler, Anunay Kulshrestha, Jonathan Mayer
IEEE S&P 2023

Can the government compel decryption? Don't trust --- verify
Aloni Cohen, Sarah Scheffler, Mayank Varia
ACM CS/Law 2022

Formalizing Human Ingenuity: A Quantitative Framework for Copyright Law's Substantial Similarity
Eran Tromer, Sarah Scheffler, Mayank Varia
ACM CS/Law 2022

TurboIKOS: Improved Non-interactive Zero Knowledge with Sublinear Memory
Yaron Gvili, Julie Ha, Sarah Scheffler, Mayank Varia, Ziling Yang, Xinyuan Zhang
ACNS 2021

BooLigero: Improved Sublinear Zero Knowledge Proofs for Boolean Circuits
Yaron Gvili, Sarah Scheffler, Mayank Varia
Financial Crypto 2021

Protecting Cryptography against Compelled Self-Incrimination
Sarah Scheffler, Mayank Varia
USENIX Security 2021

Arithmetic Expression Construction
Leo Alcock, Sualeh Asif, Jeffrey Bosboom, Josh Brunner, Charlotte Chen, Erik D. Demaine, Rogers Epstein, Adam Hesterberg, Lior Hirschfeld, William Hu, Jayson Lynch, Sarah Scheffler, Lillian Zhang
ISAAC 2020

Case Study: Disclosure of Indirect Device Fingerprinting in Privacy Policies
Julissa Milligan, Sarah Scheffler, Andrew Sellars, Trishita Tiwari, Ari Trachtenberg, Mayank Varia
STAST 2019

PSPACE-completeness of Pulling Blocks to Reach a Goal
Joshua Ani, Sualeh Asif, Erik D. Demaine, Yevhenii Diomidov, Dylan Hendrickson, Jayson Lynch, Sarah Scheffler, Adam Suhl
JCDCG^3 2019 / JIP 2020

From Soft Classifiers to Hard Decisions: How fair can we be?
Ran Canetti, Aloni Cohen, Nishanth Dikkala, Govind Ramnarayan, Sarah Scheffler, Adam Smith
ACM FAT* 2019

The Unintended Consequences of Email Spam Prevention
Sarah Scheffler, Sean Smith, Yossi Gilad, Sharon Goldberg
PAM 2018



Current Research

Transparency and Public Verifiability for Content Moderation in End-to-End Encryption

It is possible to build systems that analyze the content of end-to-end encrypted messaging, either by observing the content on the client side, or by obliviously processing it on the server side. However, these systems bring both technical and policy challenges for ensuring that these systems are used as advertised and without enabling mass surveillance or infringing on civil rights. Technical improvements are possible for a limited number of these issues, and the rest must be addressed at a legal or policy level.

Enabling Journalistic, Social, and Scientific Research with Applied Privacy-Preserving Computing

The last decade has seen an explosion in cryptographic and statistical tools for privacy-preserving computation: multi-party computation, differential privacy, homomorphic encryption, and (sort of) functional encryption. But these tools remain largey confined to theoretical proposals, or, at best, unwieldy codebases with high learning curves. I aim to bring this privacy-preserving computation to particular users who need high levels of privacy in their analysis, especially journalists wishing to protect sources, researchers needing to comply with privacy protocols. Actual journalists will be using the privacy-preserving functionality soon, via the Digital Witness Lab!

Choosing the right privacy-enhancing tool for the privacy-enhancing job

We have finally started to see some adoption of privacy-enhancing technologies like MPC, differential privacy, and homomorphic encryption at scale. But the deployment of these technologies does not always mean that the privacy problem is solved. What privacy gaps remain? How do we see these systems developing into the future?

Formalization of legal copyright notions

I have one minor contribution in the space of understanding U.S. Copyright's notion of "substantial similarity" a little better.



Older Projects



Improving Sublinear Zero Knowledge Proofs for Boolean Circuits without Assumptions

By combining existing ZK proof techniques with additional ``tests'' for checking the soundness of additional relations, we create a new ZK proof system for Boolean circuits or circuits in GF(2^k) that achieves a Ligero-style proof that is smaller than others in the same category of proofs that do not rely on the discrete log assumption and that do not have prohibitive prover or verifier runtime.



Protecting Cryptography from Self-Incrimination

paper

Technical analysis of the current state of affairs as to whether various cryptographic objects can be compelled in a government subpoena as a "foregone conclusion" exception to the U.S. 5th Amendment right against self-incrimination.

Resilient Password-Based Key Derivation Functions

paper / poster

Older password-based key derivation functions like PBKDF2 rely on repeated iteration of a single hash function to force the attacker to spend more resources. But thanks to Bitcoin, the cost of specialized hardware to do small, repeated functions, has gone down dramatically. Newer PBKDFs like scrypt add memory as a resource that attackers must spend in order to compute efficiently. We extend this resource consumption model to a PBKDF that consumes many resources, like CPU, storage, cache, or chip access in order to correctly derive the key from the password. Paper is forthcoming.



Privacy Against Inference-Based Device Fingerprinting

paper

Device fingerpinting methods are employed by websites in order to identify unique devices. The older, "traditional" device fingerprinting methods involve direct requests for information from a client's machine: measuring information in the HTTP request header, sending a cookie, or embedding additional web requests (e.g. for invisible pixels). All this information can be identified and blocked from being sent, in order to preserve user privacy. However, the newer wave of device fingerprinting methods use indirect approaches, by asking the user to perform a seemingly irrelevant computation and then gleaning information from the result. These methods are more insidious, more difficult to block, and somewhat sneaky in the sense that even an expert client may not be able to tell that it's happening. These methods are only described in general terms in privacy policies. More research is necessary to determine how to address these methods in a way that respects consumer privacy.



Proposing Safeguards for Government Risk-Assessment Systems

paper

This paper analyzes governmentally-regulated risk assessment systems by evaluating them on three axes: We examine the costs of the systems on individuals, the system holders, and society, we analyze the inputs to the system, and we describe the transparency (or lack thereof) within the systems. Using three case studies—the Unified Passenger system (UPAX), the COMPAS Risk & Need Assessment System, and the FICO score—we develop a standardized set of potential technical requirements to mitigate abuse and ensure individuals are treated fairly while remaining within the constraints levied by the system’s purpose.



Dismantling the False Distinction between Traditional Programming and Machine Learning in Lethal Autonomous Weapons

paper / poster

Contrary to my expectations at the start of the project, the type of programming used to create lethal autonomous weapons does not inherently affect their ability to comply with International Humanitarian Law. Traditional programming, machine learning, and artificial intelligence are distinct, overlapping techniques in programming autonomous weapons, and the use of one technique over another should not affect the standard used to determine whether a given lethal autonomous weapon complies with the Law of Armed Conflict. Rather, the same (strict) standards should apply to all lethal autonomous weapons, and their outward performance in accordance with the law should be the sole determinant of legality.



From Soft Classifiers to Hard Decisions: How Fair Can We Be?

paper / poster

Question: When your machine learning algorithm is "calibrated" for different protected groups, can this calibrated score be post-processed in a "fair" way? Answer: In general, no. But you can achieve some partial fairness properties (such as equalizing the positive predictive value across groups), or you can defer on some inputs and guarantee good expected fairness properties for the non-deferred outputs. Paper and poster forthcoming.



The Unintended Consequences of Email Spam Prevention

paper / website / talk / poster

To combat DNS cache poisoning attacks and exploitation of the DNS as an amplifier in DoS attacks, many recursive DNS resolvers are configured as "closed" and refuse to answer queries made by hosts outside their organization. This work presents a technique to induce DNS queries within an organization, using the organization's email service and the Sender Policy Framework (SPF) email spam-checking mechanism. We use this technique to study closed resolvers, verifying that most closed DNS resolvers have deployed common DNS poisoning defense techniques, but showing that SPF is often deployed in a way that allows an external attacker to cause the organization's resolver to issue numerous DNS queries to a victim IP address by sending a single email to any address within the organization's domain.



Proactively-secure Accumulo with Cryptographic Enforcement

code

At the MIT Lincoln Laboratory, as assistant research staff, I worked in the Secure and Resilient Systems and Technology group within the Cybersecurity and Information Sciences division to assist in the implementation, testing, and release of a library that adds confidentiality and integrity guarantees to the Accumulo database, protecting it against a malicious server or sysadmin. Earlier in the project, I also implemented Oblivious RAM (Path ORAM) for Accumulo.



Quantifying Latent Fingerprint Quality

talk

As a capstone project at Harvey Mudd College, I worked with a team of four students for the MITRE Corporation on a project to design, implement, and test a system that uses image processing and machine learning techniques to evaluate the suitability of crime scene fingerprint images for identification by Automated Fingerprint Identification Systems.



Statistical Testing of Cryptographic Entropy Sources

talk

As a summer undergraduate research fellow at the National Institute of Standards and Technology (NIST), I worked with Dr. Allen Roginsky in the Computer Security Division to improve NIST's statistical tests for entropy sources for use in cryptographic random number generators. I also made adjustments to the process for generating large primes used in cryptography.