Research area

Privacy-Preserving AI

Mapping vague intent into a structured booking, then matching it to the right service, all without storing raw interaction histories. Joint intent extraction, probabilistic reasoning, statistical aggregates, differential privacy, synthetic data.

Privacy-Preserving AI research at Fixmeapp sits at the seam between intent and matching. How does a system understand what a user wants from vague natural language? How does it find the right service for them, accurately, without compiling a detailed profile? Two active KTH theses run in parallel here: one on dual-layer intent inference and probabilistic reasoning, and one on the privacy-engineering techniques that make the rest of it operational.


Thesis 01

Intent Inference and Probabilistic Reasoning AI

KTH, Royal Institute of Technology · 2026 · In progress

The dual-layer inference framework pairs a language-level extractor, which performs joint intent and slot modeling on raw text, with a probabilistic reasoner that maintains a belief distribution over plausible goal hypotheses. Rather than asking six follow-up questions, the system reasons internally: it estimates the likelihood that extracted variables match system states, quantifies uncertainty under partial observability, and iteratively narrows the hypotheses until a confident booking decision emerges.
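
The belief-narrowing loop described above can be illustrated with a plain Bayesian update. This is a minimal sketch, not the thesis implementation: the function names, the hypothesis labels, and the 0.8 confidence threshold are all assumptions made for illustration.

```python
from typing import Dict, Optional


def update_belief(prior: Dict[str, float],
                  likelihood: Dict[str, float]) -> Dict[str, float]:
    """Bayes' rule: posterior is proportional to prior times likelihood.

    `likelihood[h]` is P(observed slot values | hypothesis h). Hypotheses
    missing from `likelihood` are treated as incompatible with the evidence.
    """
    unnormalized = {h: p * likelihood.get(h, 0.0) for h, p in prior.items()}
    total = sum(unnormalized.values())
    if total == 0.0:
        # The evidence contradicts every hypothesis; keep the prior unchanged
        # rather than dividing by zero.
        return dict(prior)
    return {h: p / total for h, p in unnormalized.items()}


def confident_hypothesis(belief: Dict[str, float],
                         threshold: float = 0.8) -> Optional[str]:
    """Return the top hypothesis once its probability reaches the threshold
    (an illustrative value), otherwise None to signal "keep reasoning"."""
    best = max(belief, key=belief.get)
    return best if belief[best] >= threshold else None


# Hypothetical goal hypotheses for a vague request.
belief = {"haircut": 1 / 3, "beard_trim": 1 / 3, "hair_coloring": 1 / 3}
# Each extracted slot contributes one likelihood table (values are made up).
for evidence in (
    {"haircut": 0.7, "beard_trim": 0.2, "hair_coloring": 0.1},
    {"haircut": 0.6, "beard_trim": 0.3, "hair_coloring": 0.1},
):
    belief = update_belief(belief, evidence)
decision = confident_hypothesis(belief)
```

Each pass through the loop sharpens the distribution; a booking decision is emitted only once one hypothesis dominates.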

The work engages closely with the active-clarification literature (Ask-to-Clarify; AT-CoT by Tang & Soulier; BALI by Ghose et al.) but leans architecturally toward minimizing user disruption: clarification only when expected information gain exceeds interaction cost (the dual-mode formulation of Fang et al.). Token-level confidence calibration (FineCE, Han et al.) mitigates hallucinated overconfidence; the Planner-Composer-Evaluator (PCE) framework structures fragmented implicit assumptions into an evaluable probabilistic decision tree; counterfactual reasoning (CRED, Tung et al.) generates "what-if" hypotheses to proactively narrow the belief distribution.
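
The "ask only when expected information gain exceeds interaction cost" rule can be sketched as an entropy calculation. This is an illustrative formulation, not the one used by Fang et al. or by the thesis: measuring the interaction cost in bits, and the names `expected_information_gain` and `should_ask`, are assumptions.

```python
import math
from typing import Dict


def entropy(belief: Dict[str, float]) -> float:
    """Shannon entropy of a belief distribution, in bits."""
    return -sum(p * math.log2(p) for p in belief.values() if p > 0.0)


def expected_information_gain(
    belief: Dict[str, float],
    answer_likelihoods: Dict[str, Dict[str, float]],
) -> float:
    """Expected entropy reduction from asking one clarifying question.

    `answer_likelihoods[answer][h]` is P(answer | hypothesis h).
    """
    expected_posterior_entropy = 0.0
    for likelihood in answer_likelihoods.values():
        p_answer = sum(belief[h] * likelihood.get(h, 0.0) for h in belief)
        if p_answer == 0.0:
            continue  # this answer can never occur under the current belief
        posterior = {
            h: belief[h] * likelihood.get(h, 0.0) / p_answer for h in belief
        }
        expected_posterior_entropy += p_answer * entropy(posterior)
    return entropy(belief) - expected_posterior_entropy


def should_ask(belief: Dict[str, float],
               answer_likelihoods: Dict[str, Dict[str, float]],
               interaction_cost_bits: float) -> bool:
    """Ask only if the expected gain strictly exceeds the cost of asking.
    Expressing the cost in bits is an assumption of this sketch."""
    gain = expected_information_gain(belief, answer_likelihoods)
    return gain > interaction_cost_bits
```

With a uniform belief over two hypotheses, a question that perfectly separates them is worth one bit, while a question whose answers are equally likely under both is worth nothing and would never be asked.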

The same architecture stores user history as statistical aggregates rather than retaining raw dialogue or personally identifying information. Empirical work from the LMP2 audit framework (Staufer) shows large language models memorize sensitive personal data by default — the matching architecture here is designed to make that memorization structurally impossible. The thesis studies dual-tower retrieval (PrivGemo-style, keeping raw user knowledge graphs strictly local while transmitting only de-identified views to cloud reasoners), temporal decay of stale preferences (Stability and Safety Governed Memory, Lam et al.), and Bayesian trust scoring against memory poisoning.
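
One way to picture "statistical aggregates with temporal decay" is a per-category weight that halves after a fixed interval, so no individual interaction is ever stored. This is a simplified sketch and not the SSGM mechanism itself: the class name, the 30-day half-life, and the category labels are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DecayedPreferences:
    """Keeps only decayed per-category weights, never the raw events."""
    half_life_days: float = 30.0  # illustrative value
    weights: Dict[str, float] = field(default_factory=dict)
    last_update_day: float = 0.0

    def _decay_to(self, day: float) -> None:
        """Shrink all weights by 0.5 for every elapsed half-life."""
        elapsed = day - self.last_update_day
        if elapsed > 0:
            factor = 0.5 ** (elapsed / self.half_life_days)
            self.weights = {k: w * factor for k, w in self.weights.items()}
            self.last_update_day = day

    def observe(self, category: str, day: float) -> None:
        """Fold one booking into the aggregate; the event is then discarded."""
        self._decay_to(day)
        self.weights[category] = self.weights.get(category, 0.0) + 1.0

    def distribution(self, day: float) -> Dict[str, float]:
        """Normalized preference estimate as of `day`."""
        self._decay_to(day)
        total = sum(self.weights.values())
        if total == 0.0:
            return {}
        return {k: w / total for k, w in self.weights.items()}


prefs = DecayedPreferences(half_life_days=30.0)
prefs.observe("massage", day=0)
prefs.observe("haircut", day=30)   # the massage weight has halved by now
current = prefs.distribution(day=30)
```

An aggregate like this reduces what is retained, but on its own it carries no formal privacy guarantee; that is where techniques such as differential privacy come in.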

Thesis 02

Privacy Engineering

KTH, Royal Institute of Technology · 2026 · In progress

This thesis tackles the techniques side of privacy-preserving AI: anonymization, differential privacy, and synthetic data generation. The goal is a privacy-engineering toolkit applicable across the platform — from analytics to recommendations to research datasets — without exposing the individuals behind the signals.

On the architectural side it studies design principles for a centralized data-sharing platform on a social-booking surface: privacy-by-design, GDPR alignment, and the specific system components needed to make opt-in meaningful in production rather than only at prototype scale. It investigates how privacy-engineering principles translate from the research literature into a real architecture, and how ε-differential privacy, anonymization patterns, and synthetic data generation combine into a production-grade toolkit.
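
As a concrete instance of ε-differential privacy, the classic Laplace mechanism releases a count with noise of scale sensitivity/ε. The sketch below uses only the Python standard library and is meant as an illustration, not as the toolkit's implementation; `private_count` is a hypothetical name. Naive floating-point Laplace sampling has known weaknesses, so a production system would normally rely on a vetted differential-privacy library.

```python
import random
from typing import Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two independent
    exponential variables with rate 1/scale."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(true_count: float,
                  epsilon: float,
                  rng: Optional[random.Random] = None,
                  sensitivity: float = 1.0) -> float:
    """Release a count under epsilon-differential privacy.

    `sensitivity` is the most one individual can change the count
    (1 for a simple counting query). Smaller epsilon means more noise.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical analytics query: bookings in one category this week.
noisy_bookings = private_count(true_count=128, epsilon=0.5)
```

The released value is unbiased, so aggregate analytics remain usable, while any single user's presence shifts the output distribution by at most a factor of e^ε.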

Across the theses in this area

Key themes

Joint intent + slot modeling from vague natural language

Belief distributions over goal hypotheses (POMDP-style reasoning)

Cost-aware clarification: ask only when uncertainty reduction > interaction cost

Counterfactual hypothesis generation (CRED-style preference learning)

Token-level confidence calibration (FineCE / Monte Carlo sampling)

Statistical-aggregate representations over raw interaction logs

Temporal decay of stale user preferences (SSGM-style memory governance)

Dual-tower retrieval: local user graphs + anonymized cloud reasoning (PrivGemo)

Differential privacy & synthetic data generation as production techniques

Empirical privacy auditing of LLM memorization (LMP2 framework)


Related research areas

Opt-in Layer

Consent as granular, revocable tokens. Local-first architectures. User-driven data flows that providers and customers can verify and reset themselves — including withdrawal from data-marketplace participation at any time.

Ethical Data Economy

Cryptographically verifiable, incentivized data contribution — Merkle-tree-based proof-of-concept where users own their data, choose what to share, and earn rewards for participation. Built on the FACT principles.
