Senior Data Engineer, Princeton Accelerator (Contract)
Building the Telescope into the Information Ecosystem
The Opportunity
The Accelerator at Princeton’s School of Public and International Affairs (SPIA) is building a first-of-its-kind research platform—a living dataset of social media activity that helps researchers understand how platforms like Telegram and YouTube shape the information environment.
This is greenfield work. We're designing novel data infrastructure from scratch—solving problems without established playbooks, building systems that will define how researchers access social media data at scale.
We're looking for a senior-level data engineer to expand and improve our critical data infrastructure: owning architectural decisions, tuning pipelines for performance and cost, expanding our ingestion capabilities (transcripts, sampled media, ML-ready datasets), and establishing engineering standards.
The datasets you help build will accelerate research on pressing social science questions about how social media shapes our collective information ecosystem.
The Work
Pipeline Performance & Expansion
Optimize existing Databricks pipelines for cost and performance (we're a lean research operation and DBU-conscious)
Expand ingestion scope: YouTube transcripts, multimedia sampling, new metadata sources
Design for ML readiness—researchers and data scientists need gold-layer datasets they can query and model against
Engineering Standards & CI/CD
Standardize our CI/CD pipelines and integrate them into the product workflow
Establish patterns for testing, deployment, and monitoring that the team can sustain
Document what you build so it outlasts your engagement
Team & Researcher Support
Work alongside a small engineering team—your experience and mentorship matter
Collaborate with researchers to understand what they need from the data
Communicate clearly; we value engineers who can translate technical tradeoffs for non-technical stakeholders
What We're Looking For
Technical leader who can own architectural decisions
Deep Databricks experience: PySpark, Delta Lake, cost optimization, cluster tuning
Experience designing and building pipelines at scale (10+ TB)
Deep experience with CI/CD, testing, and what "maintainable" actually means
Mentor mindset
Clear communicator who can work with researchers and junior engineers alike
Extras
Experience with social media data, APIs, or unstructured text at scale
Background in research or academic environments
Interest in the societal impact of your engineering work
Experience with Azure and GitHub Actions
Structure
~30 hours/week, January through June 2026
Remote; the engineering and product team is based in the Atlanta, GA area, with research and operations in Princeton, NJ
Reports to Director of Engineering
Competitive rate commensurate with experience
To Apply
Formal proposals are not required; just send the following:
Your resume (2 pages max, please)
A few sentences on why this interests you
One example of a pipeline or system you're proud of building. A link to a GitHub repo or website would be great.
Email: info@researchaccelerator.org
Deadline: ASAP
Source: Idealist