
Senior Software Engineer, Data (Seattle, WA)

Allen Institute for AI (Ai2)
Posted 2026-06-04

Role: Senior Software Engineer, Data
Location: Seattle, WA

Role Overview:
The Allen Institute for AI (Ai2) is hiring a Senior Data Engineer to build the data infrastructure behind AI research agents that explore and reason over scholarly literature. You'll work on the Semantic Scholar corpus, expanding what it covers and improving the quality of what’s already there, and create the APIs and tooling that these agents rely on at scale. This role sits at the intersection of data engineering and applied ML. You'll own pipelines, design schemas, and ship production services, but you'll also apply practical ML techniques (entity resolution, text classification, embedding-based similarity) to improve data quality and enrich metadata at scale, directly shaping what the agents can do.

The Agentic Applications team builds open, production-grade systems that power scientific discovery and large-scale AI research. We focus on creating high-quality structured datasets, integrating diverse content types, and enabling downstream applications across search, citation analysis, and model training. The team combines strong engineering practices with close collaboration across Ai2’s product and research orgs to deliver tools and infrastructure used by millions of researchers and developers worldwide.

Responsibilities:
- Improve the coverage and quality of the Semantic Scholar corpus across academic papers, patents, and new domain-specific datasets
- Build and maintain scalable data pipelines for corpus integration, citation resolution, and metadata enrichment
- Develop and deploy ML models for entity disambiguation, author linking, and topic classification
- Design and extend APIs that expose structured scholarly data to academic researchers and AI agent workflows
- Contribute to dashboards and tools for evaluating data quality and model precision
- Collaborate across engineering and research teams to ensure maintainability, test coverage, and robust deployment

Requirements and Qualifications:
Required:
- Bachelor's degree and 8+ years of technical experience; relevant experience may substitute for education.
- Strong Python engineering skills, especially for building and maintaining data pipelines
- Experience with SQL and schema design in production settings (PostgreSQL preferred)
- Familiarity with ML workflows (training classifiers, tuning models, deploying for inference), particularly for large-scale or ambiguous structured datasets
- Comfortable working with structured data formats (XML/JSON/Parquet) and writing ETL code
- Experience with workflow orchestration tools (Airflow or similar) and cloud infrastructure (AWS, S3, Docker)
- Strong communication skills and a strong sense of ownership of results

Preferred:
- Experience with author disambiguation, entity resolution, or record linkage problems
- Experience applying vector-based similarity or topic modeling techniques to real-world corpora at scale
- Exposure to citation networks or scholarly data systems (e.g., arXiv, OpenAlex, USPTO)
- Familiarity with building APIs or data services consumed by automated or agent-based workflows

Physical Demands and Work Environment:
- Must be able to remain in a stationary position for long periods of time.
- The ability to communicate information and ideas so others will understand. Must be able to exchange accurate information in these situations.
- The ability to observe details at close range.
- Must be able to work under deadlines.

What They Offer:
- Base salary range: $126,000 - $189,000, plus generous bonus plans.
- Medical, dental, vision, and an employee assistance program for team members and their families.
- Health savings account plan, healthcare reimbursement arrangement plan, and health care and dependent care flexible spending account plans.
- Company 401k plan.
- $125 per month to assist with commuting or internet expenses.
- $200 per month for fitness and wellbeing expenses.
- Up to ten sick days per year, up to seven personal days per year, up to 20 vacation days per year, and twelve paid holidays throughout the calendar year.
- Annual bonuses and participation in the long-term incentive plan.
- A learning organization with weekly Ai2 Academy lectures and world-class AI experts as guest speakers.
- Commitment to diversity, inclusion, and a healthy work/life balance.
- Collaborative and transparent culture.
- Office located in Seattle on the water.

