Machine Learning Engineer

Subsalt

Job Details

Full-time

3/26/2024

Full Job Description

We’re solving problems that others thought were impossible with a new category of data infrastructure.

We're a seed-stage enterprise-focused data infrastructure startup, and we're looking for an experienced machine learning engineer who can shape the technical direction of our product, work directly with our customers, and help define company culture. Our ideal next teammate thrives in ambiguity, is excited by iteration, loves data, and wants to have significant influence in creating the company's foundation.

The impact you’ll have

Organizations depend on our anonymous query engine for running analytics workloads on sensitive data, and we’re building an enterprise-grade product alongside our early customers. As a part of the core team, your responsibilities will include:

Owning the design and delivery of critical components of our core product, including:

designing and implementing extensions to our generative model training and evaluation pipelines, as well as our market-leading privacy analysis pipelines - these run in customer environments and leverage technologies like Kubernetes, Ray, and Modin
directly engaging with external partners to shape our anonymization methods and implement, test, and launch the results
defining and implementing internal and customer-facing measures for evaluating the quality of the data we generate

Interfacing directly with customers and shaping the product to solve their problems in scalable ways
Driving the direction of the product as a very early team member, and collaborating with technical and non-technical team members and founders
Actively helping the company scale as we grow!

Here are some real example projects that you would’ve worked on over the past few months:

Re-architect privacy pipelines to accommodate privacy standards beyond HIPAA. Work with other engineers to develop a system-wide design for privacy standards and modularize existing pipelines so that privacy checks can be efficiently executed in a variety of configurations.
Launch support for longitudinal datasets. This includes extending our synthesis pipelines with generative models for generating high-quality longitudinal datasets, working with external privacy auditors to define privacy checks for longitudinal data, and implementing those checks.
Interview customers and prototype functionality for scalably anonymizing free text, e.g., clinical notes and customer support request tickets.
Convert pipelines from Pandas to Modin/Ray to dramatically increase how much data the system can process at once, and standardize / simplify hardware requirements for customer installs.

The company

Subsalt makes it safer, faster, and easier for companies to use their sensitive data. We’re starting in healthcare, where we unlock the use of HIPAA-restricted healthcare data for machine learning and advanced analytics, and we’re finding similar needs in other regulated industries (financial services, defense, insurance, Fortune 500s).

Our core product is the world’s first anonymous query engine, a Postgres-compatible query engine that generates anonymized, statistically-representative synthetic data on demand. We’re writing the playbook for how these novel systems work, leveraging state-of-the-art generative ML to build an infrastructure product capable of anonymizing sensitive data at scale.

We're a fully-remote company distributed across the US, and everybody can work from anywhere. We rendezvous in person each quarter to jam on ideas, plan for the future, and have some fun. We’re building a writing-heavy culture – from technical designs to internal go-to-market decisions, we’ve found that writing is a great way to clarify and communicate the path forward that leads to simpler, more productive outcomes.

Requirements

We are looking for exceptional engineers who are driven to find simple solutions to complex problems and who are excited to stretch themselves as part of a growing team at the intersection of distributed systems, data infrastructure, and machine learning.

4+ years of production-level experience in Python
Experience building scalable data pipelines capable of handling large data volumes using Kubernetes and Ray (or similar)
Strong fundamentals with machine learning frameworks like PyTorch (or similar). Bonus points for specific experience with NLP, LLMs, GANs, and other generative methods.
Experience delivering results on open-ended technical problems with a high degree of autonomy
Comfortable with written and verbal communication for both technical and non-technical audiences
Thrive in high-velocity, collaborative early-stage startup environments. Excited to work directly with customers to understand and solve their problems in a scalable way.
Preferred: experience building infrastructure products where pipelines directly affect the product experience and run in SaaS or customer environments

This is a “hands-on” role, and a big part of the job will be identifying problems, designing solutions, and then implementing them. We move quickly but intentionally, and we’ll expect you to learn new technologies and methodologies as we outgrow our existing ones.

Benefits

Fully remote, distributed team that gets together quarterly for in-person events
Generous seed-stage equity grants
Healthcare, dental
Ambitious but intentional culture focused on shipping high-quality software that solves real problems
$3k home office improvement budget
