Senior Solutions Architect (AI/ML)
Proximity Works
Santa Clara, California
Job Details
Full-time
Full Job Description
We are looking for a Senior Solutions Architect to design, develop, and scale innovative AI/ML-driven solutions. You will be responsible for architecting highly scalable, low-latency distributed systems optimized for AI/ML workloads. As a key technical leader, you will solve complex challenges, shape next-generation AI/ML infrastructure, and guide cross-functional teams to deliver state-of-the-art solutions for fast-growing startups and enterprise companies.
You'll be at the forefront of driving solutions for high-impact products across diverse industries, with the opportunity to influence key architectural decisions and enable real-world applications that scale globally, ensuring innovation and efficiency at every step.
Requirements
You'll be responsible for —
Driving end-to-end GenAI architecture and implementation:
- Design and deploy multi-agent systems using modern frameworks (LangGraph, CrewAI, AutoGen)
- Architect RAG solutions with advanced vector store integration
- Implement efficient fine-tuning strategies for foundation models
- Develop synthetic data generation pipelines for training and testing
Leading ML infrastructure and deployment:
- Design high-performance model serving architectures
- Implement distributed training and inference systems
- Establish MLOps practices and pipelines
- Optimize cloud resource utilization and costs
- Set up monitoring and observability solutions
Driving technical excellence and innovation:
- Define architectural standards and best practices
- Lead technical decision-making for AI/ML initiatives
- Ensure scalability and reliability of AI systems
- Implement AI governance and security measures
- Guide teams on advanced AI concepts and implementations
Overseeing production AI systems:
- Manage model deployment and versioning
- Implement A/B testing frameworks
- Monitor system performance and model drift (a drift-check sketch follows this list)
- Optimize inference latency and throughput
- Ensure high availability and fault tolerance
Fostering collaboration and growth:
- Mentor engineering teams on AI architecture
- Collaborate with stakeholders on technical strategy
- Drive innovation in AI/ML solutions
- Share knowledge through documentation and training
- Lead technical reviews and architecture discussions
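Where drift monitoring comes up above, one lightweight, framework-free signal is the Population Stability Index computed over model scores or key features. The sketch below is a minimal illustration in plain NumPy; the 0.25 threshold is a common rule of thumb rather than anything prescribed by this role.

```python
# Minimal sketch: Population Stability Index (PSI) as one possible drift signal.
# Bin edges come from the reference (training) distribution; a small epsilon avoids log(0).
import numpy as np

def population_stability_index(reference, production, bins=10, eps=1e-6):
    """Compare a production feature/score distribution against its training reference."""
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference) + eps
    prod_pct = np.histogram(production, bins=edges)[0] / len(production) + eps
    return float(np.sum((prod_pct - ref_pct) * np.log(prod_pct / ref_pct)))

# Example usage: flag drift when PSI exceeds a commonly used 0.25 threshold.
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 10_000)   # scores seen at training time
production = rng.normal(0.3, 1.1, 10_000)  # scores observed in production
psi = population_stability_index(reference, production)
print(f"PSI={psi:.3f}", "drift suspected" if psi > 0.25 else "stable")
```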
You need —
8+ years of experience in software engineering or architecture, including:
- 4+ years leading cross-functional GenAI/ML teams
- Production experience with distributed AI systems
- Enterprise-scale AI architecture implementation
To lead and architect enterprise-scale GenAI/ML solutions, focusing on:
- Multi-agent orchestration using LangGraph, CrewAI, and AutoGen
- Workflow automation with LlamaIndex, LangChain, and LangFlow
- Agent coordination using the LETTA framework
- Integration of specialized agents for reasoning, planning, and execution (a framework-agnostic sketch follows this list)
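As a concrete, deliberately framework-agnostic illustration of the orchestration pattern these frameworks formalize, the sketch below wires a planner, an executor, and a reviewer around a shared state object. `call_llm`, `planner`, `executor`, and `reviewer` are illustrative names, not APIs from LangGraph, CrewAI, AutoGen, or LETTA.

```python
# Framework-agnostic sketch of plan -> execute -> review coordination between agents.
from dataclasses import dataclass, field

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model client (OpenAI, Bedrock, a vLLM endpoint, ...).
    # Canned responses keep the sketch runnable end to end without any network access.
    if prompt.startswith("Break"):
        return "1. Inspect the request\n2. Draft a response"
    return f"done: {prompt[:60]}"

@dataclass
class AgentState:
    goal: str
    plan: list[str] = field(default_factory=list)
    results: list[str] = field(default_factory=list)

def planner(state: AgentState) -> AgentState:
    steps = call_llm(f"Break this goal into numbered steps: {state.goal}")
    state.plan = [s for s in steps.splitlines() if s.strip()]
    return state

def executor(state: AgentState) -> AgentState:
    for step in state.plan:
        state.results.append(call_llm(f"Execute this step and report the outcome: {step}"))
    return state

def reviewer(state: AgentState) -> str:
    return call_llm(f"Goal: {state.goal}\nResults: {state.results}\nSummarize and flag gaps.")

def run(goal: str) -> str:
    return reviewer(executor(planner(AgentState(goal=goal))))

print(run("Summarize last week's support tickets"))
```

In practice, the frameworks named above add routing, retries, tool schemas, memory, and persistence on top of this basic loop.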
To design and implement sophisticated AI architectures incorporating:
Advanced RAG systems using:
- Vector databases (Chroma, Weaviate, Pinecone, Milvus)
- Hybrid search with BM25 and semantic embeddings (a rank-fusion sketch follows this list)
- Self-querying and recursive retrieval patterns
Fine-tuning strategies for foundation models:
- PEFT methods (LoRA, QLoRA, Adapter-tuning)
- Parameter-efficient training approaches
- Instruction fine-tuning and RLHF
Multi-agent frameworks integrating:
- Tool-use and reasoning chains
- Memory systems (short-term and long-term)
- Meta-prompting and reflection mechanisms
- Agent communication protocols
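To make the hybrid-search bullet concrete, a common way to combine BM25 and embedding-based rankings is reciprocal rank fusion. The sketch below is store-agnostic and uses stubbed inputs; in practice the semantic ranking would come from one of the vector databases listed above.

```python
# Sketch of hybrid retrieval: fuse lexical (BM25-style) and semantic (embedding) rankings
# with reciprocal rank fusion. The ranked lists are stubs; a vector database such as
# Chroma, Weaviate, Pinecone, or Milvus would supply the semantic side in practice.
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Combine ranked lists of document ids; k=60 follows the original RRF paper."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: top documents by BM25 and by embedding similarity, each already ranked.
bm25_ranked = ["doc3", "doc1", "doc7"]
semantic_ranked = ["doc1", "doc5", "doc3"]
print(reciprocal_rank_fusion([bm25_ranked, semantic_ranked]))  # doc1 and doc3 rise to the top
```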
Expertise in advanced data generation and synthesis:
- Synthetic data generation using Argilla and PersonaHub
- Privacy-preserving data synthesis
- Domain-specific data augmentation
- Quality assessment of synthetic data (see the sketch after this list)
- Data balancing and bias mitigation
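The sketch below shows two lightweight quality gates of the kind implied here: near-duplicate filtering and label-balance reporting. The field names, the 0.9 threshold, and the toy records are illustrative assumptions; a curation platform such as Argilla would normally layer human review on top of checks like these.

```python
# Sketch of two basic quality gates for synthetic data: near-duplicate filtering and
# label-balance reporting. Field names, threshold, and toy records are illustrative.
from collections import Counter
from difflib import SequenceMatcher

def drop_near_duplicates(records: list[dict], threshold: float = 0.9) -> list[dict]:
    """Keep a record only if its text is not highly similar to one already kept."""
    kept: list[dict] = []
    for record in records:
        if all(SequenceMatcher(None, record["text"], k["text"]).ratio() < threshold for k in kept):
            kept.append(record)
    return kept

def label_balance(records: list[dict]) -> dict[str, float]:
    """Report the share of each label, to spot imbalance before training."""
    counts = Counter(r["label"] for r in records)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

synthetic = [
    {"text": "Reset my password please", "label": "account"},
    {"text": "Reset my password please.", "label": "account"},  # near-duplicate, dropped
    {"text": "Where is my order?", "label": "shipping"},
]
clean = drop_near_duplicates(synthetic)
print(len(clean), label_balance(clean))
```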
To architect high-performance ML serving infrastructure, focusing on:
- Model serving platforms (BentoML, Ray Serve, Triton)
- Real-time processing with Ray, Kafka, and Spark Streaming
- Distributed training using Horovod, DeepSpeed, and FSDP
- vLLM and TGI for efficient inference (a minimal serving sketch follows this list)
- Integration patterns for hybrid cloud-edge deployments
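As a minimal sketch of the serving contract these platforms implement, the endpoint below uses FastAPI purely as a stand-in; the request/response shapes, version tag, and stub model are illustrative assumptions, not a prescribed stack.

```python
# Minimal serving sketch: a single low-latency prediction endpoint behind FastAPI.
# In production the same contract would typically sit behind BentoML, Ray Serve, or Triton,
# with vLLM or TGI handling LLM inference; the stubbed `model` stands in for a real artifact.
from fastapi import FastAPI
from pydantic import BaseModel

class PredictRequest(BaseModel):
    inputs: list[float]

class PredictResponse(BaseModel):
    score: float
    model_version: str

app = FastAPI()
MODEL_VERSION = "2024-01-stub"  # illustrative version tag for rollout/rollback tracking

def model(inputs: list[float]) -> float:
    # Placeholder inference: replace with the real model call (ONNX, TorchScript, LLM client, ...).
    return sum(inputs) / max(len(inputs), 1)

@app.post("/predict", response_model=PredictResponse)
def predict(request: PredictRequest) -> PredictResponse:
    return PredictResponse(score=model(request.inputs), model_version=MODEL_VERSION)

# Run locally with: uvicorn serving_sketch:app --host 0.0.0.0 --port 8000
```

Returning a version tag with every response is one simple way to support the deployment, versioning, and A/B testing responsibilities listed earlier.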
To drive cloud architecture decisions across:
- Kubernetes orchestration with Kubeflow and KServe
- Serverless ML with AWS Lambda, Azure Functions, Cloud Run
- Auto-scaling using HPA, KEDA, and custom metrics
- Resource optimization with Nvidia Triton and TensorRT
- MLOps platforms such as MLflow, Weights & Biases, and DVC (a minimal tracking sketch follows this list)
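To ground the MLOps bullet, the sketch below logs a hypothetical fine-tuning run with MLflow's tracking API. The experiment name, parameters, and metric values are made-up placeholders; Weights & Biases or DVC would capture the same information through their own APIs.

```python
# Sketch of baseline experiment tracking with MLflow, one of the MLOps platforms listed above.
import mlflow

mlflow.set_experiment("genai-finetune-demo")  # hypothetical experiment name

with mlflow.start_run(run_name="lora-r16-baseline"):
    # Illustrative hyperparameters and a stand-in training curve.
    mlflow.log_params({"base_model": "llama-3-8b", "peft": "lora", "rank": 16, "lr": 2e-4})
    for step, loss in enumerate([1.9, 1.4, 1.1, 0.95]):
        mlflow.log_metric("train_loss", loss, step=step)
    mlflow.log_metric("eval_rougeL", 0.41)
    mlflow.set_tag("owner", "solutions-architecture")
```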
Benefits
Bonus points for —
- Research publications in AI/ML
- Open-source project maintenance
- Technical blog posts on AI architecture
- Conference presentations
- AI community leadership
What you get —
- Best-in-class salary: We hire only the best, and we pay accordingly.
- Proximity Talks: Meet other designers, engineers, and product geeks — and learn from experts in the field.
- Keep on learning with a world-class team: Work with the best in the field, challenge yourself constantly, and learn something new every day.
About us —
We are Proximity — a global team of coders, designers, product managers, geeks, and experts. We solve complex problems and build cutting-edge tech at scale. Here's a quick guide to getting to know us better:
- Watch our CEO, Hardik Jagda, tell you all about Proximity.
- Read about Proximity's values and meet some of our Proxonauts here.
- Explore our website, blog, and the design wing — Studio Proximity.
- Get behind the scenes with us on Instagram! Follow @ProxWrks and @H.Jagda