ML Infrastructure · Skolkovo / remote · Full-time

ML Systems / Infrastructure Engineer (AI / LLM)

About the role

YappiX is hiring an ML Systems / Infrastructure Engineer to build and operate infrastructure for AI systems, LLMs, new AI architectures, and high-performance training and inference workflows.

We need someone who understands not only code, but also GPU behavior, memory limits, latency, batching, reproducibility, and distributed systems.

This role is for an engineer who can turn research ideas into reliable engineering systems.

Responsibilities

  • build infrastructure for model training, inference, and evaluation
  • work on GPU performance, memory efficiency, latency, and throughput
  • create reproducible environments for research and production
  • maintain containers, CI/CD, deployment workflows, and internal pipelines
  • support distributed training, distributed inference, and systems-level optimization
  • help the research team move fast without creating infrastructure chaos
  • improve observability, reliability, and engineering reproducibility

Requirements

  • strong Python
  • solid Linux
  • Docker, Git, CI/CD
  • understanding of GPU memory, inference optimization, and distributed systems
  • experience with ML infrastructure, AI/LLM pipelines, or systems engineering
  • strong engineering discipline and attention to detail
  • ability to find and fix bottlenecks independently

Nice to have

  • CUDA
  • Triton
  • DeepSpeed
  • Ray
  • vLLM
  • Kubernetes
  • Prometheus / Grafana
  • experience with GPU clusters and orchestration

You may not be a fit if

  • you only know standard backend deployment patterns
  • you do not understand the difference between a research prototype and a production system
  • you wait for perfect specs instead of solving the real problem

What we offer

  • a chance to build the AI-first infrastructure layer of a serious technical team
  • work at the intersection of research, infrastructure, and new AI systems
  • meaningful ownership and technical influence
  • a compact team and fast iteration cycles
  • remote work or on-site in Skolkovo

How to apply

Send your CV, a link to your GitHub profile, and a short note about infrastructure problems you have solved to hr@yappix.ru or via https://yappix.ru/en/contact