C++ Engineer - AI Runtime (Ukraine)

baasi

🌎 Remote

Posted on: 3 October, 2025

This is a fully remote job; the offer is available from Ukraine.

About Us

We are a stealth-mode startup building next-generation infrastructure for the AI industry. Our team has decades of experience in software, systems, and deep tech. We are working on a new kind of AI runtime that pushes the boundaries of performance and flexibility, making advanced models portable, efficient, and customizable for real-world deployment.

If you want to be part of a small, fast-moving team shaping the future of applied AI systems, this is your opportunity.

Role

We are looking for a C++ Engineer, based in Ukraine, with a strong systems and GPU programming background to help extend and optimize an open-source AI inference runtime. You will work on the low-level internals of large language model serving, focusing on:

  • Dynamic adapter integration (e.g., LoRA/QLoRA)
  • Incremental model update mechanisms
  • Multi-session inference caching and scheduling
  • GPU performance improvements (Tensor Cores, CUDA/ROCm)

This is a hands-on role: you will be designing, coding, profiling, and iterating on high-performance inference code that runs directly on CPUs and GPUs.

Responsibilities

  • Implement support for runtime adapter loading (LoRA), enabling models to be customized on the fly without retraining or model merges.
  • Design and implement mechanisms for incremental model deltas, allowing models to be extended and updated efficiently.
  • Extend the runtime to handle multi-session execution, with isolation and caching strategies for concurrent users.
  • Optimize core math kernels and memory layouts to improve inference performance on CPU and GPU backends.
  • Collaborate with backend and infrastructure engineers to integrate your work into APIs and orchestration layers.
  • Write benchmarks, unit tests, and profiling tools to ensure correctness and measure performance gains.
  • Contribute to system architecture discussions and help define the roadmap for future runtime features.

Requirements

  • Strong proficiency in modern C++ (C++14/17/20) and systems programming.
  • Solid understanding of low-level performance optimization: memory management, multithreading, SIMD, cache efficiency.
  • Experience with CUDA and/or ROCm/HIP GPU programming.
  • Familiarity with linear algebra kernels (matrix multiply, attention) and how they map to hardware acceleration (Tensor Cores, BLAS libraries, etc.).
  • Exposure to machine learning inference frameworks (e.g., llama.cpp, TensorRT, ONNX Runtime, TVM, PyTorch internals) is a plus.
  • Comfortable working in a Unix/Linux environment; experience with build systems (CMake, Bazel) and CI pipelines.
  • Strong problem-solving and debugging skills; ability to dive deep into both code and performance traces.
  • Self-motivated and able to thrive in a fast-moving startup environment.

Nice to Have

  • Experience implementing LoRA or adapter-based fine-tuning in inference runtimes.
  • Knowledge of quantization methods and deploying quantized models efficiently.
  • Background in distributed systems or multi-GPU orchestration.
  • Contributions to open-source ML/AI systems.

Why Join

  • Build core IP at the intersection of AI and systems engineering.
  • Work with a highly technical founding team on problems that are both intellectually challenging and commercially impactful.
  • Opportunity to shape the direction of a new AI platform from the ground up.
  • Competitive compensation (contract or full-time), equity potential, and flexible remote work.

Please Use this link to apply to this job: https://www.baasi.com/career/apply/3136319

This offer from “baasi” has been enriched by Jobgether.com and received a 0% flex score.
