AI Research Engineer (Model Serving & Inference - 100% Remote, Spain)

Tether

Posted on: 23 June, 2025

About Us

We’re pioneering a global financial revolution. Our solutions empower businesses to seamlessly integrate reserve-backed tokens across blockchains.

Our Product Suite

Tether Finance: Our trusted stablecoin USDT is used worldwide, and digital asset tokenization services are available.

Innovate with Tether: What We Do

  • Tether Power: Sustainable energy solutions for Bitcoin mining using eco-friendly practices.
  • Tether Data: AI and P2P technology solutions like KEET for secure data sharing.
  • Tether Education: Digital learning platforms for global access.
  • Tether Evolution: Merging technology and human potential for innovative futures.

The Role

As a member of the AI model team, you will innovate in model serving and inference architectures for advanced AI systems, focusing on optimizing deployment and inference for responsiveness, efficiency, and scalability across diverse applications.

Responsibilities:

  • Design and deploy high-performance model serving architectures suitable for various environments, ensuring reduced latency and memory footprint.
  • Monitor and test inference pipelines, tracking key metrics such as response latency, throughput, and memory usage; document results and compare against benchmarks.
  • Prepare test datasets and simulation scenarios for real-world deployment challenges, especially on low-resource devices, to evaluate model performance comprehensively.
  • Analyze and optimize computational efficiency, addressing bottlenecks related to processing and memory, to enhance scalability and reliability.
  • Collaborate with cross-functional teams to integrate optimized inference frameworks into production, defining success metrics such as improved real-world performance and robustness.

Requirements:

  • PhD in NLP, Machine Learning, or related areas, with a proven track record in AI R&D and publications.
  • Extensive experience in large-scale model serving and inference optimization, demonstrating improvements in latency, throughput, and memory footprint, especially on resource-constrained devices.
  • Deep understanding of modern serving architectures and optimization techniques, including low-latency, high-throughput methods, and memory management.
  • Strong expertise in C/C++, Triton, ThunderKittens, and CUDA; practical experience deploying inference pipelines on resource-constrained devices.
  • Ability to apply empirical research to overcome challenges such as latency and memory constraints, designing evaluation frameworks and iterating on solutions.

Join Our Team

Collaborate with top talent, push boundaries, and set industry standards. If you have excellent English communication skills and want to contribute to cutting-edge platforms, this is the place for you.

Tags: ai, ml