- Full Time
Posted on: 11 November, 2024
Master Thesis Project - 2025
About Modulai
Modulai’s clients range from startups to multinational companies. What they all share is that machine learning is central to how they operate, compete, and create value. Our services range from advisory projects and feasibility studies to end-to-end development and refinement of machine learning systems and products. We use state-of-the-art techniques, always focusing on maximizing business impact, and deliver solutions in areas such as credit risk, fraud detection, dynamic pricing, recommendation systems, computer vision, natural language processing, opportunity spotting, logistics optimization, up-sell, cross-sell, smart building optimization, predictive maintenance, and route planning.
Facts
When doing a master’s thesis project at Modulai, you are invited to all team activities, such as daily stand-ups, weekly learning breakfasts, and monthly AWs. We look forward to having you as part of our team and expect you to work from the office as much as possible.
One of the projects will be based in Gothenburg, and one in Stockholm.
We have a strong history of master’s thesis students joining Modulai for their first job in machine learning engineering. We are excited to explore this opportunity with you!
Background & Description
Modulai is offering a master’s thesis opportunity focused on developing cutting-edge models capable of processing and generating across multiple data modalities (text, images, video, and audio) within a unified framework. Current state-of-the-art multimodal models often separate tasks like visual understanding and text generation, but recent advances in unified transformers demonstrate the potential to handle these tasks efficiently within a single architecture.
The project will involve designing and experimenting with mixed-modal models that combine autoregressive methods for text generation with diffusion-based techniques for continuous data (such as images and video). You will explore how to fuse different types of data representations (discrete tokens for text, continuous or discrete vectors for visual data) into a unified model capable of performing tasks like text-to-image generation, visual question answering, and more.
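To make the combined objective concrete, here is a minimal sketch in PyTorch of how such a mixed-modal training loss could pair autoregressive cross-entropy on discrete text tokens with a denoising loss on continuous image latents. The `model` interface, tensor names, and the simple linear noise schedule are illustrative assumptions, not an implementation from the referenced papers.

```python
import torch
import torch.nn.functional as F

def mixed_modal_loss(model, text_tokens, image_latents, lambda_img=1.0):
    """Illustrative combined objective for a unified mixed-modal model.

    text_tokens:   (batch, seq_text) discrete token ids
    image_latents: (batch, seq_img, dim) continuous image representations
    `model` is assumed to take both modalities in one fused sequence and
    return next-token logits for text and a noise prediction for images.
    """
    # Diffusion side: corrupt the continuous latents at a random timestep
    # (a simple linear interpolation schedule, for illustration only).
    t = torch.rand(image_latents.size(0), 1, 1, device=image_latents.device)
    noise = torch.randn_like(image_latents)
    noised_latents = (1 - t) * image_latents + t * noise

    # One forward pass over the fused sequence of both modalities.
    logits, noise_pred = model(text_tokens, noised_latents, t)

    # Autoregressive side: shifted next-token prediction on text.
    ar_loss = F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        text_tokens[:, 1:].reshape(-1),
    )

    # Denoising side: predict the injected noise on the image latents.
    diffusion_loss = F.mse_loss(noise_pred, noise)

    # A single scalar loss trains both capabilities in one architecture.
    return ar_loss + lambda_img * diffusion_loss
```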
The goal is to develop and investigate a scalable, unified mixed-modal model for a set of domains. This model should be capable of efficiently handling multiple data modalities within a single architecture. You will compare the mixed model’s performance against other state-of-the-art multimodal models and/or traditional modality-specific architectures, focusing on key factors such as overall performance, computational efficiency, and potential for fine-tuning across specific domains.
ML techniques and tools
References
Chameleon: Mixed-Modal Early-Fusion Foundation Models: https://arxiv.org/pdf/2405.09818
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation: https://arxiv.org/pdf/2408.12528
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model: https://arxiv.org/pdf/2408.11039
Background & Description
Modulai is offering a master’s thesis opportunity focused on knowledge distillation of large language models. Knowledge distillation, a concept popularised by Hinton et al. in 2015, involves transferring knowledge from a larger, complex “teacher” model to a smaller, more efficient “student” model. The student model learns to replicate the behaviour of the teacher model by minimising the differences in their output. A key advantage of knowledge distillation, as opposed to training the student model from scratch, is that the teacher model provides more informative soft labels (distributions across the vocabulary at each prediction step). These soft labels offer a stronger learning signal compared to the hard, one-hot labels available in regular pre-training or fine-tuning.
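As a concrete illustration of this soft-label objective, the sketch below shows a standard distillation loss in PyTorch. The temperature-scaled KL term follows the formulation popularised by Hinton et al.; the function and variable names are our own, and the blend weight `alpha` is a tunable assumption.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Soft-label knowledge distillation in the style of Hinton et al. (2015).

    student_logits, teacher_logits: (batch, seq_len, vocab_size)
    labels: (batch, seq_len) ground-truth token ids (hard labels)
    """
    # Soft targets: KL divergence between temperature-softened distributions.
    # Multiplying by T^2 keeps gradient magnitudes comparable across temperatures.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    # Hard targets: standard cross-entropy against the one-hot labels.
    hard_loss = F.cross_entropy(
        student_logits.reshape(-1, student_logits.size(-1)),
        labels.reshape(-1),
    )

    # Blend the informative soft signal with the ground-truth signal.
    return alpha * soft_loss + (1 - alpha) * hard_loss
```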
In recent years, there has been a surge of promising research in this field, particularly focused on applying these techniques to LLMs. Several recently released models, such as Gemini Flash, GPT-4o mini, and Llama 3.2 1B and 3B, were created using knowledge distillation. Common methods include combining knowledge distillation with weight pruning, or using reinforcement learning and imitation learning to help guide the training process; one such on-policy approach is sketched below. However, the details of how the large labs create these models are generally not publicly known.
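In on-policy distillation (see the Agarwal et al. reference below), the student learns from the teacher’s feedback on its own generations rather than on a fixed corpus. Here is a minimal sketch of one training step, assuming Hugging Face-style `generate` and forward APIs and a shared tokenizer between the two models; the divergence used here is a plain forward KL for simplicity, whereas the paper studies more general variants.

```python
import torch
import torch.nn.functional as F

def on_policy_distillation_step(student, teacher, prompt_ids, max_new_tokens=64):
    """One illustrative on-policy distillation step.

    student, teacher: causal LMs sharing a tokenizer (Hugging Face-style API)
    prompt_ids: (batch, prompt_len) tokenised prompts
    """
    # 1. The student samples continuations of the prompts (no gradients needed).
    with torch.no_grad():
        sequences = student.generate(
            prompt_ids, max_new_tokens=max_new_tokens, do_sample=True
        )

    # 2. The teacher provides soft targets on the student's own generations.
    with torch.no_grad():
        teacher_logits = teacher(sequences).logits

    # 3. The student is trained to match the teacher on those same tokens.
    #    In practice you would mask out prompt and padding positions.
    student_logits = student(sequences).logits
    loss = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
    loss.backward()
    return loss.item()
```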
The goal of this project is to research new distillation methods and develop a compact, efficient language model based on open-weight student and teacher models. In the process, we will shed some light on how state-of-the-art knowledge distillation is done in practice, adding to the community’s knowledge. You will familiarise yourself with the latest advances in knowledge distillation, implement techniques from research papers, and experiment with different approaches. You will compare the performance of different distillation approaches, as well as baseline models, in terms of both model quality and computational efficiency.
ML techniques and tools
References
LLM Pruning and Distillation in Practice: The Minitron Approach: https://arxiv.org/pdf/2408.11796
Compact Language Models via Pruning and Knowledge Distillation: https://www.arxiv.org/pdf/2407.14679
On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes: https://arxiv.org/pdf/2306.13649
Gemma 2: Improving Open Language Models at a Practical Size: https://arxiv.org/pdf/2408.00118
Distilling the Knowledge in a Neural Network: https://arxiv.org/pdf/1503.02531
3. Open Application within Applied Machine Learning
Applied Machine Learning projects encompass a wide range of domains, including healthcare, finance, natural language processing, computer vision, and more. This open application invites students to propose projects aligned with their interests and career goals. Do you have an idea? Let us know what it’s about by describing it in your application.
Required Skills
You are finishing a master’s degree in machine learning, or a master’s in another field complemented by courses in machine learning and programming
Please include the following in your application: