- Full Time
Posted on: 11 November, 2024
Master Thesis Project - 2025
About Modulai
Modulai’s clients range from startups to multinational companies. What they all share is that machine learning is central to how they operate, compete, and create value. Our services range from advisory projects and feasibility studies to end-to-end development and refinement of machine learning systems and products. We use state-of-the-art techniques, always focusing on maximizing business impact, and deliver solutions in areas such as credit risk, fraud detection, dynamic pricing, recommendation systems, computer vision, natural language processing, opportunity spotting, logistics optimization, up-sell, cross-sell, smart building optimization, predictive maintenance, and route planning.
Facts
When doing a master’s thesis project at Modulai, you are invited to all team activities, such as daily stand-ups, weekly learning breakfasts, and monthly AWs. We look forward to having you as part of our team and expect you to work from the office as much as possible.
One of the projects will be based in Gothenburg, and one in Stockholm.
We have a strong history of master’s thesis students joining Modulai for their first job in machine learning engineering. We are excited to explore this opportunity with you!
Background & Description
Modulai is offering a master’s thesis opportunity focused on developing cutting-edge models capable of processing and generating across multiple data modalities (text, images, video, and audio) within a unified framework. Current state-of-the-art multimodal models often separate tasks like visual understanding and text generation, but recent advances in unified transformers demonstrate the potential to handle these tasks efficiently within a single architecture.
The project will involve designing and experimenting with mixed-modal models that combine autoregressive methods for text generation with diffusion-based techniques for continuous data (such as images and video). You will explore how to fuse different types of data representations (discrete tokens for text, continuous or discrete vectors for visual data) into a unified model capable of performing tasks like text-to-image generation, visual question answering, and more.
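To make the combined objective concrete, here is a minimal sketch in PyTorch of how such a mixed-modal training loss could pair autoregressive cross-entropy on discrete text tokens with a denoising loss on continuous image latents. The `model` interface, tensor names, and the simple linear noise schedule are illustrative assumptions, not an implementation from the referenced papers.

```python
import torch
import torch.nn.functional as F

def mixed_modal_loss(model, text_tokens, image_latents, lambda_img=1.0):
    """Illustrative combined objective for a unified mixed-modal model.

    text_tokens:   (batch, seq_text) discrete token ids
    image_latents: (batch, seq_img, dim) continuous image representations
    `model` is assumed to take both modalities in one fused sequence and
    return next-token logits for text and a noise prediction for images.
    """
    # Diffusion side: corrupt the continuous latents at a random timestep
    # (a simple linear interpolation schedule, for illustration only).
    t = torch.rand(image_latents.size(0), 1, 1, device=image_latents.device)
    noise = torch.randn_like(image_latents)
    noised_latents = (1 - t) * image_latents + t * noise

    # One forward pass over the fused sequence of both modalities.
    logits, noise_pred = model(text_tokens, noised_latents, t)

    # Autoregressive side: shifted next-token prediction on text.
    ar_loss = F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        text_tokens[:, 1:].reshape(-1),
    )

    # Denoising side: predict the injected noise on the image latents.
    diffusion_loss = F.mse_loss(noise_pred, noise)

    # A single scalar loss trains both capabilities in one architecture.
    return ar_loss + lambda_img * diffusion_loss
```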
The goal is to develop and investigate a scalable, unified mixed-modal model for a set of domains. This model should be capable of efficiently handling multiple data modalities within a single architecture. You will compare the mixed model’s performance against other state-of-the-art multimodal models and/or traditional modality-specific architectures, focusing on key factors such as overall performance, computational efficiency, and potential for fine-tuning across specific domains.
ML techniques and tools
References
Chameleon: Mixed-Modal Early-Fusion Foundation Models: https://arxiv.org/pdf/2405.09818
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation: https://arxiv.org/pdf/2408.12528
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model: https://arxiv.org/pdf/2408.11039
Background & Description
Modulai is offering a master’s thesis opportunity focused on knowledge distillation of large language models. Knowledge distillation, a concept popularised by Hinton et al. in 2015, involves transferring knowledge from a larger, complex “teacher” model to a smaller, more efficient “student” model. The student model learns to replicate the behaviour of the teacher model by minimising the differences in their output. A key advantage of knowledge distillation, as opposed to training the student model from scratch, is that the teacher model provides more informative soft labels (distributions across the vocabulary at each prediction step). These soft labels offer a stronger learning signal compared to the hard, one-hot labels available in regular pre-training or fine-tuning.
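As a concrete illustration of this soft-label objective, the sketch below shows a standard distillation loss in PyTorch. The temperature-scaled KL term follows the formulation popularised by Hinton et al.; the function and variable names are our own, and the blend weight `alpha` is a tunable assumption.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Soft-label knowledge distillation in the style of Hinton et al. (2015).

    student_logits, teacher_logits: (batch, seq_len, vocab_size)
    labels: (batch, seq_len) ground-truth token ids (hard labels)
    """
    # Soft targets: KL divergence between temperature-softened distributions.
    # Multiplying by T^2 keeps gradient magnitudes comparable across temperatures.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    # Hard targets: standard cross-entropy against the one-hot labels.
    hard_loss = F.cross_entropy(
        student_logits.reshape(-1, student_logits.size(-1)),
        labels.reshape(-1),
    )

    # Blend the informative soft signal with the ground-truth signal.
    return alpha * soft_loss + (1 - alpha) * hard_loss
```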
In recent years, there has been a surge of promising research in this field, particularly focused on applying these techniques to LLMs. Several recently released models, such as Gemini Flash, GPT-4o mini, and Llama 3.2 1B and 3B, were created using knowledge distillation. Common methods include combining knowledge distillation with weight pruning, or using reinforcement learning and imitation learning to help guide the training process; one such on-policy approach is sketched below. However, the details of how the large labs create these models are generally not publicly known.
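In on-policy distillation (see the Agarwal et al. reference below), the student learns from the teacher’s feedback on its own generations rather than on a fixed corpus. Here is a minimal sketch of one training step, assuming Hugging Face-style `generate` and forward APIs and a shared tokenizer between the two models; the divergence used here is a plain forward KL for simplicity, whereas the paper studies more general variants.

```python
import torch
import torch.nn.functional as F

def on_policy_distillation_step(student, teacher, prompt_ids, max_new_tokens=64):
    """One illustrative on-policy distillation step.

    student, teacher: causal LMs sharing a tokenizer (Hugging Face-style API)
    prompt_ids: (batch, prompt_len) tokenised prompts
    """
    # 1. The student samples continuations of the prompts (no gradients needed).
    with torch.no_grad():
        sequences = student.generate(
            prompt_ids, max_new_tokens=max_new_tokens, do_sample=True
        )

    # 2. The teacher provides soft targets on the student's own generations.
    with torch.no_grad():
        teacher_logits = teacher(sequences).logits

    # 3. The student is trained to match the teacher on those same tokens.
    #    In practice you would mask out prompt and padding positions.
    student_logits = student(sequences).logits
    loss = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
    loss.backward()
    return loss.item()
```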
The goal of this project is to research new distillation methods and develop a compact, efficient language model based on open-weight student and teacher models. In the process, we will shed some light on how state-of-the-art knowledge distillation is done in practice, adding to the community’s knowledge. You will familiarise yourself with the latest advances in knowledge distillation, implement techniques from research papers, and experiment with different approaches. You will compare the performance of different distillation approaches, as well as baseline models, in terms of both model quality and computational efficiency.
ML techniques and tools
References
LLM Pruning and Distillation in Practice: The Minitron Approach: https://arxiv.org/pdf/2408.11796
Compact Language Models via Pruning and Knowledge Distillation: https://www.arxiv.org/pdf/2407.14679
On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes: https://arxiv.org/pdf/2306.13649
Gemma 2: Improving Open Language Models at a Practical Size: https://arxiv.org/pdf/2408.00118
Distilling the Knowledge in a Neural Network: https://arxiv.org/pdf/1503.02531
3. Open Application within Applied Machine Learning
Applied Machine Learning projects encompass a wide range of domains, including healthcare, finance, natural language processing, computer vision, and more. This open application invites students to propose projects aligned with their interests and career goals. Do you have an idea? Let us know what it’s about by describing it in your application.
Required Skills
You are finishing a master’s degree in machine learning, or a master’s in another field complemented by courses in machine learning and programming
Please include the following in your application: