Raul D. Steleac

Publications:

ICLR

Inter-Agent Relative Representations for Multi-Agent Option Discovery

Raul D. Steleac, Mohan Sridharan, and David Abel

In The Fourteenth International Conference on Learning Representations (ICLR), 2026

Temporally extended actions improve the ability to explore and plan in single-agent settings. In multi-agent settings, the exponential growth of the joint state space with the number of agents makes coordinated behaviours even more valuable. Yet, this same exponential growth renders the design of multi-agent options particularly challenging. Existing multi-agent option discovery methods often sacrifice coordination by producing loosely coupled or fully independent behaviours. Toward addressing these limitations, we describe a novel approach for multi-agent option discovery. Specifically, we propose a joint-state abstraction that compresses the state space while preserving the information necessary to discover strongly coordinated behaviours. Our approach builds on the inductive bias that synchronisation over agent states provides a natural foundation for coordination in the absence of explicit objectives. We first approximate a fictitious state of maximal alignment with the team, the Fermat state, and use it to define a measure of spreadness, capturing team-level misalignment on each individual state dimension. Building on this representation, we then employ a neural graph Laplacian estimator to derive options that capture state synchronisation patterns between agents. We evaluate the resulting options across multiple scenarios in two simulated multi-agent domains, showing that they yield stronger downstream coordination capabilities compared to alternative option discovery methods.
IROS

Scalable Multi-Agent Reinforcement Learning for Warehouse Logistics with Robotic and Human Co-Workers

Aleksandar Krnjaic, Raul D. Steleac, Jonathan D. Thomas, Georgios Papoudakis, Lukas Schäfer, Andrew Wing Keung To, Kuan-Ho Lao, Murat Cubuktepe, Matthew Haley, Peter Börsting, and Stefano V. Albrecht

In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024

We consider a warehouse in which dozens of mobile robots and human pickers work together to collect and deliver items within the warehouse. The fundamental problem we tackle, called the order-picking problem, is how these worker agents must coordinate their movement and actions in the warehouse to maximise performance in this task. Established industry methods using heuristic approaches require large engineering efforts to optimise for innately variable warehouse configurations. In contrast, multi-agent reinforcement learning (MARL) can be flexibly applied to diverse warehouse configurations (e.g. size, layout, number/types of workers, item replenishment frequency), and different types of order-picking paradigms (e.g. Goods-to-Person and Person-to-Goods), as the agents can learn how to cooperate optimally through experience. We develop hierarchical MARL algorithms in which a manager agent assigns goals to worker agents, and the policies of the manager and workers are co-trained toward maximising a global objective (e.g. pick rate). Our hierarchical algorithms achieve significant gains in sample efficiency over baseline MARL algorithms and overall pick rates over multiple established industry heuristics in a diverse set of warehouse configurations and different order-picking paradigms.

Professional Experience:

Machine Learning Scientist

Developed transformer-based architectures for a natural language processing pipeline that extracts financial information from chats between investment banking officials and clients. The pipeline assists traders in their daily transactions, leading to more efficient and precise deals.

Jan. 2023 – Aug. 2023
London, UK

Healx

Machine Learning Engineer

Developed NLP methods to construct biomedical knowledge graphs for drug discovery in rare diseases. Designed and implemented a Contextual Entity Linking transformer-based architecture that successfully disambiguates and maps in-sentence entities to internal biomedical ontologies.

Nov. 2021 – Jan. 2023
Cambridge, UK

Intel

Software Development Intern

Contributed to the development of two versions of the Intel Movidius Vision Processing Unit (VPU) chip, which accelerates neural network computations for real-time applications such as drones and robots.

Dec. 2018 – Aug. 2020
Timișoara, Romania

Nokia Networks

Junior Software Developer

Investigated and resolved software issues in C++ within the Fault Detection and Alarm Raising department, applying object-oriented methodologies.

Oct. 2017 – Dec. 2018
Timișoara, Romania

Education:

University of Edinburgh

PhD in Robotics and Autonomous Systems

Sept. 2023 – Present

Imperial College London

MSc in Computing (Artificial Intelligence and Machine Learning)

Grade: Distinction.

Relevant courses: Reinforcement Learning, Deep Learning, Probabilistic Inference, Computer Vision, Natural Language Processing.

Thesis: Curriculum Reinforcement Learning in Tabular Methods.

Oct. 2020 – Oct. 2021

Polytechnic University of Timișoara

BEng in Computers and Information Technology

Merit scholarships in 7 out of the 8 semesters.

Thesis: End-to-end Speech Emotion Recognition using BLSTMs with Attention layer and Multi-domain training.

Sept. 2016 – July 2020