PhD position: Designing fault-tolerant distributed algorithms for mobile edge computing

Keywords: Distributed systems; configuration systems; consistency models; replication; distributed state/data management; edge computing.


The design of distributed systems has become increasingly important for providing fault-tolerant, highly available services. Most internet services rely on large pools of geo-distributed datacenter/cloud resources to tolerate failures and improve data availability for popular applications, such as video stores and social networking. Although emerging applications for mobile edge computing are likely to require similar levels of fault tolerance, the underlying resources and topology impose important constraints on the design of distributed algorithms.
Indeed, ensuring fault-tolerant distributed computing on highly mobile, constrained infrastructures, such as a swarm of drones or a constellation of satellites, is challenging. Such systems are likely to require low-latency, concurrent message exchanges and prompt data availability. However, system designers must cope with a number of inherent uncertainties, including unstable, heterogeneous resource availability (including restricted computing power and energy), (very) limited bandwidth, much tighter timing constraints, and possibly frequent network partitions.
Suitable fault-tolerant distributed algorithms are therefore required in order to efficiently implement emerging services on edge systems.

Proposed research

Data availability and fault tolerance in a reliable distributed system are commonly guaranteed by a replication protocol based on a replicated state machine (RSM). Such a protocol implements a consensus algorithm, such as Multi-Paxos or Raft, in order to provide strong consistency across distributed, replicated data. In fact, strongly consistent replication is key to the efficient implementation of critical building blocks of distributed systems, such as distributed lock managers or transactional key-value stores.
Yet, a poorly planned implementation of the different algorithmic approaches (leaderless or leader-based designs, quorum systems, optimizations, etc.) often introduces a trade-off between fault resilience and efficiency. In this project, we aim to investigate this trade-off systematically and in detail on swarms of mobile, resource-constrained devices. Ultimately, the goal of the research is to design, implement and evaluate a resilient, efficient replication protocol for mobile edge computing.

Requirements and application

In this project, we intend to explore both fundamental and applied aspects of distributed computing. In particular, we aim to design novel, fault-tolerant distributed algorithms to tackle real-world, emerging problems in mobile edge computing, on swarms of unmanned aerial vehicles (UAVs) and constellations of satellites.
Candidates for this position should hold a Master's degree in Computer Science/Informatics or a related field by the starting date of the PhD. They must be excited by research in systems, distributed systems, distributed algorithms, databases, and/or programming languages, and should have an excellent academic record in one of these areas. Familiarity with machine learning and graph algorithms would be appreciated, but is not essential. Teamwork, communication skills and industrial experience are a plus.
Knowledge of French is not required.
To apply, please send the following information (position: fault-tolerant distributed algorithms):

  • Curriculum Vitæ
  • Letter of motivation describing the applicant's background in the areas of the project, reasons for interest in the project, and future plans
  • A list of courses and grades of the last three years of study (an informal transcript is OK).
  • Names and contact details of at least two references (people who can recommend you), whom we will contact directly.
  • If relevant, a link to your publications and/or open-source developments.

Application deadline: as soon as possible (preferably by 5 July 2022).
This fully funded PhD starts on 1 October 2022, and the duration of the contract/scholarship is 3 years.

About ENAC

ENAC, the National School of Civil Aviation, is located in Toulouse, France, the centre of the European aerospace industry (e.g., Airbus, Thales, and CNES). It offers an ideal working environment, where researchers can focus on developing new ideas, collaborations and projects.
The proposed research will be carried out in the ENAC research laboratory, ENAC Lab. Our research topics include UAV systems, aviation safety and security, sustainable transportation development, and aeronautical human-computer interaction. For further information, please consult our site.