PhD position: Designing fault-tolerant distributed algorithms for mobile edge computing
Keywords: Distributed systems; configuration systems; consistency models; replication; distributed state/data management; edge computing.
Context
The design of distributed systems has become increasingly important to
provide fault-tolerant services with high availability. Most internet
services rely on large amounts of geo-distributed resources of datacenter/cloud
infrastructures to tolerate failures and enhance data availability to
popular applications, like video stores and social networking.
Although emerging applications for mobile edge computing are
likely to require similar levels of fault tolerance, the
underlying resources and topology impose important constraints
on the design of distributed algorithms.
Indeed, ensuring fault-tolerant distributed
computing on highly mobile, constrained infrastructures, such as a swarm of drones or a
constellation of satellites, is challenging. Such systems are likely to
require low-latency, concurrent message exchanges and prompt data
availability. However, system designers
must cope with a number of inherent uncertainties, including unstable, heterogeneous resource availability (such as restricted computing power and energy),
(very) limited bandwidth, much tighter timing constraints, and potentially frequent network partitions.
Suitable fault-tolerant distributed algorithms are therefore required
in order to efficiently implement emerging services on edge systems.
Proposed research
Data availability and fault tolerance in a reliable distributed system are
commonly guaranteed by a replication protocol based on the replicated state
machine (RSM) approach. Such a protocol implements a consensus algorithm, such as
Multi-Paxos or Raft, in order to provide strong consistency
across distributed, replicated data. In fact, strongly consistent
replication is key to the efficient implementation of critical building blocks
of distributed systems, such as a distributed lock manager or a transactional
key-value store.
Yet, a poorly planned implementation of different algorithmic approaches (such as
leaderless/leader-based, quorum systems, optimizations, etc.) often introduces
a trade-off between fault resilience and efficiency. In this
project, we aim to investigate this trade-off systematically and in detail on swarms of
mobile, resource-constrained devices. Ultimately, the goal of the research is to design, implement, and evaluate a resilient, efficient replication protocol for mobile edge computing.
Requirements and application
In this project, we intend to explore both fundamental and
applied aspects of distributed computing. In particular, we aim to design novel, fault-tolerant distributed algorithms to tackle real-world, emerging problems in
mobile edge computing, on swarms of unmanned aerial vehicles (UAVs) and constellations of satellites.
Candidates for this position should hold a Master's degree in Computer
Science/Informatics or a related field by the starting date of the PhD.
They must be excited by research in systems, distributed systems,
distributed algorithms, databases, and/or programming languages, and
should have an excellent academic record in one of these areas.
Familiarity with machine learning and graph algorithms would be
appreciated, but is not essential. Teamwork, communication skills, and
industrial experience are a plus.
Knowledge of French is not required.
To apply, please send the following information to silvestre@enac.fr (Subject: PhD position: fault-tolerant distributed algorithms):
- Curriculum Vitæ.
- A letter of motivation describing your background in the areas of the project, your reasons for interest in the project, and your future plans.
- A list of courses and grades from the last three years of study (an informal transcript is OK).
- Names and contact details of at least two references (people who can recommend you), whom we will contact directly.
- If relevant, a link to your publications and/or open-source developments.
This fully funded PhD position starts on 1 October 2022, and the duration of the contract/scholarship is 3 years.
About ENAC
ENAC, the National School of Civil Aviation, is located in Toulouse,
France, the centre of the European aerospace industry (e.g., Airbus, Thales, and CNES). It offers an
ideal working environment, where researchers can focus on developing
new ideas, collaborations and projects.
The proposed research will be carried out in the ENAC research laboratory,
ENAC Lab. Our research topics include UAV systems, aviation safety and
security, sustainable transportation development, and aeronautical
human-computer interaction. For further information, please consult our
website.