PhD position: Designing high-performance techniques for mobile computing (applications on IoT/drones)

Keywords: Distributed algorithms; replication; graph algorithms; computational fluid dynamics; edge computing; dynamic networks; replicated state machine (RSM)

Context

The design of distributed systems has become increasingly important for providing reliable, highly available services. Most internet services rely on the abundant resources of cloud-centric infrastructures to tolerate failures and enhance data availability for users. While such a cloud-centric design has been widely used to implement popular services, like video stores and social networking, this computing model alone is unlikely to suit emerging latency-critical applications on dynamic networks, such as mobile networks and wireless networks.
The growing popularity of cloud-centric distributed systems relies heavily on advances in the hardware and software components of datacenters. For instance, Li et al. [1] propose a new high-performance replication protocol based on the assumption that packet loss and reordering are rare inside datacenters. However, this assumption is unlikely to hold in dynamic networks.
In a dynamic network, such as a swarm of drones, designing reliable, high-performance distributed systems is challenging. System designers must cope with unstable and heterogeneous resource availability, potentially frequent network partitions, and fast-changing data availability requirements. These limitations can dramatically degrade the performance of the distributed system for clients of edge and mobile services.
Suitable techniques and algorithms are therefore required in order to guarantee high throughput and low latency for emerging distributed systems on dynamic networks.

Proposed research

This research project focuses on load balancing techniques and scheduling protocols for achieving high-performance, reliable distributed systems in dynamic networks.
The availability and fault tolerance of a reliable distributed system are commonly guaranteed by a replication protocol based on the replicated state machine (RSM) approach. Such a protocol implements a consensus algorithm, such as Fast Paxos [2] or Raft [3], in order to provide strong consistency across distributed, replicated data. In fact, strongly consistent replication is key to the efficient implementation of critical building blocks of distributed systems, such as distributed lock managers or transactional key-value stores.
Research on load balancing techniques and scheduling protocols has yielded significant performance gains in cloud-centric systems. Aspnes et al. [4] develop on-line load balancing techniques to minimize the maximum load of requests in distributed computing clusters. Regarding RSM, Alchieri et al. [5] propose a scheduling protocol that simplifies the work done by the scheduler while improving the performance of the system. Yet, further research is needed to evaluate these techniques and protocols in dynamic edge computing, where the availability of nodes' resources is hardly predictable and network topologies evolve continuously.
In order to provide a valuable trade-off between enforcing strong consistency and providing high performance, we are highly interested in investigating new load balancing techniques and scheduling protocols for emerging mobile distributed systems. To conduct this exciting, promising research, we will combine ideas from two disciplines: distributed algorithms and computational fluid dynamics. In particular, we intend to review techniques for dynamically handling operations across moving replicas, based on fast, advanced simulation of system message flows, in order to enforce ordering and timing constraints.

Requirements and application

In this research, we intend to explore both fundamental and applied aspects. In particular, we aim to run real experiments with a fleet of unmanned aerial vehicles (UAVs), commonly known as drones, in the UAV experimental flight facility on our campus.
Candidates for this position should hold a Master's degree in Computer Science/Informatics, Mathematics, Physics or a related field by the starting date of the PhD. They must be excited by research in distributed systems, distributed algorithms, computational fluid dynamics, and/or programming languages, and should have an excellent academic record in one of these areas. Familiarity with machine learning and graph algorithms would be appreciated but is not essential. Teamwork and communication skills are key to this position, and industrial experience is a plus.
Knowledge of French is not required.
To apply, please send the following information to silvestre@enac.fr:

  • A curriculum vitæ.
  • A letter of motivation describing the applicant's background in the areas of the project, reasons for interest in the project, and future plans.
  • A list of courses and grades of the last two years of study (an informal transcript is enough).
  • Names and contact details of at least two references (people who can recommend you), whom we will contact directly.
  • If relevant, a link to your publications and/or open-source developments.

Application deadline: 31 May 2019.
This PhD starts on 1 October 2019. The duration of this project is three years and it will be fully funded by the French government.

About Onera and ENAC research laboratories

Onera, the French Aerospace Lab, and ENAC, the French National School of Civil Aviation, are both located in Toulouse, France, the centre of the European aerospace industry. Our research laboratories offer ideal working environments, where researchers can focus on developing new ideas, collaborations and projects.
The proposed research project is a joint effort between Onera and ENAC Lab. Our common research topics include UAV systems, sustainable transportation development, and safety and security of cyber-physical systems. For further information, please consult our sites: ENAC Lab; Onera.

References

[1] Li, J., Michael, E., Sharma, N. K., Szekeres, A., & Ports, D. R. (2016). Just say NO to Paxos overhead: replacing consensus with network ordering. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16) (pp. 467-483).
[2] Lamport, L. (2006). Fast Paxos. Distributed Computing, 19(2).
[3] Ongaro, D., & Ousterhout, J. (2014). In search of an understandable consensus algorithm. In 2014 USENIX Annual Technical Conference (USENIX ATC 14) (pp. 305-319).
[4] Aspnes, J., Azar, Y., Fiat, A., Plotkin, S., & Waarts, O. (1997). On-line routing of virtual circuits with applications to load balancing and machine scheduling. Journal of the ACM (JACM), 44(3), 486-504.
[5] Alchieri, E., Dotti, F., & Pedone, F. (2018). Early scheduling in parallel state machine replication. In Proceedings of the ACM Symposium on Cloud Computing (pp. 82-94).