Simit and a Berliner

A personal blog by Ahmet Kerem Aksoy.

Two simits and a berliner.

DistroMash


Since my last post, I wrote my master's thesis and graduated from the university. It ended up being a pretty good thesis and I received the highest grade. I decided to open source the project to grab some stars on GitHub. I think I should advertise it more because it actually works pretty well. Anyway, in this post, I will talk about my thesis a bit and share the project with you. If you are interested in Docker containers, distributed systems or IPFS, buckle up.


Containers are almost the de facto way of distributing software nowadays. Docker is everywhere. My local database for personal projects runs in a Docker container. The development environment that we use at work runs in a Docker container. The ML models that I deployed run in Docker containers. It is so versatile and easy.


I am not specifically interested in Docker. At the beginning of 2023, I was looking for a thesis topic where I could get my hands dirty. I wanted something that involved engineering. I found a topic from the Distributed Systems and Operating Systems group at the university. It was about writing a framework to distribute Docker images faster and better in edge computing environments. It involved not only finding solutions to a set of problems but also implementing a whole framework to realize these solutions and offer a complete system. Also, I had been hearing great things about Go and had wanted to learn it for a long time, so this could be my opportunity to do that.


I picked up the topic and it turned out better than I expected. I learned a bunch of stuff about peer-to-peer networks, distributed systems, IPFS and Docker. I found solutions to all the problems that I stated in my thesis and created a functional system. I called the system DistroMash. Distro stands for distribution and Mash stands for a mesh of nodes, but with an a instead of an e, as in mashed potato.


DistroMash architecture

In simple terms, DistroMash distributes Docker images to an environment faster than the official Docker registry does under bad network conditions. It is especially useful in edge environments, as the uplink connection to the cloud is slow and prone to congestion. DistroMash achieves its superior performance by spreading the download load across a peer-to-peer network using IPFS. Furthermore, DistroMash proposes a framework including strategies on how to distribute Docker images to the environment, such as distributing certain images to certain nodes or to a percentage of the environment. The proposed framework offers support for multiple platforms and architectures.
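To make the "percentage of the environment" strategy a bit more concrete, here is a minimal sketch in Go. It is not taken from the DistroMash code base: the function name `selectByPercentage`, the node names, and the choice to pick nodes in sorted order are all assumptions made for illustration. It only shows one plausible way to decide which nodes should receive an image.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// selectByPercentage returns the nodes that should receive an image when the
// strategy is "replicate to pct percent of the environment".
// Hypothetical helper for illustration; not part of DistroMash's actual API.
func selectByPercentage(nodes []string, pct float64) []string {
	if pct <= 0 || len(nodes) == 0 {
		return nil
	}
	if pct > 100 {
		pct = 100
	}
	// Round up so that any positive percentage selects at least one node.
	n := int(math.Ceil(float64(len(nodes)) * pct / 100))
	if n > len(nodes) {
		n = len(nodes)
	}
	// Sort a copy so the selection is deterministic and the input is untouched.
	// Assumption: a real system might rank nodes by load or locality instead.
	sorted := append([]string(nil), nodes...)
	sort.Strings(sorted)
	return sorted[:n]
}

func main() {
	nodes := []string{"edge-3", "edge-1", "edge-4", "edge-2"}
	// Replicate to half of the environment.
	fmt.Println(selectByPercentage(nodes, 50)) // prints [edge-1 edge-2]
}
```

Rounding up is a design choice in this sketch: with a small edge site, a low percentage would otherwise select zero nodes and the image would never be replicated at all.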


If you want to check out DistroMash, I am leaving the GitHub link to the repository here.


As I am being lazy writing this, I am leaving the abstract of my thesis below, just in case you are interested:


Container-based virtualization has emerged as a widely adopted and effective approach for deploying applications on edge devices with the proliferation of IoT use cases. The lightweight and portable nature of containers makes them well-suited for deployments in edge environments featuring heterogeneous devices with limited resources. A significant limitation of container deployments in large edge environments is scaling the distribution of images from a centralized registry to edge devices. Large amounts of download requests create a bottlenecked link to the centralized registry, leading to increased deployment times. Previous work tackles this problem by decentralizing the container image distribution, employing peer-to-peer connections between the participating nodes. Nodes share image layers between themselves without a central authority, avoiding trips to the central registry. However, several challenges must be addressed and integrated into existing flows to apply decentralized container image distribution solutions in edge environments in a non-intrusive and flexible way. The lack of support for distributing container images across diverse platforms and architectures presents difficulties in coping with the heterogeneous characteristics of edge sites. The absence of decentralized container image discovery procedures poses challenges in identifying images within a peer-to-peer network that spans an edge site. Lastly, the deficiency of deployment strategies that intelligently distribute container images within edge sites, aligning with the diverse requirements of various IoT use cases and applications while accommodating the constraints of edge devices, results in an unoptimized and inefficient distribution process. Devising such strategies would ultimately reduce container start times, facilitating faster deployments. To fill this gap, DistroMash is designed and implemented in this thesis.
DistroMash is a decentralized container image distribution and replication system that incorporates multi-platform support, container image discovery, and replication strategies that are managed by a REST API. The proposed solution is evaluated against multiple baselines in a simulated edge environment. The conducted evaluation shows that DistroMash accelerates container image distribution while fulfilling the necessary conditions for integration with edge environments. DistroMash is open-sourced to facilitate and motivate further research and collaboration in the field of container image distribution and replication in edge computing.


Arrivederci!