Logo
Product & Platform Engineering
Product & Platform Engineering
Product Thinking | Workflows
Explore More
Data & AI
Data & AI
Data Engineering | ML | Analytics | Visualization
Explore More
Digital Engineering
Digital Engineering
Responsive Front End | Backend | Microservices
Explore More
Mobility
Mobility
Native | Cross Platform | Internet of Things
Explore More

Featured Insights

How BigThinkCode transforms Demand Planning for Supply Chain management?
How BigThinkCode transforms Demand Planning for Supply Chain management?
View More
AI-Driven Logistics: Redefining Supply Chain Operations
AI-Driven Logistics: Redefining Supply Chain Operations
View More
Real-Time Personalization with Streaming Data: How E-Commerce Uses API Logs to Boost Sales
Real-Time Personalization with Streaming Data: How E-Commerce Uses API Logs to Boost Sales
View More
AgriTech Innovations
E-commerce & Retails
Healthcare
Industrial Automation 4.0
Banking
Education
Transportation & logistics
Insurance
Life Sciences
Travel & hospitality
Investment Banking
Telecom
Public Sector
Financial Institutions
Supply Chain Management (SCM)
Business Intelligence Systems (BIS)
Open source & community driven
Marketing & Sales
CompanyInsightsContactJoin us
  • What we doSub Menu
    • Product & Platform Engineering
    • Data & AI
    • Digital Engineering
    • Mobility
  • Who we work withSub Menu
    • AgriTech Innovations
    • E-commerce & Retails
    • Healthcare
    • Industrial Automation 4.0
    • Banking
    • Education
    • Transportation & logistics
    • Insurance
    • Life Sciences
    • Travel & hospitality
    • Investment Banking
    • Telecom
    • Public Sector
    • Financial Institutions
    • Supply Chain Management (SCM)
    • Business Intelligence Systems (BIS)
    • Open source & community driven
    • Marketing & Sales
  • Company
  • Insights
  • Contact
  • Join us

The Anatomy of a Distributed Chat Application built with Elixir's Libcluster and Horde

A chat application built with Libcluster and Horde, which is highly resilient to node failures and ensures flawless user experience.

The Anatomy of a Distributed Chat Application built with Elixir's Libcluster and Horde

Introduction

ElixirChat is a Phoenix LiveView-based communication application that enables users to create rooms and allows chat with other users in those rooms. The app is deployed in a distributed environment that manages distributed state, but ensuring consistency throughout the active session of a chat can be challenging when nodes are transient. In this blog, we will discuss how we address this challenge by using LibclusterLibcluster, Horde SupervisorHorde Supervisor, and Horde Registry.

Clustering with Libcluster and Horde Registry

One way we address the challenge of maintaining a consistent state in a distributed environment is by using libcluster. Libcluster is a library that automatically creates clusters of Elixir replicas that are aware of each other. With libcluster's cluster strategies, BEAM nodes can communicate with each other, allowing Horde to manage persistent processes across all replicas. To facilitate clustering, a sample cluster formation using k8s is provided as part of the repo.

Cluster formation using k8s
def start(_type, _args) do
  topologies = [
    k8s_chat: [
      strategy: Elixir.Cluster.Strategy.Kubernetes.DNS,
      config: [
        service: "chat-svc-headless",
        application_name: "chat",
        polling_interval: 3_000          
      ]
    ]
  ]

ChatServer is where a chat room's message is saved/managed in memory. ChatServer is a process that gets created for each room and its reference is stored in the horde registry. These ChatServers (which are GenServers) can exist in any node i.e., a chat room can be present in any of the nodes in the libcluster cluster.

Dynamic Supervision with Horde Supervisor

To enable the supervisor to supervise long-running processes (i.e., chats) across the dynamic cluster and monitor them, we use a distributed dynamic supervisor called Horde Supervisor. It supervises child processes across multiple nodes in a cluster, and dynamically restructures the supervision tree based on node membership changes. When a node goes down, the signal is caught by the ChatServer’s Genserver and StateHandoff.handoff is invoked. This ensures that the application maintains its resilience even in the face of node failures.

code
defmodule Chat.StateHandoff do
  use GenServer
  require Logger

  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end
  
  def child_spec(opts \\ []) do
    %{
      id: __MODULE__,
      start: {__MODULE__, :start_link, [opts]}
    }
  end
        
  # join this crdt with one on another node by adding it as a neighbour
  def join(other_node) do
    # the second element of the tuple, { __MODULE__ , node} is a syntax that 
    # identifies the process named __MODULE__ running on the other node other_node 
    Logger.warn("Joining StateHandoff at #{inspect other_node}") 
    GenServer.call(__MODULE__, { :add_neighbours, { __MODULE__, other_node } })
  end

  # store a room_id and messages in the handoff crdt
  def handoff (room_id, messages) do
    GenServer.call(__MODULE__,{ :handoff, room_id, messages })
  end

  # pickup the stored messages for a room
  def pickup (room_id) do
    GenServer.call(__MODULE__, { :pickup, room_id }) 
  end
  

Distributed Process Registry with Horde Registry and State Management using DeltaCRDT

To manage distributed state, the state of chat rooms (messages) is kept in memory, and a handoff occurs when a replica containing the state fails due to node shutdown. The normal shutdown message sent by the node is intercepted, and the DeltaCRDTDeltaCRDT is handed off with the current state before being put to sleep so it can synchronize with other nodes. We use the DeltaCRDT to maintain the state, which is kept synchronized throughout the cluster using a Conflict-free Replicated Data Type (CRDT). DeltaCRDT library helps in synchronizing the state. We can imagine it as a distributed map that is stored/synchronized across all nodes.

Flow Diagram 1
Flow Diagram 2

The above diagrams show that each node has its respective instance of Horde Registry, DeltaCRDT, and a supervisor. The ChatServers' GenServers are registered in the Horde Registry to keep track of them across all replicas/nodes. DeltaCRDT is used for synchronizing states across multiple nodes. Finally, the supervisors monitor the chat processes across all nodes and provide quick recovery in case of node failures.

Conclusion

Overall, the combination of Libcluster, Horde Supervisor, and Horde Registry allows the ElixirChat application to maintain a consistent state across a dynamic cluster of nodes, making it highly resilient to node failures and ensuring a seamless user experience. With this setup, we are confident that we can provide a reliable chat service to our users while also being able to scale horizontally as needed.

Talk to us for more insights

What more? Your business success story is right next here. We're just a ping away. Let's get connected.

Contact Us

Quick Links

  • Featured Insights
  • Contact us

Domains

  • Financial Institutions & Insurance
  • Healthcare
  • E-commerce & Retails
  • Open source & Community driven
  • Industrial Automation 4.0
  • AgriTech Innovations
  • Public Sector
  • Education
  • Banking
  • Life Sciences
  • Transportation and logistics
  • Travel and hospitality

Services

  • Product & Platform Engineering
  • Data & Artificial Intelligence
  • Digital Engineering
  • Mobility

Company

  • About us
  • Careers
  • Privacy Policy
  • Information Security Policy
  • Reach Us

Reach Us

India
Chennai, TamilNadu
United States
Wilmington, DE
Email
info@bigthinkcode.com
Copyright © 2025 BigThinkCode Technologies.