Distributed SQLite with Elixir

Published 2023-01-08

I recently read about a newer SQLite database tool called Litestream which creates backups of the database in S3 compatible storage after every transaction. Litestream restores from that backup, so when scaling horizontally to a new server, the latest version of the DB will be available. Fly.io seems like the ideal platform and Elixir the perfect language for this solution.

Why

fast data access - the SQLite DB resides on the same server as the app and, with Fly, the app can live close to the user
simple local development - with SQLite, you don’t need to install DB servers locally
low maintenance - SQLite doesn’t require much maintenance
highly distributed - Elixir makes it all possible via it’s distribution capabilities

Getting started

See the companion repository for reference

Create a new phoenix app that uses SQLite as the database:

$ mix phx.new distributed_sqlite --database sqlite3

Launch it as a new fly app:

$ fly launch

type in an app name (or just take the default)
choose any region you like
choose ‘N’ when asked if you want a Postgres database
choose ‘N’ when asked if you want a Redis instance
choose ‘N’ to deploy now

An environment variable, DATABASE_PATH, is needed to indicate which file to use for the SQLite database. Open the fly.toml file and add DATABASE_PATH = /app/distributed_sqlite.db (use any database name you want here) under the [env] section and try to deploy.

$ flyctl deploy

Success. A Phoenix app running on Fly using SQLite. Now, to see it in action.

Counter data model

Lets build a simple, naive counter that just counts the views of each page.

$ mix phx.gen.schema Counter.PageCount page_counts page:string count:integer  $ mix ecto.migrate

Add a Counter module which can add page view counts

# lib/distributed_sqlite/counter.ex  defmodule DistributedSqlite.Counter do
  alias DistributedSqlite.Counter.PageCount
  alias DistributedSqlite.Repo

  def count_page_view(page_name) do
    page_count = Repo.get_by(PageCount, page: page_name)
    case page_count do
      nil -> 
       %PageCount{}
       |> PageCount.changeset(%{count: 1, page: page_name})
       |> Repo.insert()
      %PageCount{} = page_count ->
        page_count
        |> PageCount.changeset(%{count: page_count.count + 1})
        |> Repo.update()
    end
  end
end

Update the page_controller to count views

  # lib/distributed_sqlite_web/controllers/page_controller.ex    alias DistributedSqlite.Counter

  def index(conn, _params) do
    page_view = Counter.count_page_view("home")
    render(conn, "index.html", page_count: page_view.count)
  end

Display the counter on the page

<!-- lib/distributed_sqlite_web/templates/page/index.html.heex -->  <h1> Page Views <%= @view_count %> </h1>

Now deploy again using flyctl deploy and then browse to your site to verify that the count shows and updates when refreshing.

Restoring the DB on deploy

The next problem to deal with is that the database will be wiped on our next deploy since it’s using ephemeral storage.

One way to resolve this is to use a persistent volume (which should be done for production apps). But since this post is all about Litestream, so lets set that up and see how can help with this.

The first step is to create a bucket in some S3 compatible storage. I like to use Digital Ocean Spaces for this, but you can also use AWS if you want.

See Litestream docs for more options

Setup a Digital Ocean space

When you have your storage setup, you’ll need 3 pieces of information

the bucket URL
the access key
the secret key

A configuration file, litestream.yml, is needed for Litestream to function. Create one in the root of the project with the following text (remember to replace the path with YOUR path):

access-key-id: ${LITESTREAM_ACCESS_KEY_ID}  secret-access-key: ${LITESTREAM_SECRET_ACCESS_KEY}

dbs:
  - path: /app/distributed_sql.db
    replicas:
      - url: ${REPLICA_URL}

Now set the three variables in Fly to the values you recorded from when setting up the bucket.

$ flyctl secrets set REPLICA_URL=... LITESTREAM_ACCESS_KEY_ID=... LITESTREAM_SECRET_ACCESS_KEY=...

Next, add Litestream to our Docker image. Add the following lines to Dockerfile as part of the builder phase:

ADD https://github.com/benbjohnson/litestream/releases/download/v0.3.9/litestream-v0.3.9-linux-amd64-static.tar.gz /tmp/litestream.tar.gz  RUN tar -C /usr/local/bin -xzf /tmp/litestream.tar.gz

And in the runner phase add:

COPY --from=builder /usr/local/bin/litestream /usr/local/bin/litestream  COPY litestream.yml /etc/litestream.yml

Update the starting script so that the elixir release is a sub-process of litestream. The easiest way I’ve found to do this is to create a run script called run.sh with the following content:

#!/bin/bash  set -e

# Restore the database if it does not already exist.
if [ -f /app/distributed_sql.db ]; then
  echo "Database already exists, skipping restore"
else
  echo "No database found, restoring from replica if exists"
  litestream restore -v -if-replica-exists -o /app/distributed_sql.db "${REPLICA_URL}"
fi

# Run migrations
/app/bin/migrate

# Run litestream with your app as the subprocess.
exec litestream replicate -exec "/app/bin/server"

Be sure to remove the migration script from fly.toml since it runs in the run.sh script now.

Now update the Dockerfile to use this new script to start the app:

COPY run.sh /scripts/run.sh  RUN chmod 755 /scripts/run.sh

CMD ["/scripts/run.sh"]

Deploying should now start using Litestream to restore the database on deploys and push backups when data changes. You can verify in the monitoring interface of fly. Look for something similar to the image below:

Fly logs showing that Litestream is running

Distributing

With all of this in place, things would work great when running one instance of you app. But as soon as you add another node things get out of whack. Lets see this in action. Scale the app to 2 and see what happens to the data.

$ flyctl scale count 2

Enough refreshing the browser or opening in different windows/tabs and you’ll start to see discrepancies in the view count. This is because we are not replicating the data between the instances. This is a problem Elixir is built for.

Setup the Cluster

Follow the Fly guide to get clustering working

Once clustered, we can begin to replicate our database calls. Add a new GenServer with the following content:

# /lib/distributed_sqlite/repo_replication.ex  defmodule DistributedSqlite.RepoReplication do
  @moduledoc """
  Run on each node to handle replicating Repo writes
  """
  use GenServer

  alias DistributedSqlite.Repo

  def start_link(args) do
    GenServer.start_link(__MODULE__, args, name: __MODULE__)
  end

  @impl true
  def init(_args) do
    {:ok, []}
  end

  def handle_cast({:replicate, query, :insert}, state) do
    Repo.insert!(query)
    {:noreply, state}
  end

  def handle_cast({:replicate, changeset, :update}, state) do
    Repo.update!(changeset)
    {:noreply, state}
  end
end

Make sure to start it in the application.ex

# lib/distributed_sqlite/application.ex  children = [
  ...,
  {DistributedSqlite.RepoReplication, []}
]

Open the DistributedSqlite.Repo module and add a replicate/2 function

@doc """  Replicate the query on the the other nodes in the cluster
"""
def replicate({:ok, data_to_replicate} = ret, operation) when operation in [:insert, :update] do
  _ =
    for node <- Node.list() do
      GenServer.cast(
        {DistributedSqlite.RepoReplication, node},
        {:replicate, data_to_replicate, operation}
      )
    end

  ret
end

def replicate({:error, _changeset} = ret, _), do: ret

def replicate(%Ecto.Changeset{} = changeset, operation) when operation in [:insert, :update] do
  _ =
    for node <- Node.list() do
      GenServer.cast(
        {DistributedSqlite.RepoReplication, node},
        {:replicate, changeset, operation}
      )
    end

  {:ok, changeset}
end

def replicate(schema, :insert) do
  _ =
    for node <- Node.list() do
      GenServer.cast(
        {DistributedSqlite.RepoReplication, node},
        {:replicate, schema, :insert}
      )
    end

  {:ok, schema}
end

This function can be piped into from an Repo.insert or the result of a Repo.update. Try this in the DistributedSqlite.Counter module:

case page_count do    nil ->
    %PageCount{}
    |> PageCount.changeset(%{count: 1, page: page_name})
    |> Repo.insert()
    |> Repo.replicate(:insert)

  %PageCount{} = page_count ->
    page_count
    |> PageCount.changeset(%{count: page_count.count + 1})
    |> Repo.update()
    |> case do
      {:ok, cnt} ->
        cnt
        |> PageCount.replicate_changeset()
        |> Repo.replicate(:update)

        {:ok, cnt}
  end
end

With this in place, deploy again. Your data is now consistent no matter which node serves the traffic.

Wrap up

I’m not sure how far this can be pushed and there are downsides to this approach, but I plan on continuing this journey that puts my data as close to the app as possible.

There is another worth exploring that removes the need to replicate the data on the application side. It is still in beta, but I plan on trying it out soon as well.

Another approach - Litefs