I recently read about litestream and wanted to try this on Fly with Elixir.
Litestream allows us to backup our SQLite database to any S3 compatible storage after every transaction. It will also restore from that backup. Meaning that anytime we scale our app on Fly, we can restore the latest version of the database.
Why
- fast data access
- simple local development
- low maintenance
- Elixir makes it all possible via it’s distribution capabilities
Getting Started
See the companion repo for reference.
Create a new phoenix app that uses SQLite3 as the database:
$ mix phx.new distributed_sqlite --database sqlite3
Launch a new fly application:
$ fly launch
- type in an app name (or just take the default)
- choose any region you like
- choose N when asked if you want a Postgres database
- choose N when asked if you want a Redis instance
- choose N to deploy now
An environment varilable DATABASE_PATH
is needed to indicate which file to use for the SQLite database. Open the fly.toml
file and add DATABASE_PATH = /app/distributed_sqlite.db
(use any database name you want here) under the [env]
section and try to deploy.
$ flyctl deploy
Success! A Phoenix app running on Fly using SQLite! Now, to see it in action.
Counter Data Model
Lets build a simple, naive counter that just counts the views of each page.
$ mix phx.gen.schema Counter.PageCount page_counts page:string count:integer
$ mix ecto.migrate
Add a Counter
module where we can add page view counts
# lib/distributed_sqlite/counter.ex
defmodule DistributedSqlite.Counter do
alias DistributedSqlite.Counter.PageCount
alias DistributedSqlite.Repo
def count_page_view(page_name) do
page_count = Repo.get_by(PageCount, page: page_name)
case page_count do
nil ->
%PageCount{}
|> PageCount.changeset(%{count: 1, page: page_name})
|> Repo.insert()
%PageCount{} = page_count ->
page_count
|> PageCount.changeset(%{count: page_count.count + 1})
|> Repo.update()
end
end
end
and update the page_controller to count views
# lib/distributed_sqlite_web/controllers/page_controller.ex
alias DistributedSqlite.Counter
def index(conn, _params) do
page_view = Counter.count_page_view("home")
render(conn, "index.html", page_count: page_view.count)
end
lastly, we can display the counter on our page
<!-- lib/distributed_sqlite_web/templates/page/index.html.heex -->
<h1> Page Views <%= @view_count %> </h1>
Now deploy again using flyctl deploy
and then browse to your site to validate that the count is showing and updating when refreshing.
Restoring the Database on Deploy
The next problem to deal with is that the database will be wiped on our next deploy since it’s using ephemeral storage.
One way to resolve this is to use a persistent volume (which should be done for production apps). But since this post is all about litestream lets set that up and see how it helps us here.
The first step is to create a bucket in some S3 compatible storage. I like to use Digital Ocean spaces for this, but you can also use AWS if you want. See litestream docs for more options
You’ll need 3 things:
- the bucket url
- the access key
- the secret key
Next, add litestream to our Docker image. Add the following lines to Dockerfile
as part of the builder
phase:
ADD https://github.com/benbjohnson/litestream/releases/download/v0.3.9/litestream-v0.3.9-linux-amd64-static.tar.gz /tmp/litestream.tar.gz
RUN tar -C /usr/local/bin -xzf /tmp/litestream.tar.gz
And in the runner
phase add:
COPY --from=builder /usr/local/bin/litestream /usr/local/bin/litestream
COPY litestream.yml /etc/litestream.yml
We still need to create the litestream.yml
file, lets do that now.
access-key-id: ${LITESTREAM_ACCESS_KEY_ID}
secret-access-key: ${LITESTREAM_SECRET_ACCESS_KEY}
dbs:
- path: /app/distributed_sql.db
replicas:
- url: ${REPLICA_URL}
Now set the three variables in Fly to the values you recorded from when setting up the bucket.
$ flyctl secrets set REPLICA_URL=... LITESTREAM_ACCESS_KEY_ID=... LITESTREAM_SECRET_ACCESS_KEY=...
Finally, update the starting script so that the elixir release is a sub-process of litestream. The easiest way I’ve found to do this is to create a run script called run.sh
with the following content:
#!/bin/bash
set -e
# Restore the database if it does not already exist.
if [ -f /app/distributed_sql.db ]; then
echo "Database already exists, skipping restore"
else
echo "No database found, restoring from replica if exists"
litestream restore -v -if-replica-exists -o /app/distributed_sql.db "${REPLICA_URL}"
fi
# Run migrations
/app/bin/migrate
# Run litestream with your app as the subprocess.
exec litestream replicate -exec "/app/bin/server"
Be sure to remove the migration script from fly.toml since we it runs in the run.sh script now.
Now update the Dockerfile
to use this new script to start the app:
COPY run.sh /scripts/run.sh
RUN chmod 755 /scripts/run.sh
CMD ["/scripts/run.sh"]
Deploying should now start using litestream to restore the database on deploys and push backups when data changes. You can verify in the monitoring interface of fly. Look for something similar to the image below:
Distributing
With all of this in place, things would work great when running one instance of you app. But as soon as you add another node things get out of wack.
Lets see this in action – Scale the app to 2 and see what happens to the data.
$ flyctl scale count 2
Enough refreshing or opening in different browser sessions and you’ll start to see discrepencies in the view count. This is because we are not replicating the data between the instances. This is a problem Elixir is built for…
Follow the Fly guide to Clustering Your Application to get clustering working correctly.
Once your app is clustered, we can begin to replicate our database calls. Add a new GenServer with the following content:
# /lib/distributed_sqlite/repo_replication.ex
defmodule DistributedSqlite.RepoReplication do
@moduledoc """
Run on each node to handle replicating Repo writes
"""
use GenServer
alias DistributedSqlite.Repo
def start_link(args) do
GenServer.start_link(__MODULE__, args, name: __MODULE__)
end
@impl true
def init(_args) do
{:ok, []}
end
def handle_cast({:replicate, query, :insert}, state) do
Repo.insert!(query)
{:noreply, state}
end
def handle_cast({:replicate, changeset, :update}, state) do
Repo.update!(changeset)
{:noreply, state}
end
end
and make sure to start it in the application.ex
# lib/distributed_sqlite/application.ex
children = [
...,
{DistributedSqlite.RepoReplication, []}
]
Open the DistributedSqlite.Repo
file and add a replicate/2
function
@doc """
Replicate the query on the the other nodes in the cluster
"""
def replicate({:ok, data_to_replicate} = ret, operation) when operation in [:insert, :update] do
_ =
for node <- Node.list() do
GenServer.cast(
{DistributedSqlite.RepoReplication, node},
{:replicate, data_to_replicate, operation}
)
end
ret
end
def replicate({:error, _changeset} = ret, _), do: ret
def replicate(%Ecto.Changeset{} = changeset, operation) when operation in [:insert, :update] do
_ =
for node <- Node.list() do
GenServer.cast(
{DistributedSqlite.RepoReplication, node},
{:replicate, changeset, operation}
)
end
{:ok, changeset}
end
def replicate(schema, :insert) do
_ =
for node <- Node.list() do
GenServer.cast(
{DistributedSqlite.RepoReplication, node},
{:replicate, schema, :insert}
)
end
{:ok, schema}
end
This gives us a function we can pipe into from an Repo.insert
or the result of a Repo.update
. Try this in the DistributedSqlite.Counter
module:
case page_count do
nil ->
%PageCount{}
|> PageCount.changeset(%{count: 1, page: page_name})
|> Repo.insert()
|> Repo.replicate(:insert)
%PageCount{} = page_count ->
page_count
|> PageCount.changeset(%{count: page_count.count + 1})
|> Repo.update()
|> case do
{:ok, cnt} ->
cnt
|> PageCount.replicate_changeset()
|> Repo.replicate(:update)
{:ok, cnt}
end
end
With all of this in place, deploy again. Your data should now be consistent no matter which node your traffic is served from!
Wrap Up
I’m not sure how far this can be pushed and there are downsides to this approach, but I plan on continuing this journey that puts my data as close to the application as possible.
There is another approach worth exploring that removes the need to replicate the data on the application side. It is still in beta, but I plan on trying it out soon as well.