BESPOKE BYTES

Self-hosting Immich for photos

A journey of missing parts

By bob, 2463 words, 12 minutes to read


NOTE This post contains several product links. These are not sponsored/affiliate links, and I am not associated with the companies mentioned in any way.


Self-hosting everything because I can

I've been on a journey of moving more things in my life from cloud-hosted to self-hosted, starting with various "smart" devices in my house. While the cloud is convenient, there's always the risk of a third-party company suddenly changing its policies or locking you out of your account.

The first device I replaced was my myQ garage door opener (replaced with Konnected), since the myQ cannot be controlled with Home Assistant (and is actively hostile towards it). The next device to go was my Ring doorbell (replaced with Reolink), because I was tired of paying Ring/Amazon a monthly fee just to use a doorbell. Also, both the myQ and Ring are pretty major security and privacy risks if you don't trust a third-party company with access to your house.

But this post isn't about smart devices, it's about pictures.

Self-hosted memories

Of all the various forms of data we have to keep track of, my wife and I decided the top priority was all our old pictures. We have pictures backed up across Google Photos, Amazon Photos, our phones, and various old .zip archives, but none of it was really organized or fully under our control. My wife has also been getting warnings about her Google account storage space, the majority of which is taken up by pictures.

While browsing various self-hosting forums, one app kept appearing as a solution for hosting pictures: Immich.

Immich logo

There is a banner at the top of the Immich website that currently states:

⚠️ The project is under very active development. Expect bugs and changes. Do not use it as the only way to store your photos and videos!

While so far my experience with Immich has been generally good, there's a reason for this warning. The project is not fully polished and I've encountered several (mostly minor) bugs so far. Nothing show-stopping, but don't expect a completely smooth experience out of the box.

Lots of install options, but no obvious easy path

At the time of this writing, the install guide has eight different options for installing Immich, though the Quick start guide does point you toward the Docker Compose option. There is an install script which is marked "Experimental", and seems to basically just do the steps in the Docker Compose document. There are no package manager (apt, brew, etc) install options.

The install instructions are to "create a directory of your choice" to hold the relevant files, and just wget them from the GitHub release into that directory. I picked /opt/immich instead of the relative ./immich-app that the install guide (and script) suggests. I left the postgres folder as the relative ./postgres, but changed the UPLOAD_LOCATION to my external storage. This is the most important setting to change if you're running Immich on a host with multiple filesystems and you want to store all of your pictures on your RAID/NAS/USB/whatever.
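
For reference, the relevant bit of the .env file ends up looking something like this (the variable names are from the release I installed, and the postgres one in particular may differ in other versions):

/opt/immich/.env (excerpt)
# Uploaded photos and videos land here -- point this at the big external disk
UPLOAD_LOCATION=/media/external/immich
# Postgres data stays on the local disk, relative to /opt/immich
DB_DATA_LOCATION=./postgres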

A little help from my friends (scripts)

Ok, logging in and typing docker compose up -d every time isn't really a good option. At the very least, I want something to start this on boot.

/etc/systemd/system/immich.service
[Unit]
Description=Immich
After=docker.service
Requires=docker.service

[Service]
Type=simple
WorkingDirectory=/opt/immich
ExecStart=/usr/bin/docker compose up
ExecStop=/usr/bin/docker compose stop

[Install]
WantedBy=multi-user.target
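
After dropping that file in place, it gets enabled like any other unit:

~# systemctl daemon-reload
~# systemctl enable --now immich.service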

The upgrade instructions just say to docker compose pull to get the latest images, but I also noticed that the image tag for the redis container in the docker-compose.yml file changed in the first release I upgraded to. So apparently the compose file itself needs to be updated sometimes, which the documentation does not mention at all. I've settled on this for an upgrade script for now:

/opt/immich/upgrade.sh
#!/bin/bash

set -euo pipefail

# Always run from the directory this script lives in (/opt/immich)
cd "$(dirname "$0")"

systemctl stop immich

# Re-download the compose file, since it occasionally changes between releases
# (e.g. the redis image tag)
wget -q --show-progress -O docker-compose.yml https://github.com/immich-app/immich/releases/latest/download/docker-compose.yml

docker compose pull

systemctl start immich

# Clean up the now-unused old images
docker image prune -f

This obviously won't work if you have done any customization of your docker-compose.yml file, like using one of the CUDA-accelerated ML variants, so YMMV.

Bulk importing 25 years of photos, and deduplication hell

Up to around 2015, I stored all of my photos on Flickr. For reasons that I can't remember anymore, I stopped using it and moved everything to Google Photos. Before I left, I made an export of all of my data. The export consists of the actual image files, and a bunch of .json files containing all of the photo and album metadata. I added descriptions/comments and had everything sorted into albums, which never made it over to Google Photos.

Meanwhile, almost every picture I've taken since then has been on an Android phone, and the data has been moved from phone to phone every upgrade without losing anything (luckily). I also have miscellaneous folders and archives of backed-up photos which I don't remember being managed by anything (possibly f-spot).

Immich has a Node-based CLI that can recursively import from a directory, and there is also an immich-go version that can import from directories, zipfiles, or specifically Google Photos takeout archives. The web interface lets you upload multiple files from a standard file-picker dialog, but doesn't handle uploading archives. Everything is also available through an API, and there is a simple Python-based example of uploading a file.
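
As of this writing, that Python example boils down to a multipart POST to the assets endpoint. Roughly (BASE_URL and API_KEY are placeholders for your own server and key, and the exact endpoint and field names may shift between releases):

import os
from datetime import datetime, timezone

import requests

BASE_URL = "http://127.0.0.1:2283/api"  # placeholder: your Immich server
API_KEY = "REDACTED"                    # placeholder: an API key from your user settings

def upload_file(path):
    stats = os.stat(path)
    headers = {
        'Accept': 'application/json',
        'x-api-key': API_KEY
    }
    # deviceAssetId/deviceId just need to identify the uploader and file uniquely enough
    data = {
        'deviceAssetId': f"{os.path.basename(path)}-{stats.st_mtime}",
        'deviceId': 'python-import',
        'fileCreatedAt': datetime.fromtimestamp(stats.st_mtime, tz=timezone.utc).isoformat(),
        'fileModifiedAt': datetime.fromtimestamp(stats.st_mtime, tz=timezone.utc).isoformat(),
    }
    with open(path, 'rb') as f:
        response = requests.post(f"{BASE_URL}/assets", headers=headers,
                                 data=data, files={'assetData': f})
    response.raise_for_status()
    return response.json()  # includes the id of the newly created asset

The API snippets later in this post reuse the same BASE_URL and API_KEY.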

Don't do what I did.

Things would have been a lot simpler if I only had one unique copy of my photos, but I had at least 3, possibly 4, slightly different copies of important photos (like wedding pictures). I have the originals, the slightly reduced "storage saver" version from Google Photos, and the Flickr versions that also seem to be a slightly lower (or at least different) quality from the originals. Both the web interface and the immich-go tool are smart enough to skip files that are 100% identical, but will happily upload a re-encoded version of the same picture. That sounds like it would be a complete disaster, but Immich runs a background de-duplication job that uses ML to match nearly identical pictures and suggest them for de-duplication. So there's a way to fix it, but it's still miserable if you have thousands of duplicates to work through.

The order I ended up importing my files was:

  1. originals from my phone camera, via the mobile app
  2. miscellaneous backup folders and archives
  3. Flickr backup archives

By the time I got to importing the Flickr archives, immich-go was flagging a good number of them as duplicates and not uploading them, which would be fine, except that all of the album information was tied to the Flickr file names. I also wanted to copy over the names and descriptions of photos from Flickr, which would be helpful when trying to figure out what a blurry photo from 2006 is supposed to be.

Since I hadn't told the Flickr import to go into an album or anything, the only way I could find to identify the imported photos was that the Flickr filenames all contained a 10-digit ID matching the photo IDs in the .json files. That only matched about half of the pictures, though; the other half had been flagged as duplicates by immich-go. At this point, I probably should have just wiped my install and started over, but I kept stumbling on. I eventually found that I could match on timestamps and locate almost every imported photo, and could then make API calls to create albums and set descriptions for everything I had Flickr data for.

(It actually involved a lot more stumbling around than that, as I had to do several rounds of fine-tuning the filename and timestamp matching, then go back and update the albums and photos I had already created. Again, it might have been easier to just wipe and re-do everything. But this section is long enough already.)

Random Immich API code

Immich has API documentation, but not a lot of guidance on how to actually accomplish anything. Since the web interface uses the API, I ended up figuring a lot out by just watching network traffic. My Flickr import script is a huge mess of commented-out single-use code, but there are a few useful snippets:

For starters, there is no "list all photos" API endpoint. Photos are "assets" and assets are grouped in buckets in the timeline. If you want to get all of them (like if you're trying to do bulk matching of names and timestamps) you have to get all assets for each bucket:

def get_all_assets():
    headers = {
        'Accept': 'application/json',
        'x-api-key': API_KEY
    }

    buckets = requests.get(f"{BASE_URL}/timeline/buckets?size=MONTH", headers=headers)
    buckets.raise_for_status()
    for bucket in buckets.json():
        assets = requests.get(f"{BASE_URL}/timeline/bucket?size=MONTH&timeBucket={bucket['timeBucket']}", headers=headers)
        assets.raise_for_status()
        for asset in assets.json():
            yield asset
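
That generator is what made the timestamp matching from the Flickr section workable. A rough sketch of the lookup built from it (the real matching needed several rounds of fine-tuning, and the date fields on the asset objects may differ between releases):

# Index asset IDs by capture time, for matching against the Flickr metadata
assets_by_time = {}
for asset in get_all_assets():
    taken = asset.get("localDateTime") or asset.get("fileCreatedAt")
    assets_by_time.setdefault(taken, []).append(asset["id"])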

Setting the description or changing the time for an asset in the web interface uses the same PUT /assets/:id endpoint:

def set_asset_data(id, description=None, date_taken=None):
    headers = {
        'Accept': 'application/json',
        'x-api-key': API_KEY
    }

    # Only include the fields we actually want to change
    data = {}
    if description is not None:
        data["description"] = description
    if date_taken is not None:
        data["dateTimeOriginal"] = date_taken

    response = requests.put(f"{BASE_URL}/assets/{id}", headers=headers, json=data)
    response.raise_for_status()

Leave out description or date_taken to only set one or the other.
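
For example, to backfill just a Flickr description without touching the capture time (the asset ID here is made up):

# Only "description" ends up in the payload, so the timestamp is left alone
set_asset_data("f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
               description="Blurry photo from 2006")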

Albums returned by the list /albums endpoint do not actually contain their list of assets; you have to fetch each album individually to get that.

def get_albums():
    headers = {
        'Accept': 'application/json',
        'x-api-key': API_KEY
    }

    albums = requests.get(f"{BASE_URL}/albums", headers=headers)
    albums.raise_for_status()
    for album in albums.json():
        response = requests.get(f"{BASE_URL}/albums/{album['id']}", headers=headers)
        response.raise_for_status()
        yield response.json()
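
Creating the albums themselves is a POST to /albums with a name and a list of asset IDs. A stripped-down sketch of the filename-ID half of my Flickr matching, assuming the takeout .json has already been parsed into a plain {album title: [photo IDs]} mapping (the assetIds field may be named differently in older releases):

import re

def create_flickr_albums(flickr_albums, all_assets):
    # flickr_albums: {album title: [10-digit Flickr photo IDs]}, parsed from the takeout .json
    headers = {
        'Accept': 'application/json',
        'x-api-key': API_KEY
    }

    # Map the 10-digit ID embedded in each original filename back to the Immich asset ID
    asset_by_flickr_id = {}
    for asset in all_assets:
        match = re.search(r"\d{10}", asset.get("originalFileName", ""))
        if match:
            asset_by_flickr_id[match.group()] = asset["id"]

    for title, photo_ids in flickr_albums.items():
        asset_ids = [asset_by_flickr_id[pid] for pid in photo_ids if pid in asset_by_flickr_id]
        if not asset_ids:
            continue
        response = requests.post(f"{BASE_URL}/albums", headers=headers,
                                 json={"albumName": title, "assetIds": asset_ids})
        response.raise_for_status()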

Side quest: what happened to my storage space?

At some point, I ran df -h to get an idea of how much space I was using on my external storage after all of these imports. I got a different surprise, though:

~$ df -h
Filesystem                         Size  Used Avail Use% Mounted on
tmpfs                              392M  6.9M  385M   2% /run
/dev/mapper/ubuntu--vg-ubuntu--lv   15G   14G  1.0G  93% /
tmpfs                              2.0G     0  2.0G   0% /dev/shm
tmpfs                              5.0M     0  5.0M   0% /run/lock
/dev/sda2                          2.0G  183M  1.7G  10% /boot
/dev/sdb1                          5.5T  386G  4.8T   8% /media/external
tmpfs                              392M   12K  392M   1% /run/user/1000

Wait, my root filesystem is almost full? Immich uses some decent-sized Docker images, plus it pulls some large ML models for face detection and such. Still, it shouldn't be using that much storage. This is running on a Proxmox VM with the default 32 GB thin-provisioned drive. Where did it all go?

Let's see what Docker is doing:

~$ docker system df
TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
Images          5         5         3.521GB   74.77MB (2%)
Containers      6         4         24.46kB   0B (0%)
Local Volumes   2         2         801.5MB   0B (0%)
Build Cache     0         0         0B        0B

Ok, that's not really a lot. Let's call it a round 5 GB. There are a few other things on this box, but nowhere near 32 GB worth. Where is the rest?

...

There's a hint in the previous df -h output. There is a 2 GB boot partition, and the root filesystem is 15 GB, not 30. The rest of the 32 GB drive is mysteriously missing.

~$ lsblk
NAME                      MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sda                         8:0    0   32G  0 disk
├─sda1                      8:1    0    1M  0 part
├─sda2                      8:2    0    2G  0 part /boot
└─sda3                      8:3    0   30G  0 part
  └─ubuntu--vg-ubuntu--lv 252:0    0   15G  0 lvm  /

Ok, so the Ubuntu Server install defaults to using LVM and using the whole drive. Except that it doesn't use the whole drive. It makes a volume group for the whole drive, but only makes a logical volume that uses half of the space, leaving the other half unallocated. Apparently this is by design. Surprising and infuriating.

Fortunately, LVM makes this easy to fix without needing to take the server offline.

(Thank you, Stack Exchange):

~# lvextend -l +100%FREE /dev/mapper/ubuntu--vg-ubuntu--lv
[snip]
~# resize2fs /dev/mapper/ubuntu--vg-ubuntu--lv
[snip]

These commands complete almost instantly and resize the volume and filesystem to fill the rest of the drive.

~$ df -h
Filesystem                         Size  Used Avail Use% Mounted on
tmpfs                              392M  6.9M  385M   2% /run
/dev/mapper/ubuntu--vg-ubuntu--lv   30G   12G   17G  43% /
tmpfs                              2.0G     0  2.0G   0% /dev/shm
tmpfs                              5.0M     0  5.0M   0% /run/lock
/dev/sda2                          2.0G  183M  1.7G  10% /boot
/dev/sdb1                          5.5T  386G  4.8T   8% /media/external
tmpfs                              392M   12K  392M   1% /run/user/1000

Much better.

Backing up 50 GB of photos

We have cloud "backups" with Google and Amazon, but the goal here was to reduce our dependence on those services and keep everything under our control. We still need some kind of remote backup, just something a little more private.

For now, I'm using rclone with Backblaze B2. The pricing page only advertises a price of $6/TB, but storage is actually metered in GB-hour increments, and the first 10 GB are free. At that rate, storing 50 GB of photos works out to roughly $0.30 a month, or about a penny a day.

Just in case, I set caps on everything to $1/day. I didn't expect to hit anywhere near that, but wanted to make sure I didn't start incurring huge charges by accident.

The backup job is a simple cron task for every day at 3am (Immich runs a database backup at 2am by default).

0 3 * * * rclone --b2-hard-delete sync /media/external/immich/ b2:REDACTED

Yes, I'm deleting and not hiding files. Maybe not a good idea for backups, but the database backups would otherwise be adding a new 80 MB file every day forever.

The next day, I noticed I was being charged $0.09 for the day in "Class C Transaction costs". Apparently this is related to listing files, and while you get 2,500 for free every day, I was running nearly 25,000!


In the rclone B2 docs, there is this very brief note:

--fast-list

This remote supports --fast-list which allows you to use fewer transactions in exchange for more memory. See the rclone docs for more details.

Yes, I would like to use fewer transactions, thank you.

0 3 * * * rclone --b2-hard-delete --fast-list sync /media/external/immich/ b2:REDACTED

I only noticed the transaction costs as I was writing this blog post, so I'll have to check back tomorrow and see if that's resolved. I did run that command as a one-off and it completed without running out of memory and only incremented the "class C transaction" count by a few. So here's hoping.

UPDATE: That worked, it's only showing 40 class C transactions for today.

Bugs, updates, other thoughts

Remember all of those duplicates that I had to clear out? Well, after deleting them, they stayed stuck in a broken state in the mobile app. I ended up clearing the app data to get rid of them. This is a known issue. The mobile app is also missing a lot of features that are only available through the web interface.

In the week (?) since I set up Immich, they've had two more releases. While it's nice to see that the project is being actively developed, it's a bit annoying to get an update notification every other day. Neither upgrade included breaking changes, though I did notice the redis tag update in the Docker Compose file, which prompted me to create my upgrade script. The release history shows quite a few breaking changes, so reading the release notes for each update will be necessary to avoid any surprises.

Neither of these should be surprising, as the project page very clearly warns you of these exact things. I'm still overall very impressed with the project, and might start helping out on issues where I can.