How to set up your own article archiving service – and why I did (RIP, Pocket)

Now that Pocket is shutting down, what are those of us who rely on article archiving to do? You could try switching to another cloud archiving service like Raindrop.io, but as nice a service as it is, it’s run by just one dude in Kazakhstan.

If a billion-dollar company like Mozilla can’t be bothered to keep its Pocket archiving service running, it’s something of a risk to rely on a lone developer, no matter how talented or well-intentioned.

Instead, how about self-hosting your own article archiving service on your own computer gear? That way, you own it all and nobody can shut it down.

As it turns out, there’s an open-source project (of course there is!) called ArchiveBox that does just that. In this article, I’ll show you how to set it up. In a subsequent article, I’ll show you how to get whatever data you managed to recover from Pocket into ArchiveBox.

Where you can run ArchiveBox

ArchiveBox has native distributions for both Linux and macOS. You can also run it as a Docker container on any device running Docker, including Windows. Since I have an old Mac mini converted to a Linux-based homelab Docker server, that’s where I set mine up.

If you’re not familiar with Docker and how to install it, or how to install Linux, or how to convert an old Mac to Linux, ZDNET’s Jack Wallen has you covered.

Docker is a container service. Basically, it’s like a virtual machine (VM), but without the machine. When you set up a VM, you’re configuring an entire computer emulator and an entire operating system install. So if you’re running a bunch of VMs, you have a lot of overhead (and potentially OS licenses) to deal with.

A container service like Docker just adds the application-specific layer on top of any other operating system. You can run the same Docker container on a Linux box, a Mac box, or a Windows PC, and it will generally be happy.

As much as I use VMs for certain types of projects, Docker is the perfect environment for something like ArchiveBox.

Installing ArchiveBox

The first thing I did was connect to my homelab server, and create a directory for the ArchiveBox data. I want the data to live outside the Docker container so it can be properly backed up as part of the machine’s file system.

mkdir -p /opt/archivebox/data

That command creates a folder called archivebox in the directory called /opt, and then creates a data folder inside that.

Next, it’s time to set up the ArchiveBox container. I manage Docker both from the command line and with a container management tool called Portainer. I had ChatGPT give me the instructions for setting up the ArchiveBox container.

Here’s what I asked ChatGPT:

“I want to install ArchiveBox on my Docker system in Portainer. How can I set it up so the archive data is accessible outside the container, but also to the ArchiveBox app?”

It gave me guidelines.

I connected to my Portainer instance and pasted in the lines provided by ChatGPT. A few minutes later, I had a working ArchiveBox install.

create-stack — Screenshot by David Gewirtz/ZDNET
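
The stack definition ChatGPT produced isn't reproduced here, so what follows is only a rough command-line sketch of an equivalent setup, not the exact configuration used above. It assumes the official archivebox/archivebox image, a container named archivebox (a name picked for illustration), and the /opt/archivebox/data directory created earlier mounted at /data inside the container. The helper just prints the commands so you can review them before running anything.

```shell
# Sketch only: prints docker commands for review rather than running them.
# Assumptions: official archivebox/archivebox image, container name "archivebox",
# host data directory mounted at /data, web UI published on port 8000.
archivebox_docker_cmds() {
    data_dir="${1:-/opt/archivebox/data}"
    port="${2:-8000}"

    # One-time initialization of the data directory (index and config files).
    printf 'docker run --rm -it -v %s:/data archivebox/archivebox init\n' "$data_dir"

    # Long-running web server that comes back up when Docker restarts.
    printf 'docker run -d --name archivebox --restart unless-stopped -v %s:/data -p %s:8000 archivebox/archivebox server 0.0.0.0:8000\n' "$data_dir" "$port"
}

archivebox_docker_cmds /opt/archivebox/data 8000
```

Because the data directory is a bind mount, everything ArchiveBox writes under /data inside the container ends up in /opt/archivebox/data on the host, which is what makes it visible to normal backups.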

I did run into a few snags. All I did was tell ChatGPT that “ArchiveBox isn’t working,” and it prompted me through a series of steps to fix the problem. About 10 minutes later, I was able to get into ArchiveBox through the web interface.

it-be-alive — Screenshot by David Gewirtz/ZDNET

Creating the superuser

I tried saving a web page, but was prompted to log in.

Since I had not previously set up a login and also had not set up a user to be admin, I asked ChatGPT for the correct command line and pasted it into a terminal window on the server.

superuser — Screenshot by David Gewirtz/ZDNET
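
The exact command appears only in the screenshot, so treat the following as a sketch of one likely form. It assumes a container named archivebox (adjust to match your stack) and relies on ArchiveBox's manage createsuperuser subcommand, which hands off to the interactive admin-user prompt of the underlying Django framework.

```shell
# Sketch only: prints the command for review rather than running it.
# Assumption: the container is named "archivebox" unless another name is given.
superuser_cmd() {
    container="${1:-archivebox}"
    # --user=archivebox runs the command as the image's unprivileged user,
    # so files written to the data directory keep consistent ownership.
    printf 'docker exec -it --user=archivebox %s archivebox manage createsuperuser\n' "$container"
}

superuser_cmd
```

Run the printed command in a terminal on the server; it asks for a username, email, and password.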

And then, I was in. It was time to add my first URL.

Archiving pages via the web interface

Archiving URLs via the web interface is not ideal, but it’s a good place to start. Later, I’ll show you how to add a Chrome extension to do it more organically. To add a URL, click the Add button. You’ll see this form.

top-form — Screenshot by David Gewirtz/ZDNET

This is the top part of the form. You can add one or a bunch of URLs. I started with just one of my more recent articles to see how it worked. Once you paste in the URL, scroll down.

bottom-form — Screenshot by David Gewirtz/ZDNET

Here, you’ll see the various formats that ArchiveBox can save your content into. Since I’m not yet sure what I like best, I just left them all unselected. When no format is selected, ArchiveBox saves them all.

Initially, ArchiveBox lists a new item as Pending while it’s still snagging the data for local storage. I found that if you wait a few minutes, the Pending status goes away and the full listing is available.
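
URLs can also be added from the server's command line with ArchiveBox's add subcommand. This is a minimal sketch, assuming a container named archivebox; the tag and the example.com URLs are placeholders. The helper prints one command per URL so you can check them before running.

```shell
# Sketch only: prints one "archivebox add" command per URL for review.
# Assumption: the container is named "archivebox".
add_url_cmds() {
    tag="$1"
    shift
    for url in "$@"; do
        # URLs are wrapped in single quotes so characters like & and ? survive the shell.
        printf "docker exec --user=archivebox archivebox archivebox add --tag=%s '%s'\n" "$tag" "$url"
    done
}

add_url_cmds zdnet 'https://www.example.com/first-article' 'https://www.example.com/second-article'
```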

Archiving pages via Chrome extension

What gave Pocket its power was the ability to tap a button and archive a page to the Pocket database. Fortunately, most of that capability is also available for ArchiveBox. ArchiveBox will let you tap a button in the extensions bar to save the page. You can even add tags.

extension-button — Screenshot by David Gewirtz/ZDNET

Unfortunately, ArchiveBox doesn’t allow you to select a link, right-click it, and save it. I loved that feature for storing a batch of pages at once. Even so, for a quick alternative solution, ArchiveBox should prove worthy.

To install the extension, go to the Chrome Web Store and search for ArchiveBox.

Go ahead and install it. You’ll see a new icon that looks like the ArchiveBox logo.

added — Screenshot by David Gewirtz/ZDNET

Next, right-click on the button and choose Options. You’ll get this setup form.

configure-extension — Screenshot by David Gewirtz/ZDNET

Put the URL to your instance of ArchiveBox into the field marked with the arrow. Note that your URL probably ends with :8000, for port 8000. Do not fill in the API key. ArchiveBox no longer seems to need it. That tripped me up for a while, but when I finally left the field empty, it all worked.
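
If the extension can't connect, it can help to confirm the URL from a terminal first. A small sketch, assuming plain HTTP and a made-up host name, homelab.local; substitute your own server's name or IP address.

```shell
# Builds the URL to paste into the extension's settings.
# The host name is hypothetical; the port defaults to 8000.
archivebox_url() {
    host="$1"
    port="${2:-8000}"
    printf 'http://%s:%s\n' "$host" "$port"
}

# Succeeds if the server answers an HTTP request at that URL within 5 seconds.
archivebox_reachable() {
    curl --silent --fail --max-time 5 --output /dev/null "$(archivebox_url "$1" "$2")"
}

archivebox_url homelab.local
```

Calling archivebox_reachable homelab.local && echo ok from the machine that runs Chrome checks that the server is reachable from where the extension will be used.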

You can also set up some automatic capture magic. If you scroll down on the configuration form, you’ll see this wizardry.

This will allow you to set the Chrome extension to automatically archive pages you visit if their URL matches a particular regular expression (a formula that describes string patterns).
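
To get a feel for how such a pattern behaves, you can try it against a few URLs in a terminal before pasting it into the extension. This sketch uses grep's extended regular expressions, which for a simple pattern like this one should behave the same as the browser's regex engine. The pattern and URLs are made up for illustration, and it assumes the extension tests the pattern against the full page URL.

```shell
# Succeeds when the URL matches the extended regular expression.
url_matches() {
    printf '%s\n' "$1" | grep -Eq -- "$2"
}

# Hypothetical pattern: any article page on zdnet.com, with or without "www.".
pattern='^https://(www\.)?zdnet\.com/article/'

for url in \
    'https://www.zdnet.com/article/example-story/' \
    'https://zdnet.com/article/another-story/' \
    'https://www.example.com/article/not-zdnet/'
do
    if url_matches "$url" "$pattern"; then
        echo "archive: $url"
    else
        echo "skip:    $url"
    fi
done
```

The first two URLs are reported as matches and the third is skipped. The backslashes before the dots matter: an unescaped dot matches any character, so the pattern would be looser than intended.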

Saved pages

Once you’ve saved a few pages, your main ArchiveBox interface will look something like this.

Here, I’ve saved three of my recent articles in all the formats that ArchiveBox allows. You can search for articles, tag them, and filter by a wide range of attributes. Unfortunately, there doesn’t appear to be a native ArchiveBox reading app for either iOS or Android.

If you click on any page, you’ll see a bunch of viewing options.

You can click on whichever one you prefer and read the article. I clicked on the Chrome > Single File option and was able to easily read my article, pretty much formatted as ZDNET originally intended. I will probably settle on that format as my main format, because I don’t want to use up all my server’s space on multiple copies of tens of thousands of articles.

formatted — Screenshot by David Gewirtz/ZDNET

You might want to select the formats you want to save to reduce storage usage, but for now, it’s nice to have options.
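
One way to decide which formats are worth keeping is to see how much space each takes in a saved snapshot. This is a minimal sketch, assuming each snapshot lives in its own folder under the data directory's archive folder; the folder you pass in is whichever snapshot you want to inspect.

```shell
# Lists the contents of one snapshot folder, largest first, with sizes in KB.
snapshot_sizes() {
    snap_dir="$1"
    # du -sk reports each entry's size in kilobytes; sort -rn puts the biggest on top.
    du -sk "$snap_dir"/* | sort -rn
}
```

Call it as snapshot_sizes followed by the path to one snapshot folder. ArchiveBox's configuration documentation describes per-format switches (SAVE_WARC, SAVE_MEDIA, and similar) that can turn off the heaviest formats; check the options available in your installed version.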

Where your data lives

The big advantage of running ArchiveBox is that your data is under your control. Because we set up the archive directory to live outside the ArchiveBox container, all of the data is stored in the server’s own file system.

data-storage — Screenshot by David Gewirtz/ZDNET

In the screenshot above, I just clicked into the Archive Data directory and dug down until I found one of the article folders. All your data lives here.

You can back it up, make extra copies of it, and own it for as long as you want. No billion-dollar company is going to decide you can’t have it anymore. It’s yours.
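
As one example of what a backup could look like, here is a minimal sketch that packs the data directory into a dated tar archive. The destination folder is hypothetical (an external drive or NAS mount, say), and this is just one simple approach among many.

```shell
# Packs the data directory into a timestamped .tar.gz and prints the file name.
backup_archive() {
    src_dir="$1"     # for example, /opt/archivebox/data
    dest_dir="$2"    # hypothetical backup location
    stamp="$(date +%Y%m%d-%H%M%S)"
    out="$dest_dir/archivebox-data-$stamp.tar.gz"

    mkdir -p "$dest_dir" &&
    # -C switches to the parent folder so the archive stores "data/..." paths,
    # not the full /opt/archivebox/... prefix.
    tar -czf "$out" -C "$(dirname "$src_dir")" "$(basename "$src_dir")" &&
    printf '%s\n' "$out"
}
```

A call might look like backup_archive /opt/archivebox/data /mnt/backups. Since ArchiveBox keeps its index in a database file inside the data directory, it's probably safest to stop the container, or at least wait until nothing is being archived, before running the backup.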

You don’t even need ArchiveBox to view your archives. I just dug down into my folder structure and double-clicked the HTML file. The full article was there.

no-server — Screenshot by David Gewirtz/ZDNET

Finding the right file might be a bit of a pain, but it’s all there. And yes, I’m aware I’m running Firefox on the homelab machine. I’ll change that to Chrome as soon as I get a few spare minutes.
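
Finding the right folder can be made less painful with a quick search. This sketch assumes each snapshot folder under the archive directory contains an index.json file that records the page's URL; the search term is whatever fragment of the URL you remember.

```shell
# Prints the snapshot folders whose index.json mentions the search term.
# Assumption: snapshots live at <data_dir>/archive/<folder>/index.json.
find_snapshots() {
    data_dir="$1"
    term="$2"
    # grep -lF lists files containing the term as a literal string (not a regex);
    # sed then trims the file name to leave just the folder path.
    find "$data_dir/archive" -maxdepth 2 -name index.json \
        -exec grep -lF -- "$term" {} + | sed 's|/index\.json$||'
}
```

For example, find_snapshots /opt/archivebox/data pocket would list every snapshot folder whose recorded metadata contains the word pocket. Open the HTML file inside the folder it reports.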

Stay tuned. I’m trying to import all 24,480 articles from Pocket’s pitiful URL listings. Once I get it working, I’ll report back and share a step-by-step process that you can use with your ArchiveBox installation.

Did you rely on Pocket before Mozilla announced it was shutting the service down? How are you handling the transition? Have you tried setting up ArchiveBox or another self-hosted tool to keep your reading list alive? What do you think about taking full control of your data instead of trusting a third-party service? Let us know in the comments below.
