Offline Websites


I created an “offline” copy of Wikipedia and several other reference websites. It's a local copy of the sites, hosted on a Raspberry Pi computer in my house, which can be powered by rechargeable batteries.

(Picture of the project: a Raspberry Pi and a USB hard drive)

Technology

I recently stumbled upon something called the ZIM file format. It's a compressed archive format that supports full-text search of the compressed data.

All of English Wikipedia can be downloaded as an 83 GB .zim file, which fits easily on the unused USB hard drive I just happen to have.

With a slick tool called Kiwix, static webpages can be searched and rendered directly from .zim files.

(List of a few ZIM files, adding up to 100 GB)

Hardware

The host computer for this project is a Raspberry Pi 2 Model B, which I acquired years ago for a long-forgotten project. I'm happy to put it to use. These computers are low-power, so I don't mind running one all the time, and they're more than powerful enough for this job.

In addition to the Pi, there's an adorable wooden case, which has also been sitting around for years.

The hard drive is an old 1 TB Western Digital MyPassport USB hard drive. This one is nice because it gets all of its power from the Pi over USB, so there's no extra AC plug to deal with.

Process

Here's what I did:

Set up Pi

  1. Downloaded a Raspberry Pi OS image. This is the operating system that will run on the Pi
  2. Extracted the .img and verified the SHA256 checksum using sha256sum --check
  3. Used dd to copy the image to my SD card
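The verify-and-copy steps above can be sketched roughly as follows. This is only an illustration: the function names, the file names, and the /dev/sdX device are placeholders, not the exact commands I ran. The dd step is destructive, so nothing here runs until a function is called explicitly.

```shell
#!/bin/sh
# Sketch only: file names and the target device are placeholders.

# verify_image CHECKSUM_FILE
# Runs sha256sum in check mode. The checksum file contains lines of the
# form "<hash>  <filename>", and the named file must be in the current directory.
verify_image() {
    sha256sum --check "$1"
}

# flash_image IMAGE DEVICE
# Copies the raw image onto the SD card. DEVICE must be the whole card
# (for example /dev/sdX), not a partition. Writing to the wrong device
# destroys whatever is on it, so double-check with lsblk first.
flash_image() {
    sudo dd if="$1" of="$2" bs=4M conv=fsync status=progress
}

# Example invocation (commented out on purpose; names are hypothetical):
# verify_image raspios.img.sha256 && flash_image raspios.img /dev/sdX
```

Chaining the two calls with && means the card is only written if the checksum matches.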

At this point, I had a fresh install of Raspberry Pi OS on my SD card, so the next step was to plug my Pi into a keyboard and monitor and go through the first-time setup. This is pretty straightforward: setting up language, time zone, and so on.

Set up kiwix-serve

I downloaded the prebuilt kiwix-tools package, which contains kiwix-serve (it's not currently available in package managers because of some deprecated dependency).

The prebuilts are available here.

Next I used wget to download a small .zim file to use for testing: the Simple English version of Wikipedia without images, from here. It's only about 500 MB.

kiwix-manage is used to build a library XML file for kiwix-serve to index. The syntax is simple: “kiwix-manage library.xml add <ZIM file>” adds the file to the library, creating the library file if it doesn't already exist.

After that, the server is ready to launch: kiwix-serve --port <portnumber> --library <library-name>

For testing, I used port 8080. After using ip addr show to find my Pi's IP address, I was able to open the server in my PC's web browser, and it worked!
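Put together, the test run looks something like the sketch below. The ZIM file name is a placeholder (I'm not reproducing the real download name here), and start_test_server and server_url are hypothetical helpers for illustration. Only the kiwix-manage and kiwix-serve invocations come from the steps above.

```shell
#!/bin/sh
# Sketch of the test run; the ZIM file name is a placeholder.
ZIM_FILE="${ZIM_FILE:-wikipedia_en_simple.zim}"   # hypothetical file name
LIBRARY="${LIBRARY:-library.xml}"
PORT="${PORT:-8080}"

# Register the ZIM file in the library, then serve it in the foreground.
start_test_server() {
    kiwix-manage "$LIBRARY" add "$ZIM_FILE" || return 1
    kiwix-serve --port "$PORT" --library "$LIBRARY"
}

# Print the URL to open from another machine, given the Pi's IP address
# (as reported by "ip addr show").
server_url() {
    printf 'http://%s:%s/\n' "$1" "$PORT"
}

# Only start the server when asked to, so the file can also be sourced.
if [ "${1:-}" = "--run" ]; then
    start_test_server
fi
```

For example, if the Pi's address turned out to be 192.168.1.50, then server_url 192.168.1.50 prints the address to type into the PC's browser.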

Set up hard drive

I mounted the hard drive, and then used wget to download some larger .zim files from the same place as before. This took an hour or two.

Adding the newly-downloaded files to my library.xml file was easy:

for f in /media/zimfiles/*.zim ; do kiwix-manage library.xml add "$f" ; done

Finishing up

After a little more testing, I wrote scripts that would mount the hard drive and start the server. I moved these scripts to /usr/local/bin and invoked them from /etc/rc.local so they'll run on startup.
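A minimal sketch of what such a startup script might look like is below. It is not my exact script. The script name, the device node /dev/sda1, and the library path are assumptions for illustration, and the mount point matches the /media/zimfiles directory used earlier. The --daemon flag asks kiwix-serve to detach so that rc.local can finish.

```shell
#!/bin/sh
# Hypothetical /usr/local/bin/start-kiwix.sh
# Device, library path, and port below are assumptions; adjust to your setup.
DEVICE="${DEVICE:-/dev/sda1}"
MOUNT_POINT="${MOUNT_POINT:-/media/zimfiles}"
LIBRARY="${LIBRARY:-/home/pi/library.xml}"
PORT="${PORT:-8080}"
KIWIX_SERVE="${KIWIX_SERVE:-kiwix-serve}"

start_kiwix() {
    # Mount the drive only if nothing is mounted at the mount point yet.
    if ! mountpoint -q "$MOUNT_POINT"; then
        mount "$DEVICE" "$MOUNT_POINT" || return 1
    fi
    # Run the server in the background so the boot sequence can continue.
    "$KIWIX_SERVE" --daemon --port "$PORT" --library "$LIBRARY"
}

# rc.local runs as root, so no sudo is needed for mount.
if [ "${1:-}" = "--run" ]; then
    start_kiwix
fi
```

With this layout, the line added to /etc/rc.local (before its final exit 0) would be /usr/local/bin/start-kiwix.sh --run.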

Conclusion

So it's pretty easy to locally host my own copy of Wikipedia. Is that useful? Well, maybe. ISPs aren't always reliable, and my power goes out pretty often, so having a local battery-powered host could come in handy. It's also very portable, and the pages work fine on my iPhone.

(Phone screenshot)

This project gave me an opportunity to continue building my system administration and Linux skills, which is very welcome. I ran into a number of hangups, including a malfunctioning SD card and forgetting to enable SSH.

Next Steps