GitHub has announced the launch of the Arctic Code Vault. The company will take a snapshot of all active repositories on February 2, 2020 and store the result in the Norwegian Arctic World Archive. The intention is that the code will be kept there for a thousand years.
With the Arctic Code Vault, GitHub aims to preserve open source software for future generations; the project also serves to highlight the importance of the open source community. The storage in the mine is meant for long-term archiving, and GitHub plans to update it every five years or at longer intervals. The February 2, 2020 snapshot therefore appears to be the first in a series. It comes in addition to monthly to annual updates by the Internet Archive and the direct storage of GitHub repositories in multiple data centers worldwide.
For the snapshot, GitHub stores not only every active public GitHub repository, but also a portion of the ‘dormant’ repositories, selected on the basis of the number of stars, dependencies and the advice of an expert advisory panel. The snapshot contains the head of each repository's default branch, minus binaries larger than 100 KB. The files are stored in a single tar file, and most of the data is recorded in QR-encoded form.
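The selection rule for files can be illustrated with a minimal Python sketch. It packs the files of one checked-out repository into a single tar file and leaves out binaries above the 100 KB limit. This is not GitHub's actual tooling: the function names, the reading of 100 KB as 100 × 1024 bytes, and the NUL-byte test for "binary" are all assumptions made for illustration.

```python
import os
import tarfile

# 100 KB limit from the article; interpreting it as 100 * 1024 bytes is an assumption.
SIZE_LIMIT = 100 * 1024


def looks_binary(path, probe_size=8192):
    """Heuristic (an assumption): a file is binary if its first bytes contain a NUL."""
    with open(path, "rb") as handle:
        return b"\x00" in handle.read(probe_size)


def build_snapshot(repo_dir, tar_path):
    """Pack the regular files under repo_dir into one tar file.

    Binaries larger than SIZE_LIMIT are skipped; their relative paths are returned.
    tar_path should lie outside repo_dir so the archive does not include itself.
    """
    skipped = []
    with tarfile.open(tar_path, "w") as archive:
        for root, dirs, files in os.walk(repo_dir):
            dirs.sort()  # walk directories in a stable order
            for name in sorted(files):
                full = os.path.join(root, name)
                rel = os.path.relpath(full, repo_dir)
                if os.path.getsize(full) > SIZE_LIMIT and looks_binary(full):
                    skipped.append(rel)
                    continue
                archive.add(full, arcname=rel)
    return skipped
```

Calling `build_snapshot("my-repo", "my-repo.tar")` on a hypothetical checkout would write the archive and return the list of skipped files. Large text files are kept on purpose, since the article only mentions binaries being excluded.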
The data is stored in the Arctic World Archive, which is housed in a mine 250 meters deep in a mountainside on Spitsbergen in the Arctic Ocean. GitHub has partnered with Norwegian mining company Store Norske Spitsbergen Kulkompani and Piql, which specializes in long-term storage.
For this, Piql uses film based on silver halides and polyester, with a length of more than a kilometer. Before converting the files to the correct QR format, the company performs a virus scan and generates checksums for verification. A ‘piqlWriter’ then writes the data to the special film at 40 MB/s, generating a new checksum for each frame. After writing, the film is placed in a piqlBox, a specially protected cartridge. The shelf life is stated to be 500 years, but according to Piql, simulated tests indicate that twice that is feasible. In the cold, dry environment of the Norwegian mine, that period would be even longer.
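The two levels of checksums described here, one per file and one per frame, can be sketched in Python as follows. This is only an illustration of the idea, not Piql's implementation: the article does not say which hash algorithm or frame size is used, so SHA-256, the `FRAME_SIZE` value and all function names are assumptions.

```python
import hashlib

# Hypothetical frame payload size; the real capacity of a film frame is not given.
FRAME_SIZE = 4096


def file_checksum(data):
    """Checksum over the whole file (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def split_into_frames(data, frame_size=FRAME_SIZE):
    """Split data into frame payloads, each paired with its own checksum."""
    frames = []
    for offset in range(0, len(data), frame_size):
        payload = data[offset:offset + frame_size]
        frames.append((payload, hashlib.sha256(payload).hexdigest()))
    return frames


def verify_and_join(frames, expected_checksum):
    """Check every frame, then the reassembled file, and return the file bytes."""
    for index, (payload, digest) in enumerate(frames):
        if hashlib.sha256(payload).hexdigest() != digest:
            raise ValueError(f"frame {index} is corrupted")
    data = b"".join(payload for payload, _ in frames)
    if file_checksum(data) != expected_checksum:
        raise ValueError("file checksum mismatch")
    return data
```

The per-frame checksum makes it possible to say which frame is damaged, while the file checksum confirms that the reassembled result matches what was originally written.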
A piqlReader is used to retrieve the data. To help future generations recover the files, the source code of the reader software is included at the beginning of each film, in both digital and human-readable form. The specifications of the file format are stored in the same way.