SeriesSeeker data

May 13, 2026: Data parsed!

After I managed to salvage the unstructured data from the Wayback Machine (see About for that tale), I was left with a pile of HTML that was distinctly un-data-like. The headers were still in place, so I knew there had to be a way to add some structure back to the data. But that felt like a Python problem, and alas, my coding knowledge is very CSS/HTML. I reached out on Bluesky asking if any coders could help me out, and Jayme Howard came to the rescue! He parsed the HTML into JSON, removing all the copyrighted material and leaving just the crowd-sourced data. You can see the full dataset.

May 14, 2026: Site launch!

The day I actually launched this site! And archived it with the Wayback Machine right away.

May 22, 2026: We're in the papers!

The Rascal article about the project was published! This is very exciting because it makes me feel cool, but also because it's a great way to tell more people about the importance of digital preservation.

June 1, 2026: Data cleaned!

Stefan reached out after reading the Rascal article and offered to help. I pointed out that the JSON could use some tidying (some of the fields had been merged because of the formatting), and he cleaned the JSON! The original JSON will remain available.

From Stefan:
Specific changes:

Title

Separated out the computer-formatted title from the human-readable title (e.g. "20-sided-stories" vs "20 Sided Stories"). Kept the computer-formatted version as the Title field and added a new Title (Clean) field for the human-readable version

The Title (Clean) values themselves were taken from the Type & Channels field, as they were more consistent there

Episode Frequency & Length

Separated into two fields called Episode Frequency and Episode Length

Type & Channels

Separated into multiple fields - Media Type, Distribution Channels, Title (Clean), and Release Year

This also had a duplicate of the Description that I just dropped

Rules & Sources

Separated into new fields - Rules, Sources, and Tags

Not all series had data for all three fields, so I did some brute-force overriding to get the data into the right place

Format, Setting & Vulgarity

Separated into new fields - Format, Setting, and Vulgarity

Similar to Rules & Sources, not all series had data for all three fields, so I added some extra logic to get it working. Some entries listed one of the fields as "Not Applicable". I ended up dropping these because it was impossible to tell whether the value applied to the Format field or the Setting field, and it really isn't all that informative anyway

Cast & Player Characters

Separated into two fields called Cast and Player Characters

Audio Quality & Equipment

Separated into two fields called Audio Quality & Equipment and Audio Tags

Anything not listed here should be preserved exactly as it was.
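For anyone who wants to poke at the cleaned data, here is a minimal Python sketch of reading the new title fields. It assumes the cleaned file is a JSON array of objects keyed by the field names in Stefan's notes; the inline sample record and the helper names `load_series` and `display_titles` are hypothetical and only for illustration. Because not every series has every field, the sketch falls back to the raw Title when Title (Clean) is missing.

```python
import json

# Hypothetical excerpt, assuming the cleaned file is a JSON array of objects
# keyed by the field names in Stefan's notes. Only the two title values come
# from the notes; real records carry many more fields.
SAMPLE = '[{"Title": "20-sided-stories", "Title (Clean)": "20 Sided Stories"}]'


def load_series(text):
    """Parse the cleaned JSON text into a list of series records (dicts)."""
    return json.loads(text)


def display_titles(records):
    """Map each computer-formatted Title to its human-readable Title (Clean).

    Falls back to the raw Title when Title (Clean) is absent or empty,
    since not every series has data for every field.
    """
    return {r["Title"]: r.get("Title (Clean)") or r["Title"] for r in records}


if __name__ == "__main__":
    series = load_series(SAMPLE)
    for slug, name in display_titles(series).items():
        print(f"{slug} -> {name}")
```

Using `dict.get` rather than indexing keeps the sketch from crashing on records where a field was never filled in. The same pattern would apply to the other split fields, such as Episode Frequency or Vulgarity.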
