Community Data
by Shital Shah
971 views, 1 favorite, 1 comment
Dump of Hacker News stories and comments up to 2014-05-29. From the HN post: Downloading All of Hacker News Posts and Comments
(1 review)
Topics: hackernews, archive, stories, comments
WARCZone: Outsider WARCs
198 views, 2 favorites, 0 comments
Dagobah is a large archive of old 4chan flash animations, dating back to 2008, when the site was founded. Anyone can upload files to this site. Because its collection of 13099+ flash animations dates from 4chan's earliest history, the Bibliotheca Anonoma is conducting a contingency archival of the site. We used custom-built Python scraping scripts to reduce strain on the server and to avoid the many pitfalls encountered by scraping an automatically generated...
The Dataset Collection
4,193 views, 2 favorites, 0 comments
I took the Reddit comment archive and converted all the JSON into one SQLite database using this program that I wrote: I ran a few tests to make sure the number of database rows matches the number of JSON records. "SELECT MAX(rowid) FROM comment" and "SELECT COUNT(id) FROM comment" both return 1659361605. This gives me some confidence as to the integrity of the dataset, but I cannot be 100% sure. The compressed size is 163G....
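The row-count check described above can be sketched in Python with the standard `sqlite3` module. This is a minimal illustration, not the uploader's program: only the table name `comment` and the column `id` come from the description, while the `body` column, the function name `load_and_verify`, and the sample records are hypothetical.

```python
import json
import sqlite3


def load_and_verify(json_lines, conn):
    """Load JSON records into a `comment` table and return three counts.

    Returns (json_record_count, max_rowid, count_of_ids). The `body`
    column is an assumption made for this sketch.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS comment (id TEXT, body TEXT)")

    # One JSON object per non-empty line.
    records = [json.loads(line) for line in json_lines if line.strip()]

    conn.executemany(
        "INSERT INTO comment (id, body) VALUES (?, ?)",
        [(r["id"], r.get("body", "")) for r in records],
    )
    conn.commit()

    # The two queries quoted in the description.
    max_rowid = conn.execute("SELECT MAX(rowid) FROM comment").fetchone()[0]
    count_id = conn.execute("SELECT COUNT(id) FROM comment").fetchone()[0]
    return len(records), max_rowid, count_id


if __name__ == "__main__":
    sample = [
        '{"id": "c1", "body": "first"}',
        '{"id": "c2", "body": "second"}',
        '{"id": "c3", "body": "third"}',
    ]
    connection = sqlite3.connect(":memory:")
    n_json, max_rowid, count_id = load_and_verify(sample, connection)
    print(n_json, max_rowid, count_id)
    print("counts agree:", n_json == max_rowid == count_id)
```

Agreement between the three numbers shows that no record was skipped on insert and no row has a NULL `id`, provided the table started empty and no rows were deleted. It does not check the content of each row, which is consistent with the author's caveat about not being 100% sure.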