Background: AOL Proudly Releases Massive Amounts of Private Data, on TechCrunch.

More reading: AOL Releases Search Logs from 500,000 Users // AOL Just Did the Unthinkable - Boycott AOL? // Aol Releases Googles most prized Keyword List... Google is gonna get mega spammed. // AOL "Proudly Releases Massive Amounts of Private Data"

AOL's own description, with typo:

This collection consists of ~20M web queries collected from ~650k users over three months.
The data is sorted by anonymous user ID and sequentially arranged.

The goal of this collection is to provide real query log data that is based on real users. It could be used for personalization, query reformulation or other types of search research.

Read the rest of AOL's original description here, from the "U500k_README.txt" file that comes with the archive.
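
Since the description says the data is sorted by anonymous user ID, one user's queries should sit on consecutive lines, so they can be grouped in a single pass without loading a whole file into memory. The sketch below illustrates that idea only. It assumes each .txt.gz file is tab-separated with the user ID in the first column and the query text in the second, possibly preceded by a header row; that layout is not stated on this page, so check it against the README. The function names are made up for illustration.

```python
import csv
import gzip
from itertools import groupby


def iter_rows(path):
    """Yield data rows from one gzipped, tab-separated log file.

    Assumption: the first column is a numeric anonymous user ID, so
    blank lines and any header row (non-numeric first field) are skipped.
    """
    with gzip.open(path, "rt", encoding="utf-8", errors="replace", newline="") as fh:
        for row in csv.reader(fh, delimiter="\t", quoting=csv.QUOTE_NONE):
            if row and row[0].isdigit():
                yield row


def queries_by_user(rows):
    """Group consecutive rows by user ID.

    This relies on the rows already being sorted by user ID, as the
    description says; groupby only merges adjacent rows with equal keys.
    Assumption: the query text is the second column.
    """
    for user_id, group in groupby(rows, key=lambda r: r[0]):
        yield user_id, [r[1] for r in group if len(r) > 1]


# Hypothetical usage, once a file has been downloaded:
# for user_id, queries in queries_by_user(iter_rows("user-ct-test-collection-01.txt.gz")):
#     print(user_id, len(queries))
```

If the files turn out not to be sorted the way the description claims, the same user ID would simply show up in more than one group.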

BitTorrent [mininova.org] // BitTorrent [thepiratebay.org] // BitTorrent [isohunt.com] // BitTorrent [meganova.org] // BitTorrent [tr.searching.com]

eDonkey mirror // eDonkey mirror

Mirrored link [atrus.org] (439 MB)

Mirrored link [aolsearchlogs.cloudsites.com] (439 MB)

Mirrored link [sexygeeks.be] (439 MB)

Mirrored link [upodcast.be] (439 MB)

Mirrored link [leafyhost.com] (439 MB)

user-ct-test-collection-01.txt.gz
user-ct-test-collection-02.txt.gz
user-ct-test-collection-03.txt.gz
user-ct-test-collection-04.txt.gz
user-ct-test-collection-05.txt.gz
user-ct-test-collection-06.txt.gz
user-ct-test-collection-07.txt.gz
user-ct-test-collection-08.txt.gz
user-ct-test-collection-09.txt.gz
user-ct-test-collection-10.txt.gz
Summary

aol-data.tar.bz2
aol-data.tar.bz2.md5
aol-data.tar.gz
aol-data.tar.gz.md5
fack.org/AOL-user-ct-collection/

user-ct-test-collection-01.txt.gz
user-ct-test-collection-02.txt.gz
user-ct-test-collection-03.txt.gz
user-ct-test-collection-04.txt.gz
user-ct-test-collection-05.txt.gz
user-ct-test-collection-06.txt.gz
user-ct-test-collection-07.txt.gz
user-ct-test-collection-08.txt.gz
user-ct-test-collection-09.txt.gz
user-ct-test-collection-10.txt.gz
Original link [research.aol.com] (439 MB), down as of 8/6 11:30pm EST

MD5: 31cd27ce12c3a3f2df62a38050ce4c0a
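
If you grab the archive from one of the mirrors, it is worth checking it against the MD5 above before you re-mirror it. Here is a minimal sketch of how that check could look. It assumes the checksum applies to the full 439 MB archive, and the file name in the usage comment is a placeholder, not the archive's confirmed name.

```python
import hashlib

# The MD5 published on this page.
EXPECTED_MD5 = "31cd27ce12c3a3f2df62a38050ce4c0a"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, expected=EXPECTED_MD5):
    """True if the file's MD5 equals the published checksum."""
    return md5_of_file(path) == expected.lower()


# Hypothetical usage (the path is a placeholder for your download):
# print(matches_published_md5("aol-archive-download.tgz"))
```

A mismatch usually means a truncated or corrupted download. Note that MD5 only catches accidental damage; it is not a guarantee against deliberate tampering.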

Please mirror this file, and send me an Internet down the tubes by dialing greg@poly9.com so that I can include your link here.
