Find Similar Users on del.icio.us
Download: delicious_mates.py

On the social bookmarking site del.icio.us, you can add other users to your network to see their recent bookmarks aggregated on one page. There are two kinds of people in my network: 1. Friends and 2. Users I don’t know personally, but who regularly post interesting links. People who bookmark the same things that I bookmark are likely to have similar interests and are thus likely to continue bookmarking interesting things in the future. Since the information on who bookmarked what URL is public, the process of finding people with similar interests can be automated.
Urban Hafner brought up this idea three years ago, but I could not find an implementation. Yesterday, I wrote a short Python (2.5) script that implements the following ideas:
- Look at every link in your bookmarks: Who bookmarked the same page? Add these users to a list of people possibly similar to you.
- The more bookmarks another user has in common with you, the higher your similarity.
- The smaller the number of people who bookmarked a page, the more significant the fact that another user has this bookmark in common with you.
- If a user has lots of bookmarks, common bookmarks are less remarkable. The percentage of common links counts.
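The four ideas above can be combined into a single score in many ways. The following is a minimal sketch of one of them, not the formula delicious_mates.py actually uses: the function name `rank_mates`, its three arguments, and the choice of `1 / number of users` as the rarity term are all assumptions made for illustration. It is written for Python 3.

```python
from collections import defaultdict


def rank_mates(my_urls, users_by_url, total_by_user):
    """Rank other users by a hypothetical similarity weight.

    my_urls:       iterable of URLs you have bookmarked
    users_by_url:  dict mapping URL -> set of other users who bookmarked it
    total_by_user: dict mapping user -> that user's total number of bookmarks
    """
    rarity = defaultdict(float)  # sum of 1/popularity over shared URLs
    common = defaultdict(int)    # number of shared URLs per user

    for url in my_urls:
        users = users_by_url.get(url, set())
        for user in users:
            common[user] += 1
            # A page that few people bookmarked contributes more.
            rarity[user] += 1.0 / len(users)

    ranked = []
    for user, n_common in common.items():
        # Share of the other user's bookmarks that overlap with yours.
        fraction = n_common / total_by_user[user]
        ranked.append((rarity[user] * fraction, user, n_common))

    ranked.sort(reverse=True)  # highest weight first
    return ranked


if __name__ == "__main__":
    # Made-up toy data, not real del.icio.us users.
    urls = ["a", "b", "c"]
    users = {"a": {"x", "y"}, "b": {"x"}, "c": set()}
    totals = {"x": 10, "y": 100}
    for weight, user, n_common in rank_mates(urls, users, totals):
        print(user, n_common, round(weight, 5))
```

In the toy data, user x shares two of ten bookmarks and is the only other person who saved page "b", so x ranks well above y, who shares one bookmark out of a hundred.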
Here is what you need to do to find people whose interests are similar to yours:
- Download delicious_mates.py
- Run python ./delicious_mates.py
- Wait — this takes some time.
The output will look something like this:
andreas> python ./delicious_mates.py
Your del.icio.us username? andreas.s
Your del.icio.us password?
Fetching list of bookmarks ... (485)
Fetching list of users for each bookmark ...
1. http://en.wikipedia.org/wiki/Langton's_ant (7)
2. http://www.nytimes.com/2008/05/13/science/13coat.html?_r=2&partner=rssnyt&emc=rss&oref=slogin&oref=login (0)
3. http://sifter.org/~simon/journal/20080509.2.html (2)
4. http://www.intercult.su.se/cultaptation/tournament.php (16)
5. http://www.pnas.org/cgi/content/abstract/0801268105v1 (49)
6. http://atlas-conferences.com/c/a/n/i/15.htm (0)
[..]
480. http://lifeboat.com/ex/main (129)
481. http://prize.hutter1.net/ (137)
482. http://www.psg.com/~dlamkins/sl/cover.html (194)
483. http://www.idsia.ch/~juergen/ (157)
484. http://sl4.org/wiki/ShannonInformation (1)
485. http://www.scottaaronson.com/writings/ (13)
Finding 50 candidates from list of 49937 users ...
rainer (42/9367) ok
siggiB (36/1006) ok
fhtagn (35/2471) ok
anissimov (35/6281) ok
jbone (41/19982) ok
invisibleandpink (16/203) ok
irchans (16/610) ok
ferrouswheel (14/946) ok
[..]
eggywat (14/4290) ok
Cunya (8/3133) ok
getpost (13/9787) ok
hannu (12/3915) ok
lispmeister (11/2967) ok
dean.vanniekerk (12/5873) ok
rgrant (9/1769) ok
Top 50 del.icio.us mates:
username          weight    # common bookmarks  # total bookmarks  % common
---------------------------------------------------------------------------
siggiB            54.92937  36                  1022               3.52250
invisibleandpink  53.05635  16                  203                7.88177
fhtagn            20.61878  36                  2474               1.45513
irchans           15.61606  16                  611                2.61866
fogeli            12.85109  16                  628                2.54777
ferrouswheel      7.99177   14                  946                1.47992
rainer            7.60971   43                  9423               0.45633
anissimov         6.46154   34                  6309               0.53891
pdorrell          4.10251   19                  3041               0.62479
jefallbright      3.87392   12                  1193               1.00587
ladro             3.76458   12                  1526               0.78637
miguel1626        3.04347   11                  1039               1.05871
jbone             2.62672   41                  20039              0.20460
hartmut           2.15612   9                   1316               0.68389
asciilifeform     2.04304   10                  1581               0.63251
tmalin            1.89627   12                  2181               0.55021
herrmann          1.84991   21                  5410               0.38817
jas0nm            1.82646   20                  3808               0.52521
[..]
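The "% common" column in the table is the number of common bookmarks divided by the user's total number of bookmarks, times 100. The short check below recomputes it for three rows of the sample output; the helper name `percent_common` is made up for this example, and the "weight" column is left out because its formula is not described here.

```python
def percent_common(n_common, n_total):
    """Percentage of a user's bookmarks that you have in common."""
    return 100.0 * n_common / n_total


# (username, # common bookmarks, # total bookmarks) copied from the table
rows = [
    ("siggiB", 36, 1022),
    ("invisibleandpink", 16, 203),
    ("jbone", 41, 20039),
]

for user, n_common, n_total in rows:
    print(f"{user:<18}{percent_common(n_common, n_total):.5f}")
# siggiB            3.52250
# invisibleandpink  7.88177
# jbone             0.20460
```

This is why jbone, with 41 common bookmarks, ends up below invisibleandpink with 16: the overlap is a much smaller share of a collection of roughly 20,000 links.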
If you look at the script, you will find a few settings you might want to change. The same holds for each of them: the higher you set it, the longer the script takes to finish.
- MAX_MATES is the maximum number of similar users the script suggests.
- MAX_BOOKMARKS defines how many of your bookmarks the script will look at.
- BOOKMARK_FILTER defines which types of bookmarks are analyzed. Remove , "no" from {"shared" : [None, "yes", "no"]} to exclude private bookmarks.
- MATE_MIN_BOOKMARKS sets a minimum for the number of bookmarks a del.icio.us user needs to have before they can be considered similar to you.
- MATE_MIN_COMMON sets a minimum for the number of bookmarks a user has to have in common with you to be included in the list of similar users.
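To make the roles of the thresholds concrete, here is a rough sketch of how such settings could be applied to a list of candidates. Only the setting names come from the script; the values for MATE_MIN_BOOKMARKS and MATE_MIN_COMMON, the function `select_mates`, and the ordering by number of common bookmarks are assumptions for illustration, and the real script may work differently.

```python
MAX_MATES = 50           # the sample output above lists a "Top 50"
MATE_MIN_BOOKMARKS = 20  # hypothetical value
MATE_MIN_COMMON = 3      # hypothetical value


def select_mates(candidates):
    """Filter and cut down a list of (username, n_common, n_total) tuples."""
    eligible = [
        (user, n_common, n_total)
        for user, n_common, n_total in candidates
        if n_total >= MATE_MIN_BOOKMARKS and n_common >= MATE_MIN_COMMON
    ]
    # Assumption: prefer users with the most bookmarks in common.
    eligible.sort(key=lambda c: c[1], reverse=True)
    return eligible[:MAX_MATES]


if __name__ == "__main__":
    # Made-up candidates, not real users.
    sample = [("alice", 12, 400), ("bob", 2, 900), ("carol", 5, 10)]
    print(select_mates(sample))
```

With these example values, bob is dropped for having too few common bookmarks and carol for having too few bookmarks overall.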
The script needs two Python modules: the parser BeautifulSoup and Michael Noll’s del.icio.us Python API. If the script does not find one of the modules, it will download the missing module to the current directory and import it from there. If you don’t like this because you believe it is a security nightmare (which it is), either don’t run delicious_mates.py or install the two modules beforehand.
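If you would rather install the modules yourself, you can check whether they are importable before running the script. The snippet below is a small sketch for Python 3 using the standard library's `importlib.util.find_spec`; the module names `BeautifulSoup` and `deliciousapi` are assumptions about what the script imports, so adjust them to match the import statements in delicious_mates.py.

```python
import importlib.util

# Assumed import names; check the script's own import statements.
REQUIRED_MODULES = ["BeautifulSoup", "deliciousapi"]


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Install these modules first:", ", ".join(missing))
    else:
        print("All required modules are available.")
```

Unlike the download-on-demand behavior, this only reports what is missing and never fetches or executes code from the network.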
Do you know some JavaScript and have spare time? I would love to see the script converted into a direc.tor-like bookmarklet. The need to download and run a Python script makes finding similar users more complicated than it should be.
The feature I like best about the online bookshelf LibraryThing is its Unsuggester: Name a book you have read and it suggests those books that are least likely to be on your bookshelf. I like it because it is a means to counteract the temptation to adjust your sources of information such that whatever you read reinforces your point of view. Seeing how easy it is to give in to this temptation, is a script that makes it easier to surround yourself with like-minded people just one more sign of a general trend towards biased, largely isolated online communities?
