A very interesting

June 1, 2004 by Michael Boyle

A very interesting new site from Cam Barrett and Joe Stump: TodaysPapers.com. It adds community features to news stories from all over the web. I wonder if Aaron’s NYTimes stuff could add something to this? Trouble is, Aaron has been so very coy about his /knows/ stuff that I’m not sure I follow what it is.

Comments

Aaron of Montreal says

June 1, 2004 at 5:53 pm

What do you want it to do?

It’s not immediately clear to me what to make of ‘todayspapers’ and there are no obvious hooks to play with it programmatically.

For instance, if I were to try and tie the /nytimes/knows stuff into it I could perform some brute-force scraping on their search engine. But there doesn’t appear to be much “meat” in the markup and the markup itself is often ambiguous: e.g. both trackbacks and comments are wrapped in separate block containers whose ID is “trackbacks”.

On the other hand, their search engine appears to be nimble enough to do what I mean for “George Bush”, “Bush, George W.” and “George W. Bush” so there might be something interesting after all…
michael says

June 1, 2004 at 9:48 pm

I didn’t look at what they’re doing in the code they’re using – I was thinking more generally in terms of the map of relationships on any specific day.

BTW can you get that in a readable format not just the thumbnail? I only see the text stuff, but obviously you make the image from something.
Michael says

June 2, 2004 at 11:47 am

Help! Need… more… info…

(about the whole thing)
Aaron of Montreal says

June 2, 2004 at 1:28 pm

Michael, I would suggest starting here :

http://aaronland.info/nytimes/

That (particularly the ‘see also’ section, which roughly outlines the project in chronological order) might help you get a better sense of what’s going on.

There are areas that are deliberately less evident than most people might like and some of that is dictated by a desire not to violate the NYT terms of service.

In a nutshell, it’s a stab at trying to find ways of representing “the shape of things” and spotting connections cum serendipity that might otherwise be missed.

I’m sure there are plenty of perfectly valid “business cases” for this kind of modelling but, frankly, they don’t interest me very much.
Michael says

June 2, 2004 at 2:15 pm

So one application could be this, right? To generate the list and look up whether there are entries for a specific person in (say) Wikipedia. By putting historical counters on any particular person or event, and comparing that with whether there is or is not an entry, you could decide that there’s a need in Wikipedia for a particular entry based on presumed demand (i.e., if the Times is writing about something/someone, people will look for further information).

Right?
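
To make the idea concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration: the `missing_entries` function, the `min_mentions` threshold, and the sample data are made up, and the set of existing entries would in practice have to come from some lookup against Wikipedia, which is not shown here.

```python
from collections import Counter

def missing_entries(mentions_by_day, existing_entries, min_mentions=2):
    """Rank names that are written about often but have no entry yet.

    mentions_by_day: dict mapping a date string to the list of names
                     mentioned that day (hypothetical input format).
    existing_entries: set of names that already have an entry.
    """
    # Historical counter: total mentions per name across all days.
    totals = Counter()
    for names in mentions_by_day.values():
        totals.update(names)

    # Keep names with enough mentions and no existing entry.
    gaps = [(name, count) for name, count in totals.items()
            if count >= min_mentions and name not in existing_entries]

    # Most-mentioned first, ties broken alphabetically.
    return sorted(gaps, key=lambda pair: (-pair[1], pair[0]))

# Made-up sample data, purely for illustration.
mentions_by_day = {
    "2004-06-01": ["Person A", "Person B"],
    "2004-06-02": ["Person A", "Person C", "Person B"],
    "2004-06-03": ["Person A", "Person C"],
}
existing_entries = {"Person B"}

for name, count in missing_entries(mentions_by_day, existing_entries):
    print(name, count)
```

The mention count stands in for "presumed demand"; a real version would probably want to weight recent days more heavily.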
Aaron of Montreal says

June 2, 2004 at 5:41 pm

Sure, that would be a perfectly good application. From there you could spin out the relations between topics even further based either on a wikipedia page’s meta/@name=keyword or just the links to other pages in their database.

Of course, it gets more complicated because you have to map :

* how names are written (Bush, George or George Bush)

* how names are normalized in URI space (George_W_Bush or bush_george_w)

A quick and dirty investigation suggests that querying the wikipedia site via Google will do the right thing, however.
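
The two mappings above could be sketched roughly like this in Python. The helper names `name_parts` and `uri_forms` are hypothetical, and the two output shapes simply mirror the examples in the list; this is not a claim about how any particular site actually normalizes its URIs. It also assumes the last word of a "Given Family" name is the family name, which will be wrong for plenty of real names.

```python
import re

def name_parts(name):
    """Split a name into (given tokens, family tokens).

    Handles both 'Family, Given' and 'Given Family' spellings.
    """
    name = name.strip()
    if "," in name:
        # "Bush, George W." -> family "Bush", given "George W."
        family, given = [part.strip() for part in name.split(",", 1)]
    else:
        # "George W. Bush" -> given "George W.", family "Bush"
        given, _, family = name.rpartition(" ")
    # \w+ drops punctuation such as the dot in "W."
    return re.findall(r"\w+", given), re.findall(r"\w+", family)

def uri_forms(name):
    """Return the two URI-style spellings used as examples above."""
    given, family = name_parts(name)
    return {
        "given_first": "_".join(given + family),           # George_W_Bush
        "family_first": "_".join(family + given).lower(),  # bush_george_w
    }

for spelling in ("Bush, George W.", "George W. Bush"):
    print(spelling, "->", uri_forms(spelling))
```

Both spellings of the same name come out identical, which is the point: once the written form is reduced to tokens, either URI convention can be generated from it.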