Thursday, October 26, 2006

Knowable? Who knew?

Wikipedia and the Speed of the Internet fortuitously combined with a sudden curiosity about just what the heck those link-looking things in the Blogger User Profiles were good for. [Don't end a sentence with a preposition -- RD. Fine, you figure out how to get rid of it -- HS]

What they do is return a list of blogger profiles having that string in that profile category.

What they are good for is demonstrating how easy the Internet makes it to obtain information that previously would have been almost, if not completely, unobtainable.

For example: I have several books in my user profile. Clicking on a title returns a list of others registered on Blogger listing that same title. Combining a little spreadsheet geekery with a half-hour, here are some statistics for a few representative titles:

Red Sky at Morning

Male: 42%, average age 27.6, percent reporting age 50%
Female: 58%, average age 30.0, percent reporting age 40%
Average Age: 30
n = 43

The female percentage is higher than I would expect for a male coming-of-age story.

Master & Commander

Male: 82%, 29.0, 64%
Female: 18%, 20.2, 67%
Average Age: 29.0
n = 34

The M - F delta is not surprising, but the age delta is.

The Road to Serfdom (even those who have read the book should take a look at the link)

Male: 88%, 29.6, 53%
Female: 12%, 21.0, 20%
Average age: 29
n = 41

Same as for M & C, but the relative youth is surprising.

Cryptonomicon

Male: 91%, 33.6, 68%
Female: 9%, 35.5, 50%
Average age: 34
n = 44

No surprises here for a plot- and action-driven book whose female characters make cardboard cutouts look fully three-dimensional in comparison.

The Bridges of Madison County

Male: 26%, 42.6, 90%
Female: 74%, 30.0, 34%
Average age: 32
n = 39

No surprises here for a character-driven book whose protagonist evokes tears for actions that would produce rage if the protagonist were male. [Have you read the book? -- RD. Nope -- HS. How do you know? -- RD. Amazon reviews are gospel. -- HS.]
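For anyone who would rather script the spreadsheet step, here is a minimal sketch of how the per-title figures above could be tallied. It assumes each profile in a title's list has already been copied down by hand as a gender plus an optional age; the `Profile` record, the `summarize` function, and the sample rows are all made up for illustration and are not the data behind the numbers above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Profile:
    """One Blogger profile as transcribed by hand (hypothetical record layout)."""
    gender: str           # "M" or "F", as listed on the profile
    age: Optional[int]    # None when the profile does not report an age


def summarize(profiles):
    """Return gender share, average age, and percent reporting age for one title."""
    n = len(profiles)
    summary = {"n": n}
    for gender in ("M", "F"):
        group = [p for p in profiles if p.gender == gender]
        ages = [p.age for p in group if p.age is not None]
        summary[gender] = {
            # share of all profiles listing the title
            "share_pct": round(100 * len(group) / n) if n else None,
            # average over only those profiles that report an age
            "avg_age": round(sum(ages) / len(ages), 1) if ages else None,
            # how many in this group report an age at all
            "age_reported_pct": round(100 * len(ages) / len(group)) if group else None,
        }
    all_ages = [p.age for p in profiles if p.age is not None]
    summary["avg_age"] = round(sum(all_ages) / len(all_ages), 1) if all_ages else None
    return summary


# Made-up sample rows, purely to show the shape of the input.
sample = [
    Profile("M", 30),
    Profile("M", None),
    Profile("F", 20),
    Profile("F", 24),
    Profile("M", 40),
]

print(summarize(sample))
```

Note that the average ages rest only on the profiles that report an age, so a small group with a low reporting percentage can come down to a handful of people, or even one.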

There are DD readers whose numeracy and overall analytical skills would quickly point out many statistical shortcomings. However, rock-solid information isn't really the point here, so much as the fact that the Internet is so thoroughly suffused with data that previously unobtainable information is frequently to be had virtually for the asking. The real limits are becoming human rather than logistical: the habit of asking previously unanswerable questions is no easier to come by than finding what is hiding in plain sight.

Oh, and there is one other thing. Decades after the second wave of the women's movement, you would expect books to be more gender-neutral.

Well, yes you would, if men and women were, in fact, gender neutral. It is now very easy to investigate the quality of that surmise, in case its glassy-eyed nonsense isn't intuitively obvious to even the casual observer.

Does the internet provide a leg up for sense over nonsense?

2 Comments:

Blogger Harry Eagar said...

I have read 'Bridges of Madison County' and even have a picture of myself, kids and dog standing on a bridge in Madison County, taken at least 10 years before the book was published.

The book was as the Amazon reviews led Skipper to think.

As for Hayek in cartoons, sponsored by GM, how're those market forces workin' out for ya, GM?

October 26, 2006 5:57 PM  
Blogger Bret said...

At first I was shocked that out of hundreds of thousands of bloggers who use blogger.com, only 41 had read The Road to Serfdom.

Then I realized that I hadn't filled out my profile to include books and probably not that many others have either (who has the time for such frivolity when one can be doing more important things like writing posts? :-).

Hey Skipper asks: "Does the internet provide a leg up for sense over nonsense?"

I expect that one day there will be a service called Google Truth (or something like that), that does a network analysis to see how statements correlate across the Net and factors in reliability of those who make the statements to come up with a score for any given fact. For example, you might ask Google Truth "Is global warming a threat to humans?" and it might respond "33.4% TRUE" (I'm just pulling a number from thin air so please don't question it). It would derive this number from everything posted on the Internet. Paul Ehrlich's website would no doubt say yes, but since his predictions have never been right, his assertions would be intensely deweighted.

I sorta do that manually when I reach conclusions, most of which are probabilistic.

October 28, 2006 10:49 PM  
