To my great delight and surprise, my grant proposal to the Knight Foundation was selected for the next level of review. The proposal aims to create a performance royalty system for online journalism. Today I submitted the Full Proposal. Please have a look and post comments on their web site.
After the proposal passed the preliminary round of review, I started to seriously investigate the legal issues involved. I contacted several lawyers, both to get some insight and to line up future collaborators. My proposal, at its essence, involves charging search engines to index and cache web content. The performance royalty idea is just a fair way of determining the proper distribution of payments to the various news providers and journalists.
The central legal issue is "fair use". From my non-lawyer understanding, fair use means that I am allowed to reproduce a small passage of a copyrighted work under certain conditions. To determine whether an action is legal under fair use, one must apply a balancing test. Factors include the purpose of the use, the nature of the work, whether the use impacts the value of the copyrighted work, and the amount of the excerpt compared to the whole.
Somewhere in my academic training, I learned that fair use with regard to text can be mostly captured by the "three sentence rule": you are allowed to quote three sentences and be safe. Search engines generally follow the three sentence rule (snippets shown on search results pages are never more than three sentences long). Based on this simple rule, search engines have argued that they should never need to pay to link. I agree for the most part. The Internet is all about linking, and charging to link and quote others would be disastrous for the Internet.
It's a bit more complicated, however, in the case of search engines. In order to generate a snippet, a search engine must cache the entire content of the document. The document might not be cached in its original form, but the entire document is cached in a derivative form. The snippet is generated in response to a user query, and since the engine cannot know in advance which sentences a query will match, it has to keep the whole document in its cache.
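To make the point concrete, here is a minimal sketch of query-dependent snippet generation. It is purely illustrative and not how any real search engine works: the `PageCache` class, its method names, the naive sentence splitter, and the three-sentence cap are all my own assumptions. What it shows is that the snippet is short, but producing it requires the full text to be stored first.

```python
import re

# Assumption: cap snippets at three sentences, mirroring the informal
# "three sentence rule" discussed above (not a legal standard).
MAX_SNIPPET_SENTENCES = 3


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


class PageCache:
    """Hypothetical cache that keeps every page in a derivative form
    (a list of sentences) rather than as the original HTML."""

    def __init__(self):
        self._pages = {}

    def store(self, url, text):
        # The ENTIRE document is kept, because any sentence might be
        # needed to answer some future query.
        self._pages[url] = split_sentences(text)

    def snippet(self, url, query):
        """Return up to three cached sentences that share a word with the query."""
        terms = {t.lower() for t in query.split()}
        matches = [
            s for s in self._pages.get(url, [])
            if terms & set(re.findall(r"\w+", s.lower()))
        ]
        return " ".join(matches[:MAX_SNIPPET_SENTENCES])
```

Calling `store` once and then `snippet` with different queries returns different excerpts of the same page, which is exactly why the excerpt cannot be chosen before the page is cached in full.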
The interesting question is whether I, as a web site publisher, automatically authorize the automated caching of my copyrighted content once I put it on the web without password protection. Does allowing people to read my web page also give a search engine crawler the right to read my web page and store its findings? As far as I can tell, the answer to this question is unclear given the current state of the law. During my recent research, I was pointed to an excellent editorial by Bruce Brown and Bruce Sanford on this very subject of Fair Use and Search Engines.