Content Mining in Action

Every day about 15,000 journal articles and over 500 theses are published. If we are to avoid drowning in this flood, we must use machines and software. Peter Murray-Rust is a scientist-turned-activist who, with funding from The Shuttleworth Foundation, has created a community of young, enthusiastic contributors who are building the tools, resources and practices for content mining. With these tools, content mining can be done by anyone, and six young Fellows have been appointed to carry out research projects in biomedical science.

Starting from an authority list or thesaurus of terms used in your discipline (which you can create if there isn't one), you'll be surprised how many documents you can find using text and data mining (TDM) tools. The next step, extracting information and aggregating it, is also possible. In this talk, Prof Murray-Rust showed the various stages (crawl, download, normalize, search, index, and understand), with everything Open Source and re-usable. His plant science slides illustrate the possibilities.
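
As a rough illustration of the normalize, search and index stages described above, here is a minimal Python sketch. It is not the software shown in the talk: the function names, the term list and the three sample documents are all hypothetical, and the crawl and download stages are skipped by assuming the text is already in memory.

```python
import re
from collections import Counter, defaultdict


def normalize(text):
    """Lowercase and collapse whitespace so that matching is consistent."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def build_index(documents, terms):
    """Return (index, counts): term -> set of document ids, and term -> total hits."""
    # One whole-word pattern per term, built from the normalized term.
    patterns = {
        term: re.compile(r"\b" + re.escape(normalize(term)) + r"\b")
        for term in terms
    }
    index = defaultdict(set)
    counts = Counter()
    for doc_id, text in documents.items():
        clean = normalize(text)
        for term, pattern in patterns.items():
            hits = pattern.findall(clean)
            if hits:
                index[term].add(doc_id)
                counts[term] += len(hits)
    return dict(index), counts


# Hypothetical term list (the "authority list") and sample documents.
terms = ["Arabidopsis thaliana", "Oryza sativa", "drought"]
documents = {
    "doc1": "Arabidopsis thaliana seedlings were grown under   drought stress.",
    "doc2": "Drought stress reduced yield in Oryza sativa; drought tolerance varied.",
    "doc3": "No plant species are mentioned here.",
}

index, counts = build_index(documents, terms)

# Aggregate: list each term with its hit count and the documents that mention it.
for term, total in counts.most_common():
    print(term, total, sorted(index[term]))
```

Matching on whole words against a fixed term list is the simplest possible approach. A real pipeline would also need to handle synonyms, abbreviations and document formats such as PDF or XML.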

