Identifying continuing education resources on new data standards and technologies through an automatic crawling system

This presentation will share the results of our pilot study to develop an automatic crawling system for identifying professional development resources on emerging standards and technologies for data organization and management in libraries. These standards and technologies include newer models and tools for resource discovery, access, and data communication, such as BIBFRAME, a replacement for the MARC format, and Semantic Web technologies such as RDF, OWL, SKOS, and SPARQL. The unique centrepiece of our project repository is an automated workflow that crawls for, monitors, and classifies relevant Web objects into searchable professional development categories. This automatic crawling and monitoring mechanism supports our goal of creating a functioning digital repository that is scalable, current, and sustainable, with minimal operating costs. The crawling system uses a content-based algorithm to select crawl paths and to extract, retrieve, and classify high-quality, relevant Web objects based on their hyperlinks, anchor texts, and contents. We will share the pilot digital repository and the findings from our iterative rapid prototyping process to highlight their potential for meeting the pressing continuing education needs of the cataloguing and metadata communities.
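As an illustration only, the kind of content-based path selection described above can be sketched as a best-first (focused) crawl. This is not the project's actual algorithm: the term list, the scoring function, the equal weighting of anchor text and page content, the threshold, and the in-memory `TOY_WEB` graph are all assumptions made for the example.

```python
import heapq
import re

# Hypothetical topic vocabulary, taken from the standards named in the abstract.
TOPIC_TERMS = {"bibframe", "rdf", "owl", "skos", "sparql", "metadata"}


def relevance(text):
    """Return the fraction of word tokens in `text` that are topic terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    return sum(t in TOPIC_TERMS for t in tokens) / len(tokens)


def focused_crawl(pages, seed, threshold=0.1, max_pages=10):
    """Best-first crawl over an in-memory web graph.

    `pages` maps url -> (content, [(anchor_text, target_url), ...]).
    Returns the visited URLs whose content scores at or above `threshold`,
    in the order they were visited.
    """
    frontier = [(-1.0, seed)]  # heapq is a min-heap, so priorities are negated
    queued = {seed}
    visited = 0
    relevant = []
    while frontier and visited < max_pages:
        _, url = heapq.heappop(frontier)
        if url not in pages:
            continue  # dangling link: nothing to fetch
        visited += 1
        content, links = pages[url]
        page_score = relevance(content)
        if page_score >= threshold:
            relevant.append(url)
        for anchor, target in links:
            if target in queued:
                continue
            queued.add(target)
            # Assumed weighting: anchor text and linking-page content count equally.
            priority = 0.5 * relevance(anchor) + 0.5 * page_score
            heapq.heappush(frontier, (-priority, target))
    return relevant


# Toy data standing in for fetched Web pages (entirely hypothetical).
TOY_WEB = {
    "seed": ("Library metadata training portal",
             [("BIBFRAME and RDF webinar", "webinar"),
              ("Staff picnic photos", "picnic")]),
    "webinar": ("Introduction to BIBFRAME RDF and SPARQL for cataloguers",
                [("SKOS tutorial", "skos")]),
    "picnic": ("Photos from the annual staff picnic", []),
    "skos": ("SKOS and OWL vocabularies explained", []),
}

if __name__ == "__main__":
    print(focused_crawl(TOY_WEB, "seed"))
```

In this sketch, links with topical anchor text on topical pages are followed first, so off-topic branches are visited last or not at all once `max_pages` is reached. A real system would replace the term-fraction score with a trained classifier and fetch pages over HTTP.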

Presentation Type: Talk
Language: English