Abstracts
This keynote address will look at the evolution of the linguistic approach to content analysis which Crystal has been developing over the past 20 years. It begins with the knowledge management taxonomy used for the Cambridge family of general encyclopedias, and follows its transformation into an Internet taxonomy, with applications in automatic document classification, search engine assistance, e-commerce, online advertising, and Internet security. Recent developments have brought a focus on advertising, a field which has seen ideas develop from simple keyword analysis to contextual advertising and now to semantic targeting. Crystal explores the difference between these notions, and describes current issues in the way semantic targeting is evolving, including ways of handling site sensitivity, sentiment, intention, and cultural localization. [Paper expected]
In order for businesses to remain relevant in the quickly changing digital environment, content solutions that can be easily adopted and deployed are critical to maintaining a competitive edge. Unfortunately, the marketplace cannot wait for the slow development of standards to guide its needs, nor for the even slower approval process by an industry body. Instead, it requires a bold leap from an informed theoretical base to an implementable strategy.
At The Walt Disney Company, Madi Solomon used the FRBR conceptual model and the research papers of Dr. Jane Hunter to design a simple but effective moving-image metadata model that captured the several instantiations of a single intellectual property across a broad spectrum of consumables (think movies, television series, books, soundtracks, games, and theme park rides). This metadata standard was released to the public last year and has been adopted by Disney and Universal/NBC.
At Pearson Plc, the publishing conglomerate, educational content is evolving: teachers twitter syllabi to their students, online learning modules are made available on demand, and content can be customized out of chunks from different sources. As their Director of Content Standards, Madi is tackling similar challenges in a new domain. They include how to make diverse resources easily accessible for both editors and consumers, how to maintain relationships and usage history of customizable objects, and how to successfully track these components for rights and royalty payments.
Here Comes Everything is a presentation from a business perspective: a little chaos, a lot of risk, and the expedient urgency of now. [No paper submission]
One of the characteristics of this age - the information and knowledge age - is information overload and an increase in multimedia information. Managing the growth of multimedia information still poses some problems. The challenge is to propose relevant information to users from among the large volume of multimedia information. Our approach consists in exploiting context awareness and the annotation process to support multimedia information retrieval through appropriate user-system interaction paradigms, in order to respond better to users' multimedia information needs. [Paper]
Purpose - The purpose of this paper is to examine and discuss the classification of commercial popular music when large digital collections are organised for use in films.
Design/methodology/approach - A range of systems are investigated and their organization is discussed, focusing on an analysis of the metadata used by the systems and the choices given to the end user to construct a query. The indexing of the music is compared to a checklist of music facets which has been derived from recent musicological literature on semiotic analysis of popular music. These facets include aspects of communication, cultural and musical expression, codes and competences.
Findings - In addition to bibliographic detail, descriptive metadata is used to organise music in these systems. Genre, subject and mood are used widely while some musical facets also appear. The extent to which attempts are being made to reflect these facets in the organization of these systems is discussed. A number of recommendations are made which may help to improve this process.
Originality/value - This paper discusses an area of creative music search which has not previously been investigated in any depth and makes recommendations based on findings and the literature which may be used in development of commercial systems as well as making a contribution to the literature. [Paper]
This paper assesses the World Bank's experiences in defining and harmonizing underlying content architecture and refining metadata models to make 30 years' worth of legacy series content discoverable as parent-child content objects within a publication portal. Content objects were analyzed and converted to XML to capture structure, meaning, and relationships within and among objects. Metadata were associated in separately refined models for parent content objects and their child content objects, in order to create multiple browse and search strategies for the user. The World Bank will share lessons and challenges in this endeavor, including the challenges of applying an institutionally based taxonomy required to express subject matter responsibilities and relationships within the World Bank. [Paper]
Purpose - To develop a prototype middleware framework between different terminology resources in order to provide a subject cross-browsing service for library portal systems.
Design/methodology/approach - Nine terminology experts were interviewed to collect appropriate knowledge to support the development of a theoretical framework for the research. Based on this, a simplified software-based prototype system was constructed incorporating the knowledge acquired. The prototype involved mappings between the computer science schedule of the Dewey Decimal Classification (which acted as a spine) and two controlled vocabularies, UKAT and the ACM Computing Classification. Subsequently, six further experts in the field were invited to evaluate the prototype system and provide feedback to improve the framework.
Findings - The major findings showed that, given the large variety of terminology resources distributed on the web, the proposed middleware service is essential to integrate the different terminology resources technically and semantically in order to facilitate subject cross-browsing. A set of recommendations is also made outlining the important approaches and features that support such a cross-browsing middleware service.
Originality/value - Cross-browsing features are lacking in current library portal meta-search systems. Users are therefore deprived of this valuable retrieval provision. This research investigated the case for such a system and developed a prototype to fill this gap. [Paper]
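The spine-based mapping described in this abstract can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the class number, the term labels and the function name `cross_browse` are placeholders, not the mappings or the interface of the prototype itself.

```python
# Hypothetical spine-based mapping: each vocabulary maps its terms to a
# Dewey Decimal Classification (DDC) number, which acts as the spine.
# All entries below are illustrative placeholders, not real mappings.
SPINE_MAPPINGS = {
    "UKAT": {"Computer programming": "005.1"},
    "ACM": {"D.1 Programming Techniques": "005.1"},
}


def cross_browse(source_vocab, term, mappings=SPINE_MAPPINGS):
    """Return equivalent terms in the other vocabularies via the spine."""
    spine_class = mappings[source_vocab].get(term)
    if spine_class is None:
        return {}
    results = {}
    for vocab, terms in mappings.items():
        if vocab == source_vocab:
            continue  # only report terms from the *other* vocabularies
        matches = [t for t, cls in terms.items() if cls == spine_class]
        if matches:
            results[vocab] = matches
    return results


print(cross_browse("UKAT", "Computer programming"))
```

Because every vocabulary is mapped only to the spine, adding a further vocabulary in this sketch needs one new mapping table rather than a pairwise mapping to each existing vocabulary.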
The interoperability problem between knowledge organization systems (KOS) has become more critical because of the compilation and reuse of KOS. Inspired by dictionary compilation and the characteristics of the Chinese language, we use connotation description instead of relationships between items in KOS. We use the conceptual primitives and related instance base of Hierarchical Network of Concepts (HNC) theory and develop both semi-automatic and fully automatic methods for different applications. This is exploratory work that remains to be implemented and revised in the future. [Paper]
This presentation will focus on the challenges of finding still digital images online.
It will first sketch the image retrieval market, briefly describing the major players in stock photography, Corbis and Getty Images, and also mentioning the need to find archival, research-orientated images.
It will progress to briefly deal with the typical ways images are classified to support their retrieval. This will include the classification of images based on their technical aspects, concepts depicted in images, and abstract concepts, including emotions, often applied to a range of images.
The bulk of the time will be spent looking at the challenges of finding the right images to meet given needs. This will include the challenges of browsing for images using taxonomies supplied by image vendors or tag clouds, searching for image attributes, looking for depicted content in images, as well as the knotty problem of finding images based on subjective abstract concepts. Reference will also be made to the difficulty of designing targeted searches that produce precise results and the way in which image browsing is perhaps an essential part of many image retrieval processes.
Time will also be spent looking at Content-Based Image Retrieval (CBIR), with observations on the strengths and weaknesses of this technology in image finding. [No paper submission]
Although images and video comprise an ever growing bulk of the world's digital content, most information retrieval systems rely entirely on textual metadata such as captions, annotations, and tags. In my talk I will argue that such keyword based multimedia retrieval effectively treats images as "black boxes" since all indexing and search is based on the labels associated with a given image rather than the image itself. Furthermore, manual image annotation is an expensive process which is prone to problems such as errors, inconsistencies, ambiguity, lack of context, and both over- and under-keywording. Moreover, a set of textual annotations effectively becomes immutable (not amenable to modification or re-interpretation) and is tied to the idiosyncrasies of a particular natural language.
Consequently there is great scope for systems that are able to perform image search on the basis of an automated analysis of the actual content of images, thus allowing users to search "inside the picture" just as they have become accustomed to being able to search within textual documents.
Unfortunately, most content-based image retrieval (CBIR) systems have failed to gain wide adoption. I will outline why this may be the case, with a particular emphasis on the fact that CBIR solutions have not done enough to bridge the "semantic gap" between their system's retrieval model and that of the user.
I will then discuss an approach to image search that was specifically designed to narrow this semantic gap while addressing the problems inherent in both textual and content based image retrieval. This new approach is founded on the notion of an ontological query language, combined with a set of advanced automated image analysis and classification models. The underlying ontology encompasses relational information about concepts and attributes pertaining to image content, as well as knowledge about the structure and meaning of natural language queries expressed in English. The relevance of each image in a collection with respect to a given user query is assessed probabilistically while taking into account both the reliability and salience (as it pertains to the query) of all information available for that image. The retrieval process can therefore be based both on automated image analysis and classification as well as textual or other metadata (if available). Since the ontological query language renders an original user query into a canonical representation which allows contextual disambiguation and query-specific weighting, it is able to cope with problems such as errors in the automated image classification as well as ambiguities and sparsity of textual metadata.
I will demonstrate how ontological query languages have been utilised by Imense Ltd. to provide effective image retrieval and image analysis solutions, giving rise to the ability to "search inside the picture". Our image retrieval technology also offers other unique advantages that cannot be replicated with existing metadata formats, such as probabilistic relevance assessment and spatial queries. An additional application of the technology is the ability to "auto-tag" images, i.e. to generate descriptions of image content automatically. Automated annotation offers benefits including reduced costs, greater accuracy, higher consistency, and not being tied to any particular natural language. [No paper submission]
Purpose - This paper examines image retrieval within two different contexts: a monolingual context where the language of the query is the same as the indexing language and a multilingual context where the language of the query is different from the indexing language. This study also compares two different approaches for the indexing of ordinary images representing common objects: traditional image indexing with the use of a controlled vocabulary and free image indexing using uncontrolled vocabulary.
Design/methodology/approach - This research uses three data collection methods. An analysis of the indexing terms was employed in order to examine the multiplicity of term types assigned to images. A simulation of the retrieval process involving a set of 30 images was performed with 60 participants. The quantification of the retrieval performance of each indexing approach was based on the usability measures, that is, effectiveness, efficiency and satisfaction of the user. Finally, a questionnaire was used to gather information on searcher satisfaction during and after the retrieval process.
Findings - The results of this research are twofold. The analysis of indexing terms associated with all 3,950 images provides a comprehensive description of the characteristics of the four non-combined indexing forms used for this study. Also, the retrieval simulation results offer information about the relative performance of the six indexing forms (combined and non-combined) in terms of their effectiveness, efficiency (temporal and human) and the image searcher’s satisfaction.
Originality/value - The findings of this study suggest that in the near future, information systems could benefit from allowing an increased coexistence of controlled vocabularies and uncontrolled vocabularies resulting from collaborative image tagging, for example, and from giving users the possibility to participate dynamically in the image indexing process, in a more user-centred way. [Paper]
The paper describes the use of Information Extraction (IE), a Natural Language Processing (NLP) technique, to assist ‘rich’ semantic indexing of diverse archaeological text resources. Such unpublished online documents are often referred to as ‘Grey Literature’. Established document indexing techniques are not sufficient to satisfy user information needs that expand beyond the limits of a simple term matching search. The focus of the research is to direct a semantic-aware ‘rich’ indexing of diverse natural language resources with properties capable of satisfying information retrieval from on-line publications and datasets associated with the Semantic Technologies for Archaeological Resources (STAR) project in the UoG Hypermedia Research Unit.
The study proposes the use of knowledge resources and conceptual models to assist an Information Extraction process able to provide ‘rich’ semantic indexing of archaeological documents capable of resolving linguistic ambiguities of indexed terms. CIDOC CRM-EH, a standard core ontology in cultural heritage, and the English Heritage (EH) Thesauri for archaeological concepts are employed to drive the Information Extraction process and to support the aims of a semantic framework in which indexed terms are capable of supporting semantic-aware access to on-line resources. The paper describes the process of semantic indexing of archaeological concepts (periods and finds) in a corpus of 535 grey literature documents using a rule-based Information Extraction technique facilitated by the General Architecture for Text Engineering (GATE) toolkit and expressed by Java Annotation Pattern Engine (JAPE) rules. Illustrative examples demonstrate the different stages of the process.
Initial results suggest that the combination of information extraction with knowledge resources and standard core conceptual models is capable of supporting semantic-aware and linguistically disambiguated term indexing. [Paper]
The ASESG project involves design of an interoperable and specialized information system. This system has to solve the main problem of any multilingual and multidisciplinary information system, namely semantic interoperability, focused on simultaneous access to different heterogeneous collections between metadata domains and data mapping. [Paper]
The Newham Story is a web-based project in Newham, East London. It enables anyone with an interest in the history of the area to post stories, comments, photographs and multimedia content. The Newham Story supports an active discussion forum where people write collaborative stories and supplement content that others have posted.
The Newham Story has been developed by creating a taxonomy of the area; effectively a classification of people, things, activities, places and time relating to a particular part of London. The application itself contains a controlled vocabulary that enables users to tag their content using pre-defined terms. Any tags that are chosen are then visible to other users and it is then possible to search and find content using the tags. For example, selecting the term 'Canning Town' (an area), 'World War Two' (an event) or 'docks' will retrieve all the content tagged in that way. Users are also able to create their own tags and part of the work of the pilot project has been to assess how folksonomies can be used to support the development of controlled vocabularies.
Since the launch of the pilot system in July 2008, over 240 people have registered to use the system. There is a growing and lively user base including many people in their seventies and eighties. As more users join, and more content is added, a richer folksonomy is being developed that is producing interesting and stimulating ways to browse and explore the content.
The Newham Story is a combination of a controlled vocabulary and a folksonomy, with many users generating content, plus mediated intervention by subject matter experts to help structure content where necessary. It continues to develop and the work on the core controlled vocabularies is ongoing. [Paper]
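The combination of a controlled vocabulary with user-created tags described in this abstract can be illustrated with a minimal sketch. The class `TagIndex`, its method names, the content identifiers and the review threshold are all hypothetical; only the three example terms come from the abstract, and nothing here reflects the actual Newham Story application.

```python
from collections import defaultdict

# Hypothetical controlled vocabulary: only the three terms quoted in the
# abstract are used; the real vocabulary is not reproduced here.
CONTROLLED_TERMS = {"canning town", "world war two", "docks"}


class TagIndex:
    """Index content by tag, separating controlled terms from user tags."""

    def __init__(self, controlled_terms):
        self.controlled_terms = {t.lower() for t in controlled_terms}
        self.by_tag = defaultdict(set)      # tag -> content ids
        self.folksonomy = defaultdict(int)  # user-created tag -> usage count

    def tag(self, content_id, tag):
        key = tag.strip().lower()
        self.by_tag[key].add(content_id)
        if key not in self.controlled_terms:
            self.folksonomy[key] += 1

    def search(self, tag):
        """Return every content id carrying the given tag."""
        return sorted(self.by_tag.get(tag.strip().lower(), set()))

    def candidate_terms(self, min_uses=2):
        """User tags used often enough to review for the vocabulary."""
        return sorted(t for t, n in self.folksonomy.items() if n >= min_uses)


index = TagIndex(CONTROLLED_TERMS)
index.tag("story-1", "Canning Town")
index.tag("photo-7", "canning town")
index.tag("story-1", "rag-and-bone man")
index.tag("story-2", "Rag-and-bone man")
print(index.search("Canning Town"))
print(index.candidate_terms())
```

In this sketch the folksonomy counter is what a subject matter expert might consult when deciding whether a frequently used free tag should be promoted into the controlled vocabulary.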
The Pharmaceutical Benefits Scheme (PBS) is a programme of the Australian Government that provides subsidised prescription drugs to residents of Australia. Established in 1948, the PBS now supplies approximately 140 lifesaving and disease-preventing drugs as part of its national health-care scheme and is one of the Australian Government’s fastest growing areas of health expenditure. In the 2001-2002 financial year it is estimated to cost $4.837 billion, 13.6 per cent more than it did in the previous year. In the last decade it has experienced an estimated average annual expenditure growth rate of around 14 per cent.
Restrictions for prescribing medicines apply to 778 of the medicines on the PBS. The suggestion by the Australian National Audit Office (ANAO), though, is that the complex nature of the prescribing restrictions introduces an unnecessary administrative burden to prescribing, one that results in the under-utilisation of medicines in areas that would be clinically appropriate and cost-effective, and hence results in fewer health benefits being delivered to the Australian population. This growing impact and complexity of restrictions is well evidenced by the fact that the wording associated with Authority restrictions has exponentially increased from an average word count of 19.4 in 2000 to 354.0 in 2005. As a result, the ANAO suggests that many medicines are not reaching the patient populations for whom they are considered cost-effective.
In delivering business performance improvement to the Department of Health and Ageing, whose Pharmaceutical Benefits Advisory Committee (PBAC) makes recommendations to its government Minister regarding wording for prescription restrictions, the area of these restrictions was targeted for investigatory analysis. The aim was to ascertain whether improvements could be made to the processes of authoring restrictions wording, specifically to increase consistency and decrease complexity through better codification of the restrictions themselves. However, the current restrictions contained a high variance of styles, terms and formatting, and had largely evolved over the entire life of the PBS. This had limited the ability of previous investigations both to come to terms with the business processes that produced them and to identify commonality as the basis of a lexicon that could be drawn on for codification and support from an IT system. Analysis and categorisation of this content would have been simpler had it been written in a structured, logical and consistent way.
This is, of course, no different to normal, everyday English. English can be a messy language, with exceptions to rules, different styles of writing, and a multitude of different ways to write about exactly the same thing. This apparent lack of structure means that analysis is always hard and very time consuming, even if the output is just to refine the navigation of an organisation’s website. This task is made all the more difficult if the analysis is performed by someone without domain knowledge.
Typically, the approach taken to understand and categorise content is to conduct content audits and content analysis, but an alternative approach, semantic analysis, was taken in this instance to understand both the processes that produced the prescription restrictions and the business taxonomy behind them.
This presentation will introduce a case study involving the analysis of medical restrictions text, which will be used to demonstrate the effectiveness of a linguistic semantic approach and how it informed the creation of an IT tool that would help codify the content, make it machine readable for repurposing, and introduce a higher level of standardisation. [No paper submission]
This will be a high-level overview of some of the developments in e-research and cyberinfrastructure, with emphasis on some of the opportunities for data curation and data reuse, and with considerable emphasis on humanities and social sciences as well as science and engineering. It will also look at developments in "citizen science" and what might be thought of as "citizen humanities" in this context. The talk will conclude with consideration of the changing nature of publishing/authoring, particularly in the scholarly sphere, and the implications of the production of structured, reusable, and interchangeable knowledge as part of the processes of scholarship and scholarly communication. [Paper expected]
Think of the BBC as a storytelling organisation; then think of the transition needed from storytelling in the world of linear broadcasting to that of the non-linear, hypertext world of the web. The value in a website lies not in the implicit (meta)data of its domain model but rather in the way the domain model overlaps and intersects with other domains. As ever, the links are more important than the nodes because that's where the context lives: programmes:segment music:track, programmes:segment food:recipe etc. In this way we weave new 'user journeys' into and out of a domain, into and out of bbc.co.uk: from archive episodes no longer available online, to a recipe page, to a chef, to another recipe and back to a recent episode. Using well-targeted, content-specific links we could not only escape the dead-end content silos that characterised bbc.co.uk but also point users back to programmes that would hopefully inform, educate and all that stuff. In building bbc.co.uk/programmes and bbc.co.uk/music in this way we have kept everything in its right place: we've built a sane, maintainable, scalable, accessible site that search engines love and that can be easily evolved to add new features and functionality. So to anyone considering how best to build websites we'd recommend you throw out the Photoshop and embrace Domain Driven Design and the Linked Data approach every time. Even if you never intend to publish RDF, it just works. [Paper expected]
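The idea that context lives in the links between domains can be pictured with a minimal sketch of a 'user journey' traced across typed links. All identifiers, predicate names and the functions `outgoing` and `journey` are invented for illustration; they are not BBC URLs, vocabulary terms or code.

```python
# Hypothetical cross-domain links as (subject, predicate, object) triples.
# Identifiers and predicate names are invented for illustration.
LINKS = [
    ("programmes:episode/archive-1", "features", "food:recipe/apple-pie"),
    ("food:recipe/apple-pie", "created_by", "food:chef/a-chef"),
    ("food:chef/a-chef", "created", "food:recipe/scones"),
    ("food:recipe/scones", "featured_in", "programmes:episode/recent-9"),
]


def outgoing(node, links=LINKS):
    """Return the (predicate, object) pairs that lead out of a node."""
    return [(p, o) for s, p, o in links if s == node]


def journey(start, links=LINKS, max_steps=10):
    """Follow the first outgoing link from each node to trace one path."""
    path = [start]
    node = start
    for _ in range(max_steps):
        steps = outgoing(node, links)
        if not steps:
            break  # a node with no outgoing links is a dead end
        node = steps[0][1]
        path.append(node)
    return path


print(journey("programmes:episode/archive-1"))
```

In this toy graph a node with no outgoing links plays the role of the dead-end silo mentioned in the abstract: the journey simply stops there.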
Traditional subject indexing and classification are considered infeasible in many digital collections. Automated means and social tagging are often suggested as the two possible solutions. Both, however, have disadvantages and, depending on the purpose of use or context, require additional manual input. This study investigates ways of enhancing social tagging via knowledge organization systems, with a view to improving the quality of tags for increased information discovery and retrieval performance. Benefits of using both social tags and controlled terms are also explored, including enriching knowledge organization systems with new concepts. [Paper]
Globally, biodiversity resources are inevitably digital and stored in a wide variety of formats by researchers or stakeholders. From the Malaysian perspective, although awareness of digitizing biodiversity data has long been stressed, the semantic interoperability of the biodiversity collections is still an issue to be looked into. This is essentially because copyright problems arise when data is shared, creating a setback for researchers wanting to promote or share their findings through online presentations. This has become a hindrance for researchers in this country who wish to share their valuable information and knowledge in this area with their peers locally or internationally. To solve this, we present an approach to integrate data through wrapping of various datasets stored in relational databases located on networked platforms. The approach, which uses tools such as XML, PHP, ASP and HTML to integrate databases in a heterogeneous environment, not only solves copyright problems by suggesting distributed warehouses and required fields for sharing but also gives data owners the benefit of keeping their databases under their own jurisdiction. The approach presented in this paper is important for scientists, as findings in science are useful and should be shared among scientists for a better living. [Paper]
Purpose - The object of this study is to develop methods for automatically annotating the argumentative role of sentences in scientific abstracts. Working from Medline abstracts, we classified sentences into four major argumentative roles: objective, method, result, conclusion. The idea is that if the role of each sentence can be marked up, then this metadata can be used during information retrieval to seek for particular types of information such as novelty, conclusions, methodologies, aims/goals of a scientific piece of work.
Methodology - Two approaches were tested: linguistic cues and positional heuristics. Linguistic cues are lexico-syntactic patterns modeled as regular expressions implemented in a linguistic parser. Positional heuristics make use of the relative position of a sentence in the abstract to deduce its argumentative class.
Findings - Our experiments showed that positional heuristics attained a much higher degree of accuracy on Medline abstracts with an F-score of 64% whereas the linguistic cues only attained an F-score of 12%. This is mostly because sentences from different argumentative roles are not always announced by surface linguistic cues.
Research limitations/implications - A limitation to this study is that we were not able to test other methods to perform this task such as machine learning techniques which have been reported to perform better on Medline abstracts. Also, to compare the results of our study to earlier studies using Medline abstracts, the different argumentative roles present in Medline had to be mapped onto four major argumentative roles. This may have favorably biased the performance of the sentence classification by positional heuristics.
Originality/value - To the best of our knowledge, our study presents the first instance of evaluating linguistic cues and positional heuristics on the same corpus. [Paper]
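The two approaches compared in this abstract can be contrasted in a minimal sketch. The cue patterns, the positional cut-off values, the sample sentences and the function names are all assumptions made for illustration; the study's actual patterns, parser and heuristics are not given in the abstract.

```python
import re

ROLES = ("objective", "method", "result", "conclusion")

# Hypothetical cue patterns; the study's actual lexico-syntactic patterns
# are not given in the abstract.
CUE_PATTERNS = {
    "objective": re.compile(r"\b(aim|purpose|objective) of\b", re.I),
    "method": re.compile(r"\bwe (used|measured|tested)\b", re.I),
    "result": re.compile(r"\b(results? (show|showed|indicate)|we found)\b", re.I),
    "conclusion": re.compile(r"\b(we conclude|in conclusion)\b", re.I),
}


def classify_by_cue(sentence):
    """Return the first role whose cue pattern matches, else None."""
    for role in ROLES:
        if CUE_PATTERNS[role].search(sentence):
            return role
    return None


def classify_by_position(index, total):
    """Map a sentence's relative position in the abstract to a role.

    The cut-off points are assumed values, chosen only for illustration.
    """
    if total <= 1:
        return "objective"
    relative = index / (total - 1)  # 0.0 = first sentence, 1.0 = last
    if relative < 0.2:
        return "objective"
    if relative < 0.5:
        return "method"
    if relative < 0.9:
        return "result"
    return "conclusion"


# Invented sample abstract, one sentence per list item.
abstract = [
    "The aim of this study was to assess drug X.",
    "Patients were randomised to two groups.",
    "Blood pressure fell in the treated group.",
    "Drug X appears to be effective.",
]
for i, sentence in enumerate(abstract):
    print(classify_by_cue(sentence), classify_by_position(i, len(abstract)))
```

In this invented sample only the first sentence carries a surface cue, so the cue-based classifier leaves the other three unlabelled while the positional heuristic assigns every sentence a role, which mirrors the explanation offered in the findings.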
Facet analysis remains virtually alone as a rational intellectual method for the construction of subject terminologies. Developed for the linear arrangement of items in libraries, the rigour of its analytical approach, and the clear delineation of relationships between concepts in a subject domain make it an excellent contender for the management of subject content in a digital environment. The robustness of the methodology is revealed in the way in which a compatible classification and thesaurus can be derived from the same source data where the structure of the terminology is marked up for machine manipulation. Experience with the Bliss Bibliographic Classification 2nd edition (BC2) shows how the underlying structure of these different formats is semantically similar, and that all the structural components of the terminology can be expressed in a machine readable manner. The systematic conceptual structure suitable for document categorization can be translated relatively easily to a language based metadata tool more appropriate for document description and tagging. To a large extent software can also affect the (human) readable output of the different types of tool, and the conversion between them. Vocabulary control in the narrower sense does, however, present some problems for a process developed for the conceptual representation of knowledge, and needs addressing. Insofar as the faceted terminology is a surrogate for the subject domain itself, a faceted system supports a very wide range of inter-concept relationships and provides an effective tool for browsing and navigation as well as query formulation and modification. However, not all of these potential relationships are currently expressible in the standards for subject representation, either those for bibliographic use, or for web representation. 
Current work on BC2 is examining the way in which mark-up languages can be used not only to create a version of the classification for dissemination on the web, but also to represent the potential complexity of its semantic structure. Candidate systems such as SKOS (Simple Knowledge Organization System) are not presently able to handle more than the simplest structures in a faceted terminology, and finding ways to extend the range of relationships offers a substantial challenge. Co-operative work with scholars in the area of humanities computing suggests that existing techniques for the mark-up of texts, to support internal analysis and content representation, have much in common with facet analysis as an approach to the comparable structuring of metadata. In combination these methods may offer a solution to improving the usability of metadata tools and providing more subtle and sophisticated means of subject representation. [Paper expected]
Purpose - The research project aimed to provide a new visual representation of the Artefacts Canada digital collection, as well as a means for users to browse this content. Artefacts Canada Humanities is a database containing approximately 3.5 million records describing the different collections of Canadian museums.
Design/methodology/approach - A four-step methodology was adopted for the development of the faceted taxonomy model. First, a Best Practice Review was performed, consisting of an extensive analysis of existing terminology standards in museum communities and public Web interfaces of large cultural organizations. The second step entailed a Domain Analysis, consisting of extracting and comparing relevant concepts from authoritative terminological sources. Thirdly, we proceeded to Term Clustering & Entity Listing, which involved breaking up the taxonomy domains into potential facets. Finally, Incremental User Testing was carried out in order to validate and refine the taxonomy components (facets, values, and relationships).
Findings - The project resulted in a bilingual and expandable vocabulary structure that will be used to describe the Artefacts Canada database records. The new taxonomy simplifies the representation of complex content by grouping objects into facets that can classify all records of the Artefacts Canada database. The user-friendly bilingual taxonomy provides visitors worldwide with the means to better access Canadian virtual museum collections.
Originality/value - Few methodological tools are available for museums that wish to adopt a faceted approach in the development of their Web sites. For practitioners, the methodology developed within this project is a direct contribution to supporting Web site development in large cultural organizations. [Paper]
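As an illustration only, faceted browsing of the kind this abstract describes can be sketched in Python as follows. The facet names, values, and records are hypothetical and are not taken from the Artefacts Canada taxonomy. The sketch shows only how records tagged with facet values can be filtered and counted.

```python
from collections import Counter

# Hypothetical records: each maps a facet name to a single value.
RECORDS = [
    {"id": 1, "object_type": "painting", "material": "oil", "period": "19th century"},
    {"id": 2, "object_type": "sculpture", "material": "bronze", "period": "20th century"},
    {"id": 3, "object_type": "painting", "material": "watercolour", "period": "20th century"},
]


def filter_by_facets(records, **selected):
    """Return the records whose facet values match every selected facet."""
    return [
        record
        for record in records
        if all(record.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(records, facet):
    """Count how many records carry each value of one facet."""
    return Counter(record[facet] for record in records if facet in record)
```

Calling `filter_by_facets(RECORDS, object_type="painting", period="20th century")` narrows the result set one facet at a time, while `facet_counts` supplies the per-value totals that a browsing interface would typically show next to each facet value.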
Context and purpose of the work - This work draws attention to information retrieval philosophies and techniques allied to the records management profession, advocating a wider professional consideration of a functional approach to information management, in this instance in the development of information architectures.
Methodology - This paper draws on a hypothesis originally presented by the author (Milne, 2007a), which proposed that records management techniques traditionally used to develop business classification schemes could offer an additional solution for organising information resources and services (within a university intranet) where earlier approaches, notably subject-based and administrative-based arrangements, were found to be lacking.
The hypothesis was tested via work-based action learning and is presented here as an extended case study. This paper also draws upon evidence submitted to the Joint Information Systems Committee in support of the University of Abertay Dundee's application for consideration of the JISC award for innovation in records and information management (University of Abertay Dundee, 2007).
Findings - The original hypothesis has been tested in the workplace. Information retrieval techniques allied to records management (functional classification) were the main influence in the development of pre- and post-coordinate information retrieval systems to support a wider information architecture, where the subject approach was found to be lacking. Their use within the workplace has since been extended.
Originality/value - The paper argues that the development of information retrieval as a discipline should include a wider consideration of functional classification, as this alternative to the subject approach is largely ignored in mainstream IR works. [Paper]
Much of the recent attention devoted to Cloud Computing has been concerned with outsourcing of hardware or hosting of applications. Important as these trends are, the Cloud is capable of far more than simple replication of existing enterprise processes.
Amazon's recently announced Public Data Sets programme and the World Wide Web Consortium's (W3C) Linked Open Data community project illustrate the opportunity for re-use of public data, with licensing frameworks evolving to reflect shifting presumptions. Specifications from the Semantic Web are being put to work as enterprises such as Thomson Reuters seek to unlock value in expensively curated internal data.
What happens as increasing quantities of data become accessible, as attitudes to control and ownership morph, and as technologies evolve to enhance 'enterprise' applications with insight from beyond the firewall? Where might the balance lie between comprehensiveness and insight on one hand, and security and control on the other? [No paper submission]
Content-based image retrieval (CBIR) technologies offer many advantages over purely text-based image search. However, one of the drawbacks associated with CBIR is the increased computational cost arising from tasks such as image processing, feature extraction, image classification, and object detection and recognition. Consequently, CBIR systems have suffered from a lack of scalability, which has greatly hampered their adoption for real-world public and commercial image search. At the same time, paradigms for large-scale heterogeneous distributed computing such as Grid computing, cloud computing, and utility-based computing are gaining traction as a way of providing more scalable and efficient solutions to large-scale computing tasks.
In this paper, we present an approach in which a large distributed processing Grid has been used to apply a range of CBIR methods to a substantial number of images. By massively distributing the required computational task across thousands of Grid nodes, we have achieved very high throughput at relatively low overheads. This has allowed us to analyse and index about 25 million high resolution images thus far while using just two servers for storage and job submission. The CBIR system was developed by Imense Ltd. and is based on automated analysis and recognition of image content using a semantic ontology. It features a range of image processing and analysis modules, including image segmentation, region classification, scene analysis, object detection, and face recognition methods. [Paper]
IFLA FRBR Group 3 entities "represent an additional set of entities that serve as the subjects of works" (IFLA, 1999: 16). A third IFLA Working Group of the FRBR family, FRSAR (Functional Requirements for Subject Authority Records), was formed in April 2005 and charged with the task of developing functional requirements and a conceptual model for subject authority records. One of the terms of reference is to build a conceptual model of Group 3 entities within the FRBR framework as they relate to the “aboutness” of works. In this framework all three entity groups as defined by the FRBR conceptual model have the potential to be the subject of a work. In other words, Group 1, 2 and 3 entities all can have an “is-subject-of” relationship with a work. The FRSAR Working Group proposed an abstract conceptual model and presented it at the IFLA Conference in August 2007. The model was further discussed and developed by the Working Group in 2008. The draft report prepared by the FRSAR Working Group has indicated that the focus of the model is on the authority data instead of authority records, hence the abbreviation used in the report is FRSAD, i.e., Functional Requirements for Subject Authority Data. [Paper]
In this paper, the functional and relational characteristics and requirements for various types of semantic interoperability in a comprehensive international knowledge organisation system are discussed with regard to an analysis of the underlying retrieval paradigms. Furthermore, this paper analyses the potential benefits and perspectives of the selective transfer of modelling strategies from the field of semantic technologies for the refinement of relational structures of inter-system and inter-concept relations as a requirement for expressive and functional indexing languages supporting advanced types of semantic interoperability. [Paper]
The encyclopaedia of Iranian architectural history was established with the goal of increasing the accessibility of the widely dispersed resources and documents related to Iranian architectural history, and of providing a better and more productive space for collaboration among researchers and scholars, enabling them to expand and improve the encyclopaedia. The information architecture, whose implementation has begun, is intended to achieve three goals: first, to increase the accessibility of documents related to topics; second, to capture the relations between concepts; third, to capture the relations between concepts and documents. A three-layer architecture (the EIAH cake) is designed to achieve these goals. The underlying layer is a pool of information, which in our case is an integration of distributed digital repositories. The top layer is the knowledge representation level, an ontology of Iranian architectural history. The middle layer, which sits at the heart of the architecture, is the mediator level, which is responsible for establishing the relations between concepts and documents and for enhancing search and semantic interoperability. The metadata model for describing resources in the distributed digital repositories is customized from Dublin Core with refinements. All documents in the distributed repositories are given metadata according to this model, and a detector agent (the mediator level) harvests the metadata and interprets it by means of the ontology (the top layer). The results of this process will be presented in a semantic portal or may be used by end users for complex search queries. When this happens across a federation of distributed digital repositories, the ocean of separate documents becomes much more meaningful and interpretable for human scholars. [Paper]
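The three layers described in this abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the repository names, the records, the toy ontology, and the function names are hypothetical, and the abstract does not specify how the actual detector agent works. Only the use of Dublin Core-style `dc:title` and `dc:subject` fields follows the text.

```python
# Bottom layer: hypothetical Dublin Core-style records held in two repositories.
REPOSITORIES = {
    "repo_a": [
        {"dc:title": "Example document on a mosque", "dc:subject": ["mosque", "unlisted term"]},
    ],
    "repo_b": [
        {"dc:title": "Example document on a garden", "dc:subject": ["garden"]},
        {"dc:title": "Example document on a dome", "dc:subject": ["dome", "mosque"]},
    ],
}

# Top layer: a toy ontology mapping each concept to its broader concept.
ONTOLOGY = {
    "mosque": "religious building",
    "dome": "structural element",
    "garden": "landscape",
}


def harvest(repositories):
    """Mediator, step 1: collect metadata records from every repository."""
    for repo_name, records in repositories.items():
        for record in records:
            yield repo_name, record


def link_concepts(repositories, ontology):
    """Mediator, step 2: index documents by the ontology concepts they mention."""
    index = {}
    for repo_name, record in harvest(repositories):
        for subject in record.get("dc:subject", []):
            if subject in ontology:  # subjects unknown to the ontology are skipped
                index.setdefault(subject, []).append((repo_name, record["dc:title"]))
    return index


def search_by_broader(index, ontology, broader):
    """Return documents linked to any concept whose broader concept matches."""
    return [
        document
        for concept, documents in index.items()
        if ontology[concept] == broader
        for document in documents
    ]
```

In this sketch, a query for the broader concept "religious building" returns documents from both repositories because each is tagged with "mosque", which is the kind of cross-repository, concept-level retrieval the mediator layer is meant to support.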









