Willpower Information
Information Management Consultants
I drew up the following glossary when working on the British Standard for thesaurus construction, BS8723, with three consultant colleagues who specialise in the development and use of thesauri and other forms of structured vocabulary for information retrieval: Stella Dextre Clarke, Alan Gilchrist and Ron Davies, and I am grateful for their suggestions and comments. BS8723 has now been withdrawn and I have made some changes while working on its replacement, BS ISO 25964. I have tried as far as possible to maintain consistency with definitions in these standards, but discussions about some of the definitions are continuing, and they may change. Some of the definitions, notes and examples are my personal opinions, and my colleagues may not agree with them. I do not claim that these definitions are "correct" and that other meanings are "wrong", but I hope that they will be found to be a consistent and well-defined set which will aid communication by encouraging everyone to use the same words with the same meaning. I have retired from active consultancy and my business website at "Willpower Information" is no longer available, but I still welcome comments and feedback, which can be sent to me, Leonard Will, at L.Will@willpowerinfo.co.uk.
e.g. in this extract
vehicles
(vehicles by number of wheels)
- monocycles
- bicycles
- - motor bicycles
- - pedal bicycles
- tricycles
- four-wheeled vehicles
(vehicles by motive power)
- mechanically powered vehicles
- - motor bicycles
- - motor cars
- human powered vehicles
- - pedal bicycles
- hybrid human/mechanically powered vehicles
- - mopeds
the complete array of sibling terms under vehicles consists of
monocycles
bicycles
tricycles
four-wheeled vehicles
mechanically powered vehicles
human powered vehicles
hybrid human/mechanically powered vehicles
This may be subdivided into subsets by grouping under the node labels, forming two smaller arrays:
monocycles
bicycles
tricycles
four-wheeled vehicles
and
mechanically powered vehicles
human powered vehicles
hybrid human/mechanically powered vehicles.
Another array, at a lower level, under the broader term bicycles, is composed of the two sibling terms
motor bicycles
pedal bicycles
See the note under characteristic of division on the options for dealing with hybrids such as mopeds.
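The grouping described above can be sketched in Python. This is only an illustration: the nested-dictionary layout and the names VEHICLES, arrays_by_node_label and complete_array are assumptions made here, not part of BS 8723 or BS ISO 25964.

```python
# Hypothetical representation: each concept maps its node labels to the
# narrower terms grouped under that label (None means no node label).
VEHICLES = {
    "vehicles": {
        "vehicles by number of wheels": [
            "monocycles",
            "bicycles",
            "tricycles",
            "four-wheeled vehicles",
        ],
        "vehicles by motive power": [
            "mechanically powered vehicles",
            "human powered vehicles",
            "hybrid human/mechanically powered vehicles",
        ],
    },
    "bicycles": {
        None: ["motor bicycles", "pedal bicycles"],
    },
}


def arrays_by_node_label(concept):
    """Return the smaller arrays under a concept, keyed by node label."""
    return dict(VEHICLES.get(concept, {}))


def complete_array(concept):
    """Return all sibling terms under a concept, ignoring node labels."""
    siblings = []
    for terms in VEHICLES.get(concept, {}).values():
        siblings.extend(terms)
    return siblings
```

Calling complete_array("vehicles") gives the seven sibling terms listed above, while arrays_by_node_label("vehicles") gives the two smaller arrays.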
British Tabulating Machine Company
merged into International Computers and Tabulators
Gates, Bill
use for Gates, William Henry
Gates, William Henry
use Gates, Bill
ICL
use International Computers Limited
ICT
use International Computers and Tabulators
International Computers and Tabulators
created by merger of British Tabulating Machine Company and Powers-Samas
subsequently International Computers Limited
International Computers Limited
formerly International Computers and Tabulators
Powers-Samas
merged into International Computers and Tabulators
Prime Computer Inc.
Science Museum (London). Library
Victory (ship)
e.g. in the example of citation order, a compound concept is represented by the pre-coordinated string
bicycles - tyres - punctured - repairing - instruction books
In a classification scheme arranged in this way, everything on bicycles will be grouped together, but material on tyres or instruction books will be scattered. To provide index entries to allow these scattered topics to be found, we write the string in the reverse order, and successively truncate it from the left, making an index entry for each resulting substring:
instruction books - repairing - punctured tyres - bicycles
These entries are then arranged in alphabetical order:
bicycles
Each of these index entries would be followed by the appropriate notation to link it to its place in the classification scheme. As the citation order of this classification determines that everything about tyres for bicycles will be grouped together in the classified sequence, it is not necessary for the index to have entries such as tyres - bicycles - punctured, or other combinations and permutations of the terms in the string. A chain index is thus more economical than a fully permuted index, in which a string of five terms would generate 120 index entries.
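The purely mechanical procedure (reverse the string, truncate it successively from the left, then sort) can be sketched in Python. The function name chain_index is an assumption made for illustration. The sketch applies no editorial intervention, so it keeps "punctured" and "tyres" as separate terms and produces one entry per term.

```python
from math import factorial


def chain_index(string_terms):
    """Reverse the string, truncate it from the left, and sort the entries."""
    reversed_terms = list(reversed(string_terms))
    entries = [
        " - ".join(reversed_terms[start:])
        for start in range(len(reversed_terms))
    ]
    return sorted(entries)


terms = ["bicycles", "tyres", "punctured", "repairing", "instruction books"]
entries = chain_index(terms)

# A fully permuted index needs one entry per ordering of the terms.
permuted_count = factorial(len(terms))
```

For the five-term string this gives five chain index entries, against the 120 orderings counted in permuted_count.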
The mechanical method of generating a chain index described here may be modified by editorial intervention to suppress entries which are likely to be unsought, and to combine terms grammatically to make the index entries more readable; this has been done in the above example where punctured tyres has been used rather than punctured - tyres.
e.g. in the following, "number of wheels" and "motive power" are the characteristics by which the concept of "vehicles" is divided. These are shown in the node labels (vehicles by number of wheels) and (vehicles by motive power).
The concepts in an array should be mutually exclusive, having distinct values of the characteristic of division, though lower-level concepts can occur under more than one. For example, hybrids, such as mopeds (mechanically-assisted pedal cycles) are by definition both mechanically powered and human powered. They can therefore be listed as narrower terms of both concepts, as shown below. In some cases it may be desirable to provide explicitly for such hybrids, as shown in the examples under array. The scope note should clarify whether a term such as human powered vehicles is to be used for vehicles that are exclusively or partially human-powered.
vehicles
(vehicles by number of wheels)
- vehicles without wheels
- monocycles
- bicycles
- - motor bicycles
- - pedal bicycles
- tricycles
- four-wheeled vehicles
- vehicles with more than 4 wheels
(vehicles by motive power)
- mechanically powered vehicles
- - mopeds
- - motor bicycles
- - motor cars
- human powered vehicles
- - mopeds
- - pedal bicycles
The choice of citation order determines which concepts are the most important to be grouped together in a catalogue or list, and increases consistency in the construction of strings for similar subjects.
Citation order is usually specified in terms of the facets to which concepts belong or the roles that they play in relation to other concepts in the string. A sequence that is often appropriate, especially for technical subjects, is:
thing - kind - part - property - material - process - operation - system operated on - product - by-product - agent - space - time - form
e.g.
bicycles - tyres  - punctured  - repairing   - instruction books
(thing)  - (part) - (property) - (operation) - (form)
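A minimal Python sketch of how a prescribed citation order could be applied: terms tagged with roles are sorted into the sequence given above and joined into a string. The names CITATION_ORDER and build_string, and the tagging of terms as (term, role) pairs, are assumptions made for illustration.

```python
# The sequence of roles suggested above for technical subjects.
CITATION_ORDER = [
    "thing", "kind", "part", "property", "material", "process",
    "operation", "system operated on", "product", "by-product",
    "agent", "space", "time", "form",
]


def build_string(tagged_terms):
    """Arrange (term, role) pairs in citation order and join them."""
    rank = {role: position for position, role in enumerate(CITATION_ORDER)}
    ordered = sorted(tagged_terms, key=lambda pair: rank[pair[1]])
    return " - ".join(term for term, _ in ordered)


# The terms may be supplied in any order.
tagged = [
    ("instruction books", "form"),
    ("repairing", "operation"),
    ("bicycles", "thing"),
    ("punctured", "property"),
    ("tyres", "part"),
]
heading = build_string(tagged)
```

Because the order comes from the roles rather than from the indexer, two indexers given the same terms would construct the same string.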
When concepts from two different arrays within a single facet are to be combined, the citation order is normally such that the array listed later in the schedule is cited first (takes priority in grouping). Thus if wines were grouped into two arrays: first (wines by colour) and second (wines by country), the combined concepts would be listed under the second of these, thus:
wines
(wines by colour)
- red wines
- white wines
(wines by country)
- Australian wines
- - (Australian wines by colour)
- - red Australian wines
- - white Australian wines
- French wines
- - (French wines by colour)
- - red French wines
- - white French wines
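The rule that the later-listed array takes priority in grouping can be sketched in Python. The function name combined_display and the representation of each array as a list of qualifiers are assumptions made for illustration, and the sketch omits the node labels shown in the display above.

```python
# Arrays in schedule order; the values are qualifiers, not full terms.
ARRAYS = [
    ("wines by colour", ["red", "white"]),
    ("wines by country", ["Australian", "French"]),
]


def combined_display(base, arrays):
    """List combined concepts grouped under the later-listed array."""
    (_, first_values), (_, later_values) = arrays
    lines = []
    for later in later_values:  # the later array is cited first
        lines.append(f"- {later} {base}")
        for first in first_values:
            lines.append(f"- - {first} {later} {base}")
    return lines


display = combined_display("wines", ARRAYS)
```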
It is possible to make additional entries under permutations of these citation orders of facets and arrays, but this not only increases the size of a catalogue but also leads to inconsistency as there is a risk that some permutations will be omitted. Some resources may be assigned one version of the complex concept and some another, so that there is not a complete list under either.
e.g. human resource management combines the idea of people with their usefulness as resources requiring management.
Complex concepts are sometimes expressed in a single word, but are more often conveyed by a multi-word term.
Examples of categories that may be used for grouping concepts into facets are: activities, disciplines, people, materials, living organisms, objects, places and times. e.g.
(1) animals, mice, daffodils and bacteria could all be members of a living organisms facet;
(2) digging, writing and cooking could all be members of an activities facet;
(3) Paris, the United Kingdom and the Alps could all be members of a places facet.
Categories are normally chosen so that facets are mutually exclusive; a concept cannot then occur in more than one facet. In a classification scheme, facets may be restricted to a single discipline, such as a diseases facet in medicine, or may be common facets such as people, time, place and form, which apply across all disciplines. Facets may be subdivided into mutually exclusive subfacets.
Some writers use the term "facet" to specify the role that a concept plays in a complex concept, as well as the category to which it belongs. For example, they may say that materials can belong to "raw materials" or "products" facets, and people may be in "agents" or "patients" facets. For clarity, it is better to avoid this usage, keeping the term "facet" for fundamental categories such as "materials" or "people" and specifying roles separately. Both facets and roles are used in setting up rules for citation order.
Other writers use the term "facet" to mean "attributes" or "properties", confusing them with characteristics of division. There may be multiple characteristics of division of concepts within a single facet, e.g. within a materials facet there may be a concept of wines, subdivided into several arrays, not mutually exclusive, each headed by a node label such as <wines by colour>, <wines by sweetness>, <wines by origin>, <wines by price> and so on. Any specific wine can be listed in several of these arrays. Searching by these is better called searching by parameters or characteristics rather than by facets.
Schedules are compiled for each facet, and terms or notations from these may be combined according to prescribed rules to express a complex concept.
e.g. in a monohierarchical structure, the concept pianos cannot be listed as a narrower term of both keyboard instruments and stringed instruments; a choice has to be made of one of these concepts to determine its placing.
e.g. human resource management.
Multi-word terms typically label complex concepts and are admissible in a thesaurus as preferred terms.
A node label contains one of two different types of information: either (1) the name of a facet or subfacet to which following terms belong (this type would be better called a "facet label", but unfortunately this usage is not established in the literature or standards); or (2) the attribute or characteristic of division by which an array of sibling terms has been sorted or grouped.
e.g. the following classified display starts with the facet "disciplines" and changes of facet are shown by node labels of type 1, shown in parentheses. A node label of type 2 is shown in angle brackets:
photography
- (people)
- photographic models
- - <photographic models by gender>
- - female photographic models
- - male photographic models
- photographers
- (operations)
- taking photographs
- developing
- printing
- (objects)
- cameras
- photographs
- - black and white photographs
- - colour photographs
Notation may be used to sort and/or locate concepts in a pre-determined systematic order, and optionally to display how concepts have been structured and grouped. A notation can provide the link between alphabetical and systematic lists in a thesaurus and between the alphabetical index and the classified sequence of a classification scheme.
e.g., partial schedule showing notation in the left-hand column:
P200 photography
P250 - - photographic equipment
P251 - - - camera accessories
P251.3 - - - - flash guns
P251.5 - - - - tripods
P253 - - - cameras and camera components
P253.1 - - - - camera components
P253.13 - - - - - camera lenses
P253.15 - - - - - camera viewfinders
P253.2 - - - - cameras
This set may be any selection of numerals, upper and lower case alphabetic characters, and punctuation symbols. The larger the set of symbols on which the notation is based, the greater the number of concepts that can be represented by distinct notations of the same length.
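The relationship between the size of the symbol set and the capacity of the notation is simple arithmetic: with a set of b symbols there are b to the power n distinct notations of length n. A short Python sketch, in which the function name distinct_notations is an assumption made for illustration:

```python
import string


def distinct_notations(symbol_count, length):
    """Number of distinct notations of exactly `length` symbols."""
    return symbol_count ** length


# Three-character notations using numerals only: 10 ** 3.
digits_only = distinct_notations(len(string.digits), 3)

# Three-character notations using upper case letters and numerals: 36 ** 3.
upper_and_digits = distinct_notations(
    len(string.ascii_uppercase) + len(string.digits), 3
)
```

In practice many of these combinations would be left unused, to allow room for new concepts and to keep the notation expressive.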
Punctuation marks may be used in notations:
Such relationships are shown in a structured vocabulary, independently of any indexed document.
e.g. searching for wines for which the colour is red and the alcohol content is from 5% to 10%.
This type of search is for concepts that occur within one or more arrays of a single facet, e.g. narrower terms of wine in a "materials" facet grouped under the node labels (wine by colour) and (wine by alcohol content). In some systems it is possible to search for a range of values rather than just for specific values.
It is to be distinguished from searches for compound concepts which may be made up of concepts from different facets, such as wine from a "materials" facet combined with red colour and alcohol content from a "properties" facet.
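A search of this kind, including a range of values, might be sketched in Python as follows. The records, field names and the function search_by_parameters are all invented for illustration; they do not describe any particular system.

```python
# Invented sample records; the terms and values are placeholders only.
WINES = [
    {"term": "wine A", "colour": "red", "alcohol_percent": 8.0},
    {"term": "wine B", "colour": "red", "alcohol_percent": 13.5},
    {"term": "wine C", "colour": "white", "alcohol_percent": 9.0},
]


def search_by_parameters(records, colour, min_alcohol, max_alcohol):
    """Return terms whose colour matches and whose alcohol content is in range."""
    return [
        record["term"]
        for record in records
        if record["colour"] == colour
        and min_alcohol <= record["alcohol_percent"] <= max_alcohol
    ]


# Wines for which the colour is red and the alcohol content is from 5% to 10%.
matches = search_by_parameters(WINES, "red", 5, 10)
```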
Compare with monohierarchical structure. In a polyhierarchical structure, a single concept can occur at more than one place in the hierarchy. Its attributes and relationships, and specifically its scope note and its narrower and related terms, are the same wherever it occurs.
e.g. in a polyhierarchical structure, pianos may be listed as a narrower term of both keyboard instruments and stringed instruments.
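The difference between the two structures can be checked mechanically. In this Python sketch the mapping BROADER_TERMS and both function names are assumptions made for illustration; the relationships themselves are taken from the examples in this glossary.

```python
# Each concept mapped to its broader terms.
BROADER_TERMS = {
    "pianos": ["keyboard instruments", "stringed instruments"],
    "mopeds": ["mechanically powered vehicles", "human powered vehicles"],
    "motor cars": ["mechanically powered vehicles"],
}


def concepts_with_several_broader_terms(broader_terms):
    """Return the concepts that occur at more than one place in the hierarchy."""
    return sorted(
        concept
        for concept, parents in broader_terms.items()
        if len(parents) > 1
    )


def is_monohierarchical(broader_terms):
    """True if no concept has more than one broader term."""
    return not concepts_with_several_broader_terms(broader_terms)
```

A monohierarchical structure would require one broader term to be chosen for each concept reported by the first function.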
e.g. when using post-coordinate indexing, a manual on bicycle repair might be assigned the three separate preferred terms
bicycles
repairing
instruction books
Someone searching for such a manual would compose a search statement such as (bicycles AND repairing AND instruction books). The document would also be retrieved by a search for (bicycles AND instruction books) or for any one or more of the preferred terms. Compare pre-coordinate indexing.
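A minimal Python sketch of this kind of retrieval, using an inverted index and set intersection for the AND operator. The function names, the document titles and the second document are assumptions made for illustration.

```python
from collections import defaultdict


def build_index(assignments):
    """Map each preferred term to the set of documents indexed with it."""
    index = defaultdict(set)
    for document, terms in assignments.items():
        for term in terms:
            index[term].add(document)
    return index


def search_all(index, *terms):
    """Return the documents indexed with every one of the given terms (AND)."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


assignments = {
    "bicycle repair manual": {"bicycles", "repairing", "instruction books"},
    # Invented second document, so that the searches differ in their results.
    "bicycle sales catalogue": {"bicycles", "catalogues"},
}
index = build_index(assignments)
```

Here search_all(index, "bicycles", "repairing", "instruction books") and search_all(index, "bicycles", "instruction books") both retrieve the manual, because the terms are combined only at search time.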
e.g. when using pre-coordinate indexing, a manual on bicycle repair might be assigned the indexing string made up of three preferred terms in combination:
bicycles - repairing - instruction books
This brings all aspects of repairing bicycles together in a catalogue or browsing list, and might be followed by
bicycles - repairing - tools
There would be no direct alphabetical access to this subject under repairing, instruction books, or tools. This does not mean that the individual concepts within a pre-coordinated string cannot be searched for separately, either as controlled preferred terms or as free text, but such methods are not part of the pre-coordinate indexing system. Compare post-coordinate indexing.
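The effect on browsing can be sketched in Python: strings are filed in a single sorted sequence and are reached only through their leading terms. The function name browse is an assumption made for illustration.

```python
# Pre-coordinated strings filed in one alphabetical sequence.
STRINGS = sorted([
    "bicycles - repairing - instruction books",
    "bicycles - repairing - tools",
])


def browse(strings, lead):
    """Return the strings filed under a given leading term or terms."""
    return [
        entry
        for entry in strings
        if entry == lead or entry.startswith(lead + " - ")
    ]


under_repairing_bicycles = browse(STRINGS, "bicycles - repairing")
under_repairing = browse(STRINGS, "repairing")
```

The first call finds both strings together; the second finds nothing, because no string begins with repairing.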
e.g. schools; school uniform; costs of schooling; teaching.
A preferred term should preferably be a noun or noun phrase.
Search thesauri are designed to facilitate choice of terms and/or expansion of search expressions to include terms for broader, narrower or related concepts, as well as synonyms. Optionally, a normal thesaurus may be used as a search thesaurus.
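Expansion of a search expression might be sketched in Python as follows. The dictionary layout and the function name expand are assumptions made for illustration. The broader and narrower terms follow the vehicles example in this glossary; the synonym "bikes" is invented.

```python
# Hypothetical thesaurus entry; "bikes" is an invented synonym.
THESAURUS = {
    "bicycles": {
        "synonyms": ["bikes"],
        "broader": ["vehicles"],
        "narrower": ["motor bicycles", "pedal bicycles"],
        "related": [],
    },
}


def expand(term, relationships=("synonyms", "narrower")):
    """Return the term plus the terms linked to it by the chosen relationships."""
    entry = THESAURUS.get(term, {})
    expanded = [term]
    for relationship in relationships:
        for linked in entry.get(relationship, []):
            if linked not in expanded:
                expanded.append(linked)
    return expanded
```

The searcher chooses which relationships to follow, so expand("bicycles", ("broader",)) widens the search while the default narrows it to more specific concepts and synonyms.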
A semantic network is a way of representing an ontology. The vertices of the network represent concepts and the edges represent semantic relationships between them. The vertices are sometimes called "nodes", which are not to be confused with the node labels of a thesaurus or a faceted classification.
sciences
- biology
- chemistry
- - analytical chemistry
- geology
- physics
- - nuclear physics
- - quantum physics
Subfacets, like facets, should be defined so that they are mutually exclusive. For example, an "agents" facet might be subdivided into "individuals" and "organisations" subfacets; an "activities" facet might be subdivided into transitive "actions" and intransitive "processes" subfacets.
Some writers use the term "subfacet" as synonymous with array, or with the slightly broader meaning of the whole subtree of concepts grouped under a node label showing a characteristic of division, rather than just the first level array of sibling terms. I suggest that it should not be used with these meanings, as the intuitive meaning is "a subdivision of a facet" and we already have the terms "array" and "subtree" for the other meanings.
ISO/IEC 13250 gives the following three definitions for "topic map":
a) A set of information resources regarded by a topic map application as a bounded object set whose hub document is a topic map document conforming to the SGML architecture defined by this International Standard.
b) Any topic map document conforming to the SGML architecture defined by this International Standard, or the document element (topicmap) of such a document.
c) The document element type (topicmap) of the topic map document architecture.
The introduction to ISO/IEC 13250 says: "In general, the structural information conveyed by topic maps includes:
- groupings of addressable information objects around topics ('occurrences'), and
- relationships between topics ('associations')".
Last modified 2021-08-17 17:20
Copyright © Leonard Will, 2008 - 2021