1
 
2
What’s a taxonomy again?
  • Words – controlled vocabulary
  • Used as labels – descriptive metadata
  • Attached to documents or physical objects
  • Organized to aid retrieval – hierarchical structure
3
Perspectives on taxonomies
  • Taxonomist (aka Lexicographer, Thesaurus builder)
  • Information architect
  • Indexer
  • Searcher


  • Each has a different view of, and need for, words in retrieving information.
  • Each need relates to using a taxonomy for indexing.


4
What an indexer wants: organized words!
  • Organization, not a top-heavy flat term list
  • Easy navigation/browsing
  • Way to search for a known term
  • Way to find a term without knowing first word
  • Way to spot a concept without a specific word(s)
  • One term for one meaning
  • General inclusive term for multiple specific terms
  • Way to find equivalent concepts
  • Way to find related concepts
  • Way to know exact meaning of term
  • Easy way to learn the terms
5
Taxonomies for
information retrieval online
  • Conceptual framework for web content – reflects organization of knowledge in a domain
  • Foundation for information architecture
  • Often 3 levels deep – depends on domain
  • May be hidden or displayed
6
Info retrieval starts with a
knowledge organization system
  • Uncontrolled list
  • Name authority file
  • Synonym set/ring
  • Controlled vocabulary
  • Taxonomy
  • Thesaurus
  • Ontology
  • Semantic network
7
Structure of
controlled vocabularies
8
Controlled vocabulary
construction standards
  • ANSI (American National Standards Institute)
  • NISO (National Information Standards Organization)
  • ISO (International Organization for Standardization)
  • BSI (British Standards Institution)


  • Differences are minor and diminishing.
  • ANSI/NISO Z39.19-200x revision being voted on.
9
Taxonomy as
an organization system
  • Controlled vocabulary
  • Hierarchical format
    • Parent-child relationships
  • Specific items appear as final leaves on hierarchy branches
  • Common on websites
    • Pick list
    • Browsable directory
    • Other variations
10
Taxonomy defined – ANSI/NISO Z39.19-200x
  • “A hierarchically organized vocabulary based on a classification scheme.”
11
Thesaurus as
an organization system
  • Controlled vocabulary
  • Focus on conceptual classes, not specifics
  • Hierarchy – implicit if not displayed
    • Parent-child relationships
  • Various display formats may be available
  • Network of relationships between terms guides user to find information
    • Cousins, friends, aliases
  • Scope notes, term history
  • More elaborate and informative
12
Thesaurus defined – ANSI/NISO Z39.19-1993
  • “A controlled vocabulary of terms in natural language that are designed for postcoordination...
  • The controlled vocabulary is established by information specialists or lexicographers and is generally employed in indexing.”


13
Thesaurus defined – ANSI/NISO Z39.19-200x
  • “A controlled vocabulary arranged in a known order in which equivalence, homographic, hierarchical, and associative relationships among terms are clearly displayed and identified by standardized relationship indicators, which must be employed reciprocally.
  • Its purposes are to promote consistency in the indexing of content objects, especially for postcoordinated information storage and retrieval systems, and to facilitate browsing and searching by linking entry terms with terms. Thesauri may also facilitate the retrieval of content objects in free text searching.”
14
Standards and pragmatism
  • Standards are your friends
    • Lead to richer, more informative product
    • Promote interoperability -- Allow you to adopt or adapt other controlled vocabularies
    • Promote predictability
    • Allow repurposing within your organization and by other organizations
  • Follow thesaurus standards for taxonomy
    • Incorporate authority files / final nodes as needed
  • Your taxonomy or thesaurus must meet your needs
15
Your taxonomy / thesaurus
end product
  • Reflects
    • scope of your concern
    • degree of precision you need
  • Facilitates
    • data storage and retrieval by vocabulary control
    • discovery of ideas
  • Promotes learning
    • preferred terminology
    • relationships among concepts
    • organized guide to your field
16
Taxonomies in business
  • “The High Cost of Not Finding Information”
  • Time wasted searching
  • Confusion about same information by different name
  • Similar/overlapping activities, products, uses
  • (Susan Feldman, KMWorld, March 2004)


  • With a unified taxonomy and consistent indexing:
  • Better searching or browsing to locate information
  • More efficient content management
  • Focused content collection through web spidering
  • Personalized content delivery
17
Talk about terms and taxonomies
  • How to choose terms
  • How to ensure term clarity, avoid ambiguity
    • Vocabulary control—why and how
  • How to format terms
  • Terms within a taxonomy—the big picture
18
How do you choose terms?
  • Importance in the subject area
  • Use in the literature, by the organization or community
  • Necessary degree of specificity or detail
  • Relationship with other controlled vocabularies


19
Vocabulary control – why?
  • “The need to control the formation and use of terms stems mainly from two basic features of natural language, namely
  • synonyms (different terms representing the same concept) and
  • polysemes or homographs (terms with the same spelling representing different concepts).”
  • ANSI/NISO Z39.19-1993
20
Vocabulary control
  • The process of organizing a list of terms
  • to indicate which of two or more synonymous terms is authorized for use;
  • to distinguish between homographs; and
  • to indicate hierarchical and associative relationships among terms in the context of a controlled vocabulary or subject heading list.


  • ANSI/NISO Z39.19-200x
21
Vocabulary control
through disambiguation
  • Synonyms – de-duplicate meanings
  • Different words for the same concept
    • President of the United States, POTUS
    • Biological technology, Biotech
  • Homographs (polysemes) – eliminate ambiguity
  • Same written word used for different meanings
    • Balloon, Box
    • Cells, Mercury, Records, Bridge/Bridges
22
Vocabulary control – how?
  • Use unambiguous terms, clear to the user group
  • Distinguish between terms that appear similar
  • Use Scope Notes when necessary
  • Use terms as elements that can be coordinated in a flexible manner
  • Create compound terms if necessary
23
One term / one concept
  • “Terms in a thesaurus should represent simple or unitary concepts…”
    • (ISO standard)
  • “Each descriptor included in a thesaurus should represent a single concept (or unit of thought). A concept may be expressed by a single-word term or by a multiword term.”
  • (ANSI/NISO Z39.19-1993)
24
A “term” synonym ring
25
So what’s a concept?
  • “A unit of thought, formed by mentally combining some or all of the characteristics of a concrete or abstract, real or imaginary object. Concepts exist in the mind as abstract entities independent of terms used to express them.”
  • Three main categories
    • Abstract concepts
    • Concrete entities
    • Proper nouns
26
Concrete entities as terms
  • Things and their physical parts
    • primates
      • head
    • buildings
      • floors
  • Materials
    • cement
    • wood
    • lead
27
Abstract concepts as terms
  • Actions and events
    • evolution, skating, management, ceremonies
  • Abstract entities
    • law, theory
  • Properties of things, materials, and actions
    • strength, efficiency
  • Disciplines and sciences
    • physics, meteorology, mathematics
  • Units of measurement
    • pounds, kilograms, miles, meters, nanoseconds


28
Proper nouns as terms
  • Individual entities – “classes of one” – expressed as proper nouns
    • San Francisco, Lake Michigan


    • Thesaurus standards exclude proper names, persons, and trade names → authority files.
    • Taxonomies include them as final nodes.

29
Pop quiz – which qualify as terms?
  • figure skating
  • speed skating
  • figure skating competitions


  • schools
  • public schools
  • public school curricula


  • marketing and advertising


  • societal issues
    • information ethics, plagiarism, credibility
    • information literacy, lifelong learning
30
The term record
  • Main Term (MT)
  • Top Term (TT)
  • Broader Terms (BT)
  • Narrower Terms (NT)
  • Related Terms (RT)
    • See also (SA)
  • Scope Note (SN)
  • History (H)
  • NonPreferred Term (NP)
    • Used for (UF), See (S)
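
A term record like the one on this slide can be sketched as a small data structure. This is only an illustration in Python: the class name `TermRecord`, its field names, and the `display` layout are assumptions made here, not part of any standard or tool. The sample values (Pigs, Swine) come from the Lexicographer's lexicon appendix.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TermRecord:
    """One term record; fields mirror the abbreviations on the slide."""
    main_term: str                                            # MT
    top_term: str = ""                                        # TT
    broader_terms: List[str] = field(default_factory=list)    # BT
    narrower_terms: List[str] = field(default_factory=list)   # NT
    related_terms: List[str] = field(default_factory=list)    # RT
    used_for: List[str] = field(default_factory=list)         # UF
    scope_note: str = ""                                      # SN
    history: str = ""                                         # H

    def display(self) -> str:
        """Render the record as tagged lines, main term first."""
        lines = [self.main_term]
        if self.top_term:
            lines.append(f"  TT {self.top_term}")
        tagged = (("BT", self.broader_terms), ("NT", self.narrower_terms),
                  ("RT", self.related_terms), ("UF", self.used_for))
        for tag, values in tagged:
            lines.extend(f"  {tag} {value}" for value in values)
        if self.scope_note:
            lines.append(f"  SN {self.scope_note}")
        if self.history:
            lines.append(f"  H {self.history}")
        return "\n".join(lines)


record = TermRecord(main_term="Pigs", used_for=["Swine"])
print(record.display())
```

Empty fields are simply omitted from the display, so a record can start sparse and be fleshed out as relationships are added.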
31
Build a taxonomy – simple steps
  • Get paper and pencil
    • Sharpen pencil
  • Define subject field
  • Collect terms
  • Organize terms
  • Fill in gaps
  • Flesh out and interrelate terms
  • You’re done!
32
Define subject field
  • Review representative collection of content
  • Determine:
    • Core areas
    • Peripheral topics
33
Define subject field for real life content
34
Before you go on: Build or buy?
  • Survey existing thesaurus/taxonomy resources for your domain
  • Test for
    • Scope
    • Depth
      • Make-or-break terms
    • Cost


    • Don’t reinvent the wheel!
35
Collect terms
  • Your documents and databases
  • Departmental terminology
  • Text books and their indexes (indices)
  • Book tables of contents and indexes
  • Journal quarterly indexes
  • Encyclopediae
  • Lexicons, glossaries on the topic
  • Web resources
  • Users and experts
  • Search logs
36
Gather terms from search logs
  • “Beyond the Spider: The Accidental Thesaurus”
  •      (Richard Wiggins in Information Today, Oct 2002)


  • Top ~100 search terms from search logs
  • Match to web site with appropriate answer
  • Basis for favorites or best bets, presented at the top of results list.
  • (AKA behavior-based taxonomy)


  • Not a thesaurus or taxonomy,
  • but still a useful source of terms.
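
Pulling the top search terms from a log can be sketched in a few lines. This is a minimal illustration, assuming the log has already been reduced to a list of query strings; the function name `top_search_terms` and the sample queries are made up for this example.

```python
from collections import Counter


def top_search_terms(queries, n=100):
    """Return the n most frequent queries after light normalization."""
    # Lowercase and collapse runs of whitespace so trivial variants merge.
    normalized = (" ".join(q.lower().split()) for q in queries)
    counts = Counter(q for q in normalized if q)
    return counts.most_common(n)


# Hypothetical search log entries, for illustration only.
sample_log = [
    "Parking permits", "parking  permits", "campus map",
    "Campus Map", "parking permits", "tuition",
]
top = top_search_terms(sample_log, n=2)
print(top)
```

The resulting list is a source of candidate terms and best-bet targets; each query still needs an editor to decide whether it becomes a Preferred Term, a NonPreferred Term, or nothing.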
37
Organize terms – roughly
  • Sort terms into several major categories – logical groups of similar concepts as Top Terms
    • Identify core areas and peripheral topics
    • 10 – 20 to start
    • Consider moving proper names to authority files
  • Result: loose collection of terms under several main headings
    • Rough and tentative – see how it fits as you go
    • Initial gap analysis
    • Add / modify / delete as needed
38
Exercise – collect and organize terms from a magazine
  • Look over the magazine contents pages
  • List the topics this magazine covers
    • Feel free to extrapolate beyond this issue’s contents
  • Sort into a few categories
  • Organize categories into hierarchies of broad → specific topics
  •    5 minutes
39
Usefulness of a term –
the “duh” factor
  • Some terms are so basic for a domain that they have little or no value
    • “Sports” in Sports Illustrated
    • “Technology” in Technology Review
    • “Golf” in Golf Magazine
    • “Information science” and “Information technology”
  • How useful will the term be for indexing?
    • Does the term apply to everything in the domain?
    • Does the term distinguish important concepts?
    • If term is needed, specify limited use conditions in Scope Note
40
Hierarchy structures –
variations on a theme
  • Not pre-determined
    • Subcategorize wines first by type, variety, region, then cost? Or first by cost and then type?
  • Varies by user group and needs
    • May have multiple views of same content
    • Standard alpha view or customized notation
  • Affects information architecture, i.e. how web site functions
41
How do terms relate?
  • Hierarchical relationships
    • Parents and their children
  • Equivalence relationships
    • Aliases
  • Associative relationships
    • Cousins
42
Hierarchical relationships
  • Broader Term represents the class, whole, or genus
  • Narrower Term is a member, part, or species
    • Generic relationship
    • Whole-part relationship
    • Instance relationship
  • BTs/NTs have a reciprocal relationship
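
The reciprocal BT/NT rule can be sketched as a structure that always records both directions in one step. This is an illustrative sketch, not a description of any particular thesaurus tool: the class `Hierarchy` and its method names are assumptions. The Bernalillo County / Albuquerque pair is from the whole-part slide; "New Mexico" is added here only to show a second level.

```python
from collections import defaultdict


class Hierarchy:
    """Stores BT/NT links so that each link is always reciprocal."""

    def __init__(self):
        self.broader = defaultdict(set)    # term -> its Broader Terms
        self.narrower = defaultdict(set)   # term -> its Narrower Terms

    def add_bt(self, term, broader_term):
        """Record 'term BT broader_term' and the reciprocal NT link."""
        self.broader[term].add(broader_term)
        self.narrower[broader_term].add(term)

    def ancestors(self, term):
        """Return every term reachable by following BT links upward."""
        found = set()
        stack = [term]
        while stack:
            for parent in self.broader.get(stack.pop(), ()):
                if parent not in found:
                    found.add(parent)
                    stack.append(parent)
        return found


h = Hierarchy()
h.add_bt("Albuquerque", "Bernalillo County")
h.add_bt("Bernalillo County", "New Mexico")  # extra level, added for illustration
print(sorted(h.ancestors("Albuquerque")))
```

Because Broader Terms are stored as a set, a term may have more than one parent without any change to the structure.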


43
Broader to Narrower Terms
44
Hierarchy – Generic
(genus-species) relationship
  • Inheritance or inclusion – what’s true of the parent (BT) is true for all children (NTs)
  • Applies to entities, actions, properties, agents – not just biological taxonomies


  • Value
    • Cultural value
    • Economic value
    • Moral value
    • Social value
  • Thinking
    • Contemplation
    • Divergent thinking
    • Lateral thinking
    • Reasoning
  • Heat treatment
    • Annealing
    • Decarburization
    • Hardening
    • Tempering
45
Generic relationship test – 1
  • Both terms in same fundamental category
  • “All-and-some” test
46
Generic relationship test – 2
47
Hierarchy – Whole-part relationship
  • Also known as meronymy or partonymy
  • Four types allowed in thesaurus standards
    • Body systems and organs
      • Ear → Middle ear
    • Geographical locations
      • Bernalillo County → Albuquerque
    • Fields of study
      • Geology → Physical geology
    • Hierarchical social structures
      • Ontario → Manitoulin District
48
Hierarchy – Instance relationship
  • General category (common noun) as BT,
  • with individual example (proper noun) as NT


  • Seas
    • Baltic Sea
    • Caspian Sea
    • Mediterranean Sea
  • French cathedrals
    • Chartres Cathedral
    • Rheims Cathedral
    • Rouen Cathedral


49
Pop quiz – Do these
   Narrower Terms fit?
  • Museums
    • Archaeological museums
    • Ethnological museums
    • Curators
    • Scientific museums
    • Museum techniques


50
Sorting challenge
51
Polyhierarchical relationship
  • Term can logically fit under more than one Broader Term – can have Multiple Broader Terms (MBT)
  • Part of ISO standards, new to ANSI/NISO
52
Equivalence relationship
  • Preferred Term
    • Thesaurus term and valid for indexing
    • Thesaurus notation: USE


  • NonPreferred Term
    • Not valid for indexing
    • An alias or imposter
    • Entry point, directs user to Preferred Term
    • Thesaurus notation: UF or NPT
53
Equivalence – when to use
  • Synonyms, slang, quasi-synonyms
  • Scientific and trade names
    • Ibuprofen UF Motrin™
  • Lexical variants
    • Fiber optics UF Fibre optics
    • Mouse UF Mice
  • Upward posting of narrow concepts not specified in taxonomy or thesaurus
    • Social class UF Elite, Middle class, Working class
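
The equivalence examples above can be sketched as a lookup table that routes any entry term to its Preferred Term. This is a minimal sketch; the class `EquivalenceTable` and its method names are assumptions for illustration, and the sample pairs are the ones on this slide.

```python
class EquivalenceTable:
    """Maps NonPreferred (entry) terms to Preferred Terms and back."""

    def __init__(self):
        self._use = {}        # casefolded entry term -> Preferred Term
        self._used_for = {}   # Preferred Term -> list of entry terms

    def add(self, preferred, *entry_terms):
        """Record 'preferred UF entry' and the reciprocal 'entry USE preferred'."""
        for entry in entry_terms:
            self._use[entry.casefold()] = preferred
            self._used_for.setdefault(preferred, []).append(entry)

    def resolve(self, term):
        """Return the Preferred Term for an entry term, else the term unchanged."""
        return self._use.get(term.casefold(), term)

    def used_for(self, preferred):
        """List the entry terms that point to a Preferred Term."""
        return list(self._used_for.get(preferred, []))


table = EquivalenceTable()
table.add("Fiber optics", "Fibre optics")
table.add("Mouse", "Mice")
table.add("Social class", "Elite", "Middle class", "Working class")
print(table.resolve("fibre optics"))
print(table.used_for("Social class"))
```

Lookups are case-insensitive here by choice; a real vocabulary might need stricter matching where case distinguishes terms.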
54
Associative relationship
  • Related Terms (RTs) – cousins
  • “…terms related conceptually but not hierarchically, and are not part of an equivalence set” (i.e. not synonyms)
  • Both terms are valid thesaurus terms for indexing, and have reciprocal relationship
  • Expands user’s awareness, reflects thesaurus coverage of unanticipated areas
  • Standards describe specific types (see Appendix)
55
Sibling rivalry and facets
  • Format and sense of sibling Narrower Terms should be consistent
  • If siblings don’t coexist well, separate them
  • Subdivide large groups of terms into facets: mutually exclusive subcategories
  • Growing demand with faceted navigation
  • Facet examples
    • Properties, Materials, Agents, Actions, Influence
    • Objects, Styles and periods, Color, Shape
    • (Art & Architecture Thesaurus)
56
Faceted classification
  • Pharmaceuticals
    • (by action)
      • Anti-inflammatory agents…
    • (by chemical structure)
      • Alkaloids…
    • (by indication)
      • Pain…
    • (by use)
      • Immunosuppression…
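
Faceted navigation over a classification like this one can be sketched as a filter that keeps items matching every selected facet value. The facet values below are taken from the slide; the item names ("Drug A" and so on), the facet keys, and the function `filter_by_facets` are hypothetical placeholders, not real product data.

```python
def filter_by_facets(items, **selected):
    """Keep items whose facet values include every selected value."""
    return [
        name
        for name, facets in items.items()
        if all(value in facets.get(facet, ()) for facet, value in selected.items())
    ]


# Hypothetical items, each tagged under one or more facets.
items = {
    "Drug A": {"action": {"Anti-inflammatory agents"}, "indication": {"Pain"}},
    "Drug B": {"structure": {"Alkaloids"}, "indication": {"Pain"}},
    "Drug C": {"action": {"Anti-inflammatory agents"}, "use": {"Immunosuppression"}},
}

print(filter_by_facets(items, indication="Pain"))
print(filter_by_facets(items, action="Anti-inflammatory agents", indication="Pain"))
```

Each added facet selection narrows the result, which is the behavior users expect from faceted browsing.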
57
Faceting challenge
  • Paint
    • Oil paint
    • High-gloss paint
    • Interior paint
    • Matte paint
    • Latex paint
    • Semi-gloss paint
    • Exterior paint
58
Faceting challenge
  • Gardens
    • Bird garden
    • Cactus garden
    • Children’s garden
    • Butterfly garden
    • Herb garden
    • Historical gardens
    • Iris garden
    • Native American crops garden
    • Nuestro jardin
    • Shade garden
    • Wildflower garden
    • Xeriscape garden
    • Zen garden


59
Scope Notes (SN)
  • Indicate meaning of the term in the context of this thesaurus, for this audience
    • Stress – Metal, Psychological, Physiological
  • Indicate any restriction in meaning
  • Indicate range of topics covered
  • Provide direction for indexers; for terms often confused, may suggest an alternative term
  • Use only as needed – not for every term
  • Establish and stick with consistent format
  • Be concise
60
 
61
Exercise – card sort
  • Sort these items into logical categories.
  • Propose Top Terms and Broader Terms as needed.
  • Match any NonPreferred Terms to their Preferred Terms (or propose NPTs).
  • Identify Related Terms.
  • Make duplicate cards as needed for terms having Multiple Broader Terms.


62
Exercise – just when you thought you were done…
  • How well can your taxonomy expand?


63
 
64
Exercise – Make a term record
  • MT= ________________________________
  • TT= ________________________________
  • BT= ________________________________
  • NT= ________________________________
  • NT= ________________________________
  • UF= ________________________________
  • RT= ________________________________
  • SN= ________________________________
65
Talk about terms
  • Term format
  • Grammatical issues
  • Singular and plural forms
  • Spelling
  • Abbreviations and acronyms
  • Capitalization
  • Other punctuation
  • Consistency
66
Term format
  • KISS – Keep it short and simple
    • 1-2-3 words
      • Effect on search
      • Factoring, Postcoordination (coming)
  • Grammatical issues
    • Nouns and noun phrases
    • Verbish things
    • Adjectives
    • Adverbs
    • Initial articles
67
Most terms are nouns
  • Nouns or simple noun phrases
    • Adj + Noun – Art history (ANSI/NISO standard)
      • Noun + Prep + Noun – History of art (ISO standard)
    • Exceptions – Burden of proof, Coats of arms, Prisoners of war, Birds of prey, etc.
68
Other parts of speech
  • Verbs
    • Gerund form: Fishing
  • Adjectives
    • Not used in isolation
      • Very rare – (lots in Art & Architecture Thesaurus)
    • OK when combined with another term – Dental bridges
  • Adverbs
    • No, except as part of proper name –
    • Very Large Array
  • Articles
    • No, except as part of proper name –
      • El Salvador, Le Mans
69
Singular and plural forms
  • Plural form for count nouns
    • “how many” clouds, animals, highways
  • Singular form for mass nouns
    • “how much” security, oxygen, rain
  • Exceptions
    • Body parts in medicine → singular (heart, foot)
    • Unique entities → singular (Trans Canada Hwy)
    • User warrant → plural/singular (fishes)
70
Term spelling
  • Preferred spelling depends on audience
    • Multinational company may need alternative spellings in same taxonomy
  • Use most widely accepted spelling
  • Use secondary spelling as NonPreferred Term
  • Exception:
    • Proper names – Labour Party
71
Abbreviations and acronyms
  • Use only when full form is rarely seen –
    • SCUBA, LASER, DNA, LASIK
  • Use full form if abbreviation is not widely used and understood
    • Automated teller machines – for ATM
    • Driving while intoxicated – for DWI
  • Alternative becomes NonPreferred Term
  • Use and acceptance always shifting
  • Be consistent
72
Capitalization
  • Standards: use all lower case
    • Exceptions:
      • Initialisms – DNA
      • Proper names – Queen Mary
      • Trade names – Thesaurus Master™
      • Taxonomic names – Homo sapiens
  • Much variation in practice
        • (my preference)
73
Parentheses
  • Use only for
    • Parenthetical qualifiers to disambiguate homographs (same word, different meanings)
      • Bridges (Dentistry), Bridges (Roadways), Bridges (Music)
    • Different meanings for singular / plural word forms
      • Bridges [all the above] vs. Bridge (Card game)
      • Wood (Material) vs. Woods (Forest)
      • Damage (Injury) vs. Damages (Law)
    • Facet indicators
    • Part of the term – benzo(a)pyrene
    • Trademark indicator (tm) becomes ™
74
Hyphens
  • Generally avoid -- nonfiction
  • Use only if
    • Omitting the hyphen would be ambiguous
      • cocitation vs. co-occurrence
    • The hyphen is part of the term
      • n-body problem
      • p-benzoquinone
      • CD-ROM


75
Other punctuation bits
  • Apostrophes
    • Keep for possessive case
  • Diacritical marks
    • Keep if possible –
    • Québec
  • Other random marks
    • Keep if part of a proper name –
    • A&W Root Beer
    • Standard & Poor's


76
Compound terms
and factored terms
  • “Terms in a thesaurus should represent simple or unitary concepts…” (ISO standard)
  • “Compound terms should be factored (split) into simple elements…” (ANSI/NISO standard)


  • Nice in theory…
  • often unworkable
77
Compound terms
are precoordinated
  • Elements are put together to specify a concept at the indexing stage
  • Can’t change the parts


  • Water pollution
  • Library science
  • Television influence on preschoolers


  • Chicken dinner with turnips and rutabagas – no substitutions of menu items!
78
Factored terms
can be Postcoordinated
  • Elements can be put together to specify a concept at the search stage
  • Elements can be mixed and combined as needed
    • Few clothing pieces → several outfits
  • The sum of the elements reflects the concept (usually)
79
To factor or not to factor
  • Is each factor a single concept?
  • Is each factor in your thesaurus?


  • If YES, break term down to factors:
  • California highway construction → California + Highways + Construction


  • If NO, or if factoring would be confusing, retain the compound term
  • Children’s television → Television + Children ??
  • Science library → Library + Science ??
80
Precoordination positives
  • User expectations – Rapid transit
    • Occurs commonly in data, splitting would be odd
    • Reflects a single concept for the audience
  • Better accuracy – captures specific concepts precisely
  • Fewer false drops
  • Term information is retained
  • (Related Terms, NonPreferred Terms, Scope Notes, etc.)
81
Precoordination negatives
  • Poorer total recall
  • Term proliferation
    • Combinations and permutations increase thesaurus size
  • Higher cost
  • Limited flexibility in expressing new concepts
82
Postcoordination pros and cons
  • Higher recall
  • Lower cost
  • Greater flexibility – enables expression of new concepts through novel combinations
  • Lower accuracy, some false drops
    • Library science NOT = Library + Science
    • Art museums NOT = Art + Museums
  • Postcoordination is implicit in most searches
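
The trade-off between precoordination and postcoordination can be sketched with a tiny inverted index. This is a minimal illustration under made-up data: the document IDs, their subjects, and the function names `build_index` and `postcoordinate` are all assumptions. Postcoordination is modeled as a set intersection at search time.

```python
def build_index(doc_terms):
    """Invert {doc_id: [terms]} into {term: set of doc_ids}."""
    index = {}
    for doc_id, terms in doc_terms.items():
        for term in terms:
            index.setdefault(term, set()).add(doc_id)
    return index


def postcoordinate(index, *terms):
    """Combine terms at search time: documents indexed under all of them."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


# Factored indexing: the compound concept is split into simple terms.
factored = build_index({
    "doc1": ["Library", "Science"],   # about library science
    "doc2": ["Library", "Science"],   # about a science museum's library
    "doc3": ["Science"],
})

# Precoordinated indexing: the compound term is kept whole.
precoordinated = build_index({
    "doc1": ["Library science"],
    "doc2": ["Library", "Science"],
    "doc3": ["Science"],
})

print(sorted(postcoordinate(factored, "Library", "Science")))
print(sorted(postcoordinate(precoordinated, "Library science")))
```

With factored terms the search also returns doc2, a false drop for someone who wants library science. The precoordinated term returns only doc1, at the price of one more term to maintain.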
83
About “and”
  • Avoid “and” in terms – not a single concept


    • Instead of: Children and television


    • Factor and postcoordinate


    • USE Media influence + Television + Children
84
So far you’ve got
  • Hierarchy
  • Complete term records
    • Broader and Narrower Terms
      • Polyhierarchies when needed
    • Preferred/NonPreferred Terms (equivalence relationships)
    • Related Terms (associative relationships)
    • Scope Notes
    • Correct term format
    • Compound terms when needed
85
Notation
  • Symbols (numerals, letters, hyphens, colons, etc.)
    • 1: Apples
      • 1.1: Granny Smith
      • 1.2: Winesap
  • Adjunct to verbal expression of term
  • May represent another kind of ordering of sibling terms (non-alphabetic)
    • Chronological, positional, numeric sequence, or other logical sequence for user group
    • Same terms presented differently for different user groups, different purposes
  • Secondary to verbal concept organization
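
A decimal notation like the Apples example can be generated from the hierarchy itself. This is a minimal sketch, assuming the hierarchy is held as nested (term, children) pairs in the desired display order; the function name `assign_notation` is hypothetical.

```python
def assign_notation(tree, prefix=""):
    """Return (notation, term) pairs for a nested list of (term, children)."""
    result = []
    for position, (term, children) in enumerate(tree, start=1):
        # Top-level terms get "1", "2", ...; children extend the parent's code.
        code = f"{prefix}.{position}" if prefix else str(position)
        result.append((code, term))
        result.extend(assign_notation(children, code))
    return result


tree = [("Apples", [("Granny Smith", []), ("Winesap", [])])]
for code, term in assign_notation(tree):
    print(f"{code}: {term}")
```

Because the codes follow the order of the input, reordering siblings (chronologically, by position, or otherwise) yields a different notation for the same terms.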
86
Review, edit, test, edit,
use, edit, and maintain, i.e. edit
  • Review
    • Users
    • Expert reviewers
  • Test
    • Index 500+ documents (more for variable writing style; fewer for strict style)
    • Monitor search log

  • Edit and maintain
    • Add term
    • Change existing term
    • Change term status
    • Delete term
    • Add term relationship
    • Delete term relationship
    • Add/modify Scope Note
    • Change overall structure
87
Automatic taxonomy construction
  • Words and phrases from documents
  • Based on frequency and co-occurrence of words
  • No semantic analysis
  • Produces list of possible terms
  • Requires editorial analysis
    • hierarchical and conceptual organization
    • association of related concepts
    • identifying and deduplicating equivalent concepts
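
The frequency and co-occurrence counting described above can be sketched as follows. This is a deliberately simple illustration, not how any particular product works: the stopword list, the tokenizer, the sample texts, and the function names are all assumptions. As the slide notes, there is no semantic analysis; the output is only raw material for an editor.

```python
import re
from collections import Counter
from itertools import combinations

# A tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "of", "and", "a", "in", "to", "is", "for", "on"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def candidate_terms(documents):
    """Count word frequency and within-document co-occurrence of word pairs."""
    frequency = Counter()
    cooccurrence = Counter()
    for text in documents:
        words = tokenize(text)
        frequency.update(words)
        # Each unordered pair is counted once per document it appears in.
        for pair in combinations(sorted(set(words)), 2):
            cooccurrence[pair] += 1
    return frequency, cooccurrence


docs = ["Water pollution in rivers", "Pollution of water and air", "Air quality"]
frequency, cooccurrence = candidate_terms(docs)
print(frequency.most_common(3))
print(cooccurrence[("pollution", "water")])
```

A pair that co-occurs often, such as "pollution" and "water" here, is a hint that a compound term or a Related Term link may be worth considering.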
88
A pleasant diversion…
  • For a rainy afternoon
89
Show ‘em what you’ve got –
                 displays for every user
  • Thesaurus/taxonomy views and functions depend on audience and purpose
    • taxonomists
    • indexers
    • corporate workers
    • public searchers


90
For the taxonomist
  • Hierarchy view
  • Alphabetic view
  • Permuted (KWIC) view
  • Single term record view
  • Graphical view
  • Notational view
  • Deleted terms
  • Candidate terms
  • Search to retrieve term record
  • Find term in hierarchy view
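
A permuted view lists each term under every word it contains, so a term can be found without knowing its first word. The sketch below is a simplified listing in the spirit of a KWIC display, not a full keyword-in-context layout; the function name `permuted_index` is an assumption, and the sample terms come from the pop quiz slide.

```python
def permuted_index(terms):
    """Return sorted (keyword, term) pairs, one for each word in each term."""
    entries = []
    for term in terms:
        for word in term.split():
            entries.append((word.lower(), term))
    return sorted(entries)


terms = ["Figure skating", "Speed skating", "Public schools"]
for keyword, term in permuted_index(terms):
    print(f"{keyword:<10} {term}")
```

Looking under "skating" now surfaces both skating terms, even though neither begins with that word.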
91
 
92
 
93
For the indexer
  • Search to retrieve term record
  • Access to Scope Notes, Related Terms, NonPreferred Terms
  • Hierarchy view for the big picture
  • Automated proposal of indexing terms


94
 
95
For the searcher
  • Browsable directory (Yahoo.com)
  • Faceted navigation (MOMA.org)
  • Alpha term list or terms grouped by letter
  • Drop down list with selected terms
  • Portal view – complete or partial taxonomy
    • Display terms may be identical to taxonomy terms
    • Display terms may be variants, mapped to taxonomy terms
  • Taxonomy may not be accessible – requires random guessing
96
 
97
 
98
 
99
What we’ve covered
  • Taxonomy – from different perspectives
  • Collecting and organizing concepts
  • Term choice and vocabulary control
  • Taxonomy structure, term format, term relationships
  • Factored and compound terms
  • Constructing a simple taxonomy
  • Display variations for different users
  • Appendices: Lexicographer’s lexicon, Associative relationships
100
“The Computer and the Poet”
101
 
102
Appendices
  • 1.  Lexicographer’s lexicon
  • 2. Associative relationship types
  • 3. Thesaurus Tools, Resources, Services
  • 4. Readings
103
Lexicographer’s lexicon  1
  • Terms and their relationships
  • Main Term
    • Controlled vocabulary term, valid part of the thesaurus
    • aka Descriptor, Preferred Term
  • Entry Term
    • Term that’s not a valid or preferred term, but is linked to the preferred term.
    • aka NonPreferred Term
104
Lexicographer’s lexicon  2
  • Terms and their relationships
  • Top Term (TT)
    • Broadest conceptual category, has no broader term
  • Broader Term (BT)
    • Any term that has narrower terms
  • Narrower Term (NT)
    • Any term that has broader terms
105
Lexicographer’s lexicon  3
  • Top Term
  • Broader Term
  • Narrower Term


  • You are the child of your parents, but parent of your own children.
  • In Linnaean taxonomy, the family, genus, species, etc. are static.
  • In a thesaurus or taxonomy, BT and NT labels are relative and constantly shifting.


106
Lexicographer’s lexicon  4
  • Terms and their relationships
  • (Siblings)
    • Terms that have the same parent and are on the same hierarchy level
  • (Parents, Ancestors)
    • Terms up the family tree

107
Lexicographer’s lexicon  5
  • Cross-references to terms
  • Use/Used For (U/UF)
    • Synonym equivalence linking a valid thesaurus term (Descriptor, USE term, Preferred Term) and a term not in the thesaurus (Used For or NonPreferred Term)
    • aka See
    • Swine → USE Pigs  (Preferred Term)
    • Pigs → USED FOR Swine  (NonPreferred Term)
108
Lexicographer’s lexicon  6
  • Cross-references to terms
  • Related Term (RT)
    • Term has a conceptual connection to another term in the thesaurus, but not a BT/NT relationship
    • aka See Also


    • Engineering RT Engineers
    • Instrumental music RT Musical composition
109
Associative relationship types 1
  • Whole-part association
    • Operating rooms RT Surgical equipment
    • Buildings RT Doors
  • Field of study and objects studied
    • Seismology RT Earthquakes
  • Occupation and person in occupation
    • Sport RT Athletes
    • Information science RT Special librarians
  • Operation or process and the agent or instrument
    • Velocity measurement RT Speedometers
110
Associative relationship types 2
  • Concepts and their properties
    • Surfaces RT Textural properties
    • Elders RT Maturity
  • Concepts linked by causal dependence
    • Injuries RT Accidents
  • Concepts and their origins
    • Information RT Information sources
  • Raw material and its product
    • Hides RT Leather
111
Associative relationship types 3
  • Action and product of the action
    • Road construction RT Roads
    • Landscaping RT Gardens
  • Action and its recipient
    • Data analysis RT Data
    • Teaching RT Students
  • Action or thing and its counter agent
    • Pests RT Pesticides
    • Crime RT Security measures


112
Associative relationship types 4
  • Action and its associated property
    • Precision measurement RT Accuracy
    • Production supervision RT Quality control
  • Concept and its opposite
    • Tolerance RT Prejudice
    • Height RT Weight
    • RT Depth

113
Thesaurus tools
  • Easy navigation
  • Use through web browser
  • Easy maintenance of hierarchy
  • Access control
114
Thesaurus tools - features
  • Required
    • Support for basic pairs of relationships
      • Related term / related term (RT/RT)
      • Broader term / narrower term (BT/NT)
      • Use / Use for (U/UF) (P/NP)
115
Thesaurus tools - features
  • Required
    • Maintenance of reciprocal relationships
    • Displays
      • Alphabetical
      • Hierarchical

116
Thesaurus tools
  • History
  • Scope Notes
  • Edit Notes
117
Thesaurus tools - features
  • Nice to have
    • Works well as plug-in
    • Easy to update
    • Polyhierarchical
    • Variable field length
    • Import & export options
118
Thesaurus tools - features
  • Nice to have
    • Multiple views & reports
      • Alpha of preferred / nonpreferred terms
      • All term records
      • Candidate terms
      • Frequency counts
      • Permuted and comma delimited
      • Deleted terms
119
Thesaurus tools - features
  • Nice to have
    • User-defined term relationships
    • Unlimited number of relationships
    • Unlimited number of hierarchies per term
    • Scope notes, edit notes fields
    • Security
120
 
121
Taxonomy construction
122
 
123
 
124
Thesauri – good examples
  • Inspec - www.iee.org/publish/inspec/
  • MeSH - www.nlm.nih.gov (pre-coordinate)
  • NICEM - www.nicem.com
  • NewsIndexer - www.newsindexer.com
125
References – Information Architecture
  • Chen, H., Dumais, S., Bringing order to the web: automatically categorizing search results. Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI'00), ACM (2000) 145-152.
  • Rosenfeld, L.; and Morville, P. Information Architecture for the World Wide Web. O'Reilly, 1998.
  • Sullivan, D., Proven Portals: Best Practices for Planning, Designing, and Developing Enterprise Portals. Addison Wesley, 2003


126
Readings – Thesaurus Construction
  • Thesaurus Construction and Use, a Practical Manual.  Fourth edition has taxonomy information.  Aitchison, Jean - Gilchrist, Alan - Bawden, David. 0851424465
  • NISO Z39.19 (2005) standard, NOT the 2003
  • http://www.asindexing.org/site/thesbuild.shtml American Society of Indexers - a good practical approach
  • http://www.willpower.demon.co.uk/thesbibl.htm an excellent, although not recently updated, reference list.
  • Books about the process also include the ones listed here http://www.asindexing.org/site/bibliog.shtml
  • There is also a series of white papers and other information on our web site at www.dataharmony.com
127
Thesaurus Services
  • A word from our sponsor
    • www.accessinn.com
    • www.dataharmony.com
  • For thesaurus and taxonomy construction services and tools
133