Granularity in Library Linked Open Data Gordon Dunsire Keynote presentation to Code4Lib 2013, 12-14 Feb 2013, Chicago, USA
Fractals Self-similar at all levels of granularity Cannot determine level: all levels are equal!
Multi-faceted granularity What is described by a bibliographic record? Or a single statement? What is the level of description? How complete is it? How detailed is the schema used? How dumb? Semantic constraints? Unconstrained? AAA! OWA! Rumsfeld and the white light!
Resource Description Framework – Linked data
Triple: This resource (Subject) has intended audience (Predicate) Juvenile (Object)
Subject, Predicate, Object: has Granularity?
Coarse-grained systems consist of fewer, larger components than fine-grained systems [Wikipedia]
Subject: what is the statement about? Consortium collection RDF map Library collection Digital collection coarser Journals Subjects Access Super-Aggregate Journal title Journal index Aggregate Issue Festschrift Focus Article Resource Work Component Section Graphics Page Sub-Component Paragraph Markup finer Word RDF/XML URI Node
Predicate: what is the aspect described?
Scale (coarser to finer): Super-Aggregate, Aggregate, Focus, Component, Sub-Component
Examples (coarser to finer): Membership category; Access to resource; Access to content; Suitability rating; Audience and usage; Audience; Audience of audio-visual material
Possible Audience map (partial)
Properties (linked by rdfs:subPropertyOf): unc: “has note on use or audience”; isbd: “has note on use or audience”; unc: “Intended audience”; dct: “audience”; schema: “audience”; rda: “Intended audience”; m21: “Target audience”; frbrer: “has intended audience”; m21: “… audience of …”
Key: unc: unconstrained version; isbd: International Standard Bibliographic Description; dct: Dublin Core terms; schema: Schema.org; rda: Resource Description and Access; m21: marc21rdf.info; frbrer: Functional Requirements for Bibliographic Records, entity-relationship model
What is the aspect described?
Scale (coarser to finer): Super-Aggregate, Aggregate, Focus, Component, Sub-Component
Examples (coarser to finer): Resource record; Manifestation record; Title and s.o.r.; Title statement; Title of manifestation; Title word; First word of title
Possible Title semantic map (partial)
Properties and classes: dc: “Title”; dct: “Title”; rdfs: “Literal”; rdaopen: “Title”; isbd: “has title”; rdagrp1: “Title (Manifestation)”; rdaopen: “Title proper”; isbd: “has title proper”; rdagrp1: “Title proper (Manifestation)”; rdafrbr: “Manifestation”; isbd: “Resource”
Link labels used in the map: sP, eP, d, r
Key: sP: rdfs:subPropertyOf; d: rdfs:domain; r: rdfs:range
Semantic reasoning: the sub-property ladder
Semantic rule: If property1 is a sub-property of property2, then the data triple Resource property1 “string” implies the data triple Resource property2 “string”
Property ladder: isbd: “has title proper” (finer) rdfs:subPropertyOf dct: “has title” (coarser)
Data triple: isbd: “Resource” isbd: “has title proper” “Physics”
Machine entailment (dumb-up): Resource dct:title “Physics”
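The sub-property rule can be sketched in a few lines of Python. This is a minimal illustration, not a real reasoner: the identifiers (`isbd:hasTitleProper`, `isbd:hasTitle`, `dct:title`, `ex:resource`) and the two-rung ladder are assumptions made up for the example, and triples are plain tuples rather than RDF terms.

```python
# Hypothetical sub-property map: finer property -> coarser property.
# The identifiers and the two rungs are illustrative assumptions.
SUB_PROPERTY_OF = {
    "isbd:hasTitleProper": "isbd:hasTitle",
    "isbd:hasTitle": "dct:title",
}


def entail(triples, sub_property_of):
    """Return the input triples plus every triple implied by climbing the ladder."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        seen = {predicate}  # guards against a cyclic map
        parent = sub_property_of.get(predicate)
        while parent is not None and parent not in seen:
            # Same subject and object, restated with the coarser property.
            inferred.add((subject, parent, obj))
            seen.add(parent)
            parent = sub_property_of.get(parent)
    return inferred


data = {("ex:resource", "isbd:hasTitleProper", "Physics")}
result = entail(data, SUB_PROPERTY_OF)
for triple in sorted(result):
    print(triple)
```

Entailment only ever moves up the ladder: a triple stated with the coarser property says nothing about which finer property applies.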
Data triples from multiple schemas
ex:1 frbrer: “has intended audience” “Primary school”
ex:2 isbd: “has note on use or audience” “For ages 5-9”
ex:3 rda: “Intended audience (Work)” “For children aged 7-”
ex:4 m21: “Target audience” m21terms:commonaud#j
m21terms:commonaud#j skos:prefLabel “Juvenile”
Data triples entailed from sub-property map
ex:1 unc: “has note on use or audience” “Primary school”
ex:2 unc: “has note on use or audience” “For ages 5-9”
ex:3 unc: “has note on use or audience” “For children aged 7-”
ex:4 unc: “has note on use or audience” “Juvenile”
Data triples entailed from property domains
ex:1 “is a” frbrer: “Work”
ex:2 “is a” isbd: “Resource”
ex:3 “is a” rda: “Work”
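Domain entailment can be sketched the same way: if a property declares an rdfs:domain, any subject that uses the property is inferred to be an instance of that class. The property identifiers below are hypothetical stand-ins for the labelled properties in the example data, and `ex:4` is left untyped on the assumption that its property declares no domain, since only three typed results are shown.

```python
# Hypothetical domain declarations: property -> class (rdfs:domain).
# Identifiers are illustrative stand-ins for the labelled properties.
DOMAIN = {
    "frbrer:hasIntendedAudience": "frbrer:Work",
    "isbd:hasNoteOnUseOrAudience": "isbd:Resource",
    "rda:intendedAudienceWork": "rda:Work",
}

RDF_TYPE = "rdf:type"  # the "is a" relationship


def entail_types(triples, domain):
    """Infer (subject, rdf:type, class) for each triple whose property has a domain."""
    return {
        (subject, RDF_TYPE, domain[predicate])
        for subject, predicate, _ in triples
        if predicate in domain
    }


data = [
    ("ex:1", "frbrer:hasIntendedAudience", "Primary school"),
    ("ex:2", "isbd:hasNoteOnUseOrAudience", "For ages 5-9"),
    ("ex:3", "rda:intendedAudienceWork", "For children aged 7-"),
    ("ex:4", "m21:targetAudience", "m21terms:commonaud#j"),  # assumed: no domain
]

types = entail_types(data, DOMAIN)
for triple in sorted(types):
    print(triple)
```

A property with no declared domain yields no type statement, which is why unconstrained properties are safe targets for dumbing-up: they add no entailments about what the subject is.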
What is the aspect described?
Scale (coarser to finer): Super-Aggregate, Aggregate, Focus, Component, Sub-Component
Examples (coarser to finer): Creator; Author; Screenwriter; Animation screenwriter; Children’s cartoon screenwriter
Properties and classes: dc: “Contributor”; dc: “Creator”; dct: “Creator”; dct: “Agent”; marcrel: “Author”; marcrel: “Author of screenplay, etc.”; lcsh: “Screenwriters”; rdaroles: “Creator”; rdaroles: “Author (Work)”; rdaroles: “Screenwriter (Work)”; rda: “Work”; [rda: “Agent”]
Link labels used in the map: s, d, r, ?
Key: s: rdfs:subPropertyOf; d: rdfs:domain; r: rdfs:range
Machine-generated granularity
Full-text indexing: down to word level
A very large multilingual ontology with 5.5 million concepts
• A wide-coverage "encyclopedic dictionary"
• Obtained from the automatic integration of WordNet and Wikipedia
• Enriched with automatic translations of its concepts
• Connected to the Linguistic Linked Open Data cloud!
User-generated granularity “OK for my kids (7 and 9)” “Too childish for me (age 14)” “Ideal for the child of ambitious parents” “This sucks – for kids only” “Great! Has cool stuff”
KISS Keep it simple, stupid Keep it simple and stupid? The data model is very simple: triples! The (meta)data content is complex Resource discovery is complex The Mandelbrot Set: “an example of a complex structure arising from the application of simple rules” - Wikipedia
AAA Anyone can say anything about any thing Someone will say something about every thing In every conceivable way Linguistically
OWA Open World Assumption: the absence of a statement is not a statement of non-existence “There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don't know. But there are also unknown unknowns. There are things we don't know we don't know.” - Donald Rumsfeld Will all the gaps get filled?
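The difference between a closed-world and an open-world reading of the same data can be sketched as two lookup functions. This is only an illustration of the assumption itself: the function names and the sample triples are hypothetical, and `None` is used here to stand for "unknown".

```python
from typing import Optional

# Hypothetical sample data: one statement about ex:1, none about ex:2.
KNOWN = {("ex:1", "hasIntendedAudience", "Juvenile")}


def ask_closed(triples, triple) -> bool:
    """Closed world: anything not stated is treated as false."""
    return triple in triples


def ask_open(triples, triple) -> Optional[bool]:
    """Open world: a stated triple is true; a missing one is unknown (None)."""
    return True if triple in triples else None


stated = ("ex:1", "hasIntendedAudience", "Juvenile")
missing = ("ex:2", "hasIntendedAudience", "Juvenile")

print(ask_closed(KNOWN, missing))  # treated as false
print(ask_open(KNOWN, missing))    # unknown: the gap may be filled later
```

Under the open-world reading, adding a statement later never contradicts an earlier answer; it only turns an unknown into a known.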