Search Engine Optimization - Anybody heard about Dublin Core?
by Norman Lyons, August 30
As I've described in earlier articles in this SEO series, metadata is, in a very broad sense, supplementary descriptive information about a resource of some type. In the context of our discussion, we've been using metadata to provide an extended description of the information contained within a given webpage. For example, in previous articles I've described the META description tag, the META keywords tag, and many other META tag implementations, including the recently added META "unavailable_after" tag. In most situations, we include these META tags in our webpages to help search engine robots and automated crawlers index our content more easily and more effectively. The idea is that by simplifying the job of indexing our content, we hopefully improve the visibility of our webpages in the results returned for search engine queries.
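As a quick refresher on what those tags look like in a page's head, here is a minimal Python sketch that renders them. The helper name `meta_tag` and the sample values are my own assumptions for illustration, not part of any particular library or toolkit.

```python
from html import escape


def meta_tag(name: str, content: str) -> str:
    """Render one HTML META tag, escaping both attribute values."""
    return (
        f'<meta name="{escape(name, quote=True)}" '
        f'content="{escape(content, quote=True)}">'
    )


# Hypothetical page metadata, for illustration only.
page_meta = {
    "description": "An introduction to the Dublin Core metadata standard.",
    "keywords": "Dublin Core, metadata, SEO",
}

head_tags = [meta_tag(name, content) for name, content in page_meta.items()]
print("\n".join(head_tags))
```

Escaping the values matters because a stray double quote in a description would otherwise end the attribute early.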
So how does the Dublin Core Metadata Initiative fit into this scheme? And why is it important to us?
Before we get into any specifics, we need to address what the Dublin Core Metadata Initiative is about. According to the Dublin Core website, "The Dublin Core Metadata Initiative provides simple standards to facilitate the finding, sharing and management of information."1 Dublin Core metadata is not limited to any specific type of media or resource; it can be used to describe any kind of resource. In essence, Dublin Core metadata is intended to "supplement existing methods for searching and indexing Web-based metadata, regardless of whether the corresponding resource is an electronic document or a real physical object."2
How does the DC Initiative accomplish this, you might ask? Simply put, DC metadata "provides card catalog-like definitions for defining the properties of objects for Web-based resource discovery systems."2 These definitions make use of the Dublin Core Metadata Element Set, otherwise known as Simple Dublin Core, which is complemented by some additional qualified elements, commonly referred to as Qualified Dublin Core.
So what is the difference between Simple Dublin Core and Qualified Dublin Core? According to the Dublin Core website, "'Simple Dublin Core' is Dublin Core metadata that uses no qualifiers; only the main 15 elements of the Dublin Core Metadata Element Set are expressed as simple attribute-value pairs without any 'qualifiers' (such as encoding schemes, enumerated lists of values, or other processing clues) to provide more detailed information about a resource."2 'Qualified Dublin Core', on the other hand, "employs additional qualifiers to further refine the meaning of a resource. One use for such qualifiers are to indicate if a metadata value is a compound or structured value, rather than just a string."2
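To make the "simple attribute-value pairs" idea concrete, here is a small Python sketch that models a Simple Dublin Core record as a list of (element, value) pairs and flags any name that is not one of the 15 elements. The function name `unknown_elements` and the sample record are assumptions of mine for illustration; a list of pairs is used rather than a dictionary because an element may legitimately appear more than once.

```python
# The 15 elements of the Dublin Core Metadata Element Set (Simple Dublin Core).
SIMPLE_DC_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})


def unknown_elements(record):
    """Return the sorted names in `record` that are not Simple DC elements."""
    return sorted({name for name, _ in record if name not in SIMPLE_DC_ELEMENTS})


# A sample record describing this article; "author" is deliberately wrong,
# since the Dublin Core element for that role is "creator".
record = [
    ("title", "Anybody heard about Dublin Core?"),
    ("author", "Norman Lyons"),
    ("language", "en"),
]

print(unknown_elements(record))
```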
Keeping these two definitions from the Dublin Core website in mind, I've summarized the core Dublin Core metadata elements below:
DC Core Elements:
- Title
  - typically the name by which the resource is formally known (e.g., the title of a novel, a musical composition, an article, or perhaps a recipe)
- Subject
  - typically refers to the topic of the content of the resource in question. Commonly, the subject is expressed as keywords, keyword phrases, or classification codes that accurately describe the topic of the resource.
- Description
  - typically an account, a synopsis, or a snapshot of the content of a particular resource. According to the Dublin Core description of this element, a "Description may include but is not limited to: an abstract, table of contents, reference to a graphical representation of content or a free-text account of the content."3
- Type
  - typically refers to the nature, intent, or genre of the content of the resource being considered. According to the Dublin Core description of this element, "Type includes terms for describing general categories, functions, genres, or aggregation levels for content."3
  - approved types from the DCMI Type vocabulary include:
    - Type="Collection"
      - described as an aggregation of resources
    - Type="Dataset"
      - described as data encoded in a defined structure, such as lists, tables, and databases
    - Type="Event"
      - described as a non-persistent, time-based occurrence (the metadata may describe the purpose, location, duration, and responsible agents associated with the event)
    - Type="Image"
      - described as a visual representation excluding text (unless the text itself has been graphically rendered)
    - Type="InteractiveResource"
      - described as any type of resource that requires some type of interaction from the user
    - Type="MovingImage"
      - described as a series of visual representations that give the viewer the impression of motion when shown in succession (e.g., animations, movies, television programs, videos)
    - Type="PhysicalObject"
      - described as an inanimate, 3-dimensional object or substance
    - Type="Service"
      - described as some type of system that provides one or more functions to the user of the system
    - Type="Software"
      - described as a computer program in either source form or compiled form
    - Type="Sound"
      - described as an audio resource intended to be heard or listened to
    - Type="StillImage"
      - described as a static visual representation, usually taking the form of a picture, painting, drawing, graphic design, map, etc.
    - Type="Text"
      - described as a resource consisting primarily of words or strings of characters intended for reading or indexing
- Source
  - generally used to reference a resource from which the current resource is derived
- Relation
  - typically contains a reference to a related resource (e.g., a formal bibliographic citation)
- Coverage
  - described as "the extent or scope of the content of the resource,"3 which may take the form of a spatial location, temporal period, or jurisdiction
- Creator
  - the individual or entity responsible for making or creating the specified resource
- Publisher
  - the individual or entity responsible for making the specified resource available to the targeted audience
- Contributor
  - the individual, individuals, or entities responsible for making contributions to the content of the specified resource
- Rights
  - relates to rights management, that is, the rights held in and over the specified resource
- Date
  - refers to the date associated with some event in the life-cycle of the specified resource (e.g., creation date, modification date, publish date)
- Format
  - typically refers to the media type (e.g., video, audio, print) or the dimensions of the resource
- Identifier
  - generally a unique string or numeric value conforming to a formal identification system (i.e., a unique identifier)
- Language
  - the language in which the content of the resource is written
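Since the Type element draws its values from a fixed vocabulary, it is easy to check a value before publishing it. The following Python sketch lists the twelve DCMI Type terms summarized above and maps a loosely typed value onto the canonical spelling. The function name `normalize_dcmi_type` and the case-insensitive matching are conveniences I've assumed for this example; the vocabulary itself defines the terms in the mixed-case form shown.

```python
# The twelve terms of the DCMI Type vocabulary, in their canonical spelling.
DCMI_TYPES = (
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
)

# Lookup table from lower-cased term to canonical term.
_BY_LOWERCASE = {term.lower(): term for term in DCMI_TYPES}


def normalize_dcmi_type(value: str):
    """Return the canonical DCMI Type term for `value`, or None if unknown."""
    return _BY_LOWERCASE.get(value.strip().lower())


for candidate in ("sound", " MovingImage ", "Music"):
    print(repr(candidate), "->", normalize_dcmi_type(candidate))
```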
Qualified Dublin Core elements:
- Audience
  - the class of entity for whom the resource is primarily intended
- Provenance
  - according to Dublin Core, this attribute refers to "A statement of any changes in ownership and custody of the resource since its creation that are significant for its authenticity, integrity, and interpretation."3
- RightsHolder
  - the person, organization, or other entity who claims ownership of, or management rights over, the resource
Now that we have summarized all of the core and qualified Dublin Core metadata elements, let's take a brief look at how we might use these metadata elements to capture and encapsulate some metadata for a resource. As an example, let's assume that we've created a website dedicated to providing discographic information about the rock band Led Zeppelin. If we wanted to use Dublin Core to provide additional metadata about each of the songs, this is one way we might approach this task:
<?xml version="1.0" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <!-- rdf:about identifies the resource being described; substitute the URI of your own page -->
  <rdf:Description rdf:about="http://purl.org/DC/documents/notes-cox-816.htm">
    <dc:title>Stairway To Heaven</dc:title>
    <dc:description>Stairway to Heaven is a popular rock song by the English rock group Led Zeppelin, composed by guitarist Jimmy Page and vocalist Robert Plant, and recorded on their fourth studio album, Led Zeppelin IV. It is the most requested and most played song on FM radio stations in the United States, despite never being released as a single there.</dc:description>
    <dc:date>1971-11-08</dc:date>
    <dc:creator>Robert Plant</dc:creator>
    <dc:creator>Jimmy Page</dc:creator>
    <dc:type>Sound</dc:type>
    <!-- The resource is one song from the album Led Zeppelin IV -->
    <dc:relation>Led Zeppelin IV</dc:relation>
    <dc:language>en</dc:language>
    <dc:publisher>Atlantic Records</dc:publisher>
  </rdf:Description>
</rdf:RDF>
As you can see from the above example, we've captured a great deal of supplementary information about a single entity or resource; namely, the song Stairway to Heaven. Next, let's assume that our website is instead dedicated to providing metadata about works of fiction and their authors. If we wanted to use Dublin Core to provide additional metadata about a specific book entry, such as The Green Mile by Stephen King, this is one way we might approach this task:
<?xml version="1.0" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://purl.org/DC/documents/notes-cox-816.htm">
    <dc:title>The Green Mile</dc:title>
    <!-- Description quoted from Wikipedia; see bibliographic reference 4 -->
    <dc:description>"The story centers on John Coffey, an almost seven-foot black man who is convicted of raping and killing two small white girls. He is notable because of his size and also for his strange behavior; he is very quiet and prefers to keep to himself, he weeps almost constantly, and is afraid of the dark. Coffey is described as "knowing his own name and not much else", and lacks the ability to so much as tie a knot, yet he is convicted of luring the girls away from their home, disposing of the watchdog, carefully planning and using abilities he would otherwise not be expected to have. He's the calmest and mildest prisoner the wardens have ever seen, despite his hulking form. Besides John Coffey, there are two other prisoners on the cell block during the main period the book focuses on: Eduard Delacroix, a Cajun-French arsonist, rapist, and murderer who is cowardly and weak-minded, and William Wharton, a wild and dangerous multiple murderer, determined to make as much trouble as he can before he is executed. Looking back, Paul also describes his experiences with a Native American murderer named Arlen Bitterbuck, and 'The Prez', a former CEO who killed his father."</dc:description>
    <dc:date>2000-10-01</dc:date>
    <dc:creator>Stephen King</dc:creator>
    <dc:type>Text</dc:type>
    <dc:language>en</dc:language>
    <dc:publisher>Scribner</dc:publisher>
  </rdf:Description>
</rdf:RDF>
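If our website carried many such entries, we would probably not write each record by hand. Here is a rough Python sketch, using the standard library's xml.etree.ElementTree module, that builds RDF/XML of the same shape as the two examples from a simple catalog and then reads it back. The function names `build_rdf` and `read_rdf`, the catalog layout, and the example.com URIs are all assumptions of mine for illustration; this is one possible approach, not a prescribed Dublin Core workflow.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Serialize with the conventional rdf: and dc: prefixes.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def build_rdf(records):
    """Build an rdf:RDF tree from {resource URI: [(element, value), ...]}."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    for uri, pairs in records.items():
        desc = ET.SubElement(
            root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": uri}
        )
        for name, value in pairs:
            ET.SubElement(desc, f"{{{DC_NS}}}{name}").text = value
    return root


def read_rdf(xml_text):
    """Parse RDF/XML back into {resource URI: [(element, value), ...]}."""
    prefix = f"{{{DC_NS}}}"
    records = {}
    for desc in ET.fromstring(xml_text).findall(f"{{{RDF_NS}}}Description"):
        uri = desc.get(f"{{{RDF_NS}}}about")
        records[uri] = [
            (child.tag[len(prefix):], child.text or "")
            for child in desc
            if child.tag.startswith(prefix)  # keep only Dublin Core elements
        ]
    return records


# The example.com URIs below are hypothetical placeholders.
catalog = {
    "http://example.com/songs/stairway-to-heaven": [
        ("title", "Stairway To Heaven"),
        ("creator", "Robert Plant"),
        ("creator", "Jimmy Page"),
        ("type", "Sound"),
        ("language", "en"),
        ("publisher", "Atlantic Records"),
    ],
    "http://example.com/books/the-green-mile": [
        ("title", "The Green Mile"),
        ("creator", "Stephen King"),
        ("type", "Text"),
        ("language", "en"),
        ("publisher", "Scribner"),
    ],
}

xml_text = ET.tostring(build_rdf(catalog), encoding="unicode")
print(xml_text)
```

Keeping each record as a list of pairs preserves repeated elements, such as the two creators of the song, which a plain dictionary keyed by element name would silently collapse.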
Stay tuned for more about Dublin Core, as we'll take a deeper look into where and how Dublin Core fits into our website architecture in next month's blog article.
Bibliographic references:
1 Dublin Core Metadata Initiative - About; Retrieved August 29, 2007, from http://www.dublincore.org/about/
2 Dublin Core Metadata Initiative - Frequently Asked Questions; Retrieved August 29, 2007, from http://www.dublincore.org/resources/faq/
3 Dublin Core Metadata Initiative - User Guide, Chapter 4; Retrieved August 29, 2007, from http://www.dublincore.org/documents/usageguide/elements.shtml
4 Wikipedia - The Green Mile; Retrieved August 29, 2007, from http://en.wikipedia.org/wiki/The_Green_Mile_%28novel%29

