Search And Metadata
Posted 19 December 2010 - 03:21 PM
First, a definition: while metadata is popularly described as information about information, it is perhaps more accurately structured information that describes the context of content.
Given that latter definition it is obvious why SEs would utilise metadata, and long have. Perhaps the metadata most familiar to webdevs are the 'meta http-equiv=' and 'meta name=' tags found within the head of an HTML document. However, there are many more.
It can be interesting to see which metadata are likely considered and think how that knowledge might be helpful. For instance, for some years - long, long ago - the keywords meta tag was so helpful it was abused until largely dropped from SE consideration, although the original keyword stuffing advice lives on to confuse newcomers.
This is not an area that I would get all excited about - an interesting challenge out on the fringes, but most of us likely could better spend our time elsewhere. However, you might find it useful to review certain broadly accepted metadata groupings, consider if certain usages might remove ambiguity, and consider whether it might be best to remove or replace certain application created metadata, e.g. for privacy reasons.
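To make the 'meta name=' and 'meta http-equiv=' tags concrete, here is a minimal Python sketch that pulls both kinds out of a page head using only the standard library. The class name, the helper function and the sample page are made up for illustration; it shows what a consumer could read, not what any particular SE actually does.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collects <meta name=...> and <meta http-equiv=...> tags from HTML."""

    def __init__(self):
        super().__init__()
        self.named = {}       # meta name=       -> content
        self.http_equiv = {}  # meta http-equiv= -> content

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)  # HTMLParser lowercases attribute names
        content = attr.get("content") or ""
        if attr.get("name"):
            self.named[attr["name"].lower()] = content
        elif attr.get("http-equiv"):
            self.http_equiv[attr["http-equiv"].lower()] = content


def collect_meta(html_text):
    """Return (named, http_equiv) dictionaries for the given HTML text."""
    parser = MetaTagCollector()
    parser.feed(html_text)
    return parser.named, parser.http_equiv


# Hypothetical sample page, for illustration only.
SAMPLE = """<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="description" content="A hypothetical page about metadata.">
<meta name="keywords" content="metadata, search">
<title>Sample</title>
</head><body></body></html>"""

named, http_equiv = collect_meta(SAMPLE)
print(named)
print(http_equiv)
```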
* Dublin Core properties.
* Microsoft Office properties.
* Adobe eXtensible Metadata Platform (XMP).
* Exchangeable image file format (Exif)
* International Press Telecommunications Council (IPTC) NewsCodes
As SEs move into maps and geographical search Digital Geospatial Metadata might be considered a useful read.
~(_°>·'s (best ASCII avatar in webdev :thumbs: ) suggestion (nose wiggle or ear twitch?) to consider that the metadata utilised by Google's CustomSearch and SiteSearch is likely used to some degree within Google Search is quite logical. If you agree, or care, read:
* Enabling Rich Snippets in Custom Search
* Structured Custom Search
* More power to metadata
Note: read carefully, there are some interesting inferences worth further research.
Posted 21 December 2010 - 04:16 AM
Where should the line be drawn that distinguishes between meta data and other extra-document data?
Posted 21 December 2010 - 04:40 AM
When is other metadata useful for a content site? Is it ever? I read one of those recent articles about new metadata about a week ago, and to be honest, I failed to grasp what they were going on about.
Posted 21 December 2010 - 05:54 AM
There are many types (and needs) of meta data. Let's take some examples:
1. There is meta data about the document. This usually includes when you retrieved the document, what response you got, how big the download was, etc.
2. There is meta data about the document's content. Are we dealing with video or text? Is this UTF-8 encoded or what? Who authored the document? When? How do I get in touch with the author? What is the title of this work? Can I get a summary/description? Where can I go to get the latest version? Heck, what version am I dealing with anyway?
The short summary is: meta data can be (and sadly is) everywhere. A trap many people fall into is that they generate too much meta data that is not actually useful. By "useful" I mean of use to the perceived end user: a librarian certainly has a need for meta data but that meta data is not what a library user would necessarily want/use.
So what is meta data useful for?
A. Meta data helps in searching and browsing. We've had advanced search operators for ages and they're all meta data based. When you do a title: query, that's meta data being useful. That site: operator, ditto.
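As a rough illustration of operators being meta data lookups, here is a toy Python sketch that treats title: and site: as filters on stored fields rather than on the page text. The document records, field names and matching rules are all assumptions made for the example; real engines define their operators in their own ways.

```python
def parse_query(query):
    """Split a query into operator filters and free-text terms."""
    filters, terms = {}, []
    for token in query.split():
        field, sep, value = token.partition(":")
        if sep and field in ("title", "site") and value:
            filters[field] = value.lower()
        else:
            terms.append(token.lower())
    return filters, terms


def matches(doc, filters, terms):
    """Check one document against the meta data filters and text terms."""
    # title: is answered from the stored title, not the body text.
    if "title" in filters and filters["title"] not in doc["title"].lower():
        return False
    # site: is answered from the stored host name (exact or subdomain).
    if "site" in filters:
        host, site = doc["host"].lower(), filters["site"]
        if host != site and not host.endswith("." + site):
            return False
    body = doc["body"].lower()
    return all(term in body for term in terms)


def search(docs, query):
    """Return the URLs of all documents matching the query."""
    filters, terms = parse_query(query)
    return [d["url"] for d in docs if matches(d, filters, terms)]


# Hypothetical index entries, for illustration only.
DOCS = [
    {"url": "http://example.com/a", "host": "example.com",
     "title": "Metadata basics", "body": "An intro to metadata and search."},
    {"url": "http://blog.example.org/b", "host": "blog.example.org",
     "title": "Crawling notes", "body": "Metadata from response headers."},
]

print(search(DOCS, "title:metadata search"))
print(search(DOCS, "site:example.org metadata"))
```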
B. For any collection of documents, you will need meta data to manage it. I talked about this above and it is a very important point. You're probably already using meta data without calling it that: we call it information architecture. When you go to DMOZ and browse (anyone still do that?), that's category meta data in action too. It's exactly the same when you use a WordPress category for a page. You're, fundamentally, assigning a structure to your website's contents, and we all agree that's an important part of SEO and usability.
No doubt response headers are a type of meta data. At the very least, they help crawlers deal with the content: a redirect needs to be followed, gzip compressed content needs to be uncompressed, encoding handled correctly, etc.
All this feeds into your day-to-day management and strategy: if you see that one source is not updating as regularly, you can slow down; if you see that your average download size is on the rise, your storage purchasing plan will need tweaking. And if you're seeing errors, this kind of meta data logging will help you and the source debug the problem.
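Here is a small Python sketch of how a crawler might act on response headers and keep a log record at the same time. The function name, the record fields and the UTF-8 fallback are my own assumptions for the example, not a description of any real crawler.

```python
import gzip


def process_response(status, headers, body):
    """Return (action, payload, record) for one fetched response."""
    # Header names are case-insensitive, so normalise them first.
    h = {k.lower(): v for k, v in headers.items()}
    # The record is the meta data you would log for later analysis.
    record = {"status": status, "bytes_downloaded": len(body)}

    # A redirect needs to be followed rather than stored.
    if status in (301, 302, 303, 307, 308) and "location" in h:
        return "follow", h["location"], record

    # gzip compressed content needs to be uncompressed.
    if h.get("content-encoding", "").lower() == "gzip":
        body = gzip.decompress(body)

    # Pick the charset from Content-Type; UTF-8 is an assumed fallback.
    charset = "utf-8"
    for part in h.get("content-type", "").split(";")[1:]:
        key, _, value = part.strip().partition("=")
        if key.lower() == "charset" and value:
            charset = value.strip('"')
    record["charset"] = charset
    return "store", body.decode(charset, errors="replace"), record


# Usage with made-up responses.
raw = gzip.compress("héllo".encode("utf-8"))
print(process_response(
    200,
    {"Content-Encoding": "gzip", "Content-Type": "text/html; charset=UTF-8"},
    raw,
))
print(process_response(301, {"Location": "http://example.com/new"}, b""))
```

Collecting the records over time is what lets you spot a rising average download size or a source that has started returning errors.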
Microformats (and other formats like RDFa, etc) are a way to communicate meta data. To use one microformat example that Google supports, you can mark up content as a review of a product or service. There are two types of review data you can communicate: the data about any individual review and the data in aggregate. You can immediately think of the basics: what's the rating (out of 5), what was the text of the review, when was the item reviewed, and what item is under review. What the microformats standard does is ask you to use slightly different markup in your pages to communicate which content is what.
And this is the salient point: it's about semantics. The whole point of meta data is to add another dimension to your content to communicate its meaning, with this communication being useful to someone.
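To show what "slightly different markup" can look like, here is a simplified sketch of a single review marked up with hReview-style class names (item/fn, rating, reviewer, dtreviewed, description) and a few lines of Python that read the fields back out. The sample review is invented and the parser is a deliberately naive one that assumes well-formed markup; a real consumer would use a proper microformats parser and handle details this skips.

```python
from html.parser import HTMLParser

# Class names taken from the hReview microformat; this sketch reads only these.
HREVIEW_FIELDS = {"fn", "rating", "description", "dtreviewed", "reviewer"}
# Elements that never have a closing tag, so they must not go on the stack.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class HReviewParser(HTMLParser):
    """Collects the text inside elements whose class names a review field."""

    def __init__(self):
        super().__init__()
        self.stack = []   # one entry per open element: the fields it matches
        self.review = {}

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        self.stack.append(classes & HREVIEW_FIELDS)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Text belongs to every field whose element is currently open.
        for fields in self.stack:
            for field in fields:
                self.review[field] = self.review.get(field, "") + data


def parse_hreview(html_text):
    """Return a dict of review fields found in the given markup."""
    parser = HReviewParser()
    parser.feed(html_text)
    return {name: text.strip() for name, text in parser.review.items()}


# Invented review, for illustration only.
SAMPLE_REVIEW = """<div class="hreview">
  <span class="item"><span class="fn">Acme Widget</span></span>
  Rated <span class="rating">4</span> out of 5 by
  <span class="reviewer">A. Reader</span> on
  <span class="dtreviewed">2010-12-21</span>.
  <p class="description">Sturdy and easy to use.</p>
</div>"""

review = parse_hreview(SAMPLE_REVIEW)
print(review)
```

The visible text of the page is unchanged for a human reader; only the class attributes tell a machine which bit is the rating and which is the item under review.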
So that's the overview. The question is: what are we gonna do with it? For me, the fun is in mashing up meta data from multiple sources. I'll leave that for another day.
Edited by eKstreme, 21 December 2010 - 05:55 AM.