Cre8asiteforums Internet Marketing
and Conversion Web Design


Assigning Geographic Locations to Web Pages


7 replies to this topic

#1 bragadocchio

bragadocchio

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 15634 posts

Posted 18 August 2005 - 11:16 PM

Assigning Geographic Locations to Web Pages

Lars Eilstrup Rasmussen is from Google's Sydney office, and he was a lead engineer on the team that created Google Maps.

He and his brother, Jens Eilstrup Rasmussen, founded the mapping startup Where 2 Technologies, which Google acquired in October 2004.

Together they wrote a patent application that describes a process for assigning geographical information to web pages. It was published earlier today.

If you find the following patent application interesting, you may also enjoy this one: System for automatically integrating a digital map system


United States Patent Application 20050182770
August 18, 2005

Assigning geographic location identifiers to web pages


Abstract

A system and method for assigning geographic location identifiers to web documents may include identifying a set of web documents. A geographic location identifier included within a first web document in the set of web documents may be identified. The identified geographic location identifier may be assigned to a second web document in the set of web documents based on a relevancy of the first web document to the second web document.



--------------------------------------------------------------------------------
Inventors: Lars Eilstrup Rasmussen, and Jens Eilstrup Rasmussen



The patent:

1. identifies a number of web pages;
2. looks for location information within those pages;
3. assigns locations to pages which include geographic information; and
4. assigns locations to pages "relevant" to those pages that include geographical information.


Reasons for the patent:

According to the application, keyword-based search engines have failed to tie web pages to geographic locations when relying on:

1. Search engine manual assignment of locations to pages
2. Site owner manual assignment of locations to pages
3. Use of geographic meta tags
4. Search engine assignment of locations based on postal addresses appearing on the same pages as the keywords.


Assignment of geographic location identifiers

"Geographic location identifiers" found on web pages can be assigned to other pages, which might or might not include geographic identifiers of their own, after relevancy criteria are considered. This allows pages without location information to be included in a geography-based search. Those relevancy factors may include:

1. relative distance between documents,
2. the terminology used, and
3. whether the page is on the same site.


A geographic location identifier may be:

1. a partial or complete postal address,
2. a telephone number,
3. an area code,
4. an airport code,
5. a landmark identifier,
6. another value tied to a physical location, such as longitude and latitude,
7. or one based upon hyperlinks between pages without geo information and seemingly related pages that do have location information.


Other documents, such as directories, may be useful in associating location identifiers.

Pattern matching may be used to find identifiers by examining documents for text that matches standard formats for addresses and other information that tends to describe location.
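
To make the pattern-matching idea a little more concrete, here is a rough sketch in Python. It is not the patent's method, just an illustration: the three regular expressions, the `PATTERNS` table, and the `find_location_identifiers` name are all my own assumptions, they only cover simple US-style formats, and real addresses are far messier.

```python
import re

# Hypothetical, deliberately simple patterns for US-style identifiers.
PATTERNS = {
    "zip_code": re.compile(r"\b\d{5}(?:-\d{4})?\b"),
    "phone": re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b"),
    "street_address": re.compile(
        r"\b\d+\s+(?:[A-Z][a-z]+\s+)+"
        r"(?:Street|St|Avenue|Ave|Road|Rd|Boulevard|Blvd)\b"
    ),
}


def find_location_identifiers(text):
    """Return a list of (kind, matched_text) pairs found in the page text."""
    found = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((kind, match.group(0)))
    return found


if __name__ == "__main__":
    page = "Visit us at 123 Main Street, Philadelphia, PA 19103 or call 215-555-0100."
    for kind, value in find_location_identifiers(page):
        print(kind, "->", value)
```

A five-digit street number would also be picked up as a ZIP code here, which is one reason a real system would need more than bare patterns.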


Standardization

Those location identifiers may then be standardized into a common, predefined format.

Example: addresses without zip codes may have the appropriate zip code added.

Example 2: Misspellings and other possible errors that can be identified may be corrected.

These standardized formats may include a number of categories, such as:

1. street number,
2. street name,
3. street type,
4. city,
5. state,
6. county,
7. country,
8. zip code,
9. etc.
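
A minimal sketch of what such a standardized record might look like, assuming a fixed set of fields taken from the categories above. The `Address` class, the `standardize` function, and both lookup tables are hypothetical and only there to show the two examples (adding a missing zip code, correcting a misspelling); a real system would rely on proper postal data rather than a toy city-to-zip table.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Address:
    street_number: str
    street_name: str
    street_type: str
    city: str
    state: str
    zip_code: Optional[str] = None


# Toy lookup tables (assumptions for illustration only).
ZIP_BY_CITY = {("Palo Alto", "CA"): "94301"}
STREET_TYPE_FIXES = {
    "st": "Street",
    "st.": "Street",
    "stret": "Street",
    "ave": "Avenue",
    "ave.": "Avenue",
}


def standardize(addr):
    """Return a copy with the street type normalized and a missing zip filled in."""
    street_type = STREET_TYPE_FIXES.get(addr.street_type.lower(), addr.street_type)
    zip_code = addr.zip_code or ZIP_BY_CITY.get((addr.city, addr.state))
    return replace(addr, street_type=street_type, zip_code=zip_code)


if __name__ == "__main__":
    raw = Address("123", "University", "ave", "Palo Alto", "CA")
    print(standardize(raw))
```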


How assignment works


After standardizing (data correction and supplementation and other standardization methods), the location identifier may be assigned to pages on which the information appears.

An identifier may be associated with documents that have no identifier yet, or with documents that already have the same identifier or a different one (some pages may be associated with more than one location).

That assignment may be made by assigning to each page a location associated with a page linked to it, either directly or indirectly (through a predetermined number of links).

Once an association has been made, the identifiers could be used in finding other associated pages or in ranking search results.

Or search results which include the pages may show the assigned location to users.

Associations and disassociations of locations can happen as a collection of documents is reviewed.


The first assumption is that if a page has location information on it, it is associated with that location.

The process may begin by identifying, for each page, other pages that include a geographic location identifier and are "relevant" to that page from a geographic identification standpoint.


Defining relevant documents

Documents may be defined as "relevant" where:

1. the pages are on the same web site, and
2. the anchor text of the link leading from the page with location information to the other page contains one or more terms from a small rule-based set of terms.

Those "relevant" terms may include, for example:

1. location(s),
2. direction(s),
3. find,
4. finder,
5. locate,
6. locater,
7. store(s),
8. branch(es),
9. about,
10. company,
11. contact,
12. information,
13. etc.

A document could also be considered relevant if the anchor text to it includes a complete or partial postal address.

For images or other non-text anchors, a linked page may be relevant if the URL in the link includes either a complete or partial postal address or one of the above "relevant" terms.

A page could be considered relevant by examining the contents of the page directly.

A link failing the above tests may be considered "relevant" if the HTML title of the target document includes any of the "relevant" terms, or a complete or partial postal address.

These types of titles would probably be included in the first pass through of all the collected documents. Other rules may be used to determine if the target document makes a hyperlink "relevant".
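
The rules above could be sketched roughly like this. It is only my reading of them, not code from the application: the `is_relevant_link` function and its parameters are made up for illustration, I am assuming the same-site requirement applies to every rule, and the postal-address checks are left out to keep it short.

```python
import re
from urllib.parse import urlparse

# The "relevant" terms listed in the summary above (singular and plural forms).
RELEVANT_TERMS = {
    "location", "locations", "direction", "directions", "find", "finder",
    "locate", "locater", "store", "stores", "branch", "branches",
    "about", "company", "contact", "information",
}


def words(text):
    """Lowercase alphabetic words found in a piece of text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def is_relevant_link(source_url, target_url, anchor_text="", target_title=""):
    # Assumption: both pages must be on the same site for any rule to apply.
    if urlparse(source_url).netloc != urlparse(target_url).netloc:
        return False
    # Text anchors: look for a relevant term in the anchor text.
    if words(anchor_text) & RELEVANT_TERMS:
        return True
    # Non-text anchors (such as images): fall back to terms in the link URL.
    if not anchor_text.strip() and words(urlparse(target_url).path) & RELEVANT_TERMS:
        return True
    # Last resort: a relevant term in the target page's HTML title.
    return bool(words(target_title) & RELEVANT_TERMS)


if __name__ == "__main__":
    print(is_relevant_link("http://example.com/", "http://example.com/p1",
                           anchor_text="Store Locations"))
```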


Looking at distance

After a relevant page has been identified, the number of links between it and the page with the location on it is looked at. One version of the invention looks for a range of 2 to 5 links.

If the distance is greater, the next relevant document is reviewed. If that one is within the right number of links, it may be associated with the initial document with location information.

That process continues until all relevant documents are reviewed.
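
One way to picture this distance check is a breadth-first walk over the "relevant" links, stopping after a fixed number of hops. This is a sketch under my own assumptions, not the patent's implementation: the `propagate_locations` name, the dictionary-based link map, and the default `max_links=2` are all hypothetical, and the application's range of 2 to 5 links is treated here as a simple upper limit.

```python
from collections import deque


def propagate_locations(links, page_locations, max_links=2):
    """Spread location identifiers along relevant links.

    links: dict mapping a page to the pages it reaches via "relevant" links.
    page_locations: dict mapping a page to the set of identifiers found on it.
    Returns a dict mapping each page to its own and its propagated identifiers.
    """
    assigned = {page: set(locs) for page, locs in page_locations.items()}
    for source, locs in page_locations.items():
        queue = deque([(source, 0)])
        seen = {source}
        while queue:
            page, hops = queue.popleft()
            if hops == max_links:
                continue  # too far from the page with the location on it
            for target in links.get(page, ()):
                if target not in seen:
                    seen.add(target)
                    assigned.setdefault(target, set()).update(locs)
                    queue.append((target, hops + 1))
    return assigned


if __name__ == "__main__":
    links = {"contact": ["about"], "about": ["team"], "team": ["blog"]}
    print(propagate_locations(links, {"contact": {"19103"}}))
```

Because each page keeps a set of identifiers, a page within range of two different location pages ends up with both, which fits the point that more than one location can be associated with a document.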


Forward links and in-bound links

That describes the process for pages linked from the page with location information on it. The same process happens with pages that link to the page with the geographical identifying information (backlinks).


A potential addition:

Relevant links and link distances are calculated for documents which don't contain the geographical location information. Each of those pages collects a measure of relevance based upon those distances, and that measure is added together for all neighboring documents that may contain geographical information. So, if a page is linked to or from a number of pages through relevant anchor text or URLs, it may be determined to be more relevant for the geographical information on those other pages.
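
The application summary doesn't say how the measure is computed, so the following is purely an assumed example: each neighboring page with location information contributes 1 divided by its link distance, and the contributions are summed per location. The `location_scores` name and the 1/distance weighting are mine, chosen only to show the "added together" idea.

```python
from collections import defaultdict


def location_scores(distances, page_locations):
    """Sum a distance-based weight per location for one target page.

    distances: dict mapping a neighboring geo page to its link distance (>= 1).
    page_locations: dict mapping each geo page to its location identifier.
    """
    scores = defaultdict(float)
    for geo_page, hops in distances.items():
        # Assumed weighting: closer pages count for more.
        scores[page_locations[geo_page]] += 1.0 / hops
    return dict(scores)


if __name__ == "__main__":
    distances = {"store-a": 1, "store-b": 2, "hq": 2}
    places = {"store-a": "Philadelphia", "store-b": "Philadelphia", "hq": "Trenton"}
    print(location_scores(distances, places))
```

With those toy numbers, two nearby Philadelphia pages outweigh a single Trenton page at the same distance as the farther one.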



As mentioned above, more than one location can be associated with a document.

The link above to the patent application describing Google Maps is a lot more readable after working through this patent application first. Both share a few concepts, and the Maps application includes more details on geographical location identifiers.

#2 bragadocchio

Posted 19 August 2005 - 09:25 AM

As I was reading this patent application, I was asking myself why such an effort would go towards trying to better understand the location of a site and the pages on it when there was such a push from the major search engines towards developing local search.

The thought of "invisible tabs" struck me: the idea that people don't like to switch from one type of search to another, and ignore some of the different types of searches that they could do at Google or Yahoo!. But was there something more? After all, it is possible that people could become used to using a Google Local search, and come to love those tabs. Is there a problem with where the information from local search is being collected? Maybe.

I found this paper on geo targeting which offered an idea or two on the subject:

Design and Implementation of a Geographic Search Engine (pdf)

One of them is that local search tends to heavily favor commercial interests and commercial usages. So, if your interest is in finding a local chess club, or museum or park, or even a small business, those local search options in Google or Yahoo! might not be as helpful. In the paper, we are told that:

"Business directories (yellow pages) map businesses and associated web sites to addresses, and thus to geographic positions. Some geographic search engines such as those of Google and Yahoo [15, 24] appear to make heavy use of business directories. The main problem with business directories is also their biggest strength. They require registration fees, and thus usually list mainly commercial companies, ignoring many personal or non-profit web sites. The fees however also often result in higher data quality."


Another issue that the patent application considers, and notes as one of the failings of search engines that collect geographical information, is that while a location may be updated on a web site, it tends to be updated less frequently and less accurately on other sites that may contain that location information, such as online yellow pages and regional and topical directories and portals.

"Web directories such as Yahoo and ODP maintain geographic directories that categorize sites by region. They are difficult to maintain, far from complete, and often outdated. However, they can be useful as an additional data source in geo coding."


The paper notes in its conclusion that:

"Beyond this, there are many exciting open problems for future research in this area. On the most general level, many aspects of Web search and information retrieval, such as ranking functions, categorization, link analysis, crawling strategies, query processing, and interfaces, need to be reevaluated and adapted for the purpose of geographic search."


The patent application from Google seems to address some of those issues.

#3 bragadocchio

Posted 28 August 2005 - 07:24 PM

A somewhat different approach to the subject of location and queries, as noted by Xan of Search Science, in a Search Engine Roundtable comment about this post:

Detecting Dominant Locations from Search Queries

What kind of problems do queries like “denzel washington” and “kentucky fried chicken” present? Those are two of the examples cited in the paper. Here is what the paper says it contributes:

● A formal definition of query’s dominant location (QDL), and discussions on why it is important to search relevance. We also stated the differences and relationship between QDL and queries’ local search intention.

● A novel solution that detects QDLs from queries both with and without location keywords using a combination of data sources as necessary. Our solution effectively suppresses false positives and false negatives.

● A classification system that categorizes search queries into four distinctive types by presence of location keywords and QDL. We labeled a large number of MSN Search queries covering all query frequencies, and studied query distributions by our types in different frequency ranges.

● A large-scale evaluation of our QDL solution using these labeled queries. For performance, we report the precision, recall, Micro-F1, and error rates of our QDL detection across all queries as well as for different query frequency ranges and different query types. We also report the computational time cost for each of the test we ran. Our results show that our QDL detection performs consistently over all query frequency ranges and outperforms a dictionary look-up method and Google.



#4 earlpearl

earlpearl

    Hall of Fame

  • Hall Of Fame
  • 1597 posts

Posted 21 November 2005 - 09:50 PM

Wow! Did I say Wow! :) :)

What a post.

I primarily work in local search, primarily for my business which is a local business. I've been very attuned to local issues but I was unaware of that patent.

It explains a lot. My business covers 3 jurisdictions: 2 states and a major city. This year, we have been losing local visibility (serps) for my service in the jurisdictions where we aren't located. It's because of the location information. That explains a lot.

I've posted examples of this phenomenon in various forums on various related topics but had not been pointed to this before.

Geez. I need 2 more addresses.

In Feb/March of this year, one aspect of the changes Google made was a much more significant orientation towards, and better serps for, sites that do a reasonable job of defining their addresses/locations with all or some of the locational references described in the patent above.

Thank you so much for bringing this up. I haven't seen this anywhere else.

Dave

#5 earlpearl

Posted 06 December 2005 - 04:44 PM

Bill:

Those are great posts. This is an issue that has largely been ignored. I believe I may have at least one example where the google algo relative to location is dramatically used; to be described below.

But before the example: my experience with my own and other websites that lend themselves to local search, plus the commentary of still other webmasters who follow this, is simply that usage of local search is dramatically low at this point.

In my own example, my business site is dramatically local in its flavor. It ranks high for generic industry terms but still higher for local terms.

Having looked at two months of recent data I totaled about 1600 searches from all engines with a relevant local geo phrase and a relevant industry phrase.

By comparison, there were a total of 27 searches from Google, MSN, and Yahoo! local search. Webmasters focused on local search report similar statistics or even lower percentages of use of local search.

To date local search is rarely used.

The example is interesting

The site is new as of June 2005. It has one external link, from my site with anchor text like this: Dog Walking services Pennsylvania Philadelphia New Jersey

The site has address information as per the google patent for a Pennsylvania address.

Obviously the phrase is obscure.

Current rankings are as follows:

Google:
Dog Walking Services Pennsylvania: #1, and #1 in allinanchor
Dog Walking Services New Jersey: #228, and #1 in allinanchor
Dog Walking Services Philadelphia: #201, and #1 in allinanchor

Yahoo: The site ranks #1 for Dog Walking Services Pennsylvania and Dog Walking Services New Jersey and #3 for Dog Walking Services Philadelphia

MSN: The site is ranked first for all 3 phrases.


I've posed this question and situation elsewhere, but this appears to focus on the Google ranking phenomenon.

Dave

#6 bragadocchio

Posted 06 December 2005 - 08:44 PM

Hi Dave,

Thanks for your kind words.

"In Feb/March of this year, one aspect of the changes google establishes were a much more significant orientation and better serps for sites that do a reasonable job of defining their addresses/locations with all or some of the locational references described in the patent above."


It's good to get some additional confirmation of the use of the ideas in this patent.

I've seen a few sites see some significant gains in traffic after suggesting that they add geographic location information to their pages.

With a site tied to a location, or a handful of locations, a multi-pronged approach may be helpful, too. One prong is to include geographical information in a manner like that described in the patent application above.

Another is to feature locations in sections of the site in a meaningful manner. A third is to make sure that you have other sites mention, and possibly point to your site, and include address information about it (regional directories, etc.)

Regardless of the patent, I think that there is often a benefit to including contact information on every page of a site that may be tied to a specific location. It can add a level of credibility to the page, by showing people that there is a real address connected to the site. But the patent describes some ways where it might be possible not to have that information on every page, and still have location tied to those pages.

When part of your objective is to get people to visit a physical location, and equate that location with your business, I think that there are a number of steps that you can take to make sure people make that association when they visit your site. Fortunately, those also follow many of the ideas in this patent application, too.

I'm not sure that a separate "local" tab on a search engine will be as effective for desktop computers as some might think they will be, but local search may be more suited for handhelds - smart phones, PDAs, tablet PCs, etc. There are some issues with the way search engines are populating information in their local search databases, as I mentioned in my second post. But combining efforts for local search with an increased attention to the way geographical location information is placed on a site can be helpful.

I do think that when place is so important to a business, it needs to be featured on the site in a number of ways to see success in a number of search engines. If you have one location, a cluster of pages about that location and what you do there can be helpful. If you have a few locations, clusters of information about each of them can aid you.

For instance, imagine that you have a site for a technical school with five different locations in five different states. The different campuses share a fair amount of information, such as admission policies, types of courses, costs, etc. They also have enough differences to merit individual sections for each campus.

Each of those might have a page or more that tours the campus, and the community around the buildings those are in. Each might have detailed directions pages telling how to get to each campus from different directions. Since students will either live on campus, or nearby, each cluster will tell them about the regional areas that each is in, including places to go when they aren't in class, and things to do. Regionally relevant places and events might be described on those pages (landmarks that are tied to the locations).

What other type of information about those locations might be helpful, or useful, or interesting to potential students? What else is going on in the community, and what role does the individual campus play in some of those events?

For the local search directories, instead of pointing to the front of a site which deals with more than one location, those directories can have multiple entries that point at the different clusters.

I think this three-pronged attack - use geographic location information intelligently, feature locations in part of the site, and use regional directories to point to those featured sections - can be a good way to move forward to make searches based upon location more effective.

#7 bragadocchio

Posted 08 December 2005 - 11:09 PM

Three more patent applications from Google were published today describing how Google Maps may work, with a little insight in one of them on how Google Local information might integrate with Google Maps.

These first two look more closely at the technology and methodology behind the mapping:

Generating, storing, and displaying graphics using sub-pixel bitmaps

Generating and serving tiles in a digital mapping system

The third one looks more at the bigger picture:

Digital mapping system

Most of it deals with the technology behind mapping, overlays, quality printing of maps, and other algorithms that make mapping work well.

The application does also include a little information on how the map might work for these types of queries:

Location queries - for instance, for a particular city

Local search queries - queries containing a business name, or category, or other set of search terms, but not including geographic locations.

Qualified local search queries - search terms and geographical locations are included.

Driving directions queries - two geographical locations are included in the search.
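
To show how those four query types differ, here is a toy classifier. It is only a sketch of the distinctions listed above, not anything taken from the application: the `classify_query` function, the tiny `KNOWN_PLACES` gazetteer, and the list of connective words are all assumptions, and matching place names by simple substring is far too crude for real use.

```python
# Toy gazetteer (assumption); a real system would use a full place database.
KNOWN_PLACES = {"palo alto", "san francisco", "philadelphia"}

# Connective words ignored when checking whether search terms remain.
CONNECTIVES = {"in", "near", "to", "from"}


def classify_query(query):
    """Label a query with one of the four query types described above."""
    text = " ".join(query.lower().split())
    places = [place for place in sorted(KNOWN_PLACES) if place in text]
    remainder = text
    for place in places:
        remainder = remainder.replace(place, " ")
    terms = [word for word in remainder.split() if word not in CONNECTIVES]
    if len(places) >= 2:
        return "driving directions"
    if len(places) == 1:
        return "qualified local search" if terms else "location"
    return "local search"


if __name__ == "__main__":
    for q in ["Palo Alto", "pizza", "pizza in palo alto", "san francisco to palo alto"]:
        print(q, "->", classify_query(q))
```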

The application explains, amongst some other things, why, if you search for "pizza in palo alto," one Palo Alto pizza parlor will not be scored differently than another because the one is closer to the center of Palo Alto than the other.

I suspect that we will see more from Lars and Jens Rasmussen on maps and geographical and local information in the near future. If I see some more, I will try to add it here.

#8 bragadocchio

Posted 31 December 2005 - 04:13 PM

Here is another addition to this series of patent applications. This one is a little more exciting than some of the past few for a number of reasons:



1. It includes the idea of using actual landmarks in driving directions.
2. It incorporates Google Earth images, and pictures of places along the route.
3. It enables people who use the service to provide feedback via forms and GPS.
4. It allows for advertising as one of the "waypoints" along a journey.

See:

Visually-oriented driving directions in digital mapping system

Inventors: Andrew R. Golding and Jens Eilstrup Rasmussen
United States Patent Application 20050288859
Published December 29, 2005
Filed: July 13, 2005

This is one patent application that I hope to see implemented. Landmarks in driving directions would be great.


