Previous Moderator/Hall of Fame
Group: Hall Of Fame
Joined: 31-August 02
Posts: 15,634
Aug 18 2005, 11:16 PM
Assigning Geographic Locations to Web Pages
Lars Eilstrup Rasmussen works in Google's Sydney office and was a lead engineer on the team that created Google Maps. He and his brother, Jens Eilstrup Rasmussen, founded the mapping startup Where 2 Technologies, which Google acquired in October 2004. Together they wrote a patent application which describes a process for assigning geographical information to web pages. It was published earlier today.

If you find the following patent application interesting, you may also enjoy this one: System for automatically integrating a digital map system

United States Patent Application 20050182770
August 18, 2005
Assigning geographic location identifiers to web pages

QUOTE
Abstract

A system and method for assigning geographic location identifiers to web documents may include identifying a set of web documents. A geographic location identifier included within a first web document in the set of web documents may be identified. The identified geographic location identifier may be assigned to a second web document in the set of web documents based on a relevancy of the first web document to the second web document.

--------------------------------------------------------------------------------

Inventors: Lars Eilstrup Rasmussen and Jens Eilstrup Rasmussen

The patent application:

1. identifies a number of web pages,
2. looks for location information within those pages,
3. assigns locations to pages which include geographic information, and
4. assigns locations to pages "relevant" to those pages that include geographical information.

Reasons for the patent:

Keyword-based search engines have failed to geographically define web pages when trying to use:

1. Search engine manual assignment of locations to pages
2. Site owner manual assignment of locations to pages
3. Geographic meta tags
4. Search engine assignment of locations based on postal addresses appearing on the same pages as the keywords
Assignment of geographic location identifiers

"Geographic location identifiers" found on web pages can be assigned to other pages, which might or might not include geographic identifiers of their own, after relevancy criteria are looked at. This allows pages without location information to be included in a geography-based search.

Those relevancy factors may include:

1. the relative distance between documents,
2. the terminology used, and
3. whether the page is on the same site.

A geographic location identifier may be:

1. a partial or complete postal address,
2. a telephone number,
3. an area code,
4. an airport code,
5. a landmark identifier,
6. another value tied to a physical location, such as longitude and latitude, or
7. an identifier based upon hyperlinks between pages without geo information and seemingly related pages which do have location information.

Other documents, such as directories, may be useful in associating location identifiers. Pattern matching may be used to associate documents with locations by examining text that matches standard formats for addresses and other information that tends to describe location.

Standardization

Those location identifiers may then be standardized into a common, predefined format.

Example 1: Addresses without zip codes may have the appropriate zip code added.
Example 2: Misspellings and other possible errors that can be identified may be corrected.

These standardized formats may include a number of categories, such as:

1. street number,
2. street name,
3. street type,
4. city,
5. state,
6. county,
7. country,
8. zip code, etc.

How assignment works

After standardization (data correction, supplementation, and other standardization methods), the location identifier may be assigned to the pages on which the information appears. An identifier may also be associated with documents that have no identifier assigned, or with documents that already have that identifier or a different one (some pages may be associated with more than one location).
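The pattern matching and standardization steps described above can be sketched roughly as follows. This is only an illustration: the regular expression, the `STREET_TYPES` table, the tiny `ZIP_LOOKUP` table, and the function names are all assumptions made for this example, not anything taken from the patent application, and a real system would need far broader coverage (including the misspelling correction mentioned above, which is not attempted here).

```python
import re

# Illustrative pattern for one US-style address layout. This regex is an
# assumption for the sketch; the patent application does not specify one.
ADDRESS_RE = re.compile(
    r"(?P<street_number>\d+)\s+"
    r"(?P<street_name>[A-Za-z]+(?:\s[A-Za-z]+)*?)\s+"
    r"(?P<street_type>Street|St|Avenue|Ave|Road|Rd|Boulevard|Blvd)\.?,\s*"
    r"(?P<city>[A-Za-z]+(?:\s[A-Za-z]+)*),\s*"
    r"(?P<state>[A-Z]{2})\b"
    r"(?:\s+(?P<zip_code>\d{5}))?"
)

# Hypothetical lookup tables used to standardize what was matched.
STREET_TYPES = {
    "st": "Street", "street": "Street",
    "ave": "Avenue", "avenue": "Avenue",
    "rd": "Road", "road": "Road",
    "blvd": "Boulevard", "boulevard": "Boulevard",
}
# Toy table: real cities have many zip codes, so a real lookup would need
# the street as well. One entry is enough to show the idea.
ZIP_LOOKUP = {("Palo Alto", "CA"): "94301"}


def find_addresses(text):
    """Return one dict of raw address fields per match found in the text."""
    return [match.groupdict() for match in ADDRESS_RE.finditer(text)]


def standardize(raw):
    """Put a raw match into a common format and fill in a missing zip code."""
    record = dict(raw)
    record["street_type"] = STREET_TYPES[raw["street_type"].lower()]
    record["city"] = raw["city"].title()
    if not record["zip_code"]:
        # Supplement the identifier when the page left the zip code out.
        record["zip_code"] = ZIP_LOOKUP.get((record["city"], record["state"]))
    return record
```

For example, `standardize(find_addresses("Visit us at 165 University Ave, Palo Alto, CA today.")[0])` expands "Ave" to "Avenue" and fills in the zip code from the toy table.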
That assignment may be made by giving each page a location associated with a page that is linked to it, either directly or indirectly (through a predetermined number of links).

Once an association has been made, the identifiers could be used in finding other associated pages or in ranking search results. Search results which include the pages may also show the assigned location to users. Associations and disassociations of locations can happen as a collection of documents is reviewed.

The first assumption is that if a page has location information on it, it is associated with that location. The process may begin by identifying, for each page, other pages that include a geographic location identifier and are "relevant" to that page from a geographic identification standpoint.

Defining relevant documents

Documents may be defined as "relevant" where:

1. the pages are on the same web site, and
2. the anchor text of the link on the page with location information that leads to the other page contains one or more terms from a small rule-based set of terms.

Those "relevant" terms may include, for example:

1. location(s),
2. direction(s),
3. find,
4. finder,
5. locate,
6. locater,
7. store(s),
8. branch(es),
9. about,
10. company,
11. contact,
12. information, etc.

A document could also be considered relevant if the anchor text pointing to it includes a complete or partial postal address. For images or other non-text anchors, a linked page may be relevant if the URL in the link includes either a complete or partial postal address or one of the above "relevant" terms.

A page could also be considered relevant by examining the contents of the page directly. A link failing the above tests may be considered "relevant" if the HTML title of the target document includes any of the "relevant" terms, or a complete or partial postal address. These types of titles would probably be included in the first pass through all of the collected documents.
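The relevance tests described above could be sketched like this. It is a minimal illustration under several assumptions: the function names, the crude partial-address pattern, and the choice to require the same-site condition for every test are mine, not the patent application's, which leaves room for other rules.

```python
import re
from urllib.parse import urlparse

# The example terms listed above, with plural forms spelled out.
RELEVANT_TERMS = {
    "location", "locations", "direction", "directions", "find", "finder",
    "locate", "locater", "store", "stores", "branch", "branches",
    "about", "company", "contact", "information",
}

# Crude stand-in for "a complete or partial postal address" (an assumption).
PARTIAL_ADDRESS_RE = re.compile(
    r"\d+\s+[A-Za-z]+\s+(?:Street|St|Avenue|Ave|Road|Rd|Boulevard|Blvd)\b",
    re.IGNORECASE,
)


def has_location_cue(text):
    """True if the text holds a relevant term or something address-like."""
    # Turn punctuation (including URL separators) into spaces first.
    cleaned = re.sub(r"[^A-Za-z0-9]+", " ", text)
    words = set(cleaned.lower().split())
    return bool(words & RELEVANT_TERMS) or bool(PARTIAL_ADDRESS_RE.search(cleaned))


def same_site(url_a, url_b):
    """Treat two URLs as the same site when their host names match."""
    return urlparse(url_a).netloc.lower() == urlparse(url_b).netloc.lower()


def link_is_relevant(source_url, target_url, anchor_text="", target_title=""):
    """Apply the anchor text, URL, and title tests to a single link."""
    if not same_site(source_url, target_url):
        return False
    if anchor_text.strip():
        if has_location_cue(anchor_text):
            return True
    elif has_location_cue(urlparse(target_url).path):
        # Image or other non-text anchor: fall back to the URL in the link.
        return True
    # Last resort: the HTML title of the target document.
    return has_location_cue(target_title)
```

With this sketch, a same-site link with the anchor text "Store locations" counts as relevant, while a link with the anchor text "Our products" only counts if the target page's title carries one of the terms or an address.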
Other rules may be used to determine if the target document makes a hyperlink "relevant".

Looking at distance

After a relevant page has been identified, the number of links between it and the page with the location on it is looked at. One version of the invention uses a range of 2 to 5 links. If the distance is greater, the next relevant document is reviewed. If that one is within the right number of links, it may be associated with the initial document with location information. That process continues until all relevant documents are reviewed.

Forward links and in-bound links

That describes the process for pages linked from the page with location information on it. The same process happens with pages that link to (backlinks) the page with the geographical identifying information.

A potential addition: Relevant links and link distances are calculated for documents which don't contain the geographical location information. Each of those pages collects a measure of relevance based upon those distances, and that measure is added together for all neighboring documents that may contain geographical information. So, if a page is linked to or from a number of pages that use relevant anchor text or URLs, it may be determined to be more relevant for the geographical information on those other pages.

As mentioned above, more than one location can be associated with a document.

The patent application describing Google Maps, linked above, is a lot more readable after working through this patent application first. Both share a few concepts, and the Maps application includes more details on geographical location identifiers.
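The link distance check and the summed relevance measure described in this post can be sketched as a breadth-first walk over relevant links. Everything specific here is an assumption for illustration: the function name, the `max_links` default of 3 (picked from within the 2 to 5 range mentioned above), and especially the `1 / distance` weighting, since the description only says the measure is based on distance and summed.

```python
from collections import defaultdict, deque


def propagate_locations(relevant_links, page_locations, max_links=3):
    """Score pages for locations found on nearby pages.

    relevant_links maps a page to the pages it links to through links
    already judged "relevant". page_locations maps a page to the location
    identifiers found on it. Returns {page: {location: score}}.
    """
    # Forward links and backlinks are treated alike, so build an
    # undirected neighbor map.
    neighbors = defaultdict(set)
    for source, targets in relevant_links.items():
        for target in targets:
            neighbors[source].add(target)
            neighbors[target].add(source)

    scores = defaultdict(lambda: defaultdict(float))
    for start, locations in page_locations.items():
        # Breadth-first search gives the fewest links from the located page.
        distance = {start: 0}
        queue = deque([start])
        while queue:
            page = queue.popleft()
            if distance[page] == max_links:
                continue  # do not walk past the allowed number of links
            for nxt in neighbors[page]:
                if nxt not in distance:
                    distance[nxt] = distance[page] + 1
                    queue.append(nxt)
        for page, dist in distance.items():
            if dist == 0:
                continue  # the located page itself
            for location in locations:
                # Assumed weighting: closer pages contribute more, and
                # contributions from several located pages add up.
                scores[page][location] += 1.0 / dist
    return {page: dict(by_location) for page, by_location in scores.items()}
```

A page reachable from two different pages that carry the same address ends up with a higher score for that address than a page reachable from only one, which matches the idea that several relevant neighbors make a page more relevant for that location.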
Aug 19 2005, 09:25 AM
As I was reading this patent application, I asked myself why so much effort would go towards better understanding the location of a site and the pages on it when the major search engines were pushing so hard to develop local search.

The thought of "invisible tabs" struck me: the idea that people don't like to switch from one type of search to another, and so ignore some of the different types of searches that they could do at a Google or Yahoo!. But was there something more? After all, it is possible that people could become used to using a Google Local search, and come to love those tabs.

Is there a problem with where the information from local search is being collected? Maybe. I found this paper on geo targeting which offered an idea or two on the subject:

Design and Implementation of a Geographic Search Engine (pdf)

One of them is that local search tends to heavily favor commercial interests and commercial usages. So, if your interest is in finding a local chess club, museum, park, or even a small business, those local search options in Google or Yahoo! might not be as helpful. In the paper, we are told that:

QUOTE
Business directories (yellow pages) map businesses and associated web sites to addresses, and thus to geographic positions. Some geographic search engines such as those of Google and Yahoo [15, 24] appear to make heavy use of business directories. The main problem with business directories is also their biggest strength. They require registration fees, and thus usually list mainly commercial companies, ignoring many personal or non-profit web sites. The fees however also often result in higher data quality.

Another issue that the patent application considers, and notes as one of the failings of search engines that collect geographical information, is that while a location may be updated on a web site, it tends to be updated less frequently and less accurately on other sites that may contain that location information, such as online yellow pages and regional and topical directories and portals.

QUOTE
Web directories such as Yahoo and ODP maintain geographic directories that categorize sites by region. They are difficult to maintain, far from complete, and often outdated. However, they can be useful as an additional data source in geo coding.

The paper notes in its conclusion that:

QUOTE
Beyond this, there are many exciting open problems for future research in this area. On the most general level, many aspects of Web search and information retrieval, such as ranking functions, categorization, link analysis, crawling strategies, query processing, and interfaces, need to be reevaluated and adapted for the purpose of geographic search.

The patent application from Google seems to address some of those issues.
Aug 28 2005, 07:24 PM
A somewhat different approach to the subject of location and queries, as noted by Xan of Search Science in a Search Engine Roundtable comment about this post:

Detecting Dominant Locations from Search Queries

What kinds of problems do queries like “denzel washington” and “kentucky fried chicken” present? Those are two of the problems cited in the paper. Here's what we are told that it describes:

QUOTE
● A formal definition of query’s dominant location (QDL), and discussions on why it is important to search relevance. We also stated the differences and relationship between QDL and queries’ local search intention.
● A novel solution that detects QDLs from queries both with and without location keywords using a combination of data sources as necessary. Our solution effectively suppresses false positives and false negatives.
● A classification system that categorizes search queries into four distinctive types by presence of location keywords and QDL. We labeled a large number of MSN Search queries covering all query frequencies, and studied query distributions by our types in different frequency ranges.
● A large-scale evaluation of our QDL solution using these labeled queries. For performance, we report the precision, recall, Micro-F1, and error rates of our QDL detection across all queries as well as for different query frequency ranges and different query types. We also report the computational time cost for each of the test we ran. Our results show that our QDL detection performs consistently over all query frequency ranges and outperforms a dictionary look-up method and Google.
Dec 8 2005, 11:09 PM
Three more patent applications from Google were published today describing how Google Maps may work, with a little insight in one of them, possibly, on how Google Local information integrates with Google Maps.

The first two look more closely at the technology and methodology behind the mapping:

Generating, storing, and displaying graphics using sub-pixel bitmaps
Generating and serving tiles in a digital mapping system

The third one looks more at the bigger picture:

Digital mapping system

Most of it deals with the technology behind mapping, overlays, quality printing of maps, and other algorithms that make mapping work well. The application also includes a little information on how the map might work for these types of queries:

Location queries - for instance, for a particular city.
Local search queries - queries containing a business name, category, or other set of search terms, but not including geographic locations.
Qualified local search queries - search terms and geographical locations are both included.
Driving directions queries - two geographical locations are included in the search.

The application explains, amongst other things, why, if you search for "pizza in palo alto," one Palo Alto pizza parlor will not be scored differently from another just because it is closer to the center of Palo Alto.

I suspect that we will see more from Lars and Jens Rasmussen on maps and geographical and local information in the near future. If I see some more, I will try to add it here.
Dec 31 2005, 04:13 PM
Here is another addition to this series of patent applications. This one is a little more exciting than some of the past few, for a number of reasons:

1. It includes the idea of using actual landmarks in driving directions.
2. It incorporates Google Earth images, and pictures of places along the route.
3. It enables people who use the service to provide feedback via forms and GPS.
4. It allows for advertising as one of the "waypoints" along a journey.

See: Visually-oriented driving directions in digital mapping system

Inventors: Andrew R. Golding and Jens Eilstrup Rasmussen
United States Patent Application 20050288859
Published December 29, 2005
Filed: July 13, 2005

This is one patent application that I hope to see implemented. Landmarks in driving directions would be great.