Moderator Alumni
Group: Hall Of Fame
Joined: 31-August 02
Posts: 15,634
Aug 14 2005, 10:41 PM
Hi norbiu,
I can give you an idea or two about Google's News algorithms, based on what has come out in interviews and other sources about Google News.

From a rediff.com article, "Google wants to be part of journalism's future," from October 18, 2002:

QUOTE
At the heart of his programme is a clustering algorithm, which functions like a librarian or clipping service, by searching out, matching and collecting articles based on one's reading interest.
Loosely explained, a clustering algorithm is a mathematical procedure that finds similarities between elements and groups them. It examines articles from different sources, analysing factors such as an article's information, page rank, and timeliness. ... What is unique about the programme is that it scans the full text of the articles, rather than just headlines, allowing it to analyse and group stories according to the complete content.

Online Journalism Review has a September 25, 2003 interview with Google News inventor Krishna Bharat, which discusses some of the different challenges in developing the service. We learn a little more there:

QUOTE
KB: There's a whole field of study called "information retrieval," which deals with text analysis -- trying to find which documents match the query, which documents match other documents. So I drew on a lot of technical work that I knew of in order to make this happen. I had to bring in a lot of intuition specific to the news domain to try to bring in diverse articles. ... In general, we use a number of techniques. One of them is the fact that the newspaper is local, but we can't overemphasize that. We throw all of these different criteria in the mix. ... On the home page, we make a conscious effort not to duplicate. On the results page, you're supposed to get the non-duplicates first and the duplicates afterwards. On the home page, we try to be a little bit more picky.

A few more insights are revealed at an AI conference that looks at the artificial intelligence built into Google News, with Google Senior Research Scientist Mehran Sahami:

QUOTE
"AI applications are using the infrastructure to get people useful information in interesting ways," according to Sahami. For example, "Google News is automatically generated from 4,500 news sources each 15 minutes, using several AI techniques (such as clustering, automatic image extraction, and autonomous categorization by topic areas)" says Sahami.
"There is no human intervention." Google News is an example of where AI is making a huge difference. It's used several million times a day."

A July 2003 interview with Krishna Bharat provides a few more ideas on how the news search works:

QUOTE
Q: Are all news sources treated the same, or are some given higher priority based on PageRank™ or other criteria? If so, what are the criteria?
A: As with Google WebSearch, Google News employs many different metrics for determining the relative importance of web pages. PageRank is one of these factors, but the exact mix of determinants is part of our secret sauce and not something we're able to discuss in detail. We can say that Google News also integrates other attributes, such as the recency of the content, to help determine which stories get the most prominence.

Q: According to the Google News website, "information [is] automatically arranged to present the most relevant news first." How is relevance determined?

A: As with Google's web ranking, relevance is determined by information retrieval techniques that look at the distribution of words in the article and surrounding pages on the web. If the article matches the query well it is deemed relevant and gets a high score. Other factors include the importance of the source, timeliness of the article, and importance of the news story, relative to other stories in the news currently.

There are a couple of patent applications that have been identified as being from Google, which cover the aggregation of news sources and news search.

Systems and methods for improving the ranking of news articles

In this patent application, the decision about which results to show first may be based in part on the "quality" of a source. So, how is this "quality" determined? Here's a rough breakdown, as provided by the patent application. Note that these factors may or may not be in use. We just don't know.
QUOTE
One or more metric values based at least in part on at least one of

* a number of articles produced by the source during a first time period,
* an average length of an article produced by the source,
* an amount of important coverage that the source produces in a second time period,
* a breaking news score,
* an amount of network traffic to the source,
* a human opinion of the source,
* circulation statistics of the source,
* a size of a staff associated with the source,
* a number of bureaus associated with the source,
* a number of original named entities in a group of articles associated with the source,
* a breadth of coverage by the source,
* a number of different countries from which network traffic to the source originates, and
* a writing style used by the source;

and determining a quality value for each source of the plurality of sources based at least in part on the determined one or more metric values for the source.

There's also a personalization element to Google News, as described in this patent application:

Systems and methods for personalizing aggregated news content

The "customize this page" feature now in Google News search goes a long way towards implementing this patent application, but doesn't include some of the elements described, such as these:

QUOTE
Additionally, the user may indicate that a certain kind of news source (e.g., New York Times, sources in USA, etc.) may be preferred or not preferred. Also, the user may provide general keywords that are of interest to the user (e.g., San Francisco) and stories with these keywords should be boosted. Further, the user can list journalists they like or do not like or genres they like or do not like (e.g., opinion/commentary vs. breaking news vs. briefs vs. full coverage).

I'm not sure if any of those give you an idea of what to do to make your articles more prominent in Google News searches, but they might.
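
For what it's worth, here is a rough Python sketch of how a "quality value" per source could be derived from a few of the metrics listed in the first patent application. This is only an illustration of the general idea. The metric names, the weights, the min-max normalization, and the weighted sum are all my own assumptions; the material quoted above doesn't say how the metrics are actually combined, or whether they are in use at all.

```python
from dataclasses import dataclass

# Hypothetical metric names, loosely following the quoted patent list.
# The weights are invented for illustration only.
WEIGHTS = {
    "articles_per_period": 0.3,
    "avg_article_length": 0.2,
    "breaking_news_score": 0.3,
    "human_opinion": 0.2,
}


@dataclass
class Source:
    name: str
    metrics: dict  # metric name -> raw value


def normalize(sources, metric):
    """Scale one metric to the 0..1 range across all sources (min-max)."""
    values = [s.metrics.get(metric, 0.0) for s in sources]
    if not values:
        return {}
    lo, hi = min(values), max(values)
    span = hi - lo
    return {
        s.name: (s.metrics.get(metric, 0.0) - lo) / span if span else 0.0
        for s in sources
    }


def quality_values(sources, weights=WEIGHTS):
    """Return {source name: quality value} as a weighted sum of normalized metrics."""
    scores = {s.name: 0.0 for s in sources}
    for metric, weight in weights.items():
        for name, value in normalize(sources, metric).items():
            scores[name] += weight * value
    return scores


def rank_sources(sources, weights=WEIGHTS):
    """Source names ordered from highest to lowest quality value."""
    scores = quality_values(sources, weights)
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Made-up example sources and numbers.
    sources = [
        Source("small-blog", {"articles_per_period": 20, "avg_article_length": 300,
                              "breaking_news_score": 0.1, "human_opinion": 0.4}),
        Source("big-daily", {"articles_per_period": 400, "avg_article_length": 900,
                             "breaking_news_score": 0.8, "human_opinion": 0.9}),
    ]
    print(rank_sources(sources))
```

The point of the sketch is just that a source scoring well across several of these signals would float to the top, which fits the idea that stories from "quality" sources may be shown first.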

Moderator Alumni
Group: Hall Of Fame
Joined: 31-August 02
Posts: 15,634
Aug 14 2005, 11:09 PM
I think that you get it now.
QUOTE(projectphp)
Despite what anyone tells you, Google put a lot of editorial control on News.google.com, and my guess would be that, unless you are triple approved by everyone including Sergey, you will have buckley's of getting on those pages.

I'll buy that. We see this line in the first patent application:

QUOTE
a human opinion of the source,

And Krishna Bharat also notes in one of those snippets that I quoted above:

QUOTE
On the home page, we try to be a little bit more picky.

This study is probably worth mentioning (as described by Search Engine Watch):

Google News Study Finds Bias But Not Favoritism -- But Study Also Has Flaws