> Tips on getting on the Google News homepage

Member

Group: Members
Joined: 25-April 05
Posts: 29
post Aug 14 2005, 07:41 PM
Does anybody know how the algorithms work? The Google News FAQ wasn't helpful. All of my articles are getting in there, but none appear on the home page or the Sci/Tech page.

Thanks!

Technical Administrator

Group: Technical Administrators
Joined: 3-February 03
Posts: 3,926
From: Sydney Australia
post Aug 14 2005, 10:26 PM
Your home page shouldn't be in News. No home page should.

Google News is a service that lists news stories. Your home page isn't a one story page, but a set of links to one page stories. They don't want lists of stories, they want the actual stories listed.

IMHO, that is just the way it works, and you should just move on.

Moderator Alumni

Group: Hall Of Fame
Joined: 31-August 02
Posts: 15,634
post Aug 14 2005, 10:41 PM
Hi norbiu,

I can give you an idea or two about Google's News algorithms based upon what has come out in interviews, and other sources about Google News:

From a rediff.com article, "Google wants to be part of journalism's future," dated October 18, 2002:

QUOTE
At the heart of his programme is a clustering algorithm, which functions like a librarian or clipping service, by searching out, matching and collecting articles based on one's reading interest.

Loosely explained, a clustering algorithm is a mathematical procedure that finds similarities between elements and groups them. It examines articles from different sources, analysing factors such as an article's information, page rank, and timeliness.  

...

What is unique about the programme is that it scans the full text of the articles, rather than just headlines, allowing it to analyse and group stories according to the complete content.
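
To make the idea of full-text clustering concrete, here is a minimal sketch. It is not Google's method, which has never been published in detail. It assumes a plain bag-of-words vector, cosine similarity, and a greedy single-pass grouping; the function names and the 0.5 threshold are made up for illustration.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words counts built from the full text, not just the headline."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_articles(articles, threshold=0.5):
    """Greedy single-pass clustering (an assumption, not Google's algorithm):
    an article joins the first cluster whose seed article is similar enough,
    otherwise it starts a new cluster."""
    clusters = []  # each cluster is a list of (index, vector) pairs
    for index, text in enumerate(articles):
        vector = term_vector(text)
        for cluster in clusters:
            if cosine(vector, cluster[0][1]) >= threshold:
                cluster.append((index, vector))
                break
        else:
            clusters.append([(index, vector)])
    return [[index for index, _ in cluster] for cluster in clusters]


articles = [
    "Nokia unveils the N91 music phone with a hard drive",
    "Apple moves the Mac to Intel processors",
    "Nokia N91 phone unveiled with a hard drive for music",
]
print(cluster_articles(articles))  # [[0, 2], [1]]
```

The two Nokia items share most of their words and land in one group, while the Apple item starts its own. A real system would also weigh timeliness and source, as the quote says.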


Online Journalism Review has a September 25, 2003 interview with Google News Inventor Krishna Bharat, which discusses some of the different challenges in developing the service. We learn a little more there:

QUOTE

KB: There's a whole field of study called "information retrieval," which deals with text analysis -- trying to find which documents match the query, which documents match other documents. So I drew on a lot of technical work that I knew of in order to make this happen. I had to bring in a lot of intuition specific to the news domain to try to bring in diverse articles.

...

In general, we use a number of techniques. One of them is the fact that the newspaper is local, but we can't overemphasize that. We throw all of these different criteria in the mix.

...

On the home page, we make a conscious effort not to duplicate. On the results page, you're supposed to get the non-duplicates first and the duplicates afterwards. On the home page, we try to be a little bit more picky.


A few more insights were revealed at an AI conference that looked at the artificial intelligence built into Google News, with Google Senior Research Scientist Mehran Sahami:

QUOTE
"AI applications are using the infrastructure to get people useful information in interesting ways," according to Sahami. For example, "Google News is automatically generated from 4,500 news sources each 15 minutes, using several AI techniques (such as clustering, automatic image extraction, and autonomous categorization by topic areas)" says Sahami. "There is no human intervention." Google News is an example of where AI is making a huge difference. It's used several million times a day."



A July, 2003, interview with Krishna Bharat provides a few more ideas on how the news search works:

QUOTE
Q: Are all news sources treated the same, or are some given higher priority based on PageRank™ or other criteria? If so, what are the criteria?

A: As with Google WebSearch, Google News employs many different metrics for determining the relative importance of web pages. PageRank is one of these factors, but the exact mix of determinants is part of our secret sauce and not something we're able to discuss in detail. We can say that Google News also integrates other attributes, such as the recency of the content, to help determine which stories get the most prominence.

Q: According to the Google News website, "information [is] automatically arranged to present the most relevant news first." How is relevance determined?

A: As with Google's web ranking, relevance is determined by information retrieval techniques that look at the distribution of words in the article and surrounding pages on the web. If the article matches the query well it is deemed relevant and gets a high score. Other factors include the importance of the source, timeliness of the article, and importance of the news story, relative to other stories in the news currently.
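
As a rough illustration of how the factors named in that answer (query match, source importance, timeliness) could be folded into one number, here is a sketch. The exact mix is, in Bharat's words, "secret sauce", so everything below is an assumption: the multiplicative formula, the six-hour half-life, and all function names are hypothetical.

```python
import re


def query_match(query, text):
    """Fraction of query terms that occur somewhere in the article text."""
    terms = set(re.findall(r"[a-z0-9]+", query.lower()))
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return len(terms & words) / len(terms) if terms else 0.0


def recency(age_hours, half_life_hours=6.0):
    """Exponential decay: the value halves every half_life_hours (assumed)."""
    return 0.5 ** (age_hours / half_life_hours)


def story_score(query, text, source_importance, age_hours):
    """Hypothetical combination: multiply the three factors together."""
    return query_match(query, text) * source_importance * recency(age_hours)


text = "Nokia unveils the N91 music phone"
fresh = story_score("nokia n91", text, source_importance=0.4, age_hours=1)
stale = story_score("nokia n91", text, source_importance=0.9, age_hours=24)
print(fresh > stale)  # True
```

With these made-up numbers, a fresh story from a modest source outscores a day-old story from a stronger one, which is the kind of trade-off the interview hints at.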


There are a couple of patent applications that have been identified as being from Google, which cover topics dealing with the aggregation of news sources and a news search.

Systems and methods for improving the ranking of news articles

In this patent application, some determination of which results to show first may be based upon the "quality" of a source. So, how is this "quality" determined?

Here's a rough breakdown, as provided by the patent application. Note that these may or may not be in use. We just don't know.

QUOTE
One or more metric values based at least in part on at least one of a number of articles produced by the source during a first time period,

* an average length of an article produced by the source,
* an amount of important coverage that the source produces in a second time period,
* a breaking news score,
* an amount of network traffic to the source,
* a human opinion of the source,
* circulation statistics of the source,
* a size of a staff associated with the source,
* a number of bureaus associated with the source,
* a number of original named entities in a group of articles associated with the source,
* a breadth of coverage by the source,
* a number of different countries from which network traffic to the source originates, and
* a writing style used by the source;

and determining a quality value for each source of the plurality of sources based at least in part on the determined one or more metric values for the source.
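
The patent application does not say how the metric values are combined into a quality value, so the following is only a sketch of one plausible reading: a weighted average over whichever metrics happen to be available for a source. The metric names, the 0-to-1 scaling, and the weights are all hypothetical.

```python
def source_quality(metrics, weights):
    """Weighted average of the metrics available for a source.

    Assumptions (not from the patent application): every metric is already
    scaled to the range 0..1, and the weights are chosen by hand.
    """
    used = {name: weight for name, weight in weights.items() if name in metrics}
    total_weight = sum(used.values())
    if total_weight == 0:
        return 0.0
    return sum(metrics[name] * weight for name, weight in used.items()) / total_weight


# Hypothetical weights and sources, for illustration only.
weights = {"breaking_news_score": 3.0, "breadth_of_coverage": 1.0, "human_opinion": 2.0}
wire_service = {"breaking_news_score": 0.9, "breadth_of_coverage": 0.8, "human_opinion": 0.7}
small_blog = {"breaking_news_score": 0.4, "breadth_of_coverage": 0.2}

print(source_quality(wire_service, weights) > source_quality(small_blog, weights))  # True
```

Averaging only over the metrics that are present matches the "at least one of" wording: a source with no human opinion on file is scored on what is known about it rather than penalized for the gap.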


There's also a personalization element to Google News, as described in this patent application:

Systems and methods for personalizing aggregated news content

The "customize this page" feature now in the Google News Search goes a long way towards implementing this patent application, but doesn't include some of the elements described, such as these:

QUOTE
Additionally, the user may indicate that a certain kind of news source (e.g., New York Times, sources in USA, etc.) may be preferred or not preferred. Also, the user may provide general keywords that are of interest to the user (e.g., San Francisco) and stories with these keywords should be boosted. Further, the user can list journalists they like or do not like or genres they like or do not like (e.g., opinion/commentary vs. breaking news vs. briefs vs. full coverage).
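
A small sketch of what that kind of boosting might look like, assuming a story already has some base score. The `Preferences` class, the boost and penalty factors, and the story fields are hypothetical and only illustrate the preferences the passage lists (sources and keywords; journalists and genres would work the same way).

```python
from dataclasses import dataclass, field


@dataclass
class Preferences:
    """Hypothetical container for the user preferences the passage describes."""
    preferred_sources: set = field(default_factory=set)
    avoided_sources: set = field(default_factory=set)
    keywords: set = field(default_factory=set)


def personalized_score(story, base_score, prefs, boost=1.5, penalty=0.5):
    """Scale a story's base score up or down according to the preferences."""
    score = base_score
    if story["source"] in prefs.preferred_sources:
        score *= boost
    if story["source"] in prefs.avoided_sources:
        score *= penalty
    text = story["text"].lower()
    if any(keyword.lower() in text for keyword in prefs.keywords):
        score *= boost
    return score


prefs = Preferences(preferred_sources={"New York Times"}, keywords={"San Francisco"})
local = {"source": "New York Times", "text": "San Francisco transit plan approved"}
other = {"source": "Example Gazette", "text": "Local election results"}

print(personalized_score(local, 1.0, prefs))  # 2.25
print(personalized_score(other, 1.0, prefs))  # 1.0
```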


I'm not sure if any of those give you an idea of what to do to make your articles more prominent in Google News searches, but they might.

Moderator Alumni

Group: Hall Of Fame
Joined: 31-August 02
Posts: 15,634
post Aug 14 2005, 10:43 PM
QUOTE
Your home page isn't a one story page, but a set of links to one page stories. They don't want lists of stories, they want the actual stories listed.


I think that you misunderstood. I believe norbiu meant that none of the articles were being placed on the home page of Google News, or the front page of the Sci/Tech news. :)

Technical Administrator

Group: Technical Administrators
Joined: 3-February 03
Posts: 3,926
From: Sydney Australia
post Aug 14 2005, 10:52 PM
Hehehe, maybe / probably I misunderstood! In fact, not sure I get it now ;)

In that case, that is even less likely to happen.

To be on the home page or the Sci/Tech page, a site needs to be extra super special. Think of a regular newspaper: big stories on the front page, feel-good stories buried in there somewhere. The News home page is reserved for CNN, NYTimes, WSJ, etc. The Sci/Tech page may allow Slashdot and The Register in, but that is probably about as "wild" as the sources get.

Despite what anyone tells you, Google put a lot of editorial control on News.google.com, and my guess would be that, unless you are triple approved by everyone including Sergey, you will have buckley's of getting on those pages.

If you are super mega ultra lucky and you run a story that is breaking elsewhere, like the OSX for PC stuff, you may get to be listed under a "big name" story, but I would guess that is as lucky as most sites will ever get.

Moderator Alumni

Group: Hall Of Fame
Joined: 31-August 02
Posts: 15,634
post Aug 14 2005, 11:09 PM
I think that you get it now. :)

QUOTE(projectphp)
Despite what anyone tells you, Google put a lot of editorial control on News.google.com, and my guess would be that, unless you are triple approved by everyone including Sergey, you will have buckley's of getting on those pages.


I'll buy that.

We see in the first patent application this line:

QUOTE
a human opinion of the source,


And Krishna Bharat also notes in one of those snippets that I quoted above:

QUOTE
On the home page, we try to be a little bit more picky.


This study is probably worth mentioning (as described by Search Engine Watch):
Google News Study Finds Bias But Not Favoritism -- But Study Also Has Flaws

Moderator Alumni

Group: Hall Of Fame
Joined: 7-November 02
Posts: 6,179
From: New England, USA
post Aug 15 2005, 06:27 AM
It really comes down to timing, more than anything else. Having a picture is a requisite, too. Even so, you probably are getting onto the front page (if the story in question, itself, appears on the front page, that is).

Basically, what appears on the front page is determined by the "hotness" of the story. The more places that have the story, the hotter it is. Then it'll usually show the site that "broke the story" (the first source of it) and an authority site or two (e.g., a mainstream news source). Everything else is pretty much time sensitive: newer is on the page, and older moves down into the rest of the list. So the trick, really, is to get your story posted at the time when the most people are going to see it.
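
For anyone who thinks better in code, the heuristic described above could be sketched roughly like this. It is only one poster's guess at the behaviour, not anything Google has confirmed; the function names, the story fields, and the example sources are invented.

```python
from datetime import datetime


def hotness(cluster):
    """Hotness guess: how many distinct sources carry the story."""
    return len({article["source"] for article in cluster})


def order_cluster(cluster, authorities):
    """Order one story's articles: the source that broke it first, then any
    authority sites, then everything else with the newest on top."""
    by_time = sorted(cluster, key=lambda article: article["published"])
    first, rest = by_time[0], by_time[1:]
    authority = [a for a in rest if a["source"] in authorities]
    others = [a for a in rest if a["source"] not in authorities]
    others.sort(key=lambda article: article["published"], reverse=True)
    return [first] + authority + others


cluster = [
    {"source": "Small Blog", "published": datetime(2005, 8, 15, 6, 0)},
    {"source": "Big Wire", "published": datetime(2005, 8, 15, 6, 20)},
    {"source": "Movie Site", "published": datetime(2005, 8, 15, 6, 45)},
    {"source": "Other Site", "published": datetime(2005, 8, 15, 6, 30)},
]

print(hotness(cluster))  # 4
print([a["source"] for a in order_cluster(cluster, authorities={"Big Wire"})])
# ['Small Blog', 'Big Wire', 'Movie Site', 'Other Site']
```

In this toy version, a later posting time is what moves "Movie Site" above "Other Site", which is why timing matters so much for everyone who is neither first nor an authority.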

I've always had the best luck posting my movie reviews at about 6:30-7:00am ET. This will get the story picked up and, if it is, for example, a movie opening that weekend or a DVD coming out that Tuesday, will ensure that the story is on the front page. Then my version of the story will appear in the list on the front page for a half hour or so at a time of day when people tend to be sitting down with a cup of coffee and reading some news. If you go earlier in the morning than that, it tends to not last as long because all the news sites are getting their stories online before 6am ET, when the real "morning online news hour" begins. Go much later and you might actually stay up there longer, but you're not reaching as large an audience anyway.

Your mileage may vary, but it's really just a matter of experimenting with the timing of when you get your story posted.

SECRET SUPER TIP #112: I can't prove it, but if I just post a story on a blank page on a site that is normally crawled by Google News, I never know when it'll get crawled. If I have AdWords on the page, it'll get crawled and picked up within minutes of when I first view the article. I guess that makes sense: Google normally has no way of knowing when a new article goes up, so it just crawls when it has time, but with AdWords, it triggers something in that DB that says, "Hey, that's a new page!" (or article, really). Then it tips something over at the news site and it says, "Yep, we don't have that one, yet - let's go get it!"

G.

Member

Group: Members
Joined: 25-April 05
Posts: 29
post Aug 15 2005, 06:46 AM
Thanks bragadocchio, I'll start reading right away.

I got on the homepage twice a few months ago. I have no idea what I did, but I got there; the news item about the Nokia N91 didn't have more than 80 words in it.

I even made a couple of screenshots :D I knew this wouldn't happen again.

user posted image

user posted image