We will use the Yahoo! API because it is the easiest to learn and use, and their dataset is huge. Also, they have a funky operator called linkdomain:. So our tool will do something very simple: you feed it a domain name, and it uses the API to search Yahoo for pages that link to that domain, excluding the domain's own pages:
linkdomain:mydomain.com -site:mydomain.com
Also, we'll be using PHP5. Its SimpleXML extension makes reading XML so much easier!
Let's roll.
Yahoo! Developer Network and API 101
The most important site you need to know about in this tutorial is the Yahoo! Developer Network. There you can find code samples, documentation, mailing lists, and everything you need to get going for all of Yahoo's APIs. Read it, love it, learn it, and most certainly bookmark it!
The Yahoo API works over REST. Don't worry too much about what that means; what we care about is its simplicity. With REST, you construct a URL that contains instructions (our 'command' for the API) and then request that URL from the server, just as if you were browsing. Instead of an HTML webpage, you get back XML.
So our task is two-fold:
- Construct the URL and request it (as if we're browsing for it)
- Retrieve and parse the XML
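To make those two steps concrete before we touch Yahoo, here is a rough sketch of the request/response cycle with the network call left out. It assumes PHP 5.4 or later with SimpleXML enabled; the endpoint, parameter names and sample XML are placeholders made up for illustration, not Yahoo's interface, and the helper names buildRestUrl() and parseReply() are hypothetical.

```php
<?php
// Step 1: turn a base URL plus parameters into a REST "command" URL.
// http_build_query() URL-encodes every key and value for us.
function buildRestUrl($baseUrl, array $params)
{
    return $baseUrl . '?' . http_build_query($params, '', '&');
}

// Step 2: parse the XML the server sends back.
// simplexml_load_string() returns false if the text is not well-formed XML.
function parseReply($xmlText)
{
    return simplexml_load_string($xmlText);
}

// Placeholder endpoint and reply, for illustration only.
$url = buildRestUrl('http://api.example.com/search', ['query' => 'php rest', 'results' => 10]);
echo $url, "\n"; // http://api.example.com/search?query=php+rest&results=10

$reply = parseReply('<ResultSet><Result><Title>Hello</Title></Result></ResultSet>');
echo $reply->Result[0]->Title, "\n"; // Hello
```

In the real tool, step 1 is the $SearchURL we build below, and step 2 happens inside simplexml_load_file(), which fetches and parses in one call.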
The Yahoo API requires that each application using it identify itself. The identification is straightforward: you sign in to the YDN and register a unique Application ID. The demo ID that Yahoo uses for all its examples is YahooDemo, so we'll use that here. You need to get your own if you're going to be using the API extensively.
Another thing: the API is limited to 5000 queries a day per IP address. That should be plenty for most people, but if you suddenly get throttled, this is probably why.
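5000 queries a day works out to one query every 17.28 seconds on average (86400 / 5000). If you want to keep an eye on your own usage, one option is to count queries per day yourself. This is a minimal in-memory sketch: the DailyQuota class and its take() method are hypothetical names, it assumes the limit resets per calendar day (which may not match how Yahoo actually counted), and a real tool would need to store the counts in a file or database, since PHP forgets everything between page loads.

```php
<?php
// Hypothetical helper: counts queries per calendar day against a fixed limit.
// Counts live in memory only, so they are lost at the end of the request.
class DailyQuota
{
    private $limit;
    private $counts = [];

    public function __construct($limit = 5000)
    {
        $this->limit = $limit;
    }

    // Records one query for $day (e.g. '2007-06-01') if the quota allows it.
    // Returns true if the query may go ahead, false if the day's quota is used up.
    public function take($day)
    {
        $used = isset($this->counts[$day]) ? $this->counts[$day] : 0;
        if ($used >= $this->limit) {
            return false;
        }
        $this->counts[$day] = $used + 1;
        return true;
    }
}

$quota = new DailyQuota(2); // tiny limit so the behaviour is easy to see
var_dump($quota->take('2007-06-01')); // bool(true)
var_dump($quota->take('2007-06-01')); // bool(true)
var_dump($quota->take('2007-06-01')); // bool(false)
var_dump($quota->take('2007-06-02')); // bool(true): new day, fresh quota
```

In the tool you would call $quota->take(date('Y-m-d')) right before each API request and skip the request when it returns false.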
Setting Up
First we create the input form and the basic structure of the page:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Linkdomain Search</title>
</head>
<body>
<h1>Linkdomain Search</h1>
<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']);?>" method="get">
Domain: <input type="text" name="thedomain" /> <input type="submit" value="Find Links" />
</form>
</body></html>
Nothing exciting: just a simple XHTML page with a form. The form submits a variable called thedomain back to this same page (we escape PHP_SELF with htmlspecialchars() so it can't be abused to inject markup). You can save this code under any file name you want, as long as it ends in .php so the server runs the PHP in it.
Next up, we add some logic to see if we have a domain name to process. If we do, we retrieve the domain name and sanitize it a bit (for security).
if(isset($_GET['thedomain'])){
$domain = strip_tags(trim($_GET['thedomain']));
}
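strip_tags(trim(...)) removes HTML and stray whitespace, but people will also paste things like http://www.example.com/page.html into the box. A stricter clean-up could look like this sketch. cleanDomain() is a hypothetical helper, and its last regular expression is a deliberately simple check (letters, digits, dots and hyphens, with at least one dot), not a full host name validator.

```php
<?php
// Hypothetical helper: reduce user input to a bare host name, or return false.
function cleanDomain($input)
{
    $domain = strtolower(strip_tags(trim($input)));
    // Drop a leading scheme such as "http://" or "https://".
    $domain = preg_replace('~^[a-z]+://~', '', $domain);
    // Drop any path, query string or fragment after the host.
    $domain = preg_replace('~[/?#].*$~', '', $domain);
    // Accept only letters, digits, dots and hyphens, with at least one dot.
    if (!preg_match('/^[a-z0-9-]+(\.[a-z0-9-]+)+$/', $domain)) {
        return false;
    }
    return $domain;
}

var_dump(cleanDomain('  http://www.Example.com/page.html ')); // string(15) "www.example.com"
var_dump(cleanDomain('not a domain'));                         // bool(false)
```

If cleanDomain() returns false, the page can show a short "that doesn't look like a domain" message instead of wasting one of our 5000 daily queries.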
Next we can construct the search, which is:
$search = urlencode("linkdomain:$domain -site:$domain");
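To see exactly what gets sent, here is the encoded form of the example search from the start of the tutorial (mydomain.com is just the placeholder domain used there):

```php
<?php
$domain = 'mydomain.com';
// urlencode() turns ':' into %3A and the space into '+';
// letters, digits, '.', '-' and '_' pass through unchanged.
$search = urlencode("linkdomain:$domain -site:$domain");
echo $search, "\n"; // linkdomain%3Amydomain.com+-site%3Amydomain.com
```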
Notice that we URL-encoded the search, because it is about to become part of the REST URL. To recap where we are, the full code so far is:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Linkdomain Search</title>
</head>
<body>
<h1>Linkdomain Search</h1>
<?php
if(isset($_GET['thedomain'])){
$domain = strip_tags(trim($_GET['thedomain']));
$search = urlencode("linkdomain:$domain -site:$domain");
}
?>
<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']);?>" method="get">
Domain: <input type="text" name="thedomain" /> <input type="submit" value="Find Links" />
</form>
</body></html>
Constructing the REST URL
Now we fully construct the REST URL. The base URL for the Yahoo web search service is http://search.yahooa...ce/V1/webSearch, and to this we append our query and a few other parameters:
$SearchURL = "http://search.yahooapis.com/WebSearchService/V1/webSearch?query=$search&start=0&results=50&appid=YahooDemo";
The query string variables are easy to explain: query is our... er... query (the URL-encoded $search). start=0 says where the results should begin (programmers are funny that way: we sometimes start counting at zero instead of one), and results=50 asks for 50 results. Finally, we append our application ID as appid and we're set.
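Gluing the query string together by hand works, but it is easy to drop an & or encode a value twice. One alternative (a sketch, not what the rest of this tutorial uses) is to keep the parameters in an array and let PHP's http_build_query() encode them. It applies urlencode() to each value itself, so you pass it the raw search string rather than the already-encoded $search:

```php
<?php
$domain = 'mydomain.com';
$params = [
    'query'   => "linkdomain:$domain -site:$domain", // raw, not yet encoded
    'start'   => 0,
    'results' => 50,
    'appid'   => 'YahooDemo',
];
// http_build_query() URL-encodes every key and value;
// the third argument pins the separator to a plain "&".
$SearchURL = 'http://search.yahooapis.com/WebSearchService/V1/webSearch?'
    . http_build_query($params, '', '&');
echo $SearchURL, "\n";
```

The resulting string is identical to the hand-built one, but adding or removing a parameter is now a one-line change to the array.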
Retrieving the XML
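Now that we have a URL, we need to 'browse' to it. In practice, our script fetches the URL itself, and what comes back is the XML. PHP5's simplexml_load_file() can do the fetching and the parsing in one step, as long as allow_url_fopen is enabled in your PHP configuration.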
Now here is a trick: APIs can be unstable, especially Google's. Sometimes, if you query an API, it returns an error. A second later, with the exact same query, it works. Of course, we can put in error management, but wouldn't it be nice if our program tried more than once on our behalf? Of course it would be.
To accomplish this, we create a loop that says "keep trying until you get some XML or you've tried 3 times, and wait a second between tries". In PHP:
$Tries = 0; // start the attempt counter at zero
do{
$xml = @simplexml_load_file($SearchURL); // @ hides PHP's warning if the fetch fails
if(!$xml){
$Tries++;
sleep(1);
}//eob if(!$xml)
}while(!$xml && $Tries < 3); // stop after the third failed attempt
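The same retry idea can be pulled out into a reusable function. This is a sketch: fetchWithRetry() and its parameters are hypothetical names. It takes the fetching step as a callable, so you can pass in simplexml_load_file() or anything else, it treats only a false return value as a failure, and it sleeps between attempts but not after the last one.

```php
<?php
// Calls $fetch($url) up to $maxTries times, pausing $delaySeconds between
// failed attempts. Returns the first result that is not false, or false
// if every attempt failed.
function fetchWithRetry($fetch, $url, $maxTries = 3, $delaySeconds = 1)
{
    for ($try = 1; $try <= $maxTries; $try++) {
        $result = call_user_func($fetch, $url);
        if ($result !== false) {
            return $result;
        }
        if ($try < $maxTries) {
            sleep($delaySeconds);
        }
    }
    return false;
}

// Demo with a fake fetcher that fails once, then succeeds (no real network call).
$attempts = 0;
$flaky = function ($url) use (&$attempts) {
    $attempts++;
    return $attempts < 2 ? false : "fetched $url";
};
echo fetchWithRetry($flaky, 'http://example.com/', 3, 0), "\n"; // fetched http://example.com/

// In the tool itself it would be used like this:
// $xml = fetchWithRetry(function ($u) { return @simplexml_load_file($u); }, $SearchURL);
```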
Now we check if we actually got some XML, and if we did, find out how many results were returned:
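if($xml){
// copy the root element's attributes into $res, then read totalResultsReturned
foreach($xml->attributes() as $name=>$attr) $res[$name]=$attr;
$results = (int) $res['totalResultsReturned']; // cast: SimpleXML hands back an object, not a number
echo "<p>Found $results result(s)</p>";
// (the if block stays open: the results loop in the next step goes here)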
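To try this step without calling the API, you can feed SimpleXML a hand-written document. The sample below is a simplified, made-up reply containing only the pieces this tutorial reads (a ResultSet root with a totalResultsReturned attribute, and Result children with Title, Summary and Url); it is not a copy of a real Yahoo response. It also shows that SimpleXML lets you read an attribute directly with array syntax, a shorter alternative to looping over attributes():

```php
<?php
$sample = '<ResultSet totalResultsReturned="2">'
    . '<Result><Title>First</Title><Summary>One</Summary><Url>http://a.example/</Url></Result>'
    . '<Result><Title>Second</Title><Summary>Two</Summary><Url>http://b.example/</Url></Result>'
    . '</ResultSet>';
$xml = simplexml_load_string($sample);

// Attribute access with array syntax; cast to int because SimpleXML returns an object.
$results = (int) $xml['totalResultsReturned'];
echo "<p>Found $results result(s)</p>\n"; // <p>Found 2 result(s)</p>

// Counting the Result elements gives the same number here.
echo count($xml->Result), "\n"; // 2
```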
Now we continue: we parse the XML (technically 'traverse the XML') and display the results:
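for($i=0; $i<$results; $i++) {
$url = $title = $snip = ''; // reset, so a missing field doesn't reuse the previous result's value
foreach($xml->Result[$i] as $key=>$value) {
// escape each value so text from the API can't inject HTML into our page
if($key == "Url"){
$url = htmlspecialchars((string) $value);
}
if($key == 'Title'){
$title = htmlspecialchars((string) $value);
}
if($key == 'Summary'){
$snip = htmlspecialchars((string) $value);
}
}//eob foreach
echo "<div class=\"resultdiv\"><a href=\"$url\" target=\"_blank\">$title</a></div>";
echo "<div class=\"snipdiv\">$snip</div>";
}//eob for loop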
}
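A tighter variant of that loop, as a sketch: renderResults() is a hypothetical name, and the sample document is made up. It walks the Result elements with foreach (so it doesn't need the result count at all), reads each field by name instead of checking every child, and passes every value through htmlspecialchars() so that a title or summary containing < or & can't break the page:

```php
<?php
// Builds the result HTML for a parsed reply; the CSS classes match the tutorial's.
function renderResults(SimpleXMLElement $xml)
{
    $html = '';
    foreach ($xml->Result as $result) {
        $url   = htmlspecialchars((string) $result->Url);
        $title = htmlspecialchars((string) $result->Title);
        $snip  = htmlspecialchars((string) $result->Summary);
        $html .= "<div class=\"resultdiv\"><a href=\"$url\" target=\"_blank\">$title</a></div>\n";
        $html .= "<div class=\"snipdiv\">$snip</div>\n";
    }
    return $html;
}

// Made-up sample with characters that must be escaped.
$xml = simplexml_load_string(
    '<ResultSet totalResultsReturned="1"><Result>'
    . '<Title>Fish &amp; Chips</Title><Summary>&lt;b&gt;bold&lt;/b&gt;</Summary>'
    . '<Url>http://example.com/?a=1&amp;b=2</Url></Result></ResultSet>'
);
echo renderResults($xml);
```

A missing field simply comes out as an empty string, because (string) on a non-existent SimpleXML child gives ''.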
Finally, remember we checked using if($xml). What happens if even after 3 tries we still didn't get any XML? We have to display some form of error. With this final addition, we're done! The full code is:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Linkdomain Search</title>
</head>
<body>
<h1>Linkdomain Search</h1>
<?php
if(isset($_GET['thedomain'])){
$domain = strip_tags(trim($_GET['thedomain']));
$search = urlencode("linkdomain:$domain -site:$domain");
$SearchURL = "http://search.yahooapis.com/WebSearchService/V1/webSearch?query=$search&start=0&results=50&appid=YahooDemo";
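$Tries = 0; // start the attempt counter at zero
do{
$xml = @simplexml_load_file($SearchURL); // @ hides PHP's warning if the fetch fails
if(!$xml){
$Tries++;
sleep(1);
}//eob if(!$xml)
}while(!$xml && $Tries < 3); // stop after the third failed attempt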
if($xml){
foreach($xml->attributes() as $name=>$attr) $res[$name]=$attr;
$results = (int) $res['totalResultsReturned']; // cast: SimpleXML hands back an object, not a number
echo "<p>Found $results result(s)</p>";
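for($i=0; $i<$results; $i++) {
$url = $title = $snip = ''; // reset, so a missing field doesn't reuse the previous result's value
foreach($xml->Result[$i] as $key=>$value) {
// escape each value so text from the API can't inject HTML into our page
if($key == "Url"){
$url = htmlspecialchars((string) $value);
}
if($key == 'Title'){
$title = htmlspecialchars((string) $value);
}
if($key == 'Summary'){
$snip = htmlspecialchars((string) $value);
}
}//eob foreach
echo "<div class=\"resultdiv\"><a href=\"$url\" target=\"_blank\">$title</a></div>";
echo "<div class=\"snipdiv\">$snip</div>";
}//eob for loop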
}
else{
echo "<p>No XML was retrieved :(</p>";
}
}
?>
<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']);?>" method="get">
Domain: <input type="text" name="thedomain" /> <input type="submit" value="Find Links" />
</form>
</body></html>
What does it look like? Check it out!
Improvements
For one thing, our error management is minimal: just the basics. Also, the tool only does linkdomain searches, which we could generalize. We're using just the Yahoo API (which makes sense here, since linkdomain: is a Yahoo operator), but we could add the Google and MSN APIs too.
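On the error-management point: the code silences SimpleXML with @, which also throws away the reason a parse failed. One alternative, sketched below, is to have libxml collect its own error messages with libxml_use_internal_errors() and libxml_get_errors(). loadXml() is a hypothetical helper name, and the sketch only demonstrates parse errors on a string; the same calls can wrap simplexml_load_file().

```php
<?php
// Parses $text as XML. Returns [SimpleXMLElement, []] on success,
// or [false, list of error messages] if parsing failed.
function loadXml($text)
{
    $previous = libxml_use_internal_errors(true); // collect errors instead of printing warnings
    libxml_clear_errors();
    $xml = simplexml_load_string($text);
    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message);
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting
    return [$xml, $messages];
}

// A broken document: <Result> is never closed.
list($xml, $errors) = loadXml('<ResultSet><Result></ResultSet>');
if ($xml === false) {
    echo "<p>Could not parse the reply: " . htmlspecialchars(implode('; ', $errors)) . "</p>\n";
}
```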
Randoms
Now I want to answer some questions from this thread:
"What are your future plans?"
World domination... with a lot of sleep. Seriously though: I'm working on a very cool project that I'll be needing some testing help and linklove for in a couple of weeks.
"More obscure trivia about your site(s) - I'm sure you have some interesting stories to tell!"
I've had people from IBM, Microsoft, Google, NASA, the US Air Force, the US military, Philips, and many more visit my sites.
Funniest tech support story: a woman at an Arizona uni emailed me for help with one of my scripts. Why? Because she was the new boss and didn't want to look weak in front of her programmer employees.
"chaos theory applied to SEO"
PageRank! It's a measure of the probability that a random surfer will land on the page.
"I always want to know what kind of thing makes a person's day."
Web-wise: a nice email from a visitor or a funky new link. Generally: the satisfaction of an achievement, be it learning something new (I remember the first time I figured out a Perl guestbook), figuring out something obscure, or finding a solution to some problem.
"Is there anything about being here that keeps you jazzed?"
The knowledgeable people. I didn't participate in many threads simply because I had nothing new to add! It's like brainwave central!
Shoutouts and thanks: Kim, Bill, John, Elizabeth, Joe, Yuri, Ruud, Rand x 2, Barry, Wit, and many, many more that I'm sure the weekend hangover is making me forget!
Pierre