> See Text In Images?

Quarter Grand Poster

Group: Members
Joined: 6-February 07
Posts: 389
post Dec 31 2007, 03:06 PM
Is the text in my JPEG images still not indexed? Thanks.

(Is Yahoo any different?) Thanks.

Star Member

Group: 1000 Post Club
Joined: 10-March 05
Posts: 1,065
From: Montreal Canada
post Dec 31 2007, 03:15 PM
Hard to figure out what this is asking.

If you are referring to text in images then it never will be indexed.

If you are referring to text in your pages that are made up of images then it also never will. e.g.: the word "new" as an image instead of the letters n-e-w.

If it's the alt= text then it will be but you need to be patient.
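To make the alt= point concrete, here is a minimal sketch of how a crawler-like script could pick up alt text using only Python's standard library. The class name `AltTextCollector`, the helper `collect_alt_texts`, and the sample markup are made up for illustration; this is not a description of how any search engine actually works.

```python
from html.parser import HTMLParser


class AltTextCollector(HTMLParser):
    """Collects the alt attribute of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.alt_texts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:  # skip images with a missing or empty alt attribute
                self.alt_texts.append(alt)


def collect_alt_texts(html):
    """Return the non-empty alt texts found in an HTML string."""
    collector = AltTextCollector()
    collector.feed(html)
    return collector.alt_texts


# Assumed sample page: the word "new" as an image, plus a logo with no alt text.
sample = '<p>See what is <img src="new.gif" alt="new"> today</p><img src="logo.jpg">'
print(collect_alt_texts(sample))
```

The image without an alt attribute contributes nothing here, which is the situation with text that lives only in the pixels of the picture.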

Quarter Grand Poster

Group: Members
Joined: 6-February 07
Posts: 389
post Dec 31 2007, 04:30 PM
Thanks Bob, you answered it all!

Moderator

Group: Moderators
Joined: 31-July 06
Posts: 1,665
post Dec 31 2007, 08:35 PM
There has been some talk, though, of SEs attempting to read text in images, in the future. I don't really understand how this could work, but I know I've seen it mentioned. Maybe this is what Kevs has heard, too?
Miriam

Moderator

Group: Moderators
Joined: 27-July 05
Posts: 2,936
post Dec 31 2007, 09:34 PM
OCR has been around for at least ten years. Google can read it. I believe that they have been using it for a long time.

For example... lots of people search for "florida lotteries". Not an incredibly difficult term to rank for, but tons and tons of traffic. So spammers put up pages with images for Viagra and diet products - everybody in Florida wants those products (-:

The spammers have valid text pages, relevant for that term, and place their Viagra ads in the visitor's face as images. I believe that Google knocked those pages down by reading the images.

Quarter Grand Poster

Group: Members
Joined: 6-February 07
Posts: 389
post Dec 31 2007, 09:38 PM
Yes, this is what I was asking. So what's the verdict?

I have a profile on a website. My logo, jpeg, has my full name in it. I don't mind it being on that website, but in this instance, I don't really want it to be indexed. Will it be?

Moderator

Group: Moderators
Joined: 27-July 05
Posts: 2,936
post Dec 31 2007, 09:45 PM
Google has every ability to read it.

However, I just searched for a number of phrases that appear in .gif and .jpg images on my website. Did not find them.

So, the answer to your question is probably "NO".

Centenarian Poster

Group: Members
Joined: 20-November 05
Posts: 142
post Dec 31 2007, 11:11 PM
I would bet that OCR has improved since the last time I used the bundled freebie software that came with my very first scanner, but even state of the art OCR is not going to be able to pick up the varieties of fonts used as text in graphics. Maybe the basic web-standard fonts, but if that's the case, you should be making those as real text over a background image anyway.

EGOL, you've never struck me as a hunch kind of guy, but your post sounds like hypothesis. OCR seems to me too iffy a proposition for a search engine to base results on. That's my hypothesis.

Could it be those [Viagra] ad images were links and G followed the links and found they were bad neighborhoods? It seems there would be easier, more reliable breadcrumbs to follow than OCR.

BTW, good to participate in a thread with you again, EGOL.

Moderator

Group: Moderators
Joined: 27-July 05
Posts: 2,936
post Dec 31 2007, 11:30 PM
Hello Robert, nice to see you again.

Thanks for that "bad neighborhood" idea. That could be the reason. Probably more likely than reading the images.

I have lots of "hunches"... and bet on them big time. ;)

This post has been edited by EGOL: Dec 31 2007, 11:32 PM

Star Member

Group: 1000 Post Club
Joined: 22-May 06
Posts: 1,632
post Jan 1 2008, 01:54 AM
Google can certainly read text in images. They can also read the file byte by byte if they wish, and probably do to catch viruses! They also need the technology to improve their image indexing capabilities (they can very easily tell faces from, say, landscapes).

There are difficulties with 'edge detection' algorithms (that is why we have Captchas!), but for most fonts this is not a difficult exercise.

Egol's explanation of the 'Viagra' ads makes sense. That Google does this routinely is highly unlikely, though, given the tremendous computing power they would need. Anyway, if one wanted to hide the images from Google, why could they not use JavaScript? Or encode the image?

Yannis

Star Member

Group: 1000 Post Club
Joined: 10-March 05
Posts: 1,065
From: Montreal Canada
post Jan 1 2008, 02:14 AM
I used OCR as far back as 4 years ago using the "cream of the crop" app from Adobe called Capture and the results were full of errors. Nothing I would bet anything on. We are far from good OCR unless the font is 16 point at 300 dpi double spaced mono font and even then. Their 3% error rate is full of spit.

I was OCRing legal documents set in Courier at 10 point, double spaced, which is kind of normal, and it was not good. As a matter of fact it was bad. Sorry Adobe.
QUOTE
They can also read the file byte by byte if they wish and probably do to catch viruses!
Yes viruses can be caught by binary signatures because they are binary in nature.
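As a rough illustration of what catching something by binary signature means, here is a minimal Python sketch that looks for known byte patterns in a file's raw bytes. The function name `find_signatures` and the signature bytes are made up for the example; real scanners are far more involved. Note that nothing in it interprets the picture, which is why it is a much easier job than OCR.

```python
def find_signatures(data, signatures):
    """Return {name: offset} for each known byte pattern found in data.

    `signatures` maps a label to a bytes pattern. The patterns used below
    are invented for this example, not real virus signatures.
    """
    hits = {}
    for name, pattern in signatures.items():
        offset = data.find(pattern)  # plain byte-for-byte substring search
        if offset != -1:
            hits[name] = offset
    return hits


# Hypothetical data: a JPEG-style header followed by a made-up "bad" pattern.
demo_file = b"\xff\xd8\xff\xe0" + b"JFIF" + b"\x00" * 4 + b"\xde\xad\xbe\xef"
demo_signatures = {"demo-pattern": b"\xde\xad\xbe\xef", "absent": b"\x12\x34\x56"}
print(find_signatures(demo_file, demo_signatures))
```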

This post has been edited by bobbb: Jan 1 2008, 02:38 AM

Star Member

Group: 1000 Post Club
Joined: 28-April 03
Posts: 1,489
From: UK
post Jan 1 2008, 03:16 AM
Reliable OCR will come in time.


Hall of Famer

Group: Hall Of Fame
Joined: 3-November 05
Posts: 3,461
From: CHeeseland
post Jan 1 2008, 03:45 AM
If you want your content indexed by search engines, it always makes sense to provide it in a text form (or at least in an alt-attribute of an image). Text is still the only way you can make sure that your message can reach all visitors. I'm curious, do you all see that changing in the future? Where would it make sense to read text from images and make it searchable (other than EGOL's mentioned spam tests)?

John

Star Member

Group: 1000 Post Club
Joined: 28-April 03
Posts: 1,489
From: UK
post Jan 1 2008, 04:00 AM
Well, before Google patents it (if they have not already, as they like patenting all the stuff that seems obvious), I can see images being read for many reasons.

One - to see what the images say, as it's a form of cloaking to put text in images that the search engines can't read.

Two - because there is a lot of text in a lot of images, such as names on shops, logos, products, clothing etc.

Three - because text is text - just because it is in an image does not mean it should be ignored.

Current search engine technology ignores so much data it is scary.

To add to that, it's not just text in images that can be read.

Future tech will allow the reading of the context of an image, the contents of an image etc.

Even the mood or target audience of an image.

Ok I may be old or dead before tech improves that much, but there is so much that is obvious to me, that someone will patent and stop others from developing that it scares me....


Not just images either, same applies to sound, and also videos.

Index sound files the same way.....

Search engines have a long, long way to go.

Heck in time, emotions and bodily function readouts will be indexed.

Maybe even one day, search engines with the resources to do so will actually return results based on relevance instead of SEO.....

This post has been edited by kensplace: Jan 1 2008, 04:01 AM

Star Member

Group: 1000 Post Club
Joined: 26-August 07
Posts: 1,521
post Jan 1 2008, 10:38 AM
I've never had any real issue with OCR... I've always had Adobe and ABBYY... and both work fine... so long as you have a copy of the font used, or there are sufficient examples of all characters in the document.
(You may also have to spend a day or two learning to fine tune things and figuring out how to select bits at a time etc.)

As to things like Google using OCR... have you not seen the number of PDFs and the option to view as HTML?
How do you think that is done?

Okay, they may not go through all sites converting PDFs to HTML and then logging the terms in them... but they "could"!

Star Member

Group: 1000 Post Club
Joined: 10-March 05
Posts: 1,065
From: Montreal Canada
post Jan 1 2008, 12:32 PM
QUOTE
As to things like Google using OCR... have you not seen the number of PDFs and the option to view as HTML?
How do you think that is done?
That is done with PDFs that are searchable inside.

I'll do an experiment and put a non indexable PDF on a site and see. By non indexable I mean that if you open it and cannot select text then it is just an image of words...[or a tree].

QUOTE
Reliable OCR will come in time.
Yes, and some day we will have conversations with computers à la Star Trek.

The SEs may want to filter porn and stuff by being better at images. I still think your text in images will not be indexed for a long while.

Star Member

Group: 1000 Post Club
Joined: 28-April 03
Posts: 1,489
From: UK
post Jan 4 2008, 04:31 PM
I see from the great SEO by the Sea blog :) that Google has already patented some of the above.

When will they stop patenting the obvious......

It should be outlawed...

This post has been edited by kensplace: Jan 4 2008, 04:32 PM

Star Member

Group: Members
Joined: 11-December 03
Posts: 800
From: Back in Sunny California
post Jan 4 2008, 04:41 PM
In my mind it's never been a question that search engines have the ability to read text in images, video, and speech (the technology is out there). And I'm sure they were doing it well before they officially had it patented.

The underlying question is how frequently they use the technology. I think most search engines are selective about the documents and sites on which they use it, because it's very resource intensive. With the amount of data the SEs consume, they definitely can't run every filter against all of it, even though that would make the data more comprehensive. The patent that Kensplace mentions is a great list of use cases where image to text makes sense.

So for the regular website owner that has a lot of text in images, more likely than not your content will not be OCR'd or seen by the engine. So if you're concerned about having your information visible, check out John's post above and convert it to flat text.