Leading Community for Usability, Search Engine Marketing,
Social Networking, Site Planning & Web Site Development, Since 1998


See Text In Images?


17 replies to this topic

#1 kevs

kevs

    Light Speed Member

  • Members
  • 760 posts

Posted 31 December 2007 - 03:06 PM

Is the text in my JPEG images still not indexed? Thanks.

(Is Yahoo any different?) Thanks.

#2 bobbb

bobbb

    Time Traveler Member

  • 1000 Post Club
  • 1426 posts

Posted 31 December 2007 - 03:15 PM

Hard to figure out what this is asking.

If you are referring to text in images then it never will be indexed.

If you are referring to text in your pages that are made up of images then it also never will. e.g.: the word "new" as an image instead of the letters n-e-w.

If it's the alt= text then it will be but you need to be patient.

#3 kevs

kevs

    Light Speed Member

  • Members
  • 760 posts

Posted 31 December 2007 - 04:30 PM

thanks Bob, you answered it all!

#4 SEOigloo

SEOigloo

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 2100 posts

Posted 31 December 2007 - 08:35 PM

There has been some talk, though, of SEs attempting to read text in images, in the future. I don't really understand how this could work, but I know I've seen it mentioned. Maybe this is what Kevs has heard, too?
Miriam

#5 EGOL

EGOL

    Eyes Like Hawk Moderator

  • Moderators
  • 4573 posts

Posted 31 December 2007 - 09:34 PM

OCR has been around for at least ten years. Google can read it. I believe that they have been using it for a long time.

For example.... lots of people search for "florida lotteries". Not an incredibly difficult term to rank for but tons and tons of traffic. So spammers put up pages with images for viagra and diet products - everybody in florida wants those products (-:

The spammers have valid text pages, relevant for that term, and place their viagra ads in the visitor's face as images. I believe Google knocked those pages down by reading the images.

#6 kevs

kevs

    Light Speed Member

  • Members
  • 760 posts

Posted 31 December 2007 - 09:38 PM

Yes, this is what I was asking. So what's the verdict?

I have a profile on a website. My logo, jpeg, has my full name in it. I don't mind it being on that website, but in this instance, I don't really want it to be indexed. Will it be?

#7 EGOL

EGOL

    Eyes Like Hawk Moderator

  • Moderators
  • 4573 posts

Posted 31 December 2007 - 09:45 PM

Google has every ability to read it.

However, I just searched for a number of phrases that appear in .gif and .jpg images on my website. Did not find them.

So, the answer to your questions is probably "NO".

#8 Robert_Paulson

Robert_Paulson

    Gravity Master Member

  • Members
  • 163 posts

Posted 31 December 2007 - 11:11 PM

I would bet that OCR has improved since the last time I used the bundled freebie software that came with my very first scanner, but even state of the art OCR is not going to be able to pick up the varieties of fonts used as text in graphics. Maybe the basic web-standard fonts, but if that's the case, you should be making those as real text over a background image anyway.

EGOL, you've never struck me as a hunch kind of guy, but your post sounds like hypothesis. OCR seems to be too iffy a proposition to me for a search engine to base results. That's my hypothesis.

Could it be those [Viagra] ad images were links and G followed the links and found they were bad neighborhoods? It seems there would be easier, more reliable breadcrumbs to follow than OCR.

BTW, good to participate in a thread with you again, EGOL.

#9 EGOL

EGOL

    Eyes Like Hawk Moderator

  • Moderators
  • 4573 posts

Posted 31 December 2007 - 11:30 PM

Hello Robert, nice to see you again.

Thanks for that "bad neighborhood" idea. That could be the reason. Probably more likely than reading the images.

I have lots of "hunches"... and bet on them big time. ;-)

Edited by EGOL, 31 December 2007 - 11:32 PM.


#10 yannis

yannis

    Sonic Boom Member

  • 1000 Post Club
  • 1634 posts

Posted 01 January 2008 - 01:54 AM

Google can certainly read text in images. They can also read the file byte by byte if they wish, and probably do to catch viruses! They also need the technology to improve their image indexing capabilities (they can very easily tell faces from landscapes, say).

There are difficulties with 'edge detection' algorithms (that is why we have CAPTCHAs!), but for most fonts this is not a difficult exercise.

Egol's explanation of the 'Viagra' ads makes sense. Whether Google does that routinely is another matter; it seems highly unlikely given the tremendous computing power they would need. Anyway, if one wanted to hide the images from Google, why could they not use JavaScript, or encode the image?

Yannis

#11 bobbb

bobbb

    Time Traveler Member

  • 1000 Post Club
  • 1426 posts

Posted 01 January 2008 - 02:14 AM

I used OCR as far back as 4 years ago using the "cream of the crop" app from Adobe called Capture and the results were full of errors. Nothing I would bet anything on. We are far from good OCR unless the font is 16 point at 300 dpi double spaced mono font and even then. Their 3% error rate is full of spit.

I was OCRing legal documents set in Courier at 10 point, double spaced, which is kind of normal, and it was not good. As a matter of fact it was bad. Sorry Adobe.
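For anyone wondering what a quoted OCR "error rate" usually measures, here is a rough sketch of one common definition, the character error rate: the edit distance between the OCR output and the true text, divided by the length of the true text. The function names and sample strings are made up for illustration, and this is not necessarily how any particular OCR vendor computes its figure.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(truth, ocr_output):
    """Edit distance divided by the length of the true text."""
    if not truth:
        raise ValueError("truth must not be empty")
    return edit_distance(truth, ocr_output) / len(truth)


# Hypothetical sample: two misread characters in a 27-character line.
truth = "the party of the first part"
ocr_output = "tbe party of tke first part"
print(f"{character_error_rate(truth, ocr_output):.1%}")
```

By this measure, even a 3% rate on a page of 2,000 characters would mean roughly 60 wrong characters, which is a lot of proofreading for a legal document.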

They can also read the file byte by byte if they wish and probably do to catch viruses!

Yes viruses can be caught by binary signatures because they are binary in nature.

Edited by bobbb, 01 January 2008 - 02:38 AM.


#12 kensplace

kensplace

    Time Traveler Member

  • 1000 Post Club
  • 1489 posts

Posted 01 January 2008 - 03:16 AM

Reliable ocr will come in time.

#13 JohnMu

JohnMu

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 3518 posts

Posted 01 January 2008 - 03:45 AM

If you want your content indexed by search engines, it always makes sense to provide it in a text form (or at least in an alt-attribute of an image). Text is still the only way you can make sure that your message can reach all visitors. I'm curious, do you all see that changing in the future? Where would it make sense to read text from images and make it searchable (other than EGOL's mentioned spam tests)?

John
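To illustrate the point about text and alt attributes, here is a minimal sketch of what a text-only reader of a page could pick up: the text nodes plus any alt values on images. It uses Python's standard html.parser module; the class name, function name, and sample page are hypothetical, and this is only an approximation, not a description of how any search engine actually parses pages.

```python
from html.parser import HTMLParser


class TextAndAltCollector(HTMLParser):
    """Collects text nodes and the alt values of <img> tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:  # skip images with a missing or empty alt
                self.parts.append(alt.strip())

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.parts.append(text)


def indexable_text(html):
    """Return the text a reader that never looks at pixels could see."""
    parser = TextAndAltCollector()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


# Hypothetical page: the word "new" exists only as pixels in new.gif,
# while the logo image carries its name in the alt attribute.
page = ('<p>Our <img src="new.gif"> catalog</p>'
        '<img src="logo.jpg" alt="Acme Widgets">')
print(indexable_text(page))
```

In this sketch the word inside new.gif never shows up, while the logo's name does because it is in the alt attribute.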

#14 kensplace

kensplace

    Time Traveler Member

  • 1000 Post Club
  • 1489 posts

Posted 01 January 2008 - 04:00 AM

Well, before Google patents it (if they have not already, as they like patenting all the stuff that seems obvious), I see many reasons for reading images.

One - to see what the images say, as it's a form of cloaking to put text in images that the search engines can't read.

two - because there is a lot of text in a lot of images, such as names on shops, logos, products, clothing etc.

three - because text is text - just because it is in an image does not mean it should be ignored.

Current search engine technology ignores so much data it is scary.

To add to that, it's not just text in images that can be read.

Future tech will allow the reading of the context of an image, the contents of an image etc.

Even the mood or target audience of an image.

Ok I may be old or dead before tech improves that much, but there is so much that is obvious to me, that someone will patent and stop others from developing that it scares me....


Not just images either, same applies to sound, and also videos.

Index sound files the same ways.....

Search engines have a long, long way to go.

Heck in time, emotions and bodily function readouts will be indexed.

Maybe even one day, search engines with the resources to do so, will actually return results based on relevance instead of seo.....

Edited by kensplace, 01 January 2008 - 04:01 AM.


#15 Autocrat

Autocrat

    Sonic Boom Member

  • 1000 Post Club
  • 1521 posts

Posted 01 January 2008 - 10:38 AM

I've never had any real issue with OCR... I've always had Adobe and Abbey... and both work fine... so long as you have a copy of the font used, or there are sufficient examples of all characters in the document.
(you may also have to spend a day or two learning to fine tune things and figuring out how to select bits at a time etc.)

As to things like Google using OCR... have you not seen the number of PDFs and the option to view as HTML?
How do you think that is done?

Okay, they may not go through all sites converting PDFs to HTML and then logging the terms in them... but they "could"!

#16 bobbb

bobbb

    Time Traveler Member

  • 1000 Post Club
  • 1426 posts

Posted 01 January 2008 - 12:32 PM

As to things like Google using OCR... have you not seen the number of PDFs and the option to view as HTML?
How do you think that is done?

Those are PDFs that are searchable inside, so the text is already there.

I'll do an experiment and put a non-indexable PDF on a site and see. By non-indexable I mean that if you open it and cannot select text, then it is just an image of words...[or a tree].

Reliable ocr will come in time.

Yes, and some day we will have conversations with computers à la Star Trek.

The SEs may want to filter porn and stuff by being better at images. I still think your text in images will not be indexed for a long while.

#17 kensplace

kensplace

    Time Traveler Member

  • 1000 Post Club
  • 1489 posts

Posted 04 January 2008 - 04:31 PM

I see from the great seobythesea blog ;) that google already has patented some of the above

When will they stop patenting the obvious......

It should be outlawed...

Edited by kensplace, 04 January 2008 - 04:32 PM.


#18 phaithful

phaithful

    Light Speed Member

  • Members
  • 800 posts

Posted 04 January 2008 - 04:41 PM

In my mind it's never been a question that search engines have the ability to read text in images, video, speech to text (the technology is out there). And I'm sure they've been doing it well before they officially had it patented.

The underlying question is how frequently they use the technology. I think most search engines are selective about the documents and sites on which they use it, because it's very resource intensive. With the amount of data the SEs consume, they definitely can't run every filter against all of it, even though doing so would make the data more comprehensive. The patent that Kensplace mentions is a great list of use cases where image to text makes sense.

So for the regular website owner that has a lot of text in images, more likely than not your content will not be OCR'd or seen by the engine. So if you're concerned about having your information visible, check out John's post above and convert it to flat text.
