See Text In Images?
#2
Posted 31 December 2007 - 03:15 PM
If you are referring to text in images, then it will never be indexed.
If you are referring to text on your pages that is made up of images, then that will never be indexed either, e.g. the word "new" as an image instead of the letters n-e-w.
If you mean the alt= text, then it will be indexed, but you need to be patient.
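To illustrate the difference between alt= text and text drawn inside an image, here is a minimal Python sketch of a crawler that only parses markup. That is an assumption for illustration; it is not how Google's indexer actually works. The class name CrawlerTextExtractor and the sample page are hypothetical. The letters painted inside new.gif and banner.png never reach the parser; only the alt attribute does.

```python
from html.parser import HTMLParser


class CrawlerTextExtractor(HTMLParser):
    """Hypothetical text-only crawler: keeps visible text nodes and <img alt> values, no OCR."""

    def __init__(self):
        super().__init__()
        self.text_nodes = []
        self.alt_texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            # attrs is a list of (name, value) pairs; the pixels of the image are never seen
            alt = dict(attrs).get("alt")
            if alt:
                self.alt_texts.append(alt)

    def handle_data(self, data):
        # Skip whitespace-only nodes between tags
        stripped = data.strip()
        if stripped:
            self.text_nodes.append(stripped)


page = """
<p>Check out our <img src="new.gif" alt="new"> products.</p>
<img src="banner.png">
"""

extractor = CrawlerTextExtractor()
extractor.feed(page)
extractor.close()
print(extractor.text_nodes)  # ['Check out our', 'products.']
print(extractor.alt_texts)   # ['new']
```

The banner image has no alt attribute, so whatever text it shows contributes nothing to what this crawler can read.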
#5
Posted 31 December 2007 - 09:34 PM
For example, lots of people search for "florida lotteries". It's not an incredibly difficult term to rank for, but it gets tons and tons of traffic. So spammers put up pages with images for viagra and diet products; everybody in Florida wants those products (-:
The spammers have valid text pages that are relevant for that term, and they place their viagra ads in the visitor's face as images. I believe Google knocked those pages down by reading the images.
#8
Posted 31 December 2007 - 11:11 PM
EGOL, you've never struck me as a hunch kind of guy, but your post sounds like a hypothesis. OCR seems too iffy a proposition to me for a search engine to base results on. That's my hypothesis.
Could it be that those [Viagra] ad images were links, and G followed the links and found they led to bad neighborhoods? It seems there would be easier, more reliable breadcrumbs to follow than OCR.
BTW, good to participate in a thread with you again, EGOL.
#10
Posted 01 January 2008 - 01:54 AM
There are difficulties with 'edge detection' algorithms (that is why we have Captchas!), but for most fonts this is not a difficult exercise.
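'Edge detection' usually means finding places where brightness changes sharply, such as the borders of a letter's strokes. Below is a toy Python sketch using the standard Sobel kernels on a tiny grayscale grid. The function name is hypothetical, |Gx| + |Gy| is a cheap stand-in for the true gradient magnitude, and a real OCR pipeline adds binarization, segmentation and character classification on top. CAPTCHAs deliberately add noise and distortion so that a step like this produces spurious edges.

```python
def sobel_magnitude(img):
    """Approximate gradient magnitude (|Gx| + |Gy|) at each interior pixel of a 2-D grayscale grid.

    Border pixels are left at 0 because the 3x3 kernel does not fit there.
    """
    gx_k = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # responds to left/right brightness changes
    gy_k = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # responds to up/down brightness changes
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(gx_k[j][i] * img[y + j - 1][x + i - 1] for j in range(3) for i in range(3))
            gy = sum(gy_k[j][i] * img[y + j - 1][x + i - 1] for j in range(3) for i in range(3))
            out[y][x] = abs(gx) + abs(gy)
    return out


stroke = [[0, 0, 1, 0, 0] for _ in range(5)]  # a vertical 1-pixel "stroke", like part of an "l"
for row in sobel_magnitude(stroke):
    print(row)
# rows 1-3 print [0, 4, 0, 4, 0]; border rows stay all zeros
```

The two nonzero columns mark both sides of the stroke, which is the raw material a character recognizer would work from.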
Egol's explanation of the 'Viagra' ads makes sense. That Google does this routinely is highly unlikely, given the tremendous computing power it would need. Anyway, if one wanted to hide the images from Google, why couldn't one use JavaScript, or encode the image?
Yannis
#11
Posted 01 January 2008 - 02:14 AM
I was OCRing legal documents set in Courier at 10 point, double spaced, which is fairly normal, and it was not good. As a matter of fact, it was bad. Sorry Adobe.
Yes, viruses can be caught by binary signatures because they are binary in nature. They can also read the file byte by byte if they wish, and probably do, to catch viruses!
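The contrast being drawn here is that matching an exact byte pattern is easy, while recognizing letters rendered as pixels is not. A minimal Python sketch of byte-signature scanning follows; the function name and the signature bytes are made up for illustration, and real antivirus engines use far more elaborate matching than a plain substring search.

```python
def find_signatures(data: bytes, signatures: dict) -> list:
    """Return (name, offset) for every occurrence, including overlapping ones,
    of each known byte signature in data, sorted by offset."""
    hits = []
    for name, sig in signatures.items():
        if not sig:
            continue  # an empty pattern would "match" everywhere
        start = data.find(sig)
        while start != -1:
            hits.append((name, start))
            start = data.find(sig, start + 1)
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical signature database and file contents
sample = b"\x00\x01\xde\xad\xbe\xef\x02\xde\xad\xbe\xef"
print(find_signatures(sample, {"Example.Fake.A": b"\xde\xad\xbe\xef"}))
# [('Example.Fake.A', 2), ('Example.Fake.A', 7)]
```

A single changed byte defeats this kind of match, which is why the same approach does not carry over to text in images, where every rendering of a word produces different pixels.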
Edited by bobbb, 01 January 2008 - 02:38 AM.
#13
Posted 01 January 2008 - 03:45 AM
John
#14
Posted 01 January 2008 - 04:00 AM
I can see search engines reading images for many reasons.
One - to see what the images say, since putting text in images that the search engines can't read is a form of cloaking.
Two - because there is a lot of text in a lot of images, such as names on shops, logos, products, clothing, etc.
Three - because text is text; just because it is in an image does not mean it should be ignored.
Current search engine technology ignores so much data it is scary.
To add to that, it's not just text in images that can be read.
Future tech will allow reading the context of an image, the contents of an image, etc.
Even the mood or target audience of an image.
OK, I may be old or dead before tech improves that much, but there is so much that seems obvious to me that someone will patent, stopping others from developing it, and that scares me....
Not just images either; the same applies to sound and video.
Index sound files the same way.....
Search engines have a long, long way to go.
Heck, in time, emotions and bodily function readouts will be indexed.
Maybe one day search engines with the resources to do so will actually return results based on relevance instead of SEO.....
Edited by kensplace, 01 January 2008 - 04:01 AM.
#15
Posted 01 January 2008 - 10:38 AM
(you may also have to spend a day or two learning to fine-tune things and figuring out how to select bits at a time, etc.)
As to things like Google using OCR... have you not seen the number of PDFs and the option to view as HTML?
How do you think that is done?
Okay, they may not go through all sites converting PDFs to HTML and then logging the terms in them... but they "could"!
#16
Posted 01 January 2008 - 12:32 PM
Quote: "...PDF that is searchable inside."
Quote: "As to things like Google using OCR... have you not seen the number of PDFs and the option to view as HTML? How do you think that is done?"
I'll do an experiment: put a non-indexable PDF on a site and see. By non-indexable I mean that if you open it and cannot select the text, then it is just an image of words... [or a tree].
Yes, and some day we will have conversations with computers à la Star Trek. Reliable OCR will come in time.
The SEs may want to filter porn and the like by getting better at images. I still think your text in images will not be indexed for a long while.
#17
Posted 04 January 2008 - 04:31 PM
When will they stop patenting the obvious......
It should be outlawed...
Edited by kensplace, 04 January 2008 - 04:32 PM.
#18
Posted 04 January 2008 - 04:41 PM
The underlying question is how frequently they use the technology. I think most search engines are selective about the documents and sites they use it on, because it's very resource intensive. With the amount of data the SEs consume, they definitely can't run it against every document, even though it would make the data more comprehensive. The patent that Kensplace mentions lays out a great list of use cases where image-to-text makes sense.
So for the regular website owner who has a lot of text in images, more likely than not your content will not be OCR'd or seen by the engine. If you're concerned about having your information visible, check out John's post above and convert it to flat text.