MemberGroup: Members
Joined: 30-June 05
Posts: 38
|
Jul 10 2006, 12:45 PM |
|
|
Hi, there.
This might shed some light on the subject, on the meaning of the terms "latent" and "semantic structure", and on what one can get from LSI in general: Demystifying LSA, LSI, SVD, PCA, AND IS acronyms. Hope this helps. Sorry I cannot follow the discussion. I'm too busy.
This post has been edited by orion: Jul 10 2006, 12:47 PM
Moderator Alumni Group: Hall Of Fame
Joined: 1-September 02
Posts: 9,213
From: UK
|
Jul 19 2006, 03:42 PM |
|
MemberGroup: Members
Joined: 30-June 05
Posts: 38
|
Aug 11 2006, 02:53 PM |
|
|
The Singular Value Decomposition and Latent Semantic Indexing Tutorial Series is now available at http://www.miislita.com
So far, only Parts 1 and 2 are available. The series is designed to give non-specialists (IR students and search engine marketers) how-to calculation instructions and to debunk the many myths about SVD and LSI that many SEMs/SEOs hold.

SVD and LSI Tutorial 1: Understanding SVD and LSI
This tutorial introduces you to SVD and LSI. Includes:
1. Search Engine Marketers and their LSI Myths.
2. SVD/LSI Applications and Limitations.
3. A Geometrical Visualization of SVD.

SVD and LSI Tutorial 2: Computing Singular Values
This tutorial shows you how to compute singular values. Includes:
1. Matrix Transposition.
2. The Frobenius Norm.
3. Computing singular values and singular matrices.

Subsequent parts show how to compute the full SVD, give stepwise how-to calculations of how LSI scores and ranks documents, and cover new advances in the field. They will be out soon. I will eventually show you how to compute/play with LSI for your projects without having to pay a dime to anyone. Enjoy it.
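For readers who prefer code to a calculator, the three steps listed for Tutorial 2 (transposition, the Frobenius norm, singular values) can be sketched with NumPy. This is only a minimal sketch: the 2x2 matrix is a hypothetical example chosen for illustration, and the variable names are my own, not something taken from the tutorials.

```python
import numpy as np

# Hypothetical 2x2 matrix, chosen only for illustration.
A = np.array([[4.0, 0.0],
              [3.0, -5.0]])

# Step 1: transpose A and form the symmetric matrix A^T A.
AtA = A.T @ A

# Step 2: eigenvalues of A^T A (eigvalsh returns them in ascending order).
eigenvalues = np.linalg.eigvalsh(AtA)

# Step 3: singular values are the square roots, listed largest first.
singular_values = np.sqrt(eigenvalues[::-1])

# Cross-check against NumPy's own SVD routine.
s_direct = np.linalg.svd(A, compute_uv=False)

# The Frobenius norm equals the square root of the sum of squared singular values.
frobenius = np.linalg.norm(A, "fro")

print(singular_values, s_direct, frobenius)
```

The same three steps can be done by hand for a matrix this small, which is a useful way to check that the eigenvalue route and the library SVD agree.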
MemberGroup: Members
Joined: 30-June 05
Posts: 38
|
Aug 26 2006, 05:01 PM |
|
|
And here is my case against a portion of the SEO industry that are just LSI-based Snake Oil Marketers
This post has been edited by orion: Aug 26 2006, 05:02 PM |
MemberGroup: Members
Joined: 30-June 05
Posts: 38
|
Sep 20 2006, 12:49 PM |
|
|
Here is Part 4 of the SVD and LSI Tutorial series.
http://www.miislita.com/information-retrie...lculations.html
Note how straightforward LSI is. Now anyone can compute LSI scores (or at least understand the basic calculations) with nothing but an online matrix calculator. I have included some basic procedures. This should help SEOs get more myths and misconceptions about LSI out of their heads. Enjoy it and stay away from LSI snake oil sellers.
Dr. E. Garcia
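The basic LSI score calculation can also be sketched in a few lines of NumPy instead of an online matrix calculator. The sketch below assumes the common textbook procedure: build a term-document matrix of raw counts, take its SVD, keep k = 2 dimensions, fold the query in as q^T U_k S_k^{-1}, and rank documents by cosine similarity. The three toy documents, the query, and the helper names are assumptions made for illustration and are not necessarily the data or steps used in the tutorial.

```python
import numpy as np

# Toy collection and query, assumed here only for illustration.
docs = [
    "shipment of gold damaged in a fire",
    "delivery of silver arrived in a silver truck",
    "shipment of gold arrived in a truck",
]
query = "gold silver truck"

# Vocabulary: every distinct word in the collection (no stop-word removal).
vocab = sorted({w for d in docs for w in d.split()})

def count_vector(text):
    """Raw term counts of `text` over the collection vocabulary."""
    words = text.split()
    return np.array([words.count(t) for t in vocab], dtype=float)

# Term-document matrix: rows are terms, columns are documents.
A = np.column_stack([count_vector(d) for d in docs])
q = count_vector(query)

# SVD and rank-k truncation.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k, :].T

# Fold the query into the reduced space: q^T U_k S_k^{-1}.
q_k = (q @ U_k) / s_k

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Each row of V_k holds one document's coordinates in the reduced space.
scores = [cosine(q_k, d_k) for d_k in V_k]
ranking = sorted(range(len(docs)), key=lambda j: scores[j], reverse=True)

for j in ranking:
    print(f"d{j + 1}: {scores[j]:.4f}")
```

With this toy data the second document ranks first, the third second, and the first last. Nothing in the calculation looks up synonyms or "magic words"; it is plain matrix arithmetic on term counts.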
MemberGroup: Members
Joined: 30-June 05
Posts: 38
|
Sep 22 2006, 09:03 AM |
|
|
The quick reference for the series on SVD and LSI, the LSI Fast Track Tutorial, is now available at
http://www.miislita.com/information-retrie...ck-tutorial.pdf
Note that there are no "magic words" in LSI. Note also how term vector theory is still used at the beginning and at the end of the decomposition.
Dr. E. Garcia
MemberGroup: Members
Joined: 30-June 05
Posts: 38
|
Oct 3 2006, 12:50 PM |
|
|
Here is Mike Grehan's recent ClickZ column, Lies, Lies, and LSI. In that article Mike and Randfish take positions on the whole issue of certain SEOs trying to market LSI services. Here is my response to a few comments made by Randfish.
|
MemberGroup: Members
Joined: 30-June 05
Posts: 38
|
Oct 19 2006, 08:25 AM |
|
|
Finally, here is the last article of the tutorial series on SVD and LSI:
LSI Keyword Research and Co-Occurrence Theory

In this LSI tutorial readers will learn how to cluster keywords in a k-dimensional reduced space. They will also learn how first- and second-order co-occurrence affects LSI scores. This should demystify the so-called "LSI tools", most of which are based on permutation and synonym lookups, not on LSI at all. Some merely use plain term vectors, and others simply fake the results. None of these use SVD, so by definition they are not using LSI.

Here is a good tip. When we cluster terms using LSI, the terms must be in the initial term-document matrix. So, whatever the results, they will only be valid for the tested universe. An "LSI tool" that reports terms not present in the original universe is simply appending them from an external source (e.g., by means of a word list lookup) and is therefore faking the results. So far I have not seen any valid LSI tool from any search marketing firm. My feeling is that some who have bought one have been "taken".

Also available: LSI Keyword Research - A Fast Track Tutorial

Both pieces are designed to demystify how LSI clusters keywords. We also debunk a piece of bad SEO advice: The Synonym Myth. According to this myth, certain SEOs advise stuffing documents with synonyms and related terms to make them "LSI-friendly". This is bad advice for two reasons:
1. It shows a lack of understanding of how LSI groups terms.
2. LSI's clustering power is not due to the nature of the terms, but is a direct consequence of a co-occurrence phenomenon. Terms do not have to be synonyms to be clustered.

However, let me say this. The use of synonyms and related terms in documents is a sound technique that professional writers have used for centuries, and it is recommended. But one should not stuff documents with them just because, back in 1988, Dumais and others applied the 1965 Golub and Kahan SVD algorithm to a vocabulary problem and called that LSI (or LSA if you wish). Doing so in an arbitrary manner demonstrates a lack of understanding of latent semantic indexing theory.

Once again, stay away from SEO firms that promote so-called "LSI Tools". Do not let them game you.

Cheers
Dr. E. Garcia
This post has been edited by orion: Oct 19 2006, 08:34 AM
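Clustering keywords in a k-dimensional reduced space can be illustrated with a very small example. The sketch below is only a toy under stated assumptions: three invented two-word documents, raw counts, k = 2, term coordinates taken as the rows of U_k S_k, and cosine similarity as the closeness measure. None of this is claimed to be the exact procedure of the tutorial. Note that "coffee" and "saucer" are not synonyms and never appear in the same document, yet both co-occur with "cup".

```python
import numpy as np

# Invented toy collection: "coffee" and "saucer" never share a document.
docs = ["coffee cup", "cup saucer", "linux kernel"]
vocab = sorted({w for d in docs for w in d.split()})

# Term-document matrix of raw counts (rows: terms, columns: documents).
A = np.array([[d.split().count(t) for d in docs] for t in vocab], dtype=float)

# Rank-k SVD; term coordinates in the reduced space are the rows of U_k S_k.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
term_coords = U[:, :k] * s[:k]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

i = vocab.index("coffee")
j = vocab.index("saucer")
m = vocab.index("linux")

raw_sim = cosine(A[i], A[j])                               # original term vectors
lsi_sim = cosine(term_coords[i], term_coords[j])           # reduced space
unrelated_sim = cosine(term_coords[i], term_coords[m])     # no co-occurrence path

print(raw_sim, lsi_sim, unrelated_sim)
```

In the original term vectors "coffee" and "saucer" have zero similarity, while in the reduced space they end up pointing the same way because of the path through "cup". "linux" stays unrelated. Only terms in the vocabulary of the matrix have coordinates at all, so a term from outside the collection cannot come out of this calculation.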
MemberGroup: Members
Joined: 30-June 05
Posts: 38
|
Oct 20 2006, 09:19 AM |
|
|
Indeed, heavy use of synonyms and related terms in a copy has nothing to do with LSI.
At this DigitalPoint thread I explained that using synonyms and related terms is a common-sense practice for improving copy style, not something one should do because of LSI.

There is no such thing as "LSI-friendly" documents

Some SEOs give the wrong advice by saying that one should use synonyms and related terms on the pretense, or wrong thesis, that this will make a document "LSI-friendly". In fact, when one thinks it through, there is no such thing as making documents "LSI-friendly". This is another SEO myth.

The great thing about phenomena that take place at a global level, like co-occurrence and IDF (inverse document frequency), is that the chances for end users to manipulate them are close to nada, zero, zip, nothing.

In LSI, co-occurrence (especially second-order co-occurrence) is responsible for the LSI scores assigned to terms, not the nature of the terms themselves or whether they are synonyms or related terms. In the early LSI papers this was not fully addressed and emphasis was given to synonyms. Why? Because the documents selected for those experiments happened to contain synonyms and related terms. It was thought that synonym association was responsible for the clustering phenomenon. In fact, the clustering was a direct result of co-occurrence patterns present in the LSI matrix.

In recent years several papers have been published on the subject:
Understanding LSI via the Truncated Term-term Matrix, 2005 thesis, by Regis Newo (Germany).
A Framework for Understanding Latent Semantic Indexing (LSI) Performance, by April Kontostathis and William Pottenger (Lehigh University).
Pottenger and Kontostathis have published a series of papers on the subject. These two studies explain the role of co-occurrence patterns in the LSI matrix, but differ a bit in some of their findings.
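First- and second-order co-occurrence can be read directly off a term-term matrix. The sketch below is a simplified illustration under my own assumptions: a binary term-document matrix A for three invented documents, first-order co-occurrence defined as a nonzero off-diagonal entry of A A^T, and second-order co-occurrence defined as two terms that never share a document but are linked through one intermediate term. It is not meant to reproduce the formal definitions in the papers listed above.

```python
import numpy as np
from itertools import combinations

# Invented toy collection, for illustration only.
docs = ["coffee cup", "cup saucer", "linux kernel"]
vocab = sorted({w for d in docs for w in d.split()})

# Binary term-document matrix: 1 if the term occurs in the document.
A = np.array([[1.0 if t in d.split() else 0.0 for d in docs] for t in vocab])

# Term-term matrix: T[i, j] counts the documents containing both terms.
T = A @ A.T
np.fill_diagonal(T, 0)  # ignore a term co-occurring with itself

# First-order: the two terms share at least one document.
first_order = T > 0

# Second-order: no shared document, but a two-step path i -> m -> j exists.
paths = T @ T
second_order = (paths > 0) & ~first_order
np.fill_diagonal(second_order, False)

def pairs(mask):
    """List the term pairs (i < j) flagged in a boolean matrix."""
    idx = combinations(range(len(vocab)), 2)
    return sorted((vocab[i], vocab[j]) for i, j in idx if mask[i, j])

print("first-order: ", pairs(first_order))
print("second-order:", pairs(second_order))
```

Here ("coffee", "saucer") is the only second-order pair: the two terms are connected through "cup" even though neither is a synonym of the other and they never appear together.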
SEOs are still quoting the first LSI papers from the late eighties and early nineties, and in the process some have stretched that old research in order to better market whatever they sell.

The following figure from the last tutorial shows that LSI clusters terms not because they are synonyms, but because of the first- and second-order co-occurrence paths present in the term-document matrix, as can be seen from the corresponding eigenvectors and term vectors.

Certainly, in this term-document example taken from the Grossman and Frieder IR textbook (note: the data is theirs, but the graph and calculations are mine), none of the terms are synonyms. Still, LSI was able to cluster terms.

When LSI is applied to a term-document matrix representing a collection of zillions of documents, the co-occurrence phenomenon that affects the LSI scores becomes a global effect, occurring between documents in the collection.

Thus, the only way end users (e.g., SEOs) could influence the LSI scores is if they could access and control the content of all the documents in the matrix, or launch a coordinated spam attack on the entire collection. The latter would be the case of a spammer trying to make an LSI-based search engine index billions of documents (to name a quantity) that he/she has created.

If an end user or researcher wants to understand and manipulate the effect of co-occurrence in a single document, he/she would need to deconstruct that document, build a term-passage matrix for it, apply LSI to that matrix, and then play by manipulating single terms. Whatever the results, they will only be valid for the universe represented by the matrix, that is, for that and only that document. If such a document is then submitted to the LSI-based search engine, that local effect simply vanishes: global co-occurrence "takes control" and spreads throughout the collection, forming the corresponding connectivity paths that eventually force a redistribution of term weights.
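The point that LSI results are only valid for the matrix they were computed on can be shown with a crude toy. The sketch below assumes raw counts, k = 2, term coordinates U_k S_k, and cosine similarity. The `term_similarity` helper and both collections are hypothetical; the "bigger" collection simply repeats two documents three times as a stand-in for a collection with a different composition, which is far from a realistic search engine index.

```python
import numpy as np

def term_similarity(docs, term_a, term_b, k=2):
    """Cosine similarity of two terms in the rank-k LSI space of `docs`."""
    vocab = sorted({w for d in docs for w in d.split()})
    A = np.array([[d.split().count(t) for d in docs] for t in vocab], dtype=float)
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    coords = U[:, :k] * s[:k]  # term coordinates: rows of U_k S_k
    a = coords[vocab.index(term_a)]
    b = coords[vocab.index(term_b)]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# A small "local" universe.
local = ["coffee cup", "cup saucer", "linux kernel"]

# The same vocabulary in a collection with a different composition
# (hypothetical: the first two documents repeated three times).
bigger = ["coffee cup", "cup saucer"] * 3 + ["linux kernel"]

local_sim = term_similarity(local, "coffee", "saucer")
bigger_sim = term_similarity(bigger, "coffee", "saucer")

print(f"local collection:  {local_sim:.4f}")
print(f"bigger collection: {bigger_sim:.4f}")
```

The same pair of terms, with the same k, comes out as near-identical in one collection and as unrelated in the other. The wording of the individual documents did not change; only the collection around them did, and that alone decides which directions the truncated SVD keeps.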
Consequently, SEOs that sell this idea of making documents "LSI-friendly", like some firms sending emails reading "is your site LSI optimized?" or "we can make your documents LSI-valid!", or those that promote the notion of "LSI and Link Popularity", end up exposed for what they are and for how much they know about search engines. The sad thing is that these find their way, via search engine conferences (SES), blogs and forums, to deceive the industry with such blogonomies. BTW, here are Two More LSI Blogonomies.
Dr. E. Garcia
This post has been edited by orion: Dec 5 2006, 12:03 PM