Latent Semantic Indexing
The Latent Semantic Indexing (LSI) information retrieval model builds upon prior research in information retrieval and, using the singular value decomposition (SVD) to reduce the dimensions of the term-document space, attempts to solve the synonymy and polysemy problems that plague automatic information retrieval systems. LSI explicitly represents terms and documents in a rich, high-dimensional space, allowing the underlying ("latent") semantic relationships between terms and documents to be exploited during searching.
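The dimension reduction described above can be sketched with NumPy. This is only an illustration under stated assumptions: the five terms, the four documents, and the choice of k = 2 are made up for the example, and a real collection would keep far more dimensions.

```python
import numpy as np

# Hypothetical toy collection: rows are terms, columns are documents.
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # flower
    [0, 0, 1, 0],  # petal
], dtype=float)

def truncated_svd(term_doc, k):
    """Keep the k largest singular triplets of a term-document matrix."""
    # Thin SVD: term_doc == U @ np.diag(s) @ Vt, with s in descending order.
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

k = 2  # retained dimensions; an arbitrary choice for this toy example
U_k, s_k, Vt_k = truncated_svd(A, k)

# Rank-k approximation of the original term-document matrix.
A_k = U_k @ np.diag(s_k) @ Vt_k

# Terms and documents both get k-dimensional coordinates in the same space.
term_coords = U_k * s_k    # one row per term
doc_coords = Vt_k.T * s_k  # one row per document
```

Here `term_coords` and `doc_coords` place terms and documents in the same reduced space, and `A_k` is the closest rank-k matrix to `A` in the least-squares sense.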
LSI relies on the constituent terms of a document to suggest the document's semantic content. However, the LSI model views the terms in a document as somewhat unreliable indicators of the concepts contained in the document. It assumes that the variability of word choice partially obscures the semantic structure of the document. By reducing the dimensionality of the term-document space, the underlying semantic relationships between documents are revealed, and much of the "noise" (differences in word usage, terms that do not help distinguish documents, etc.) is eliminated. LSI statistically analyses the patterns of word usage across the entire document collection, placing documents with similar word usage patterns near each other in the term-document space, and allowing semantically related documents to be near each other even though they may not share terms.
LSI differs from previous attempts at using reduced-space models for information retrieval in several ways. Most notably, LSI represents documents in a high-dimensional space. Koll, for instance, used only seven dimensions to represent his semantic space. Secondly, both terms and documents are explicitly represented in the same space. Thirdly, unlike Borko and Bernick, no attempt is made to interpret the meaning of each dimension. Each dimension is merely assumed to represent one or more semantic relationships in the term-document space. Finally, because of limits imposed mostly by the computational demands of vector-space approaches to information retrieval, previous attempts focused on relatively small document collections. LSI is able to represent and manipulate large data sets, making it viable for real-world applications.
Compared to other information retrieval techniques, LSI performs surprisingly well. In one test, Dumais found LSI provided more related documents than standard word-based retrieval techniques when searching the standard MED collection. Over five standard document collections, the same study indicated LSI performed better on average than lexical retrieval techniques. In addition, LSI is fully automatic and easy to use, requiring no complex expressions or syntax to represent the query. Because terms and documents are explicitly represented in the space, relevance feedback can be seamlessly integrated with the LSI model, providing even better overall retrieval performance.
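To make the retrieval behaviour concrete, here is a minimal sketch of querying a reduced space, assuming a made-up toy collection and k = 2. The helper names (`fold_in`, `score_documents`, `feedback`) are hypothetical, the query is treated as a pseudo-document projected onto the retained dimensions (one common convention, not the only one), and the feedback step simply adds the coordinates of documents judged relevant to the query vector.

```python
import numpy as np

# Hypothetical toy collection: rows are terms, columns are documents.
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # flower
    [0, 0, 1, 0],  # petal
], dtype=float)

k = 2  # retained dimensions; an arbitrary choice for this toy example
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
doc_coords = Vt_k.T * s_k  # one k-dimensional row per document

def fold_in(query_terms):
    """Project a bag of query terms into the reduced space as a pseudo-document."""
    q = np.array([1.0 if t in query_terms else 0.0 for t in terms])
    return U_k.T @ q

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def score_documents(query_vec):
    """Cosine similarity between the query and every document."""
    return [cosine(query_vec, d) for d in doc_coords]

def feedback(query_vec, relevant_doc_ids):
    """Assumed simple feedback: add the vectors of documents judged relevant."""
    return query_vec + doc_coords[relevant_doc_ids].sum(axis=0)

# Document 0 contains "car" and "engine" but not the query term "automobile".
q_hat = fold_in({"automobile"})
scores = score_documents(q_hat)

# Suppose a user marks document 2 as relevant; the query moves toward it.
scores_fb = score_documents(feedback(q_hat, [2]))
```

In this toy collection, document 0 scores highly for the query "automobile" even though it never contains that word, because "car" and "automobile" both co-occur with "engine". The feedback step works directly on the vectors because queries and documents live in the same space.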

3 Comments:
People kept asking me about it, so here is the clearest definition I could find.
http://javelina.cet.middlebury.edu/lsa/out/lsa_definition.htm is a really good link for it too :)
I have a client who wants me to write up instructions for the suggestions I've made; they in turn make the physical alterations. Is there anyone out there who has a form for this, or a form for anything SEO, including contracts?