This MedLibrary.org supplementary page on Word salad is provided directly from the open source Wikipedia as a service to our readers. Please see the note below on authorship of this content, as well as the Wikipedia usage guidelines.
Word salad is a mixture of seemingly meaningful words that together signify nothing;[1] the term originated as the common name for schizophasia, a symptom of various mental illnesses. When applied to a physical theory, "word salad" is a derogatory description that labels the theory as senseless or utterly devoid of meaning.[citation needed] The term may also express scorn toward the speech or press releases of a person or organization.
In computer science and linguistics, explicitly constructed word salad is a tool for demonstrating the difference between random utterance and the coherent expression of thought. Software such as Dissociated Press in Emacs constructs interesting-but-meaningless word salad from large samples of coherent language: it generates new, random documents that share some of the word- or letter-clustering properties of the sample. These word salads pass as natural language to the inattentive eye or ear, but are clearly meaningless when read or heard with full attention. In the 21st century, spammers have begun using word salad to evade e-mail filters and to attract search-engine indexing to spam web pages.[2][3]
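The general idea can be illustrated with a word-level Markov chain: record which words follow which in a sample text, then walk those transitions at random. The Python sketch below is written in the spirit of Dissociated Press but is not Emacs's actual implementation; the function names `build_chain` and `dissociate` are hypothetical.

```python
from __future__ import annotations

import random
from collections import defaultdict


def build_chain(text: str, order: int = 1) -> dict[tuple[str, ...], list[str]]:
    """Map each run of `order` consecutive words to the words that followed it."""
    words = text.split()
    chain: defaultdict[tuple[str, ...], list[str]] = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return dict(chain)


def dissociate(text: str, length: int = 20, order: int = 1, seed: int | None = None) -> str:
    """Generate `length` words of locally plausible but meaningless text."""
    if order < 1:
        raise ValueError("order must be at least 1")
    rng = random.Random(seed)
    chain = build_chain(text, order)
    if not chain:
        return text
    state = rng.choice(list(chain))
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:
            # Dead end (e.g. the last word of the sample): jump to a random state.
            state = rng.choice(list(chain))
            output.extend(state)
            continue
        output.append(rng.choice(followers))
        state = tuple(output[-order:])
    return " ".join(output[:length])
```

Because each new word is drawn from words that actually followed the previous one in the sample, short stretches of output look plausible even though the whole is meaningless. Raising `order` makes the output copy longer runs of the original and so read more naturally.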
Word salad with spam e-mail
In response to the growing problem of spam e-mail, filtering tools became available starting around 2002 that implemented a widely used method known as the naive Bayes classifier. This method uses the probabilities of various words appearing in spam e-mails to classify messages as spam automatically. For a short time, this worked fairly well. In response, spammers developed word salad to fool programs employing this method of classification.[4] By adding large amounts of random text somewhere in their messages, spammers hope to confuse Bayesian classifiers into classifying the message as "ham" (non-spam e-mail). Typically, this text contains random words from a dictionary.[citation needed]
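A minimal sketch of a multinomial naive Bayes scorer shows how padding can flip a classification. It assumes add-one (Laplace) smoothing and a tiny invented training set; the class `NaiveBayes`, the method `spam_log_odds`, and the sample messages are hypothetical. Words never seen in training are skipped in this sketch, so purely random dictionary words would have no effect here; the padding instead uses words the model has seen in legitimate mail.

```python
import math
from collections import Counter


class NaiveBayes:
    """Toy two-class (spam/ham) multinomial naive Bayes over whitespace tokens."""

    def __init__(self) -> None:
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = Counter()

    def train(self, text: str, label: str) -> None:
        self.docs[label] += 1
        self.counts[label].update(text.lower().split())

    def spam_log_odds(self, text: str) -> float:
        """log P(spam | text) - log P(ham | text); positive means 'spam'."""
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        v = len(vocab)
        spam_total = sum(self.counts["spam"].values())
        ham_total = sum(self.counts["ham"].values())
        score = math.log(self.docs["spam"]) - math.log(self.docs["ham"])  # prior ratio
        for word in text.lower().split():
            if word not in vocab:
                continue  # unseen words carry no evidence in this sketch
            p_spam = (self.counts["spam"][word] + 1) / (spam_total + v)
            p_ham = (self.counts["ham"][word] + 1) / (ham_total + v)
            score += math.log(p_spam / p_ham)
        return score


model = NaiveBayes()
for msg in ["buy cheap viagra now", "cheap pills buy now", "limited offer buy viagra"]:
    model.train(msg, "spam")
for msg in ["meeting agenda for monday", "please review the attached report",
            "lunch on monday with the team"]:
    model.train(msg, "ham")

plain = model.spam_log_odds("buy cheap viagra now")
salad = model.spam_log_odds(
    "buy cheap viagra now meeting agenda monday review report team lunch attached")
print(f"plain: {plain:.2f}  with salad: {salad:.2f}")
```

With this toy data the plain message scores clearly positive (spam), while the padded version drops just below zero and would be classified as ham. Each padding word contributes a small negative term, so enough of them outweigh the few strongly "spammy" words.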
Word salad for web page spam
Gyöngyi and Garcia-Molina state this problem clearly:
"As more and more people rely on the wealth of information available online, increased exposure on the World Wide Web may yield significant financial gains for individuals or organizations. Most frequently, search engines are the entryways to the Web; that is why some people try to mislead search engines, so that their pages would rank high in search results, and thus, capture user attention."[2]
Sentence and paragraph salad
Sentence and paragraph salad reduce the effectiveness of word-level detection algorithms and tend to push Bayesian filter scores toward the legitimate side. The only approaches likely to thwart sentence and paragraph salad would be very high-level and expensive natural language processing, some kind of artificial intelligence algorithm involving a search engine, or an exhaustive listing of spam e-mails. All of these techniques would be exceptionally expensive and, despite their cost, would likely not be very successful at filtering spam.[citation needed]
In a related technique, actual text from a large corpus of legitimate English (the plays of Shakespeare, other e-texts distributed by Project Gutenberg, random World Wide Web pages, Wikipedia, or the like) is added to the e-mail. This approach attempts to get around algorithms that could be devised to detect the more primitive form of word salad.[5]
Letter salad
On an even smaller scale than word salad, spammers misspell words to try to thwart Bayesian filters. Misspelling Viagra as Via6ra, \/|/\Gr/\, or in any of a number of other ways (see Leet), or substituting characters from international character sets, is an attempt to avoid the high reliability with which a Bayesian filter would classify any e-mail containing certain words as spam. A simple spell checker might significantly reduce the effectiveness of letter salad, yet most present spam filters do not use one.[citation needed]
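A filter could undo some letter salad by normalizing tokens before scoring them. The sketch below assumes a small, hypothetical substitution table (`LEET_MAP`) and uses Python's standard `unicodedata` module to strip accents; it is not taken from any particular spam filter.

```python
import unicodedata

# Hypothetical substitution table. Multi-character sequences come first so
# that "\/" is consumed before its individual characters are considered.
LEET_MAP = [
    ("\\/", "v"),
    ("/\\", "a"),
    ("|", "i"),
    ("@", "a"),
    ("0", "o"),
    ("1", "i"),
    ("3", "e"),
    ("6", "g"),
    ("$", "s"),
]


def normalize_token(token: str) -> str:
    """Map a disguised token back toward its plain lowercase spelling."""
    # NFKD splits accented letters into base letter + combining mark
    # (e.g. "í" -> "i" + U+0301); dropping the marks leaves the base letter.
    decomposed = unicodedata.normalize("NFKD", token)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    for pattern, letter in LEET_MAP:
        stripped = stripped.replace(pattern, letter)
    return stripped.lower()
```

A table like this is a blunt instrument: mapping `0` to `o` or `1` to `i` would also mangle legitimate numbers, so a filter would likely apply it only to the tokens it scores, not to the message shown to the reader. Look-alike letters from other scripts (for example Cyrillic `а`) have no NFKD decomposition and would need a separate mapping.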
Word salad filtering
Algorithms for detecting word salad are clearly possible and not particularly difficult to implement.[citation needed] For the most part, they would be more computationally intensive than most rules used by spam filters as of 2006. A statistical approach based on Zipf's law of word frequency has potential for detecting simple word salad, as do grammar checking and natural language processing.[6] Statistical Markovian analysis, which checks whether short phrases are likely to occur in normal English sentences, is another statistical approach that would be effective against completely random phrasing[6] but might be fooled by Dissociated Press techniques.[citation needed]
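Both statistical ideas can be sketched in a few lines of Python. `zipf_slope` fits a least-squares line to log frequency against log rank; in large samples of natural text the slope is close to -1, while a list of distinct random words gives a flat slope of 0. `BigramModel` is a first-order Markov model with add-one smoothing that scores how plausible each adjacent word pair is relative to a reference corpus. The names, the smoothing choice, and any threshold one might apply to these scores are assumptions for illustration, not a published method.

```python
import math
from collections import Counter


def zipf_slope(text: str) -> float:
    """Least-squares slope of log(frequency) versus log(rank)."""
    freqs = sorted(Counter(text.lower().split()).values(), reverse=True)
    if len(freqs) < 2:
        return 0.0
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


class BigramModel:
    """First-order Markov model of adjacent word pairs in a reference corpus."""

    def __init__(self, corpus: str) -> None:
        words = corpus.lower().split()
        self.unigrams = Counter(words)
        self.bigrams = Counter(zip(words, words[1:]))
        self.vocab_size = len(self.unigrams)

    def avg_log_prob(self, text: str) -> float:
        """Mean log-probability per word pair; higher means more English-like."""
        words = text.lower().split()
        pairs = list(zip(words, words[1:]))
        if not pairs:
            return 0.0
        total = 0.0
        for prev, cur in pairs:
            # Add-one smoothing gives unseen pairs a small nonzero probability.
            p = (self.bigrams[(prev, cur)] + 1) / (self.unigrams[prev] + self.vocab_size)
            total += math.log(p)
        return total / len(pairs)
```

The bigram score also shows why Dissociated Press output is hard to catch this way: nearly every adjacent pair it emits already occurs in its source text, so if that source resembles the reference corpus, the output scores about as well as genuine prose.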
Future
As spam filters get better at detecting simple word and letter salad, spammers will likely migrate toward sentence and paragraph salad techniques.[citation needed] In obscuring their message from improving spam filters, they will also obscure it from the potential targets of their advertising, virus distribution, or phishing. At some point, the profitability of spam may fall far enough that its volume is substantially reduced.
Word salad as an insult
In everyday usage, the term "word salad" may be used to express disgust or contempt for someone's speech. Examples of speech that some consider word salad are Sarah Palin's responses to questions about bailouts[7] and Caitlin Upton's remarks about why some Americans cannot locate the United States on a map.
Notes
References
Gyöngyi, Zoltán; Garcia-Molina, Hector (2005). "Web spam taxonomy". Proceedings of the First International Workshop on Adversarial Information Retrieval on the Web (AIRWeb 2005), held at the 14th International World Wide Web Conference (WWW 2005), May 10-14, 2005, Nippon Convention Center (Makuhari Messe), Chiba, Japan. New York, N.Y.: ACM Press. ISBN 1-59593-046-9.
Lavergne, Thomas (2006). "Unnatural language detection". RJCRI'06: Young Scientists' Conference on Information Retrieval, pp. 383-388 (possibly in French). Retrieved on 2007-10-02.
Wikipedia content modification information:
- This page was last modified on 15 November 2008, at 05:32.
Wikipedia Authorship and Review
Wikipedia content provided here is not reviewed directly by MedLibrary.org. Wikipedia content is authored by an open community of volunteers and is not produced by or in any way affiliated with MedLibrary.org.
Wikipedia Usage Guidelines
This article is licensed under the GNU Free Documentation License. It uses material from the Wikipedia article on "Word salad".
The URL for this specific entry is:
All Wikipedia text is available under the terms of the GNU Free Documentation License. (See Copyrights for details). Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc.
