
Cre8asiteforums Internet Marketing
and Conversion Web Design



How Is Text Mining Different From Data Mining?


3 replies to this topic

#1 cre8pc

cre8pc

    Dream Catcher Forums Founder

  • Admin - Top Level
  • 14,819 posts

Posted 18 April 2017 - 10:49 AM

Something for my favorite brainiacs  :infinite-banana:

 

Text mining or text analytics is the analysis of unstructured data contained in natural language text using various methods, tools and techniques. It has become an important research process with applications in many different disciplines. 

 

 

Knowledge discovery through text analytics: advances, challenges and opportunities
Text mining is a form of data mining. Where other forms of data are typically organized as matrices, raw text data is “unstructured.” Simply put, text mining involves collecting and analyzing large volumes of textual data. Typically text mining is performed to learn about the groups or communities that produced the text, but its ultimate purpose depends on the field and the interests of the researcher. Social scientists use text mining tools to learn about shifting public opinion; marketers use it to learn about consumers’ opinions of products and services; and it has even been used to predict the direction of stock markets. As we discuss in the book, text mining involves multiple different tools for collecting data, as well as multiple different approaches to analyzing the data collected. These approaches include sentiment analysis, topic modeling and metaphor analysis, among others.
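
To make the "matrices versus unstructured text" point concrete, here is a minimal sketch of turning raw text into a document-term count matrix. The function name `document_term_matrix`, the crude regex tokenizer, and the two sample sentences are all assumptions for illustration; they do not come from the quoted book.

```python
import re
from collections import Counter


def document_term_matrix(docs):
    """Return (vocabulary, rows), where rows[i][j] counts vocabulary[j] in docs[i]."""
    # Crude tokenizer (an assumption): lowercase, keep runs of letters and apostrophes.
    tokenized = [re.findall(r"[a-z']+", doc.lower()) for doc in docs]
    # The vocabulary is every distinct token, sorted so the columns are stable.
    vocab = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)  # missing terms count as 0
        rows.append([counts[term] for term in vocab])
    return vocab, rows


# Made-up sample documents.
docs = ["The pizza was great", "The service was slow, the pizza cold"]
vocab, matrix = document_term_matrix(docs)
print(vocab)
for row in matrix:
    print(row)
```

Once the text is in this shape, the usual matrix-based tools (clustering, classification, topic models) can be applied to it.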

 

 



#2 EGOL

EGOL

    Professor

  • Hall Of Fame
  • 6,404 posts

Posted 18 April 2017 - 11:58 AM

Typically text mining is performed to learn about the groups or communities that produced the text, but its ultimate purpose depends on the field and the interests of the researcher.

 

I believe that an awful lot of text mining is done for purposes of professional spam and professional copyright infringement.

 

Try to search for a physician in a small community.  Instead of finding a physician, you will find professional spam. 

 

A lot of affiliate sites that promote retail products are produced by methods that are really professional copyright infringement but the text is professionally obfuscated to make them look otherwise. 



#3 iamlost

iamlost

    The Wind Master

  • Site Administrators
  • 5,517 posts

Posted 20 April 2017 - 07:46 PM

Dang nab it, Kim!
You have this talent for tossing out the most innocuous questions that actually require opening complex black boxes and checking on that blasted cat...

The short version: text (data) mining is a subset of data mining.

Text data mining (it's not text itself so much as text data that is mined) is deriving high-level information, aka more than the words themselves or what the words themselves might be conveying. Once the data is mined, textual analysis is the identification of statistical patterns from the text rather than from its structure. Often when people say text mining they mean both the mining and the analysis.

I first became intrigued by the possibilities back in high school when I read an article about how scholars had determined that certain books of the Bible had been written by the same author(s). A long time later, as I became fascinated by Natural Language programming and all that stems from and surrounds it, I learned that one could with ~75% accuracy determine the decade within the past century in which a given published (in English) work was written (language changes over time), and that, given a few examples from a given author, one could identify that writer's other work ~85% of the time. Etc.
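
A rough sketch of how that kind of authorship matching can work: compare how often each text uses common function words, then pick the known author whose profile is closest. This is one simple stylometric approach, not necessarily what the scholars or tools mentioned above used. The word list, function names, and toy samples are assumptions for illustration, and real systems need far more text and features.

```python
import math
import re
from collections import Counter

# Small, hypothetical function-word list; real stylometry uses hundreds of features.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it", "is", "was", "but", "i"]


def profile(text):
    """Relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty text
    return [counts[word] / total for word in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_likely_author(unknown_text, known_samples):
    """Return the author whose sample profile is most similar to the unknown text."""
    target = profile(unknown_text)
    return max(known_samples, key=lambda author: cosine(target, profile(known_samples[author])))


# Toy samples, invented for this example.
samples = {
    "A": "the cat and the dog and the bird",
    "B": "it is that it was that it is",
}
guess = most_likely_author("the fox and the hound", samples)
print(guess)
```

Function words are a common choice because writers use them habitually, regardless of topic.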

Note: better percentages these days.

So, for instance, using EGOL's spam example, one could quite easily send emails written as EGOL that would read as what EGOL would write if EGOL was inclined to recommend some flighty SEO company or shady investment opportunity... add a trifling bit of identity/contact mining... and EGOL gets the blame...
Note: proving a negative after the fact is a difficult proposition.
Note: so should you get a desperate email from EGOL needing emergency medical funds because his luxury airship was caught in a hydrogen sulphide plume while overflying the Appalachians...

While the tempo of and interest in text data mining have recently picked up, much of what the academics are now discussing and researching is actually stuff that was live 10-20 years ago. It's just that there was, as there still is, a commercial use that has, for once, left the academics well back in what's possible. For perhaps a year or three, then another few until some plug-n-play API makes it look easy... hawked to every MFA WP idjit... sigh.

For one valuable use: it is well known that people feel inclined to like/trust folks that speak as they do, sales types have long been able to switch jargon/accent to match customers... well, similarly with text. If the copy reads as they tend to write they are more likely to accept/agree/recommend... I've been serving coupon offers in the best determined match from the 8 major North American English dialects (as text, this is not pronunciation but rather vocabulary and grammar) for years, and affiliate pre-sell pages for a couple. It easily doubles conversions when got right, and drops a bit when got wrong. So overall still worth the doing for folks running their own system(s); difficult/impossible to get enough oomph on most shared hosts...
Note: I've talked about this in years past with regard to video.

Note: match video testimonials with similar speaking visitors to maximise just about everything...

Note: you gotta know your visitors...
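
For anyone wondering what "best determined match" on vocabulary might look like at its very simplest, here is a toy sketch: score a visitor's text against marker-word sets and fall back to neutral copy when nothing matches. This is not the actual system described above. The variant names, marker words, and function name are all hypothetical, and a real setup would use many more signals, including grammar.

```python
import re

# Hypothetical marker vocabularies for two copy variants; not real dialect definitions.
VARIANT_MARKERS = {
    "variant_a": {"y'all", "reckon"},
    "variant_b": {"pop", "wicked"},
}


def best_variant(visitor_text, markers=VARIANT_MARKERS, default="neutral"):
    """Pick the copy variant whose marker words overlap most with the visitor's text."""
    tokens = set(re.findall(r"[a-z']+", visitor_text.lower()))
    scores = {name: len(tokens & words) for name, words in markers.items()}
    best = max(scores, key=scores.get)
    # With no marker hits at all, serve the default copy rather than guessing.
    return best if scores[best] > 0 else default


choice = best_variant("I reckon y'all have coupons")
fallback = best_variant("hello there")
print(choice, fallback)
```

The fallback matters: as noted above, a wrong match costs a bit, so guessing on no evidence is not worth it.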

 

iamlost: text data miner/analyser/leverager for ~20 years...
 



#4 cre8pc

cre8pc

    Dream Catcher Forums Founder

  • Admin - Top Level
  • 14,819 posts

Posted 21 April 2017 - 09:00 AM

For one valuable use: it is well known that people feel inclined to like/trust folks that speak as they do, sales types have long been able to switch jargon/accent to match customers... well, similarly with text. If the copy reads as they tend to write they are more likely to accept/agree/recommend... 

 

 

There's a paper out that unfortunately is not free to the public but I have access to it as a member... anyway, it is called "Don't be deceived: Using linguistic analysis to learn how to discern online review authenticity" and ...

 

This article uses linguistic analysis to help users discern the authenticity of online reviews. Two related studies were conducted using hotel reviews as the test case for investigation. The first study analyzed 1,800 authentic and fictitious reviews based on the linguistic cues of comprehensibility, specificity, exaggeration, and negligence. The analysis involved classification algorithms followed by feature selection and statistical tests. A filtered set of variables that helped discern review authenticity was identified. 
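
To give a feel for what "linguistic cues" can mean in practice, here is a sketch that pulls a few crude surface features out of a review. These proxies are assumptions for illustration only: word length stands in loosely for comprehensibility, numbers for specificity, and exclamation marks and all-caps words for exaggeration. They are not the paper's actual variables. The sample review is made up.

```python
import re


def review_cues(text):
    """Extract a few crude surface features from a review (illustrative proxies only)."""
    words = re.findall(r"[A-Za-z']+", text)
    n_words = len(words) or 1  # avoid dividing by zero on empty text
    return {
        # Longer words on average: a rough stand-in for harder-to-read text.
        "avg_word_length": sum(len(w) for w in words) / n_words,
        # Runs of digits (room numbers, dates, prices): a rough stand-in for specificity.
        "numbers": len(re.findall(r"\d+", text)),
        # Exclamation marks and shouted words: rough stand-ins for exaggeration.
        "exclamations": text.count("!"),
        "all_caps_words": sum(1 for w in words if len(w) > 1 and w.isupper()),
    }


cues = review_cues("BEST hotel EVER!!! Stayed 3 nights in room 412.")
print(cues)
```

Features like these would then be fed into a classifier, with feature selection deciding which ones actually help separate authentic reviews from fictitious ones.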

 

 

I'll have to round up the hard copy I have of it but I do find this whole thing interesting.  For example, conversions content writing is largely ignored and in its place is marketing writing, which is not how we talk to each other and has an agenda. 

 

If Google were to analyze how we voice search, we don't usually say, "Ok Google. Show me all the local pizza places that have the best deals on stromboli", but that's how the Internet "talks" to us.  If Google were to want to learn about us (machine learning) based on the data we search for, that's fine but I think they are not understanding WHY we are searching for the "keyword" data it's given.




