Debra Mastaler Makes Lemonade
Posted 17 February 2011 - 04:58 PM
The piece stands very well on its own - well worth the read. However, here are a few additional points from my own 'do unto them what they do unto me but twice as hard for twice as long' rule.
* I touch all the bases when issuing a DMCA:
---the website registrant
---the website host
* because I strictly regulate bots, i.e. mostly block and bounce them, the Wayback Machine and the 'plagiarism checking tools' are not an option for me. However, few of you should have that problem. Instead, I have embedded telltales that can be searched for directly, among other features.
* Debra mentions the value of including dates in the content. This is especially easy with blogs and well worth doing as a usability feature as well as a documentary one. As I don't use that publishing format, as an initial defense I utilise the Dublin Core metadata header tags, including:
<meta name = "DC.Creator" content = "iamlost">
<meta name = "DC.Date.Created" content = "2011-02-17">
<meta name = "DC.Date.Modified" content = "2011-02-17">
If you prefer the standard HTML meta tags, provide Author and Copyright, among others:
<meta name="Author" content="iamlost, email@example.com">
<meta name="copyright" content="Copyright 2011">
I also include a blanket footer copyright 'spread', i.e. Copyright © 2002 - 2011, which is NOT likely to be technically appropriate usage but has become standard usage.
Note: none of the above 'proves' anything legally because it is easily altered BUT every little bit helps.
* Debra mentions dating screenshot postings. A very good practice that I recommend following. Every page that I upload gets 'shot', imprinted with a timestamp, and stored. Further, I log every upload automatically.
* Debra contacted the infringer. I may do that where the infringement is minor. However, where the scraping is broad (as in Debra's example: almost all of the content from my training pages), I don't bother and immediately shoot to sink.
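For the automatic upload log mentioned above, here is a minimal sketch in Python. It is not my actual setup: the function names, the JSON-lines log format, and the choice of a SHA-256 fingerprint are all assumptions made for illustration. The idea is simply that each upload leaves a dated record of what was published.

```python
import hashlib
import json
from datetime import datetime, timezone


def make_upload_record(path, content, when=None):
    """Build one log entry: which page, when (UTC), and a content fingerprint.

    `path` and `content` (bytes) are whatever the upload step hands over;
    both names are hypothetical.
    """
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "path": path,
        "uploaded_utc": when.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "sha256": hashlib.sha256(content).hexdigest(),
        "bytes": len(content),
    }


def append_record(log_path, record):
    """Append the entry as one JSON line; earlier entries are never rewritten."""
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
```

As with the meta tags, a log like this 'proves' nothing legally on its own, since it is just as easily altered, but it gives you dates and fingerprints to quote when filing.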
In the proactive prevention area, besides what is mentioned above:
* I utilise <meta name="robots" content="noarchive"> so there is no public-facing search engine cached page to scrape.
* perhaps the greatest 'unknown' site scraping activity is via Google Translate. Remember that the same content in different languages is NOT generally viewed as duplicate content. Of course I block (return 403) any user-agent with translate, transcoder, or babelfish in the string. And known translation IPs are blocked.
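The user-agent check described above could look something like the following Python sketch. The three substrings come from the post; the function name and the idea of returning a bare status code are assumptions for illustration, and in practice this kind of rule usually lives in the web server configuration rather than in application code.

```python
# Substrings named in the post; matching is case-insensitive.
BLOCKED_UA_FRAGMENTS = ("translate", "transcoder", "babelfish")


def status_for_user_agent(user_agent):
    """Return 403 if the User-Agent contains a blocked fragment, else 200."""
    ua = (user_agent or "").lower()
    if any(fragment in ua for fragment in BLOCKED_UA_FRAGMENTS):
        return 403
    return 200


# Hypothetical user-agent strings, made up for the example.
print(status_for_user_agent("ExampleFetcher translate/1.0"))  # 403
print(status_for_user_agent("Mozilla/5.0"))                   # 200
```

User-agent strings are trivially spoofed, which is why the IP blocking is done as well.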
I know that many people have little problem with allowing scrapers to re-publish their content without permission because many leave the absolute link strings in place. This is a personal and business model decision. I prefer to mess with their day and income.
Posted 17 February 2011 - 06:30 PM
Will look into "created dates" in WP. Must be there... I hope.
Although I have moved a lot of stuff about, and that throws much out of the window.
For the one DMCA I did, I did give dates for all the formats of the article in question (I think 4 URLs were covered since its birth about 4 years ago). But the only proof was a spreadsheet log I keep for changes and the dates in the old static HTML files.