Hey Tamar, I forgot to mention a blog post I did on this a while back, "Content Thieves".
HMTKSteve
· 2 years ago
This is a touchy issue.
I publish my blog entries under a Creative Commons license that is visible on every page of my blog. I use partial feeds, not because I am against full feeds, but because the way the templates work on my blog makes it easier to break a post into pieces that fit around the advertising blocks.
I was recently the victim of blog scraping and it took me a few days to find out and clean up the mess. In my case I would not have minded if the content had been used under the terms of my license, but it was not.
If you want to use someone's content just ask! You would be amazed at how many folks will say, "yes, just give me proper credit and link backs."
I wrote a short article on the release of the Wii News Channel but I could not get screen caps at the time. I went to Flickr, found some good photos and contacted the owner of those photos. In exchange for proper credit he allowed me to use the images.
Jonathan Bailey
· 2 years ago
Just a heads up, you can also use DMCA notices against the hosts of the scraper, provided that they are hosted within the U.S., which most are. I usually consider that or AdSense to be my first stop, whichever is more relevant and would do the most damage.
Great article though, I'm glad to see that you and others like you are spreading awareness of this issue!
Let me know if I can be of any help.
Chris Matthieu
· 2 years ago
Copyrighted works (both All Rights Reserved and Creative Commons) can be registered with Numly. Numly Numbers are used for verification purposes and can be embedded in your content.
Scrapers are not removing these numbers, which allows readers to track down the real owner of the content.
Tamar Weinberg
· 2 years ago
Hi Jonathan: I took the AdSense approach. The particular site in question that Steve brought to my attention is trying to offer a "free bundle" of tools that can scrape content off a site in 2 minutes. And my requests to remove the content were ignored; in fact, later that day, they published THIS article to their blog (ironic, isn't it?)
Steve: We aren't published under CC at all. This is an "All rights reserved" company blog. I certainly don't mind my blog posts being quoted (with proper credit), but publishing content verbatim is definitely not something most bloggers will approve of. I certainly don't.
HMTKSteve
· 2 years ago
Tamar,
This is why Technorati is your friend. If you embed an invisible link back to your blog inside your article, it is very likely that a Technorati search will show your scraped content linking back to you.
I'm often reminded of a certain quote in American history, "John Marshall has made his decision; now let him enforce it!"
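If you already have the HTML of a page you suspect is a scraped copy, a quick way to see whether the embedded link survived is to list the links in it that point at your own domain. This is only a rough sketch: `links_back_to` and the `example.com` domain are made-up names for illustration, and it assumes you fetched the page yourself by whatever means you like.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_back_to(html, my_domain):
    """Return links in `html` whose host is `my_domain` or a subdomain of it.

    `my_domain` is a hypothetical parameter: your own blog's domain.
    """
    collector = LinkCollector()
    collector.feed(html)
    matches = []
    for href in collector.hrefs:
        host = (urlparse(href).hostname or "").lower()
        if host == my_domain or host.endswith("." + my_domain):
            matches.append(href)
    return matches
```

Calling `links_back_to(page_html, "example.com")` on a suspect page returns the links that still point home; an empty list means the scraper (or the feed it pulled from) stripped them.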
EasyEve
· 2 years ago
It is great to have a starting point to combat scraping.
Does any of this apply to websites (in addition to blogs)?
AndyBeard
· 2 years ago
Include lots of links back to your other content, tag pages, etc.
Have a clear license and other legal information in the footer of each post.
Use a GPL license and let them use the content. It is all links, and if you can't beat them, you might as well benefit from it.
The problem is that most scrapers are grabbing content from Technorati or news feeds, which is a pain because those services strip out the links.
But only do this if every single bit of your content is 100% your own, and you have the rights to distribute it in this fashion.
If you are using anyone else's content, such as photos, video, etc., possibly the biggest problem is Google Reader, which is effectively designed to create splogs, or aggregated shared content, depending on your point of view.
Tyler
· 2 years ago
My favorite trick is to set up a Google Alert for the name of my blog. I've caught A LOT of stuff that way!
kelvin newman
· 2 years ago
I can't believe how little shame people have. I always like the image changing trick; that way you can spoil their scraping page with relative ease.
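The thread doesn't spell out how the image swap is done, so take this as one possible sketch rather than what anyone here actually ran: it assumes the scraper hotlinks your images and that you pick the file to serve from the request's Referer header. `ALLOWED_HOSTS`, `image_for_request` and the notice path are all hypothetical names.

```python
from urllib.parse import urlparse

# Hypothetical: the hosts that are allowed to display your images as-is.
ALLOWED_HOSTS = {"example.com", "www.example.com"}


def image_for_request(requested_path, referer,
                      notice_path="/images/copyright-notice.png"):
    """Decide which image file to serve for a request.

    Requests with no Referer (direct visits, many feed readers) and requests
    coming from your own pages get the real image. Anything else is assumed
    to be a hotlink and gets the copyright notice image instead.
    """
    if not referer:
        return requested_path
    host = (urlparse(referer).hostname or "").lower()
    if host in ALLOWED_HOSTS:
        return requested_path
    return notice_path
```

Letting empty referers through is a deliberate choice here, since blocking them tends to break images for legitimate readers. The same rule is often written as a web server rewrite rule instead of application code.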
Tamar Weinberg
· 2 years ago
Nice tips, thanks guys.
I did the image changing trick... all images on the splog were plastered with 10e20 copyright images. :)
The others are quite useful as well.
I actually like Google's shared reader -- I don't know if I would categorize that as 'scraping' -- I found a lot of great blogs that way. The links do point back to the original blog and I end up subscribing to the ones I like!
infectious
· 2 years ago
I guess I am scraping your titles in my personal aggregator thingie. Would you like me to remove your records?
Tamar Weinberg
· 2 years ago
infectious: The concern isn't about aggregating content and linking to the site. Your aggregator does just that. What people are doing is taking images and text verbatim and putting them on their own blogs while claiming them as their own. This is the distinction I had hoped to make. There is nothing wrong with your content and you have built a rather cool application there. :)
HMTKSteve
· 2 years ago
Looks like that particular "auto-scraping" blog no longer scrapes your content!
Tamar Weinberg
· 2 years ago
Yup -- I noticed that this morning. They didn't answer the "please remove" email, so I suppose the hotlinking image threw them off. In any event, all's well that ends well.
infectious
· 2 years ago
OK, good. Thanks, I enjoy reading here at 10e20.com. :)