hummingwolf wrote 2005-09-15 11:37 pm
Google Blogsearch and You
Please bear with me: I'm going to try really, really hard to make sense, but there is still migraine residue clogging up my brain.
So anyway, the topic of this post is Google Blogsearch, a new feature from the Google Empire about which many people are excited and many other people are panicked. Go ahead, take a minute and search for yourself if you haven't already.
Now, some of you are geekily thrilled about seeing yourself all over the blogosphere. Have fun! Google has given you a nifty new toy to play with!
Those of you who are now scared are probably those who, on your LJ Edit Info page, have the "Block Robots/Spiders from indexing your journal" option checked and thought Google would respect that forever. The response from the folks at Google is essentially: "Oops, we made a boo-boo. We're deleting those posts from our database as fast as we can!" So wait a few days, search again, and see if the posts you didn't want to be easily found via a search are still searchable.
Now, the above is advice posted in some places on LJ. But what Google (and probably other blog search sites) uses to index blogs here is a data feed--the LJ RSS feed in our case--which does not carry a tag telling spiders not to archive your data. You can minimize potential archiving by setting your journal's RSS feed to output just the subject lines of your entries. Do this by going to http://www.livejournal.com/admin/console and entering the following:
set synlevel title
The bots will still be able to search your journal, but will only search entry titles rather than the bodies of entries.
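If you want to check what your feed still exposes after changing the setting, you could save a copy of the feed and look for entries that carry body text. The following is only a rough Python sketch. It assumes the feed is RSS 2.0 and that a title-only entry simply has no description element. The sample XML and the function name exposed_bodies are made up for illustration and are not LiveJournal's actual output.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed: the exact XML LiveJournal emits is an assumption.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>example journal</title>
    <item>
      <title>Google Blogsearch and You</title>
      <link>http://example.com/1234.html</link>
    </item>
    <item>
      <title>Another entry</title>
      <description>Body text that is still exposed.</description>
    </item>
  </channel>
</rss>"""


def exposed_bodies(rss_text):
    """Return the titles of feed items that still carry body text."""
    root = ET.fromstring(rss_text)
    exposed = []
    for item in root.iter("item"):
        # A missing or empty <description> means only the subject is shown.
        body = item.findtext("description", default="")
        if body.strip():
            exposed.append(item.findtext("title", default="(no subject)"))
    return exposed


if __name__ == "__main__":
    print(exposed_bodies(SAMPLE_FEED))
```

An empty list would mean that no entry in the saved feed exposes anything beyond its subject line.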
Another issue is that there is a full-site RSS feed which big aggregators like Google can use. You can opt out of this, though, by going to http://www.livejournal.com/admin/console again and telling it
set latest_optout yes
This should remove your stuff from the XML feeds, the "latest post" feeds and the "latest images" feed.
Also remember: Bots can index your public posts even if you later decide to make them private, but they cannot index your friends-only posts which have always been friends-only. If you do not want your post to be publicly available, set the appropriate security level before posting. [Edit:
mynn says in comments: "Another option is
Set your default posts to Friends Only and make them public after you're done posting.... One of the threads I was reading yesterday (through prettykate) indicated that some live journal clients post posts public, then go back, open it up, and set the privacy indicator." Never having used any posting method other than simple web update, I don't know anything about behavior of the clients, so I'll take her word for it.]
This has been a public service for anyone out there who didn't want their whole LiveJournal to be searchable by random internutters, but didn't want to go friends-only either. But the important thing to remember is this: If you don't want people reading your diary entries, don't post them on the Internet.
[Edit: Info for people who do want their journals indexed.]
If you actively want your journal to be archived, there should be no need to change your settings from the defaults. However, if you like to play around with all your LJ opportunities, here's what you would do:
In your Edit Info page, you would make sure that the box for "Block Robots/Spiders" is not checked.
In the Admin Console, enter the command
set latest_optout no
To tweak your syndication level (how much of your LJ gets into the automatically generated syndicated feed),
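"A new admin console command was added, set synlevel, which allows you to choose how much of your journal posts are syndicated via RSS feeds. [The level] can be "full", for the entire entry, "summary" for the first paragraph, or "title" for only the entry subject."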
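To make the three synlevel values concrete ("full" for the entire entry, "summary" for the first paragraph, "title" for only the subject), here is a small Python sketch of how much of an entry's body each level would let into the feed. This is an illustration of the description only, not LiveJournal's code. The function name feed_body is hypothetical, and treating a blank line as the paragraph break is an assumption.

```python
def feed_body(body, synlevel):
    """Return the part of an entry body that a given synlevel would syndicate."""
    if synlevel == "full":
        # The entire entry goes into the feed.
        return body
    if synlevel == "summary":
        # Only the first paragraph; a blank line is assumed to end a paragraph.
        return body.split("\n\n", 1)[0]
    if synlevel == "title":
        # Only the subject line is syndicated, so no body text at all.
        return ""
    raise ValueError("synlevel must be 'full', 'summary' or 'title'")


if __name__ == "__main__":
    entry = "First paragraph.\n\nSecond paragraph with the private details."
    for level in ("full", "summary", "title"):
        print(level, "->", repr(feed_body(entry, level)))
```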
no subject
Thanks for the info; i don't so much mind my public entries seen by the teeming masses on the internet. Most of the more personal stuff (or things i wouldn't want family or potential employers to see) are friendslocked anyhoo. :)
no subject
Then in the console, one change would be "set latest_optout no" but I'm not sure of the original setting for the synlevel... Ahh, found it--the Google Blogsearch comes to the rescue. :-)
"A new admin console command was added, set synlevel, which allows you to choose how much of your journal posts are syndicated via RSS feeds. [The level] can be "full", for the entire entry, "summary" for the first paragraph, or "title" for only the entry subject."
So, if you do want
Another option is
Re: Another option is
it will, sort of
I assume (since email posts are f-only even if you say "public" and your default is f-only) that setting the default to f-only would fix that 'loophole' that may exist with some client software, my evil twin. (I still think it's funny the number of sleep deprived folks on my friends list thought we were the same person; but I can be so random and obscure sometimes it's entirely plausible).
Re: it will, sort of
It's pretty funny that someone would mistake me for you. I never talk about Wolfie! For that matter, I rarely even talk about wolves. Maybe I should change that.
Re: Another option is
no subject
A blogsearch right now doesn't find you at all, so you really don't need to be that nervous. You might want to change sensitive entries as a precaution against any search engines less ethical than Google, but it's not something you absolutely need to do right away.