hummingwolf: Drawing of a creature that is part-wolf, part-hummingbird. (Hummingwolf by Dandelion)
hummingwolf ([personal profile] hummingwolf) wrote 2005-09-15 11:37 pm

Google Blogsearch and You

Please bear with me: I'm going to try really, really hard to make sense, but there is still migraine residue clogging up my brain.

So anyway, the topic of this post is Google Blogsearch, a new feature from the Google Empire about which many people are excited and many other people are panicked. Go ahead, take a minute and search for yourself if you haven't already.

Now, some of you are geekily thrilled about seeing yourself all over the blogosphere. Have fun! Google has given you a nifty new toy to play with!

Those of you who are now scared are probably those who, on your LJ Edit Info page, have the "Block Robots/Spiders from indexing your journal" option checked and thought Google would respect that forever. The response from the folks at Google is essentially: "Oops, we made a boo-boo. We're deleting those posts from our database as fast as we can!" So wait a few days, search again, and see if the posts you didn't want to be easily found via a search are still searchable.
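For the curious, here is roughly what "respecting" a block looks like from a crawler's side. This is only a sketch: it assumes the block is expressed as robots.txt-style rules (I don't know how LJ actually publishes the setting), and the bot name, the rules, and the journal URLs are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for a journal whose owner asked not to be indexed.
RULES = """\
User-agent: *
Disallow: /users/example/
"""


def may_index(rules_text, bot_name, url):
    """Return True if a well-behaved bot is allowed to fetch this URL."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(bot_name, url)


# "ExampleBot" and both URLs are placeholders, not real crawler names or journals.
blocked = may_index(RULES, "ExampleBot", "http://www.livejournal.com/users/example/1234.html")
allowed = may_index(RULES, "ExampleBot", "http://www.livejournal.com/users/someone_else/")
print(blocked, allowed)
```

The catch is that rules like these only help if the bot checks them before reading a page, and they say nothing about data that arrives some other way.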

Now, the above is advice that has been posted in some places on LJ. But what Google (and probably other blog search sites) uses to index blogs here is a data feed--the LJ RSS feed in our case--which does not carry a tag telling spiders not to archive your data. You can minimize potential archiving by setting your journal's RSS feed to output just the subject lines of your entries. Do this by going to http://www.livejournal.com/admin/console and entering the following:
set synlevel title

The bots will still be able to search your journal, but will only search entry titles rather than the bodies of entries.
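To picture the difference, here is a small sketch of what a feed reader could pull out of a feed at two syndication levels. The two feeds below are invented samples, not real LJ output, and I'm assuming the entry body would travel in the RSS description element.

```python
import xml.etree.ElementTree as ET

# Hypothetical feeds: the same entry as it might look at two syndication levels.
FULL_FEED = """<rss version="2.0"><channel>
  <title>example journal</title>
  <item>
    <title>Google Blogsearch and You</title>
    <description>The whole entry text would appear here.</description>
  </item>
</channel></rss>"""

TITLE_ONLY_FEED = """<rss version="2.0"><channel>
  <title>example journal</title>
  <item>
    <title>Google Blogsearch and You</title>
  </item>
</channel></rss>"""


def indexable_text(feed_xml):
    """Return a (title, body) pair for every item in the feed."""
    root = ET.fromstring(feed_xml)
    pairs = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        body = item.findtext("description", default="")
        pairs.append((title, body))
    return pairs


print(indexable_text(FULL_FEED))
print(indexable_text(TITLE_ONLY_FEED))
```

With the title-only sample, the body comes back empty, so a search engine working from the feed has only the subject line to go on.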

Another issue is that there is a full-site RSS feed which big aggregators like Google can use. You can opt out of this, though, by going to http://www.livejournal.com/admin/console again and telling it
set latest_optout yes

This should remove your stuff from the XML feeds, the "latest post" feeds and the "latest images" feed.

Also remember: Bots can index your public posts even if you later decide to make them private, but they cannot index friends-only posts that have always been friends-only. If you do not want your post to be publicly available, set the appropriate security level before posting. [Edit: [livejournal.com profile] mynn says in comments: "Another option is: Set your default posts to Friends Only and make them public after you're done posting.... One of the threads I was reading yesterday (through prettykate) indicated that some live journal clients post posts public, then go back, open it up, and set the privacy indicator." Never having used any posting method other than simple web update, I don't know anything about the behavior of the clients, so I'll take her word for it.]




This has been a public service for anyone out there who didn't want their whole LiveJournal to be searchable by random internutters, but didn't want to go friends-only either. But the important thing to remember is this: If you don't want people reading your diary entries, don't post them on the Internet.


[Edit: Info for people who do want their journals indexed.]
If you actively want your journal to be archived, there should be no need to change your settings from the defaults. However, if you like to play around with all your LJ opportunities, here's what you would do:

In your Edit Info page, you would make sure that the box for "Block Robots/Spiders" is not checked.

In the Admin Console, enter the command
set latest_optout no

To tweak your syndication level (how much of your LJ gets into the automatically generated syndicated feed), here is the description I found:
"A new admin console command was added, set synlevel, which allows you to choose how much of your journal posts are syndicated via RSS feeds. [The level] can be "full", for the entire entry, "summary" for the first paragraph, or "title" for only the entry subject."
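If it helps to see the two console settings side by side, here is a small sketch that builds the command lines for either preference. The helper name and its arguments are my own invention; only the command strings and the three synlevel values come from the text above, and nothing here talks to LJ.

```python
# The three syndication levels named in the quoted description.
VALID_SYNLEVELS = ("full", "summary", "title")


def console_commands(want_indexing, synlevel):
    """Build the Admin Console lines for a given preference (hypothetical helper)."""
    if synlevel not in VALID_SYNLEVELS:
        raise ValueError(f"synlevel must be one of {VALID_SYNLEVELS}, got {synlevel!r}")
    optout = "no" if want_indexing else "yes"
    return [f"set latest_optout {optout}", f"set synlevel {synlevel}"]


# Someone who wants to stay out of the big feeds and expose only subject lines:
for line in console_commands(want_indexing=False, synlevel="title"):
    print(line)

# Someone who is happy to be indexed in full:
for line in console_commands(want_indexing=True, synlevel="full"):
    print(line)
```

Each printed line would be typed into http://www.livejournal.com/admin/console one at a time; the "Block Robots/Spiders" checkbox still has to be changed separately on the Edit Info page.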

[personal profile] lindsaybits 2005-09-16 03:40 am (UTC)
As me grandma would say: if you don't want something heard, don't say it.

Thanks for the info; i don't so much mind my public entries seen by the teeming masses on the internet. Most of the more personal stuff (or things i wouldn't want family or potential employers to see) are friendslocked anyhoo. :)

[identity profile] hummingwolf.livejournal.com 2005-09-16 01:28 pm (UTC)
Yeah. I've made the habit of being mostly public, but I like the option of being mostly un-indexed. Sort of a reflection of my ambivalence about having a public journal, I suppose. :-)

[identity profile] gurdonark.livejournal.com 2005-09-16 03:55 am (UTC)
I will enjoy having this resource. I can never remember when I wrote x or y.

[identity profile] hummingwolf.livejournal.com 2005-09-16 01:31 pm (UTC)
So far I think the blogsearch indexes LJs from this year, with scattered journals indexed back a bit further. A search now brings up entries from [livejournal.com profile] gurdonpoems from 2003, but the earliest [livejournal.com profile] gurdonark entry found is from March 2005. I suspect they're still busy indexing, though.

[identity profile] compostwormbin.livejournal.com 2005-09-16 04:39 am (UTC)
I like being able to do the searches but understand the privacy concerns. I know I'll be taking particular care with the friends designation on postings from now on. I'm trying to minimize my identifying info on my user info page too.

[identity profile] hummingwolf.livejournal.com 2005-09-16 01:35 pm (UTC)
I choose to be mostly public while telling uninvited robots to leave me alone. Guess it's a reflection of my ambivalence about having a public journal in the first place. The new Google thing doesn't bother me much, but I did notice other people on LJ freaking out about it.

[identity profile] a3hourtour.livejournal.com 2005-09-16 09:46 am (UTC)
Thanks for the heads up and the instructions!

[identity profile] hummingwolf.livejournal.com 2005-09-16 01:38 pm (UTC)
You're welcome! I noticed some journalers freaking out about it, and thought it best to post what I found out before too much panicking occurs. Google seems to be keeping their word of deleting posts people didn't want indexed (mine disappeared overnight), so here's hoping cool heads prevail on LJ.

[identity profile] daisydumont.livejournal.com 2005-09-16 11:47 am (UTC)
properties now set. thanks!

[identity profile] hummingwolf.livejournal.com 2005-09-16 01:40 pm (UTC)
You're welcome! Google seems to be doing their part and deleting the posts of people who didn't want to be indexed, so a search for "daisydumont" right now only finds entries of other people who have mentioned you.

[identity profile] daisydumont.livejournal.com 2005-09-16 01:44 pm (UTC)
hmmmm. how do i set it back to default? hmmmm.

[identity profile] hummingwolf.livejournal.com 2005-09-16 01:49 pm (UTC)
Set which back to default?

[identity profile] daisydumont.livejournal.com 2005-09-16 01:56 pm (UTC)
i mean, i assume my setting those properties changed something, from a default setting. if i decide i want my lj to be indexable, how would i reset it?

[identity profile] hummingwolf.livejournal.com 2005-09-16 02:07 pm (UTC)
Ahh. In your Edit Info page, you would uncheck the box for "Block Robots/Spiders."

Then in the console, one change would be "set latest_optout no" but I'm not sure of the original setting for the synlevel... Ahh, found it--the Google Blogsearch comes to the rescue. :-)

"A new admin console command was added, set synlevel, which allows you to choose how much of your journal posts are syndicated via RSS feeds. [The level] can be "full", for the entire entry, "summary" for the first paragraph, or "title" for only the entry subject."

So, if you do want [livejournal.com profile] daisydumont to be indexed by Google & the other search engines, now you know how to set things up.

[identity profile] daisydumont.livejournal.com 2005-09-16 04:40 pm (UTC)
ok, thanks! not sure what i want to do about that, really.

[identity profile] hummingwolf.livejournal.com 2005-09-16 06:39 pm (UTC)
Yeah. This whole idea of a public journal is still strange to me. Mostly I don't mind having entries visible to other LJers, but I don't particularly want this journal to come up anytime someone does a search for, I dunno, chocolate or song lyrics or something.

Another option is

[identity profile] speck.livejournal.com 2005-09-16 12:20 pm (UTC)
Set your default posts to Friends Only and make them public after you're done posting.

Re: Another option is

[identity profile] hummingwolf.livejournal.com 2005-09-16 01:41 pm (UTC)
Yep. Well, that won't keep public posts from being indexed, but it will keep locked posts locked!

it will, sort of

[identity profile] speck.livejournal.com 2005-09-16 02:04 pm (UTC)
One of the threads I was reading yesterday (through prettykate) indicated that some live journal clients post posts public, then go back, open it up, and set the privacy indicator.

I assume (since email posts are f-only even if you say "public" and your default is f-only) that setting the default to f-only would fix that 'loophole' that may exist with some client software, my evil twin. (I still think it's funny the number of sleep deprived folks on my friends list thought we were the same person; but I can be so random and obscure sometimes it's entirely plausible).


Re: it will, sort of

[identity profile] hummingwolf.livejournal.com 2005-09-16 03:20 pm (UTC)
Oh, okay, I see what you were talking about. Never used anything other than simple web update, so it didn't occur to me to wonder about what clients and e-mail posts do.

It's pretty funny that someone would mistake me for you. I never talk about Wolfie! For that matter, I rarely even talk about wolves. Maybe I should change that.

Re: Another option is

[identity profile] erigeneia.livejournal.com 2005-09-16 03:19 pm (UTC)
Wow, this is completely off topic, but I love your icon. I always wondered how that worked!

[identity profile] erigeneia.livejournal.com 2005-09-16 03:21 pm (UTC)
I did the things you said, but I'm still nervous. I don't want my entries indexed. Do you know of an easy way to make all my posts retroactively friends-only, or do I need to go through them one by one and change it?

[identity profile] hummingwolf.livejournal.com 2005-09-16 03:26 pm (UTC)
There isn't any easy way to change the settings on all your posts--from everything I've read, you have to edit each entry, one at a time.

A blogsearch right now doesn't find you at all, so you really don't need to be that nervous. You might want to change sensitive entries as a precaution against any search engines less ethical than Google, but it's not something you absolutely need to do right away.

[identity profile] erigeneia.livejournal.com 2005-09-16 03:48 pm (UTC)
Thanks muchly! I started changing them, one by one, and it's really interesting to go back and read old entries. :)