This article is written by Philipp Lenssen as part of the Blog Swap with Seth Finkelstein – Seth's article on 10 Things You Might Not Know About Censorware can be found at Philipp's blog.
Not too long ago, you couldn't enter more than 10 words into the Google search box. Or to be more precise, you *could*, but subsequent words were ignored. I bet the Google founders were thinking "10 words ought to be enough for everyone," and mostly they were right – but for some advanced uses, especially with the Google Search API, a little more is helpful. Then, a while ago, Google increased the word limit to 32 words. This is probably OK for a few more years!
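If you want to see which words would fall off the end of a long query, here's a minimal Python sketch. It assumes words are simply split on whitespace, which may not be how Google actually counts them, and `split_at_word_limit` is a made-up helper name, not part of any Google API.

```python
from typing import List, Tuple


def split_at_word_limit(query: str, limit: int = 32) -> Tuple[List[str], List[str]]:
    """Return (words kept, words ignored) for a whitespace-separated query.

    The whitespace split is an assumption for illustration only.
    """
    words = query.split()
    return words[:limit], words[limit:]


# Under the old 10-word limit, the last two words here would be dropped.
kept, ignored = split_at_word_limit(
    "one two three four five six seven eight nine ten eleven twelve", limit=10
)
print("kept:", kept)
print("ignored:", ignored)
```

Calling it without `limit` applies the 32-word limit mentioned above.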
Another change is that Google no longer ignores stop words. Stop words in search engines are words like "the" or "a" which are too tiny or common to be useful additions to most searches. However, Google is now accepting them as semi-normal words (one remaining difference being that they're not highlighted, or linked to the dictionary). This means that on Google.com, you get different results when you search for [the tale of a cowboy] vs [* tale * * cowboy] vs [tale cowboy]. (I'll be using square brackets around search queries – they're not to be included in the search.)
Another operator changed its functionality over the years: a couple of years ago, you could only query Google for [site:something.com], but not [site:something.com/something/]. Today, you can add folders to the site operator.
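For anyone building such queries in a script, here's a small Python sketch that puts a site operator (with a folder) into a search URL. The base URL and the `q` parameter are assumptions for illustration, and `site_query` and `search_url` are hypothetical helper names.

```python
from urllib.parse import urlencode


def site_query(site_path: str, terms: str = "") -> str:
    """Build a query such as 'site:something.com/something/ cowboy'."""
    return f"site:{site_path} {terms}".strip()


def search_url(query: str, base: str = "https://www.google.com/search") -> str:
    """URL-encode the query; base URL and `q` parameter are assumed here."""
    return f"{base}?{urlencode({'q': query})}"


# The folder is simply part of the value after "site:".
print(search_url(site_query("something.com/something/", "cowboy")))
```

Note that the square brackets used in this article never make it into the query itself.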
These days, everyone puts a Beta tag on their 2.0-ish web app. But did you know back in 1998, when Google launched their search, it was also in Beta? Take a look at a copy stored in the WayBack Machine to see it. Be aware the page might look quite ugly by today's standards... heck, it was probably ugly even back in 1998 (then again, so was my homepage in 1998!).
While no one outside Google knows for sure, it is often speculated that Google's PageRank value – the "authority rank" (or quantity of backlinks which themselves receive lots of backlinks) – is a much more precise number than the plain 1, 2, 3... 10 values. A float, not an integer, if you will.
So, for example, if you're looking at a site which shows a PageRank 8 in the Google Toolbar, its internal PageRank may be something like 8.355 (or however precise Google's number is). But we don't know for sure – maybe Google's algorithms prefer speed over quality when it comes to the recursive PR calculations of billions of pages. This calculation might not be a breeze even for Google's 10,000 - 200,000 computers (that's another number we can't be too sure of outside of Google).
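To get a feeling for why the internal number would naturally be a float, here's a toy Python sketch of the textbook PageRank iteration on a four-page web. This is not Google's actual implementation: the damping factor of 0.85, the number of iterations, and the `displayed_value` helper (which just chops off the decimals, as a toolbar might) are all assumptions.

```python
import math


def pagerank(links, damping=0.85, iterations=100):
    """Iteratively compute PageRank for {page: [pages it links to]}.

    Every link target must itself be a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            # A page without outlinks spreads its rank over all pages.
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


def displayed_value(internal: float) -> int:
    """Hypothetical toolbar rounding: 8.355 would be shown as 8."""
    return math.floor(internal)


toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
for page, score in sorted(pagerank(toy_web).items()):
    print(page, round(score, 4))
print(displayed_value(8.355))
```

Even on four pages the scores come out as fractions, and each pass depends on the result of the previous one, which hints at how heavy the same loop gets on billions of pages.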
I guess when you're an uber-geek, like Google founders Larry Page and Sergey Brin are, you are also very competitive (to the point of risking being arrogant towards slower thinkers, maybe). John Battelle, in his book The Search (pages 67-68), tells of how the two met at Stanford University in the summer of '95:
Like most schools, Stanford invites potential recruits to the campus for a tour. But it wasn't on the pastoral campus that Page met Brin – it was on the streets of San Francisco. Brin, a second-year student known to be gregarious, had signed up to be a student guide of sorts. His role that day was to show a group of prospective first-years around the City by the Bay.
Page ended up in Brin's group, but it wasn't exactly love at first sight. "Sergey is pretty social; he likes meeting people," Page recalls, contrasting that quality with his own reticence. "I thought he was pretty obnoxious. He had really strong opinions about things, and I guess I did, too."
"We both found each other obnoxious," Brin counters when I tell him of Page's response. "But we say it a little bit jokingly. Obviously we spent a lot of time talking to each other, so there was something there. We had a kind of bantering thing going."
You might have come across the official Google Blog. But did you know Google actually has 16 different – and all official – blogs (give or take one)? Here's the full list (I'm also collecting these all on one page):
You've probably heard about how Google self-censors in China (e.g. human rights sites top-ranked by Google in other countries are missing in Google.cn). But did you know that Google showed censored search results in other countries for years, sometimes even without showing a disclaimer that something was missing? In Germany and France, that was the case.
You can see this for yourself if you first search Google.com for [site:ety.com]. This returns 9,940 results. Now if you do the same search on Google.fr – Google France – you get zero results. However, there's a disclaimer at the bottom:
"In response to a legal request submitted to Google, we have removed 260 result(s) from this page. If you wish, you may read more about the request at ChillingEffects.org."
Note that Google's disclaimer is showing the wrong number of missing pages – it's in the thousands, not 260. Following the link to Chilling Effects, we see this text:
Google received complaints prior to March 2005 about URLs that are alleged to be illegal under U.S. or local law. In response to these complaints, one or more URLs that would have appeared for this search were not displayed.
In other words, Google is not censoring this out of their own belief, but by following government requests. Now what's ety.com anyway, except being one of the many censored domains? A quick glance will show it's some kind of stupid Nazi propaganda site, illegal by some countries' standards. But you know the line often attributed to Voltaire... "I may disagree with what you say, but I will defend to the death your right to say it."
From around 2001, Google proudly showed off on their front page the number of pages they search through... a number that went from a billion and a half to over 8 billion (according to Google). Today, Google doesn't show this number anymore. Maybe Googlers – that's what Google employees are called – realized that results quality beats results quantity. Or maybe they just realized that by sheer numbers, competitors were winning. In August 2005, Yahoo announced on their blog:
As it turns out we have grown our index and just reached a significant milestone at Yahoo! Search – our index now provides access to over 20 billion items (...) [including] over 19.2 billion web documents
Today, when you want to find out about the Google index size, there's a workaround, though: search Google for ["* *"] – the result count is a good estimate. Right now, it's displaying 25,270,000,000 pages. In a direct comparison, when we search for "the" on both Google and Yahoo, Google shows a couple of billion pages more. Then again, these numbers are hard to verify – Google only lets us see the first 1,000 results for each query. And in the end, who wants to see more than that anyway? Most people don't even go beyond the first 10 results, and rather adjust their search query instead!
If you're a developer utilizing the Google web search API, and you need way beyond the 1,000 requests per day Google offers by default, here's a tip: you can email the Google API support and request more hits for your API key. Depending on your projects and traffic needs, which you will have to outline, Google just might grant you the request!
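Until Google grants you more, it helps to keep count on your side so you don't run into the limit by surprise. Here's a minimal Python sketch of a daily counter. `DailyQuota` is a hypothetical helper, not part of the Google API, and how Google itself counts or resets the 1,000 requests is not something this sketch knows about.

```python
import datetime


class DailyQuota:
    """Track calls against a per-day limit (1,000 by default, as in the text)."""

    def __init__(self, limit=1000):
        self.limit = limit
        self.day = None
        self.used = 0

    def try_consume(self, today=None):
        """Return True and count one request if today's quota allows it."""
        today = today or datetime.date.today()
        if today != self.day:
            # A new day started: reset the counter (reset time is an assumption).
            self.day = today
            self.used = 0
        if self.used >= self.limit:
            return False
        self.used += 1
        return True


quota = DailyQuota()
if quota.try_consume():
    print("OK to send a request,", quota.limit - quota.used, "left today")
else:
    print("Daily limit reached, try again tomorrow")
```

If Google raises the limit for your key, you would pass the new number as `limit`.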
While Google doesn't have its own comic book search engine, you can still achieve good results by going to Google Images, setting the file size to "Large images", and then searching for [comics]. Using this setting, you can also search for an artist's name, like ["john byrne"], ["john romita jr"], ["frank miller"] or ["daniel clowes"]. You might even have some fun adding your own speech bubbles to the comic book pages you find (use a free font like WebLetterer for best results)...
OK, so Writely – which Google recently acquired – is not really a chat, but an online word processor. However, by inviting others to your Writely document, you can group-edit any document... and see the changes by others merged into the document as you type! This feature allows you to chat with a group, and you can have fun with positioning text in different places on the screen, wiki-editing what others wrote, or adding colors and images.
By Seth Finkelstein | posted in google | on May 15, 2006 10:26 AM (Infothought permalink)