:: wikimiki.org ::
| Search Engine Spammer |
Search engine spammer
Spamdexing or search engine spamming is the practice of deliberately and dishonestly manipulating search engines to increase the chance of a website or page being placed close to the beginning of search engine results, or to influence the category to which the page is assigned in a dishonest manner. Many designers of web pages try to get a good ranking in search engines and design their pages accordingly. Spamdexing refers exclusively to practices that are dishonest and mislead search and indexing programs to give a page a ranking it does not deserve.
People who do this are called search engine spammers. The word is a portmanteau of spamming and indexing (as well as a pun on spandex.)
Search engines use a variety of algorithms to determine relevancy ranking. Some of these include determining whether the search term appears in the META keywords tag, others whether the search term appears in the body text of a web page. A variety of techniques are used to spamdex, including listing chosen keywords on a page in small-point font face the same colour as the page background (rendering it invisible to humans but not search engine web crawlers).
Search engine spammers are generally aware that the content that they promote is not very useful or relevant to the ordinary internet surfer. They try to use methods that will make the website appear above more relevant websites in the search engine listings. The rise of spamdexing in the mid-1990s made the leading search engines of the time less useful, and the success of Google at both producing better search results and combating keyword spamming, through its reputation-based PageRank link analysis system, led directly to its becoming the dominant search site late in the decade, where it remains. However, while it has not been rendered useless by spamdexing, Google has not been immune to more sophisticated methods either. Spamdexing on Google has generated the term [http://www.firstmonday.org/issues/issue10_10/tatum/index.html Google bombing].
Techniques
Here are some common spamdexing techniques:
- Hidden or invisible text
- Disguising keywords and phrases by making them the same (or almost the same) colour as the background, using a tiny font size or hiding them within the HTML code such as "no frame" sections, ALT attributes and "no script" sections. This is useful to make a page appear to be relevant in a way that makes it more likely to be found. Example: A promoter of a Ponzi scheme wants to attract web surfers to a site where he advertises his scam. He places hidden text appropriate for a fan page of a popular music group on his page, hoping that the page will be listed as a fan site and receive many visits from music lovers.
- Keyword stuffing (also known as keyword spamming)
- Repeated use of a word to increase its frequency on a page. Older versions of indexing programs simply counted how often a keyword appeared, and used that to determine relevance levels. Most modern search engines can analyze a page for keyword stuffing and determine whether the frequency is above a "normal" level.
- Meta tag stuffing
- Repeating keywords in the Meta tags, and using keywords that are unrelated to the site's content.
- Hidden links
- Putting links where visitors will not see them in order to increase link popularity.
- Mirror websites
- Hosting of multiple websites all with the same content but using different URLs. Some search engines give a higher rank to results where the keyword searched for appears in the URL.
- Gateway or doorway pages
- Creating low-quality web pages that contain very little content but are instead stuffed with very similar key words and phrases. They are designed to rank highly within the search results. A doorway page will generally have "click here to enter" in the middle of it.
- Page redirects
- Taking the user to another page without his or her intervention, e.g. using META refresh tags, CGI scripts, Java, JavaScript, or server-side redirects.
- Cloaking
- Sending to a search engine a version of a web page different from what web surfers see.
- Code swapping
- Optimizing a page for top ranking, then swapping another page in its place once a top ranking is achieved.
- Link spamming
- Link spam takes advantage of Google's PageRank algorithm, which gives a higher ranking to a website the more other websites link to it. A spammer may create multiple web sites at different domain names that all link to each other. Another technique is to take advantage of web applications such as weblogs and wikis that display hyperlinks submitted by anonymous or pseudonymous users. Link farms are another technique.
- Referrer log spamming
- When someone accesses a web page, i.e. the referee, by following a link from another web page, i.e. the referrer, the referee is given the address of the referrer by the person's internet browser. Some websites have a referrer log which shows which pages link to that site. By having a robot randomly access many sites enough times, with a message or specific address given as the referrer, that message or internet address then appears in the referrer log of those sites that have referrer logs. Since some search engines base the importance of sites on the number of different sites linking to them, referrer-log spam may be used to increase the search engine rankings of the spammer's sites, by getting the referrer logs of many sites to link to them.
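The keyword stuffing check mentioned in the list above can be illustrated with a minimal sketch. It assumes that "frequency" means a word's share of all words on the page; the function names and the 0.2 threshold are hypothetical choices for illustration, not values used by any real search engine.

```python
import re
from collections import Counter
from typing import Dict, List


def keyword_density(text: str) -> Dict[str, float]:
    """Return each word's share of the total word count in `text`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return {}
    counts = Counter(words)
    total = len(words)
    return {word: count / total for word, count in counts.items()}


def looks_stuffed(text: str, threshold: float = 0.2) -> List[str]:
    """List words whose density exceeds `threshold` (an arbitrary cut-off)."""
    density = keyword_density(text)
    return sorted(word for word, share in density.items() if share > threshold)


sample = "cheap tickets cheap tickets cheap tickets buy cheap tickets now"
print(looks_stuffed(sample))  # ['cheap', 'tickets']
```

A real detector would presumably compare against typical frequencies for the language rather than a fixed threshold; this sketch only shows the counting step.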
Spamdexing often gets confused with legitimate search engine optimization (SEO) techniques, which do not involve deceit.
Spamming involves getting web sites more exposure than they deserve for their keywords, leading to unsatisfactory search results. Optimization involves getting web sites the rank they deserve on the most targeted keywords, leading to satisfactory search experiences. To be sure, there is much gray area between the two extremes. The root problem is that search-engine administrators and website builders have different agendas: the search engine wants to present valuable search results; the webmaster just wants to come up first, particularly if he or she runs a commercial website and needs visitor traffic from search engines and directories. For that reason, many search-engine administrators say that any form of search engine optimization used to improve a website's ranking is nothing other than spamdexing.
Many search engines check for instances of spamdexing and will remove suspect pages from their indexes.
In 2002, search engine manipulator SearchKing filed suit in an Oklahoma court against the search engine Google. SearchKing's claim was that Google's tactics to prevent spamdexing constituted an unfair business practice. This may be compared to lawsuits which email spammers have filed against spam-fighters, as in various cases against MAPS and other DNSBLs. In January of 2003, the court pronounced a summary judgment in Google's favor. [http://research.yale.edu/lawmeme/modules.php?name=Downloads&d_op=search&query=SearchKing]
See also TrustRank
External links
- [http://www.google.com/webmasters/guidelines.html Google's Webmaster Guidelines page]
- [http://help.yahoo.com/help/us/ysearch/indexing/index.html Yahoo's Search Engine Indexing page]
- [http://search.msn.com/docs/siteowner.aspx MSN Search's Site Owner page]
- [http://tool.motoricerca.info/spam-detector/ Online tool that detects spam techniques on web pages]
- [http://kailashnadh.name/docs/spam_blog A paper explaining various methods to determine webpage/blog spam]
- [http://splogspot.com A public, searchable database of blog spam pages]
- [http://airweb.cse.lehigh.edu/2005/#proceedings AIRWeb' 05: First Workshop on Adversarial Information Retrieval on the Web] - Research on search engine spamming
Category:Internet terminology
Category:Spamming
Category:World Wide Web
Category:Computing portmanteaus
Category:Search engine optimization
ja:検索エンジンスパム
Portmanteau
A portmanteau (plural: portmanteaus or portmanteaux) is a term in linguistics that refers to a word or morpheme that fuses two or more grammatical functions. A folk usage of portmanteau refers to a word that is formed by combining both sounds and meanings from two or more words. In linguistics, these false portmanteaux are called blends.
Etymology
This word was coined by Lewis Carroll in Through the Looking-Glass, and What Alice Found There, in which it is likened to the French word "porte-manteaux" for a type of travelling case or suitcase. Carroll has Humpty Dumpty say, "Well, slithy means lithe and slimy... You see it's like a portmanteau—there are two meanings packed up into one word." Carroll used such words to humorous effect in his poems, especially Jabberwocky, which Humpty Dumpty is explaining to Alice.
"Portmanteau word" was the original phrase used to describe such words (as listed in dictionaries published as late as the early 1990s), but this has since been abbreviated to simply "portmanteau" as the term (and the type of words it describes) gained popularity. "Portmanteau" is rarely used for its original meaning in current English, that type of travelling case having fallen into disuse. In Queensland, Australia, it is shortened to 'Port', and used as slang for a schoolbag.
Portmanteau morphemes
A portmanteau morpheme is a morpheme that fuses two grammatical categories (see Fusional language). The classical example of such a morpheme in English is the verbal suffix -s. This particular suffix carries (i.e., ports) at least four distinct inflectional meanings and imparts each of these onto the verb's meaning:
- Singular (number)
- Third-person (person)
- Present (tense)
- Indicative (mood)
Spanish verb suffixes are also exceptionally fusional, with very many portmanteaux in the Spanish inflectional system.
Portmanteau words
A portmanteau word is a word that fuses two function words. This use overlaps a bit with the folk term contraction, but linguists tend to avoid using the latter.
Folk usage
Outside linguistics, the words that are called blends are popularly labeled portmanteaux. The term portmanteau is used in a different, yet still not clearly defined sense, to refer to a blending of the parts of two or more words (generally the first part of one word and the ending of a second word) to combine their meanings into a single neologism.
See also
- List of portmanteaus
- Neologism, word, term, or phrase which has been recently created
- Contraction (grammar)
- Corruption (grammar)
- Morphology (linguistics)
- Rhyme
External links
- [http://creativityforyou.com/portman.html Portmanteau Words]
Category:Linguistics
ja:かばん語
Spandex
Spandex or elastane is a synthetic fiber known for its exceptional elasticity (stretchability). It is stronger and more durable than rubber, its major plant competitor. It was invented in 1959 by DuPont, and when first introduced it revolutionized many areas of the clothing industry.
Spandex is the preferred name in North America and Australia, while elastane is most often used elsewhere. The name "spandex" is an anagram of "expands." A well-known trademark for spandex or elastane is INVISTA's brand name Lycra; other trademarks include Elaspan (also INVISTA's), Dorlastan (Bayer), and Linel (Fillattice).
Spandex fiber characteristics
Spun from a block copolymer, these fibers exploit the high crystallinity and hardness of polyurethane segments, yet remain "rubbery" due to alternating segments of polyethylene glycol[http://www.psrc.usm.edu/macrog/uresyn.htm]. This yields the following combination of materials properties:
- can be stretched over 500% without breaking
- able to be stretched repetitively and still recover original length
- lightweight
- abrasion resistant
- poor strength, but stronger and more durable than rubber
- soft, smooth, and supple
- resistant to body oils, perspiration, lotions, and detergents
- no static or pilling problems
Major spandex fiber uses
- Apparel and clothing articles where stretch is desired, generally for comfort and fit, such as:
- athletic, aerobic, and exercise apparel
- wetsuits
- swimsuits/bathing suits
- competitive swimwear
- brassiere straps and bra side panels
- ski pants
- slacks
- hosiery
- leggings
- socks
- belts
- Compression garments such as:
- surgical hose
- support hose
- bicycle pants
- foundation garments
- Shaped garments such as bra cups
Production
The U.S. Federal Trade Commission definition for spandex fiber is "A manufactured fiber in which the fiber-forming substance is a long chain synthetic polymer comprised of at least 85 percent of a segmented polyurethane".
First U.S. commercial spandex fiber production: 1959, DuPont Company
Current U.S. spandex fiber producers: INVISTA; Bayer Corporation; [http://www.radicispandex.com RadiciSpandex Corporation]
Fiction
In comic books, superheroes and superheroines commonly wear costumes made of spandex.
See also
- Textile
- Spandex fetishism
- Darlex
External links
- [http://www.elaspan.com/ Elaspan® spandex] – Company website
- [http://www.lycra.com/ Lycra® spandex] – Company website
- [http://www.radicispandex.com/ RadiciSpandex] – Company website
Category: Organic polymers
Category:Textiles
Algorithm
In mathematics and computer science an algorithm (the word is derived from the name of the Persian mathematician Al-Khwarizmi) is a finite set of well-defined instructions for accomplishing some task which, given an initial state, will terminate in a corresponding recognizable end-state (contrast with heuristic). Algorithms can be implemented by computer programs, although often in restricted forms; mistakes in implementation and limitations of the computer can prevent a computer program from correctly executing its intended algorithm.
The concept of an algorithm is often illustrated by the example of a recipe, although many algorithms are much more complex; algorithms often have steps that repeat (iterate) or require decisions (such as logic or comparison). Correctly performing an algorithm will not solve a problem if the algorithm is flawed or not appropriate to the problem. For example, a hypothetical algorithm for making a potato salad will fail if there are no potatoes present, even if all the motions of preparing the salad are performed as if the potatoes were there.
Different algorithms may complete the same task with a different set of instructions in more or less time, space, or effort than others. For example, given two different recipes for making potato salad, one may say "peel the potato" before "boil the potato" while the other presents the steps in the reverse order, yet they both call for these steps to be repeated for all potatoes and end when the potato salad is ready to be eaten.
Certain countries, such as the USA, controversially allow some algorithms to be patented, provided a physical embodiment is possible (for example, a multiplication algorithm may be embodied in the arithmetic unit of a microprocessor).
Formalized algorithms
Algorithms are essential to the way computers process information, because a computer program is essentially an algorithm that tells the computer what specific steps to perform (in what specific order) in order to carry out a specified task, such as calculating employees' paychecks or printing students' report cards. Thus, an algorithm can be considered to be any sequence of operations which can be performed by a Turing-complete system.
Typically, when an algorithm is associated with processing information, data is read from an input source or device, written to an output sink or device, and/or stored for further use. Stored data is regarded as part of the internal state of the entity performing the algorithm.
For any such computational process, the algorithm must be rigorously defined: specified in the way it applies in all possible circumstances that could arise. That is, any conditional steps must be systematically dealt with, case-by-case; the criteria for each case must be clear (and computable).
Because an algorithm is a precise list of precise steps, the order of computation will almost always be critical to the functioning of the algorithm. Instructions are usually assumed to be listed explicitly, and are described as starting 'from the top' and going 'down to the bottom', an idea that is described more formally by flow of control.
So far, this discussion of the formalisation of an algorithm has assumed the premises of imperative programming. This is the most common conception, and it attempts to describe a task in discrete, 'mechanical' means. Unique to this conception of formalized algorithms is the assignment operation, setting the value of a variable. It derives from the intuition of 'memory' as a scratchpad. There is an example below of such an assignment.
See functional programming and logic programming for alternate conceptions of what constitutes an algorithm.
Implementation
Algorithms are not only implemented as computer programs, but often also by other means, such as in a biological neural network (for example, the human brain implementing arithmetic or an insect relocating food), or in electric circuits or in a mechanical device.
The analysis and study of algorithms is one discipline of computer science, and is often practiced abstractly (without the use of a specific programming language or other implementation). In this sense, it resembles other mathematical disciplines in that the analysis focuses on the underlying principles of the algorithm, and not on any particular implementation. One way to embody (or sometimes codify) an algorithm is the writing of pseudocode.
Some writers restrict the definition of algorithm to procedures that eventually finish. Others include procedures that could run forever without stopping, arguing that some entity may be required to carry out such permanent tasks. In the latter case, success can no longer be defined in terms of halting with a meaningful output. Instead, terms of success that allow for unbounded output sequences must be defined. For example, an algorithm that checks whether there are more zeros than ones in an infinite random binary sequence must run forever to be effective. If it is implemented correctly, however, the algorithm's output will be useful: for as long as it examines the sequence, the algorithm will give a positive response while the examined zeros outnumber the ones, and a negative response otherwise. Success for this algorithm could then be defined as eventually outputting only positive responses if there are actually more zeros than ones in the sequence, and in any other case outputting any mixture of positive and negative responses.
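The non-terminating example above can be sketched as a Python generator, which produces one response per examined bit and never needs to halt on its own. The function name is hypothetical, and the repeating pattern used as input is an assumption standing in for an infinite random sequence.

```python
from itertools import cycle, islice
from typing import Iterable, Iterator


def more_zeros_so_far(bits: Iterable[int]) -> Iterator[bool]:
    """After each bit, yield True if the zeros examined so far outnumber the ones."""
    zeros = ones = 0
    for bit in bits:
        if bit == 0:
            zeros += 1
        else:
            ones += 1
        yield zeros > ones


# A finite prefix: the response flips as the counts change.
print(list(more_zeros_so_far([0, 1, 0, 0, 1, 1, 1])))
# [True, False, True, True, True, False, False]

# An endless input (the pattern 0, 0, 1 repeated forever); only the first
# six responses are taken, since the generator itself would never stop.
print(list(islice(more_zeros_so_far(cycle([0, 0, 1])), 6)))
# [True, True, True, True, True, True]
```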
Example
One of the simplest algorithms is to find the largest number in an (unsorted) list of numbers. The solution necessarily requires looking at every number in the list, but only once at each. From this follows a simple algorithm:
# Look at each item in the list. If it is larger than any that has been seen so far, make a note of it.
# The latest noted item is the largest in the list when the process is complete.
And here is a more formal coding of the algorithm in pseudocode:
Algorithm LargestNumber
  Input: A non-empty list of numbers L.
  Output: The largest number in the list L.
  largest ← -∞
  for each item in list L, do
    if the item > largest, then
      largest ← the item
  return largest
Notes on notation:
- "←" is a loose shorthand for "changes to". For instance, with "largest ← the item", it means that the largest number found so far changes to this item.
- "return" terminates the algorithm and outputs the value listed behind it.
As it happens, most people who implement algorithms want to know how much of a particular resource (such as time or storage) a given algorithm requires. Methods have been developed for the analysis of algorithms to obtain such quantitative answers; for example, the algorithm above has a time requirement of O(n), using the big O notation with n as the length of the list. At all times the algorithm only needs to remember a single value; the largest number found so far. Therefore this algorithm has a space requirement of O(1). (Note that the size of the inputs is not counted as space used by the algorithm.)
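As a rough illustration, the LargestNumber pseudocode translates almost line for line into Python. The function name is an assumption made for this sketch; in practice Python's built-in max does the same job.

```python
from typing import Iterable


def largest_number(numbers: Iterable[float]) -> float:
    """Return the largest value in a non-empty collection of numbers."""
    largest = float("-inf")      # largest ← -∞
    for item in numbers:         # one pass over the list: O(n) time
        if item > largest:
            largest = item       # only one value is remembered: O(1) space
    return largest


print(largest_number([3, 41, 7, 19]))  # 41
```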
For a more complex example see Euclid's algorithm.
History
The word algorithm comes from the name of the 9th century Persian mathematician Abu Abdullah Muhammad bin Musa al-Khwarizmi. The word algorism originally referred only to the rules of performing arithmetic using Hindu-Arabic numerals but evolved into algorithm by the 18th century. The word has now evolved to include all definite procedures for solving problems or performing tasks.
The first case of an algorithm written for a computer was Ada Byron's notes on the analytical engine written in 1842, for which she is considered by many to be the world's first programmer. However, since Charles Babbage never completed his analytical engine the algorithm was never implemented on it.
The lack of mathematical rigor in the "well-defined procedure" definition of algorithms posed some difficulties for mathematicians and logicians of the 19th and early 20th centuries. This problem was largely solved with the description of the Turing machine, an abstract model of a computer formulated by Alan Turing, and the demonstration that every method yet found for describing "well-defined procedures" advanced by other mathematicians could be emulated on a Turing machine (a statement known as the Church-Turing thesis).
Nowadays, a formal criterion for an algorithm is that it is a procedure that can be implemented on a completely-specified Turing machine or one of the equivalent formalisms. Turing's initial interest was in the halting problem: deciding when an algorithm describes a terminating procedure. In practical terms computational complexity theory matters more: it includes the problems called NP-complete, which are generally presumed to take more than polynomial time for any (deterministic) algorithm. NP denotes the class of decision problems that can be solved by a non-deterministic Turing machine in polynomial time.
Classes
There are many ways to classify algorithms, and the merits of each classification have been the subject of ongoing debate.
One way of classifying algorithms is by their design methodology or paradigm. Each paradigm is distinct from the others, and each covers many different types of algorithms. Some commonly found paradigms include:
- Divide and conquer. A divide and conquer algorithm repeatedly reduces an instance of a problem to one or more smaller instances of the same problem (usually recursively), until the instances are small enough to solve easily.
- Dynamic programming. When a problem shows optimal substructure, meaning the optimal solution to a problem can be constructed from optimal solutions to subproblems, and overlapping subproblems, meaning the same subproblems are used to solve many different problem instances, we can often solve the problem quickly using dynamic programming, an approach that avoids recomputing solutions that have already been computed. For example, the shortest path to a goal from a vertex in a weighted graph can be found by using the shortest path to the goal from all adjacent vertices.
- The greedy method. A greedy algorithm is similar to a dynamic programming algorithm, but the difference is that solutions to the subproblems do not have to be known at each stage; instead a "greedy" choice can be made of what looks best for the moment.
- Linear programming. When solving a problem using linear programming, the problem is expressed as a number of linear inequalities and then an attempt is made to maximize (or minimize) a linear function of the inputs. Many problems (such as the maximum flow for directed graphs) can be stated in a linear programming way, and then be solved by a 'generic' algorithm such as the Simplex algorithm.
- Search and enumeration. Many problems (such as playing chess) can be modelled as problems on graphs. A graph exploration algorithm specifies rules for moving around a graph and is useful for such problems. This category also includes the search algorithms and backtracking.
- The probabilistic and heuristic paradigm. Algorithms belonging to this class fit the definition of an algorithm more loosely.
# Probabilistic algorithms are those that make some choices randomly (or pseudo-randomly); for some problems, it can in fact be proved that the fastest solutions must involve some randomness.
# Genetic algorithms attempt to find solutions to problems by mimicking biological evolutionary processes, with a cycle of random mutations yielding successive generations of 'solutions'. Thus, they emulate reproduction and "survival of the fittest". In genetic programming, this approach is extended to algorithms, by regarding the algorithm itself as a 'solution' to a problem.
# Heuristic algorithms do not aim to find an optimal solution, but an approximate one where the time or resources needed to find a perfect solution are not practical. Examples include local search, tabu search, and simulated annealing algorithms; the last are a class of heuristic probabilistic algorithms that vary the solution of a problem by a random amount. The name 'simulated annealing' alludes to the metallurgic term meaning the heating and cooling of metal to achieve freedom from defects. The purpose of the random variance is to find close to globally optimal solutions rather than simply locally optimal ones, the idea being that the random element will be decreased as the algorithm settles down to a solution.
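To make one of the paradigms above concrete, here is a minimal sketch of the dynamic programming example from the list: the shortest path from a vertex to a goal is built from the shortest paths of its adjacent vertices, and each subproblem is computed only once. The graph, its weights, and the function name are made up for illustration, and the sketch assumes the graph has no cycles.

```python
from functools import lru_cache
from typing import Dict

# Hypothetical weighted directed acyclic graph: vertex -> {neighbour: edge weight}.
GRAPH: Dict[str, Dict[str, int]] = {
    "a": {"b": 1, "c": 4},
    "b": {"c": 2, "d": 6},
    "c": {"d": 3},
    "d": {},
}


@lru_cache(maxsize=None)  # remembers answers to overlapping subproblems
def shortest_distance(vertex: str, goal: str) -> float:
    """Length of the shortest path from `vertex` to `goal`, or inf if none exists."""
    if vertex == goal:
        return 0
    # Optimal substructure: the best route is the cheapest combination of
    # one edge plus the best route onward from that neighbour.
    candidates = [
        weight + shortest_distance(neighbour, goal)
        for neighbour, weight in GRAPH[vertex].items()
    ]
    return min(candidates, default=float("inf"))


print(shortest_distance("a", "d"))  # 6  (a -> b -> c -> d)
```

Without the cache, the distance from "c" would be recomputed once for every route that reaches it; with it, each vertex is solved once.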
Another way to classify algorithms is by implementation. A recursive algorithm is one that invokes (makes reference to) itself repeatedly until a certain condition is met, a method common to functional programming. Algorithms are usually discussed with the assumption that computers execute one instruction of an algorithm at a time. Those computers are sometimes called serial computers. An algorithm designed for such an environment is called a serial algorithm, as opposed to parallel algorithms, which take advantage of computer architectures where several processors can work on a problem at the same time. The various heuristic algorithms would probably also fall into this category, as their names (e.g. a genetic algorithm) describe their implementation.
See also
- Algorism
- Approximation algorithms
- Cryptography
- Data structure
- Genetic algorithm
- List of algorithms
- Merge algorithms
- Numerical analysis
- Randomised algorithm
- Search algorithm
- Sort algorithm
- String algorithms
- Timeline of algorithms
- Wikibooks:Algorithms
References
- Important algorithm-related publications
External links
- Algana.co.uk [http://www.algana.co.uk/ Algorithm analysis and puzzles.] Free source code for algorithm Java applets and C++ modules
- Gaston H. Gonnet and Ricardo Baeza-Yates: Example programs from [http://www.dcc.uchile.cl/~rbaeza/handbook/ Handbook of Algorithms and Data Structures.] Free source code for many important algorithms.
- [http://www.nist.gov/dads/ Dictionary of Algorithms and Data Structures]. "This is a dictionary of algorithms, algorithmic techniques, data structures, archetypical problems, and related definitions."
- [http://www.nr.com Numerical Recipes]
- [http://www.algosort.com/ Computer Programming Algorithms Directory]
- [http://dmoz.org/Computers/Algorithms/ Computers/Algorithms @ dmoz.org]
- [http://groups-beta.google.com/group/algogeeks "Algogeeks" Google Group] - Discuss ideas, algorithms, challenges related to programming. Also announcements about Online Programming Contests will be posted in this group.
- [http://musicalgorithms.ewu.edu/ Musicalgorithms] An interesting way of using algorithms to make music.
- [http://www.algorithmist.com/ The Algorithmist] is a resource dedicated to anything algorithms - from the practical realm, to the theoretical realm. There are also links and explanation to problem sets.
- [http://www.algogeeks.com/ AlgoGeeks] Community of algorithm enthusiasts actively involved in algorithm discussions, knowledge, problems and solutions, thus creating a resource for algorithms, puzzles and problems
Category:Discrete mathematics
Category:Computer science
Category:Mathematical logic
Category:Arabic words
ko:알고리즘
ja:アルゴリズム
th:อัลกอริทึม
Meta tags
Meta tags are used to provide structured data about data.
Use in web pages
The most popular use is in web pages and is similar in many ways to the information provided in traditional library catalogue records.
Words which are often misspelled are often added as keywords. For example, accommodation and millennium are misspelled almost as often as they are spelt correctly. Adding these misspellings as keywords is therefore of potential value.
Commercial uses
Meta tags have been the focus of a field of marketing research known as search engine optimization or SEO. In the mid to late 1990s, search engines were reliant on Meta tag data to correctly classify a web page. Webmasters quickly learned the commercial significance of having the right Meta tag, as it frequently led to a high ranking in the search engines - and thus, high traffic to the web site.
As search engine traffic achieved greater significance in online marketing plans, consultants were brought in who were well versed in how search engines perceive a web site. These consultants used a variety of techniques (legitimate and otherwise) to improve ranking for their clients. You can see any web site's Meta tags by viewing the source code.
The two most popular Meta tags used for search engine optimization are the Meta description tag and the Meta keyword tag. The Meta description tag allows web site owners to describe what the web page is about. Search engines such as Google still use this to display the description of a web site in the search results. The Meta keyword tag allows site owners to list keywords that help search engines decide what a web site is about.
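As an illustration, the description and keyword tags can be read from a page's source with Python's standard html.parser module. This is only a sketch: the sample page and the class name are made up for the example, and real pages may be far less tidy.

```python
from html.parser import HTMLParser
from typing import Dict


class MetaTagParser(HTMLParser):
    """Collect the content of named <meta> tags from an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


# Hypothetical page source, including a deliberate misspelling as a keyword.
page = """
<html><head>
<title>Example page</title>
<meta name="description" content="A short summary of the page.">
<meta name="keywords" content="example, sample, accommodation, accomodation">
</head><body><p>Hello</p></body></html>
"""

parser = MetaTagParser()
parser.feed(page)
print(parser.meta["description"])           # A short summary of the page.
print(parser.meta["keywords"].split(", "))  # ['example', 'sample', 'accommodation', 'accomodation']
```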
In the early 2000s, search engines veered away from reliance on Meta tags, as many web sites used inappropriate keywords or were keyword stuffing to obtain any and all traffic possible.
Some search engines, however, still take Meta tags into some consideration when delivering results. In recent years, search engines have become smarter, penalizing web sites that cheat by repeating the same keyword several times to boost their search ranking. Instead of rising in the ranking, these sites fall, or on some search engines are removed from the index entirely.
Alternative to meta tags
An alternative to meta tags for enhanced subject access within a website is the use of a back-of-book-style index for the website. See examples at the websites of the Australian Society of Indexers ([http://www.aussi.org www.aussi.org]) and the American Society of Indexers ([http://www.asindexing.org www.asindexing.org]).
See also
- HTML element
- Metadata
External links
- [http://www.w3.org/TR/REC-html40/struct/global.html#h-7.4.4 HTML 4.01 Specification: Meta data]
- [http://www.hotwebtools.com/meta Meta data generator]
Category:HTML
Body text
Body text is the text on a web page which appears between the opening "<body>" and closing "</body>" tags that delimit the body section of the document. The tags themselves are not required if the document is HTML.
It may include headers and other text, and is usually text which a visitor to the page reads, but it may not always be visible, depending on formatting techniques used by the page designer.
Keyword
Keyword may mean:
- Keyword (computer), an identifier in a computer language that indicates a specific command
- Keyword (linguistics), a word that occurs with unexpected frequency in a text
- Keyword (America Online), an addressing scheme used on America Online as an alternative to URLs
- A word describing a concept found in a document such as a Web page, constituting part of the metadata for the document
Web crawler
:See WebCrawler for the specific search engine of that name.
A web crawler (also known as a web spider or ant) is a program which browses the World Wide Web in a methodical, automated manner. Web crawlers are mainly used to create a copy of all the visited pages for later processing by a search engine, which indexes the downloaded pages to provide fast searches.
A web crawler is one type of bot, or software agent. In general, it starts with a list of URLs to visit. As it visits these URLs, it identifies all the hyperlinks in the page and adds them to the list of URLs to visit, recursively browsing the Web according to a set of policies.
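The basic loop described above can be sketched as follows. This is a minimal illustration, not a production crawler: `get_links` is a hypothetical callback that stands in for downloading a page and extracting its hyperlinks, and the toy link graph replaces the real Web so that the example runs without network access.

```python
from collections import deque


def crawl(seeds, get_links, max_pages=100):
    """Visit URLs starting from the seeds, in first-in first-out order."""
    frontier = deque(seeds)   # list of URLs still to visit
    seen = set(seeds)         # URLs already queued, to avoid repeats
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


# Toy "Web": each key is a page, each value the pages it links to.
TOY_WEB = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["a"],
    "d": [],
}

order = crawl(["a"], lambda url: TOY_WEB.get(url, []))
```

Replacing the queue with a priority queue, and deciding what priority to give each URL, is where the crawling policies discussed below come in.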
Crawling policies
There are two important characteristics of the Web that make web crawling very difficult: its large volume and its rate of change, as a huge number of pages are added, changed and removed every day. Also, network speed has improved less than processing speeds and storage capacities.
The large volume implies that the crawler can only download a fraction of the Web pages within a given time, so it needs to prioritize its downloads. The high rate of change implies that by the time the crawler is downloading the last pages from a site, it is very likely that new pages have been added to the site, or that pages have already been updated or even deleted.
As Edwards et al. note, "Given that the bandwidth for conducting crawls is neither infinite nor free it is becoming essential to crawl the Web in a not only scalable, but efficient way if some reasonable measure of quality or freshness is to be maintained." (Edwards et al., 2001). A crawler must carefully choose at each step which pages to visit next.
The behavior of a web crawler is the outcome of a combination of policies:
- A selection policy that states which pages to download.
- A re-visit policy that states when to check for changes to the pages.
- A politeness policy that states how to avoid overloading websites.
- A parallelization policy that states how to coordinate distributed web crawlers.
Selection policy
Given the current size of the Web, even large search engines cover only a portion of the publicly available content; a study by Lawrence and Giles (Lawrence and Giles, 2000) showed that no search engine indexes more than 16% of the Web. As a crawler always downloads just a fraction of the Web pages, it is highly desirable that the downloaded fraction contains the most relevant pages, and not just a random sample of the Web.
This requires a metric of importance for prioritizing Web pages. The importance of a page is a function of its intrinsic quality, its popularity in terms of links or visits, and even of its URL (the latter is the case of vertical search engines restricted to a single top-level domain, or search engines restricted to a fixed Website). Designing a good selection policy has an added difficulty: it must work with partial information, as the complete set of Web pages is not known during crawling.
Cho et al. (Cho et al., 1998) made the first study on policies for crawling scheduling. Their data set was a 180,000-page crawl from the stanford.edu domain, on which a crawling simulation was done with different strategies. The ordering metrics tested were breadth-first, backlink-count and partial Pagerank calculations. One of the conclusions was that if the crawler wants to download pages with high Pagerank early during the crawling process, then the partial Pagerank strategy is the best, followed by breadth-first and backlink-count. However, these results are for just a single domain.
Najork and Wiener (Najork and Wiener, 2001) performed an actual crawl on 328 million pages, using breadth-first ordering. They found that a breadth-first crawl captures pages with high Pagerank early in the crawl (but they did not compare this strategy against other strategies). The explanation given by the authors for this result is that "the most important pages have many links to them from numerous hosts, and those links will be found early, regardless of on which host or page the crawl originates".
Abiteboul et al. (Abiteboul et al., 2003) designed a crawling strategy based on an algorithm called OPIC (On-line Page Importance Computation). In OPIC, each page is given an initial sum of "cash" which is distributed equally among the pages it points to. It is similar to a Pagerank computation, but it is faster and is only done in one step. An OPIC-driven crawler downloads first the pages in the crawling frontier with higher amounts of "cash". Experiments were carried out on a 100,000-page synthetic graph with a power-law distribution of in-links. However, there was no comparison with other strategies, nor were there experiments on the real Web.
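The "cash" idea can be illustrated with a small sketch. It is a simplified reading of the description above, not the published algorithm: it assumes every page starts with an equal share of cash, crawls each page only once, breaks ties alphabetically, and lets the cash of pages without out-links disappear. The function name and the toy graph are made up for the example.

```python
def opic_crawl(graph, seeds, max_pages):
    """Crawl a known link graph, always taking the frontier page with most cash."""
    cash = {page: 1.0 / len(graph) for page in graph}  # assumed equal start
    history = {page: 0.0 for page in graph}            # cash seen per page
    frontier = set(seeds)
    visited = []
    done = set()
    while frontier and len(visited) < max_pages:
        # Highest cash first; sorting makes ties deterministic.
        page = max(sorted(frontier), key=lambda p: cash[p])
        frontier.remove(page)
        visited.append(page)
        done.add(page)
        history[page] += cash[page]
        links = graph[page]
        if links:
            share = cash[page] / len(links)   # distribute cash equally
            for target in links:
                cash[target] += share
                if target not in done:
                    frontier.add(target)
        cash[page] = 0.0
    return visited, history


# Toy graph: "hub" is linked from two pages, so it accumulates cash quickly.
TOY_GRAPH = {
    "home": ["a", "b"],
    "a": ["hub"],
    "b": ["hub"],
    "hub": ["a", "c"],
    "c": [],
}

visit_order, cash_history = opic_crawl(TOY_GRAPH, ["home"], max_pages=10)
```

In this toy run the page "hub" is downloaded before "b" even though "b" was discovered earlier, because it has collected more cash by then.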
Boldi et al. (Boldi et al., 2004) used simulation on subsets of the Web of 40 million pages from the .it domain and 100 million pages from the WebBase crawl, testing breadth-first against random ordering and an omniscient strategy. The winning strategy was breadth-first, although a random ordering also performed surprisingly well. One problem is that the WebBase crawl is biased towards the crawler used to gather the data. They also showed how poorly Pagerank calculations carried out on partial subgraphs of the Web, obtained during crawling, approximate the actual Pagerank.
Baeza-Yates et al. (Baeza-Yates et al., 2005) used simulation on two subsets of the Web of 3 million pages from the .gr and .cl domains, testing several crawling strategies. They showed that both the OPIC strategy and a strategy that uses the length of the per-site queues are better than breadth-first crawling, and that it is also very effective to use a previous crawl, when it is available, to guide the current one.
Focused crawling: The importance of a page for a crawler can also be expressed as a function of the similarity of a page to a given query. This is called "focused crawling" and was introduced by Chakrabarti et al. (Chakrabarti et al., 1999).
The main problem in focused crawling is that, in the context of a web crawler, we would like to be able to predict the similarity of the text of a given page to the query before actually downloading the page. A possible predictor is the anchor text of links; this was the approach taken by Pinkerton (Pinkerton, 1994) in a crawler developed in the early days of the Web. Diligenti et al. (Diligenti et al., 2000) propose to use the complete content of the pages already visited to infer the similarity between the driving query and the pages that have not been visited yet. The performance of a focused crawler depends mostly on the richness of links in the specific topic being searched, and a focused crawler usually relies on a general Web search engine for providing starting points.
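A minimal sketch of using anchor text as a predictor is given below. The scoring function (the fraction of query terms that appear in the anchor text) is an assumption chosen for simplicity, not the measure used by any of the cited crawlers, and the URLs are placeholders.

```python
import heapq


def anchor_score(anchor_text, query):
    """Fraction of query terms that occur in the anchor text (assumed metric)."""
    query_terms = set(query.lower().split())
    anchor_terms = set(anchor_text.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & anchor_terms) / len(query_terms)


def prioritize_links(links, query):
    """Order (url, anchor_text) pairs so the most promising URL comes first."""
    # heapq is a min-heap, so scores are negated to pop the best link first.
    heap = [(-anchor_score(text, query), url) for url, text in links]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


# Placeholder links as they might be extracted from an already downloaded page.
LINKS = [
    ("http://www.example.com/contact", "contact us"),
    ("http://www.example.com/design", "web crawler design"),
    ("http://www.example.com/news", "crawler news"),
]

ranked = prioritize_links(LINKS, "web crawler")
```

The links are ranked without downloading any of the target pages, which is the point of using anchor text as the predictor.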
A related problem for selecting pages is that nowadays most pages on the Web are generated by submitting queries to databases, and those pages cannot be ignored. See deep web.
Re-visit policy
The Web has a very dynamic nature, and crawling a fraction of the Web can take a long time, usually measured in weeks or months. By the time a web crawler has finished its crawl, many events could have happened. These events can include creations, updates and deletions.
From the search engine's point of view, there is a cost associated with not detecting an event, and thus having an outdated copy of a resource. The most used cost functions, introduced in (Cho and Garcia-Molina, 2000), are freshness and age.
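Freshness: This is a binary measure that indicates whether the local copy is accurate or not. The freshness of a page p in the repository at time t is defined as:
$$F_p(t) = \begin{cases} 1 & \text{if } p \text{ is equal to the local copy at time } t \\ 0 & \text{otherwise} \end{cases}$$
Age: This is a measure that indicates how outdated the local copy is. The age of a page p in the repository at time t is defined as:
$$A_p(t) = \begin{cases} 0 & \text{if } p \text{ has not been modified at time } t \\ t - \text{modification time of } p & \text{otherwise} \end{cases}$$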
Coffman et al. (Coffman et al., 1998) worked with a definition of the objective of a web crawler that is equivalent to freshness, but used a different wording: they propose that a crawler must minimize the fraction of time pages remain outdated. They also noted that the problem of web crawling can be modeled as a multiple-queue, single-server polling system, in which the web crawler is the server and the websites are the queues. Page modifications are the arrivals of the customers, and switch-over times are the intervals between page accesses to a single website. Under this model, the mean waiting time for a customer in the polling system is equivalent to the average age for the web crawler.
The objective of the crawler is to keep the average freshness of pages in its collection as high as possible, or to keep the average age of pages as low as possible. These objectives are not equivalent: in the first case, the crawler is just concerned with how many pages are out-dated, while in the second case, the crawler is concerned with how old the local copies of pages are.
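The difference between the two objectives can be made concrete with a small sketch. It assumes that each page is described by the times at which the remote copy changed and the time at which the crawler last downloaded it, and it measures age from the first change the crawler has not yet seen; these representations and all names are choices made for this example.

```python
def first_unseen_change(change_times, last_sync, t):
    """Earliest remote change after the last download and up to time t, or None."""
    unseen = [c for c in change_times if last_sync < c <= t]
    return min(unseen) if unseen else None


def freshness(change_times, last_sync, t):
    """1 if the local copy still matches the remote page at time t, else 0."""
    return 1 if first_unseen_change(change_times, last_sync, t) is None else 0


def age(change_times, last_sync, t):
    """0 if the copy is up to date, else time elapsed since it became outdated."""
    changed_at = first_unseen_change(change_times, last_sync, t)
    return 0 if changed_at is None else t - changed_at


# Three hypothetical pages: (times of remote changes, time of last download).
PAGES = [
    ([2, 8], 5),   # changed again at 8, after the download at 5
    ([1], 3),      # no change since the download
    ([4, 6], 3),   # outdated since time 4
]
NOW = 10

avg_freshness = sum(freshness(c, s, NOW) for c, s in PAGES) / len(PAGES)
avg_age = sum(age(c, s, NOW) for c, s in PAGES) / len(PAGES)
```

Two of the three copies are outdated, so average freshness is 1/3 regardless of how long ago they changed, while average age also reflects that the third page has been stale for six time units.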
Two simple re-visiting policies were studied by Cho and Garcia-Molina (Cho and Garcia-Molina, 2003):
Uniform policy: This involves re-visiting all pages in the collection with the same frequency, regardless of their rates of change.
Proportional policy: This involves re-visiting more often the pages that change more frequently. The visiting frequency is directly proportional to the (estimated) change frequency.
(In both cases, the repeated crawling order of pages can be done either at random or with a fixed order.)
Cho and Garcia-Molina proved the surprising result that, in terms of average freshness, the uniform policy outperforms the proportional policy in both a simulated Web and a real Web crawl. The explanation for this result comes from the fact that, when a page changes too often, the crawler will waste time by trying to re-crawl it too fast and still will not be able to keep its copy of the page fresh.
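A small calculation shows the effect. The sketch assumes that each page changes according to a Poisson process with a known rate and is re-visited at fixed intervals; under those assumptions the time-averaged freshness of a page with change rate c and visit rate f is (1 - e^(-c/f)) / (c/f). The two change rates and the crawl budget are invented numbers.

```python
import math


def expected_freshness(change_rate, visit_rate):
    """Time-averaged freshness under Poisson changes and evenly spaced visits."""
    if visit_rate <= 0:
        return 0.0
    if change_rate == 0:
        return 1.0
    ratio = change_rate / visit_rate
    return (1 - math.exp(-ratio)) / ratio


def uniform_policy(change_rates, budget):
    """Same visit rate for every page."""
    return [budget / len(change_rates)] * len(change_rates)


def proportional_policy(change_rates, budget):
    """Visit rate proportional to each page's change rate."""
    total = sum(change_rates)
    return [budget * c / total for c in change_rates]


def average_freshness(change_rates, visit_rates):
    values = [expected_freshness(c, f) for c, f in zip(change_rates, visit_rates)]
    return sum(values) / len(values)


# One fast-changing page and one slow one; two visits per time unit in total.
RATES = [9.0, 1.0]
BUDGET = 2.0

uniform = average_freshness(RATES, uniform_policy(RATES, BUDGET))
proportional = average_freshness(RATES, proportional_policy(RATES, BUDGET))
```

With these numbers the uniform policy gives an average freshness of about 0.37 and the proportional policy about 0.20: the proportional policy spends most of its budget on the page that changes nine times per time unit, which stays stale anyway.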
To improve freshness, we should penalize the elements that change too often (Cho and Garcia-Molina, 2003a). The optimal re-visiting policy is neither the uniform policy nor the proportional policy. The optimal method for keeping average freshness high includes ignoring the pages that change too often, and the optimal method for keeping average age low is to use access frequencies that monotonically (and sub-linearly) increase with the rate of change of each page. In both cases, the optimal is closer to the uniform policy than to the proportional policy: as Coffman et al. (Coffman et al., 1998) note, "in order to minimize the expected obsolescence time, the accesses to any particular page should be kept as evenly spaced as possible". Explicit formulas for the re-visit policy are not attainable in general, but they can be obtained numerically, as they depend on the distribution of page changes. Cho and Garcia-Molina (Cho and Garcia-Molina, 2003a) show that the exponential distribution is a good fit for describing page changes, while Ipeirotis et al. (Ipeirotis et al., 2005) show how to use statistical tools to discover parameters that affect this distribution. Note that the re-visiting policies considered here regard all pages as homogeneous in terms of quality ("all pages on the Web are worth the same"), which is not a realistic scenario, so further information about Web page quality should be included to achieve a better crawling policy.
Politeness policy
As noted by Koster (Koster, 1995), the use of Web robots is useful for a number of tasks, but comes with a price for the general community. The costs of using Web robots include:
- Network resources, as robots require considerable bandwidth, and operate with a high degree of parallelism during a long period of time.
- Server overload, especially if the frequency of accesses to a given server is too high.
- Poorly written robots, which can crash servers or routers, or which download pages they cannot handle.
- Personal robots that, if deployed by too many users, can disrupt networks and Web servers.
A partial solution to these problems is the robots exclusion protocol, also known as the robots.txt protocol (Koster, 1996), which is a standard for administrators to indicate which parts of their Web servers should not be accessed by robots. This standard does not include a suggestion for the interval of visits to the same server, even though this interval is the most effective way of avoiding server overload.
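As an illustration of how a crawler can honour the protocol, the sketch below uses the RobotFileParser class from Python's standard urllib.robotparser module. The robots.txt content, the robot name "ExampleBot" and the URLs are made up; a real crawler would download the file from the server instead of using an inline string.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content: all robots are asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt):
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = build_parser(ROBOTS_TXT)
allowed = rules.can_fetch("ExampleBot", "http://www.example.com/index.html")
blocked = rules.can_fetch("ExampleBot", "http://www.example.com/private/data.html")
```

A polite crawler would check can_fetch before queuing each URL and skip those for which it returns False.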
The first proposal for the interval between connections was given in (Koster, 1993) and was 60 seconds. However, if we download pages at this rate from a website with more than 100,000 pages over a perfect connection with zero latency and infinite bandwidth, it would take more than 2 months to download only that entire website; also, we would be using a fraction of the resources from that Web server permanently. This does not seem acceptable.
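The arithmetic behind this estimate is short enough to check directly. The function name is only illustrative; the figures are the ones quoted above.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def crawl_duration_days(num_pages, interval_seconds):
    """Days needed to fetch num_pages from one server, one page per interval."""
    return num_pages * interval_seconds / SECONDS_PER_DAY


# 100,000 pages at one request every 60 seconds.
days = crawl_duration_days(100_000, 60)
```

This gives roughly 69 days, which is a little over two months even before any network delay is counted.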
Cho (Cho and Garcia-Molina, 2003) uses 10 seconds as an interval for accesses, and the WIRE crawler (Baeza-Yates and Castillo, 2002) uses 15 seconds as the default. The Mercator Web crawler (Heydon and Najork, 1999) follows an adaptive politeness policy: if it took t seconds to download a document from a given server, the crawler waits for 10t seconds before downloading the next page. Dill et al. (Dill et al., 2002) use 1 second.
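Both kinds of interval, a fixed one and one that adapts to how long the last download took, can be sketched with a small per-host scheduler. The class, its parameter names and the choice to express the adaptive rule as a multiplier of the download time are assumptions of this example; times are plain numbers in seconds so that the sketch can be followed without a real clock.

```python
class PolitenessScheduler:
    """Track, per host, the earliest time the next request may be sent."""

    def __init__(self, fixed_delay=10.0, adaptive_factor=None):
        self.fixed_delay = fixed_delay          # seconds between requests
        self.adaptive_factor = adaptive_factor  # if set, delay = factor * download time
        self.next_allowed = {}                  # host -> earliest next request time

    def wait_time(self, host, now):
        """Seconds the crawler still has to wait before contacting host."""
        return max(0.0, self.next_allowed.get(host, 0.0) - now)

    def record_download(self, host, finished_at, download_seconds):
        """Register a finished download and schedule the next allowed request."""
        if self.adaptive_factor is not None:
            delay = self.adaptive_factor * download_seconds
        else:
            delay = self.fixed_delay
        self.next_allowed[host] = finished_at + delay


# Fixed 15-second interval: a download finishing at t=100 blocks the host until t=115.
fixed = PolitenessScheduler(fixed_delay=15.0)
fixed.record_download("www.example.com", finished_at=100.0, download_seconds=2.0)
fixed_wait = fixed.wait_time("www.example.com", now=105.0)

# Adaptive interval: a 0.5-second download leads to a 5-second pause.
adaptive = PolitenessScheduler(adaptive_factor=10.0)
adaptive.record_download("www.example.com", finished_at=100.0, download_seconds=0.5)
adaptive_wait = adaptive.wait_time("www.example.com", now=101.0)
```

The adaptive variant slows down automatically for servers that respond slowly, while hosts that have not been contacted yet can be fetched immediately.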
Anecdotal evidence from access logs shows that access intervals from known crawlers vary between 20 seconds and 3–4 minutes. It is worth noticing that even when being very polite, and taking all the safeguards to avoid overloading Web servers, some complaints from Web server administrators are received. Brin and Page note that: "... running a crawler which connects to more than half a million servers (...) generates a fair amount of email and phone calls. Because of the vast number of people coming on line, there are always those who do not know what a crawler is, because this is the first one they have seen." (Brin and Page, 1998).
Parallelization policy
A parallel crawler is a crawler that runs multiple processes in parallel. The goal is to maximize the download rate while minimizing the overhead from parallelization and to avoid repeated downloads of the same page. To avoid downloading the same page more than once, the crawling system requires a policy for assigning the new URLs discovered during the crawling process, as the same URL can be found by two different crawling processes. Cho and Garcia-Molina (Cho and Garcia-Molina, 2002) studied two types of policy:
Dynamic assignment: With this type of policy, a central server assigns new URLs to different crawlers dynamically. This allows the central server to, for instance, dynamically balance the load of each crawler.
With dynamic assignment, typically the systems can also add or remove downloader processes. The central server may become the bottleneck, so most of the workload must be transferred to the distributed crawling processes for large crawls.
There are two configurations of crawling architectures with dynamic assignment that have been described by Shkapenyuk and Suel (Shkapenyuk and Suel, 2002):
- A small crawler configuration, in which there is a central DNS resolver and central queues per website, and distributed downloaders.
- A large crawler configuration, in which the DNS resolver and the queues are also distributed.
Static assignment: With this type of policy, there is a fixed rule stated from the beginning of the crawl that defines how to assign new URLs to the crawlers.
For static assignment, a hashing function can be used to transform URLs (or, even better, complete website names) into a number that corresponds to the index of the corresponding crawling process. As there are external links that will go from a website assigned to one crawling process to a website assigned to a different crawling process, some exchange of URLs must occur.
To reduce the overhead due to the exchange of URLs between crawling processes, the exchange should be done in batch, several URLs at a time, and the most cited URLs in the collection should be known by all crawling processes before the crawl (e.g.: using data from a previous crawl) (Cho and Garcia-Molina, 2002).
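A static assignment rule based on hashing can be sketched as follows. The use of SHA-1, the function names and the example URLs are choices made for this illustration; any hash function that spreads host names evenly would serve. Links owned by other processes are collected per owner so that they can be sent in batches.

```python
import hashlib
from urllib.parse import urlsplit


def assign_process(url, num_processes):
    """Map a URL to a crawling process index by hashing its host name."""
    host = urlsplit(url).hostname or ""
    digest = hashlib.sha1(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_processes


def route_links(links, my_index, num_processes):
    """Split discovered links into local ones and per-process outgoing batches."""
    local, outgoing = [], {}
    for url in links:
        owner = assign_process(url, num_processes)
        if owner == my_index:
            local.append(url)
        else:
            outgoing.setdefault(owner, []).append(url)
    return local, outgoing


# Placeholder URLs discovered by process 0 in a crawl with four processes.
DISCOVERED = [
    "http://www.example.com/a.html",
    "http://www.example.com/b.html",
    "http://www.example.org/index.html",
]
local_urls, batches = route_links(DISCOVERED, my_index=0, num_processes=4)
```

Hashing the host name rather than the full URL keeps all pages of a website with the same process, which also makes it easier to enforce a per-server politeness interval.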
An effective assignment function must have three main properties: each crawling process should get approximately the same number of hosts (balancing property); if the number of crawling processes grows, the number of hosts assigned to each process must shrink (contra-variance property); and the assignment must be able to add and remove crawling processes dynamically. To achieve all of the desired properties, Boldi et al. (Boldi et al., 2004) propose to use consistent hashing, which replicates the buckets, so that adding or removing a bucket does not require re-hashing of the whole table.
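The following is a generic sketch of consistent hashing with replicated buckets, written to illustrate the properties listed above; it is not the implementation used by Boldi et al. Each crawling process is placed at several pseudo-random points on a ring, and a host is assigned to the process owning the next point after the host's own hash. The class name, the number of replicas and the host names are assumptions of the example.

```python
import bisect
import hashlib


def _hash(key):
    """64-bit integer hash of a string (SHA-1 is an arbitrary choice here)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Assign hosts to crawling processes using a ring of replicated points."""

    def __init__(self, replicas=100):
        self.replicas = replicas
        self._points = []   # sorted positions on the ring
        self._owner = {}    # position -> process name

    def add(self, process):
        for i in range(self.replicas):
            point = _hash(f"{process}#{i}")
            self._owner[point] = process
            bisect.insort(self._points, point)

    def remove(self, process):
        self._points = [p for p in self._points if self._owner[p] != process]
        self._owner = {p: o for p, o in self._owner.items() if o != process}

    def lookup(self, host):
        if not self._points:
            raise LookupError("no crawling processes registered")
        # First ring position after the host's hash, wrapping around at the end.
        i = bisect.bisect(self._points, _hash(host)) % len(self._points)
        return self._owner[self._points[i]]


ring = ConsistentHashRing(replicas=50)
for name in ["crawler0", "crawler1", "crawler2"]:
    ring.add(name)
```

When a fourth process is added, the only hosts that change owner are those that move to the new process, and removing it again restores the previous assignment, so the rest of the table never has to be re-hashed.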
Web crawler architectures
A crawler must have a good crawling strategy, as noted in the previous sections, but it also needs a highly optimized architecture. Shkapenyuk and Suel (Shkapenyuk and Suel, 2002) noted that: "While it is fairly easy to build a slow crawler that downloads a few pages per second for a short period of time, building a high-performance system that can download hundreds of millions of pages over several weeks presents a number of challenges in system design, I/O and network efficiency, and robustness and manageability."
Web crawlers are a central part of search engines, and details on their algorithms and architecture are kept as business secrets. When crawler designs are published, there is often an important lack of detail that prevents others from reproducing the work. There are also emerging concerns about "search engine spamming", which prevent major search engines from publishing their ranking algorithms.
Examples of web crawlers
The following is a list of published crawler architectures for general-purpose crawlers (excluding focused web crawlers), with a brief description that includes the names given to the different components and outstanding features:
RBSE (Eichmann, 1994) was the first published web crawler. It was based on two programs: the first program, "spider", maintains a queue in a relational database, and the second program, "mite", is a modified www ASCII browser that downloads the pages from the Web.
WebCrawler (Pinkerton, 1994) was used to build the first publicly-available full-text index of a sub-set of the Web. It was based on lib-WWW to download pages, and another program to parse and order URLs for breadth-first exploration of the Web graph. It also included a real-time crawler that followed links based on the similarity of the anchor text with the provided query.
World Wide Web Worm (McBryan, 1994) was a crawler used to build a simple index of document titles and URLs. The index could be searched by using the grep Unix command.
Internet Archive Crawler (Burner, 1997) is a crawler designed with the purpose of archiving periodic snapshots of a large portion of the Web. It uses several processes in a distributed fashion, and a fixed number of websites are assigned to each process. The inter-process exchange of URLs is carried out in batches with a long time interval between exchanges, as this is a costly process. The Internet Archive Crawler also has to deal with the problem of changing DNS records, so it keeps a historical archive of the hostname to IP mappings.
[http://www.cs.cmu.edu/~rcm/websphinx/ WebSPHINX] (Miller and Bharat, 1998) is composed of a Java class library that implements multi-threaded Web page retrieval and HTML parsing, and a graphical user interface to set the starting URLs, to extract the downloaded data and to implement a basic text-based search engine.
Google Crawler (Brin and Page, 1998) is described in some detail, but the reference is only about an early version of its architecture, which was based on C++ and Python. The crawler was integrated with the indexing process, because text parsing was done for full-text indexing and also for URL extraction. There was a URL server that sent lists of URLs to be fetched by several crawling processes. During parsing, the URLs found were passed to a URL server that checked if the URL had been previously seen. If not, the URL was added to the queue of the URL server.
CobWeb (da Silva et al., 1999) uses a central "scheduler" and a series of distributed "collectors". The collectors parse the downloaded Web pages and send the discovered URLs to the scheduler, which in turn assigns them to the collectors. The scheduler enforces a breadth-first search order with a politeness policy to avoid overloading Web servers. The crawler is written in Perl.
Mercator (Heydon and Najork, 1999) is a modular web crawler written in Java. Its modularity arises from the usage of interchangeable "protocol modules" and "processing modules". Protocol modules are related to how to acquire the Web pages (e.g.: by HTTP), and processing modules are related to how to process Web pages. The standard processing module just parses the pages and extracts new URLs, but other processing modules can be used to index the text of the pages, or to gather statistics from the Web.
WebFountain (Edwards et al., 2001) is a distributed, modular crawler similar to Mercator but written in C++. It features a "controller" machine that coordinates a series of "ant" machines. After repeatedly downloading pages, a change rate is inferred for each page and a non-linear programming method must be used to solve the equation system for maximizing freshness. The authors recommend using this crawling order in the early stages of the crawl, and then switching to a uniform crawling order, in which all pages are visited with the same frequency.
PolyBot (Shkapenyuk and Suel, 2002) is a distributed crawler written in C++ and Python, which is composed of a "crawl manager", one or more "downloaders" and one or more "DNS resolvers". Collected URLs are added to a queue on disk, and processed later to search for seen URLs in batch mode. The politeness policy considers both third and second level domains (e.g.: www.example.com and www2.example.com are third level domains) because third level domains are usually hosted by the same Web server.
WebRACE (Zeinalipour-Yazti and Dikaiakos, 2002) is a crawling and caching module implemented in Java, and used as a part of a more generic system called eRACE. The system receives requests from users for downloading Web pages, so the crawler acts in part as a smart proxy server. The system also handles requests for "subscriptions" to Web pages that must be monitored: when the pages change, they must be downloaded by the crawler and the subscriber must be notified. The most outstanding feature of WebRACE is that, while most crawlers start with a set of "seed" URLs, WebRACE is continuously receiving new starting URLs to crawl from.
[http://ubi.imc.pi.cnr.it/projects/ubicrawler/ Ubicrawler] (Boldi et al., 2004) is a distributed crawler written in Java, and it has no central process. It is composed of a number of identical "agents", and the assignment function is calculated using consistent hashing of the host names. There is zero overlap, meaning that no page is crawled twice, unless a crawling agent crashes (then, another agent must re-crawl the pages from the failing agent). The crawler is designed to achieve high scalability and to be tolerant to failures.
FAST Crawler (Risvik and Michelsen, 2002) is the crawler used by the FAST search engine, and a general description of its architecture is available. It is a distributed architecture in which each machine holds a "document scheduler" that maintains a queue of documents to be downloaded by a "document processor" that stores them in a local storage subsystem. Each crawler communicates with the other crawlers via a "distributor" module that exchanges hyperlink information.
[http://www.cwr.cl/projects/WIRE WIRE] (Baeza-Yates and Castillo, 2002) is a web crawler written in C++, including several policies for scheduling the page downloads and a module for generating reports and statistics on the downloaded pages, so it has been used for Web characterization.
In addition to the specific crawler architectures listed above, there are general crawler architectures published by Cho (Cho and Garcia-Molina, 2002) and Chakrabarti (Chakrabarti, 2003). Also, a few web crawlers have been released under Open Source Licences such as the Apache License or the GNU public license: [http://larbin.sourceforge.net/index-eng.html Larbin], DataparkSearch, Nutch, [http://www-diglib.stanford.edu/~testbed/doc2/WebBase/ WebBase], a free version of [http://www.cs.cmu.edu/~rcm/websphinx/ WebSPHINX], [http://www.grub.org/ GRUB] and [http://www.htdig.org/ HTDig].
References
See also: Google, PageRank, Data mining
- Serge Abiteboul, Mihai Preda, and Gregory Cobena. [http://www-db.deis.unibo.it/courses/SI-LS/papers4pres/APC03.pdf Adaptive on-line page importance computation.] In Proceedings of the twelfth international conference on World Wide Web, pages 280-290, Budapest, Hungary, 2003. ACM Press.
- Baeza-Yates, R. and Castillo, C. (2002). [http://www.dcc.uchile.cl/~ccastill/papers/baeza02balancing.pdf Balancing volume, quality and freshness in web crawling]. In Soft Computing Systems – Design, Management and Applications, pages 565–572, Santiago, Chile. IOS Press Amsterdam.
- Baeza-Yates, R., Castillo, C., Marin, M. and Rodriguez, A. (2005). [http://www.dcc.uchile.cl/%7Eccastill/papers/baeza05_crawling_country_better_breadth_first_web_page_ordering.pdf Crawling a Country: Better Strategies than Breadth-First for Web Page Ordering]. In Proceedings of the Industrial and Practical Experience track of the 14th conference on World Wide Web, pages 864–872, Chiba, Japan. ACM Press.
- Boldi, P., Codenotti, B., Santini, M., and Vigna, S. (2004a). [http://vigna.dsi.unimi.it/ftp/papers/UbiCrawler.pdf UbiCrawler: a scalable fully distributed Web crawler]. Software, Practice and Experience, 34(8):711–726.
- Boldi, P., Santini, M., and Vigna, S. (2004b). [http://vigna.dsi.unimi.it/ftp/papers/ParadoxicalPageRank.pdf Do your worst to make the best: Paradoxical effects in pagerank incremental computations]. In Proceedings of the third Workshop on Web Graphs (WAW), volume 3243 of Lecture Notes in Computer Science, pages 168-180, Rome, Italy. Springer.
- Brin, S. and Page, L. (1998). [http://www-db.stanford.edu/~backrub/google.html The anatomy of a large-scale hypertextual Web search engine]. Computer Networks and ISDN Systems, 30(1-7):107–117.
- Burner, M. (1997). [http://www.webtechniques.com/archives/1997/05/burner/ Crawling towards eternity – building an archive of the World Wide Web]. Web Techniques, 2(5).
- Chakrabarti, S. (2003). [http://www.cs.berkeley.edu/~soumen/mining-the-web/ Mining the Web]. Morgan Kaufmann Publishers. ISBN 1558607544
- Chakrabarti, S., van den Berg, M., and Dom, B. (1999). [http://www.fxpal.com/people/vdberg/pubs/www8/www1999f.pdf Focused crawling: a new approach to topic-specific web resource discovery]. Computer Networks, 31(11–16):1623–1640.
- Junghoo Cho, Hector Garcia-Molina, and Lawrence Page. [http://www.csd.uch.gr/~hy558/papers/cho-order.pdf Efficient crawling through URL ordering]. In Proceedings of the seventh conference on World Wide Web, Brisbane, Australia, April 1998.
- Cho, J. and Garcia-Molina, H. (2000). [http://www.cs.brown.edu/courses/cs227/2002/cache/Cho.pdf Synchronizing a database to improve freshness]. In Proceedings of ACM International Conference on Management of Data (SIGMOD), pages 117-128, Dallas, Texas, USA.
- Cho, J. and Garcia-Molina, H. (2002). [http://rose.cs.ucla.edu/~cho/papers/cho-parallel.pdf Parallel crawlers]. In Proceedings of the eleventh international conference on World Wide Web, pages 124–135, Honolulu, Hawaii, USA. ACM Press.
- Cho, J. and Garcia-Molina, H. (2003). [http://rose.cs.ucla.edu/~cho/papers/cho-tods03.pdf Effective page refresh policies for web crawlers]. ACM Transactions on Database Systems, 28(4).
- Cho, J. and Garcia-Molina, H. (2003). [http://rose.cs.ucla.edu/~cho/papers/cho-freq.pdf Estimating frequency of change]. ACM Transactions on Internet Technology, 3(3).
- Diligenti, M., Coetzee, F., Lawrence, S., Giles, C. L., and Gori, M. (2000). [http://nautilus.dii.unisi.it/pubblicazioni/files/conference/2000-Diligenti-VLDB.pdf Focused crawling using context graphs]. In Proceedings of 26th International Conference on Very Large Databases (VLDB), pages 527-534, Cairo, Egypt.
- Dill, S., Kumar, R., Mccurley, K. S., Rajagopalan, S., Sivakumar, D., and Tomkins, A. (2002). [http://alme1.almaden.ibm.com/cs/people/mccurley/pdfs/fractal.pdf Self-similarity in the web]. ACM Trans. Inter. Tech., 2(3):205–223.
- Eichmann, D. (1994). [http://mingo.info-science.uiowa.edu/eichmann/www94/Spider.ps The RBSE spider: balancing effective search against Web load]. In Proceedings of the First World Wide Web Conference, Geneva, Switzerland.
- Coffman, E. G., Liu, Z., and Weber, R. R. (1998). Optimal robot scheduling for Web search engines. Journal of Scheduling, 1(1):15–29.
- Jenny Edwards, Kevin S. McCurley, and John A. Tomlin. [http://www.almaden.ibm.com/webfountain/resources/AdaptiveModelforCrawling.pdf An adaptive model for optimizing performance of an incremental web crawler]. In Proceedings of the Tenth Conference on World Wide Web, pages 106–113, Hong Kong, May 2001. Elsevier Science.
- Heydon, A. and Najork, M. (1999). [http://www.cindoc.csic.es/cybermetrics/pdf/68.pdf Mercator: A scalable, extensible Web crawler]. World Wide Web Conference, 2(4):219–229.
- Ipeirotis, P., Ntoulas, A., Cho, J., Gravano, L. (2005) [http://pages.stern.nyu.edu/~panos/publications/icde2005.pdf Modeling and managing content changes in text databases]. In Proceedings of the 21st IEEE International Conference on Data Engineering, pages 606-617, April 2005, Tokyo.
- Koster, M. (1993). [http://www.robotstxt.org/wc/guidelines.html Guidelines for robots writers].
- Koster, M. (1995). Robots in the web: threat or treat ? ConneXions, 9(4).
- Koster, M. (1996). [http://www.robotstxt.org/wc/exclusion.html A standard for robot exclusion].
- Steve Lawrence and C. Lee Giles. [http://www.nature.com/doifinder/10.1038/21987 Accessibility of information on the web]. Intelligence, 11(1):32–39, 2000.
- McBryan, O. A. (1994). GENVL and WWWW: Tools for taming the web. In Proceedings of the First World Wide Web Conference, Geneva, Switzerland.
- Miller, R. and Bharat, K. (1998). [http://www-2.cs.cmu.edu/~rcm/papers/www7/www7.html Sphinx: A framework for creating personal, site-specific web crawlers]. In Proceedings of the seventh conference on World Wide Web, Brisbane, Australia. Elsevier Science.
- Marc Najork and Janet L. Wiener. [http://xobjects.seu.edu.cn/xobjects/www10/papers/pdf/p208.pdf Breadth-first crawling yields high-quality pages]. In Proceedings of the Tenth Conference on World Wide Web, pages 114–118, Hong Kong, May 2001. Elsevier Science.
- Pinkerton, B. (1994). [http://archive.ncsa.uiuc.edu/SDG/IT94/Proceedings/Searching/pinkerton/WebCrawler.html Finding what people want: Experiences with the WebCrawler]. In Proceedings of the First World Wide Web Conference, Geneva, Switzerland.
- Shkapenyuk, V. and Suel, T. (2002). [http://cis.poly.edu/tr/tr-cis-2001-03.pdf Design and implementation of a high performance distributed web crawler]. In Proceedings of the 18th International Conference on Data Engineering (ICDE), pages 357-368, San Jose, California. IEEE CS Press.
- da Silva, A. S., Veloso, E. A., Golgher, P. B., Ribeiro-Neto, B. A., Laender, A. H. F., and Ziviani, N. (1999). [http://www.dcc.fua.br/~alti/pubs/spire99_cow.ps.gz Cobweb – a crawler for the Brazilian web]. In Proceedings of String Processing and Information Retrieval (SPIRE), pages 184–191, Cancun, Mexico. IEEE CS Press.
- Zeinalipour-Yazti, D. and Dikaiakos, M. D. (2002). [http://www.cs.ucr.edu/~csyiazti/downloads/papers/ngits02/ngits02.pdf Design and implementation of a distributed crawler and filtering processor]. In Proceedings of the Fifth Next Generation Information Technologies and Systems (NGITS), volume 2382 of Lecture Notes in Computer Science, pages 58–74, Caesarea, Israel. Springer.
External links
- [http://www.robotstxt.org/wc/robots.html The Web Robots Page]
- [http://www.hotwebtools.com/crawler Meta Data crawler used to view HTTP, header and other elements]
1990s
The 1990s refers to the years 1990 to 1999, the last decade of the 20th century. The 90s were marked by the rapid progression of globalization and global capitalism following the collapse of the Soviet Union and the end of the Cold War. Key forces shaping the decade were the Gulf War and the popularization of the personal computer and the Internet, which led to the dot-com boom.
Events and trends
While optimism and hopes were high following the collapse of Communism, the backlash of the Cold War's effect was only beginning, precipitating the continuation of terrorism in Third World regions that were once the frontlines for American and Soviet foreign policy, particularly in Asia. However, during the 1990s many First World economies such as the United States, Canada, Ireland, Australia, and South Korea experienced steady economic growth for nearly the entire decade. The United Kingdom, after the recession of 1991-92 and Black Wednesday, experienced a run of 51 consecutive quarters of economic growth that stretched into the new millennium. Even less affluent nations such as Malaysia saw tremendous improvements in economic prosperity and quality of life during the 1990s.
Many countries, institutions, companies, and organizations also viewed the 1990s as "a prosperous time", rebounding after years of difficulty. Examples include Apple Computer's revival after being on the edge of bankruptcy and breakthroughs in many fields of technology, including the Internet and virtual reality. Oil and gas were discovered in many countries, and Pope John Paul II's papacy reached its peak.
Nevertheless, the 1990s brought tragic conflicts as well, like the Balkan Wars, the Rwandan genocide, the Battle of Mogadishu in Somalia and the first Gulf War.
Criticism/Backlash of the Decade
Despite denials from various sociologists and media pundits, some feel that the 90s were an era of increasing materialism and growing hypocrisy continued from the 1980s. In general it could still be said that the mindsets of the 1980s and 1990s were more or less the same. The 1990s are also widely criticized, along with the 2000s to a somewhat lesser extent, for a controversial pop culture obsessed with gore, sex, violence, and strong language. The 1990s nonetheless enjoy a very positive reception in the 2000s and are still considered quite "modern" even as of 2006: many genres of media from the decade remain popular among youth, as no great revolutions in pop culture have occurred for some time and only a moderate backlash against the decade itself has yet occurred. Also, while not a criticism of the decade per se, some people see the 1990s in an abstract sense as the beginning of the 21st century rather than the end of the 20th. The Cold War, a defining phenomenon of the 20th century, was over by about 1991, the tech boom began to take off a couple of years later, and characteristically 21st-century developments such as the rise of the Internet and other information technologies and the expansion of Islamic terrorism became prominent in the 1990s.
Technology
- The Pentium processor is developed by Intel.
- Microsoft introduces Windows 95 to the market, where it gains immediate popularity.
- Explosive growth of the Internet, decrease in the cost of computers and other technology.
- Advancements with computer modems, ISDN, cable modems and DSL lead to faster connection to the Internet.
- The development of web browsers such as Netscape and Internet Explorer makes surfing the World Wide Web easier and more user friendly.
- The Java programming language is developed by Sun Microsystems.
- Businesses launch e-commerce websites; companies such as Amazon.com, eBay, AOL, and Yahoo! grow rapidly on the Internet.
- Cell phones burst in popularity and decrease in size, becoming a necessity for modern life.
- Pagers and PDAs become popular communication tools.
- E-mail becomes popular; as a result Microsoft acquires the popular Hotmail.com.
- Year 2000 problem (commonly known as Y2K).
- Microsoft Windows operating system becomes virtually ubiquitous on IBM PCs.
- Development of the free operating system Linux begins.
- Breakthrough of compact disc technology, introduced in the 1980s, later branching into DVD.
Science
- Detection of extrasolar planets, which orbit stars other than the Sun.
- The cloning of Dolly the sheep is achieved.
- Human Genome Project begins.
- DNA identification of individuals finds wide application in criminal law.
- Hubble Space Telescope launched in 1990; revolutionizes astronomy.
- Protease inhibitors introduced allowing HAART therapy against HIV; drastically reduces AIDS mortality.
- NASA's spacecraft Pathfinder lands on Mars and deploys a small roving vehicle, Sojourner, that analyzes the planet's geology and atmosphere.
- The Hale-Bopp comet swings past the sun for the first time in 4,200 years.
- Development of biodegradable products, replacing products made from styrofoam; advanced methods for recycling of waste products (such as paper, glass, aluminum) are developed.
- Genetically engineered crops are developed for commercial use.
- Discovery of dark matter, dark energy, and brown dwarfs, and first confirmation of black holes.
- The Galileo probe orbits Jupiter, studying the planet and its moons extensively.
War, peace, and politics
- Reunification of Germany on October 3, 1990.
- End of apartheid in South Africa (1990) and election of ANC government of Nelson Mandela.
- Gulf War (resulting from Iraq's invasion of Kuwait) and United Nations embargo on Iraq in 1991.
- North Yemen and South Yemen merge to form Yemen (1990).
- Break up of the Soviet Union in 1991 - the end of the Cold War, United States as sole world superpower.
- The bombing of the World Trade Center in 1993 by an explosive-filled van leads to awareness of international terrorism as a rising threat.
- Eritrea gains independence from Ethiopia (1993).
- The Maastricht Treaty establishing the European Union is signed in 1992.
- Military actions in Somalia in 1993 lead to questions about the United States' role as policeman of the world (see also Black Hawk Down).
- Rwandan genocide kills up to one million people in 1994.
- The birth of the "Second Republic" in Italy, with the Mani Pulite investigations of 1994.
- Peace process begins in Northern Ireland in 1995.
- Balkan war in former Yugoslavia in 1995.
- A decade of women presidents in the Republic of Ireland.
- The United Kingdom hands sovereignty of Hong Kong to the People's Republic of China on July 1, 1997.
- U.S. Congressman Newt Gingrich crafts his manifesto "Contract with America", leading his Republican Party to become the controlling majority in the U.S. House of Representatives.
- U.S. president Bill Clinton's sex scandal with Monica Lewinsky dominates 1998 and leads to his impeachment in December 1998 and his acquittal by the Senate in February 1999.
- Anti-globalization protests.
- The Second Congo War starts in 1998 in central Africa and involves 5 different cultures and 7 different nations. It goes on until 2002.
- In May 1999, Pakistan sends troops covertly to occupy strategic peaks in Kashmir. A month later the Kargil War with India results in a political fiasco for Nawaz Sharif, followed by a military withdrawal to the Line of Control. The incident leads to a military coup in October in which the Prime Minister Nawaz Sharif is ousted by Army Chief Pervez Musharraf.
- Portugal hands sovereignty of Macau to the People's Republic of China on December 20, 1999.
Economics
- Development of GATT, the World Trade Organization and other global economic institutions.
- The North American Free Trade Agreement (NAFTA), which phases out trade barriers between the United States, Mexico and Canada, is signed into law by U.S. President Bill Clinton.
- After 1992 the US stock market booms. Alan Greenspan coins the memorable phrase "irrational exuberance" in reference to it, and the boom eventually stretches into the dot-com boom and dot-com bubble.
- Financial crisis hits East and Southeast Asia in 1997 and 1998 after a long period of phenomenal economic development. See East Asian Tigers.
Culture
Trends/Various
- The Gay 1990s: the decade saw an increase in gay visibility. TV shows like thirtysomething, My So-Called Life and Ellen featured gay characters, movies like The Birdcage, In & Out and Kiss Me, Guido saw mainstream success, and celebrities like k.d. lang and George Michael came out of the closet. Even President Bill Clinton generally held a pro-gay-rights viewpoint.
- Douglas Coupland publishes the novel Generation X: Tales for an Accelerated Culture, popularizing the term Generation X as the name of the generation born in the late 1960s and early 1970s (then college-age).
- Reality television explodes on MTV with the popularity of The Real World (1992-); along with Road Rules (1995-), Real World/Road Rules Challenge (1998), and Real World reunions, these shows remained popular throughout the 1990s.
- Video games become more advanced, but are still a far cry from the systems of the 2000s. The more influential game systems of the Nineties include the Super Nintendo Entertainment System, the Sony PlayStation, and the Sega Dreamcast.
- Extreme sports reach a new height in popularity and, by 1995, are given their own annual tournament on US cable network ESPN, the X Games.
- Black becomes a dominant color in fashion, among several dark colors (see Goth, The Matrix, and Regis Philbin).
- Dogme 95 becomes the leading European artistic film movement by the end of the decade.
- Professional wrestling becomes extremely popular. After scandals and near bankruptcy due to competition from World Championship Wrestling (WCW), the World Wrestling Federation is repackaged as edgier and more realistic. Superstars such as Stone Cold Steve Austin, The Rock, Mick Foley, Steve Borden (Sting), Bill Goldberg, Raven, Sabu and others become household names. At the same time, Extreme Championship Wrestling (ECW) leads wrestling's entry into edgier angles.
- Recreational sports such as rock climbing, mountain biking, sky diving, snowboarding, mountain climbing, bungee jumping, in-line skating, kayaking and rowing become hugely popular.
- Extended alcohol sales are implemented to reduce alcohol abuse.
- The 1990s remains a somewhat "cool" decade into the 2000s as many aspects of the 90s continue to be important into the next decade, see New Nineties.
Music
- Grunge music, popularized by Nirvana, is big from the fall of 1991 through 1994 and remains influential on rock up to 2005 (see Post-Grunge). The grunge movement is followed by the Britpop movement of about 1995 to 1997, which is in turn followed by nu metal.
- Teen pop, held over from the late 1980s and popular into 1990, returns with the Backstreet Boys and the Spice Girls in the latter third of the decade.
- Radiohead comes to be one of the most critically and commercially loved bands since The Beatles. Two of their albums, The Bends and OK Computer, top lists at the end of the decade.
- Rap music gains widespread mainstream acceptance throughout the decade, starting with the success of MC Hammer, Public Enemy and Vanilla Ice around 1989-91 and ending with hip-hop inspired by Puff Daddy, Dr. Dre and Eminem c. 1997-99. By 1999 hip hop had definitely passed rock and roll in popularity.
- Music festivals such as Lollapalooza became popular; a fusing of genres from alternative rock, rap, punk rock and garage bands.
- Rock music begins to be referred to as "alternative", as it originates in 1980s underground rock and 1970s punk, and begins to lose popularity to hip hop.
- Trance, techno and electronica music become widely popular at rave parties in Europe and the USA and in pop culture, particularly later in the decade. The drug Ecstasy (also known as MDMA or 'X') is popularized by rave culture.
- 1980s backlash, beginning in about 1991 and lasting into the 2000s. During most of the 1990s anything "Eighties" was considered to be ultimately uncool.
- Music becomes more profane; by the end of the decade a Parental Advisory sticker is accepted rather than controversial.
- In America, country music becomes more mainstream with popular chart-topping artists such as Garth Brooks, Shania Twain, LeAnn Rimes, Faith Hill, and Tim McGraw. The trend decreases somewhat in the 2000s.
Television
- Japanimation becomes popular in the United States in the late 1990s with shows such as Pokémon, Dragon Ball Z, and Cowboy Bebop.
- Mighty Morphin' Power Rangers gains popularity with kids in the mid 90s, leading to an entire Power Rangers series. Barney and Teenage Mutant Ninja Turtles are also popular.
- MTV moves away from music videos and into original television shows such as The Real World, which is cited as the inspiration for the Reality TV boom of the 2000s.
- Cartoons aimed at an adult audience become popular. Among the most successful are The Simpsons (1989-), Ren & Stimpy (1991-1995), Beavis and Butt-head (1993-1997), South Park (1997-), King of the Hill (1997-), and Family Guy (1999-2002, 2005-).
- Television networks increase programs aimed at twenty- and thirty-somethings. Some of the most popular are Beverly Hills 90210 (1990-2000), Melrose Place (1992-1999), Party of Five (1994-2000), Ally McBeal (1997-2002), Friends (1994-2004), and Seinfeld (1989-1998).
- Notable television sitcoms aimed at the teen/preteen market include Boy Meets World (1993-2000), Full House (1987-1995), Family Matters (1989-1998), and Third Rock From The Sun (1996-2001), among many others.
- Major 1990s slang words/phrases, mostly related to hip hop include "homie", "phat", "da bomb", "Audi 5000", "tight", "word to your mother", "Talk to the hand", "You go girl!", and "Wasssuppp!"
Other significant events
- The massive global human impact on the environment, which first garnered attention in the 60s, was widely acknowledged.
- Divorce and scandal rocked the British Royal House of Windsor.
- The assassination of Selena Quintanilla.
- Sex and violence in the media increase, especially in the late part of the decade. Profanity in music reaches peak in the late 90s.
- O.J. Simpson's trial, described in the media as the "trial of the century".
- You go, girl! becomes a popular phrase in the media as feminism is more widely accepted and publicised in the media with The Spice Girls, the WNBA, women's boxing, Sex and the City and others showcasing modern femininity.
- The Vieques controversy.
- The Oklahoma City Bombing, the bombing of a federal building in Oklahoma City, Oklahoma, killing 168.
- The Waco massacre prompts a nationwide debate in the U.S. about the freedom of association right of the Michigan Militia, Montana Militia and other radical groups.
- Crime levels in the U.S. peak in 1991, begin to fall afterwards to the lowest levels since the late 1960s at end of decade.
- Drug use in the U.S. reaches an all-time low in 1992 before increasing, reaching its peak in 1997 before declining again.
- Princess Diana dies in a car accident in 1997. Debates of accident vs assassination rage.
- Mother Teresa, the Roman Catholic nun who won the Nobel Peace Prize, dies at age 87.
- 21-year-old golfer Tiger Woods wins the Masters Tournament by a record 12 strokes, becoming the youngest player and the first African-American to win the Masters.
- The Omagh bombing in Omagh, County Tyrone, Northern Ireland, kills 29 civilians (including a woman pregnant with twins) and injures hundreds more.
- John F. Kennedy, Jr., his wife Carolyn Bessette and sister-in-law Lauren Bessette are killed when Kennedy's private plane crashes off the coast of Martha's Vineyard.
- American cyclist Lance Armstrong wins his first Tour de France in 1999, less than two years after battling testicular cancer.
- Beer keg registration becomes popular public policy in U.S.
People
World leaders
- Prime Minister Bob Hawke (Australia)
- Prime Minister Paul Keating (Australia)
- Prime Minister John Howard (Australia)
- President Fernando Affonso Collor de Mello (Brazil)
- President Itamar Franco (Brazil)
- President Fernando Henrique Cardoso (Brazil)
- Prime Minister Brian Mulroney (Canada)
- Prime Minister Kim Campbell (Canada)
- Prime Minister Jean Chrétien (Canada)
- "Paramount Leader" Deng Xiaoping (People's Republic of China)
- President Jiang Zemin (People's Republic of China)
- President Lee Teng-hui (Republic of China on Taiwan)
- President Franjo Tuđman (Croatia)
- Prime Minister Poul Nyrup Rasmussen (Denmark)
- President Hosni Mubarak (Egypt)
- President François Mitterrand (France)
- President Jacques Chirac (France)
- Chancellor Helmut Kohl (Germany)
- Chancellor Gerhard Schröder (Germany)
- Governor David Clive Wilson (Hong Kong (under British rule))
- Governor Christopher Francis Patten (Hong Kong (under British rule))
- Chief Executive Tung Chee Hwa (Hong Kong, People's Republic of China)
- Prime Minister Atal Bihari Vajpayee (India)
- President Mohammad Khatami (Iran)
- President Saddam Hussein (Iraq)
- Prime Minister Yitzhak Rabin (Israel)
- Prime Minister Benjamin Netanyahu (Israel)
- Emperor Akihito (Japan)
- Governor Vasco Joaquim Rocha Vieira (Macau (under Portuguese rule))
- Chief Executive Edmund Ho (Macau, People's Republic of China)
- President Yasser Arafat (Palestinian Authority)
- Pope John Paul II
- President Corazon Aquino (Philippines)
- President Fidel Ramos (Philippines)
- President Joseph Estrada (Philippines)
- Prime Minister Mike Moore (New Zealand)
- Prime Minister Jim Bolger (New Zealand)
- Prime Minister Jenny Shipley (New Zealand)
- Prime Minister Helen Clark (New Zealand)
- President Ion Iliescu (Romania)
- President Emil Constantinescu (Romania)
- President Boris Yeltsin (Russia)
- Taoiseach Charles Haughey (Republic of Ireland)
- Taoiseach Albert Reynolds (Republic of Ireland)
- Taoiseach John Bruton (Republic of Ireland)
- Taoiseach Bertie Ahern (Republic of Ireland)
- President Wee Kim Wee (Singapore)
- President Ong Teng Cheong (Singapore)
- President Sellapan Ramanathan (Singapore)
- President Frederik Willem de Klerk (South Africa)
- President Nelson Mandela (South Africa)
- President Kim Dae-jung (South Korea)
- President Mikhail Gorbachev (Soviet Union)
- King Juan Carlos I (Spain)
- Prime Minister Felipe González (Spain)
- Prime Minister José María Aznar (Spain)
- Queen Elizabeth II (United Kingdom et al.)
- Prime Minister John Major (United Kingdom)
- Prime Minister Tony Blair (United Kingdom)
- President George H.W. Bush (United States)
- President Bill Clinton (United States)
- President Slobodan Milošević (Federal Republic of Yugoslavia)
Entertainers
- 2pac
- Ace of Base
- Adam Sandler
- Aaliyah
- Alice in Chains
- Alanis Morissette (Jagged Little Pill)
- Annie Lennox
- Anthony Hopkins (The Silence of the Lambs, Titus)
- Ashley Judd
- Beavis and Butt-Head
- Ben Affleck (Good Will Hunting)
- Bill Hicks
- Billy Bob Thornton
- Boyz II Men
- Bret Hart
- Britney Spears
- Bruce Willis (the Die Hard series, Pulp Fiction)
- Mariah Carey
- Dana Carvey (Wayne's World)
- Dean Cain
- Carmen Electra
- Christina Aguilera
- Cuba Gooding Jr (Boyz N the Hood, Jerry Maguire)
- Amy Grant
- Dave Matthews Band
- Demi Moore (Ghost, Striptease, A Few Good Men)
- Denzel Washington (Malcolm X, Mo' Better Blues, Philadelphia)
- Destiny's Child (Destiny's Child, The Writing's On The Wall)
- Ellen DeGeneres (Ellen)
- Elizabeth Berkley (Saved by the Bell, Showgirls)
- Eurythmics
- Friends
- Courteney Cox
- Jennifer Aniston
- Lisa Kudrow
- Matt LeBlanc
- Matthew Perry
- David Schwimmer
- The Fugees
- Green Day (Dookie, Nimrod)
- Gwyneth Paltrow (Shakespeare in Love, The Talented Mr. Ripley, Se7en)
- Liam Gallagher of Oasis
- Noel Gallagher of Oasis
- Teri Hatcher
- Whitney Houston (The Bodyguard, Waiting to Exhale)
- Halle Berry (Introducing Dorothy Dandridge, Bulworth)
- Hanson
- Harrison Ford
- Helen Hunt (Mad About You, Twister, As Good as It Gets)
- Hootie & The Blowfish
- Jack Nicholson
- Jerry Seinfeld (Seinfeld)
- Jerry Springer
- Jim Carrey (Ace Ventura: Pet Detective, The Mask)
- Julia Roberts (Pretty Woman, Notting Hill)
- Kate Winslet (Titanic)
- Keanu Reeves (The Matrix)
- Kurt Cobain
- Leonardo DiCaprio (Titanic)
- Liam Neeson
- Macaulay Culkin (Home Alone)
- The Undertaker
- Martin Lawrence (House Party, Martin, Bad Boys)
- Mary J Blige (What's the 411?)
- Matt Damon (Good Will Hunting)
- Meg Ryan
- Mel Gibson (Braveheart)
- Michael Jackson
- Michael Keaton
- Michelle Pfeiffer (The Age of Innocence, Batman Returns)
- Mike Myers (Wayne's World, Saturday Night Live, Austin Powers)
- Mira Sorvino
- Nicole Kidman (My Life, Eyes Wide Shut)
- Notorious B.I.G.
- Nirvana
- Oasis
- Phil Collins
- Pamela Anderson (Baywatch)
- Pearl Jam
- The artist formerly known as Prince
- Queen Latifah (Living Single, Set It Off)
- Quentin Tarantino (Pulp Fiction)
- Ralph Fiennes (Schindler's List, The English Patient)
- Jeri Ryan (Star Trek: Voyager)
- Samuel L. Jackson (Goodfellas, Pulp Fiction)
- Sandra Bullock (Speed, A Time to Kill)
- Shawn Michaels
- Spice Girls
- Stone Cold Steve Austin
- Tim Burton (Edward Scissorhands, Batman Returns)
- Tiffani-Amber Thiessen (Saved by the Bell, Beverly Hills 90210)
- TLC (Lisa "Left-Eye" Lopes, T-Boz, Rozonda "Chilli" Thomas)
- Tom Hanks (Forrest Gump, Saving Private Ryan, Philadelphia, Toy Story, The Green Mile)
- Toni Braxton (Toni Braxton)
- U2 (Achtung Baby)
- Uma Thurman (Pulp Fiction)
- Whoopi Goldberg (Sister Act, Ghost, Ghosts of Mississippi, Hollywood Squares)
- Will Smith (The Fresh Prince of Bel Air, Men In Black)
Films
See also: 1990s in film
Books & Literature
See also : 1990s Books
- The Bridges of Madison County, by Robert James Waller
- Chicken Soup for the Soul, by Jack Canfield and Mark Victor Hansen
- The Client, by John Grisham
- Cold Mountain, by Charles Frazier
- Divine Secrets of the Ya-Ya Sisterhood, by Rebecca Wells
- The Firm, by John Grisham
- The Greatest Generation, by Tom Brokaw
- Harry Potter and the Sorcerer's Stone, by J. K. Rowling
- How to Make an American Quilt, by Whitney Otto
- It Takes A Village, by Hillary Clinton
- Jazz, by Toni Morrison
- Men Are From Mars, Women Are From Venus, by John Gray
- The Perfect Storm, by Sebastian Junger
- The Way Things Ought to Be, by Rush Limbaugh
- The Sum of All Fears, by Tom Clancy
Sports figures
Se
PageRank
PageRank, sometimes abbreviated to PR, is a family of algorithms for assigning numerical weightings to hyperlinked documents (or web pages) indexed by a search engine. It was originally developed by Larry Page (hence the play on words in the name) together with Sergey Brin, Google's other founder, while both were at Stanford University in 1998. Its properties are much discussed by search engine optimization (SEO) experts. The PageRank system is used by the popular search engine Google to help determine a page's relevance or importance. As [http://www.google.com/technology/ Google puts it]:
:PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page's value. Google interprets a link from page A to page B as a vote, by page A, for page B. But Google looks at more than the sheer volume of votes, or links a page receives; it also analyzes the page that casts the vote. Votes cast by pages that are themselves "important" weigh more heavily and help to make other pages "important."
PageRank uses links as "votes"
In other words, a page's rank results from a "ballot" among all the other pages on the World Wide Web about how important that page is. A hyperlink to a page counts as a vote of support. The PageRank of a page is defined recursively and depends on the number and PageRank of all pages that link to it ("incoming links"). A page that is linked to by many pages with high rank receives a high rank itself. If there are no links to a web page, there is no support for that page.
In early 2005, Google implemented a new attribute, rel="nofollow", for the HTML link element, so that website builders and bloggers can make links that Google will not follow for the purposes of PageRank—they are links that no longer constitute a "vote" in the PageRank system. The nofollow attribute was added in an attempt to help combat comment spam.
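The effect of the attribute on link counting can be sketched as follows. This is a minimal illustration using Python's standard `html.parser` module, not a description of how Google's crawler is implemented; the class name `VoteLinkParser`, the helper `classify_links` and the sample URLs are hypothetical.

```python
from html.parser import HTMLParser


class VoteLinkParser(HTMLParser):
    """Collect hyperlinks, separating those marked rel="nofollow" (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.votes = []      # links that would count as a "vote"
        self.nofollow = []   # links the author asked not to be counted

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if href is None:
            return
        # rel holds a space-separated list of tokens; compare case-insensitively
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow.append(href)
        else:
            self.votes.append(href)


def classify_links(html):
    """Return (vote links, nofollow links) found in an HTML fragment."""
    parser = VoteLinkParser()
    parser.feed(html)
    parser.close()
    return parser.votes, parser.nofollow


# Sample markup: one ordinary link and one link as a blog might emit it for a comment.
sample = (
    '<p><a href="http://example.org/">editorial link</a> '
    '<a href="http://example.com/spam" rel="nofollow">comment link</a></p>'
)
votes, ignored = classify_links(sample)
print(votes)
print(ignored)
```

Only the links in `votes` would be passed on to a link-analysis step; a blog that adds rel="nofollow" to every link in its comments thereby removes the incentive for comment spam.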
The Google Toolbar PageRank goes from 0 to 10. It seems to be a logarithmic scale, but the exact details of this scale are not public knowledge. The name PageRank is a trademark of Google and a pun on the name of Larry Page. The PageRank process has been patented; the patent is assigned not to Google but to Stanford University.
An alternative to the PageRank algorithm is the HITS algorithm proposed by Jon Kleinberg and the CLEVER project at IBM. Many HITS concepts are now incorporated into Teoma and Ask Jeeves.
PageRank algorithm
Simplified
Suppose a small universe of four web pages: A, B, C and D. If all those pages link to A, then the PR (PageRank) of page A would be the sum of the PR of pages B, C and D.
:<math>PR(A) = PR(B) + PR(C) + PR(D)</math>
But then suppose page B also has a link to page C, and page D has links to all three pages. A page cannot vote twice; instead it splits its vote over the pages it links to. Thus, page B gives half a vote to page A and half a vote to page C. By the same logic, page D divides its vote over three pages, and only one third of D's vote is counted for A's PageRank.
:<math>PR(A) = \frac{PR(B)}{2} + \frac{PR(C)}{1} + \frac{PR(D)}{3}</math>
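In other words, divide each page's PR by the total number of links that come from that page, written L(·) here:
:<math>PR(A) = \frac{PR(B)}{L(B)} + \frac{PR(C)}{L(C)} + \frac{PR(D)}{L(D)}</math>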
The actual PageRank formula incorporates two more considerations:
First of all, we trust indirect votes less than direct votes. Let's say a new page N links to page B, thus increasing the authority of B by one unit. As a consequence of the above equation, the authority of pages A and C would each increase by half a unit (exactly as much as if the new page N had linked directly to A and C instead of B). This is too much! After all, N links to B and so considers it more authoritative than A and C. This problem is resolved by scaling down the votes by a factor q, which is usually 0.85.
Finally, all pages get a small authority of 1-q=0.15 to start off. This choice results in the nice property that the average PageRank of all pages will be one.
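With these two modifications, our equation turns into the real PageRank equation, where L(X) is the number of links coming from page X:
:<math>PR(A) = 1 - q + q \left( \frac{PR(B)}{L(B)} + \frac{PR(C)}{L(C)} + \frac{PR(D)}{L(D)} \right)</math>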
So one page's PageRank is calculated by the PageRank of other pages. Google is always recalculating the PageRanks. If you give all pages a PageRank of any number and constantly recalculate everything, all PageRanks will change and tend to stabilize at some point. It is at this point where the PageRank is used by the search engine.
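The repeated recalculation described above can be sketched in a few lines of Python. This is an illustrative sketch of the simplified equation only, not Google's implementation; the function name `simplified_pagerank` is hypothetical, and since the text does not say which pages A links to, the example assumes it links to none.

```python
def simplified_pagerank(links, q=0.85, iterations=50):
    """Iterate PR(p) = (1 - q) + q * sum(PR(s) / L(s)) over the pages s that link to p."""
    pages = list(links)
    pr = {page: 1.0 for page in pages}  # any starting value works
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Each linking page contributes its PR divided by its number of outgoing links.
            incoming = sum(
                pr[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            new_pr[page] = (1 - q) + q * incoming
        pr = new_pr
    return pr


# The four-page example: B links to A and C, C links to A, D links to A, B and C.
# Page A's own outgoing links are not given in the text; assumed here to be none.
links = {"A": [], "B": ["A", "C"], "C": ["A"], "D": ["A", "B", "C"]}
ranks = simplified_pagerank(links)
for page in sorted(ranks):
    print(page, round(ranks[page], 4))
```

Page D has no incoming links, so it settles at the minimum value 1-q=0.15, while A, which every other page links to, ends up with the highest value. Because A passes its vote on to nobody under the assumption above, the average over the four pages stays below one in this toy graph.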
Complex
The formula uses a model of a random surfer who gets bored after several clicks and switches to a random page. The PageRank value of a page reflects the frequency of hits on that page by the random surfer. It can be understood as a Markov process in which the states are pages and the transitions, all equally probable, are the links between pages. If a page has no links to other pages, it becomes a sink and would make the whole model unusable, because sink pages would trap the random surfer forever. However, the solution is quite simple: if the random surfer arrives at a sink page, he or she picks another URL at random and continues surfing.
To be fair with pages that are not sinks, these random transitions are added to all nodes in the Web, with a residual probability of usually q=0.15, estimated from the frequency that an average surfer uses his or her browser's bookmark feature.
So, the equation is as follows:
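:<math>PR(p_i) = \frac{q}{N} + (1 - q) \sum_{p_j \in M(p_i)} \frac{PR(p_j)}{L(p_j)}</math>
where <math>p_1, p_2, \ldots, p_N</math> are the pages under consideration, <math>M(p_i)</math> is the set of pages that link to <math>p_i</math>, <math>L(p_j)</math> is the number of links coming from page <math>p_j</math>, and N is the total number of pages. Note that q here is the probability of a random jump, so it plays the role that 1-q plays in the simplified equation.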
The PageRank values are the entries of the dominant eigenvector of the modified adjacency matrix. This makes PageRank a particularly elegant metric: the eigenvector is
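:<math>\mathbf{R} = \begin{bmatrix} PR(p_1) \\ PR(p_2) \\ \vdots \\ PR(p_N) \end{bmatrix}</math>
where R is the solution of the equation
:<math>\mathbf{R} = \begin{bmatrix} q/N \\ q/N \\ \vdots \\ q/N \end{bmatrix} + (1 - q) \begin{bmatrix} \ell(p_1,p_1) & \ell(p_1,p_2) & \cdots & \ell(p_1,p_N) \\ \ell(p_2,p_1) & \ddots & & \vdots \\ \vdots & & \ell(p_i,p_j) & \\ \ell(p_N,p_1) & \cdots & & \ell(p_N,p_N) \end{bmatrix} \mathbf{R}</math>
where the adjacency function <math>\ell(p_i,p_j)</math> is 0 if page <math>p_j</math> does not link to <math>p_i</math>, and normalised such that, for each j
:<math>\sum_{i=1}^{N} \ell(p_i,p_j) = 1,</math>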
i.e. the elements of each column sum up to 1.
The values of the PageRank eigenvector are fast to approximate (only a few iterations are needed) and in practice it gives good results.
As a result of Markov theory, it can be shown that the PageRank of a page is the probability of being at that page after lots of clicks. This happens to equal <math>t^{-1}</math>, where <math>t</math> is the expectation of the number of clicks (or random jumps) required to get from the page back to itself.
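The random-surfer model can be approximated by power iteration, as in the following sketch. It is an illustration under stated assumptions rather than a production algorithm: the function name `random_surfer_pagerank` and the convergence tolerance are hypothetical, q=0.15 is the random-jump probability from the text, and sink pages are handled by spreading their rank evenly over all pages, which corresponds to the surfer picking a random URL.

```python
def random_surfer_pagerank(links, q=0.15, tol=1e-10, max_iter=1000):
    """Power iteration for PR(p_i) = q/N + (1 - q) * sum(PR(p_j) / L(p_j))."""
    pages = list(links)
    n = len(pages)
    pr = {page: 1.0 / n for page in pages}  # start from the uniform distribution
    for _ in range(max_iter):
        # Rank held by sink pages is spread evenly, as if the surfer
        # picked a random URL after reaching a page with no links.
        sink_mass = sum(pr[p] for p in pages if not links[p])
        new_pr = {p: q / n + (1 - q) * sink_mass / n for p in pages}
        for src in pages:
            for dst in links[src]:
                new_pr[dst] += (1 - q) * pr[src] / len(links[src])
        change = sum(abs(new_pr[p] - pr[p]) for p in pages)
        pr = new_pr
        if change < tol:
            break
    return pr


# Same toy graph as a four-page web; A is a sink (an assumption for illustration).
links = {"A": [], "B": ["A", "C"], "C": ["A"], "D": ["A", "B", "C"]}
ranks = random_surfer_pagerank(links)
print({page: round(value, 4) for page, value in ranks.items()})
print(round(sum(ranks.values()), 6))
```

In this formulation the values form a probability distribution over pages, so they sum to one rather than averaging to one as in the simplified equation.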
The main disadvantage is that it favors older pages, because a new page, even a very good one, will not have many links unless it is part of an existing site (a site being a densely connected set of pages).
That's why PageRank should be combined with textual analysis or other ranking methods. PageRank seems to favor Wikipedia pages, often putting them high or at the top of searches for several encyclopedic topics. A common theory is that this is because Wikipedia is very interconnected, with each article having many internal links from other articles, which in turn have links from many other sites on the Web pointing to them. Compared to Wikipedia, and similar high quality content-rich sites, the rest of the World Wide Web is relatively loosely connected.
Several strategies have been proposed [http://citeseer.ist.psu.edu/719287.html to accelerate] the computation of PageRank.
However, Google is known to actively penalize link farms and other schemes to artificially inflate PageRank. How Google tells the difference between highly inter-linked web sites and link farms is one of Google's trade secrets.
False or spoofed PageRank
While the PR shown is usually accurate for most sites it must be noted that it is also easily manipulated. A current flaw is that any low PageRank page that is redirected, via a 302 server header or a "Refresh" meta tag, to a high PR page causes the lower PR page to acquire the PR of the destination page. In theory a new, PR0 page with no incoming links can be redirected to the Google home page - which is a PR 10 - and by the next PageRank update the PR of the new page will be upgraded to a PR10. This is called spoofing and is a known failing or bug in the system. Any page's PR can be spoofed to a higher or lower number of the webmaster's choice and only Google has access to the real PR of the page.
Buying of PR
For SEO purposes webmasters often buy links for their sites. As links from higher PR pages are believed to be more valuable they tend to be more expensive. Google actively discourages the buying and selling of links purely based on PR, and their webmaster guidelines specifically forbid it. It is quite possible for unscrupulous webmasters to spoof PR with the intention of commanding a higher price for links.
See also
- List of websites with a high PageRank
- PigeonRank
- TrustRank
External links
- [http://www.google.com/technology/ Our Search: Google Technology] by Google
- [http://www-db.stanford.edu/~backrub/google.html The Anatomy of a Large-Scale Hypertextual Web Search Engine] by Sergey Brin & Lawrence Page - 1998
- [http://dbpubs.stanford.edu:8090/pub/showDoc.Fulltext?lang=en&doc=1999-66&format=pdf&compression= The PageRank Citation Ranking: Bringing Order to the Web] Lawrence Page, Sergey Brin, Rajeev Motwani, & Terry Winograd - 1999 (PDF)
- [http://www.voelspriet2.nl/PageRank.pdf Page Rank Uncovered] by Chris Ridings & Mike Shishigin (PDF)
- [http://www.cs.washington.edu/homes/pedrod/papers/nips01b.pdf The Intelligent Surfer: Probabilistic Combination of Link and Content Information in PageRank] by Matthew Richardson and Pedro Domingos - 2002 (PDF)
- [http://members.optusnet.com.au/clausen/reputation/rep-cost-attack.pdf Online Reputation Systems: The Cost of Attack of PageRank] by Andrew Clausen - 2003 (PDF)
- [http://www.google.com/technology/pigeonrank.html PigeonRank]: a Google-sized violation of animal welfare by exposing helpless birds to possibly harmful webpages (humour).
- [http://googleblog.blogspot.com/2005/01/preventing-comment-spam.html Preventing comment spam] using the nofollow attribute to neutralize a link's PageRank vote
Category:Google
Category:Search engine optimization
ja:ページランク
Alt attribute
The alt attribute is used in HTML (et al.) documents to specify text which is to be rendered when the element it is applied to cannot be. In HTML 4.01, the attribute is required for the img and area element types. It is optional for the input element type and the deprecated applet element type.
Alternative text is especially useful in the following situations:
- For people with low bandwidth connections, who may opt not to load graphics
- For people using handheld devices
- For people with disabilities who use assistive software or screen readers
- For search engine optimization, since many search engines can only interpret the meaning of objects by analysing their alt attribute
In the early years of Internet development, alternative text was particularly helpful to people using text-only browsers (like Lynx). Nowadays, even when graphical capabilities are a commodity, alternative text is still highly appreciated by users with accessibility requirements and users looking for ways to optimize their network bandwidth use.
Alt attributes may be empty (alt=""); while the use of meaningful alt text is necessary to comply with accessibility standards, and is good practice, when an image is used for purely decorative purposes an empty alt attribute is appropriate.
The alt attribute is commonly, but incorrectly, referred to as an image's "alt tag". It is not intended to provide "pop up" text or tooltips when a user's mouse hovers over the image, though alt text has historically been presented in this way in some web browsers; HTML's title attribute is intended for supplementary information that can be used in this way.
Example of usage in HTML
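As a minimal sketch of the attribute in use, the snippet below embeds a small HTML fragment and uses Python's standard html.parser module to list each image with its alt text. The file names and alt texts are hypothetical; the fragment shows one informative image, one decorative image with an empty alt attribute, and one image that omits the attribute.

```python
from html.parser import HTMLParser

# Hypothetical markup, for illustration only.
SAMPLE_HTML = """
<img src="chart.png" alt="Bar chart of monthly visitors">
<img src="divider.png" alt="">
<img src="logo.png">
"""


class AltCollector(HTMLParser):
    """Record (src, alt) for every img element; alt is None when missing."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attr_map = dict(attrs)
            self.images.append((attr_map.get("src"), attr_map.get("alt")))


def collect_alt_text(html):
    """Return a list of (src, alt) pairs found in the given HTML text."""
    collector = AltCollector()
    collector.feed(html)
    collector.close()
    return collector.images


if __name__ == "__main__":
    for src, alt in collect_alt_text(SAMPLE_HTML):
        status = "missing alt" if alt is None else repr(alt)
        print(src, "->", status)
```

An empty string and a missing attribute are reported differently on purpose: the first is appropriate for decorative images, while the second is an omission of a required attribute.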
External links
- [http://www.w3.org/TR/html401/struct/objects.html#adef-alt W3C spec section on 'How to specify alternate text']
- [http://ppewww.ph.gla.ac.uk/~flavell/alt/alt-text.html Alt text guidance from Alan Flavell] with section on humorous gaffes
- [http://diveintoaccessibility.org/day_23_providing_text_equivalents_for_images.html Dive in to accessibility page on alt text]
- [http://www.yourhtmlsource.com/accessibility/contentaccessibility.html Content Accessibility Tutorial]
- [http://www.cs.tut.fi/~jkorpela/html/alt.html Guidelines on alt texts in img elements] by Jukka Korpela
- [http://www.hixie.ch/advocacy/alttext Mini-FAQ about the alternate text of images] by Ian Hickson
Category:HTML
Keyword stuffing
Keyword stuffing is considered to be an unethical search engine optimization (SEO) technique. It occurs when a web page is loaded up with keywords in the Meta tags or in the page's content. The common techniques today are repeating the same word over and over again in the Meta tags (which is why many search engines no longer look at the Meta tags) and placing text on the page in the same color as the background, also known as invisible or hidden text.
Keyword stuffing is used to obtain maximum search engine ranking and visibility for particular phrases, with the purpose of getting visitors from a search engine (such as Google or Yahoo!) to come to the web page. However, if a word is repeated too often, it will raise a red flag with the search engines, which will likely apply a spam filter to the web site or web page.
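The idea of a word being "repeated too much" can be sketched as a keyword density check. This is only an illustration: the 20% threshold and the sample text are assumptions made for the example, and the criteria that real search engines apply are not public.

```python
import re
from collections import Counter


def keyword_density(text):
    """Return each word's share of all words in the text (0.0 to 1.0)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    total = len(words)
    if total == 0:
        return {}
    counts = Counter(words)
    return {word: count / total for word, count in counts.items()}


def suspicious_keywords(text, threshold=0.2):
    """List words whose density exceeds the (assumed) threshold."""
    density = keyword_density(text)
    return sorted(word for word, share in density.items() if share > threshold)


if __name__ == "__main__":
    # Hypothetical stuffed text: "cheap" and "flights" make up 8 of 10 words.
    sample = "cheap flights cheap flights cheap flights book cheap flights now"
    print(suspicious_keywords(sample))
```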
See also
- Doorway pages
- Link spam
- Cloaking
External links
- [http://www.google.com/webmasters/seo.html Google Guidelines]
- [http://help.yahoo.com/help/us/ysearch/deletions/deletions-05.html Yahoo! Guidelines]
Category:Search engine optimization
Meta tag
Meta tags are used to provide structured data about data.
Use in web pages
The most popular use is in web pages and is similar in many ways to the information provided in traditional library catalogue records.
Words which are often misspelled are often added as keywords. For example, accommodation and millennium are misspelled almost as often as they are spelt correctly. Adding these misspellings as keywords is therefore of potential value.
Commercial uses
Meta tags have been the focus of a field of marketing research known as search engine optimization, or SEO. In the mid to late 1990s, search engines were reliant on Meta tag data to correctly classify a web page. Webmasters quickly learned the commercial significance of having the right Meta tag, as it frequently led to a high ranking in the search engines and thus high traffic to the web site.
As search engine traffic achieved greater significance in online marketing plans, consultants were brought in who were well versed in how search engines perceive a web site. These consultants used a variety of techniques (legitimate and otherwise) to improve ranking for their clients. You can see any web site's Meta tags by viewing the source code.
The two most popular Meta tags used for search engine optimization are the Meta description tag and the Meta keyword tag. The Meta description tag allows web site owners to describe what the web page is about. Search engines such as Google still use this to display the description of a web site in the search results. The Meta keyword tag allows site owners to list keywords that help search engines decide what a web site is about.
In the early 2000s, search engines veered away from reliance on Meta tags, as many web sites used inappropriate keywords or were keyword stuffing to obtain any and all traffic possible.
Some search engines, however, still take Meta tags into some consideration when delivering results. In recent years, search engines have become smarter, penalizing web sites that cheat by repeating the same keyword several times to get a boost in the search ranking. Instead of going up in the ranking, these sites go down or, on some search engines, are removed from the index completely.
Alternative to meta tags
An alternative to meta tags for enhanced subject access within a website is the use of a back-of-book-style index for the website. See examples at the websites of the Australian Society of Indexers ([http://www.aussi.org www.aussi.org]) and the American Society of Indexers ([http://www.asindexing.org www.asindexing.org]).
See also
- HTML element
- Metadata
External links
- [http://www.w3.org/TR/REC-html40/struct/global.html#h-7.4.4 HTML 4.01 Specification: Meta data]
- [http://www.hotwebtools.com/meta Meta data generator]
Category:HTML
Hyperlink
A hyperlink, or simply a link, is a reference in a hypertext document to another document or other resource. As such it is similar to a citation in literature. However, combined with a data network and suitable access protocol, it can be used to fetch the resource referenced. This can then be saved, viewed, or displayed as part of the referencing document.
Hyperlinks are part of the foundation of the World Wide Web created by Tim Berners-Lee.
There are a number of ways to format and present hyperlinks on a web page. The embedded link, a link that occurs within a sentence as is illustrated above, is one of the more common formats.
Hyperlinks in various technologies
Hyperlinks in HTML
Tim Berners-Lee saw the possibility of using hyperlinks to link every unit of information to any other unit of information over the Internet. Hyperlinks were therefore integral to the creation of the World Wide Web.
Links are specified in HTML using the <a> (anchor) elements.
Hyperlinks in XML
A special W3C Recommendation called the XML Linking Language, XLink, describes simple (i.e. as in HTML) and extended links for hyperlinking from, within, and between XML documents.
Hyperlinks in other technologies
Hyperlinks are used in PDF documents, word processing documents, spreadsheets, Apple's HyperCard and many others.
How hyperlinks work in HTML
A link has two ends, called anchors, and a direction. The link starts at the source anchor and points to the destination anchor. However, the term link is often used for the source anchor, while the destination anchor is called the link target.
The most common link target is a URL used in the World Wide Web. This can refer to a document, e.g. a webpage, or other resource, or to a position in a webpage. The latter is achieved by means of an HTML element with a "name" or "id" attribute at that position of the HTML document. The URL of the position is the URL of the webpage with "#" and the value of that attribute appended.
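The relation between a page URL and the URL of a position inside it can be shown with Python's standard urllib.parse module. This is a small sketch; the page URL and the anchor name "installation" are hypothetical.

```python
from urllib.parse import urldefrag, urljoin

# Hypothetical page URL and anchor name, for illustration only.
page_url = "http://www.example.com/guide.html"
position_url = page_url + "#" + "installation"

# urldefrag splits a URL back into the document part and the fragment.
document, fragment = urldefrag(position_url)

# A link consisting only of a fragment resolves against the current page.
same_page_link = urljoin(page_url, "#installation")

if __name__ == "__main__":
    print(document)
    print(fragment)
    print(same_page_link)
```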
Link behaviour in web browsers
A web browser usually displays a hyperlink in some distinguishing way, e.g. in a different colour, font or style. The behaviour and style of links can be specified using the Cascading Style Sheets (CSS) language.
In a graphical user interface, the mouse cursor may also change into a hand motif to indicate a link. In most graphical web browsers, links are displayed in underlined blue text when not yet visited, but in underlined purple text once visited. When the user activates the link (e.g. by clicking on it with the mouse) the browser will display the target of the link. If the target is not an HTML file, depending on the file type and on the browser and its plugins, another program may be activated to open the file.
The HTML code contains some or all of the five main characteristics of a link:
- link destination ("href" pointing to a URL)
- link label
- link title
- link target
- link class or link id
It uses the HTML element "a" with the attribute "href" and optionally also the attributes "title", "target", and "class" or "id":
:<a href="URL" title="link title" target="link target" class="link class">link label</a>
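To make the parts of a link concrete, the following sketch reads a fragment of HTML with Python's standard html.parser module and pulls out the destination, title, target, class, and label of each link. The sample markup and the class name LinkCollector are made up for the illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the attributes and label of each <a href="..."> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current = None  # link being read, or None outside an <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attr_map = dict(attrs)
            if "href" in attr_map:
                self._current = {
                    "href": attr_map["href"],
                    "title": attr_map.get("title"),
                    "target": attr_map.get("target"),
                    "class": attr_map.get("class"),
                    "label": "",
                }

    def handle_data(self, data):
        # Text between <a> and </a> is the link label.
        if self._current is not None:
            self._current["label"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.links.append(self._current)
            self._current = None


def extract_links(html):
    """Return one dictionary per link found in the given HTML text."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    sample = ('<p>See the <a href="http://www.example.com/" title="Example site" '
              'target="_blank" class="external">example page</a>.</p>')
    for link in extract_links(sample):
        print(link)
```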
Example: to embed a link in a page, blog post, or comment, the markup may take this form:
:<a href="URL">BendGovt</a>
Thus, the complex link string is reduced to the label BendGovt. This contributes to a clean, easy-to-read text or document. [see: Embedded link]
When the cursor hovers over a link, depending on the browser and/or graphical user interface, some informative text about the link is shown:
- It pops up, not in a regular window, but in a special hover box, which disappears when the cursor is moved away (sometimes it disappears anyway after a few seconds, and reappears when the cursor is moved away and back). IE and Mozilla Firefox show the title, Opera also shows the URL.
- In addition, the URL may be shown in the status bar.
Normally, a link will open in the current frame or window, but sites that use frames and multiple windows for navigation can add a special "target" attribute to specify where the link will be loaded. Windows can be named upon creation, and that identifier can be used to refer to it later in the browsing session. If no current window exists with that name, a new window will be created using the ID.
Creation of new windows is probably the most common use of the "target" attribute. In order to prevent accidental reuse of a window, the special window name "_blank" is available, and will always cause a new window to be created. (The name "_new" is often used for the same purpose, but it is not a reserved name, so it refers to a single window that is reused.) It is especially common to see this type of link when one large website links to an external page. The intention in that case is to ensure that the person browsing is aware that there is no endorsement of the site being linked to by the site that was linked from. However, the attribute is sometimes overused and can cause many windows to be created even while browsing a single site.
Another special page name is "_top", which causes any frames in the current window to be cleared away so that browsing can continue in the full window.
Hyperlinks as the currency of the World Wide Web
The Google search engine uses PageRank, a measure of link popularity, to determine which page should be ranked first. The more pages that have a hyperlink pointing to a page, the higher the rank that page gets. It is actually slightly more complicated than that; see PageRank for more information.
History of the hyperlink
The term "hyperlink" was coined in 1965 (or possibly 1964) by Theodor Nelson at the start of Project Xanadu. Nelson had been inspired by "As We May Think," a popular essay by Vannevar Bush. In the essay, Bush described a microfilm-based machine in which one could link any two pages of information into a "trail" of related information, and then scroll back and forth among pages in a trail as if they were on a single microfilm reel. The closest contemporary analogy would be to build a list of bookmarks to topically related Web pages and then allow the user to scroll forward and backward through the list.
In a series of books and articles published from 1964 through 1980, Nelson transposed Bush's concept of automated cross-referencing into the computer context, made it applicable to specific text strings rather than whole pages, generalized it from a local desk-sized machine to a theoretical worldwide computer network, and advocated the creation of such a network. Meanwhile, working independently, a team led by Douglas Engelbart (with Jeff Rulifson as chief programmer) was the first to implement the hyperlink concept for scrolling within a single document (1966), and soon after for connecting between paragraphs within separate documents (1968). See NLS.
Legal issues concerning hyperlinks
While hyperlinking among pages of Internet content has long been considered an intrinsic feature of the Internet, some websites have claimed that linking to them is not allowed without permission, see e.g. [http://www.litmanlaw.com/content.aspx?page=243&section=12] and [http://www.stib.irisnet.be/msgN.htm] (in Dutch).
See also deep linking.
In some jurisdictions it is or was (for example the Netherlands, see Karin Spaink) held that hyperlinks are not merely references or citations, but are devices for copying web pages. Although this principle is generally rejected by digerati [http://www.edge.org/digerati/], the courts that adhere to it see the mere publication of a hyperlink that connects to illegal material to be an illegal act in itself, regardless of whether referencing illegal material is illegal.
British Telecom sued Prodigy claiming that Prodigy infringed its patent on web hyperlinks. However, after costly litigation, a court found for Prodigy, ruling that British Telecom's patent did not actually cover web hyperlinks. [1]
For an overview see also: [http://www.linksandlaw.com Links & Law] - case law summary, links to relevant court rulings worldwide and to relevant articles about linking
See also
- Hyperlinking objects
- HTML element.
ko:하이퍼링크
ja:ハイパーリンク
simple:Link
Category:Human-computer interaction
Link popularity
Link popularity is a measure of the quantity and quality of other web sites that link to a specific site on the World Wide Web. It is an example of the move by search engines towards off-the-page criteria to determine quality content. In theory, off-the-page criteria add the aspect of impartiality to search engine rankings.
Link popularity plays an important role in the visibility of a web site among the top of the search results. Indeed, some search engines require at least one link coming to a web site, otherwise they will drop it from their index.
Search engines such as Google use a special link analysis system to rank web pages. Citations from other WWW authors help to define a site's reputation. The philosophy of link popularity is that important sites will attract many links. Content-poor sites will have difficulty attracting any links. Link popularity assumes that not all incoming links are equal, as an inbound link from a major directory carries more weight than an inbound link from an obscure personal home page. In other words, the quality of incoming links counts more than sheer numbers of them.
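The claim that the quality of incoming links counts for more than their number can be illustrated with a simplified PageRank-style calculation. This is a sketch under stated assumptions, not the system any search engine actually runs: the damping value of 0.85, the function name pagerank, and the five hypothetical pages are all choices made for the example. Here site_a has a single inbound link from a well-linked directory, while site_b has two inbound links from obscure hobby pages.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank computed by repeated redistribution of scores.

    links maps each page to the list of pages it links to. Every page is
    assumed to have at least one outgoing link, and every link target is
    assumed to be a key of the dictionary.
    """
    pages = list(links)
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small base score.
        new_ranks = {page: (1.0 - damping) / n for page in pages}
        for page, targets in links.items():
            # A page splits its damped score equally among its outgoing links.
            share = damping * ranks[page] / len(targets)
            for target in targets:
                new_ranks[target] += share
        ranks = new_ranks
    return ranks


# Hypothetical link graph, for illustration only.
LINKS = {
    "directory": ["site_a"],
    "site_a": ["directory"],
    "site_b": ["directory"],
    "hobby_1": ["site_b"],
    "hobby_2": ["site_b"],
}

if __name__ == "__main__":
    for page, score in sorted(pagerank(LINKS).items(), key=lambda item: -item[1]):
        print(page, round(score, 3))
```

In this toy graph site_a ends with a score of about 0.41 and site_b with about 0.08, even though site_b has twice as many inbound links.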
To search for pages linking to a specific page, simply enter the URL on Google or Yahoo! this way:
link:http://yourdomainname/pagename.html
Here are some strategies that are generally considered to be important to increase link popularity:
- There should be links from the home page to all subpages so that a search engine can transfer some link popularity to the subpages.
- Appropriate anchor text with relevant keywords should be used in the text links that are pointing to pages within a site (technically, this helps link context, not link popularity).
- Getting links from other web sites, particularly sites with high PageRank, can be one of the most powerful site promotion tools. Therefore, the webmaster should try to get links from other important sites offering information or products compatible or synergistic to his/her own site or from sites that cater to the same audience the webmaster does. The webmaster should explain the advantages to the potential link partner and the advantages his/her site has to their visitors.
- One way links often count for more than reciprocal links.
- The webmaster should list his/her site in one or more of the major directories such as Yahoo! or the Open Directory Project.
- The webmaster should only link to sites that he/she can trust, i.e. sites that do not use "spammy techniques".
- The webmaster should not participate in link exchange programs or link farms, as search engines will ban sites that participate in such programs.
To increase link popularity, many webmasters interlink multiple domains that they own, but often search engines will filter out these links, as such links are not independent votes for a page and are only created to trick the search engines. See Spamdexing. In this context, closed circles are often used, but these should be avoided, as they hoard PageRank.
See also
- Hyperlink
- Reciprocal link
- Link exchange
- PageRank
- Search engine optimization
- Spamdexing
- Zipf's law
External links
- [http://www.hscripts.com/tools/HLPC/ HIOX Link Popularity Checker]
- [http://www.searchenginewatch.com/webmasters/article.php/2167951 Measuring Link Popularity]
Category:World Wide Web
Doorway page
Doorway pages are web pages that are created to rank high in search engine results for particular phrases with the purpose of sending you to a different page. They are also known as landing pages, bridge pages, portal pages, zebra pages, jump pages, gateway pages, entry pages and by other names.
If you click through to a typical doorway page from a search engine result page, in most cases you will be redirected with a fast META refresh command to another page. Doorway pages are easy to identify in that they have been designed primarily for search engines, not for human beings. Sometimes a doorway page is copied from another high ranking page, but this is likely to cause the search engine to detect the page as a duplicate and exclude it from the search engine listings.
Because many search engines give you a penalty for using the META refresh command, some doorway pages just trick you into clicking on a link to get you to the desired destination page.
See also
- Cloaking
- Keyword stuffing
External links
- [http://www.ewebmastersforum.com Doorwaypages Forum]
- [http://searchenginewatch.com/webmasters/article.php/2167831 What are doorway pages?]
- [http://websearch.about.com/od/seononos/a/doorways.htm Doorway pages: The Good, The Bad, The Ugly]
Category:Search engine optimization
META refresh
META refresh is an HTML meta element specifying an interval after which a browser should auto-refresh the page. Including a URL will instruct the browser to fetch that resource, but this is a poor method of redirection.
Example:
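A minimal sketch: the HTML fragment below asks the browser to load another page after five seconds, and the surrounding Python (standard html.parser module) shows how the interval and the optional URL are packed into the content value. The URL and the function names are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical element: wait 5 seconds, then fetch the given URL.
SAMPLE_HTML = ('<meta http-equiv="refresh" '
               'content="5; url=http://www.example.com/new.html">')


def parse_refresh_content(content):
    """Split a refresh content value into (delay_seconds, url_or_None)."""
    delay_part, _, rest = content.partition(";")
    delay = int(delay_part.strip())
    rest = rest.strip()
    url = None
    if rest.lower().startswith("url="):
        url = rest[4:].strip() or None
    return delay, url


class RefreshFinder(HTMLParser):
    """Remember the last meta refresh element seen in a document."""

    def __init__(self):
        super().__init__()
        self.refresh = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        is_refresh = (attr_map.get("http-equiv") or "").lower() == "refresh"
        content = attr_map.get("content")
        if tag == "meta" and is_refresh and content:
            self.refresh = parse_refresh_content(content)


def find_refresh(html):
    """Return (delay, url) for the page's meta refresh, or None if absent."""
    finder = RefreshFinder()
    finder.feed(html)
    finder.close()
    return finder.refresh


if __name__ == "__main__":
    print(find_refresh(SAMPLE_HTML))
```

When the content value holds only a number, the page simply reloads itself after that many seconds.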
The W3C's [http://www.w3.org/TR/WAI-WEBCONTENT/#tech-no-periodic-refresh Web Content Accessibility Guidelines (7.4)] discourage the creation of auto-refreshing pages, since most web browsers do not allow the user to disable or control the refresh rate.
The meta refresh is a poor choice for redirecting users, since it does not communicate to a web browser or a search engine anything about the original resource or the new one.
[http://firehttp.com/ FireHTTP] is a URL redirection service for hiding the referer using meta-refresh. The referer is not sent by most [https://bugzilla.mozilla.org/show_bug.cgi?id=266554 popular browsers] when this URL redirection method is used.
See also
- Meta tag
References
- [http://www.w3.org/TR/WAI-WEBCONTENT/#gl-movement W3C Web Content Accessibility Guidelines (1.0): Ensure user control of time-sensitive content changes]
- [http://www.w3.org/QA/Tips/reback Use standard redirects: don't break the back button!]
Category:HTML
Common Gateway Interface
Common Gateway Interface (CGI) is an important World Wide Web technology that enables a client web browser to request data from a program executed on the Web server. CGI specifies a standard for passing request data between a web server and the program used to service that request.
Originally, CGI was invented by NCSA for the NCSA HTTPd web server in 1993. This web server used Unix environment variables to store parameters passed from the web server execution environment before spawning the CGI program as a separate process.
The programming language Perl is often associated with CGI, but one of the aims of CGI is to be language-neutral. The Web server does not need to know anything about the language in question.
In fact, CGI programs can be written in any scripting language or a full-fledged programming language, as long as that language can be executed on the system. Besides Perl, examples include Unix shell scripts, Python, Ruby, PHP, Tcl, C/C++, Pascal, and Visual Basic.
An example of a CGI program is the one implementing a wiki. You hand it the name of an entry; it will retrieve the source of that entry's page (if one exists), transform it into HTML, and send the result back to the browser or tell it that you want to edit a page. All wiki operations are managed by this one program.
The way CGI works from the Web server's point of view is that certain locations (e.g. http://www.example.com/wiki.cgi) are defined to be served by a CGI program. Whenever a request to a matching URL is received, the corresponding program is called, with any data that the client sent as input. Output from the program is collected by the Web server, augmented with appropriate headers, and sent back to the client.
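The exchange described above can be sketched as a very small CGI program. It assumes the web server passes the query string in the QUERY_STRING environment variable and sends whatever the program writes to standard output back to the client. The "entry" parameter, the "FrontPage" default, and the function name handle_request are hypothetical, loosely following the wiki example.

```python
import html
import os
import sys
from urllib.parse import parse_qs


def handle_request(environ):
    """Build the CGI response (header, blank line, body) for one request."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    # "entry" is a made-up parameter naming the page the client asked for.
    entry = query.get("entry", ["FrontPage"])[0]
    body = "<html><body><h1>{}</h1></body></html>".format(html.escape(entry))
    # The header block ends with an empty line; the body follows it.
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # The web server sets the environment and collects standard output.
    sys.stdout.write(handle_request(os.environ))
```

Keeping the response logic in a function that takes the environment as an argument makes it possible to exercise the program without a web server, for example with handle_request({"QUERY_STRING": "entry=Hyperlink"}).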
Because this technology generally requires a fresh copy of the program to be executed for every CGI request, the workload could quickly overwhelm web servers, inspiring more efficient technologies such as mod_perl that allow script interpreters to be integrated directly into web servers as modules, thus avoiding the overhead of repeatedly loading and initializing language interpreters.
Workarounds for scripting languages
The overhead of spawning a new process and reinterpreting the script for every request can be reduced when the code changes only occasionally. One example is FastCGI, while others include accelerators that take a web script when it is first called and store a compiled version of the script in a system location, so that further requests for the file are automatically directed to the compiled code instead of invoking the script interpreter every time the script is called. When scripts are changed, the accelerator cache can be emptied to ensure that the new script is called instead of the old one.
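The accelerator idea can be sketched in a few lines: compile a script the first time it is requested, keep the compiled form, and recompile only when the file has changed. This illustrates the principle only and does not describe how any particular accelerator works; the function names and the use of the file modification time as the change signal are assumptions.

```python
import os
import tempfile

_cache = {}  # path -> (modification time, compiled code object)


def load_compiled(path):
    """Return compiled code for a script, recompiling only after changes."""
    mtime = os.path.getmtime(path)
    cached = _cache.get(path)
    if cached is not None and cached[0] == mtime:
        return cached[1]  # unchanged since last compile: reuse it
    with open(path, "r", encoding="utf-8") as handle:
        code = compile(handle.read(), path, "exec")
    _cache[path] = (mtime, code)
    return code


def demo():
    """Show one cache hit and one recompile using a temporary script file."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "page_script.py")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("result = 1\n")
        first = load_compiled(path)
        second = load_compiled(path)  # served from the cache
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("result = 2\n")
        # Force a visibly different modification time for the edited file.
        stamp = os.path.getmtime(path) + 10
        os.utime(path, (stamp, stamp))
        third = load_compiled(path)  # file changed, so it is recompiled
        return first is second, third is not first


if __name__ == "__main__":
    print(demo())
```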
Thus for languages such as C or Pascal, which are usually compiled anyway, CGI programs are no different from other programs in this regard, and require no special processing.
Another approach used for scripting languages is to embed the interpreter directly into the web server so that it can be executed without creating a new process. The Apache web server has a number of modules such as mod_perl, mod_php, mod_python, mod_ruby, and mod_mono which do this.
See also
- CGI.pm
- Simple Common Gateway Interface
External links
- The [http://www.w3.org/CGI/ CGI standard] at w3.org.
- The [http://hoohoo.ncsa.uiuc.edu/cgi/ CGI/1.1 specification].
- The complete list of CGI variables is at http://hoohoo.ncsa.uiuc.edu/cgi/env.html.
- The [http://www.mems-exchange.org/software/scgi/ SCGI] protocol is a replacement for the Common Gateway Interface (CGI) protocol.
Category:World Wide Web
ja:Common Gateway Interface
Java programming language
Java is an object-oriented programming language developed initially by James Gosling and colleagues at Sun Microsystems. Initially called Oak (named after the oak trees outside Gosling's office), it was intended to replace C++, although the feature set better resembles that of Objective-C. Sun Microsystems currently maintains and updates Java regularly.
Specifications of the Java language, the Java Virtual Machine (JVM) and the Java API are community-maintained through the Sun-managed Java Community Process. Java was developed in 1991 by Gosling and other Sun engineers, as part of the Green Project. After first being made public in 1994, it achieved prominence following the announcement at 1995's SunWorld that Netscape would be including support for it in their Navigator browser.
Java is often confused with JavaScript, with which it shares only a similar C-like syntax.
History
Early history
The Java platform and language began as an internal project at Sun Microsystems in December of 1990. Patrick Naughton, an engineer at Sun, had become increasingly frustrated with the state of Sun's C++ and C APIs (application programming interfaces) and tools. While considering moving to NeXT, Patrick was offered a chance to work on new technology and thus the Stealth Project was started.
The Stealth Project was soon renamed to the Green Project with James Gosling and Mike Sheridan joining Patrick Naughton. They, together with some other engineers, began work in a small office on Sand Hill Road in Menlo Park, California to develop a new technology, aimed at programming next generation smart appliances such as microwaves, which Sun expected to be a big application of future technology. The team originally considered C++ as the language to use, but many of them as well as Sun's chief scientist, Bill Joy, found C++ and the available APIs problematic for several reasons.
Their platform was an embedded platform and had limited resources. Many members found that C++ was too complicated and that developers often misused it. They found C++'s lack of garbage collection a problem, as well as its lack of portable facilities for security, distributed programming, and threading. Finally, they wanted a platform that could be easily ported to all types of devices.
According to the available accounts, Bill Joy had ideas of a new language combining the best of Mesa and C. In a paper called Further, he proposed to Sun that its engineers should produce an object-oriented environment based on C++. Initially, Gosling attempted to modify and extend C++, which he referred to as C++ ++ -- , but soon abandoned that in favor of creating an entirely new language, which he called Oak after the tree that stood just outside his office.
Like many stealth projects working on new technology, the team worked long hours and by the summer of 1992, they were able to demonstrate portions of the new platform including the Green OS, the Oak language, the libraries, and the hardware. Their first attempt focused on building a PDA-like device named Star7 having a highly graphical interface and a smart agent called "Duke" to assist the user. It was demonstrated on September 3, 1992.
In November of that year, the Green Project was spun off to become FirstPerson, Inc, a wholly owned subsidiary of Sun Microsystems, and the team relocated to Palo Alto. The FirstPerson team was interested in building highly interactive devices, and when Time Warner issued an RFP for a set-top box, FirstPerson changed their target and responded with a proposal for a set-top box platform. However, the cable industry felt that their platform gave too much control to the user and FirstPerson lost their bid to SGI. An additional deal with The 3DO Company for a set-top box also failed to materialize. Unable to generate any interest within the TV industry for their platform, the company was rolled back into Sun.
Java meets the Internet
In June and July of 1994, after a three-day brainstorming session with John Gage, James Gosling, Bill Joy, Patrick Naughton, Wayne Rosing, and Eric Schmidt, the team re-targeted its efforts yet again, this time to use the technology for the Web. They felt that with the advent of the Mosaic browser, the Internet was on its way to evolving into the same highly interactive vision that they had had for the cable TV network. As a prototype, Patrick Naughton wrote a small web browser, WebRunner, later renamed HotJava.
It was also in 1994 that Oak was renamed Java. A trademark search revealed that the name Oak had already been taken by a video adaptor card manufacturer, so the team searched for a new name. The name Java was coined at a local coffee shop frequented by some of the members. It is not clear whether the name is an acronym or not. Most likely it is not, although some accounts claim that it stands for the names of James Gosling, Arthur Van Hoff, and Andy Bechtolsheim, or Just Another Vague Acronym. Lending credence to the idea that Java owes its name to the products sold at the coffee shop is the fact that the first 4 bytes of any class file spell out the words CAFE BABE if read in hexadecimal.
In October of 1994, HotJava and the Java platform were demonstrated for Sun executives. Java 1.0a was made available for download in 1994, but the first public release of Java and the HotJava web browser came on May 23, 1995, at the SunWorld conference. The announcement was made by John Gage, the Director of Science for Sun Microsystems. His announcement was accompanied by a surprise announcement by Marc Andreessen, Executive Vice President of Netscape, that Netscape would be including Java support in its browsers. In January of 1996, the JavaSoft business group was formed by Sun Microsystems to develop the technology.
Recent history
After several years of popularity, Java's place in the browser has steadily eroded. Its usage for simple interactive animations has been almost completely supplanted by Macromedia Flash, and as of 2005 it tends only to be used for more complex applications like Yahoo! Games. It has also suffered from opposition by Microsoft, which no longer plans to ship a Java platform with new versions of Internet Explorer or Windows.
By contrast, on the server-side of the Web, Java is far more popular than ever, with many websites using JavaServer Pages and other Java-based technologies in their front-ends.
On the desktop, stand-alone Java applications remain relatively unusual because of their large overhead. However, with the great advances in computer power in the last decade, along with improvements in VM and compiler quality, several have gained widespread use, including the NetBeans and Eclipse IDEs, Limewire and the Azureus BitTorrent client. Also, Matlab's latest versions (at least from 6.0 and onwards) heavily depend on Java for rendering their user interface and part of their calculation functionalities.
Version history
The Java language has undergone several changes since JDK 1.0 as well as numerous additions of packages to the standard library:
- 1.0 (1996) — Initial release.
- 1.1 (1997) — Major additions, most notably the extensive retooling of the event model, as well as the introduction of inner classes.
- 1.2 (December 4, 1998) — Codename Playground. Major changes were made to the API (reflection was introduced, the Swing graphical API was integrated into the core classes etc) and to Sun's JVM (which was equipped with a JIT compiler). These had little impact on the language itself, however: the only change to the Java language was the addition of the keyword strictfp. This and subsequent releases were rebranded "Java 2", but this had no effect on any software version numbers.
- 1.3 (May 8, 2000) — Codename Kestrel. The most notable changes were: ([http://java.sun.com/j2se/1.3/docs/relnotes/features.html Full list of changes])
- HotSpot JVM introduced
- RMI was modified to support optional compatibility with CORBA
- 1.4 (February 13, 2002) — Codename Merlin. As of 2004, the most widely used version. Changes included: ([http://java.sun.com/j2se/1.4.2/docs/relnotes/features.html Full list of changes])
- assert keyword.
- regular expressions modeled after Perl regular expressions
- exception chaining allows an exception to encapsulate original lower-level exception
- non-blocking NIO (New IO)
- logging API
- image IO API for reading and writing images in formats like JPEG and PNG
- integrated XML parser and XSLT processor
- integrated security and cryptography extensions (JCE, JSSE, JAAS)
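Several of the 1.4 additions listed above can be seen together in a short sketch. The class name Java14Sketch and its methods are hypothetical names chosen for illustration; only the assert keyword and the java.util.regex and java.util.logging classes come from the platform itself.

```java
import java.util.logging.Logger;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of any Java library.
class Java14Sketch {
    private static final Logger LOG = Logger.getLogger("Java14Sketch");

    // Perl-style regular expression: a major.minor version number such as 1.4
    private static final Pattern VERSION = Pattern.compile("(\\d+)\\.(\\d+)");

    // Returns the minor version found in text such as "J2SE 1.4".
    static int minorVersion(String text) {
        Matcher m = VERSION.matcher(text);
        if (!m.find()) {
            throw new IllegalArgumentException("no version number in: " + text);
        }
        int minor = Integer.parseInt(m.group(2));
        // Checked only when assertions are enabled (java -ea).
        assert minor >= 0 : "a run of digits cannot parse to a negative number";
        return minor;
    }

    // Exception chaining: the lower-level exception is kept as the cause.
    static int minorVersionOrFail(String text) {
        try {
            return minorVersion(text);
        } catch (IllegalArgumentException lowLevel) {
            LOG.warning("could not parse: " + text);
            throw new IllegalStateException("bad configuration", lowLevel);
        }
    }
}
```

A caller that catches the IllegalStateException can still reach the original failure through getCause().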
- 5.0 (September 29, 2004) — Codename Tiger. (Originally numbered 1.5, which is still used as the internal version number.) Added a number of significant new language features. One in particular, annotations, has been argued to be modeled on Microsoft's C#, which was itself modeled on earlier versions of Java:
- Generics — Provides compile-time type safety for collections and eliminates the need for most typecasts.
- Autoboxing/unboxing — Automatic conversions between primitive types (such as int) and wrapper types (such as Integer).
- Metadata — also called annotations, allows language constructs such as classes and methods to be tagged with additional data, which can then be processed by metadata-aware utilities
- Enumerations — the enum keyword creates a typesafe, ordered list of values (such as Day.MONDAY, Day.TUESDAY, etc.). Previously this could only be achieved by non-typesafe constant integers or manually constructed classes (typesafe enum pattern).
- Enhanced for loop — the for loop syntax is extended with special syntax for iterating over each member of an array or Collection, using a construct of the form:
for (Widget w : box) {
    System.out.println(w);
}
This example iterates over box, assigning each of its items in turn to the variable w, which is then printed to standard output.
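The 5.0 language features described above can be combined in one small sketch. TigerSketch, Day and demo() are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of any Java library.
class TigerSketch {
    // Enumeration: a typesafe, ordered set of constants.
    enum Day { MONDAY, TUESDAY, WEDNESDAY }

    // Generics: the compiler knows this list holds only Integer values.
    static int sum(List<Integer> values) {
        int total = 0;
        // Enhanced for loop; each Integer is unboxed to int automatically.
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Annotation (metadata): asks the compiler to verify the override.
    @Override
    public String toString() {
        return "TigerSketch";
    }

    static int demo() {
        List<Integer> values = new ArrayList<Integer>();
        values.add(1);                        // autoboxing: int to Integer
        values.add(2);
        values.add(Day.WEDNESDAY.ordinal());  // ordinal of the third constant is 2
        return sum(values);                   // 1 + 2 + 2
    }
}
```

No cast appears anywhere in the sketch, which is the practical effect of generics and autoboxing working together.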
- 6.0 (currently in development, estimated release date 2006) — Codename [https://mustang.dev.java.net/ Mustang]. An early development version of the Java SDK version 6.0 (internal version number 1.6) was made available in November 2004. New builds including enhancements and bug fixes are released on a regular basis.
- 7.0 — Codename Dolphin. As of 2005, this is in the early planning stages.[http://weblogs.java.net/blog/editors/archives/2004/09/evolving_a_lang.html]
In addition to the language changes, much more dramatic changes have been made to the Java class library over the years, which has grown from a few hundred classes in version 1.0 to over three thousand in Java 5.0. Entire new APIs, such as Swing and Java2D, have been introduced, and many of the original 1.0 classes and methods have been deprecated.
Language characteristics
There were five primary goals in the creation of the Java language:
- It should use the object-oriented programming methodology.
- It should allow the same program to be executed on multiple computer platforms.
- It should contain built-in support for using computer networks.
- It should be designed to execute code from remote sources securely.
- It should be easy to use and borrow the good parts of older object-oriented languages like C++.
Especially for the latter part, however, extensions are sometimes required, like CORBA or OSGi.
Object orientation
The first characteristic, object orientation ("OO"), refers to a method of programming and language design.
Although there are many interpretations of OO, one primary distinguishing idea is to design software so that the various types of data it manipulates are combined together with their relevant operations. Thus, data and code are combined into entities called objects. An object can be thought of as a self-contained bundle of behavior (code) and state (data). The principle is to separate the things that change from the things that stay the same; often, a change to some data structure requires a corresponding change to the code that operates on that data, or vice versa. This separation into coherent objects provides a more stable foundation for a software system's design. The intent is to make large software projects easier to manage, thus improving quality and reducing the number of failed projects.
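As a minimal sketch of an object as a bundle of state and behavior, consider the following class. Account and its members are hypothetical names invented for this illustration.

```java
// Hypothetical illustration class; not part of any Java library.
class Account {
    // State (data) is private, so only the methods below can change it.
    private long balanceInCents;

    Account(long openingBalanceInCents) {
        this.balanceInCents = openingBalanceInCents;
    }

    // Behavior (code) that operates on the state lives in the same object.
    void deposit(long cents) {
        if (cents <= 0) {
            throw new IllegalArgumentException("deposit must be positive");
        }
        balanceInCents += cents;
    }

    long balanceInCents() {
        return balanceInCents;
    }
}
```

If the representation of the balance later changes, only the code inside Account should need to change with it.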
Another primary goal of OO programming is to develop more generic objects so that software can become more reusable between projects. It is easy to see why a generic "customer" object, for example, should in theory have roughly the same basic set of behaviors between different software projects, especially when these projects overlap on some fundamental level as they often do in large organizations. In this sense, software objects can hopefully be seen more as pluggable components, helping the software industry "erect" projects largely from existing and well tested pieces, thus leading to a massive reduction in development times. However, the reality of software reusability has met with mixed results, mostly due to two difficulties: the design of truly generic objects remains a poorly-understood art, and a methodology for broad communication of reuse opportunities eludes the science. Some open source communities are now emerging whose primary mission is to help ease the reuse problem by providing authors with ways to disseminate information about generally reusable objects and object libraries.
Platform independence
The second characteristic, platform independence, means that programs written in the Java language must run similarly on diverse hardware. One should be able to write a program once and run it anywhere.
Most compilers achieve this by compiling the Java language code "halfway" to bytecode: simplified machine instructions specific to the Java platform. The code is then run on a virtual machine (VM), a program written in native code on the host hardware that interprets and executes generic Java bytecode. Further, standardized libraries are provided to allow access to features of the host machines (such as graphics, threading and networking) in unified ways. Note that, although there is an explicit compilation stage, the Java bytecode is still at some point either interpreted or converted to native machine instructions by a JIT compiler.
There are also implementations of Java compilers that compile to native object code, such as GCJ, removing the intermediate bytecode stage, but the output of these compilers can only be run on a single architecture.
Sun's license for Java insists that all implementations be "compatible". This resulted in a legal dispute with Microsoft after Sun claimed that the Microsoft implementation did not support the RMI and JNI interfaces and had added platform-specific features of their own. Sun sued and won both damages (some $20 million) and a court order enforcing the terms of the license from Sun. In response, Microsoft no longer ships Java with Windows, and in recent versions of Windows, Internet Explorer cannot support Java applets without a third-party plugin. However, Sun and others have made available Java run-time systems at no cost for those and other versions of Windows.
The first implementations of the language used an interpreted virtual machine to achieve portability. These implementations produced programs that ran more slowly than programs written in C or C++, so the language suffered a reputation for poor performance. More recent JVM implementations produce programs that run significantly faster than before, using multiple techniques.
The first technique is to simply compile directly into native code like a more traditional compiler, skipping bytecodes entirely. This achieves good performance, but at the expense of portability. Another technique, known as just-in-time compilation (JIT), translates the Java bytecodes into native code at the time that the program is run. More sophisticated VMs use dynamic recompilation, in which the VM can analyze the behavior of the running program and selectively recompile and optimize critical parts of the program. These latter two techniques allow the program to take advantage of the speed of native code without losing portability.
Portability is a technically difficult goal to achieve, and Java's success at that goal has been mixed. Although it is indeed possible to write programs for the Java platform that behave consistently across many host platforms, the large number of available platforms with small errors or inconsistencies led some to parody Sun's "Write once, run anywhere" slogan as "Write once, debug everywhere".
Platform-independent Java has, however, been very successful in server-side applications such as web services, servlets, and Enterprise JavaBeans, and more recently in embedded systems based on OSGi, using Embedded Java environments.
Automatic garbage collection
One possible argument against languages such as C++ is the burden of having to perform manual memory management. In C++, memory is allocated by the programmer to create an object, then deallocated to delete the object. If a programmer forgets or is unsure when to deallocate, this can lead to a memory leak, where a program consumes more and more memory without cleaning up after itself. Even worse, if a region of memory is deallocated twice, the program can become unstable and will likely crash.
In Java, this potential problem is avoided by automatic garbage collection. Objects are created and placed at an address on the heap. The program or other objects can reference an object by holding a reference to its address on the heap. When no references to an object remain, the Java garbage collector automatically deletes the object, freeing memory and preventing a memory leak. Memory leaks, however, can still occur if a programmer's code holds a reference to an object that is no longer needed—in other words, they can still occur but at higher conceptual levels. But on the whole, Java's automatic garbage collection makes creation and deletion of objects in Java simpler and potentially safer than in C++.
It should be noted, however, that programmers have access to garbage collection in C++ via smart pointers, such as the ones provided by the Boost library or as specified in the C++ committee's technical report TR1 which will be incorporated into the next C++ ISO standard.
It should also be noted that garbage collection in Java is virtually invisible to the developer. That is, developers may have no notion of when garbage collection will take place as it is not necessarily a function of the code they themselves write.
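The kind of leak that garbage collection cannot prevent, a reference that is kept although the object is no longer needed, might look like the following sketch. LeakSketch and its methods are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of any Java library.
class LeakSketch {
    // A long-lived collection: everything added here stays reachable.
    private static final List<byte[]> HISTORY = new ArrayList<byte[]>();

    static void handleRequest() {
        byte[] scratch = new byte[1024];
        // Bug: the buffer is remembered although it is never used again,
        // so the garbage collector is not allowed to reclaim it.
        HISTORY.add(scratch);
    }

    static int retainedBuffers() {
        return HISTORY.size();
    }

    // Dropping the references makes the buffers eligible for collection;
    // when they are actually reclaimed is up to the collector.
    static void forget() {
        HISTORY.clear();
    }
}
```

Each call to handleRequest() grows the list, and nothing shrinks it until forget() is called.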
Interfaces and classes
Java allows the creation of an interface, which classes can then implement. For example, an interface can be created like this:
public interface Deleteable {
    void delete();
}
This code says that any class that implements the interface Deleteable will have a method named delete(). The exact implementation and function of the method are determined by each class. There are many uses for this concept; for example, the following could be a class:
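public class Fred implements Deleteable {
    // Must include the delete() method, or compilation fails
    public void delete() {
        // Code implementing the method goes here
    }
}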
Then, in another class, the following is legal code:
public void deleteAll(Deleteable[] list) {
    for (int i = 0; i < list.length; i++) {
        list[i].delete();
    }
}
because any objects in the array are guaranteed to have the delete() method. The Deleteable array may contain references to Fred objects, and the deleteAll() method needn't differentiate between the Fred objects and other Deleteable objects.
The purpose is to separate the details of the implementation of the interface from the code that uses the interface. For example, the Collection interface contains methods that any collection of objects might want to implement, like retrieving or storing objects, but a specific collection could be a resizeable array, a linked list, or any of a number of different implementations.
The feature is a result of compromise. The designers of Java decided not to support multiple inheritance because of the difficulty of C++'s multiple inheritance. Interfaces give some of the benefit of multiple inheritance with, arguably, less complexity, but at the price of code redundancy: since an interface only defines method signatures and cannot contain any implementation, every class implementing an interface must provide its own implementation of the defined methods, unlike in multiple inheritance, where the implementation is also inherited.
Java interfaces behave much like the concept of the Objective-C protocol.
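The trade-off described in this section, several interfaces on one class with every method body written out by the implementing class, can be sketched as follows. Printable, Report and InterfaceSketch are hypothetical names added for this illustration; Deleteable follows the example in the text.

```java
interface Deleteable {
    void delete();
}

// Hypothetical second interface, added only for this illustration.
interface Printable {
    String print();
}

// One class may implement several interfaces, but it has to supply
// every method body itself; nothing is inherited from the interfaces.
class Report implements Deleteable, Printable {
    private boolean deleted = false;

    public void delete() {
        deleted = true;
    }

    public String print() {
        return deleted ? "(deleted)" : "report";
    }
}

class InterfaceSketch {
    // Works for any mix of classes, as long as each implements Deleteable.
    static void deleteAll(Deleteable[] list) {
        for (Deleteable item : list) {
            item.delete();
        }
    }
}
```

Code that only needs to delete things depends on Deleteable alone and never has to know that a Report can also be printed.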
Input/Output
Versions of Java prior to 1.4 only supported stream-based blocking I/O. This required a thread per stream being handled, as no other processing could take place while the active thread blocked waiting for input or output. This was a major scalability and performance issue for anyone needing to implement any Java network service. Since the introduction of NIO (New IO) in Java 1.4, this scalability problem has been rectified by the introduction of a non-blocking I/O framework (though there are a number of open issues in the NIO API as implemented by Sun).
The non-blocking IO framework, though considerably more complex than the original blocking IO framework, allows any number of "channels" to be handled by a single thread. The framework is based on the Reactor Pattern.
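A minimal sketch of the selector-based style follows. It assumes a current JDK, uses an in-process java.nio.channels.Pipe rather than a network socket so that it runs anywhere, and is only meant for short messages that fit in the pipe's buffer. SelectorSketch and roundTrip() are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

// Hypothetical illustration class; not part of any Java library.
class SelectorSketch {
    // Sends a short message through a pipe and reads it back using a
    // Selector, so the reading side never blocks inside read().
    static String roundTrip(String message) {
        try {
            Pipe pipe = Pipe.open();
            Selector selector = Selector.open();
            Pipe.SourceChannel source = pipe.source();
            source.configureBlocking(false);
            source.register(selector, SelectionKey.OP_READ);

            // Writer side: send the bytes, then close to signal end of stream.
            Pipe.SinkChannel sink = pipe.sink();
            sink.write(ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8)));
            sink.close();

            ByteArrayOutputStream received = new ByteArrayOutputStream();
            ByteBuffer buffer = ByteBuffer.allocate(64);
            boolean open = true;
            while (open) {
                selector.select();  // waits until a registered channel is ready
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();  // the selected-key set is not cleared for us
                    if (!key.isReadable()) {
                        continue;
                    }
                    buffer.clear();
                    int n = source.read(buffer);
                    if (n < 0) {
                        open = false;  // end of stream
                    } else {
                        received.write(buffer.array(), 0, n);
                    }
                }
            }
            source.close();
            selector.close();
            return new String(received.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A real service would register many channels with the same selector and dispatch on each ready key, which is how a single thread can serve any number of connections.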
APIs
Sun has defined three platforms targeting different application environments and segmented many of its APIs so that they belong to one of the platforms. The platforms are:
- Java 2 Platform, Micro Edition — targeting environments with limited resources,
- Java 2 Platform, Standard Edition — targeting workstation environments, and
- Java 2 Platform, Enterprise Edition — targeting large distributed enterprise or Internet environments.
The classes in the Java APIs are organized into separate groups called packages. Each package contains a set of related interfaces, classes and exceptions. Refer to the separate platforms for a description of the packages available.
The set of APIs is controlled by Sun Microsystems in cooperation with others through its Java Community Process program. Companies or individuals participating in this process can influence the design and development of the APIs. This process has been a subject of controversy.
In 2004, IBM and BEA publicly supported the notion of creating an official open source implementation of Java, but as of 2005 Sun Microsystems has declined to do so.
Hello World example
For an explanation of the tradition of programming "Hello World", see Hello world program.
// The source file must be named WorldGreeting.java
public class WorldGreeting {
    public static void main(String[] args) {
        System.out.println("Hello world!");
    }
}
The above example merits a bit of explanation for those accustomed to languages with inherently relaxed security, weak typing, and weak object orientation.
- Everything in Java is written inside a class, including stand-alone programs.
- Source files are by convention named the same as the class they contain, appending the mandatory suffix .java. Classes which are declared public are required to follow this convention. (In this case, the class is WorldGreeting, therefore the source must be stored in a file called WorldGreeting.java).
- The compiler will generate a class file for each class defined in the source file. The name of the class file is the name of the class, with .class appended. For class file generation, anonymous classes are treated as if their name were the concatenation of the name of their enclosing class, a $, and an integer.
- Programs to be executed as stand-alone must have a main() method.
- The keyword void indicates that the main() method does not return anything.
- The main method must accept an array of strings. By convention, it is referenced as "args" although any other legal variable name can be used.
- The keyword static indicates that the method is a class method, associated with the class rather than object instances. Main methods must be static.
- The keyword public denotes that a method can be called from code in other classes, or that a class may be used by classes outside the class hierarchy.
- The printing facility is part of the Java standard library: the System class defines a public field called "out". The "out" object provides the method println() for displaying data to the screen (standard out).
- Standalone programs are run by giving the Java runtime the name of the class whose main() method is to be invoked. For example, at a Unix command line, the command "java -cp . WorldGreeting" will start the above program (compiled into WorldGreeting.class) from the current directory. The name of the class whose main method is to be invoked can also be specified in the MANIFEST of a Java archive (jar) file.
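Two of the points in the list above, the naming of anonymous classes and the role of static, can be observed in a small sketch. NamingSketch and anonymousClassName() are hypothetical names used only for illustration.

```java
// Hypothetical illustration class; not part of any Java library.
class NamingSketch {
    // A class method: callable without creating a NamingSketch instance,
    // which is also why main() itself has to be static.
    static String anonymousClassName() {
        Runnable task = new Runnable() {   // an anonymous class
            public void run() { }
        };
        // The run-time name is built from the enclosing class, a $, and a number.
        return task.getClass().getName();
    }

    public static void main(String[] args) {
        // args holds the command-line arguments that follow the class name.
        System.out.println(args.length + " argument(s)");
        System.out.println(anonymousClassName());
    }
}
```

Compiling this file would be expected to produce a second class file for the anonymous class next to NamingSketch.class, with the exact number chosen by the compiler.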
International and worldwide use
The language distinguishes between bytes and characters. Characters are stored internally using UCS-2, although as of Java 5, the language also supports using UTF-16 via surrogates. Java program source may therefore contain any Unicode character.
The following is thus perfectly valid Java code; it contains Chinese characters in the class and variable names as well as in a string literal:
public class 你好世界 {
    private String 文本 = "你好世界";
}
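The distinction between bytes and characters, and the effect of surrogates, can be made concrete with a short sketch. UnicodeSketch and its members are hypothetical names; the strings are written with Unicode escapes only so that the file compiles regardless of the source encoding, and StandardCharsets assumes a current JDK.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of any Java library.
class UnicodeSketch {
    // Two Chinese characters (a greeting), both inside the 16-bit range.
    static final String GREETING = "\u4f60\u597d";
    // U+1D11E (musical G clef) lies outside the 16-bit range and is
    // therefore stored as a surrogate pair of two char values.
    static final String CLEF = "\uD834\uDD1E";

    // Number of 16-bit char units in the string.
    static int chars(String s) {
        return s.length();
    }

    // Number of Unicode characters (code points), available since Java 5.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes once the characters are encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

For the greeting the three methods give 2, 2 and 6; for the clef they give 2, 1 and 4, so neither the char count nor the byte count can be assumed to equal the number of characters.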
Miscellaneous
Although the language has special syntax for them, arrays and strings are not primitive types: they are reference types that can be assigned to java.lang.Object.
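A brief sketch of what this means in practice follows; ReferenceTypeSketch and its methods are hypothetical names used only for illustration.

```java
// Hypothetical illustration class; not part of any Java library.
class ReferenceTypeSketch {
    static boolean arraysAndStringsAreObjects() {
        Object a = new int[] { 1, 2, 3 };  // an array assigned to Object
        Object s = "text";                 // a string literal assigned to Object
        // Both values keep their run-time type behind the Object reference.
        return a instanceof int[] && s instanceof String;
    }

    static int firstElementViaSharedReference() {
        int[] first = new int[] { 1, 2, 3 };
        int[] second = first;   // copies the reference, not the elements
        second[0] = 42;
        return first[0];        // the change is visible through both variables
    }
}
```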
Criticism
Java was intended to serve as a novel way to manage software complexity. Most consider Java technology to deliver reasonably well on this promise. However, Java is not without flaws, and it does not universally accommodate all programming styles, environments, or requirements.
- Not all projects or environments require enterprise-level complexity; stand-alone websites and sole-proprietorship programmers are examples. Such individuals find Java's self-enforcing complexity management to be overkill.
- Java is often a focal point of discontent for those who are not enthusiastic about object-oriented programming.
- Java can be considered a less pure object-oriented programming language than for instance Ruby or Smalltalk because it makes certain compromises (such as the fact that not all values are objects) for performance reasons.
- As an established technology, Java inevitably invites comparison with contemporary languages such as C++, C#, Python, and others. Commenting upon Java's proprietary nature, supposed inflexibility to change, and growing entrenchment in the corporate sector, some have said that Java is "the new COBOL". Many consider this to be a somewhat hyperbolic assertion, although it does allude to some legitimate concerns with Java's prospects for the future.
Language issues
- The division between primitive types and objects is disliked by programmers familiar with languages such as Smalltalk and Ruby where everything is an object.
- Conversely, C++ programmers can become confused with Java because in Java primitives are always automatic variables and objects always reside on the heap, whereas C++ programmers are explicitly given the choice in both cases by means of pointers.
- Java code is often more verbose than code written in other languages due to its frequent type declarations.
- Java is predominantly a single-paradigm language. Historically, it has not been very accommodating of paradigms other than object-oriented programming. As of version 5.0, the procedural paradigm is somewhat better supported in Java with the addition of the ability to import static methods and fields so that they can be used globally as one could do in, for example, C.
- Java is a single inheritance language. This causes consternation to programmers accustomed to multiple inheritance, which is available in many other languages. However, Java employs interfaces, which are argued to address certain issues with multiple inheritance while retaining some of its benefits.
- Java does not support user-definable operator overloading, unlike C++.
- Versions of Java before 5.0 required many explicit casts to be written due to the lack of generic types.
- Java's support of text matching and manipulation is not as strong as that of languages such as Perl or PHP, although regular expressions were introduced in J2SE 1.4.
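Two of the points above, static imports and the absence of user-defined operator overloading, can be sketched briefly. LanguageIssuesSketch and its methods are hypothetical names; Math and BigInteger are standard library classes.

```java
import static java.lang.Math.PI;
import static java.lang.Math.sqrt;

import java.math.BigInteger;

// Hypothetical illustration class; not part of any Java library.
class LanguageIssuesSketch {
    // Static imports let sqrt and PI be used without the Math prefix.
    static double circleRadius(double area) {
        return sqrt(area / PI);
    }

    // No user-defined operator overloading: BigInteger values are combined
    // with method calls such as add() rather than with the + operator.
    static BigInteger addBig(String a, String b) {
        return new BigInteger(a).add(new BigInteger(b));
    }
}
```

In C++ a comparable big-integer class could define operator+ so that the sum reads as a + b.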
Library issues
The look and feel of GUI applications written in Java using the Swing platform is often different from that of native applications. Programmers can choose to use the AWT toolkit, which displays native widgets (and thus looks like the operating platform), but AWT cannot meet advanced GUI programming needs: it cannot wrap more advanced widgets without sacrificing portability across the various supported platforms, each of which has a vastly different API, especially for higher-level widgets. The Swing toolkit, written completely in Java, avoids this problem by reimplementing widgets using only the most basic drawing mechanisms that are guaranteed to be available on all platforms. The drawback is that extra effort is required to resemble the operating platform. While this is possible (using the GTK+ and Windows Look-and-Feel), most users do not know how to change the default Metal Look-and-Feel to one that resembles their native platform, and as a result they are stuck with Java applications that look radically different from their native applications. Of note, however, is Apple Computer's own optimized version of the Java Runtime, which is included within the Mac OS X distribution and by default implements its "Aqua" Look-and-Feel, giving Swing applications instant familiarity to Mac users.
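An application can also request the platform's look and feel itself instead of leaving the choice to the user. The sketch below uses the standard javax.swing.UIManager class; LookAndFeelSketch and its methods are hypothetical names, and which look and feel the "system" one turns out to be depends on the platform and the Java runtime.

```java
import javax.swing.UIManager;

// Hypothetical illustration class; not part of any Java library.
class LookAndFeelSketch {
    // Returns the class name of the look and feel to install.
    static String lookAndFeelClassName(boolean preferNative) {
        return preferNative
                ? UIManager.getSystemLookAndFeelClassName()
                : UIManager.getCrossPlatformLookAndFeelClassName();
    }

    // Intended to be called once at start-up, before any Swing
    // components are created.
    static void install(boolean preferNative) {
        try {
            UIManager.setLookAndFeel(lookAndFeelClassName(preferNative));
        } catch (Exception e) {
            // Keep the default look and feel if the requested one is unavailable.
        }
    }
}
```

Calling install(true) first thing in main() is one common way to make a Swing application resemble the other programs on the user's desktop.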
Some parts of the standard Java libraries are considered excessively complicated, or badly designed, but cannot be changed due to the need for backward compatibility.
Performance issues
Java has obtained a reputation for slow performance, primarily because most users have targeted the Java virtual machine rather than compiling the language directly to native machine code. Using a JVM imposes a fairly large speed penalty, either spread throughout the whole program (if using an interpreter JVM) or imposed once at class loading time (if using a JIT-compiling JVM). In the latter case, the penalty is particularly noticeable in programs which run for only a short time.
Whether or not modern implementations of Java are significantly slower than other languages is still hotly debated. Many argue that this is a misconception based on old benchmarks and information produced by competitors. Nevertheless, use of Java for major desktop applications still remains rare, and the language is seldom chosen for highly CPU-intensive applications.
A number of language features unavoidably harm performance and memory usage, even if native compilation is used:
- Garbage collection
- Array bounds checking
- Run-time type checking
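The last two checks in the list can be observed directly, since a failed check surfaces as an exception. RuntimeCheckSketch and its methods are hypothetical names used only for illustration.

```java
// Hypothetical illustration class; not part of any Java library.
class RuntimeCheckSketch {
    // Every array access is checked against the array length at run time.
    static boolean outOfBoundsIsCaught() {
        int[] data = new int[3];
        int index = data.length;   // one past the last valid index
        try {
            data[index] = 1;
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    // Casts are verified at run time against the object's actual class.
    static boolean badCastIsCaught() {
        Object value = "not a number";
        try {
            Integer n = (Integer) value;
            return n == null;      // not reached: the cast above fails first
        } catch (ClassCastException e) {
            return true;
        }
    }
}
```

In C or C++ the equivalent out-of-range write would typically go unchecked, which is faster but can silently corrupt memory.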
Java was designed with an emphasis on security and portability, so low-level features like hardware-specific data types and pointers to arbitrary memory were deliberately omitted. In low-level applications which require these features, they must be accessed by calling C code using the Java Native Interface (JNI), which can itself be a performance bottleneck.
Java Runtime Environment
The Java Runtime Environment or JRE is the software required to run any application deployed on the Java platform. End-users commonly use a JRE in software packages and plugins. Sun also distributes a superset of the JRE called the Java 2 SDK (more commonly known as the JDK), which includes development tools such as the Java compiler, Javadoc, and debugger.
Components of the JRE
- Java libraries - which are the compiled byte codes of source developed by the JRE implementor to support application development in Java. Examples of these libraries are:
- The core libraries, which include:
- Collection libraries which implement data structures such as lists, dictionaries, trees and sets
- XML Parsing libraries
- Security
- Internationalization and Localization libraries
- The integration libraries, which allow the application writer to communicate with external systems. These libraries include:
- The Java Database Connectivity (JDBC) API for database access
- Java Naming and Directory Interface (JNDI) for lookup and discovery
- RMI and CORBA for distributed application development
- User Interface libraries, which include:
- The (heavyweight, or native) Abstract Windowing Toolkit (AWT), which provides GUI components, the means for laying out those components and the means for handling events from those components
- The (lightweight) Swing libraries, which are built on AWT but provide (non-native) implementations of the AWT widgetry
- APIs for audio capture, processing, and playback
- A platform dependent implementation of Java virtual machine (JVM) which is the means by which the byte codes of the Java libraries and third party applications are executed
- Plugins, which enable applets to be run in web browsers
- Java Web Start, which allows Java applications to be efficiently distributed to end users across the Internet
- Licensing and documentation
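As a small illustration of the collection libraries listed among the core libraries above, the following sketch uses a list, a sorted map and a set. CollectionsSketch and its methods are hypothetical names; the collection classes themselves are from java.util.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration class; not part of any Java library.
class CollectionsSketch {
    // Counts how often each word occurs; TreeMap keeps its keys sorted.
    static Map<String, Integer> wordCounts(List<String> words) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String word : words) {
            Integer previous = counts.get(word);
            counts.put(word, previous == null ? 1 : previous + 1);
        }
        return counts;
    }

    // A set holds each distinct element only once.
    static int distinct(List<String> words) {
        Set<String> unique = new HashSet<String>(words);
        return unique.size();
    }
}
```

Because the methods are written against the List, Map and Set interfaces, a different implementation such as HashMap could be substituted without changing the callers.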
Extensions and related architectures
Extensions and architectures closely tied to the Java programming language include:
- J2EE (Enterprise edition)
- J2ME (Micro-Edition for PDAs and cellular phones)
- JMF (Java Media Framework)
- JNDI (Java Naming and Directory Interface)
- JSML (Java Speech API Markup Language)
- JDBC (Java Database Connectivity)
- JDO (Java Data Objects)
- JAI (Java Advanced Imaging)
- JAIN (Java API for Integrated Networks)
- JDMK (Java Dynamic Management Kit)
- Jini (a network architecture for the construction of distributed systems)
- Jiro
- Java Card
- JavaSpaces
- Java Modeling Language (JML)
- JMI (Java Metadata Interface)
- JMX (Java Management Extensions)
- JSP (JavaServer Pages)
- JSF (JavaServer Faces)
- JNI (Java Native Interface)
- JXTA (Open Protocols for P2P Virtual Network)
- J3D (A high level API for 3D graphics programming)
- JOGL (A low level API for 3D graphics programming, using OpenGL)
- OSGi (Dynamic Service Management and Remote Maintenance)
- SuperWaba (JavaVMs for handhelds)
See also
- Java virtual machine
- Java applet
- Comparison of Java to C++
- Optimization of Java
- Java Platform Debugger Architecture
- Join Java programming language
- List of Java-programs
- Java User Group
- Java XML
- Java Servlet
- Java 2 Platform, Standard Edition (J2SE)
- List of Java scripting languages
- Javapedia
- Java Community Process
- JavaOS
- Java keywords
- zAAP (Java processor)
- Microsoft J++
References
- Jon Byous, [http://java.sun.com/features/1998/05/birthday.html Java technology: The early years]. Sun Developer Network, no date [ca. 1998]. Retrieved April 22, 2005.
- James Gosling, [http://today.java.net/jag/old/green/ A brief history of the Green project]. Java.net, no date [ca. Q1/1998]. Retrieved April 22, 2005.
- James Gosling, Bill Joy, Guy Steele, and Gilad Bracha, The Java language specification, second edition. Addison-Wesley, 2000. ISBN 0201310082.
- James Gosling, Bill Joy, Guy Steele, and Gilad Bracha, The Java language specification, third edition. Addison-Wesley, 2005. ISBN 0321246780.
- Tim Lindholm and Frank Yellin. The Java Virtual Machine specification, second edition. Addison-Wesley, 1999. ISBN 0201432943.
Notes
# The device was named Star7 after a telephone feature activated by *7 on a telephone keypad, which enabled users to answer the telephone anywhere.
External links
Sun
- [http://java.sun.com/ Official Java home site]
- [http://java.sun.com/docs/books/jls/ The Java Language Specification, Third edition] Authoritative description of the Java language
- [http://java.sun.com/j2se/1.5.0/docs/api/ J2SE API reference, v5.0]
- [http://java.sun.com/docs/books/tutorial/ Sun's tutorial on Java Programming]
- [http://java.sun.com/docs/white/langenv/ Original Java whitepaper], 1996
- [http://www.java.com/en/download/help/testvm.xml Test your Java VM]
Alternatives
- [http://www.blackdown.org Blackdown Java] for Linux, includes Mozilla plugin
Books
- [http://www.computer-books.us/java.php Computer-Books.us] A collection of Java books available for free download
- David Flanagan, Java in a Nutshell, Third Edition. O'Reilly & Associates, 1999. ISBN 1565924878
- [http://www.bruceeckel.com/ Thinking in Java], by Bruce Eckel
- [http://www.vias.org/javacourse/ Java Course] The well-known book of A.B. Downey as an HTMLHelp based eBook
General
- Newsgroup [news:comp.lang.java comp.lang.java] ([http://groups.google.com/groups?group=comp.lang.java Google Groups link]), and its [http://www.ibiblio.org/javafaq/javafaq.html FAQ]
- [http://wiki.java.net/bin/view/Javapedia/ Javapedia project]
- [http://wiki.java.net/bin/view/Main/WebHome The Java Wiki]
- [http://mindprod.com/jgloss/jgloss.html A Java glossary]
- [http://www.cookienest.com/content/manual-javabasics.php Java Basics Manual]
- [http://en.pure-java.de/ Java-API with examples]
- [http://www.whizlabs.com/jwhiz.html Java Exam Preparation]
- [http://ei.cs.vt.edu/~history/Youmans.Java.html Java: Cornerstone of the Global Network Enterprise]
- [http://www.inesystems.com Sun Certification Resource ]
Historical
- [http://java.sun.com/features/1998/05/birthday.html Java(TM) Technology: The Early Years]
- [http://java.sun.com/people/jag/green/ A Brief History of the Green Project]
- [http://www.cs.umd.edu/users/seanl/stuff/java-objc.html Java Was Strongly Influenced by Objective-C]
- [http://www.wired.com/wired/archive/3.12/java.saga.html The Java Saga]
- [http://ei.cs.vt.edu/~wwwbtb/book/chap1/java_hist.html A history of Java]
Criticism
- Paul Graham, [http://www.paulgraham.com/javacover.html Java's Cover]. Paulgraham.com, April 2001.
- Simson Garfinkel, [http://www.salon.com/tech/col/garf/2001/01/08/bad_java/ Java: Slow, ugly and irrelevant]. Salon.com, January 8, 2001.
- [http://www.gnu.org/philosophy/java-trap.html Free But Shackled — The Java Trap], by Richard Stallman, April 12, 2004. ([http://today.java.net/jag/page7.html#59 James Gosling's response])
- [http://www.idinews.com/darkside.pdf The Dark Side of Java] (PDF), by Conrad Weisert, August 29, 1997. ([http://www.google.com/search?q=cache:www.idinews.com/darkside.pdf View as HTML])
- [http://www.jwz.org/doc/java.html java sucks], by Jamie Zawinski, 2000.
- [http://www.pobox.com/~schwern/papers/Why_I_Am_Not_A_Java_Programmer/why.html Why I Am Not A Java Programmer], by [http://mungus.schwern.org/~schwern/resume/resume.pdf Michael G. Schwern].
Portals, magazines, etc.
- [http://www.3java.net/ 3java.net] A comprehensive directory of Java open source software
- [http://javaopen.net/ Open Source Java] Open source Java/J2EE roundup
- [http://www.esus.com Esus.com] A site containing thousands of categorized links and Q&A's
- [http://www.jexamples.com JExamples.com] A site to find examples of Java API's
- [http://www.java-tips.org Java Tips] A site containing hundreds of categorized tips about Java API's
- [http://www.theserverside.com/ TheServerSide.com] A popular Java J2EE portal
- [http://www.javalobby.org/ Javalobby] A popular forum for Java discussions
- [http://www.java.net/ Java.Net] A site for Java articles and upcoming projects
- [http://www.onjava.com/ OnJava.com] An O'Reilly site for Java with many good articles
- [http://www.indicthreads.com/ IndicThreads.com] An upcoming portal for Java and J2EE
- [http://www.javapro.com/ JavaPro magazine] A popular Java magazine
- [http://www.java.sys-con.com/ Java Developer's Journal] Online edition of a popular Java magazine
- [http://www.javaworld.com/ JavaWorld magazine] A popular Java magazine
- [http://www.JavaKB.com/ Java KB] Offers Java discussions, news, articles, and an open source project directory.
- [http://www.javasight.com/ JavaSight.com] Java news and books.
- [http://www.javarss.com/ JavaRSS.com] A Java portal of Java websites rich in Java & J2EE News, Articles, Blogs, Groups and Forums.
- [http://www.javagamedevelopment.net/ Java Game Development] Daily news and articles on Java game development
- [http://JavaToolbox.com JavaToolbox] List of the available development tools and libraries for Java/J2EE
- [http://www.javaranch.com JavaRanch] A friendly place for Java greenhorns
- [http://www.javafree.org JavaFree] A popular Java community
- [http://www.javabeat.net JavaBeat] Java Certifications Site
- [http://www.techbookreport.com/JavaIndex.html TechBookReport] Java book and software reviews
Category:Programming languages
Category:C programming language family
Category:Java platform
Category:Java programming language
Category:Sun Microsystems
ko:자바 프로그래밍 언어
ja:Java言語
th:ภาษาจาวา
JavaScript
JavaScript is an object-based scripting programming language based on the concept of prototypes. The language is best known for its use in websites, but is also used to enable scripting access to objects embedded in other applications. It was originally developed by Brendan Eich of Netscape Communications Corporation under the name Mocha, then LiveScript, and finally renamed to JavaScript. Like Java, JavaScript has a C-like syntax, but it has far more in common with the Self programming language than with Java.
As of 2005, the latest version of the language is JavaScript 1.5, which corresponds to ECMA-262 Edition 3. ECMAScript, in simple terms, is a standardized version of JavaScript. Mozilla versions since 1.8 Beta 1 also have partial support of E4X, which is a language extension dealing with XML, defined in the ECMA-357 standard.
Java, JavaScript, and JScript
The change of name from LiveScript to JavaScript happened at roughly the time when Netscape was including support for Java technology in its Netscape Navigator web browser. JavaScript was first introduced and deployed in the Netscape browser version 2.0B3 in December of 1995. The choice of name proved to be a source of much confusion. There is no real relationship between Java and JavaScript; their similarities are mostly in syntax (that is, both derived from C). Their semantics are quite different: notably, their object models are unrelated and largely incompatible. Also worth mentioning is Microsoft's own VBScript, which, like JavaScript, is mainly used in web pages. VBScript's syntax derives from Visual Basic, and the language is available only in Internet Explorer.
Due to the success of JavaScript, Microsoft developed a compatible language known as JScript. JScript was first supported in the Internet Explorer browser version 3.0 released in August, 1996. When web developers talk about using JavaScript in the IE browser, they usually mean JScript. The need for common specifications for the two languages was the basis of the ECMA 262 standard for ECMAScript (see external links below), three editions of which have been published since the work started in November 1996 (and which in turn set the stage for the standardization of C# a few years later). One term often related to JavaScript, the Document Object Model (DOM), is actually not part of the ECMAScript standard; it's rather a standard on its own, developed by the W3C, and closely related to XML.
Usage
JavaScript is a prototype-based scripting language with a syntax loosely based on C. Like C, it has the concept of reserved keywords; because scripts are executed from source, this makes it almost impossible to extend the language with new keywords without breaking existing code.
Also like C, the language has no input or output constructs of its own.
Where C relies on standard I/O libraries, a JavaScript engine relies on a host program into which it is embedded. There are many such host programs, of which web technologies are the most well known examples. These are examined first.
JavaScript embedded in a web browser connects through interfaces called Document Object Model (DOM) to applications, especially to the server side (web servers) and the client side (web browsers) of web applications. Many web sites use client-side JavaScript technology to create powerful dynamic web applications.
JavaScript can handle Unicode text and can evaluate regular expressions (introduced in version 1.2 in Netscape Navigator 4 and Internet Explorer 4). JavaScript expressions contained in a string can be evaluated using the eval function.
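As a small illustration of the two features just mentioned, the sketch below tests a string against a regular expression and evaluates an expression held in a string. The sample strings and variable names are made up for the example, and results are stored in variables rather than shown with alert so that the snippet also runs outside a browser.

```javascript
// A regular expression literal that matches one or more digits.
var digits = /\d+/;
var hasDigits = digits.test('Navigator 4');       // does the string contain digits?
var firstMatch = 'Netscape 6'.match(digits)[0];   // the first run of digits found

// eval parses and evaluates JavaScript source held in a string.
var expression = '2 * (3 + 4)';
var evaluated = eval(expression);
```

Because eval runs whatever source text it is given, it is usually applied only to strings whose origin is trusted.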
One major use of web-based JavaScript is to write functions that are embedded in or included from HTML pages and interact with the DOM of the page to perform tasks not possible in static HTML alone, such as opening a new window, checking input values, changing images as the mouse cursor moves over, etc. Unfortunately, the DOM interfaces in various browsers differ and don't always match the W3C DOM standards. Different browsers expose different objects and methods to the script. It is therefore often necessary to write different variants of a JavaScript function for the various browsers, though this situation is improving. Major design methodologies using JavaScript to interact with DOM include DHTML, Ajax, and SPA.
Outside of the Web, JavaScript interpreters are embedded in a number of tools. Adobe Acrobat and Adobe Reader support JavaScript in PDF files. The Mozilla platform, which underlies several common web browsers, uses JavaScript to implement the user interface and transaction logic of its various products. JavaScript interpreters are also embedded in proprietary applications that lack scriptable interfaces. Dashboard Widgets in Apple's Mac OS X v10.4 are implemented using JavaScript. Microsoft's Active Scripting technology supports JavaScript-compatible JScript as an operating system scripting language. JScript .NET is a CLI-compliant language that is similar to JScript, but has further object oriented programming features.
Each of these applications provides its own object model which provides access to the host environment, with the core JavaScript language remaining mostly the same in each application.
Core language elements
Whitespace
Spaces, tabs, newlines and comments used outside string constants are called whitespace. Unlike C, whitespace in JavaScript source can directly impact semantics. Because of a technique called "semicolon insertion", a newline can end a statement as if a semicolon had been written just before it: this happens when the text that follows cannot continue the statement, and always after certain keywords such as return. Programmers are advised to supply statement terminating semicolons explicitly to enhance readability and lessen unintended effects of the automatic semicolon insertion.
Unnecessary whitespace (whitespace characters that are not needed for correct syntax) increases the file size of .js files. Tools that compress scripts by removing unnecessary whitespace work reliably only if the programmers have included these so-called 'optional' semicolons, because the newlines that would otherwise end the statements are removed.
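A minimal sketch of how semicolon insertion can change the meaning of a program. The two function names are made up for the example; the only difference between them is a newline after the return keyword.

```javascript
// The newline after "return" ends the statement, so this function
// returns undefined and the following line is never used as the result.
function brokenAcrossLines() {
  return
    42;
}

// Keeping the value on the same line as "return" avoids the problem.
function onOneLine() {
  return 42;
}

var first = brokenAcrossLines();
var second = onOneLine();
```

Here first holds undefined while second holds 42, which is why a return value is normally started on the same line as the keyword.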
Comment syntax is the same as in C++. That is, either block comments written as /* ... */ or "rest of line" comments introduced by "//".
Variables
Variables are generally dynamically typed.
Variables are defined by either just assigning them a value or by using the var statement.
Variables declared outside of any function, and variables declared without the var statement, are in "global" scope, visible in the entire web page; variables declared inside a function with the var statement are local to that function.
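The scoping rule for var can be sketched as follows. The variable and function names are illustrative only; a variable declared with var inside a function hides an outer variable of the same name without changing it.

```javascript
var counter = 1;                 // declared outside any function

function shadow() {
  var counter = 100;             // local: hides the outer counter
  counter = counter + 1;         // changes only the local variable
  return counter;
}

function outerAccess() {
  return counter;                // no local declaration, so the outer one is used
}

var localResult = shadow();
var outerResult = outerAccess();
```

After both calls, localResult is 101 while counter and outerResult are still 1.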
To pass variables from one page to another, a developer can set a cookie or use a hidden frame or window in the background to store them. This feature is not a part of JavaScript language, rather, it is part of the browser DOM.
Arithmetic
Numbers in JavaScript are represented in binary as IEEE-754 Doubles, which provides an accuracy to about 14 or 15 significant digits [http://www.jibbering.com/faq/#FAQ4_7 JavaScript FAQ 4.7]. Because they are binary numbers, they do not always exactly represent decimal numbers, particularly fractions.
This becomes an issue when formatting numbers for output, for which JavaScript offers little built-in support. For example:
alert(0.94 - 0.01) // displays 0.9299999999999999
As a result, rounding should be used whenever numbers are [http://www.jibbering.com/faq/#FAQ4_6 formatted for output]. The toFixed() method was added in ECMA-262 Edition 3, but it is missing from older browsers and has been implemented inconsistently, so it can't always be relied upon.
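One common way to round before output is to scale, round to the nearest integer, and scale back. The helper name roundTo below is hypothetical and is a minimal sketch, not a complete formatting routine (it does not pad with trailing zeros, for instance).

```javascript
// Round a number to a given count of decimal places (hypothetical helper).
function roundTo(value, places) {
  var factor = Math.pow(10, places);
  return Math.round(value * factor) / factor;
}

var raw = 0.94 - 0.01;           // not exactly 0.93 in binary floating point
var rounded = roundTo(raw, 2);   // rounded to two decimal places
```

With this helper, rounded compares equal to 0.93 even though raw does not.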
The '+' operator is overloaded; it is used for string concatenation and arithmetic addition and also to convert strings to numbers (not to mention that it has special meaning when used in a regular expression).
// Concatenate 2 strings
var a = 'This';
var b = ' and that';
alert(a + b); // displays 'This and that'
// Add two numbers
var x = 2;
var y = 6;
alert(x + y); // displays 8
// Adding a number and a string results in concatenation
alert( x + '2'); // displays 22
// Convert a string to a number
var z = '4'; // z is a string (the digit 4)
alert( z + x) // displays 42
alert( +z + x) // displays 6
Objects
For convenience, types are normally subdivided into primitives and objects. Objects are entities that have an identity (they are only equal to themselves) and that map property names to values ("slots" in prototype-based programming terminology). JavaScript objects are often mistakenly described as associative arrays or hashes, but they are neither.
JavaScript has several kinds of built in objects, namely Array, Boolean, Date, Function, Math, Number, Object, RegExp and String. Other objects are "host objects", defined not by the language but by the runtime environment. For example, in a browser, typical host objects belong to the DOM (window, form, links etc.).
Creating objects
Objects can be created using a declaration, an initialiser or a constructor function:
// Declaration
var anObject = new Object();
// Initialiser
var objectA = {};
var objectB = {index1: 'value 1', index2: 'value 2'}; // illustrative property names and values
// Constructor (see below)
Constructors
Constructor functions are a way to create multiple instances or copies of the same object. JavaScript is a prototype-based, object-based language. This means that inheritance is between objects, not between classes (JavaScript has no classes). Objects inherit properties from their prototypes.
Properties and methods can be added by the constructor, or they can be added and removed after the object has been created. To do this for all instances created by a single constructor function, the prototype property of the constructor is used to access the prototype object. Object deletion is not mandatory as the scripting engine will garbage collect any variables that are no longer being referenced.
Example: Manipulating an object
// constructor function
function MyObject(attributeA, attributeB) {
  this.attributeA = attributeA;
  this.attributeB = attributeB;
}
// create an Object
obj = new MyObject('red', 1000);
// access an attribute of obj
alert(obj.attributeA);
// access an attribute using square bracket notation
alert(obj["attributeA"]);
// add a new property
obj.attributeC = new Date();
// remove a property of obj
delete obj.attributeB;
// remove the whole Object
delete obj;
JavaScript supports inheritance hierarchies through prototyping. For example:
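function Base() {
  this.Override = function() { alert("Base::Override()"); };
  this.BaseFunction = function() { alert("Base::BaseFunction()"); };
}
function Derive() {
  this.Override = function() { alert("Derive::Override()"); };
}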
Derive.prototype = new Base();
d = new Derive();
d.Override();
d.BaseFunction();
d.__proto__.Override(); // mozilla only
will result in the display:
Derive::Override()
Base::BaseFunction()
Base::Override() // mozilla only
Object hierarchy may also be created without prototyping:
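function red() { this.sayRed = function() { alert('red'); }; }     // illustrative messages
function blue() { this.sayBlue = function() { alert('blue'); }; }
function black() { this.sayBlack = function() { alert('black'); }; }
function anyColour() {
  // run each constructor as a method of this object to copy its members
  this.anotherName = red; this.anotherName();
  this.anotherName = blue; this.anotherName();
  this.anotherName = black; this.anotherName();
  this.sayPink = function() { alert('pink'); };
}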
var hugo = new anyColour()
hugo.sayRed()
hugo.sayBlue()
hugo.sayBlack()
hugo.sayPink()
alert(hugo.anotherName)
Data structures
A typical data structure is the Array, which is a map from integers to values. In JavaScript, all objects can map from integers to values, but Arrays are a special type of object that has extra behavior and methods specializing in integer indices (e.g., join, slice, and push).
Arrays have a length property that is guaranteed to always be larger than the largest integer index used in the array. It is automatically updated if one creates a property with an even larger index. Writing a smaller number to the length property will remove larger indices. This length property is the only special feature of Arrays that distinguishes them from other objects.
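The behaviour of the length property described above can be sketched as follows. The array contents and variable names are made up for the example.

```javascript
var list = ['a', 'b', 'c'];
var before = list.length;        // three elements, indices 0 to 2

list[9] = 'j';                   // creating a larger index extends length
var extended = list.length;

list.length = 2;                 // writing a smaller length removes larger indices
var truncated = list.join(',');
var removed = list[9];           // the element at index 9 no longer exists
```

The length goes from 3 to 10 when index 9 is assigned, and after length is set to 2 only 'a' and 'b' remain.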
Elements of Arrays may be accessed using normal object property access notation:
myArray[1]
myArray["1"]
These two are equivalent. It's not possible to use the "dot"-notation or strings with alternative representations of the number:
myArray.1 (syntax error)
myArray["01"] (not the same as myArray[1])
Declaration of an array can use either an Array literal or the Array constructor:
myArray = [0,1,,,4,5]; (array with length 6 and 4 elements)
myArray = new Array(0,1,2,3,4,5); (array with length 6 and 6 elements)
myArray = new Array(365); (an empty array with length 365)
Arrays are implemented so that only the elements defined use memory; they are "sparse arrays". Setting myArray[10] = 'someThing' and myArray[57] = 'somethingOther' only uses space for these two elements, just like any other object. The length of the array will still be reported as 58.
Object literals allow one to define generic structured data:
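var myStructure = { name: { first: "Mel", last: "Smith" }, age: 33, hobbies: ["chess", "jogging"] }; // illustrative data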
This syntax has its own standard, JSON.
Control structures
If … else
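if (condition) { statements } else { statements }
Conditional operator
Also known as the ternary operator
condition ? expression : expression;
While loop
while (condition) { statements }
Do ... while
do { statements } while (condition);
For loop
for ([initial-expression]; [condition]; [increment-expression]) { statements }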
For ... in loop
This loop goes through all enumerable properties of an object (or elements of an array).
for (slot in object) { statements using object[slot] }
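A minimal sketch of the for ... in loop applied to an ordinary object and to a sparse array. The object, the array contents and the variable names are illustrative; note that array indices are delivered as strings and that only indices that were actually assigned are visited.

```javascript
var point = { x: 1, y: 2 };
var names = [];
for (var slot in point) {
  names.push(slot + '=' + point[slot]);   // visit each enumerable property
}

var sparse = [];
sparse[10] = 'someThing';
sparse[57] = 'somethingOther';
var indices = [];
for (var index in sparse) {
  indices.push(index);                    // only the assigned indices appear
}
```

If Array.prototype has been extended with extra methods, those property names would show up in the second loop as well, which is one reason an ordinary for loop is often preferred for arrays.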
Switch statement
switch (expression) { case label: statements; break; default: statements; }
Functions
A function is a block with a (possibly empty) argument list that is normally given a name.
A function may give back a return value.
function function-name(arg1, arg2, arg3) { statements; return expression; }
Example: Euclid's original algorithm of finding the greatest common divisor. (This is a geometrical solution which subtracts the shorter segment from the longer):
function gcd(segmentA, segmentB)
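The subtraction-based procedure described above can be sketched as follows, assuming both arguments are positive integers. The loop body is an illustration of that description and not necessarily the original listing.

```javascript
// Greatest common divisor by repeated subtraction:
// keep subtracting the shorter segment from the longer one
// until both segments have the same length.
function gcd(segmentA, segmentB) {
  while (segmentA != segmentB) {
    if (segmentA > segmentB) {
      segmentA -= segmentB;
    } else {
      segmentB -= segmentA;
    }
  }
  return segmentA;
}

var common = gcd(60, 40);
```

For 60 and 40 the segments shrink to 20 and 20, so common is 20. The loop would not terminate for zero or negative input, which is why positive integers are assumed.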
The number of arguments given when calling a function may not necessarily correspond to the number of arguments in the function definition; a named argument in the definition that does not have a matching argument in the call will have the value undefined. Within the function the arguments may also be accessed through the arguments list (which is an object); this provides access to all arguments using indices (e.g. arguments[0], arguments[1], ... arguments[n]), including those beyond the number of named arguments.
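The following sketch shows a function called with fewer and with more arguments than it names. The function name describe and its return shape are made up for the example.

```javascript
function describe(first, second) {
  return {
    second: second,                          // undefined when not supplied
    count: arguments.length,                 // number of arguments actually passed
    last: arguments[arguments.length - 1]    // reachable even beyond the named ones
  };
}

var tooFew = describe('a');
var tooMany = describe('a', 'b', 'c', 'd');
```

In the first call second is undefined and count is 1; in the second call count is 4 and last is 'd', although the definition names only two arguments.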
Primitive values (strings, numbers, ...) are passed by value whereas objects are passed by reference.
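A minimal sketch of the difference, using made-up names: assigning to a parameter that holds a number does not affect the caller, while changing a property of an object parameter does.

```javascript
function modify(number, object) {
  number = number + 1;        // changes only the function's local copy
  object.changed = true;      // changes the object the caller also refers to
}

var n = 1;
var o = { changed: false };
modify(n, o);
```

After the call n is still 1, but o.changed is true.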
Functions as objects and anonymous functions
Functions are first-class objects in JavaScript. Every function is an instance of Function, a type of base object. Functions can be created and assigned like any other objects, and passed as arguments to other functions. Thus JavaScript supports higher-order functions. For example:
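Array.prototype.fold =
function (value, functor) {
  for (var i = 0; i < this.length; i++) { value = functor(value, this[i]); } // combine each element into the running value
  return value;
}
var sum = [1,2,3,4,5,6,7,8,9,10].fold(0, function (a, b) { return a + b; })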
results in the value:
55
Since Function can be instantiated, JavaScript allows the creation of anonymous functions, which can also be created using function, e.g.:
new Function( "return 1;" )
function() { return 1; }
When a function is called as a method, the implicit object available within the function, this, is the receiver object.
In the example below, the value of the alert property is an anonymous function:
function Point( x, y ) { this.x = x; this.y = y; }
Point.prototype.alert = function() { window.alert( "(" + this.x + "," + this.y + ")" ); };
var pt = new Point( 1, 0 );
pt.alert();
Methods can also be added within the constructor:
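function Point( x, y ) {
  this.x = x; this.y = y;
  this.alert = function() { window.alert( "(" + this.x + "," + this.y + ")" ); };
}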
var pt = new Point( 1, 0 );
pt.alert();
There is no need to use a constructor if only a single instance of an object is required - properties and values can be added directly using an initialiser:
var pt = { x: 1, y: 0, alert: function() { window.alert( "(" + this.x + "," + this.y + ")" ); } };
pt.alert();
Members declared as variables in the constructor are private; members assigned to this are public. Methods added or declared in the constructor have access to all private members; public methods added outside the constructor don't.
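function myClass() {
  var msg = "Hello world!"; // private: visible only inside the constructor
  this.publicMember = function() { alert(msg); }; // declared in the constructor, so it can read msg
}
myObj = new myClass;
myObj.anotherPublicMember = function() { alert(typeof msg); }; // added outside: the private msg is not in scope here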
For detailed control of member access, getters and setters can be used (e.g. to create a read-only property or a property whose value is generated).
Note: The functions '__defineSetter__' and '__defineGetter__' are implementation-specific and not part of the ECMAScript standard.
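function Point( x, y ) {
  this.x = x;
  this.y = y;
}
Point.prototype.__defineGetter__(
  "dimensions",
  function() { return [this.x, this.y]; }
);
Point.prototype.__defineSetter__(
  "dimensions",
  function( dimensions ) { this.x = dimensions[0]; this.y = dimensions[1]; }
);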
var pt = new Point( 1, 0 );
window.alert( pt.dimensions.length );
pt.dimensions = [2,3];
Depending on the development environment, debugging used to be difficult. Because errors in JavaScript appear only at run time (there is no way to check for errors without executing the code), and because JavaScript is interpreted by the web browser as the page is viewed, it may be difficult to track down the cause of an error. However, the Gecko-based browsers now come with a fairly good debugger (Venkman) and a DOM inspector.
Newer versions of JavaScript (as used in Internet Explorer 5 and Netscape 6) include a try ... catch error handling statement. Purloined from the Java programming language, this is intended to help with run-time errors but does so with mixed results.
The try ... catch ... finally statement catches exceptions resulting from an error or a throw statement. Its syntax is as follows:
try { statements } catch (error) { statements } finally { statements }
Initially, the statements within the try block execute. If an exception is thrown, the script's control flow immediately transfers to the statements in the catch block, with the exception available as the error argument. Otherwise the catch block is skipped. Once the catch block finishes, or the try block finishes with no exceptions thrown, then the statements in the finally block execute. This is generally used to free memory that may be lost if a fatal error occurs—though this is less of a concern in JavaScript. This figure summarizes the operation of a try...catch...finally statement:
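try { /* statements in which an exception may be thrown */ }
catch (error) { /* statements that run only if an exception was thrown */ }
finally { /* statements that run afterwards in either case */ }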
The finally part may be omitted:
try { statements }
catch (err) { statements }
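The control flow described above can be traced with a small runnable sketch. The function parsePositive and the log array are hypothetical names introduced only to record the order in which the blocks run.

```javascript
var log = [];

function parsePositive(text) {
  try {
    var value = Number(text);
    if (isNaN(value) || value <= 0) {
      throw new Error('not a positive number: ' + text);
    }
    log.push('parsed ' + value);
    return value;
  } catch (error) {
    log.push('caught ' + error.message);   // runs only when something was thrown
    return null;
  } finally {
    log.push('done');                      // runs in both cases, even after return
  }
}

var ok = parsePositive('12');
var failed = parsePositive('abc');
```

The first call logs 'parsed 12' and 'done'; the second skips the rest of the try block, logs the caught message, and then logs 'done' as well.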
Error Scope
Scripting languages are especially susceptible to bugs, and since JavaScript has varying implementations it is common to spend a great deal of time debugging. Each script block is parsed separately. On pages where JavaScript in script blocks is mixed with HTML, syntax errors can be identified more readily by keeping discrete functions in separate script blocks, or (for preference) by using many small linked .js files. This way, a syntax error will not cause parsing or compiling to fail for the whole page, and the page can fail gracefully.
Offspring
The programming language used in Macromedia Flash (called ActionScript) bears a resemblance to JavaScript. ActionScript has similar syntax to JavaScript, but the object model is dramatically different.
JSON, or JavaScript Object Notation, is a general-purpose data interchange format that is defined as a subset of JavaScript.
JavaScript OSA (JavaScript for OSA, or JSOSA), is a Macintosh scripting language based on the Mozilla 1.5 JavaScript implementation, SpiderMonkey. It is a freeware component made available by Late Night Software. Interaction with the operating system and with third-party applications is scripted via a MacOS object. Otherwise, the language is virtually identical to the core Mozilla implementation. It was offered as an alternative to the more commonly used AppleScript language.
Of only historical interest now, ECMAScript was included in the VRML97 standard for scripting nodes of VRML scene description files.
See also
- Client-side JavaScript
- Server-side JavaScript
- JavaScript engine
- Dynamic HTML
- Single Page Application
- CorbaScript
- LiveConnect
- List of JavaScript engines
References
- Nigel McFarlane: Rapid Application Development with Mozilla, Prentice Hall Professional Technical References, ISBN 0131423436
- David Flanagan, Paula Ferguson: JavaScript: The Definitive Guide, O'Reilly & Associates, ISBN 0596000480
- Danny Goodman, Scott Markel: JavaScript and DHTML Cookbook, O'Reilly & Associates, ISBN 0596004672
- Danny Goodman, Brendan Eich: JavaScript Bible, Wiley, John & Sons, ISBN 0764533428
- Andrew H. Watt, Jinjer L. Simon, Jonathan Watt: Teach Yourself JavaScript in 21 Days, Pearson Education, ISBN 0672322978
- Thomas A. Powell, Fritz Schneider: JavaScript: The Complete Reference, McGraw-Hill Companies, ISBN 0072191279
- Scott Duffy: How to do Everything with JavaScript, Osborne, ISBN 0072228873
- Andy Harris, Andrew Harris: JavaScript Programming, Premier Press, ISBN 0761534105
- Joe Burns, Andree S. Growney, Andree Growney: JavaScript Goodies, Pearson Education, ISBN 0789726122
- Gary B. Shelly, Thomas J. Cashman, William J. Dorin, Jeffrey Quasney: JavaScript: Complete Concepts and Techniques, Course Technology, ISBN 0789562332
- Nick Heinle, Richard Koman: Designing with JavaScript, O'Reilly & Associates, ISBN 1565923006
- Sham Bhangal, Tomasz Jankowski: Foundation Web Design: Essential HTML, JavaScript, CSS, PhotoShop, Fireworks, and Flash, APress L. P., ISBN 1590591526
- Emily Vander Veer: JavaScript For Dummies, 4th Edition, Wiley, ISBN 0764576593
External links
Specifications
- [http://mozilla.org/js/language/js20/ Proposal for JavaScript 2.0]
- [http://developer.mozilla.org/en/docs/Core_JavaScript_1.5_Reference Reference for JavaScript 1.5]
- [http://research.nihonsoft.org/javascript/CoreReferenceJS14/ Reference for JavaScript 1.4]
- [http://research.nihonsoft.org/javascript/ClientReferenceJS13/ Reference for JavaScript 1.3]
- [http://research.nihonsoft.org/javascript/jsref/index.htm Reference for JavaScript 1.2]
- [http://research.nihonsoft.org/javascript/CoreGuideJS15/index.html Guide for JavaScript 1.5]
- [http://research.nihonsoft.org/javascript/CoreGuideJS14/index.html Guide for JavaScript 1.4]
- [http://research.nihonsoft.org/javascript/ClientGuideJS13/index.html Guide for JavaScript 1.3]
- [http://research.nihonsoft.org/javascript/jsguide/index.htm Guide for JavaScript 1.2]
- [http://wp.netscape.com/eng/mozilla/3.0/handbook/javascript/index.html Guide for JavaScript 1.1 as used by Navigator 3.x]
- [http://e-pla.net/documents/manuals/javascript-1.0/index.html Guide for JavaScript 1.0]
Documentation
- [http://developer.mozilla.org/en/docs/JavaScript Mozilla JavaScript Language Documentation]
- [http://www.jibbering.com/faq/ The official comp.lang.javascript FAQ]
- [http://www.javascript-tutorial.com.br/content-cat-2.html Basic JavaScript Tutorial]
- [http://www.yourhtmlsource.com/javascript/ HTML Source JavaScript Tutorials]
- [http://www.remast.de/javascript.php JavaScript functional programming Tutorial]
- [http://www.w3schools.com/js/ W3Schools.com JavaScript Tutorial]
- [http://www.openjsan.org/ JavaScript Archive Network]
- [http://www.crockford.com/javascript/ Douglas Crockford's JavaScript page]
Common Problems
- [http://www.softwaresecretweapons.com/jspwiki/Wiki.jsp?page=JavascriptStringConcatenation Performance impact of JavaScript string concatenations]
History
- [http://wp.netscape.com/comprod/columns/techvision/innovators_be.html Innovators of the Net: Brendan Eich and JavaScript] (Marc Andreesen, Netscape TechVision, 24 Jun 1998)
- [http://inventors.about.com/library/inventors/bl_javascript.htm Brendan Eich and JavaScript] (about.com)
- [http://weblogs.mozillazine.org/roadmap/archives/008325.html Brendan's Roadmap Updates: JavaScript 1, 2, and in between] - the author's blog entry
Category:JavaScript programming language
Category:Curly bracket programming languages
Category:Domain-specific programming languages
Category:Prototype-based programming languages
Category:Object-based programming languages
Category:Scripting languages
ko:자바스크립트
ja:JavaScript
th:จาวาสคริปต์
Server side redirect
A server side redirect is computer code that operates on a web server rather than the user's web browser to redirect the user to a second web page with a different URL.
Common uses of server-side redirects include:
- redirecting people away from a discontinued server
- load balancing
- redirecting to error pages if a discontinued URL is used
- link use tracking, as done at the AltaVista search engine
- spamdexing
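As a rough sketch of the first and third uses listed above, a server can answer a request for a discontinued URL with an HTTP 301 status and a Location header naming the new address. The table movedPaths, the function respondTo and the example.com addresses are hypothetical; a real server would pass the returned status and headers to whatever response interface it uses.

```javascript
// Hypothetical table mapping discontinued paths to their new locations.
var movedPaths = {
  '/old-page.html': 'http://www.example.com/new-page.html'
};

// Decide which status code and headers to send for a requested path.
function respondTo(path) {
  if (movedPaths.hasOwnProperty(path)) {
    // 301 Moved Permanently: the browser follows the Location header.
    return { status: 301, headers: { 'Location': movedPaths[path] } };
  }
  return { status: 200, headers: {} };
}

var redirected = respondTo('/old-page.html');
var served = respondTo('/index.html');
```

The decision is made entirely on the server, so the browser only sees the redirect response and then requests the second URL.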
Cloaking:For the cloaking used in science fiction, see invisibility.
Cloaking is a search engine optimization technique in which the content presented to the search engine spider is different from that presented to the users' browser; this is done by delivering content based on the IP addresses or the User-Agent HTTP header of whatever is requesting the page. The only legitimate uses for cloaking used to be for delivering content to users that search engines couldn't parse, like Macromedia Flash. However, cloaking is often used to try to trick search engines into giving the relevant site a higher ranking; it can also be used to trick search engine users into visiting a site whose search engine description suggests one thing but whose content turns out to be substantially different, or even pornographic. For this reason some search engines threaten to ban sites using cloaking.
Cloaking is a form of the doorway page technique.
A similar technique is also used on the Open Directory Project web directory. It differs in several ways from search engine cloaking:
- It is intended to fool human editors, rather than computer search engine spiders.
- The decision to cloak or not is based upon the HTTP referrer, which tells the URL of the page on which a user clicked a link to get to the page. Some cloakers will give the fake page to anyone who comes from a web directory website, since directory editors will usually examine sites by clicking on links that appear on a directory webpage. Other cloakers give the fake page to everyone except those coming from a major search engine; this makes it harder to detect cloaking, while not costing them many visitors, since most people find websites by using a search engine.
In more recent times several well known and well respected sites have taken up cloaking to deliver personalised content to their regular customers. In fact, many of the top 1000 sites - including household names like Amazon.com - actively cloak. None of these have been banned from search engines purely because of cloaking.
See also
- Keyword stuffing
- Link farms
External links
- [http://www.ewebmastersforum.com Cloaking Forum]
- [http://www.searchengineworld.com/misc/cloaking.htm Cloaking and the Big Picture] - Covers types of cloaking, et cetera
- [http://www.searchenginewatch.com/searchday/article.php/2157141 What Search Engines See Isn't Always What You Get]
- [http://www.searchenginewatch.com/searchday/01/sd0709-cloaking.html In Defense of Search Engine Cloaking], a response to the above article
- [http://searchenginewatch.com/searchday/article.php/2157341 Search Engine Cloaking: The Controversy Continues], a response to In Defense of Search Engine Cloaking
- [http://linkcloak.com/ Link Cloaking]
Category:Search engine optimization
Google:For the search engine produced by this company, see Google search; for the underlying technology, see Google platform; for other uses see Google (disambiguation).
Google, Inc. is a U.S. public corporation, initially established as a privately held corporation in 1998, which designed and currently manages the Internet Google search engine. The company is headquartered at the "Googleplex" in Mountain View, California, and employs almost 5,000 workers. Dr. Eric Schmidt, former CEO of Novell, was named the Chief Executive Officer when co-founder Larry Page stepped down. The company's overview web page states that "Google's mission is to organize the world's information and make it universally accessible and useful."
History
Beginnings
Google began as a research project in January 1996 [http://www.google.com/intl/en/corporate/history.html] by Larry Page and Sergey Brin, two Ph.D. students at Stanford. They developed the hypothesis that a search engine based on analysis of the relationships between Web sites would produce improved results over the basic techniques then in use. (At the time, other search engines ranked results essentially based on how many times the search term appeared on a page.) It was originally nicknamed BackRub because the system checked backlinks to estimate a site's importance. (A small search engine called RankDex was already exploring a similar strategy.)
Convinced that the pages with the most links to them from other highly relevant Web pages must be the most relevant pages associated with the search, Page and Brin tested their thesis as part of their studies, and laid the foundation for their search engine. Originally the search engine used the Stanford website with the domain google.stanford.edu (see the [http://www.archive.org/web/web.php Internet Archive Wayback Machine] search for [http://web.archive.org/web/*/http://google.stanford.edu http://google.stanford.edu]). The domain google.com was registered on September 15, 1997. They formally incorporated their company, Google Inc., on September 7, 1998 at a friend's garage in Menlo Park, California.
In March 1999, the company moved into offices at 165 University Avenue in Palo Alto, home to a number of other noted Silicon Valley technology startups. Google received a big break in 1999 when one of the most popular search engines, AltaVista, relaunched itself as a user Web entry point, or portal. This unexpected change alienated part of AltaVista's user base. Google quickly outgrew its University Avenue home. After outgrowing two subsequent sites, the company settled into a complex of buildings (referred to by some as "The Googleplex") in Mountain View at 1600 Amphitheatre Parkway, in 2003.
The Google search engine attracted a loyal following among the growing number of Internet users. They were attracted to its simple, uncluttered, clean design — a competitive advantage to attract users who did not wish to enter searches on web pages filled with visual distractions. This appearance, while imitating the early AltaVista, had behind it Google's unique search capabilities. In 2000, Google began selling advertisements associated with the search keyword to produce enhanced search results for the user. This strategy was important for increasing advertising revenue, which is based upon the number of "hits" users make upon ads. The ads were text-based in order to maintain an uncluttered page design and to maximize page loading speed. It also only cost a very small amount per click to the websites that advertised this way. The model of selling keyword advertising was originally pioneered by Goto.com (renamed Overture, and now Yahoo! Search Marketing)[http://www.content.overture.com/d/USm/about/news/mile.jhtml]. While many of its dot-com rivals failed in the new Internet marketplace, Google quietly rose in stature while generating revenue.
The patent describing Google's ranking mechanism (PageRank) was granted on September 4, 2001. The patent was officially assigned to Stanford University and lists Lawrence Page as the inventor.
In February 2003, Google acquired Pyra Labs, owner of Blogger, a pioneering and leading weblog hosting Web site. Some analysts considered the acquisition inconsistent with Google's business model. However, the acquisition secured the company's competitive ability to use information gleaned from blog postings to improve the speed and relevance of articles contained in a companion product to the search engine, Google News.
At its peak in early 2004, Google handled upwards of 84.7 percent of all search requests on the World Wide Web through its Web site and through its partnerships with other Internet clients like Yahoo!, AOL, and CNN.[http://www.onestat.com/html/aboutus_pressbox21.html] In February 2004 Yahoo! dropped its partnership with Google in order to provide users at its site independent search results and to maintain their loyalty. Google lost user share of the search market. Yet Yahoo!'s move highlighted Google's own distinctiveness and today the verb "to google" has entered a number of languages first as a slang verb and now as a standard word meaning, "to perform a web search".
Google's declared code of conduct is "Don't Be Evil", a phrase which they went so far as to include in their prospectus (aka "red herring" or "S-1") for their IPO, noting "We believe strongly that in the long term, we will be better served — as shareholders and in all other ways — by a company that does good things for the world even if we forgo some short term gains."
The Google site includes humorous features such as cartoon modifications [http://www.google.com/holidaylogos.html] of the Google logo to recognize special occasions and anniversaries, known as "Google Doodles". Not only may decorative drawings be attached to the logo, but as well the font design may mimic a fictional or humorous language such as the Star Trek Klingon[http://www.google.com/intl/xx-klingon/] and Leet[http://www.google.com/intl/xx-hacker/]. The logo is notorious among web users for April Fool's Day tie-ins and jokes about the company.
Analysts speculate that Google's response to Yahoo! will be to continue to make technical and visual enhancements to personalized searches, using the personal data that it is gathering from Orkut, Gmail, and Froogle to produce unique results based on the user. New Google enhancements and products appear frequently. The products and demos in [http://labs.google.com/ Google Labs], the experimental section of Google.com, help Google maximize its relationships with its users by including them in the beta development, design and testing stages of new products and enhancements of existing ones.
Original Hardware
The [http://web.archive.org/web/19990209043945/google.stanford.edu/googlehardware.html original hardware] used by Google included:
- Sun Ultra II with dual 200MHz processors, and 256MB of RAM. This was the main machine for the original Backrub system.
- 2 x 300 MHz Dual Pentium II Servers donated by Intel, with 512MB of RAM and 9 x 9GB hard drives between the two. It was on these that the main search ran.
- F50 IBM RS6000 donated by IBM, included 4 processors, 512MB of memory and 8 x 9GB hard drives.
- Two additional boxes included 3 x 9GB hard drives and 6 x 4GB hard drives respectively (the original storage for Backrub). These were attached to the Sun Ultra II.
- IBM disk expansion box with another 8 x 9GB hard drives donated by IBM.
- Homemade disk box which contained 10 x 9GB SCSI hard drives
Logo Evolutions
The [http://www.google.com/intl/en/stickers.html Google logo] has changed over the years. The following are the official Google logos.
Image:googlelogo_5.jpg|Late 1996
Image:googlelogo_6.jpg|1998 - July 1999
Image:googlelogo_current.gif|July 1999 - Present
Google is also known for its innovative holiday logos; [http://www.google.com/holidaylogos.html see their logo archive]. A website has been created that relives these imaginative logos by displaying them randomly on every page-load: [http://google.abrahamjoffe.com.au/ Holiday Google].
The site [http://www.logoogle.com/ logoogle] contains images that users have made about Google.
Etymology
The name "Google" is a play on the word "Googol", which was coined by Milton Sirotta, nine-year-old nephew of U.S. mathematician Edward Kasner in 1938, to refer to the number represented by 1 followed by one hundred zeros.
Google's use of the term reflects the company's mission to organize the immense amount of information available on the Web. As a further play on this, Google's headquarters are referred to as "the Googleplex" — a googolplex being 1 followed by a googol of zeros, and the HQ being a complex of buildings (cf. multiplex, cineplex, etc).
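As a rough illustration of the scale of these two numbers, they can be written down in Python, which supports arbitrarily large integers. The variable names below are illustrative only. A googolplex is far too large to store digit by digit, so the sketch records only how many digits it would have.

```python
# A googol is 1 followed by one hundred zeros.
googol = 10 ** 100

# Its decimal form has 101 digits: a leading 1 plus 100 zeros.
googol_digits = len(str(googol))

# A googolplex is 10 ** googol. It cannot be written out in full,
# but its digit count is simply googol + 1.
googolplex_digit_count = googol + 1

print(googol_digits)
```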
The name has also been interpreted as a merging of the words "Go ogle", though this is widely accepted to be coincidental. The term appears in James Joyce's Finnegans Wake: "who thought him a Fonar all, feastking of shellies by googling Lovvey" [231.12]. To "throw a googly" means to ask a difficult or unanswerable question in British slang, a googly being a tricky ball in the game of cricket. Chambers Twentieth Century Dictionary (1972) allows the verb "to google" from this, and the phrase has come to be synonymous with "to search for on the Internet".
Financing and IPO
The first funding for Google as a company was secured in the form of a $100,000 check from Andy Bechtolsheim, co-founder of Sun Microsystems, made out to a corporation which didn't yet exist. After a frantic few weeks, this was topped up to give an initial investment of almost $1 million. Around six months later, a much larger round of funding was announced, with the major investors being rival venture capital firms Kleiner Perkins Caufield & Byers and Sequoia Capital.
In October 2003, while Google was discussing a possible IPO (Initial Public Offering of shares), Microsoft approached the company about a possible partnership or merger; no such deal ever materialized. In January 2004, Google announced the hiring of Morgan Stanley and Goldman Sachs Group to arrange an IPO. That IPO (one of the most anticipated in history) was projected to raise as much as $4 billion. According to a banker involved in the transaction, the deal would yield an estimated $12 billion market capitalization for Google.
On April 29, 2004, Google made an S-1 form SEC filing for an IPO to raise as much as $2,718,281,828 (with a touch of mathematical humor as e = 2.718281828...). April 29th was the 120th day of 2004, and according to section 12(g) of the Securities Exchange Act of 1934, "a company must file financial and other information with the SEC 120 days after the close of the year in which the company reaches $10 million in assets and/or 500 shareholders, including people with stock options."[http://management.itmanagersjournal.com/article.pl?sid=04/05/21/1934249&tid=103&tid=4] Google has stated in its annual filing for 2004 that every one of its 3,021 employees, "except temporary employees and contractors, are also equity holders, with significant collective employee ownership", so Google would have needed to make its financial information public by filing it with the SEC regardless of whether or not it intended to make a public offering. As Google stated in the filing, its "growth has reduced some of the advantages of private ownership. By law, certain private companies must report as if they were public companies. The deadline imposed by this requirement accelerated our decision." The SEC filing revealed that Google had turned a profit every year since 2001 and earned a profit of $105.6 million on revenues of $961.8 million during 2003.
In May 2004, Google officially cut Goldman Sachs from the IPO, leaving Morgan Stanley and Credit Suisse First Boston as the joint underwriters. The company chose the unconventional route of allocating the initial offering through an auction (specifically, a "Dutch auction"), so that "anyone" would be able to participate in the offering. The smallest required account balances at most authorized online brokers that are allowed to participate in an IPO, however, are around $100,000. In the run-up to the IPO the company was forced to slash the price and size of the offering, but the process did not run into any technical difficulties or result in any significant legal challenges. The initial offering of shares was sold for $85 apiece. The public valued the stock at $100.34 at the close of the first day of trading, which saw 22,351,900 shares change hands.
Before Google's initial public offering, Larry Page and Sergey Brin faced legal action for giving Playboy an interview about themselves and Google. The SEC (Securities and Exchange Commission) forbids giving out information pertaining to a company before its IPO is launched.
After some initial stumbles, Google's initial public offering took place on August 19, 2004. 19,605,052 shares were offered at a price of $85 per share. Of that, 14,142,135 (another mathematical reference as √2 = 1.4142135...) were floated by Google and 5,462,917 by selling stockholders. The sale raised $1.67 billion, of which approximately $1.2 billion went to Google. The vast majority of Google's 271 million shares remained under Google's control. The IPO gave Google a market capitalization of more than $23 billion. Many of Google's employees became instant paper millionaires. Yahoo!, a competitor of Google, also benefited from the IPO because it owns 2.7 million shares of Google. The company was listed on the NASDAQ stock exchange under the ticker symbol GOOG.
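The figures in the paragraph above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in the text; the constant names are illustrative, the proceeds are gross amounts before any fees, and the market capitalization uses the approximate total of 271 million shares at the offer price.

```python
# Figures quoted in the paragraph above.
OFFER_PRICE_USD = 85
SHARES_FROM_GOOGLE = 14_142_135
SHARES_FROM_STOCKHOLDERS = 5_462_917
TOTAL_SHARES = 271_000_000  # approximate, as stated in the text

# Total shares offered is the sum of the two tranches.
shares_offered = SHARES_FROM_GOOGLE + SHARES_FROM_STOCKHOLDERS

# Gross proceeds at the offer price (before any fees).
gross_proceeds = shares_offered * OFFER_PRICE_USD
google_proceeds = SHARES_FROM_GOOGLE * OFFER_PRICE_USD

# Implied market capitalization at the offer price.
market_cap = TOTAL_SHARES * OFFER_PRICE_USD

print(f"Shares offered:  {shares_offered:,}")
print(f"Gross proceeds:  ${gross_proceeds:,}")
print(f"Google's part:   ${google_proceeds:,}")
print(f"Market cap:      ${market_cap:,}")
```

The results agree with the text: 19,605,052 shares, roughly $1.67 billion raised in total, roughly $1.2 billion of it for Google, and a market capitalization slightly above $23 billion.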
On August 18, 2005 (one year after the IPO), Google announced that it would sell 14,159,265 (a mathematical joke, see pi) more shares of its stock to raise money. The move would double Google's cash stockpile to $7 billion. Google said it would use the money for "acquisitions of complementary businesses, technologies or other assets". [http://informationweek.com/story/showArticle.jhtml?articleID=169400356]
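The mathematical references in the offering figures (e in the April 2004 filing amount, the square root of 2 in the IPO share count, and pi in the follow-on share count) can be reproduced from the constants themselves. This is a small sketch using Python's standard `math` module; the variable names are illustrative, and the scaling factors are simply chosen to match the number of digits in each figure.

```python
import math

# First ten significant digits of e: the dollar amount in the S-1 filing.
filing_amount = int(math.e * 10 ** 9)

# First eight significant digits of sqrt(2): shares floated by Google in the IPO.
ipo_shares = int(math.sqrt(2) * 10 ** 7)

# First eight digits after the decimal point of pi: shares in the follow-on sale.
follow_on_shares = int((math.pi - 3) * 10 ** 8)

print(f"{filing_amount:,}")
print(f"{ipo_shares:,}")
print(f"{follow_on_shares:,}")
```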
Today
Since the IPO, Google's stock market capitalization has risen greatly and the stock price has more than quadrupled. On August 19, 2004 the number of shares outstanding was 172.85 million while the "free float" was 19.60 million (which makes 89% held by insiders). In January 2005 the number of shares outstanding was up 100 million to 273.42 million; 53% of that was held by insiders, which made the float 127.70 million (up 110 million shares from the first trading day). The two founders are said to hold almost 30% of the outstanding shares. The actual voting power of the insiders is much higher, however, as Google has a dual class stock structure in which each Class B share gets ten votes compared to each Class A share getting one. Page says in the prospectus that Google has "a dual class structure that is biased toward stability and independence and that requires investors to bet on the team, especially Sergey and me." The company has not reported any treasury stock holdings as of the Q3 2004 report.
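To see why a ten-to-one voting ratio gives insiders far more control than their share of the equity suggests, consider the following minimal sketch. The function name and the share counts are hypothetical and chosen for round numbers; they are not Google's actual Class A and Class B holdings, which the text does not give.

```python
def voting_fraction(a_held, b_held, a_total, b_total, votes_per_b=10):
    """Fraction of all votes controlled by a holder under a dual class
    structure where Class A carries 1 vote and Class B carries votes_per_b."""
    held_votes = a_held + votes_per_b * b_held
    total_votes = a_total + votes_per_b * b_total
    return held_votes / total_votes


# Hypothetical company: 700 Class A shares and 300 Class B shares.
A_TOTAL, B_TOTAL = 700, 300

# Hypothetical insiders hold no Class A shares and all Class B shares.
economic_stake = B_TOTAL / (A_TOTAL + B_TOTAL)
voting_stake = voting_fraction(0, B_TOTAL, A_TOTAL, B_TOTAL)

print(f"Economic stake: {economic_stake:.1%}")
print(f"Voting stake:   {voting_stake:.1%}")
```

With these made-up numbers, a 30% economic stake translates into roughly 81% of the votes.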
On June 1, 2005, Google shares gained nearly 4 percent after Credit Suisse First Boston raised its price target on the stock to $350. On the same day, rumors circulated in the financial community that Google would soon be included in the S&P 500. When companies are first listed on the S&P 500 they typically experience a bump in share price. On June 7, 2005, Google was valued at nearly $52 billion, making it one of the world's biggest media companies by stock market value.
With Google's increased size comes more competition from large mainstream technology companies. One such example is the rivalry between Microsoft and Google [http://www.pcmag.com/article2/0,1759,1706872,00.asp]. Microsoft has been touting its MSN Search engine to counter Google's competitive position. Furthermore, the two companies are increasingly offering overlapping services, such as webmail (Gmail vs. Hotmail), search (both online and local desktop searching), and other applications (for example, Microsoft's Virtual Earth competes with Google Earth). Some have even suggested that in addition to an Internet Explorer replacement Google is designing its own Linux based operating system called Google OS to directly compete with Microsoft Windows. Rumors of a Google browser are fueled by the fact that Google is the owner of the domain name "gbrowser.com". This corporate feud is most directly expressed in hiring offers and defections. Many Microsoft employees who worked on Internet Explorer have left to work for Google. This feud boiled over into the courts when Kai-Fu Lee, a former vice-president of Microsoft, quit Microsoft to work for Google. Microsoft sued to stop his move by citing Lee's non-compete contract (he had access to much sensitive information regarding Microsoft's plans in China). [http://today.reuters.com/news/newsArticle.aspx?type=internetNews&storyID=2005-09-02T215817Z_01_MCC278865_RTRIDST_0_NET-MICROSOFT-GOOGLE-DC.XML] The case is still in the courts.
While Google is the #1 search engine, the company struggles to keep up with rivals such as the well-known Yahoo!. Although Google and Yahoo! differ greatly in the services they offer, Google is trying to redefine itself from an Internet search company to an Internet media company, similar to Yahoo!. Google is trying to become a jack of all trades for the Internet, and is moving into businesses that other companies have recently dominated. On June 21, 2005, Google announced plans to release a pay service and a classified ads service, to rival companies like eBay [http://www.iht.com/articles/2005/06/20/business/google.php].
During the third quarter 2005 Google Conference Call, Eric Schmidt said, "We don't do the same thing as everyone else does. And so if you try to predict our product strategy by simply saying well so and so has this and Google will do the same thing, it's almost always the wrong answer. We look at markets as they exist and we assume they are pretty well served by their existing players. We try to see new problems and new markets using the technology that others use and we build."
Salaries
2005
Prior to the IPO, typical salaries at Google were considered within the industry to be quite low. For instance, some system administrators earned no more than $33,000, while $37,000 at that time was considered to be low by Bay Area employment market levels. Nevertheless, Google's excellent stock performance following the IPO has enabled these early employees to be competitively compensated by participation in the corporation's remarkable equity growth. In 2005 Google implemented other employee incentives such as the [http://query.nytimes.com/gst/abstract.html?res=F40C1EFC395F0C728CDDAB0894DD404482 Founder's award], as well as making higher salary offers to new employees.
Beyond monetary compensation, Google's workplace amenities, culture, global popularity, stellar prospects (relative to most Bay Area companies), and strong brand recognition continue to attract far more applicants than there are positions available. (It is estimated that fewer than one job offer is made per thousand resumes submitted.) Google reportedly employs one in-house legal recruiter just to assist the legal department in evaluating the high volume of resumes from attorneys seeking to join the corporation.
Management
Position: name, age, compensation in USD (as of June 2005)
- CEO: Eric E. Schmidt, 50, $1 see [http://money.cnn.com/2005/04/08/technology/google_salary/index.htm]
- CFO: George Reyes, 51, $781K
- President of Technology: Sergey Brin, 31, $1 see [http://money.cnn.com/2005/04/08/technology/google_salary/index.htm]
- President of Products: Larry E. Page, 32, $1 see [http://money.cnn.com/2005/04/08/technology/google_salary/index.htm]
- Sr. VP of Worldwide Sales: Omid Kordestani, 41, $572K
- VP of Corp. Development, Secretary and Gen. Counsel: David C. Drummond, 42, $776K
Founders Brin and Page reportedly earned $1 billion in 2004, but after the IPO in Aug 2004, their compensation is reported in SEC filings annually. Page, Brin, and Schmidt have all declined recent offers of bonuses and increases in compensation by Google's board of directors. Institutional Shareholder Services ranked Google's corporate governance dead last in the list of members of the Standard & Poor's 500. [http://sfgate.com/cgi-bin/article.cgi?file=/chronicle/archive/2004/08/24/BUGBU8D46M1.DTL&type=business]
According to the Forbes 400 list (2005), the combined net worth of [http://www.forbes.com/lists/2005/54/XFXI.html Larry Page] and [http://www.forbes.com/lists/2005/54/D664.html Sergey Brin] is $22 billion US.
But due to the recent surge in stock price (April 2005-June 2005), their net worth is significantly higher. When recorded on the Forbes 400, Google's stock was around $111. In late 2005 Google shares were valued at $400. Page and Brin, however, had sold $2 billion before some of the largest stock gains.
Analysts
Research analysts covering Google Inc. See also [http://finance.yahoo.com/q/sa?s=GOOG GOOG: Star Analysts for GOOGLE - Yahoo! Finance]
- Mark Mahaney (Citigroup Investment Research)
- John Tinker (Thinkequity Partners)
- Michael Gallant (CIBC World Markets)
- Steve Weinstein (Pacific Crest Securities)
- Imran Kahn (J.P. Morgan Chase)
- Heath Terry (Credit Suisse First Boston)
- Marianne Wolk (Susquehanna Financial Group)
- Nafi Bekteshi (SOS Group)
Technology
Google's services are run on several server farms, each consisting of many thousands of low-cost commodity computers running customized versions of Linux. While the company does not provide detailed information about its hardware, it was estimated in 2004 that it was using over 60,000 Linux machines. See Google platform for the details.
Corporate culture
Philosophy
Google is known for its relaxed corporate culture, reminiscent of the Dot-com boom. Google's corporate philosophy is based on many casual principles including: "You can make money without doing evil", "You can be serious without a suit" and "Work should be challenging and the challenge should be fun." A complete list of corporate fundamentals is available on Google's Web site [http://www.google.com/corporate/tenthings.html]. The company encourages equality within corporate levels. Twice a week there is a roller hockey game in the company parking lot.
"Twenty percent" time
Every Google engineer is encouraged to spend 20 percent (20%) of their work time on projects that interest them. Some of these end up as Google services, notably AdSense/AdWords (which provide the majority of the company's revenue), as well as Gmail, Google News and Orkut.
Googleplex
Google's headquarters is called the Googleplex. The lobby is decorated with a piano, lava lamps, and a real-time projection of current search queries. The hallways are full of exercise balls and bicycles. Each employee has access to the corporate recreation center. Recreational amenities are scattered throughout the campus, and include a workout room with weights and rowing machines, locker rooms, washers and dryers, a massage room, assorted video games, Foosball, a baby grand piano, a pool table, and ping pong. In addition to the rec room, there are snack rooms stocked with various cereals, gummy bears, toffee, licorice, cashews, yogurt, carrots, fresh fruit, and dozens of different drinks including fresh juice, soda, and make-your-own cappuccino. After eating, people can relieve themselves on digital toilets similar to Japanese toilets.
IPO and culture
Many people have suggested that after Google's IPO the corporate culture will not be able to stay so "fun" and focused on the future.[http://www.wired.com/news/business/0,1367,63241,00.html?tw=wn_story_related] [http://www.ciol.com/content/news/2004/104043001.asp] The company may be required to answer to its new shareholders who may press the company to reduce employee benefits and to focus on short term advances. Also, it may be more challenging for the company to maintain a collegial atmosphere when approximately 1,000 (30%) of the employees are paper-millionaires. In a report given to potential investors, co-founders Sergey Brin and Larry Page promised that the IPO would not change the company's culture. Later Mr. Page said, "We think a lot about how to maintain our culture and the fun elements."
In 2005, articles in The New York Times and other news sources [http://www.smh.com.au/news/technology/search-giant-may-outgrow-its-fans/2005/08/25/1124562975596.html] began suggesting that Google had lost its anti-corporate, no evil philosophy. The New York Times article was headlined, "Relax, Bill Gates; It's Google's Turn as the Villain" [http://www.nytimes.com/2005/08/24/technology/24valley.html].
Google partnerships
On September 28, 2005, Google announced a partnership with NASA which would involve Google building an R&D center at NASA's Ames Research Center. As reported by SearchEnginejournal.com, NASA and Google were said to be planning to work together on a variety of areas, including large-scale data management, massively distributed computing, bio-info-nano convergence, and encouragement of the entrepreneurial space industry. The new building would also include labs, offices, and housing for Google engineers.
Google also has a partnership with Sun Microsystems to help share and distribute each other's technologies [http://www.vnunet.com/computing/news/2143242/sun-partners-google]. As part of the partnership Google will hire employees to help develop the open source office program OpenOffice.org.
Google has a partnership of undisclosed scope with the Mozilla Foundation. Google is looking for software engineers to join it in collaborative development on the Firefox browser, as confirmed by a [http://www.google.com/support/jobs/bin/answer.py?answer=29553 job listing] posted on Google. Google also offers a download of Firefox with the Google Toolbar pre-installed.
Google's Acquisitions
2001
- Feb 2001: Deja (the Usenet archive, not the company) was acquired, and was incorporated to become part of the re-launched Google Groups [http://groups.google.com/googlegroups/deja_announcement.html].
- Sep 2001: Google acquired Outride Inc. [http://www.google.com/press/pressrel/outride.html].
2003
- Feb 2003: Google acquired Pyra Labs, a weblogging provider and owner of Blogger [http://www.google.com/corporate/timeline.html].
- Apr 2003: Neotonic Software was acquired as part of Google's plan to bring its CRM technology in house [http://www.searchenginejournal.com/index.php?p=621].
- Apr 2003: [http://www.appliedSemantics.com Applied Semantics] was acquired [http://www.appliedsemantics.com] for $102 Million [http://www.businessweek.com/magazine/content/05_49/b3962001.htm].
- Sep 2003: Kaltix was acquired to develop and launch Google Personal [http://www.clickz.com/news/article.php/3085921].
- Oct 2003: Sprinks was acquired to enhance Google's Adwords and AdSense program [http://www.searchnewz.com/searchnewz-12-20031105GoogleAcquiresSprinks.html].
- Oct 2003: Google acquired Genius Labs, another weblogging provider [http://www.bizstone.com/archive/2003_10_01_archive.html#106553958799049227].
2004
- Apr 2004: Ignite Logic was acquired [http://battellemedia.com/archives/000653.php].
- Jun 2004: Google made a $10M investment into partial ownership of Baidu [http://english.people.com.cn/200406/16/eng20040616_146493.html].
- Jul 2004: [http://www.picasa.com Picasa] was acquired to provide picture management tools to Blogger [http://www.google.com/press/pressrel/picasa.html].
- Oct 2004: Keyhole was acquired to provide the core mapping capabilities in Google Maps and Google Earth [http://www.google.com/press/pressrel/keyhole.html].
- Sept-Dec 2004, Google revealed in its annual [http://www.sec.gov/Archives/edgar/data/1288776/000119312505065298/d10k.htm 10-K filing] that it had acquired 2 Silicon Valley start-up companies: [http://www.zipdash.com ZipDash] and Where2. The technology provided by ZipDash was used to develop and launch Google Ride Finder. Where2 was a mapping software provider.
2005
- Mar 2005: Web analytics tools provider Urchin Software Corporation was acquired [http://www.google.com/intl/en/press/pressrel/urchin.html].
- May 2005: DodgeBall [http://www.dodgeball.com], a social networking software provider for mobile devices, was acquired [http://www.dodgeball.com/aboutus_dball_google.php].
- Jul 2005: Google, in combination with Goldman Sachs, and the Hearst Corp., invests a total of $100 Million into [http://www.currentgroup.com Current Communications Group] [http://www.lightreading.com/document.asp?doc_id=76942&WT.svl=news1_5].
- Jul 2005: Google announced in its Q2 quarterly conference call that it had acquired [http://www.akwan.com.br/index_en.html Akwan Information Technologies] as a part of its plan to open an R&D office and expand its presence into Latin and South America. [http://blog.searchenginewatch.com/blog/050720-175228]
- Aug 2005: Google acquires [http://www.Android.com Android Inc.], a software provider for mobile devices [http://www.businessweek.com/technology/content/aug2005/tc20050817_0949_tc024.htm]
- Sep 2005: Google and Ames Research Center both disclosed details of a long-term research partnership. In addition to pooling engineering talent, Google plans to build a 1-million square foot facility on the ARC campus. [http://www.nasa.gov/centers/ames/news/releases/2005/05_50AR.html]
Criticism and controversy
Copyright issues
A number of organizations have used the Digital Millennium Copyright Act to demand that Google remove references to allegedly copyrighted material on other sites. Google typically handles this by removing the link as requested and including a link to the complaint in the search results.
There have also been complaints that Google's Web cache feature violates copyright. However, Google provides mechanisms for requesting that caching be disabled (which Google respects; it also honors the robots.txt file which is another mechanism that allows operators of a website to request that part or all of their site not be included in search engine results).
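As a rough illustration of how a crawler can honor robots.txt, the sketch below uses Python's standard `urllib.robotparser` module. The robots.txt content, the crawler name `ExampleBot`, and the example.com URLs are all made up for the example; this shows the general mechanism only and is not a description of how Google's own crawler is implemented.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt asking all crawlers to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
# Parse the rules from memory instead of fetching them over the network.
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved crawler checks each URL before fetching it.
allowed_public = parser.can_fetch("ExampleBot", "http://example.com/index.html")
allowed_private = parser.can_fetch("ExampleBot", "http://example.com/private/report.html")

print(allowed_public)
print(allowed_private)
```

Here the public page may be fetched and the page under /private/ may not. Compliance is voluntary: robots.txt is a request to crawlers, not an access control.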
In June 2005, Google Watch revealed the details of a contract between the University of Michigan and Google to create digitized copies of the copyrighted materials stored at the University's library. This contract is part of Google Print's effort to digitize millions of books and make the full text searchable. There are claims that it is a violation of copyright laws to use copyrighted material for profit by placing search ads beside the search results of these digitized books. Also, Google is setting a new precedent by making digital copies of copyrighted material on a wide scale without explicit permission from copyright holders. Meanwhile, Google claims that it is in compliance with all existing and historical applications of copyright laws regarding books. The contract between Google and the University of Michigan does make it clear that Google will provide only excerpts of copyrighted text in a search. The contract says that Google will comply with "fair use", an exemption in copyright law that allows people to reproduce portions of text of copyrighted material for research purposes.
Dispute with Agence France Presse
In March 2005, Agence France Presse (AFP) sued Google for $17.5 million, alleging that Google News infringed on its copyright because "Google includes AFP's photos, stories and news headlines on Google News without permission from Agence France Presse." [http://news.dcealumni.com/376/20305-googles-news-sued-for-infringing-agence-france-presse-copyrighted-work/] It was also alleged that Google ignored a cease and desist order, though Google counters that it has opt-out procedures which AFP could have followed but did not.
It is possible that AFP will make additional arguments in court that it has not yet made in public, but currently, many pundits are confused by the decision to sue [http://weblog.physorg.com/news1362.html][http://www.bayoubuzz.com/articles.aspx?aid=3538][http://www.earthtimes.org/articles/show/2080.html] because Google does not display the full article on its site, provides a link to one of AFP's 600 online clients such as Singapore's Channel NewsAsia (which presumably benefits AFP because more people view the article and advertising), and because the articles are available via the providers' websites regardless of Google's actions. It was argued that had AFP wanted to prevent free use of its articles, it should have asked its providers to require subscriptions rather than suing Google. Additionally, "in 2002, a federal appeals court ruled that Web sites may reproduce and post 'thumbnail' or downsized versions of copyrighted photographs," so Google News' thumbnails are likely legal. [http://news.dcealumni.com/376/20305-googles-news-sued-for-infringing-agence-france-presse-copyrighted-work/] Still, AFP argues that the headline and first sentence of an article constitute the "heart" of the work and that reproducing them is copyright infringement.
According to the Canada Free Press, "Google Inc. is now attempting to remove all postings of Agence-France Presse material from its site, although AFP spokesmen say that even if this is done, the lawsuit will continue... It seems that the basis of the lawsuit is just the abstract notion of copyright without any real damages to justify the action." The article concluded "It would be a sad day for those who look to the Internet for news if AFP is successful in limiting what Google can display... AFP's lawsuit, if successful, is bound to have a major impact on how news is delivered on the Internet."
The lawsuit's outcome will likely depend on whether Google can successfully argue that its use of AFP's material constitutes "fair use" under copyright law. Google could even argue that it "adds value" to AFP's news without harming the French news wholesaler.[http://www.michigandaily.com/vnews/display.v/ART/2005/03/29/424942b9271ad]
Lawsuit by Authors Guild
On September 20, 2005, the Authors Guild, a group that represents 8,000 U.S. authors, together with a children's book author and a former Poet Laureate of the United States, filed a class action suit in federal court in Manhattan against Google over its unauthorized scanning and copying of books through its Google Library program. The lawsuit seeks damages and an injunction that would prevent the company from continuing its very ambitious digitization project. Arguments in the case will hinge on the interpretation of the four factors of Fair Use.
Many commentators in the world of copyright law and technology were not surprised by this development as The Authors Guild has also been involved in attempting to make online publishers pay royalties to writers whose stories appear in any number of online databases without their express consent. In a concession to general concerns about the nature of their project, Google had announced plans back in August that they would respect the wishes of copyright holders who contacted the company to inform them that they did not want their works included in this digitization project.
- [http://scout.wisc.edu/Reports/ScoutReport/2005/scout-050923-inthenews.php#1 Scout Report] "Authors’ group files lawsuit against Google" Sept, 2005
- [http://www.policybandwidth.com/doc/googleprint.pdf The Google Print Library Project: A Copyright Analysis - .pdf]
- [http://www.washingtonpost.com/wp-dyn/content/article/2005/09/20/AR2005092001416.html Washington Post] Sept. 20, 2005 "Google library push faces lawsuit by US authors"
Multinational corporation
Google is a multinational corporation, having offices in over a dozen countries [http://www.google.ie/intl/en/corporate/address.html]. In order to comply with the varying laws of these countries, several versions of Google restrict very specific keyword searches. According to American law, any copyright owner can require material to be removed via the Digital Millennium Copyright Act, whereas under French and German law, for example, hate speech and Holocaust denial are illegal. Google complies with these laws by banning keyword searches related to these terms. Google's Terms of Service allow it to comply with the laws of any one country, providing information that was originated (or that Google stores) in another country. Any data stored on Google is therefore subject to being turned over to any country, including China.
China's Censoring
The People's Republic of China, whose human rights record has been widely criticized by the international community, has in the past restricted citizen access to popular search engines such as Altavista, Yahoo!, and Google. The mirror search site elgooG has been used by Chinese citizens to get around blocked content. This complete ban is currently lifted. However, the government remains active in filtering Internet content.
In the summer of 2005 Google's name became associated with commercial contracts between the Government of China, Microsoft and Cisco Systems which block access to websites using words like "democracy." Google has been involved with the removal of specific sites that are blocked in China from their Chinese news portal. The French news agency, AFP, reported that Microsoft, Yahoo! and Google have all agreed to cooperate in censoring the Internet from their China based sites by filtering out content objectionable to the Chinese government. The list of forbidden words includes "democracy," "freedom," "human rights," and "Taiwan independence."
In October 2005, Blogger and access to the Google Cache were made available in China; however, in December 2005, some Chinese users of Blogger reported that their access to the site was once again restricted.
Legal issues
Google's efforts to refine its database have led to some legal controversy, notably a lawsuit in October 2002 from the company SearchKing, which sought to sell advertisements on pages with inflated Google rankings. In its defense, Google stated that its rankings are its constitutionally protected opinions of the web sites that it indexes. A judge subsequently threw out SearchKing's lawsuit in mid-2003 on precisely these grounds.
In late 2003 and early 2004, there were rumors that Google would be sued by the SCO Group over their use of the Linux operating system, in conjunction with SCO's lawsuit against IBM over the claimed ownership of intellectual property rights relating to Linux.
In May 2004, the Baltimore Sun interviewed Peri Fleisher, a great-niece of Edward Kasner, the mathematician whose nephew coined the word googol, who said Kasner's descendants were "exploring" legal action against Google due to its name.
Google recently settled a patent infringement lawsuit with Yahoo! by issuing 2.7 million shares. Yahoo! had earlier alleged that Google's AdSense program violated a patent held by Yahoo!'s Overture unit. The settlement cost Google around $275 million which resulted in the company posting a net loss in the third quarter of 2004.
Personnel issues
Former Google sales executive Christina Elwell, promoted to national sales director at Google in late 2003, accused her supervisor of discrimination against her after informing him of her pregnancy [http://news.com.com/Google+hit+with+job+discrimination+lawsuit/2100-1030_3-5807158.html?tag=nl]. After the loss of 3 of her quadruplets, which she claimed was due to the stressful circumstances created by Google, Elwell sued the company. She also refused an offer from Shona Brown, Google Vice President of Business Operations, to reinstate her to a "low-level operations position".
Partiality
In February 2003, Google banned the ads of Oceana, a two-and-a-half-year-old non-profit organization, which was protesting the environmental effects of a major cruise ship operation's sewage treatment practices. Google claimed that their editorial policy states, "that Google does not accept advertising if the ad or site advocates against other individuals, groups, or organizations."
Offensive search results
In April 2004, Google received complaints that a search for "Jew" on its site listed the anti-Semitic website Jew Watch at or near the top of the list. Google responded that this was due to the content-neutrality of the PageRank algorithm, and the fact that racists used the specific word "Jew" (as opposed to "Jewish" or "Judaism") more often than others. [http://www.google.com/explanation.html]
As a reaction, some webloggers launched a Google bomb to put the corresponding Wikipedia article at the top of the search results. As of December 2005, Jew Watch remains the #1 link.
There is also an option for logged-in Google account users to remove offensive search results.
Privacy
Main article: Google and privacy issues
Some have pointed out the dangers and privacy implications of having a centrally located, widely popular data warehouse of millions of Internet users' searches, and how under controversial existing U.S. law, Google can be forced to hand over all such information to the U.S. government, or any other government of a country which Google serves.
It has been claimed that Google infringes the privacy of visitors by uniquely identifying them using cookies which are used to track Web users' search history. The cookies possess notably distant expiration dates and it is claimed users' searches are recorded without permission for advertising purposes. In response Google claims cookies are necessary to maintain user preferences between sessions and offer other search features. The use of cookies with such distant expiration dates is not very uncommon.
Some users believe the processing of email message content by Google's Gmail service goes beyond proper use. The point is often made that people without Gmail accounts, who have not agreed to the Gmail terms of service, but send email to Gmail users have their correspondence analyzed without permission. Google claims that mail sent to or from Gmail is never read by a human being beyond the account holder, and is only used to improve relevance of advertisements. Other popular email services such as Hotmail also scan incoming email to try to determine whether it is unsolicited spam email (which Gmail also does). Chris Hoofnagle, associate director of the Electronic Privacy Information Center in Washington, DC warned that "As courts become more frequent integrators of electronic records, there is a greater risk of Google ... becoming a serious privacy threat."
The PageRank system
Google's central PageRank system has been criticized, with some, such as Daniel Brandt, calling it "undemocratic". Common arguments are that the system is unfairly biased towards large web sites, and that the criteria for a page's importance are not subject to peer review. It must be stated in Google's defense that PageRank is a fully automated system which is impartial insofar as it knows no personal bias. However, it must also be stated that Google's system relies on human oversight, and use of company names on AdWords, or deletion of critical sites from Google results (for example, sites critical of Scientology), is decided by individual human beings according to company policy. It remains unclear whether any process could assert the importance of a page in a way that would draw less criticism than the current PageRank system.
The system is also susceptible to manipulation and fraud through the use of dummy sites, an issue which does, however, plague all search engines. See Google bomb and Spamdexing.
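The effect of dummy sites can be illustrated with a simplified, textbook-style link analysis. The sketch below is not Google's production system; the damping factor of 0.85, the iteration count, and the page names are all assumptions chosen for illustration. It shows how a handful of dummy pages that link to one target can raise that target's score.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank evenly among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical three-page web.
honest = {"a": ["b"], "b": ["a", "target"], "target": ["a"]}

# Hypothetical link farm: five dummy pages that exist only to link to "target".
farmed = dict(honest)
farmed.update({f"dummy{i}": ["target"] for i in range(5)})

before = pagerank(honest)["target"]
after = pagerank(farmed)["target"]
print(f"target score without dummy pages: {before:.3f}")
print(f"target score with dummy pages:    {after:.3f}")
```

In this toy graph the target's share of the total score rises once the dummy pages are added, even though nothing about the target page itself has changed.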
Specific searches
Spamdexing
See also List of Google services and tools
For users searching for more specific results, at the top of Google pages are additional tabs to more narrowly define a user's search results.
- Images: Allows the user to limit a search to images on the Internet; the images are identified by Google by the image name saved on the webpage and context information about the page.
- Groups: Allows the user to create, search and browse groups for discussion.
- News: Brings the user directly to the Google News search page, formatted similarly to news websites such as MSNBC or BBC News. The search page provides the option for twenty countries. Google.com.au allows selection criteria for Australia.
- Froogle: Allows the user to shop online searching websites within a user specified budget.
- Local: Searches for places (such as shops or other landmarks) in a geographical area, and displays maps and driving directions. Maps include road maps, medium-resolution satellite images, and "hybrid" maps combining both. See also Google Maps. Currently it provides full service only in the U.S., Canada, and the U.K.
- Earth: Allows the user to download a program to have a 3D version of satellite pictures.
- Desktop: Allows the user to search their computer for files, folders, and emails. See Google Desktop.
- Talk: Allows users with Gmail accounts to communicate with each other through instant messaging and have online conversations.
- Videos: Allows the user to limit a search to videos on the Internet, and to find reviews and showtimes for movies playing nearby.
- Blogs: Blog Search allows the user to only search blogs based on RSS feeds. Results can be sorted by relevance or by date. Although it allows you to search specific blogs, this feature is currently malfunctioning.
- Scholar: Allows users to search some peer-reviewed, scholarly journals. Non-peer reviewed material is also included in the index.
Clicking on the "More" tab at the top directs the user to even more Google Services such as Blogger, University Searches, Google products in their Labs section, Help and Alerts.
April Fool's Day jokes
Main article: Google's hoaxes
Google has a tradition of creating April Fool's Day jokes such as [http://www.google.com/mentalplex/ Google MentalPlex], which featured the use of mental power to search the Web. In 2002 they claimed that pigeons were the [http://www.google.com/technology/pigeonrank.html secret] behind their growing search engine. In 2004 the joke was [http://www.google.com/jobs/lunar_job.html Google Lunar], which advertised jobs on the moon, and in 2005 a fictitious, brain-boosting drink termed [http://www.google.com/googlegulp/ Google Gulp] was announced. Other pranks are hidden among Google's pages: the languages list includes the [http://www.google.com/intl/xx-bork/ Bork! Bork! Bork!] version, "Bork!" being the mock Swedish of the Swedish Chef from The Muppet Show.
Some people thought the announcement of Gmail around April Fool's Day in 2004 was itself a joke.
See also
- Proceratium google, an ant species named in honor of Google Earth
- List of Google services and tools
- List of search engines
- TrustRank
- Google (search engine)
- GAMEY
- Google employees category
- Computer History Museum, where the original Google web server is on display
- Google Space
- Googlebot
References
- Mahadevan, Jeremy (Nov. 16, 2005). "Googlicious". New Straits Times, p. L12–L13.
- "What's the catch?" (Nov. 16, 2005). New Straits Times, p. L13.
External links
Company websites
- [http://www.google.com Google]
- [http://base.google.com/ Google Base]
- [http://www.google.com/help/features.html Google Help: Search Features]
- [http://www.google.com/sms Google SMS Search]
- [http://www.google.com/ig Google Personalized Start Page]
- [http://video.google.com/ Google Video Search] - Also see: [http://video.google.com/videoplay?docid=3383042311441257769&q=google+factory+tour Google Factory Tour]
- [http://www.google.com/downloads/ Google Software Downloads]
- [http://www.google.com/corporate/ Google Corporate Information]
- [http://www.google.com/corporate/history.html Google's History]
- [http://gmail.google.com Gmail] — Google's e-mail service
- [http://www.ipogoogle.org/ Google's Initial Public Offering]
- [http://www.google.org Google.org] — The philanthropic arm of Google
- [http://googleblog.blogspot.com Google Blog] — Off
Website
A website, web site or WWW site (often shortened to just site) is a collection of web pages, typically common to a particular domain name or sub-domain on the World Wide Web on the Internet.
A web page is an HTML/XHTML document accessible generally via HTTP.
All publicly accessible websites in existence comprise the World Wide Web. The pages of a website will be accessed from a common root URL called the homepage, and usually reside on the same physical server. The URLs of the pages organise them into a hierarchy, although the hyperlinks between them control how the reader perceives the overall structure and how the traffic flows between the different parts of the sites.
Some websites require a subscription to access some or all of their content. Examples of subscription sites include many Internet pornography sites, parts of many news sites, gaming sites, message boards, Web-based e-mail services and sites providing real-time stock market data.
Overview
A website may be the work of an individual, a business or other organization and is typically dedicated to some particular topic or purpose. Any website can contain a hyperlink to any other website, so the distinction between individual sites, as perceived by the user, may sometimes be blurred.
Websites are written in, or dynamically converted to, HTML (Hyper Text Markup Language) and are accessed using a software program called a web browser, also known as an HTTP client. Web pages can be viewed or otherwise accessed from a range of computer-based and Internet-enabled devices of various sizes, including desktop computers, laptop computers, PDAs and cell phones.
A website is hosted on a computer system known as a web server, also called an HTTP server. These terms can also refer to the software that runs on these systems and that retrieves and delivers the web pages in response to requests from the website's users. Apache is the most commonly used web server software (according to Netcraft statistics), and Microsoft's Internet Information Server (IIS) is also commonly used.
A static website is one whose content is not expected to change frequently and is manually maintained by one or more people using some type of editor software. There are two broad categories of editor software used for this purpose:
- Text editors such as Notepad, where the HTML is manipulated directly within the editor program
- WYSIWYG editors such as Microsoft FrontPage and Macromedia Dreamweaver, where the site is edited using a GUI interface and the underlying HTML is generated automatically by the editor software.
A dynamic website is one that may have frequently changing information. When the web server receives a request for a given page, the page is automatically generated by the software in direct response to the request. This opens up many possibilities: for example, a site can display the current state of a dialogue between users, monitor a changing situation, or provide information personalised to the requirements of the individual user.
There is a large range of software systems available for generating dynamic websites, such as Active Server Pages (ASP), Java Server Pages (JSP) and the PHP programming language. Dynamic sites also often include content that is retrieved from one or more databases or by using XML-based technologies such as RSS.
Content may also be generated dynamically only periodically, or when certain conditions for regeneration occur, and then served as static (cached) content, which avoids the performance cost of invoking the dynamic engine for every user or connection.
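The difference between generating a page on every request and serving a periodically regenerated copy can be sketched as follows. This is a minimal illustration, not any particular server's mechanism: the names `render_page` and `CachedPage`, the 60-second lifetime, and the page contents are all hypothetical.

```python
import time
from html import escape


def render_page(user, now):
    """Build the HTML for a personalised page at request time (dynamic)."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(now))
    return (
        f"<html><body><p>Hello, {escape(user)}. "
        f"Generated {stamp} UTC.</p></body></html>"
    )


class CachedPage:
    """Regenerate a page only when the cached copy is max_age seconds old."""

    def __init__(self, render, max_age):
        self.render = render
        self.max_age = max_age
        self.html = None
        self.generated_at = None
        self.render_count = 0

    def get(self, now):
        stale = (
            self.generated_at is None
            or now - self.generated_at >= self.max_age
        )
        if stale:
            # Condition for regeneration met: run the dynamic engine once.
            self.html = self.render(now)
            self.generated_at = now
            self.render_count += 1
        return self.html


front_page = CachedPage(lambda now: render_page("visitor", now), max_age=60)
first = front_page.get(now=0)     # generated on first request
second = front_page.get(now=30)   # served from the cache
third = front_page.get(now=90)    # cache expired, page regenerated
```

The clock value is passed in explicitly so that the behaviour is easy to follow; a real server would read the current time itself.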
Plugins are available for browsers, which use them to show active content, such as Flash, Shockwave or applets written in Java. Dynamic HTML also provides for user interactivity and real-time element updating within Web pages (i.e., pages don't have to be loaded or reloaded to effect any changes), mainly using the DOM and JavaScript, support for which is built into most modern browsers.
Types of websites
There are many varieties of websites, each specialising in a particular type of content or use, and they may be arbitrarily classified in any number of ways. A few such classifications might include:
- Archive site: used to preserve valuable electronic content threatened with extinction. Two examples are: Internet Archive which since 1996 preserves billions of old (and new) Web pages, and Google Groups which in early 2005 was archiving over 845,000,000 messages posted to Usenet news/discussion groups.
- Blog (or weblog) site: site used to log online readings or to post online diaries; may include discussion forums.
- Business site: used for promoting a business or service.
- Commerce site or eCommerce site: for purchasing goods, such as Amazon.com.
- Community site: a site where persons with similar interests communicate with each other, usually by chat or message boards.
- Database site: a site whose main use is the search and display of a specific database's content such as the Internet Movie Database or the Political graveyard.
- Development site: a site whose purpose is to provide information and resources related to software development, Web design and the like.
- Directory site: a site that contains varied contents which are divided into categories and subcategories, such as Yahoo! directory, Google directory and Open Directory Project.
- Download site: strictly used for downloading electronic content, such as software, game demos or computer wallpaper.
- Game site: a site that is itself a game or "playground" where many people come to play, such as MSN Games, Pogo.com and the MMORPGs Planetarion and Kings of Chaos.
- Information site: contains content that is intended merely to inform visitors, but not necessarily for commercial purposes; such as: RateMyProfessors.com, Free Internet Lexicon and Encyclopedia.
- News site: similar to an information site, but dedicated to dispensing news and commentary.
- Pornography site: a site that shows pornographic images and videos.
- Search engine site: a site that provides general information and is intended as a gateway or lookup for other sites. A pure example is Google, and the most widely known extended type is Yahoo!.
- Shock site: includes images or other material that is intended to be offensive to most viewers.
- Vanity site (or "personal site"): run by an individual or a small group (such as a family) that contains information or any content that the individual wishes to include.
- Web portal site: a website that provides a starting point, a gateway, or portal, to other resources on the Internet or an intranet.
- Wiki site: a site which users collaboratively edit (such as Wikipedia).
Some sites may be included in one or more of these categories. For example, a business website may promote the business's products, but may also host informative documents, such as white papers. There are also numerous sub-categories to the ones listed above. For example, a porn site is a specific type of eCommerce site or business site (that is, it is trying to sell memberships for access to its site). A fan site may be a vanity site on which the administrator is paying homage to a celebrity.
Many business Websites have the appearance of brochures—that is, an advertisement that can be strolled around. Some websites act as vehicles for users to communicate with other people via webchat.
Websites are constrained by architectural limits (e.g. the computing power dedicated to the website). Very large websites, such as Yahoo!, Microsoft and Google, employ several servers and load balancing equipment, such as Cisco Content Services Switches.
Mousetrapping
Mousetrapping is a technique employed by some "aggressive" commercial websites, especially ones that are pornographic in nature, which prevents the user from leaving the site, depending on Web browser settings. Typically, this form of trapping relies on JavaScript code (or Dynamic HTML) that detects a user's attempt to either close the browser window or leave the website to view another site. The trap may easily fail if the user has disabled JavaScript in their Web browser; however, disabling JavaScript may also affect how well certain pages on the current site or other websites load. Tools such as pop-up blockers can help prevent this annoyance but by no means solve the problem entirely. [http://www.webopedia.com/TERM/M/mousetrapping.html]
Prizes
The Webby Awards are a set of awards presented to the world's "best" Websites.
Spelling
As noted above, there are several different spellings for this term. Although "website" is commonly used (particularly by some newspapers and other media), Reuters, Microsoft, academia, and dictionaries such as Oxford, prefer to use the two-word, capitalised spelling "Web site". An alternate version of the two-word spelling is not capitalised. As with many newly created terms, it may take some time before a common spelling is finalised. (This controversy also applies to derivative terms such as "Web master"/"webmaster".)
The Associated Press Stylebook, a guide to newspaper style, suggests "Web site" and "Web page". "WWW site" is rarely used.
See also
- Webmaster
- Cyberspace
- Web application
- Web content management
- Web service
- Web template
- World Wide Web Consortium (Web standards)
- Microsoft FrontPage
- Macromedia Dreamweaver
- Web hosting
External links
- [http://www.w3.org/ World Wide Web Consortium]
- [http://www.isoc.org/ The Internet Society (ISOC)]
- [http://www.icann.org/ Internet Corporation For Assigned Names and Numbers]
- [http://www.useit.com Useit.com Internet Usability]
- [http://www.cgisecurity.com/questions/securewebsite.shtml How do I secure my website?] CGISecurity.com - Website Security Portal
Domain name
The term domain name has multiple meanings, all related to the Domain Name System (main article).
- a name that is entered into a computer (e.g. as part of a website or other URL, or an email address) and then looked up in the global Domain Name System, which informs the computer of the IP address(es) associated with that name.
- the product that registrars provide to their customers.
- a name looked up in the DNS for other purposes.
They are sometimes colloquially (and incorrectly) referred to by marketers as "web addresses".
Domain names are hostnames that provide memorable names to stand in for numeric IP addresses. They allow any service to move to a different location in the topology of the Internet (or another internet), where it would have a different IP address.
Each string of letters, digits and hyphens between the dots is called a label in the parlance of the domain name system (DNS). Valid labels are subject to certain rules, which have relaxed over the course of time. Originally, labels had to start with a letter and end with a letter or digit; any intervening characters could be letters, digits, or hyphens. Labels must be between 1 and 63 characters long (inclusive). Letters are ASCII A–Z and a–z; domain names are compared case-insensitively. Later it became permissible for labels to commence with a digit (but not for domain names to be entirely numeric), and for labels to contain internal underscores, but support for such domain names is uneven. These are the rules imposed by the way names are looked up ("resolved") by DNS. Some top-level domains (see below) impose more rules, such as a longer minimum length, on some labels. Fully qualified names (FQDNs) are sometimes written with a final dot.
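The label rules described above can be expressed as a short check. This is a sketch under stated assumptions: the function name `is_valid_name` and its `relaxed` flag are illustrative, internal underscores are deliberately left out because support for them is uneven, and any extra rules imposed by individual top-level domains are not modelled.

```python
import re

# Original rule from the text: start with a letter, end with a letter or
# digit, interior characters are letters, digits, or hyphens, length 1 to 63.
STRICT_LABEL = re.compile(r"[A-Za-z]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

# Relaxed rule: a label may also begin with a digit.
RELAXED_LABEL = re.compile(r"[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_valid_name(name, relaxed=True):
    """Check every label of a domain name against the rules above."""
    if name.endswith("."):
        # A fully qualified name may be written with a final dot.
        name = name[:-1]
    labels = name.split(".")
    pattern = RELAXED_LABEL if relaxed else STRICT_LABEL
    if not all(pattern.fullmatch(label) for label in labels):
        return False
    # Even under the relaxed rule, a name may not be entirely numeric.
    return not all(label.isdigit() for label in labels)


print(is_valid_name("www.example.com"))            # accepted by both rules
print(is_valid_name("3com.com", relaxed=False))    # rejected by the old rule
print(is_valid_name("-bad.example.com"))           # leading hyphen rejected
```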
By mapping alphabetical names to numeric addresses, domain names allow Internet users to locate and visit websites. Additionally, since more than one IP address can be assigned to a domain name, and more than one domain name assigned to an IP address, one server can have multiple roles, and one role can be spread among multiple servers. One IP address can even be assigned to several servers, such as with anycast and hijacked IP space.
Examples
The following example illustrates the difference between a URL (Uniform Resource Locator) and a domain name:
: URL: http://www.example.com/
: Domain name: www.example.com
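The same distinction can be seen programmatically. As a small sketch using Python's standard `urllib.parse` module, the domain name is only one component of the URL, alongside the scheme and the path:

```python
from urllib.parse import urlsplit

url = "http://www.example.com/"
parts = urlsplit(url)

scheme = parts.scheme      # the protocol part of the URL
domain = parts.hostname    # the domain name inside the URL
path = parts.path          # the resource path on that host

print(scheme, domain, path)
```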
As a general rule, the IP address and the server name are interchangeable. For most internet services, the server will not have any way to know which was used. However, the explosion of interest in the web means that there are far more websites than servers. To accommodate this, the hypertext transfer protocol (HTTP) specifies that the client tells the server which name is being used. This way, one server with one IP address can provide different sites for different domain names. This feature goes by the name virtual hosting and is commonly used by web hosts.
For example, the server at 192.0.34.166 handles all of the following sites:
: www.example.com
: www.example.net
: www.example.org
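The mechanism behind virtual hosting can be sketched without any networking: the client names the site it wants in the HTTP `Host` header, and the server uses that name to pick the content. The site table, page bodies, and function names below are hypothetical, and the parsing is deliberately minimal (for instance, it does not handle IPv6 address literals).

```python
# Hypothetical site table for one server; the page bodies are placeholders.
SITES = {
    "www.example.com": "<h1>example.com home page</h1>",
    "www.example.net": "<h1>example.net home page</h1>",
    "www.example.org": "<h1>example.org home page</h1>",
}


def host_from_request(raw_request):
    """Return the name the client sent in the HTTP Host header, if any."""
    header_block = raw_request.split("\r\n\r\n", 1)[0]
    for line in header_block.split("\r\n")[1:]:      # skip the request line
        field, _, value = line.partition(":")
        if field.strip().lower() == "host":
            host = value.strip().lower()
            return host.split(":", 1)[0]             # drop an optional port
    return None


def choose_site(raw_request):
    """Pick the site to serve based only on the Host header."""
    return SITES.get(host_from_request(raw_request))


request = "GET / HTTP/1.1\r\nHost: www.example.org\r\n\r\n"
print(choose_site(request))
```

All three names could resolve to the same IP address; only the header value differs between requests.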
Top-level domains
Every domain name ends in a top-level domain (TLD) name, which is always either one of a small list of generic names (three or more characters) or a two-character territory code based on ISO-3166 (there are a few exceptions, and new codes are integrated case by case).
Examples of generic top-level domain (gTLD) extensions are:
- .com
- .net
- .org
- .biz
- .info
- .name
- .museum
- .travel
- .pro
- .aero
- .xxx (disapproved by ICANN)
Examples of country code top-level domain (ccTLD) extensions are:
- .au
- .eu (not an ISO-3166 code, and not a country, but used anyway for the European Union. Scheduled to be launched December 7, 2005)
- .us
- .uk (not an ISO-3166 code, but used anyway)
- .br
- .fr
- .es
- .de
- .in
- .it
- .jp
- .ca
- .nz
- .su (assigned to the Soviet Union, which no longer exists, but used anyway)
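The length convention described above (two letters for territory codes, three or more for generic names) suggests a simple heuristic for classifying a name by its last label. The function names are illustrative, and the heuristic only reflects the convention stated in the text; an authoritative check would need the actual list of delegated top-level domains.

```python
def top_level_domain(name):
    """Return the last label of a domain name, lowercased."""
    return name.rstrip(".").rsplit(".", 1)[-1].lower()


def tld_kind(name):
    """Classify a name by the length of its top-level domain (heuristic)."""
    tld = top_level_domain(name)
    if not (tld.isascii() and tld.isalpha()):
        return "unknown"
    if len(tld) == 2:
        return "country code"
    if len(tld) >= 3:
        return "generic"
    return "unknown"


for example in ("www.example.com", "www.example.museum", "www.google.co.uk"):
    print(example, "->", top_level_domain(example), "->", tld_kind(example))
```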
Official assignment
ICANN (Internet Corporation for Assigned Names and Numbers) has overall responsibility for managing the DNS. It controls the root domain, delegating control over each top-level domain to a domain name registry. For ccTLDs, the domain registry is typically controlled by the government of that country. ICANN has a consultation role in these domain registries but is in no position to regulate the terms and conditions of how a domain name is allocated or who allocates it in each of these country level domain registries. On the other hand, generic top-level domains (gTLDs) are governed directly under ICANN which means all terms and conditions are defined by ICANN with the cooperation of the gTLD registries.
Domain names which are theoretically leased can be considered in the same way as real estate, due to a significant impact on online brand building, advertising, search engine optimization, etc.
Uses and abuses
As domain names became attractive to marketers, rather than just the technical audience for which they were originally intended, they began to be used in manners that in many cases did not fit in their intended structure. As originally planned, the structure of domain names followed a strict hierarchy in which the top level domain indicated the type of organization (commercial, governmental, etc.), and addresses would be nested down to third, fourth, or further levels to express complex structures, where, for instance, branches, departments, and subsidiaries of a parent organization would have addresses which were subdomains of the parent domain. Also, hostnames were intended to correspond to actual physical machines on the network, generally with only one name per machine. However, once the World Wide Web became popular, site operators frequently wished to have memorable addresses, regardless of whether they fit properly in the structure; thus, since the .com domain was the most popular and memorable, even noncommercial sites would often get addresses under it, and sites of all sorts wished to have second-level domain registrations even if they were parts of a larger entity where a logical subdomain would have made sense (e.g., abcnews.com instead of news.abc.com). A website found at http://www.example.org will often be advertised without the "http://", and in most cases can be reached by just typing "example.org" into a web browser. In the case of a .com, the website can sometimes be reached by just typing "example" (depending on browser versions and configuration settings, which vary in how they interpret incomplete addresses). With "virtual hosting", often many domain names would point to the same physical server.
The popularity of domain names also led to uses which were regarded as abusive by established companies with trademark rights; this was known as cybersquatting, in which somebody took a name that resembled a trademark in order to profit from traffic to that address. To combat this, various laws and policies were enacted to allow abusive registrations to be forcibly transferred, but these were sometimes themselves abused by overzealous companies committing reverse domain hijacking against domain users who had legitimate grounds to hold their names, such as their being generic words as well as trademarks in a particular context, or their use in the context of fan or protest sites with free speech rights of their own.
Generic domain names — problems arising out of unregulated name selection
Within a particular top-level domain, parties are generally free to select an unallocated domain name as their own on a first come, first served basis. For generic or commonly used names, this may sometimes lead to the use of a domain name which is inaccurate or misleading. This problem can be seen with regard to the ownership or control of domain names for a generic product or service.
By way of illustration, there has been tremendous growth in the number and size of literary festivals around the world in recent years. In this context, currently a generic domain name such as literary.org is available to the first literary festival organisation which is able to obtain registration, even if the festival in question is very young or obscure. Some critics would argue that there is greater amenity in reserving such domain names for the use of, for example, a regional or umbrella grouping of festivals. Related issues may also arise in relation to non-commercial domain names.
Unconventional domain names
Due to the rarity of one-word dot-com domain names, many unconventional domain names, domain hacks, have been gaining popularity. They make use of the top-level domain as an integral part of the website's title. Two of the most visited domain hack websites are del.icio.us and blo.gs, which spell out 'delicious' and 'blogs', respectively.
Some unconventional domain names are also used to create email hacks. Non-working examples that spell 'James' are j@m.es and j@mes.com, which use the domain names m.es (of Spain's .es) and mes.com.
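Finding candidate domain hacks for a word amounts to checking which top-level domains the word ends with. The sketch below is a simplified illustration: the function name and the small TLD set passed in are assumptions, and it only places a dot before the top-level domain, so it would suggest delicio.us rather than the additional subdomain split used by del.icio.us. It says nothing about whether a name is actually available.

```python
def domain_hacks(word, tlds):
    """List names that spell `word` when the dot before the TLD is ignored."""
    word = word.lower()
    hacks = []
    for tld in sorted(tlds):
        stem = word[: -len(tld)]
        # The word must end with the TLD and leave a non-empty label.
        if word.endswith(tld) and stem:
            hacks.append(f"{stem}.{tld}")
    return hacks


print(domain_hacks("blogs", {"gs", "us", "es"}))
print(domain_hacks("delicious", {"gs", "us", "es"}))
```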
Commercial resale of domain names
An economic effect of the widespread usage of domain names has been the resale market for generic domain names that has sprung up in the last decade. Certain domains, especially those related to business, gambling, pornography, and other commercially lucrative fields, have become very much in demand among corporations and entrepreneurs due to their intrinsic value in attracting clients. In fact, the most expensive internet domain name to date, according to Guinness World Records, is business.com, which was resold in 1999 for $7.5 million. Another high-value domain name, sex.com, was stolen from its rightful owner by means of a forged transfer instruction via fax. During the height of the dot-com era, the domain was earning millions of dollars per month in advertising revenue from the large influx of visitors that arrived daily. Two long-running US lawsuits resulted, one against the thief and one against the domain registrar VeriSign[http://www.wired.com/news/business/0,1367,63142,00.html]. In one of the cases, the judge found in favor of the plaintiff, leading to an unprecedented ruling that classified domain names as property, granting them the same legal protections. In 1999, Microsoft traded the valuable name Bob.com for the name Windows2000.com, which was the name of their new operating system.[http://www.theregister.com/1999/11/11/windows2000_com_owner_sells_domain/]
One of the reasons for the value of domain names is that even without advertising or marketing, they attract clients seeking services and products who simply type in the generic name. Furthermore, generic domain names such as Rent.com or Books.com are extremely easy for potential customers to remember, increasing the probability that they become repeat customers or regular clients.
Although the current domain market is nowhere near as strong as it was during the dot-com heyday, it remains strong and is currently experiencing solid growth again. Tens of millions of dollars change hands annually through the resale of domains. Large numbers of registered domain names lapse and are deleted each year; on average, 25,000 domain names drop (are deleted) every day.
Caveat Emptor
Care should always be exercised when registering a domain name: DNS is case-insensitive, and the modern trend of running words together with intercapping can be misinterpreted when converted to lowercase. Who Represents, a database of artists and agents, chose http://www.whorepresents.com; Experts Exchange, the programmers' site, famously had http://www.expertsexchange.com; Pen Island unwisely chose http://www.penisland.net; a therapists' network thought http://www.therapistfinder.com looked good; and of course the Italian power company PowerGen Italia became http://www.powergenitalia.com.
Fortunately the dash is allowable in DNS, a fact possibly unknown to those organisations listed above.
DNS is case-insensitive, so CAMFT's website can be advertised as http://www.TherapistFinder.com (instead of http://www.therapistfinder.com).
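The effect of case-insensitivity can be sketched in a few lines of Python. This is only an illustration, not part of any DNS tooling: the function names `as_dns_sees_it` and `hyphenate_intercaps` are hypothetical, and the hyphenation rule (insert a hyphen wherever a lowercase letter is followed by a capital) is an assumption chosen for the examples above.

```python
import re


def as_dns_sees_it(display_name: str) -> str:
    """Return the all-lowercase form, which is how the name reads once case is ignored."""
    return display_name.lower()


def hyphenate_intercaps(display_name: str) -> str:
    """Hypothetical helper: put a hyphen at each lowercase-to-capital boundary,
    then lowercase the result, so the word breaks survive case folding."""
    with_hyphens = re.sub(r"(?<=[a-z])(?=[A-Z])", "-", display_name)
    return with_hyphens.lower()


if __name__ == "__main__":
    for name in ("www.TherapistFinder.com", "www.ExpertsExchange.com"):
        print(name, "->", as_dns_sees_it(name), "or", hyphenate_intercaps(name))
```

Comparing the two outputs for a candidate name before registering it shows whether the lowercase form can be misread and what the hyphenated alternative would look like.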
See also
- Uniform Resource Locator
- webpage
- website
- World Wide Web
- cname
- domain hack
- Free domain names
External links
- [http://www.dnjournal.com/ Domain Name Journal] - Covering the Domain Name Industry with Profiles and News.
- [http://www.domainnamewire.com/ Domain Name Wire] - Latest news about Domain Name Industry, domain sales, and legal issues.
- [http://www.gobin.info/domainname/ Domain Name Universe] - List of all existing Domain Name Registries, global Domain Name Search, Latest news.
- [http://www.faqs.org/rfcs/std/std13.html STD 13/RFC 1034], Domain Names—Concepts and Facilities, an Internet Protocol Standard.
- [http://www.icann.org/ ICANN] - Internet Corporation for Assigned Names and Numbers.
- [http://www.icann.org/udrp/udrp.htm UDRP], Uniform Domain-Name Dispute-Resolution Policy.
- [http://www.internic.net/ Internic.net], public information regarding Internet domain name registration services.
- [http://lifeofawebsite.com/begin/country-specific-domains.php List of Country Specific Domains]
- [http://www.circleid.com/ CircleID], Community discussions on TLDs and Internet infrastructure.
- [http://xona.com/domainhacks/ Domain Hacks] - unconventional domain name search utility
- The authoritative definition is given in the following RFCs:
- RFC 1032 - Domain administrators guide
- RFC 1033 - Domain administrators operations guide
- RFC 1034 - Domain names - concepts and facilities
- RFC 1035 - Domain names - implementation and specification
Wikis
A wiki is a type of website that allows users to add and edit content and is especially suited to collaborative authoring.
The term wiki also sometimes refers to the collaborative software itself (wiki engine) that facilitates the operation of such a website (see wiki software).
In essence, a wiki is a simplification of the process of creating HTML pages combined with a system that records each individual change that occurs over time, so that at any time, a page can be reverted to any of its previous states. A wiki system may also provide various tools that allow the user community to easily monitor the constantly changing state of the wiki and discuss the issues that emerge in trying to achieve a consensus about the wiki content.
Some wikis, notably Wikipedia, allow almost completely unrestricted access so that people are able to contribute to the site without necessarily having to undergo a process of 'registration' as had usually been required by various other types of interactive web sites such as Internet forums or chat sites.
The WikiWikiWeb is named after the "Wiki Wiki" line of Chance RT-52 buses in Honolulu International Airport. The name is based on the Hawaiian term wiki, meaning "quick", "fast", or "to hasten" [http://wehewehe.org/cgi-bin/hdict?e=q-0hdict--00-0-0--010---4----den--0-000lpm--1en-Zz-1---Zz-1-home-wiki--00031-0000escapewin-00&a=q&d=D21021 (Hawaiian dictionary)]. Sometimes wikiwiki (or Wikiwiki) is used instead of wiki [http://wehewehe.org/cgi-bin/hdict?a=q&r=1&hs=1&e=q-0hdict--00-0-0--010---4----den--0-000lpm--1en-Zz-1---Zz-1-home---00031-0000escapewin-00&q=wikiwiki&j=pm&hdid=0&hdds=0 (Hawaiian dictionary)].
Wiki is sometimes interpreted as the backronym for "What I know is", which describes the knowledge contribution, storage and exchange function.
Key characteristics
A wiki enables documents to be written collectively (co-authoring) in a simple markup using a web browser. A single page in a wiki is referred to as a "wiki page", while the entire body of pages, which are usually highly interconnected via hyperlinks, is "the wiki"; in effect, a very simple, easier-to-use database.
A defining characteristic of wiki technology is the ease with which pages can be created and updated. Generally, there is no review before modifications are accepted. Most wikis are open to the general public without the need to register any user account. Sometimes session log-in is requested to acquire a "wiki-signature" cookie for autosigning edits. More private wiki servers require user authentication. However, many edits can be made in real-time, and appear almost instantaneously online.
Pages and editing
In a traditional wiki, there are three representations for each page:
- The user-editable "source code", which is also the format stored locally on the server. It usually is plain text, made visible to the user only when the edit operation shows it in a browser form.
- A template (possibly internally generated) that defines layout and elements common to all pages.
- The rendered HTML code produced by the server on the fly from the source text when a particular page is requested.
The source format, sometimes known as "wikitext", is augmented with a simplified markup language to indicate various structural and visual conventions. A common example of one such convention is to start a line of text with an asterisk ("*") so as to mark it as an item in a bulleted list. Style and syntax can vary a great deal among implementations, some of which also allow HTML tags.
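As a rough illustration of how source text becomes rendered HTML, the following Python sketch handles only the bulleted-list convention described above. It is a toy under stated assumptions, not the renderer of any actual wiki engine: the function name `render_wikitext` is hypothetical, every non-list line is treated as its own paragraph, and all text is HTML-escaped so that raw tags in the source are not passed through.

```python
import html


def render_wikitext(source: str) -> str:
    """Render a tiny wikitext subset: lines starting with '*' become list items,
    other non-empty lines become paragraphs."""
    out = []
    in_list = False
    for line in source.splitlines():
        if line.startswith("*"):
            if not in_list:
                out.append("<ul>")
                in_list = True
            out.append(f"<li>{html.escape(line[1:].strip())}</li>")
        else:
            if in_list:
                # A non-list line ends the current bulleted list.
                out.append("</ul>")
                in_list = False
            if line.strip():
                out.append(f"<p>{html.escape(line.strip())}</p>")
    if in_list:
        out.append("</ul>")
    return "\n".join(out)


if __name__ == "__main__":
    print(render_wikitext("Shopping\n* apples\n* pears"))
```

A real engine would wrap this output in the page template and support many more conventions; the point here is only that the stored text stays readable while the HTML is generated on request.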
The reasoning behind this design is that HTML, with its many cryptic tags, is not especially human-readable. Making typical HTML source visible makes the actual text content very hard to read and edit for most users. It is therefore better to promote plain-text editing with a few simple conventions for structure and style.
It is also sometimes viewed as beneficial that users cannot directly use all the functionality that HTML allows, such as JavaScript and Cascading Style Sheets. Consistency in look and feel is also achieved, along with some extra safety for the user. In many wiki implementations, an active hyperlink is exactly as it is shown, unlike in HTML where the invisible hyperlink can have an arbitrary visible anchor text.
Some recent wiki engines use a different method: they allow "WYSIWYG" editing, usually by means of JavaScript or an ActiveX control that translates graphically entered formatting instructions such as "bold" and "italics" into the corresponding HTML tags. In those implementations, saving an edit amounts to submitting a new HTML version of the page to the server, although the user is shielded from this technical detail because the markup is generated transparently. Users who do not have the necessary plugin can generally still edit the page, usually by directly editing the raw HTML code.
Standard
While for years the de facto standard was the syntax of the original WikiWikiWeb, currently the formatting instructions vary considerably depending on the wiki engine. Simple wikis allow only basic text formatting, whereas more complex ones have support for tables, images, formulas, or even interactive elements such as polls and games. Many people switch between wiki engines. Because of the difficulty in using several syntaxes, many people are putting considerable effort into defining a wiki markup standard (see efforts by Meatball and [http://tikiwiki.org/tiki-index.php?page=RFCWiki TikiWiki]).
Linking and creating pages
Wikis are a true hypertext medium, with non-linear navigational structures. Each page typically contains a large number of links to other pages. Hierarchical navigation pages often exist in larger wikis, often a consequence of the original page creation process, but they do not have to be used. Links are created using a specific syntax, the so-called "link pattern".
Originally, most wikis used CamelCase as a link pattern, produced by capitalizing words in a phrase and removing the spaces between them (the word "CamelCase" is itself an example of CamelCase). While CamelCase makes linking very easy, it also leads to links which are written in a form that deviates from the standard spelling. CamelCase-based wikis are instantly recognizable from the large number of links with names such as "TableOfContents" and "BeginnerQuestions". Note: It is easy for a wiki to render the visible anchor for such links "pretty" by reinserting spaces, and possibly also reverting to lower case.
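The CamelCase link pattern and the "pretty" anchor mentioned in the note can be sketched with two regular expressions in Python. The exact pattern differs between wiki engines; the one below (two or more capitalized runs of letters) is an assumption for illustration, and the names `find_camel_case_links` and `pretty_anchor` are hypothetical.

```python
import re

# Assumed link pattern: at least two capitalized word parts run together.
CAMEL_CASE = re.compile(r"\b(?:[A-Z][a-z]+){2,}\b")


def find_camel_case_links(text: str) -> list:
    """Return every word in the text that matches the assumed CamelCase pattern."""
    return CAMEL_CASE.findall(text)


def pretty_anchor(link_name: str) -> str:
    """Reinsert a space at each lowercase-to-capital boundary for display."""
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", link_name)


if __name__ == "__main__":
    text = "See TableOfContents and BeginnerQuestions for help."
    for name in find_camel_case_links(text):
        print(name, "->", pretty_anchor(name))
```

Ordinary capitalized words such as "See" are not matched because the pattern requires at least two capitalized parts, which is also why a proper name written in this form would be turned into a link unintentionally.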
CamelCase has many critics, and wiki developers looked for alternative solutions. The first to introduce so-called "free links", using this _(free link format), was Cliki. Various wiki engines use single brackets, curly brackets, underscores, slashes or other characters as a link pattern.
Links across different wiki communities are possible using a special link pattern called InterWiki.
New pages in a wiki are usually created simply by creating the appropriate links on a topically related page. If the linked page does not yet exist, the link is typically emphasized as a "broken link". Following that link opens an edit window, which then allows the user to enter the text for the new page. This mechanism ensures that so-called "orphan" pages (which have no links pointing to them) are rarely created, and a generally high level of connectedness is retained.
Searching
Most wikis offer at least a title search, and sometimes a full text search. The scalability of the search depends on whether the wiki engine uses a database or not; indexed database access is necessary for high speed searches on large wikis. On Wikipedia, the so-called "Go button" allows readers to directly view a page that matches the entered search criteria as closely as possible. The MetaWiki search engine was created to enable searches across multiple wikis.
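The difference between a title search and an indexed full-text search can be sketched as follows. This is a minimal in-memory illustration in Python, not the search implementation of any particular wiki engine; the function names and the sample pages are hypothetical, and a large wiki would keep such an index in a database rather than in a dictionary.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict) -> dict:
    """Map each word to the set of page titles containing it (an inverted index)."""
    index = defaultdict(set)
    for title, body in pages.items():
        for word in tokenize(title + " " + body):
            index[word].add(title)
    return index


def title_search(pages: dict, query: str) -> list:
    """Scan titles only; no index is needed."""
    return sorted(t for t in pages if query.lower() in t.lower())


def full_text_search(index: dict, query: str) -> list:
    """Return titles of pages containing every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    return sorted(set.intersection(*(index.get(w, set()) for w in words)))


if __name__ == "__main__":
    pages = {
        "Wiki": "A website that users can edit",
        "Cadgwith": "A fishing village in Cornwall",
    }
    index = build_index(pages)
    print(title_search(pages, "wiki"))
    print(full_text_search(index, "fishing village"))
```

With the index built once, each query touches only the entries for its own words instead of rereading every page, which is the reason indexed access matters on large wikis.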
Server-side versus client-side wiki
By far the most common wiki systems are server-side (Wikipedia is a server-side wiki). In essence, the edit, display and control functions are provided on the server through the wiki engine, which renders the content into an HTML-based page for display in a web browser.
A client-side wiki system only requires the server to "serve" wiki files in much the same way as a web server allows HTML files to be retrieved using HTTP. In this type of wiki system, all the execution required to convert the underlying wiki text into an onscreen formatted display page resides in the client browser. Likewise, the editing tools and functionality reside with the browser.
The client-side wiki system parallels HTML in that the page becomes a rendering instruction for the browser to interpret.
Client-side wiki systems may be little more than a code plugin to traditional web browsers.
Controlling changes
Wikis are generally designed with the philosophy of making it easy to correct mistakes, rather than making it difficult to make them. Thus, while wikis are very open, they provide a means to verify the validity of recent additions to the body of pages. The most prominent, on almost every wiki, is the "Recent Changes" page: a list of a set number of recent edits, or of all the edits made within a given timeframe. Some wikis can filter the list to remove minor edits and edits made by automatic importing scripts ("bots").
From the change log, other functions are accessible in most wikis: the Revision History showing previous page versions; and the diff feature, highlighting the changes between two revisions. Using the Revision History, an editor can view and restore a previous version of the article. The diff feature can be used to decide whether or not this is necessary. A regular wiki user can view the diff of an edit listed on the "Recent Changes" page and, if it is an unacceptable edit, consult the history, restoring a previous revision; this process is more or less streamlined, depending on the wiki software used.
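The revision history, diff and restore operations described above can be sketched with Python's standard `difflib` module. This is a simplified model under stated assumptions, not the storage scheme of any particular wiki engine: the `Page` class and its method names are hypothetical, every revision is kept as full text, and a revert is recorded as a new revision so that the history itself is never rewritten.

```python
import difflib


class Page:
    """A wiki page that keeps every saved version of its text."""

    def __init__(self, text: str):
        self.revisions = [text]

    @property
    def current(self) -> str:
        return self.revisions[-1]

    def edit(self, new_text: str) -> None:
        """Save a new revision on top of the history."""
        self.revisions.append(new_text)

    def diff(self, old: int, new: int) -> list:
        """Return unified-diff lines between two revision numbers."""
        return list(difflib.unified_diff(
            self.revisions[old].splitlines(),
            self.revisions[new].splitlines(),
            fromfile=f"revision {old}",
            tofile=f"revision {new}",
            lineterm="",
        ))

    def revert_to(self, index: int) -> None:
        """Restore an earlier revision by saving a copy of it as the newest one."""
        self.revisions.append(self.revisions[index])


if __name__ == "__main__":
    page = Page("Cadgwith is a fishing village.")
    page.edit("Vandalised!")
    print("\n".join(page.diff(0, 1)))
    page.revert_to(0)
    print(page.current)
```

An editor would read the diff to decide whether the edit is acceptable and call `revert_to` only if it is not, mirroring the workflow from the "Recent Changes" page.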
In case unacceptable edits are missed on the "Recent Changes" page, some wiki engines provide additional content control. A page, or a set of pages, can be monitored to ensure that it keeps its quality. A person willing to maintain pages is notified of modifications to them, allowing him or her to quickly verify the validity of new edits.
Vandalism
The open philosophy of most wikis—of allowing anyone to edit content—does not ensure that editors are well intentioned. Wiki vandalism is a constant problem for wikis, though perhaps overrated. Studies from IBM have shown that most vandalism to Wikipedia is reverted in 5 minutes or less.
History
Wiki software originated in the design pattern community as a way of writing and discussing pattern languages. The WikiWikiWeb was the first wiki, established by Ward Cunningham on March 25, 1995, as a complement to the Portland Pattern Repository. [http://c2.com/cgi/wiki?WikiHistory] He invented the wiki name and concept, and implemented the first wiki engine. Some people maintain that only the original wiki should be called Wiki (upper case) or the WikiWikiWeb.
Cunningham coined the term wiki after the "wiki wiki" or "quick" shuttle buses at Honolulu Airport. Wiki wiki was the first Hawaiian term he learned on his first visit to the islands, when the airport counter agent directed him to take the wiki wiki bus between terminals. According to Cunningham, "I chose wiki-wiki as an alliterative substitute for 'quick' and thereby avoided naming this stuff quick-web." [http://c2.com/cgi/wiki?WikiHistory] See also: List of computer term etymologies.
In the late 1990s, wikis increasingly were recognized as a promising way to develop private- and public-knowledge bases, and this potential inspired the founders of the Nupedia encyclopedia project, Jimbo Wales and Larry Sanger, to use wiki technology as a basis for an electronic encyclopedia: Wikipedia was launched in January 2001; it originally was based upon UseMod software, but later switched to its own, open source codebase, now adopted by many other wikis.
In the early 2000s, wikis were increasingly adopted in the enterprise as collaborative software. Common uses included project communication, intranets and documentation, initially for technical users. In December 2002, Socialtext launched the first commercial open source wiki solution. Open source wikis such as MediaWiki, Kwiki and TWiki grew to over 1 million downloads on the Sourceforge repository by 2004. Today some companies use wikis as their only collaborative software and as a replacement for static intranets. There is arguably greater use of wikis behind firewalls than on the public internet.
In 2005, the Los Angeles Times experimented with using a wiki in the editorial section of its web site. The Wikitorial project was soon shut down after vandals defaced it; features that would have helped distribute administration of the site had been disabled.
Wiki communities
The largest wikis are listed at List of largest wikis and [http://www.usemod.com/cgi-bin/mb.pl?BiggestWiki#Biggest_wikis_by_page_count_on_July_3_2004 Meatball: Biggest wikis]. Today, the English-language Wikipedia is, by far, the world's largest wiki; the German-language Wikipedia is the second-largest, while the other Wikipedias fill many of the remaining slots. Other large wikis include the WikiWikiWeb, Wikitravel, World66 and Susning.nu, a Swedish-language knowledge base. The all-encompassing nature of Wikipedia is a significant factor in its growth, while many other wikis are highly specialized. Some also have attributed Wikipedia's rapid growth to its decision not to use CamelCase.
Many public wikis are listed at [http://www.worldwidewiki.net/wiki/SwitchWiki WorldWideWiki: SwitchWiki], which currently lists about 1000 public wiki communities (as of 2004-06-12).
One way of finding a wiki on a subject in which someone is interested is to follow the wiki-node network from wiki to wiki, or one could take a Wiki bus tour: TourBusStop.
For those interested in creating their own wiki, there are many publicly available "wiki farms", some of which can also make private, password-protected wikis. Socialtext, PeanutButterWiki, [http://seedwiki.com/ SeedWiki], [http://jotspot.com/ JotSpot], [http://communitywiki.org/odd/HomePage OddWiki], WikiCities, and [http://www.wikispaces.org/ Wikispaces] are seven such services; more at List of wiki farms. Wikipolls are also emerging. One site, [http://www.opinionrepublic.com/ Opinion Republic], is an experiment to capture public opinion and then converge on the most broadly accepted opinions.
Many wiki communities are private, particularly within enterprises as collaborative software. They are often used as internal documentation for in-house systems and applications.
For describing related wikis, there exist WikiNodes — pages on wikis describing related wikis. They are usually organized as neighbors and delegates. A neighbor wiki is simply a wiki that may discuss similar content or may otherwise be of interest. A delegate wiki is a wiki that agrees to have certain content delegated to that wiki.
See also
- Bliki
- CyborgLog
- List of wikis
- Social software
- Wiki farm
- Wiki software
- Comparison of wiki software
- WikiNote
- List of wiki software
- Massively distributed collaboration
External links
- [http://www.evowiki.org/wiki.phtml?title=Wiki_evolution EvoWiki: How wikis evolve]
- [http://www.npost.com/interview.jsp?intID=INT00126 Interview with Jimmy Wales, WikiPedia Founder]
- [http://www.freewiki.info Free Wiki: Wiki Demos, Wiki Screenshots, Wiki Info, Wiki Feeds, Wiki Links - Search for Wikis by Custom Criteria]
- [http://sharewarewiki.com SharewareWiki]
- [http://www.linuxbazis.hu/keres.php?mod=keres&hol=linkek&q=wiki Wiki pages around Linux]
- [http://computer.howstuffworks.com/wiki.htm Wikis] at HowStuffWorks.
- [http://www.wired.com/news/culture/0,1284,66382,00.html?tw=wn_tophead_2: "Information Wants to be Liquid"] — Wired magazine article
- [http://www.usemod.com/cgi-bin/mb.pl?TourBusStop "Tour bus stop.." at MeatballWiki]
- [http://www.usemod.com/cgi-bin/mb.pl?WikiCommunityList Wiki Community List]
- [http://c2.com/cgi/wiki?WikiEngines Wiki Engines]
- [http://en.wikibooks.org/wiki/Wiki_Science Wiki Science]:
- [http://wikibooks.org/wiki/Wiki_Science:How_to_start_a_Wiki How to start a wiki] (on Wikibooks) — help write the book on starting a wiki
- [http://c2.com/cgi/wiki?WelcomeVisitors WikiWikiWeb] (the first wiki)
- [http://nrg78.com/ipw-web/b2/index.php?p=23 NRG78] article discussing the role of "enterprise" wikis in capturing and managing corporate memory
Cadgwith
Cadgwith is a picturesque fishing village in Cornwall, England.
It is situated on The Lizard peninsula between Lizard and Coverack.
External links
- [http://www.cadgwith.com/ Village Web Site]