On 16 July 2007 the first version of this 100% PHP-powered search engine went live, indexing a whopping 2 pages per hour (I turn up the speed when I'm present) so as not to kill the shared hosting account in case any evil bugs are left in the code.
The total size of PHP code and HTML for this search engine is below 1000 lines, and it was written in two days, borrowing some code from a search algorithm I used in our directory. It uses an index to quickly find pages in a large database (it's not that large yet, but it is growing) and it ranks pages according to the number of times a keyword is found on the page.
The search algorithm lets you input multiple keywords separated by spaces, and it will look up the page that ranks best for all of the keywords. It's basically the page with the highest total of keyword1_score + keyword2_score + keyword3_score.
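The indexing and scoring idea described above can be sketched in a few lines. The real engine is written in PHP and its code is not shown here, so this is only a minimal Python illustration under assumptions: the function names (`tokenize`, `build_index`, `search`), the tokenization rule, and the sample pages are all hypothetical. It also assumes that a page missing one of the keywords simply scores 0 for that keyword.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words (an assumed rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an index mapping keyword -> {url: occurrences on that page}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for word, count in Counter(tokenize(text)).items():
            index[word][url] = count
    return index


def search(index, query):
    """Rank pages by keyword1_score + keyword2_score + ... for the query."""
    scores = Counter()
    for keyword in tokenize(query):
        for url, count in index.get(keyword, {}).items():
            scores[url] += count
    # Highest total first; ties are broken alphabetically by URL.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample data, not taken from the real index.
pages = {
    "a.example": "php search engine in php",
    "b.example": "search engine ranking",
    "c.example": "casual clothing",
}
index = build_index(pages)
results = search(index, "php search")
# results == [("a.example", 3), ("b.example", 1)]
```

Looking keywords up in the index means only pages that contain at least one query keyword are ever touched, which is what keeps the search quick as the database grows.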
The theories below are from the planning department and will inevitably make the code grow larger than 1 KLOC when implemented.
You can add your page to the index by going to the Add URL page and submitting your website address!
Well, everywhere people are talking about how difficult it is to produce relevant search results. I just want to prove that I can do it, and I think there is a place for human-reviewed search results.
This is going to be the world's most relevant search engine, better than Google, Yahoo and all the rest. :) Well, at least I want it to be family friendly and answer your questions.
First of all we have to ask ourselves a couple of questions. What are we searching for? A webpage? A product, image, website, manual, article, blog? We don't have to search blogs if the user wants to find a product to buy. But maybe some blogger has already found the product, and knows a phone number to call...
The questions are endless. Finding the page with the most similar keywords to the query is quite simple; finding what someone is actually looking for is a lot harder.
Relevancy is a score ranking how well a particular page answers a particular search query. It measures how closely the keyword combination searched for and the content/context of the page are related.
Importance, on the other hand, has nothing to do with how well a particular keyword combination relates to the page. It is a measure of how trustworthy, popular and stable a particular web page is.
If you have two pages both giving their opinion of SEO (search engine optimization), they may both have about the same keyword density and the same anchor texts on their incoming links (they are about the same topic anyway). Which one of these pages would you put before the other in the search results?
The answer is page importance. One page may be written by some guy who just started with the Internet and thinks he could make a heap of money teaching SEO to the world (even though he is just learning it himself); the other page is made by an SEO pro of 7 years who has an information database connected to his SEO firm's website.
In this case page importance is a lot higher for the SEO pro (in human terms); it is just a better page with more hard facts and less babble. But this leads us to a tougher question:
The old classic in this case is the PageRank™ algorithm, made and patented by Google founders Larry Page and Sergey Brin. It measures how many other pages link to a particular page and uses that score as the importance of the page. More precisely, it doesn't use the raw number of inbound links as the importance rank of a page; it uses the sum of the already calculated importances of the pages that link to this page.
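To make the "sum of already calculated importances" idea concrete, here is a small sketch of the commonly published simplified form of PageRank, computed by repeated iteration. It is not Google's production algorithm and not part of this search engine. The function name `pagerank`, the iteration count, and the sample link graph are assumptions for illustration; the damping factor of 0.85 is the value usually quoted from the original paper. In this form each page splits its importance evenly among the pages it links to.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank.

    links maps each page to the list of pages it links to.
    Returns a dict mapping page -> importance; the values sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small base importance.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Pass this page's importance on, split among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its importance
                # evenly over all pages (one common convention).
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: a and b both link to c, and c links back to a.
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
# c ends up most important, then a; b has no inbound links and ranks last.
```

Because the importance of each page depends on the importance of the pages linking to it, the values have to be recalculated over several rounds until they settle.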
This is a truly clever mathematical formula that indicates how popular a particular page is. And it worked wonders in the early days of search, when only good pages were popular. Nowadays, as PageRank™ has become the mark of popularity, webmasters seek all kinds of unholy alliances with the sole purpose of increasing their own rank. If you link to me, I'll link to you and we will both benefit. You don't need to be good as long as you're popular.
PageRank™ (or a similar unpatented algorithm) is still one factor in determining the importance of a page. But to make importance more real and harder to manipulate, we must make it depend on many more factors. Here's a list of some factors that play a part in importance.
These are some factors that tell us how important a particular web page is. But for the most part, importance is connected to the domain name that the page is located on.
Factors that affect domain importance. I would say that the factors we need to take into account have a lot to do with how much it costs to run the domain and how technically advanced the computer infrastructure surrounding it is.
This research paper by Krishna Bharat and George A. Mihaila describes a way of ranking documents by the number of expert pages on the same topic that link to them. The algorithm finds and ranks expert pages and then ranks all other pages according to the links they receive from the pages in the expert group.
Read more at Hilltop: A Search Engine based on Expert Documents
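The second half of that idea, ranking pages by the links they get from experts, can be sketched very roughly as follows. This is a heavy simplification and not the algorithm from the paper: it assumes the expert pages and their scores are already known, it ignores how link text is matched against the query, and it does not check whether experts are affiliated with each other. The function name `rank_targets`, the `min_experts` parameter and the sample data are hypothetical; the default of 2 reflects my understanding that the paper only considers pages linked to by at least two experts.

```python
from collections import defaultdict


def rank_targets(expert_scores, expert_links, min_experts=2):
    """Rank pages by the summed scores of the experts linking to them.

    expert_scores maps expert page -> score (assumed precomputed).
    expert_links maps expert page -> list of pages it links to.
    """
    supporters = defaultdict(set)
    for expert, targets in expert_links.items():
        if expert in expert_scores:  # links from non-experts are ignored
            for target in targets:
                supporters[target].add(expert)

    scores = {
        target: sum(expert_scores[expert] for expert in experts)
        for target, experts in supporters.items()
        if len(experts) >= min_experts
    }
    # Highest score first; ties are broken alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical experts e1..e3 and target pages x, y, z.
expert_scores = {"e1": 3.0, "e2": 1.0, "e3": 2.0}
expert_links = {"e1": ["x", "y"], "e2": ["x"], "e3": ["y", "z"]}
ranking = rank_targets(expert_scores, expert_links)
# ranking == [("y", 5.0), ("x", 4.0)]; z has only one expert and is dropped.
```

Requiring more than one expert makes it harder for a single friendly page to push a target into the results.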
Nutch is an open source search engine from the Apache Software Foundation, check that out here
Some great background and tech info about Google
I asked the warriors how to get money out of a search engine, and here's the answer
Article about making a full text search engine using PHP and SQL