PHP/MYSQL sorting algorythms

Desova

Limp Gawd
Joined
Jun 24, 2002
Messages
467
I'm trying to figure out the best way to search through a table in a mysql database with php.

the script is taking in a search string, containing an unknown number of words, and they are only being compared to 1 (or possibly 2 fields if its not much harder to do).

i was going to split the string into terms and ereg each term for each row, but that will only show entries which match on any term, wheras most other search systems that i've seen use all terms and sort them. so whats the best way to go about that?

compare each term, give it a score, stick it into an array and then sort it?

or is there something cunning already out there that i dont know about (which php usually has)?
 
got an interesting problem here, it seems only varchar's can be fulltext, so what if i want the description to be more than 255 characters long?
 
FULLTEXT indexes are used with MyISAM tables only and can be created from CHAR, VARCHAR, or TEXT columns at CREATE TABLE time or added later with ALTER TABLE or CREATE INDEX
Seems, from this description, that FULLTEXT indexes can be CHAR, VARCHAR or TEXT columns.
 
it wasnt giving me the option with any of the texts :s

so i changed it to a varchar and added the fulltext index,but it gives an error when theres more than 1 field being searched, but works fine when only 1 is being used:

Code:
Query: SELECT id, title FROM venue WHERE MATCH (title,description) AGAINST ('industrial')
Search failed: Can't find FULLTEXT index matching the column list

whereas both of these work:

Code:
Query: SELECT id, title FROM venue WHERE MATCH (title) AGAINST ('industrial')
Query: SELECT id, title FROM venue WHERE MATCH (description) AGAINST ('industrial')
 
gawd damn i really wish i found that before. i made this whole php based search class. ARGG :mad: guess that means i have to rewrite it

well maybe it will somehow help someone anyway:
my search class
Code:
<?php
class SQLsearch {
	var $words = array();
	var $type;
	var $id;
	var $columns = array();
	var $table;
	var $weighting = array();

	function SQLsearch($words, $type, $id, $columns, $table, $weight = 1) {
		$this->words = explode(' ',trim($words));
		$type = strtolower($type);
		switch($type) {
			case 'exact':
			case 'all'  :
			case 'any'  : $this->type = $type; break;
			default     : $this->type = 'all'; break;
		}
		$this->id      = $id;
		$this->columns = explode(',',$columns);
		$this->table   = $table;
		
		$this->weighting = explode(',',$weight);
		for ( $i = count($this->weighting);
			  $i < count($this->columns)  ; $i++ )
			 $this->weighting[$i] = 1;

		$this->doSearch();
	}
	
	function doSearch() {
		switch($this->type) {
			case 'all':
			case 'any':
			    for ($i=0; $i<count($this->words); $i++)
                    $whereQarry[] = $this->words[$i];
                break;
			case 'exact':
			    $whereQarry[] = implode(" ",$this->words);
                break;
        }

		if ($this->type == 'exact') {
	        foreach ($this->columns as $val)
			$whereQarry2[mainSql] = "!(^|\W)(".implode('\W',$whereQarry).")(\W|$)+!i";
        } else {
        	for ($i = 0; $i < count($whereQarry); $i++)
				$whereQarry2[] = "!(^|\W)(".$whereQarry[$i].")\w*(\W|$)+!i";
        }
        
        $tmpQ = "CREATE TEMPORARY TABLE `tmpScore` (".
                "`".$this->id."` INT(10) UNSIGNED NOT NULL ,".
                "`score` DECIMAL(10,3) UNSIGNED NOT NULL".
                ");";
        mysql_query($tmpQ);
        
        $tmpQ = new SQLQuery("SELECT",$this->id.','.implode(',',$this->columns),$this->table);
        $tmpR = $tmpQ->performSQL();
        
        for ($i = 0; $r = mysql_fetch_array($tmpR); $i++) {
        	$tmpScore[$i][$this->id] = $r[$this->id];
        	$tmpScore[$i][score]     = 0;
			foreach($this->columns as $val) {
				if ($this->type == 'exact') {
					$thisIndex = ($val == 'title') ? $val : 'mainSql';
	                preg_match_all($whereQarry2[$thisIndex],$r[$val],$reg);
	                $scoreCount[$val] = count($reg[0]);
                } else {
					for ($j = 0; $j < count($whereQarry2); $j++) {
						preg_match_all($whereQarry2[$j],$r[$val],$reg);
						$scoreCount[$val][] = count($reg[0]);
						unset($reg);
					}
                }
			}
			
			switch($this->type) {
				case 'exact':
                    foreach($this->columns as $count => $val)
						$tmpScore[$i][score] += $scoreCount[$val] * $this->weighting[$count];
					break;
				case 'any':
				    foreach($this->columns as $count => $val)
				        for ($j = 0; $j < count($scoreCount[$val]); $j++)
				            $tmpScore[$i][score] += $scoreCount[$val][$j] * $this->weighting[$count];
	                break;
				case 'all':
                    for ($j = 0; $j < count($scoreCount[$this->columns[0]]); $j++)
						$scoreCount[finalTally][$j] = 0;
				    foreach($this->columns as $count => $val) {
				    	for ($j = 0; $j < count($scoreCount[$val]); $j++) {
				    		$tmpScore[$i][score] += $scoreCount[$val][$j] * $this->weighting[$count];
				    		$scoreCount[finalTally][$j] += $scoreCount[$val][$j];
				    	}
					}
					for ($j = 0; $j < count($scoreCount[finalTally]); $j++)
			            if ($scoreCount[finalTally][$j] == 0) {
			                $tmpScore[$i][score] = 0;
							break;
						}
	                break;
	        }
			unset($scoreCount);
        }
        
        for ($i = 0; $i < count($tmpScore); $i++)
            mysql_query("INSERT INTO `tmpScore` (`".$this->id."`,`score`) VALUES ('".$tmpScore[$i][$this->id]."','".$tmpScore[$i][score]."');");
        $trM = mysql_query("SELECT MAX(score) from `tmpScore`;");
        list($maxScore) = mysql_fetch_array($trM);
        mysql_query("UPDATE `tmpScore` SET score = (score / $maxScore) * 100");
	}
}
?>
i don't really have time to go through it now, but if you can figure it out, more power to ya (i know it works, although i removed a section that changed title colums from "My Title, The/An/A" to "The/An/A My Title", because i use it here

of course i use my sqlquery class which you need as well here (i haven't made a page for the search class yet)
 
Originally posted by Desova
it wasnt giving me the option with any of the texts :s

so i changed it to a varchar and added the fulltext index,but it gives an error when theres more than 1 field being searched, but works fine when only 1 is being used:

Code:
Query: SELECT id, title FROM venue WHERE MATCH (title,description) AGAINST ('industrial')
Search failed: Can't find FULLTEXT index matching the column list

whereas both of these work:

Code:
Query: SELECT id, title FROM venue WHERE MATCH (title) AGAINST ('industrial')
Query: SELECT id, title FROM venue WHERE MATCH (description) AGAINST ('industrial')
in my slow attempt to start rewriting my search class, i've added the fultext inexes to the appropriate tables in my database. it appears that you have to have one index with all of the colums you want to search at once for and not create a seperate fulltext index for each. ie
Code:
sql->ALTER TABLE `whatever` ADD FULLTEXT(`col1`,`col2`,`col3`)
not add each seperately. phpmyadmin is really useful for this for me at least.

one thing i don't understand is the size field when you add a new index in phpmyadmin
 
after playing with it for a few minutes, have i mentioned how much exactly i wish i found this before !!! :eek: :( :mad: :confused:
 
have you got it working then?

i've also found a new problem, it only matches whole words and not partial words (i.e. if i search for apple, then it'll return apple and not apples since apples doesnt match).

ideas?
 
there's actually a user comment about the multiple fields thingy
Posted by Martin Glancy on August 29 2002 12:06pm [Delete] [Edit]

A clarification for those very new to mySQL:
MATCH (body,title) will only work if you have an
index on both fields together - something created
like this:
ALTER TABLE some_tbl ADD FULLTEXT (body,title);
It will not work if you have an index on each
of 'body' and 'title' separately.
from my understanding, in order to get partial words, you need to use 'IN BOOLEAN MODE' but when i try it i get an error
Error
SQL-query :

SELECT *
FROM `tng`
WHERE MATCH (

title, shortDescription, longDescription
)
AGAINST (

'+the'
IN BOOLEAN
MODE
)
LIMIT 0, 30

MySQL said:

You have an error in your SQL syntax near 'BOOLEAN MODE ) LIMIT 0, 30' at line 1
i have mysql v3.23.54
 
i read the indexing thing as well and messed it up when the description wasnt indexable, better go and try to index them again then!

i'll have another look into the partial matches sometime tommorrow
 
from the documentation page
As of Version 4.0.1, MySQL can also perform boolean full-text searches using the IN BOOLEAN MODE modifier.
ZUIT ALORS!

i don't want to have to upgrade mysql on my linux server (especially since i'm still not terribly comfortable with linux) but i guess i'll have to

..of course this page may prove helpful
 
that page was in fact helpful

this quote worked perfectly (substituting v 4.0.17 since that's what i dl'ed)
Posted by [name withheld] on October 22 2003 2:11pm [Delete] [Edit]

I've successfully upgraded MySQL on a Redhat 9 server by following these steps:

1. Download the server, client, and "Dynamic client libraries
(including 3.23.x libraries)" rpms.
2. rpm -Uvh --nodeps MySQL-server-4.0.16-0.i386.rpm
3. rpm -Uvh MySQL-shared-compat-4.0.16-0.i386.rpm
4. rpm -Uvh MySQL-client-4.0.16-0.i386.rpm
5. I had to manually kill the mysqld process and restart, but after that everything works fine, including my php code.
 
i have MySQL 4.015 so the IN BOOLEAN MODE works fine, but it still only returns full matches. i tried to put in '*term*' but still only returned the full matches (apple and not apples)
 
Originally posted by Desova
i have MySQL 4.015 so the IN BOOLEAN MODE works fine, but it still only returns full matches. i tried to put in '*term*' but still only returned the full matches (apple and not apples)
what was your more or less exact query (exact table/columns/words not necessary but basic structure)?

wouldn't 'apple%' return apples without in boolean mode since % is mysql's wildcard character?
 
Originally posted by tim
wouldn't 'apple%' return apples without in boolean mode since % is mysql's wildcard character?
it seems the wildcards are ignored in regular fulltext searching from my testing. i can get IN BOOLEAN MODE to match apple* and *apple* fine and return apples (i actually entered "apple" and "apples" in my test table).

my problem is that IN BOOLEAN MODE doesn't sort by relevence and in fact if you try to show the relevence score by doing something like
Code:
SELECT *, MATCH ( col1,col2 ) AGAINST ( 'expr' IN BOOLEAN MODE ) 
FROM `table` [WHERE MATCH ( col1,col2 ) AGAINST ( 'expr' IN BOOLEAN MODE ) ]
it only shows a 0 or 1, like either it matched or not, it doesn't show a useful score like it does normally.
 
also, the syntax of a search is
Code:
MATCH (col1,col2,...) AGAINST (expr [IN BOOLEAN MODE | WITH QUERY EXPANSION] )
but i don't see any explanation of WITH QUERY EXPANSION on the search page.

... some minutes later

ok so there is actually this paragraph
As of Version 4.1.1, full-text search supports query expansion (in particular, its variant ``blind query expansion''). It is generally useful when a search phrase is too short, which often means that the user is relying on implied knowledge that the full-text search engine usually lacks. For example, a user searching for ``database'' may really mean that ``MySQL'', ``Oracle'', ``DB2'', and ``RDBMS'' all are phrases that should match ``databases'' and should be returned, too. This is implied knowledge. Blind query expansion (also known as automatic relevance feedback) works by performing the search twice, where the search phrase for the second search is the original search phrase concatenated with the few top found documents from the first search. Thus, if one of these documents contained the word ``databases'' and the word ``MySQL'', then the second search will find the documents that contain the word ``MySQL'' but not ``database''. Another example could be searching for books by Georges Simenon about Maigret, when a user is not sure how to spell ``Maigret''. Then, searching for ``Megre and the reluctant witnesses'' will find only ``Maigret and the Reluctant Witnesses'' without query expansion, but all books with the word ``Maigret'' on the second pass of a search with query expansion. Note: because blind query expansion tends to increase noise significantly, by returning non-relevant documents, it's only meaningful to use when a search phrase is rather short.
but it doesn't seem to quite fully explain what it does
 
after looking throguh all the user comments, i found this
It should be noted in the documentation that IN
BOOLEAN MODE will almost always return a
relevance of 1.0. In order to get a relevance that is
meaningful, you'll need to:

SELECT MATCH('Content') AGAINST ('keyword1
keyword2') as Relevance FROM table WHERE MATCH
('Content') AGAINST('+keyword1 +keyword2' IN
BOOLEAN MODE) HAVING Relevance > 0.2 ORDER
BY Relevance DESC

Notice that you are doing a regular relevance query
to obtain relevance factors combined with a WHERE
clause that uses BOOLEAN MODE. The BOOLEAN
MODE gives you the subset that fulfills the
requirements of the BOOLEAN search, the relevance
query fulfills the relevance factor, and the HAVING
clause (in this case) ensures that the document is
relevant to the search (i.e. documents that score
less than 0.2 are considered irrelevant). This also
allows you to order by relevance.

This may or may not be a bug in the way that IN
BOOLEAN MODE operates, although the comments
I've read on the mailing list suggest that IN
BOOLEAN MODE's relevance ranking is not very
complicated, thus lending itself poorly for actually
providing relevant documents. BTW - I didn't notice
a performance loss for doing this, since it appears
MySQL only performs the FULLTEXT search once,
even though the two MATCH clauses are different.
Use EXPLAIN to prove this.
...will be trying it out
 
Originally posted by tim
also, the syntax of a search is
Code:
MATCH (col1,col2,...) AGAINST (expr [IN BOOLEAN MODE | WITH QUERY EXPANSION] )
but i don't see any explanation of WITH QUERY EXPANSION on the search page.

... some minutes later

ok so there is actually this paragraphbut it doesn't seem to quite fully explain what it does ...
gawd, i just can't read the whole thing at once. it clearly says it's in mysql v 4.1.* which is the alpha release of the next version
 
Originally posted by tim
also, the syntax of a search is
Code:
MATCH (col1,col2,...) AGAINST (expr [IN BOOLEAN MODE | WITH QUERY EXPANSION] )
but i don't see any explanation of WITH QUERY EXPANSION on the search page.

... some minutes later

ok so there is actually this paragraphbut it doesn't seem to quite fully explain what it does

does it not basically mean that with query expansion a search for apple will also include any results from a search with apple in it, i.e. bananas as well, but i think it means it would ignore apple and only return bananas.
 
Originally posted by tim
gawd, i just can't read the whole thing at once. it clearly says it's in mysql v 4.1.* which is the alpha release of the next version

still glad you found this now? :p
 
Originally posted by Desova
still glad you found this now? :p
i'm having mixed feelings. i really would like to be able to search like the regular search but be able to use wild cards and score modifiers like the IN BOOLEAN MODE search. i can get IN BOOLEAN MODE scores to be different from 1 but they don't vary enough to be useful. i do like that if a search term is in more than half the rows, then the rows aren't returned (hmm, i bet i could somehow add that to my search class).

when i use my search class, i'm only searching a mere ~170 rows, the largest of the possible tables i search is 1 megabyte. i'd be fascinated to see some times for a very large table
 
Back
Top