#016127: Searching with numeric values

Description:

Searching with numeric values gives very strange results.

If I search for a numeric value on an ezcSearchDocumentDefinition::STRING field, all indexed items are returned, no matter what the search term is.

If I search for a numeric value on an ezcSearchDocumentDefinition::INT field, the proper results are returned when something matches the number, but when nothing matches, all indexed items are returned.

I have not had time to test this with other field types.

I am using the Zend Lucene backend.


Environment:

Operating System:
PHP Version: 5.3.1
Database and version:
Browser (and version):


- Attachments

No attachments for this issue.


- Comments

Both issues are caused by "bugs" in Zend_Search_Lucene.

For ::INT: If a part of a query does not match anything, it is considered irrelevant, and that may cause the whole query to be reduced into a null-query, which then returns all results.

For ::STRING: The query tokenizer in ZSL just ignores numeric values. A patch for this is simple, but it should still be fixed upstream by the Zend developers. I will open bug reports for both issues upstream and link them here. The simple patch for ::STRING is:


Index: /Zend/Search/Lucene/Analysis/Analyzer/Common/Text.php
===================================================================
--- Zend/Search/Lucene/Analysis/Analyzer/Common/Text.php    (revision 19211)
+++ Zend/Search/Lucene/Analysis/Analyzer/Common/Text.php    (working copy)
@@ -75,7 +75,7 @@
 
 
         do {
-            if (! preg_match('/[a-zA-Z]+/', $this->_input, $match, PREG_OFFSET_CAPTURE, $this->_position)) {
+            if (! preg_match('/[a-zA-Z0-9]+/', $this->_input, $match, PREG_OFFSET_CAPTURE, $this->_position)) {
                 // It covers both cases a) there are no matches (preg_match(...) === 0)
                 // b) error occured (preg_match(...) === FALSE)
                 return null;
#264636 by Kore Nordmann on February 12th, 2010
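
To illustrate why the ::STRING patch above matters, here is a minimal standalone sketch, not the actual Zend_Search_Lucene tokenizer. It only assumes that tokens are whatever the pattern matches, as in the preg_match() call shown in the diff; the tokenize() helper and the sample input are hypothetical.

```php
<?php
// Hypothetical helper, not part of Zend_Search_Lucene: collects every token
// the given pattern matches, scanning the input from left to right.
function tokenize($pattern, $input)
{
    $tokens   = array();
    $position = 0;

    while (preg_match($pattern, $input, $match, PREG_OFFSET_CAPTURE, $position)) {
        $tokens[] = $match[0][0];
        // Continue scanning directly after the end of the current match.
        $position = $match[0][1] + strlen($match[0][0]);
    }

    return $tokens;
}

// Pattern before the patch: letters only, so digits never become tokens.
$original = tokenize('/[a-zA-Z]+/', 'order 42');

// Pattern after the patch: letters and digits.
$patched = tokenize('/[a-zA-Z0-9]+/', 'order 42');

echo implode(', ', $original), "\n"; // prints "order"
echo implode(', ', $patched), "\n";  // prints "order, 42"
```

With the letters-only pattern, a purely numeric search term produces no tokens at all, which fits the behaviour described in the report: the query ends up empty and every indexed item is returned.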

Issue http://framework.zend.com/issues/browse/ZF-5236 showed a workaround, so it has been fixed and tested in revision #11371.

#264638 by Kore Nordmann on February 12th, 2010

Thanks for the fix.

On a somewhat related note, wouldn't it be better to use the case-insensitive version instead?

new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8Num_CaseInsensitive()

instead of

new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8Num()

#264682 by Nathan Guse on February 15th, 2010

Reopening, as it seems nobody has noticed my reply about using Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8Num_CaseInsensitive().

#264881 by Nathan Guse on February 25th, 2010

Switched to that, even though it will only affect ASCII characters and no other Unicode characters. Still, it won't hurt, I guess.

#265110 by Kore Nordmann on March 5th, 2010

See above.

#265111 by Kore Nordmann on March 5th, 2010

- History
Properties
Type Bug
Priority Medium
Component Components » Search
Affects Unknown
Fix Version -
Reporter Nathan Guse
Responsible Kore Nordmann
Status 0 Closed
Resolution Fixed
Created February 6th, 2010
Updated March 5th, 2010
Resolved March 5th, 2010
 