Current time: 12-16-2017, 03:50 AM Hello There, Guest! (LoginRegister)

Post Reply 
 
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Database Query Search Time
08-16-2010, 09:30 AM
Post: #21
RE: Database Query Search Time
Hey Patrick,

I have only indexed a handful of products. I wanted to be sure I set the index up correctly before I indexed millions and millions of products. After I index all the products, and get the script up and running I will compare the performances and let everyone know. I have to remember though my old script was searching a database with 2.6 million products and now I have 5.6 million products and growing. I have my fingers crossed that all this work will be worth it. Will let you guys know when its all done Smile

Sincerely,
Travis Walters
Find all posts by this user
Quote this message in a reply
08-16-2010, 04:53 PM
Post: #22
RE: Database Query Search Time
Greetings,

At this time, I am going to elaborate a bit on how I setup the price facet. My solution was inspired by an article I found on the net:

http://www.lucidimagination.com/Communit...earch-Solr

I am taking a quote out of the article:

If we request field faceting on the "price" field, we get back counts for individual prices. However, we want price ranges, not individual prices. One workaround is to index another field that contains the ranges that the prices fall into (for example 100_200, 200_300, 300_400) and use field faceting on that field. A more flexible solution is to utilize query facets that provide the ability to retrieve facet counts for arbitrary queries.

Let's assume that we have an indexed "price" field and we want to get the facet counts for the following ranges of prices: $100 or less, $100-$200, $200-$300, $300-$400, $400-$500, and $500 or over. We simply add a facet.query command to our query request for each desired range


I want two drop down lists of prices on our product search page containing the following values for minimum and maximum price:
0,10,20,30,40,50,75,100,150,200,250,300,350,400,450,500,600,700,800,900,1000,125​0,1500,1750,2000,3000,4000,5000,1000000000

Let's say we have a product whose price is $549.99. With the custom2 field, I can setup price ranges that this product falls into:

02p 102p 202p 302p 402p 502p 752p 1002p 1502p 2002p 2502p 3002p 3502p 4002p 4502p 5002p p2600 p2700 p2800 p2900 p21000 p21250 p21500 p21750 p22000 p23000 p24000 p25000

The value "2" separates the price and p. If I want to search for products with prices between 300 and 750 dollars, I can set the criteria like this: criteria="custom2:p2750 and custom2:3002p"

I tried different delimiters in place of the "2" above only to find that all results were returned to me instead of filtered results. I even tried to leave a delimiter out to save some space in the cfcollection. I am not sure if something was being cached or what but I did not retrieve the results I expected. However, when I leave the "2" in there, I get exactly what I want. I have tested this many times on a cfcollection of 100 products. I have the mindset on this case if it is not broke do not fix it or improve it for that matter.

I believe that this method is more difficult than it should be. I do not see why there can not be a way we can just store prices in a custom field and have some sort of range search within the coldfusion cfsearch wrapper. I am a bit disappointed with that.

My initials thoughts on SOLR so far is that it seems very touchy when using cfsearch. I find documentation hard to find when it comes to coldfusion and SOLR. Adobe has some basic tutorials but nothing advanced like price facets. I found that the skillset for coldfusion and SOLR development really hard to find.

I am not sure if anyone else has this problem. For cfsearch, if I try to set the "type" parameter to any value thats in the documentation, coldfusion throws an error. I am not sure if this is a problem with coldfusion in general or if an error occurred when updating my server. I intend to look into this more.

I am going to play around with categories / subcategories next. After that, I should be able to import all products into the cfcollection. Then I can see how fast SOLR is compared to just using CFQUERY.

I will release a bit of code as well to help people that come across this. Will do that after the script is completed.

Sincerely,
Travis Walters
Find all posts by this user
Quote this message in a reply
08-16-2010, 06:08 PM
Post: #23
RE: Database Query Search Time
Hello Again,

One thing about the SOLR searches is that there are only four custom fields. The Coldfusion documentation states that custom1 through custom4 do not work with the SOLR engine. However, that is not true.

I have more than four fields I want to do narrow down the results with. With custom4, I am setting this field to do boolean searches on different variables.

PRODUCT_VALIDATED = Has the product been validated by the admin staff?
PRODUCT_PHOTO = Does this product have a local image?
PRODUCT_GOOGLE_IMAGE_URL = Does this product have an external image?

All I need is one sequence of numbers for each boolean variable I want to do searches with. Because I have three boolean variables, I need three sequences of numbers.

The first number in the sequence will be 122 or 022. The second number in the sequence will be 212 or 202. The last number in the sequence will be 221 or 220.

To find products that have been validated and have photos on the local server I can set the criteria to this:

criteria="custom4:122 AND custom4:212"

To find products that have been validated and have photos on an external server I can set the criteria to this:

criteria="custom4:122 AND custom4:221"

More to come...

Sincerely,
Travis Walters
Find all posts by this user
Quote this message in a reply
08-16-2010, 08:43 PM
Post: #24
RE: Database Query Search Time
Hey Guys,

The following code uses cfcollection and cfindex to create and populate a SOLR collection of data. 10,000 products are added to the collection every 90 seconds. The script is called from a scheduled task setup using the coldfusion administrator.

Code:
<cfsilent>

  <cfset myOutput = '<p>START: #NOW()#</p>'>

  <cfsetting requestTimeOut = "6000">

  <cfset collectionName = 'products_id'>
  <cfset collectionPath = 'F:\collections\'>

  <cfcollection action="list" name="COLLECTION_LIST" engine="solr">

  <cfquery name="COLLECTION_CHECK" dbtype="query">
  SELECT * FROM COLLECTION_LIST  
  WHERE COLLECTION_LIST.NAME = <cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="#collectionName#">
  </cfquery>  

  <cfif #COLLECTION_CHECK.RecordCount# NEQ 0>

    <cfset myOutput = '#myOutput#<p>Collection Exists</p>'>

    <cfquery datasource="#dsnName#" name="PRODUCT_LIST">
    SELECT TOP 10000 PRODUCT.PRODUCT_ID, PRODUCT.PRODUCT_PRIORITY_ID, PRODUCT.BUSINESS_ID, PRODUCT.PRODUCT_NAME,
    PRODUCT.PRODUCT_PRICE, PRODUCT.PRODUCT_DESCRIPTION, PRODUCT.PRODUCT_PHOTO, PRODUCT.PRODUCT_KEYWORD1,
    PRODUCT.PRODUCT_KEYWORD2, PRODUCT.PRODUCT_KEYWORD3, PRODUCT.PRODUCT_KEYWORD4, PRODUCT.PRODUCT_GOOGLE_IMAGE_URL,
    PRODUCT.PRODUCT_VALIDATED
    FROM PRODUCT
    WHERE PRODUCT.PRODUCT_INDEXED = <cfqueryparam cfsqltype="CF_SQL_NUMERIC" value="0">
    </cfquery>
    
    <cfset myOutput = '#myOutput#<p>#PRODUCT_LIST.RecordCount# Product(s) Need Updated</p>'>
    
    <cfset priceRangeArray = ArrayNew(1)>
    <cfset productBoolArray = ArrayNew(1)>
    <cfset productSubcatIDArray = ArrayNew(1)>
  
    <cfif #PRODUCT_LIST.RecordCount# NEQ 0>

      <cfset priceList = '0,10,20,30,40,50,75,100,150,200,250,300,350,400,450,500,600,700,800,900,1000'>
      <cfset priceList = '#priceList#,1250,1500,1750,2000,3000,4000,5000,1000000000'>
    
      <cfset priceArray = '#ListToArray(priceList)#'>
      <cfset loopLength = #ArrayLen(priceArray)# - 1>
  
      <cfloop index="i" from="1" to="#PRODUCT_LIST.RecordCount#">
            
        <cfset productGooglePhoto = '#PRODUCT_LIST.PRODUCT_GOOGLE_IMAGE_URL[i]#'>
        <cfset productValidated = '#PRODUCT_LIST.PRODUCT_VALIDATED[i]#'>
        <cfset productPhoto = '#PRODUCT_LIST.PRODUCT_PHOTO[i]#'>
        <cfset productPrice = '#PRODUCT_LIST.PRODUCT_PRICE[i]#'>  
        <cfset productID = '#PRODUCT_LIST.PRODUCT_ID[i]#'>

        <cfset priceRangeArray[i] = ''>      
        <cfset productBoolArray[i] = ''>
        <cfset productSubcatIDArray[i] = ''>
      
        <cfquery datasource="#dsnName#" name="SUBCAT_LIST">
        SELECT ASSIGN_SUBCAT_PRODUCT.PRODUCT_SUBCATEGORY_ID
        FROM ASSIGN_SUBCAT_PRODUCT
        WHERE ASSIGN_SUBCAT_PRODUCT.PRODUCT_ID = <cfqueryparam cfsqltype="CF_SQL_NUMERIC" value="#productID#">
        </cfquery>      
        
        <cfif #SUBCAT_LIST.RecordCount# NEQ 0>
        
          <cfloop index="j" from="1" to="#SUBCAT_LIST.RecordCount#">
          
            <cfset subcatID = '#SUBCAT_LIST.PRODUCT_SUBCATEGORY_ID[j]#'>
          
            <cfif #ListLen(productSubcatIDArray[i])# NEQ 0>
              <cfset productSubcatIDArray[i] = '#productSubcatIDArray[i]#,#subcatID#'>
            <cfelse>
              <cfset productSubcatIDArray[i] = '#subcatID#'>
            </cfif>
          
          </cfloop>
        
        </cfif>
        
        <cfif #productValidated# EQ 1>        
          <cfset productBoolArray[i] = '122'>                
        <cfelse>
          <cfset productBoolArray[i] = '022'>
        </cfif>

        <cfif #productGooglePhoto# NEQ "">        
          <cfset productBoolArray[i] = '#productBoolArray[i]# 212'>                
        <cfelse>
          <cfset productBoolArray[i] = '#productBoolArray[i]# 202'>
        </cfif>

        <cfif #productPhoto# NEQ "">        
          <cfset productBoolArray[i] = '#productBoolArray[i]# 221'>                
        <cfelse>
          <cfset productBoolArray[i] = '#productBoolArray[i]# 220'>
        </cfif>

        <cfloop index="j" from="1" to="#loopLength#">
        
          <cfset myPrice = '#priceArray[j]#'>
          <cfset loopFrom = #i#+j>
          
          <cfif #productPrice# GTE #myPrice#>

            <cfif #ListLen(priceRangeArray[i])# NEQ 0>
              <cfset priceRangeArray[i] = '#priceRangeArray[i]# #myPrice#2p'>
            <cfelse>
              <cfset priceRangeArray[i] = ' #myPrice#2p'>
            </cfif>
          
          </cfif>
          
          <cfif #productPrice# LTE #myPrice#>

            <cfif #ListLen(priceRangeArray[i])# NEQ 0>
              <cfset priceRangeArray[i] = '#priceRangeArray[i]# p2#myPrice#'>
            <cfelse>
              <cfset priceRangeArray[i] = ' p2#myPrice#'>
            </cfif>
          
          </cfif>          
                                    
        </cfloop>
                
      </cfloop>
    
    </cfif>
  
    <cfset nColumnNumber = QueryAddColumn(PRODUCT_LIST,"PRICE_RANGE","VarChar",priceRangeArray)>    
    <cfset nColumnNumber2 = QueryAddColumn(PRODUCT_LIST,"BOOL_VALUES","VarChar",productBoolArray)>  
    <cfset nColumnNumber3 = QueryAddColumn(PRODUCT_LIST,"PRODUCT_SUBCATEGORIES","VarChar",productSubcatIDArray)>  
    
    <cfset myColumns = 'PRODUCT_NAME,PRODUCT_DESCRIPTION,PRODUCT_KEYWORD1,PRODUCT_KEYWORD2'>
    <cfset myColumns = '#myColumns#,PRODUCT_KEYWORD3,PRODUCT_KEYWORD4'>  

    <cfindex action="update" collection="#collectionName#" query="PRODUCT_LIST" type="custom" key="PRODUCT_ID"
    body="#myColumns#" title="PRODUCT_NAME" custom1="BUSINESS_ID" custom2="PRICE_RANGE" custom3="PRODUCT_PRIORITY_ID"
    custom4="BOOL_VALUES" category="PRODUCT_SUBCATEGORIES">

    <cfif #PRODUCT_LIST.RecordCount# NEQ 0>
  
      <cfloop index="i" from="1" to="#PRODUCT_LIST.RecordCount#">
  
        <cfset productID = '#PRODUCT_LIST.PRODUCT_ID[i]#'>
      
        <cfquery datasource="#dsnName#" maxrows="1">
        UPDATE PRODUCT
        SET PRODUCT.PRODUCT_INDEXED = <cfqueryparam cfsqltype="CF_SQL_NUMERIC" value="1">
        WHERE PRODUCT.PRODUCT_ID = <cfqueryparam cfsqltype="CF_SQL_NUMERIC" value="#productID#">
        </cfquery>
    
      </cfloop>
  
    </cfif>
      
  <cfelse>

    <cfset myOutput = '#myOutput#<p>Collection Does Not Exist</p>'>
    
    <cfcollection action="create" categories="yes" collection="#collectionName#" engine="solr" path="#collectionPath#">
    
    <cfset myOutput = '#myOutput#<p>Collection Created</p>'>

  </cfif>

  <cfset myOutput = '#myOutput#<p>END: #NOW()#</p>'>

</cfsilent>

<cfoutput>#myOutput#</cfoutput>

I plan to add some code so that the script can not be called while one is already executing. Besides that, I think the script should be pretty solid. This should take about 19 hours to execute for about 5.6 million products. After that, I should be able to tell you how well the cfsearch performs in comparison with the cfquery search.

Sincerely,
Travis Walters
Find all posts by this user
Quote this message in a reply
08-17-2010, 01:14 AM
Post: #25
RE: Database Query Search Time
since you are adding 10,000 products in a batch, i recommend you optimize after each batch... http://wiki.apache.org/solr/UpdateXmlMes...ptimize.22

cdnplanet.com: We help you select the right CDN
Visit this user's website Find all posts by this user
Quote this message in a reply
08-17-2010, 11:55 AM
Post: #26
RE: Database Query Search Time
Hey Guys,

If anyone is working with MS SQL Server, I recommend creating a view like this:

Code:
SELECT     PRODUCT_ID, PRODUCT_INDEXED
FROM         dbo.PRODUCT
WHERE     (PRODUCT_INDEXED = 0)

For faster execution, the PRODUCT_LIST cfquery can be updated to this:

Code:
<cfquery datasource="#dsnName#" name="PRODUCT_LIST">
SELECT PRODUCT.PRODUCT_ID, PRODUCT.PRODUCT_PRIORITY_ID, PRODUCT.BUSINESS_ID, PRODUCT.PRODUCT_NAME, PRODUCT.PRODUCT_PRICE,
PRODUCT.PRODUCT_DESCRIPTION, PRODUCT.PRODUCT_PHOTO, PRODUCT.PRODUCT_KEYWORD1, PRODUCT.PRODUCT_KEYWORD2, PRODUCT.PRODUCT_KEYWORD3, PRODUCT.PRODUCT_KEYWORD4, PRODUCT.PRODUCT_GOOGLE_IMAGE_URL, PRODUCT.PRODUCT_VALIDATED
FROM PRODUCT
WHERE PRODUCT.PRODUCT_ID IN (SELECT TOP 5000 PRODUCT_ID_COLLECTION.PRODUCT_ID FROM PRODUCT_ID_COLLECTION)
</cfquery>

The change above allows the products that need indexed to be gathered more quickly. Without this change, performance degrades as more and more products are indexed.

To optimize a collection as suggested above, the following code can be used:

Code:
<cfcollection action="optimize" collection="#collectionName#">

I think performance suffers a little as the collection grows larger as well. Not sure if anyone else has experienced this while indexing?

Sincerely,
Travis Walters
Find all posts by this user
Quote this message in a reply
08-17-2010, 07:20 PM
Post: #27
RE: Database Query Search Time
Hey There,

Has anybody come across this error before:

Error_opening_new_searcher_exceeded_limit_of_maxWarmingSearchers4_try_again_late​r

request:
http://localhost:8983/solr/products_id/update?commit=true&waitFlush=false&waitSearcher=false&wt=javabin
MIME-Version: 1.0
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-Mailer: ColdFusion 9 Application Server


Seems the server is throwing a lot of these errors as it indexes the collection. Indexing is going a lot slower than expected.

Sincerely,
Travis Walters
Find all posts by this user
Quote this message in a reply
08-17-2010, 09:24 PM (This post was last modified: 08-17-2010 09:53 PM by green-watch.org.)
Post: #28
RE: Database Query Search Time
Hello Again,

I found an important security bulletin about coldfusion 9 and SOLR:
http://www.adobe.com/support/security/bu...10-04.html

Without taking care of this update, the SOLR collections are exposed and risk being downloaded through external URLs. Basically, the data could be stolen right from your collections.

Update: I found some information on the error I found listed above concerning the "maxWarmingSearchers":
http://www.coldfusionjedi.com/index.cfm/...n-the-bomb

The guy claims its due to a faulty collection. Possible reasons could include:

#1 - The script could have executed while the same script was already executing. This could have occurred because the script got called every 90 seconds. When the view was not being used for that one query, it caused the script performance to degrade. In my next attempt, this issue will be taken care of.

#2 - Sajal mentioned he would optimize the collection frequently after batch inserts. I am not sure if this would help or if it is something I could do once all products have been inserted. I am going to avoid this for now.

Sincerely,
Travis Walters
Find all posts by this user
Quote this message in a reply
08-18-2010, 07:20 PM
Post: #29
RE: Database Query Search Time
Hey Guys,

I think it is an important note for SOLR users to understand when they add, edit, or delete products in their database, they should also update the collection.

To delete a product from the collection, you can do something similar to this:

Code:
<cfindex collection="#collectionName#" action="delete" key="#URL.productID#" type="custom">

To add or edit products in the collection, the script I posted in this thread will do that. However, when you edit a product in the source database, you should set the PRODUCT_INDEXED value to 0. When you add a product, the PRODUCT_INDEXED should also be set to 0.

The "maxWarmingSearchers" error is no longer appearing now that I am repopulating the cfcollection.

After the mass product add takes place, it may take several minutes for the cfcollection doccount to update. I set the time interval between massive adds to 4 minutes. This is a bit slower but the script is executing better. I have about 2 million products in the collection right now and 3.6 million to go.

Sincerely,
Travis Walters
Find all posts by this user
Quote this message in a reply
08-18-2010, 11:20 PM (This post was last modified: 08-18-2010 11:49 PM by green-watch.org.)
Post: #30
RE: Database Query Search Time
Hello Again,

To improve performance with SOLR, there is a small tweak that can be made that is described here:
http://help.adobe.com/en_US/ColdFusion/9...-8000.html

This increases the amount of memory that SOLR can use, which leads to better performance if you have a system with a bit of available memory.

One of my next goals is as follows: My custom3 field contains a "PRODUCT_PRIORITY_ID", which indicates how well the product ranks in terms of "greenness". If I can use SOLR to order by the custom3 field, then results should be sorted by greenness.

Edit: It would also be nice if there were a way to specify which columns should be returned in the query when using cfsearch. I find the summary field useless and would rather display my own description. Having it return this column wastes memory that I could be using to retrieve even more results.

Sincerely,
Travis Walters
Find all posts by this user
Quote this message in a reply
Post Reply 


Forum Jump:


User(s) browsing this thread: 1 Guest(s)