Current time: 12-13-2017, 12:11 AM Hello There, Guest! (LoginRegister)

Post Reply 
 
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Database Query Search Time
07-04-2010, 11:31 AM
Post: #1
Database Query Search Time
Hey There,

Let's say we have a database table with over 50 million entries in it.

We want to return the top 1000 results for some query with this table.

SELECT TOP 1000 attempts to return the top 1000 results for whatever query parameters are set against the table.

The query may execute fast for common entries such as "diamonds", "shoes", etc. However, for uncommon entries it may have to traverse the entire table looking for results. When this occurs, the user is setting there waiting for results that may return much later or in a worse case, perhaps there are no results.

To keep the user from waiting forever, I tried passing a "timeout" parameter to the query. When the timeout is exceeded, an exception is thrown and no results are returned.

Is there anyway to setup a query timeout and instead of an exception being thrown, return the results if there are any for that execution timeout duration?

I am going to have a huge product directory and I need to make my searches more efficient and user friendly. Thanks for any advice.

Sincerely,
Travis Walters
Find all posts by this user
Quote this message in a reply
07-04-2010, 12:03 PM
Post: #2
RE: Database Query Search Time
Have you considered using a different technology for the index? Not sure how structured your data is but SOLR (http://lucene.apache.org/solr/) might be an option if you're doing free-text searches.
Visit this user's website Find all posts by this user
Quote this message in a reply
07-04-2010, 12:20 PM (This post was last modified: 07-04-2010 12:30 PM by jarrod1937.)
Post: #3
RE: Database Query Search Time
Instead of trying to deal with the symptoms, i'd work on the causes of them. But before anyone can help you, we'll need a bit more info. What dbms are you using? Whats your table structure like (all info if possible)? What is the exact query you're running to get the results?
I've seen some programmers try to create these crazy complex systems to attempt to deal with such slow queries (including a timeout period that kills the query) and usually the slow down is caused by an inefficient/incomplete setup on the database end. I should clarify that this is "usually" the case, at 50 million records you may have to start looking at advanced database management methods like horizontal partitioning/database sharding (though i'd need to see your database structure to see if this is applicable).
Find all posts by this user
Quote this message in a reply
07-04-2010, 01:10 PM
Post: #4
RE: Database Query Search Time
Hey Guys,

I am using MS SQL Server 2005 as my DBMS.

You can see the structure for this table here:
http://www.green-watch.org/temp/ProductT...ucture.png

I created a view to run queries against. That structure is here:
http://www.green-watch.org/temp/ProductTableView.png

On the PRODUCT table, I only have the PRODUCT_ID as the primary index.

On the PRODUCT_GREENEST_VIEW table, I have the following indexes:
PRODUCT_PRIORITY_ID (clustered)
BUSINESS_ID (non-unique, non-clustered)
PRODUCT_ID (unique, non-clustered)

As far as the query goes, here is part of the code:

Code:
<cfif #timeoutOccurred# EQ 0>

  <cftry>

    <cfset innerSelectRows = #currentPage# * 10>

    <cfif #currentPage# EQ #maxPage#>
      <cfset selectRows = #MAX_PRODUCT# - (#innerSelectRows#-10)>
    <cfelse>
      <cfset selectRows = 10>
    </cfif>
        
    <cfquery datasource="#dsnName#" name="PRODUCT_LIST" timeout="#searchTimeout#">
    SELECT * FROM (
        SELECT TOP #selectRows# * FROM (
            SELECT DISTINCT TOP #innerSelectRows# #URL.TABLE_NAME#.PRODUCT_ID, #URL.TABLE_NAME#.PRODUCT_PRICE, #URL.TABLE_NAME#.PRODUCT_PRIORITY_ID
            FROM #URL.TABLE_NAME#
            
            LEFT JOIN ASSIGN_SUBCAT_PRODUCT ON ASSIGN_SUBCAT_PRODUCT.PRODUCT_ID = #URL.TABLE_NAME#.PRODUCT_ID
            LEFT JOIN PRODUCT_SUBCATEGORY ON PRODUCT_SUBCATEGORY.PRODUCT_SUBCATEGORY_ID = ASSIGN_SUBCAT_PRODUCT.PRODUCT_SUBCATEGORY_ID
            LEFT JOIN PRODUCT_CATEGORY ON PRODUCT_SUBCATEGORY.PRODUCT_CATEGORY_ID = PRODUCT_CATEGORY.PRODUCT_CATEGORY_ID        
            
            WHERE #URL.TABLE_NAME#.PRODUCT_VALIDATED = <cfqueryparam cfsqltype="CF_SQL_NUMERIC" value="1">
    
            <cfif #URL.businessID# NEQ -1>
            AND #URL.TABLE_NAME#.BUSINESS_ID = <cfqueryparam cfsqltype="CF_SQL_NUMERIC" value="#URL.businessID#">
            </cfif>
    
            <cfif #URL.MIN_PRICE# NEQ -1>
            AND #URL.TABLE_NAME#.PRODUCT_PRICE >= <cfqueryparam cfsqltype="CF_SQL_NUMERIC" value="#URL.MIN_PRICE#">
            </cfif>
            
            <cfif #URL.MAX_PRICE# NEQ -1>
            AND #URL.TABLE_NAME#.PRODUCT_PRICE <= <cfqueryparam cfsqltype="CF_SQL_NUMERIC" value="#URL.MAX_PRICE#">
            </cfif>

            <cfif #URL.SEARCH_FOR# NEQ "">
            AND (CONTAINS(#URL.TABLE_NAME#.PRODUCT_NAME, <cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="#URL.SEARCH_FOR#">)
            OR CONTAINS(#URL.TABLE_NAME#.PRODUCT_DESCRIPTION, <cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="#URL.SEARCH_FOR#">)
            OR CONTAINS(#URL.TABLE_NAME#.PRODUCT_KEYWORD1, <cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="#URL.SEARCH_FOR#">)
            OR CONTAINS(#URL.TABLE_NAME#.PRODUCT_KEYWORD2, <cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="#URL.SEARCH_FOR#">)
            OR CONTAINS(#URL.TABLE_NAME#.PRODUCT_KEYWORD3, <cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="#URL.SEARCH_FOR#">)
            OR CONTAINS(#URL.TABLE_NAME#.PRODUCT_KEYWORD4, <cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="#URL.SEARCH_FOR#">))
            </cfif>
    
            <cfif #URL.CAT_ID# NEQ -1>
            AND PRODUCT_CATEGORY.PRODUCT_CATEGORY_ID IN (<cfqueryparam value="#URL.CAT_ID#" list="true" cfsqltype="CF_SQL_NUMERIC">)
            </cfif>
        
            <cfif #URL.SUBCAT_ID# NEQ -1>
            AND PRODUCT_SUBCATEGORY.PRODUCT_SUBCATEGORY_ID IN (<cfqueryparam value="#URL.SUBCAT_ID#" list="true" cfsqltype="CF_SQL_NUMERIC">)
            </cfif>
            
            AND (#URL.TABLE_NAME#.PRODUCT_PHOTO IS NOT NULL OR #URL.TABLE_NAME#.PRODUCT_GOOGLE_IMAGE_URL IS NOT NULL)
        
            ORDER BY #URL.ORDER_BY2#) AS FOO
        ORDER BY #URL.ORDER_BY3#) AS BAR
    ORDER BY #URL.ORDER_BY2#
    </cfquery>
    
    <cfcatch>

      <cfset maxPage = 0>
      <cfset URL.END_ROW = 0>
      <cfset timeoutOccurred = 1>
    
    </cfcatch>
    
  </cftry>
  
</cfif>

There is code before this that gets the max rows, sets the correct variables, checks input against sql injection, etc.

I am using a full-text index for the SEARCH_FOR field.

I am not familiar with SOLR. I am on a Windows Server currently.

Sincerely,
Travis Walters
Find all posts by this user
Quote this message in a reply
07-05-2010, 04:35 AM
Post: #5
RE: Database Query Search Time
SOLR is an XML REST interface into the lucene full-text indexer. It's all on JAVA and should work fine on Windows. While you can use the full-text capabilities of a RDBMS I've generally found it a LOT faster to actually use a product designed for full-text indexing and searching.

We have an internal tool that runs on a postgres back-end (a pretty big one) that used to do it's own full-text searching as well and it had a lot of the kinds of issues you're talking about (in addition to just being generally slow even when it was successful). We moved the full-text searching to SOLR/Lucene and things got a LOT better.

Thanks,

-Pat
Visit this user's website Find all posts by this user
Quote this message in a reply
07-05-2010, 05:03 AM
Post: #6
RE: Database Query Search Time
+1 for SOLR/Lucene

Usually I cant stand Java stuff.... but i make an exception for SOLR and Lucene. the "Related stories" feature on my site is powered by SOLR using Lucene's MoreLikeThis functionality.

cdnplanet.com: We help you select the right CDN
Visit this user's website Find all posts by this user
Quote this message in a reply
07-05-2010, 08:02 AM (This post was last modified: 07-05-2010 08:18 AM by green-watch.org.)
Post: #7
RE: Database Query Search Time
Thanks once again for the advice Patrick.

Selling products should be one of my most profitable features on my website so I want to make sure everything is super fast.

I just wish everything was done. It feels like I have been working on my website forever Smile

I will look into the SOLR tomorrow. Think I am going to go enjoy the 4th. Have fun everyone!
I did a quick search on SOLR with Coldfusion.

http://blog.crankybit.com/notes-max09-wh...dfusion-9/

"Search engine. ColdFusion 9 now includes Apache Solr. Verity is still included, but so is Solr, which is actually even faster than Verity. So you may choose to upgrade to Solr! They even provide a Verity-to-Solr migration utility."

I have never tried Verity either.

I know my server has coldfusion 8 installed on it so I guess I will have to figure out when our hosting provider plans to upgrade.
Find all posts by this user
Quote this message in a reply
07-07-2010, 06:10 PM
Post: #8
RE: Database Query Search Time
I may have found a way to get SOLR working with Coldfusion versions 7 and 8.

http://solcoldfusion.riaforge.org/
http://svn.riaforge.org/solcoldfusion/tr...fusion.cfc

I have not looked at their library yet but since it interacts with the database, that is probably a good idea.

I am still hoping the hosting provider can upgrade us to coldfusion 9 because of the built-in interaction with SOLR that I have read about.

Also looks like coldfusion 9 has better cache control as well.
Find all posts by this user
Quote this message in a reply
07-08-2010, 11:16 AM
Post: #9
RE: Database Query Search Time
Looks like the hosting provider can upgrade us to coldfusion 9.

Can I still use MS SQL Server to store my data or does it all have to be transferred to SOLR some how?

Probably an easy question for some one that has used it before Smile
Find all posts by this user
Quote this message in a reply
07-08-2010, 08:47 PM
Post: #10
RE: Database Query Search Time
Not sure about the coldfusion integration but in general you would keep the data in your existing database and SOLR would just be a full-text index for searching against.
Visit this user's website Find all posts by this user
Quote this message in a reply
Post Reply 


Forum Jump:


User(s) browsing this thread: 1 Guest(s)