Posted By

jatkins on 04/04/11


Tagged

database sql search ranking retrieval tf-idf relevance


Versions (?)

Very simple search retrieval query, ranking with tf-idf weight


 / Published in: MySQL
 

This assumes two tables: search_results and keyword_counts. search_results is an inverted index with, at minimum, fields for the keyword indexed, its occurrences, and the indexed item's id in the table it is in (e.g. item_id in search_results could be 45, referencing id = 45 in documents) in the table it's in. keyword_counts has a row for each indexed item with the total number of unique words in it and its id in the table the actual item is in.

Released into the public domain. This may have bugs in it, but it's worked so far for me.

  1. SELECT id, keyword, item_id, item_type, created_by, EXTRACT, extract_start, extract_end, SUM(occurrences/(SELECT keyword_count FROM keyword_counts WHERE item_id = t.item_id AND created_by = [creator user id goes here] AND item_type = t.item_type) * LOG((SELECT COUNT(1) FROM search_results)/(1+(SELECT COUNT(1) FROM search_results WHERE '.[keyword 'OR' statements go here].')))) AS keyword_density_product FROM (SELECT * FROM search_results WHERE ('.[keyword 'OR' statements go here].') AND created_by = [creator user id goes here] AND item_type = [item type, e.g. 'document', goes here]) t GROUP BY item_id ORDER BY keyword_density_product DESC LIMIT [pagination LIMIT goes here] OFFSET [pagination OFFSET goes here];

Report this snippet  

You need to login to post a comment.