This assumes two tables:
search_results is an inverted index with, at minimum, fields for the keyword indexed, its occurrences, and the indexed item's id in the table it is in (e.g.
search_results could be
id = 45 in
documents) in the table it's in.
keyword_counts has a row for each indexed item with the total number of unique words in it and its id in the table the actual item is in.
Released into the public domain. This may have bugs in it, but it's worked so far for me.
- SELECT id, keyword, item_id, item_type, created_by, EXTRACT, extract_start, extract_end, SUM(occurrences/(SELECT keyword_count FROM keyword_counts WHERE item_id = t.item_id AND created_by = [creator user id goes here] AND item_type = t.item_type) * LOG((SELECT COUNT(1) FROM search_results)/(1+(SELECT COUNT(1) FROM search_results WHERE '.[keyword 'OR' statements go here].')))) AS keyword_density_product FROM (SELECT * FROM search_results WHERE ('.[keyword 'OR' statements go here].') AND created_by = [creator user id goes here] AND item_type = [item type, e.g. 'document', goes here]) t GROUP BY item_id ORDER BY keyword_density_product DESC LIMIT [pagination LIMIT goes here] OFFSET [pagination OFFSET goes here];
You need to login to post a comment.