Posted By

noah on 06/27/09


regex test String match perl fuzzy approximate approx matching



Fuzzy string matching with Perl

 / Published in: Perl


Fuzzy string matches with Jarkko Hietaniemi's String::Approx module.

It finds approximate matches: strings that are close to, but not exactly, what you asked for. This is useful when filenames might contain misspellings, extra underscores, or other typos. It also helps when searching for files in a project that mixes several naming conventions.

Mainly I am concerned with matching strings that have underscores inserted (or deleted) in arbitrary places. But the result I came up with here does a pretty good job of matching all sorts of typos without picking up too many false positives.

  use strict;
  use warnings;
  use String::Approx 'amatch';
  use Test::More 'no_plan';

  # Return true if the given string approximately matches "homer_simpson".
  sub fuzm {
      local $_ = shift;    # amatch tests $_ when no input list is given
      # scalar() keeps the result a single true/false value inside ok(...)
      return scalar amatch("homer_simpson", [    # this array sets match options:
          "i",      # match case-insensitively
          "10%",    # tolerate up to 1 character in 10 being wrong
          "S0",     # but no substituting one character for another
          "D1",     # although, tolerate up to one deletion
          "I2"      # and tolerate up to two insertions
      ]);
  }

  ok(fuzm("homer_simpson"), "exact match for 'homer_simpson'");
  ok(fuzm("homersimpson"), "still matches without the underscore");
  ok(fuzm("homers_impson"), "putting the underscore in a different place still matches");
  ok(fuzm("ho_mer_simpson"), "an extra underscore still matches");
  ok(fuzm("ho_mer_simp_son"), "2 extra underscores still match");
  ok(!fuzm("ho_mersimp_son"), "2 underscores, both in the wrong places, don't match");
  ok(!fuzm("ho_mer_sim_ps_on"), "3 extra underscores don't match");
  ok(!fuzm("homer____simpson"), "3 extra underscores in a row don't match");
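
For the filename-searching use case mentioned earlier, the same options can be applied to a whole list at once. What follows is only a minimal sketch: the filenames are hypothetical and invented for illustration, and it relies on amatch returning the matching inputs when it is called in list context with an input list. Which of the near-miss names actually get through depends on the modifiers, so no particular output is claimed here.

```perl
use strict;
use warnings;
use String::Approx 'amatch';

# Hypothetical filenames, made up for this example only.
my @filenames = (
    'homer_simpson.txt',
    'homersimpson.txt',
    'marge_simpson.txt',
    'bart.txt',
);

# Same match options as in fuzm() above.
my @modifiers = ('i', '10%', 'S0', 'D1', 'I2');

# In list context, amatch returns the inputs that approximately match.
my @hits = amatch('homer_simpson', \@modifiers, @filenames);

print "$_\n" for @hits;
```

Tightening or loosening the I, D and S limits is the easiest way to trade missed files against false positives for a given naming mess.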
