I have the following 2 strings, which actually mean the same thing:

GLOVES: LENGTH: 32 CM MATERIAL: NEOPRENE RUBBER FREE FLOW TEXT: RESISTANT TO WIDE RANGE OF GLOVES, TYPE: CHEMICAL RESISTANT, SIZE: 7, MATERIAL: NEOPRENE RUBBER, STANDARD: BS EN 388/BS EN 374, FFT: RESISTANT TO WIDE RANGE OF CHEMICALS SUCH AS ETHYLENE OXIDE IDEAL FOR LONG TERM HEAVY WORK IN CHEMICAL ENVIRONMENT MANUFACTURER REFERENCES: ORIGINAL_MNFR: POLYCO

and

Neoprene Rubber Chemical Resistant Gloves, Size: 7; Length: 32 cm; Standard: BS EN 388; Resistant to wide range of Chemicals such as Ethylene Oxide. Make: Polyco, Model: Duraprene III or Equivalent

I have more than 1,000 pairs like this, and matching them by hand would be a nightmare. I tried the following:

```go
package main

import (
	"fmt"
	"strings"
)

func main() {
	var str1 = "Neoprene Rubber Chemical Resistant Gloves, Size: 7; Length: 32 cm; Standard: BS EN 388; Resistant to wide range of Chemicals such as Ethylene Oxide. Make: Polyco, Model: Duraprene III or Equivalent"
	var str2 = "GLOVES: LENGTH: 32 CM MATERIAL: NEOPRENE RUBBER FREE FLOW TEXT: RESISTANT TO WIDE RANGE OF GLOVES, TYPE: CHEMICAL RESISTANT, SIZE: 7, MATERIAL: NEOPRENE RUBBER, STANDARD: BS EN 388/BS EN 374, FFT: RESISTANT TO WIDE RANGE OF CHEMICALS SUCH AS ETHYLENE OXIDE IDEAL FOR LONG TERM HEAVY WORK IN CHEMICAL ENVIRONMENT MANUFACTURER REFERENCES: ORIGINAL_MNFR: POLYCO"

	// Count every pair of whitespace-separated words (one from each string)
	// that match case-insensitively.
	cnt := 0
	for _, i := range strings.Fields(str1) {
		for _, j := range strings.Fields(str2) {
			if strings.ToLower(i) == strings.ToLower(j) {
				cnt += 1
			}
		}
	}

	// len reports the byte length of each string, not its word count.
	fmt.Printf("str1 is: %d length, and str2 is: %d length, they have; %d common words.", len(str1), len(str2), cnt)
}
```

But the match is poor. I got:

str1 is: 197 length, and str2 is: 358 length, they have; 29 common words.

The distance between them also looks large. I got:

Distance between str1 and str2: 304

Any idea how to improve this?
1 Answer
一只萌萌小番薯
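The two strings may describe the same thing, but you are comparing them with algorithms that have no understanding of that.
For example, Levenshtein distance only measures the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. It works just as well on "The quick brown fox jumped over the lazy gray dog" and "Dlkj adlkjll o824hs aldkj ladhfj adlbcvhiuywe". It knows nothing about vocabulary or grammar.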
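To make the point concrete, here is a minimal sketch of the classic dynamic-programming Levenshtein computation. The function names `levenshtein` and `min3` are hypothetical and chosen only for this illustration; this is not the code the asker ran. Note that it compares runes one by one and never looks at what the words mean.

```go
package main

import "fmt"

// min3 returns the smallest of three ints (hypothetical helper).
func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

// levenshtein returns the minimum number of single-rune insertions,
// deletions, and substitutions needed to turn a into b.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	// prev holds the previous row of the DP table, curr the row being filled.
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // turning "" into b[:j] takes j insertions
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i // turning a[:i] into "" takes i deletions
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(
				prev[j]+1,      // delete from a
				curr[j-1]+1,    // insert into a
				prev[j-1]+cost, // substitute, or keep if equal
			)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

func main() {
	fmt.Println(levenshtein("kitten", "sitting")) // 3
	// Two phrases with the same meaning can still be many edits apart.
	fmt.Println(levenshtein("bright red house", "shiny rose-colored dwelling"))
}
```

The score depends only on how the characters line up, so two differently worded descriptions of the same product can end up far apart.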
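By contrast, no amount of string processing will recognize that "the bright red house standing before me" and "before me is a shiny rose-colored dwelling" describe the same thing.
You need to look into natural language processing (NLP) algorithms. These are not simple to use and take some skill. I'm not an NLP expert; I'd suggest starting with a search for golang nlp and going from there.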
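Short of real NLP, a purely lexical baseline might still help triage the 1,000+ pairs. The sketch below is an assumption on my part, not something the answer proposes: it lowercases both strings, splits on anything that is not a letter or digit (so "Size:" and "SIZE" compare equal), and computes the Jaccard similarity of the two sets of distinct tokens. The names `tokenSet` and `jaccard` are hypothetical. It still cannot match synonyms, which is exactly the limitation described above.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// tokenSet lowercases s, splits it on every rune that is not a letter or
// digit, and returns the set of distinct tokens.
func tokenSet(s string) map[string]struct{} {
	fields := strings.FieldsFunc(strings.ToLower(s), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	set := make(map[string]struct{}, len(fields))
	for _, f := range fields {
		set[f] = struct{}{}
	}
	return set
}

// jaccard returns |A ∩ B| / |A ∪ B| for the token sets of a and b.
// Two strings with no tokens at all are treated as identical (1.0).
func jaccard(a, b string) float64 {
	sa, sb := tokenSet(a), tokenSet(b)
	if len(sa) == 0 && len(sb) == 0 {
		return 1
	}
	common := 0
	for t := range sa {
		if _, ok := sb[t]; ok {
			common++
		}
	}
	union := len(sa) + len(sb) - common
	return float64(common) / float64(union)
}

func main() {
	// Same tokens, different case, order, and punctuation.
	fmt.Printf("%.2f\n", jaccard("Size: 7; Length: 32 cm", "LENGTH: 32 CM, SIZE: 7")) // 1.00
	// Same meaning, no shared words: the lexical score is zero.
	fmt.Printf("%.2f\n", jaccard("bright red house", "shiny rose dwelling")) // 0.00
}
```

Because each distinct word is counted once, repeated words such as MATERIAL or RESISTANT do not inflate the score the way a nested-loop pair count does.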