For example, when I use

    unicode_string = u"Austro\u002dHungarian_gulden"
    unicode_string.encode("ascii", "ignore")

it gives the following output:

    'Austro-Hungarian_gulden'

But I am working with a txt file that contains a set of data like this:

    Austria\u002dHungary Austro\u002dHungarian_gulden
    Cocos_\u0028Keeling\u0029_Islands Australian_dollar
    El_Salvador Col\u00f3n_\u0028currency\u0029
    Faroe_Islands Faroese_kr\u00f3na
    Georgia_\u0028country\u0029 Georgian_lari

I also have to process this data with regular expressions in Python, so I wrote the script below. However, the script fails to replace the Unicode escape values in the strings with the appropriate characters. For instance:

    '\u002d' has appropriate character '-'
    '\u0028' has appropriate character '('
    '\u0029' has appropriate character ')'

The script used to process the text file:

    import re
    import collections

    def extract():
        filename = raw_input("Enter file Name:")
        in_file = file(filename,"r")
        out_file = file("Attribute.txt","w+")
        for line in in_file:
            values = line.split("\t")
            if values[1]:
                str1 = ""
                for list in values[1]:
                    list = re.sub("[^\Da-z0-9A-Z()]","",list)
                    list = list.replace('_',' ')
                    out_file.write(list)
                    str1 += list
            out_file.write(" ")
            if values[2]:
                str2 = ""
                for list in values[2]:
                    list = re.sub("[^\Da-z0-9A-Z\n]"," ",list)
                    list = list.replace('"','')
                    list = list.replace('_',' ')
                    out_file.write(list)
                    str2 += list
            s1 = str1.lstrip()
            s1 = str1.rstrip()
            s2 = str2.lstrip()
            s2 = str2.rstrip()
            print s1+s2

The expected output for the given data is:

    Austria-Hungary Austro-Hungarian gulden
    Cocos (Keeling) Islands Australian dollar
    El Salvador Coln (currency)
    FaroeIslands Faroese krna
    Georgia (country) Georgian lari

How can I do this?
1 Answer

有只小跳蛙
Use decode("unicode_escape") to convert the input to Unicode, then use encode() to convert the output to the encoding of your choice.
>>> r"Austro\u002dHungarian_gulden".decode("unicode_escape")
u'Austro-Hungarian_gulden'
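
The snippet above is Python 2, where a byte string has a decode() method. As a rough sketch of the same idea for Python 3, and since the question asks for regular expressions, the escapes can also be replaced with re.sub and a callback. The names unescape and clean_line are hypothetical helpers introduced here for illustration, and the tab-separated layout is an assumption based on the split("\t") call in the question's script.

```python
import re

# Matches a literal backslash, the letter "u", and exactly four hex digits,
# for example the six characters \u002d as they appear in the text file.
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def unescape(text):
    # Hypothetical helper: swap each escape sequence for the character it names.
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def clean_line(line):
    # Hypothetical helper: assumes tab-separated fields, as in the question's script.
    fields = line.rstrip("\n").split("\t")
    # Unescape first, then turn underscores into spaces.
    return [unescape(field).replace("_", " ") for field in fields]


print(unescape(r"Austro\u002dHungarian_gulden"))
print(clean_line("Cocos_\\u0028Keeling\\u0029_Islands\tAustralian_dollar\n"))
```

Doing the unescape step before any other regex cleanup means the later patterns see real characters such as '-' and '(' rather than backslash sequences. Note that this keeps accented characters like the ó in Colón; the expected output in the question drops them, which is what encode("ascii", "ignore") would do afterwards. For input that is pure ASCII, text.encode("ascii").decode("unicode_escape") should behave like the answer's one-liner in Python 3.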