Iterating over rows to determine counts of specific words

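I'm having trouble iterating over the rows of a pandas DataFrame. For each row (each contains a string), I need to determine the count of each punctuation mark in the string and the number of uppercase letters.

For the first point, I tried the following on a single string, to see whether the approach would also work for a DataFrame:

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

t = "Have a non-programming question?"
t_low = t.lower()
stop_words = set(stopwords.words('english'))
word_tokens = word_tokenize(t_low)
m = [w for w in word_tokens if w not in stop_words]

# the same filter written as an explicit loop
m = []
for w in word_tokens:
    if w not in stop_words:
        m.append(w)

Then, after tokenizing, I counted the tokens:

import string
from collections import Counter

c = Counter(word_tokens)
for x in string.punctuation:
    print(x, c[x])

For the second point, I applied the following to the sentence:

sum(1 for c in t if c.isupper())

However, this only works on a single string. I have a pandas DataFrame that looks like this:

Text
"Have a non-programming question?"
More helpful LINK!
Show SOME CODE...
and so on...

How can I apply the code above to get the same information for every row? Any help would be great. Thanks.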


1 Answer

米琪卡哇伊


You can do this by applying a lambda function to the DataFrame column:

import string

def Capitals(strng):
    # number of uppercase characters in the string
    return sum(1 for c in strng if c.isupper())

def Punctuation(strng):
    # total number of punctuation characters in the string
    return sum(1 for c in strng if c in string.punctuation)

df['Caps'] = df['name'].apply(lambda x: Capitals(x))
df['Punc'] = df['name'].apply(lambda x: Punctuation(x))

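Caps is a new column holding the number of uppercase letters in each row. Punc is a new column holding the total number of punctuation marks in each row. name is the column containing the strings being tested (Text in the question's DataFrame).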
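The question also asks for the count of each punctuation mark separately, which a single total does not give. The following is a minimal sketch of one way to get that, not part of the original answer: it assumes the column is called Text as in the question, and the helper name punctuation_counts and the variables per_mark and result are made up for illustration.

```python
import string
from collections import Counter

import pandas as pd

# Sample rows copied from the question; 'Text' is the column name used there.
df = pd.DataFrame({"Text": [
    "Have a non-programming question?",
    "More helpful LINK!",
    "Show SOME CODE...",
]})

def punctuation_counts(text):
    """Return a dict mapping each punctuation mark found in text to its count."""
    # Hypothetical helper, introduced only for this sketch.
    return dict(Counter(ch for ch in text if ch in string.punctuation))

# Uppercase letters per row, same idea as Capitals in the answer.
df["Caps"] = df["Text"].apply(lambda s: sum(1 for ch in s if ch.isupper()))

# Build one column per punctuation mark that occurs anywhere in the data.
# Marks absent from a row come out as NaN, so fill them with 0.
per_mark = pd.DataFrame(
    df["Text"].apply(punctuation_counts).tolist(), index=df.index
).fillna(0).astype(int)

result = df.join(per_mark)
print(result)
```

Only marks that actually appear in the data get a column. If a fixed set of columns is wanted, per_mark could be reindexed against list(string.punctuation) with a fill value of 0.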

2023-07-18
