Filtering a CSV file using function arguments

Python

慕运维8079593 2023-09-26 14:10:14

I'm writing a function that filters a CSV file based on the function's arguments and then finds the average of one column after filtering. I'm only allowed to use import csv (no pandas) and can't use lambda or any other Python "advanced" shortcuts. I feel I can get the averaging part easily, but I'm having trouble filtering on the parameters under these constraints. I would normally use pandas, which makes this much easier, but I can't. Here is my code:

def calc_avg(self, specific, filter, logic, threshold):
    with open(self.load_data, 'r') as avg_file:
        for row in csv.DictReader(avg_file, delimiter= ','):
            specific = row[specific]
            filter = int(row[filter])
            logic = logic
            threshold = 0
            if logic == 'lt':
                filter < threshold
            elif logic == 'gt':
                filter > threshold
            elif logic == 'lte':
                filter <= threshold
            elif logic == 'gte':
                filter >= threshold

It should work with this command:

print(csv_data.calc_avg("Length_of_stay", filter="SOFA", logic="lt", threshold="15"))

That is the code and the format of the column headers. Sample data:

RecordID  SAPS-I  SOFA  Length_of_stay
132539    6       1     5
132540    16      8     8
132541    21      11    19
132545    17      2     4
132547    14      11    6
132548    14      4     9
132551    19      8     6
132554    11      0     17

2 Answers

狐的传说

Contributed 1804 experience points, received more than 3 likes

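Update

This option evaluates logic once and returns a compare function that can be used while iterating over the rows. It will be faster when the data has many rows.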

import csv
import io

# written as a function because you don't share the definition of load_data
# but the main idea can be translated to a class
def calc_avg(self, specific, filter, logic, threshold):
    if isinstance(threshold, str):
        threshold = float(threshold)

    def lt(a, b): return a < b
    def gt(a, b): return a > b
    def lte(a, b): return a <= b
    def gte(a, b): return a >= b

    # pick the comparison function once, before reading any rows
    if logic == 'lt': compare = lt
    elif logic == 'gt': compare = gt
    elif logic == 'lte': compare = lte
    elif logic == 'gte': compare = gte
    else: raise ValueError('unknown logic: ' + logic)

    with io.StringIO(self) as avg_file: # change to open an actual file
        running_sum = running_count = 0
        for row in csv.DictReader(avg_file, delimiter=','):
            if compare(int(row[filter]), threshold):
                running_sum += int(row[specific])
                # or float(row[specific])
                running_count += 1

    if running_count == 0:
        # not even one row passed the filter
        return 0
    else:
        return running_sum / running_count

# data is the sample CSV string defined in the initial answer below
print(calc_avg(data, 'Length_of_stay', 'SOFA', 'lt', '15'))
print(calc_avg(data, 'Length_of_stay', 'SOFA', 'lt', '2'))
print(calc_avg(data, 'Length_of_stay', 'SOFA', 'lt', '0'))

Output

9.25
11.0
0
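As a variation on the idea of choosing the comparison once, the if/elif chain can be swapped for a dictionary lookup. This is only a sketch, not part of the original answer: the names calc_avg_from_text, COMPARISONS, filter_col and sample are made up for illustration, and it assumes the CSV content is passed in as a string. It uses plain named functions, so it stays within the "no lambda" constraint.

```python
import csv
import io

# Plain named functions, one per supported comparison (no lambda needed).
def less_than(a, b):
    return a < b

def greater_than(a, b):
    return a > b

def less_or_equal(a, b):
    return a <= b

def greater_or_equal(a, b):
    return a >= b

# Hypothetical lookup table from the logic string to its function.
COMPARISONS = {
    'lt': less_than,
    'gt': greater_than,
    'lte': less_or_equal,
    'gte': greater_or_equal,
}

def calc_avg_from_text(text, specific, filter_col, logic, threshold):
    # Resolve the comparison once, before looping over the rows.
    if logic not in COMPARISONS:
        raise ValueError('unknown logic: ' + logic)
    compare = COMPARISONS[logic]
    threshold = float(threshold)

    running_sum = 0.0
    running_count = 0
    for row in csv.DictReader(io.StringIO(text), delimiter=','):
        if compare(float(row[filter_col]), threshold):
            running_sum += float(row[specific])
            running_count += 1

    if running_count == 0:
        return 0  # no row passed the filter
    return running_sum / running_count

# Three rows taken from the sample data in the question.
sample = (
    "RecordID,SAPS-I,SOFA,Length_of_stay\n"
    "132539,6,1,5\n"
    "132540,16,8,8\n"
    "132554,11,0,17\n"
)
print(calc_avg_from_text(sample, 'Length_of_stay', 'SOFA', 'gte', '1'))  # (5 + 8) / 2 = 6.5
```

Whether this is preferable to the if/elif chain is a matter of taste; the dictionary mainly keeps the list of supported operators in one place.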

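Initial answer

To filter the rows, you have to actually perform the comparison once you have determined which kind of inequality to use. The code here stores the result in a boolean, include.

You can then keep two variables, running_sum and running_count, and divide one by the other at the end to return the average.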

import io
import csv

# written as a function because you don't share the definition of load_data
# but the main idea can be translated to a class
def calc_avg(self, specific, filter, logic, threshold):
    if isinstance(threshold, str):
        threshold = float(threshold)

    with io.StringIO(self) as avg_file: # change to open an actual file
        running_sum = running_count = 0
        for row in csv.DictReader(avg_file, delimiter=','):
            # your code has: filter = int(row[filter])
            value = int(row[filter]) # avoid overwriting parameters

            if logic == 'lt' and value < threshold:
                include = True
            elif logic == 'gt' and value > threshold:
                include = True
            elif logic == 'lte' and value <= threshold: # should it be 'le'
                include = True
            elif logic == 'gte' and value >= threshold: # should it be 'ge'
                include = True
            # note: ast.literal_eval cannot collapse these cases into one line;
            # it only evaluates literals, not comparisons
            else:
                include = False

            if include:
                running_sum += int(row[specific])
                # or float(row[specific])
                running_count += 1

    if running_count == 0:
        # no row passed the filter; avoid dividing by zero
        return 0
    return running_sum / running_count

data = """RecordID,SAPS-I,SOFA,Length_of_stay
132539,6,1,5
132540,16,8,8
132541,21,11,19
132545,17,2,4
132547,14,11,6
132548,14,4,9
132551,19,8,6
132554,11,0,17"""

print(calc_avg(data, 'Length_of_stay', 'SOFA', 'lt', '15'))
print(calc_avg(data, 'Length_of_stay', 'SOFA', 'lt', '2'))

Output

9.25

11.0

Replied 2023-09-26

陪伴而非守候

Contributed 1757 experience points, received more than 8 likes

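You aren't doing anything with the results of the comparisons. You need to use them in if statements so that the specific value is included in the average calculation.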

import csv

def calc_avg(self, specific, filter, logic, threshold):
    # the caller passes the threshold as a string, e.g. "15"
    threshold = float(threshold)
    values = []
    with open(self.load_data, 'r') as avg_file:
        for row in csv.DictReader(avg_file, delimiter=','):
            # use new names so the column-name parameters are not overwritten
            value = float(row[specific])
            filter_value = int(row[filter])
            if logic == 'lt' and filter_value < threshold:
                values.append(value)
            elif logic == 'gt' and filter_value > threshold:
                values.append(value)
            elif logic == 'lte' and filter_value <= threshold:
                values.append(value)
            elif logic == 'gte' and filter_value >= threshold:
                values.append(value)
    if len(values) > 0:
        return sum(values) / len(values)
    else:
        return 0
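To see how a method like this fits the call shown in the question, here is a minimal, self-contained sketch. The class name CsvData, its constructor, and the temporary file are assumptions made for the demo; the question does not show how load_data is defined, so this treats it as the path to the CSV file.

```python
import csv
import os
import tempfile

class CsvData:  # hypothetical class name; the question only shows the method
    def __init__(self, load_data):
        # assumption: load_data holds the path of the CSV file
        self.load_data = load_data

    def calc_avg(self, specific, filter, logic, threshold):
        threshold = float(threshold)  # "15" arrives as a string
        values = []
        with open(self.load_data, 'r', newline='') as avg_file:
            for row in csv.DictReader(avg_file, delimiter=','):
                filter_value = float(row[filter])
                if logic == 'lt':
                    keep = filter_value < threshold
                elif logic == 'gt':
                    keep = filter_value > threshold
                elif logic == 'lte':
                    keep = filter_value <= threshold
                elif logic == 'gte':
                    keep = filter_value >= threshold
                else:
                    raise ValueError('unknown logic: ' + logic)
                if keep:
                    values.append(float(row[specific]))
        if len(values) == 0:
            return 0  # nothing passed the filter
        return sum(values) / len(values)

# Write the sample rows from the question to a temporary CSV file.
rows = [
    'RecordID,SAPS-I,SOFA,Length_of_stay',
    '132539,6,1,5', '132540,16,8,8', '132541,21,11,19', '132545,17,2,4',
    '132547,14,11,6', '132548,14,4,9', '132551,19,8,6', '132554,11,0,17',
]
handle, demo_path = tempfile.mkstemp(suffix='.csv')
with os.fdopen(handle, 'w', newline='') as demo_file:
    demo_file.write('\n'.join(rows) + '\n')

csv_data = CsvData(demo_path)
result = csv_data.calc_avg("Length_of_stay", filter="SOFA", logic="lt", threshold="15")
os.remove(demo_path)  # clean up the temporary file
print(result)  # all 8 rows pass: 74 / 8 = 9.25
```

Raising ValueError for an unknown logic string is a design choice of this sketch; silently returning 0 would also work but hides typos such as "le" in place of "lte".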

Replied 2023-09-26
