
Creating multiple dictionaries from 4 lists using regular expressions

Python

一只斗牛犬 2023-08-22 16:02:57

I have the following txt file:

197.109.77.178 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554
156.127.178.177 - okuneva5222 [21/Jun/2019:15:45:27 -0700] "DELETE /interactive/transparent/niches/revolutionize HTTP/1.1" 416 14701
100.32.205.59 - ortiz8891 [21/Jun/2019:15:45:28 -0700] "PATCH /architectures HTTP/1.0" 204 6048
168.95.156.240 - stark2413 [21/Jun/2019:15:45:31 -0700] "GET /engage HTTP/2.0" 201 9645
71.172.239.195 - dooley1853 [21/Jun/2019:15:45:32 -0700] "PUT /cutting-edge HTTP/2.0" 406 24498
180.95.121.94 - mohr6893 [21/Jun/2019:15:45:34 -0700] "PATCH /extensible/reinvent HTTP/1.1" 201 27330

I want to create a function that converts these lines into multiple dictionaries, where each line becomes one dictionary:

example_dict = {"host":"146.204.224.152", "user_name":"feest6811", "time":"21/Jun/2019:15:45:24 -0700", "request":"POST /incentivize HTTP/1.1"}

So far I have managed to build 4 lists covering all the items, but I don't know how to create a dictionary for each line:

import re

def logs():
    with open("assets/logdata.txt", "r") as file:
        logdata = file.read()
    host = (re.findall('(.*?)\-',logdata))
    username = re.findall('\-(.*?)\[',logdata)
    time = re.findall('\[(.*?)\]', logdata)
    request = re.findall('\"(.*?)\"',logdata)
    #for line in range(len(logdata)):
        #dc = {'host':host[line], 'user_name':user_name[line], 'time':time[line], 'request':request[line]}


5 Answers

慕斯709654

Contributed 1840 experience points, received more than 5 likes

The following snippet produces a list of dictionaries, one dictionary for each line in the log file.

import re

def parse_log(log_file):
    # Groups: host, user name, timestamp, request.
    # The quotes sit outside the last group so the request value matches
    # the unquoted form shown in the question's example_dict.
    regex = re.compile(r'^([0-9\.]+) - (.*) \[(.*)\] "(.*)"')

    def _extract_field(match_object, tag, index, result):
        # Only store the field when the group captured something
        if match_object[index]:
            result[tag] = match_object[index]

    result = []
    with open(log_file) as fh:
        for line in fh:
            match = regex.search(line)
            if match:
                fields = {}
                _extract_field(match, 'host', 1, fields)
                _extract_field(match, 'user_name', 2, fields)
                _extract_field(match, 'time', 3, fields)
                _extract_field(match, 'request', 4, fields)
                result.append(fields)
    return result

def main():
    result = parse_log('log.txt')
    for line in result:
        print(line)

if __name__ == '__main__':
    main()
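To see what a per-line pattern of this shape captures, here is a minimal sketch that runs it against one sample line from the question instead of a file. The names `LINE_RE`, `sample`, `keys`, and `record` are made up for this illustration, and the sketch assumes the variant of the pattern with the quotes placed outside the last group, so the request value carries no surrounding quotes.

```python
import re

# Assumed variant of the pattern: the quotes are outside the last group
LINE_RE = re.compile(r'^([0-9.]+) - (.*) \[(.*)\] "(.*)"')

# One line copied from the sample data in the question
sample = ('197.109.77.178 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] '
          '"DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554')

match = LINE_RE.search(sample)

# Pair the four positional groups with the key names the question asks for
keys = ('host', 'user_name', 'time', 'request')
record = dict(zip(keys, match.groups()))
print(record)
```

Pairing `match.groups()` with a tuple of key names is just one way to build the dictionary; named groups would do the same job inside the pattern itself.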

Posted 2023-08-22

料青山看我应如是

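Contributed 1772 experience points, received more than 8 likes

I'm taking this course right now, and the answer I came up with is:

import re

def logs():
    with open("assets/logdata.txt", "r") as file:
        logdata = file.read()

    # YOUR CODE HERE
    # Raw string so the backslash escapes reach the regex engine unchanged;
    # with re.VERBOSE, "\ " stands for a literal space.
    pattern = r'''
    (?P<host>[\w.]*)
    (\ -\ )
    (?P<user_name>([a-z\-]*[\d]*))
    (\ \[)
    (?P<time>\w.*?)
    (\]\ \")
    (?P<request>\w.*)
    (\")
    '''
    lst = []
    for item in re.finditer(pattern, logdata, re.VERBOSE):
        # groupdict() returns only the named groups
        lst.append(item.groupdict())
    print(lst)
    return lst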
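The named-group approach can be tried without the course file. The sketch below is a variant written for illustration, not the answer's exact pattern: `LOG_PATTERN` and `parse_text` are hypothetical names, the separators are written as plain literals rather than extra groups, and the input is an in-memory string built from two sample lines in the question.

```python
import re

# Verbose pattern with one named group per field; [ ] is a literal space
LOG_PATTERN = re.compile(r'''
    (?P<host>[\w.]+)        # host name or IP address
    [ ]-[ ]                 # literal " - " separator
    (?P<user_name>\S+)      # user name
    [ ]\[                   # space and opening bracket
    (?P<time>[^\]]+)        # timestamp, up to the closing bracket
    \][ ]"                  # closing bracket, space, opening quote
    (?P<request>[^"]*)      # request line, up to the closing quote
    "                       # closing quote
''', re.VERBOSE)


def parse_text(text):
    """Return one dict per log line found in text."""
    return [m.groupdict() for m in LOG_PATTERN.finditer(text)]


sample_text = (
    '197.109.77.178 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] '
    '"DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554\n'
    '156.127.178.177 - okuneva5222 [21/Jun/2019:15:45:27 -0700] '
    '"DELETE /interactive/transparent/niches/revolutionize HTTP/1.1" 416 14701\n'
)

records = parse_text(sample_text)
for rec in records:
    print(rec)
```

Because `groupdict()` only reports named groups, the keys of each dictionary are exactly the four names used in the pattern.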

Posted 2023-08-22

跃然一笑

Contributed 1826 experience points, received more than 6 likes

Using str.split() and str.index() also works, with no need for regular expressions. In addition, you can iterate over the file handle directly, which yields one line at a time, so you don't have to load the entire file into memory:

from pprint import pprint

result = []
with open('logdata.txt') as f:
    for line in f:
        # Isolate host and user_name, discarding the dash in between
        host, _, user_name, remaining = line.split(maxsplit=3)
        # Find the end of the datetime and isolate it
        end_bracket = remaining.index(']')
        time_ = remaining[1:end_bracket]
        # Take the text between the double quotes, which drops the
        # trailing status code, size, and newline
        request = remaining[end_bracket + 1:].split('"')[1]
        # Create the dictionary
        result.append({
            'host': host,
            'user_name': user_name,
            'time': time_,
            'request': request
        })

pprint(result)
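The split-based idea can also be wrapped in a small function and fed from any line iterator. This is a sketch under assumptions: `parse_line` and `parse_lines` are hypothetical names, `io.StringIO` stands in for the open file so the example runs on its own, and the request is taken as the text between the double quotes.

```python
import io


def parse_line(line):
    """Split one log line into the four fields without regular expressions."""
    # The first three whitespace-separated tokens are host, "-", user name
    host, _, user_name, remaining = line.split(maxsplit=3)
    # remaining starts with "[", so skip it and cut at the closing bracket
    time_, _, after_time = remaining[1:].partition(']')
    # The request is the text between the first pair of double quotes
    request = after_time.split('"')[1]
    return {'host': host, 'user_name': user_name,
            'time': time_, 'request': request}


def parse_lines(lines):
    """Lazily parse any iterable of lines, such as an open file."""
    for line in lines:
        if line.strip():  # skip blank lines
            yield parse_line(line)


# StringIO stands in for the file handle; it also yields one line at a time
fake_file = io.StringIO(
    '168.95.156.240 - stark2413 [21/Jun/2019:15:45:31 -0700] '
    '"GET /engage HTTP/2.0" 201 9645\n'
    '180.95.121.94 - mohr6893 [21/Jun/2019:15:45:34 -0700] '
    '"PATCH /extensible/reinvent HTTP/1.1" 201 27330\n'
)

parsed = list(parse_lines(fake_file))
for row in parsed:
    print(row)
```

A generator keeps the memory benefit mentioned above, since only one line is held at a time until the caller collects the results.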

Posted 2023-08-22

莫回无

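Contributed 1865 experience points, received more than 7 likes

Once you have sorted out the regex issues you are running into, the code below should work for you:

import re

result = []
with open('data.txt') as f:
    lines = [l.strip() for l in f.readlines()]

for logdata in lines:
    # re.findall returns a list of every match on the line, so each value
    # stored below is a list; the patterns themselves are the ones from
    # the question and still need the fixes mentioned above.
    host = re.findall(r'(.*?)-', logdata)
    username = re.findall(r'-(.*?)\[', logdata)
    _time = re.findall(r'\[(.*?)\]', logdata)
    request = re.findall(r'"(.*?)"', logdata)
    result.append({'host': host, 'user_name': username, 'time': _time,
                   'request': request})

print(result)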
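To make the remaining regex issue concrete, the sketch below applies the question's host pattern to a single sample line and compares it with an anchored `re.search`. The names `line`, `all_hits`, and `host` are chosen for this illustration, and the anchored pattern is one possible fix rather than the one the answer had in mind.

```python
import re

# One line copied from the sample data in the question
line = ('71.172.239.195 - dooley1853 [21/Jun/2019:15:45:32 -0700] '
        '"PUT /cutting-edge HTTP/2.0" 406 24498')

# findall returns every non-overlapping match, so each dash on the line
# (the separator, the one in "-0700", the one in "cutting-edge") adds a hit
all_hits = re.findall(r'(.*?)-', line)
print(all_hits)

# search with an anchor returns a single match; group(1) is a plain string
host = re.search(r'^(\S+) - ', line).group(1)
print(host)
```

Storing `group(1)` rather than the `findall` result gives each dictionary a string value per key instead of a list.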

Posted 2023-08-22

FFIVE

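Contributed 1797 experience points, received more than 6 likes

The function below reads assets/logdata.txt and returns a list of dictionaries containing the key/value pairs matched on each line, as requested in your original question.

It is worth noting that proper error handling should be added on top of this, since there are obvious edge cases that could cause the code to stop unexpectedly.

Note the change to your host pattern, which is important. The original pattern in your example matched more than just the host part of each line. Adding an anchor at the start of the pattern, together with re.MULTILINE, stops the false positives that the original pattern picked up from the rest of each line.

import re

def logs():
    with open("assets/logdata.txt", "r") as file:
        logdata = file.read()
    # Anchor the host pattern to the start of each line
    host = re.findall(r'^(.*?)-', logdata, re.MULTILINE)
    username = re.findall(r'-(.*?)\[', logdata)
    time = re.findall(r'\[(.*?)\]', logdata)
    request = re.findall(r'"(.*?)"', logdata)
    # Strip the spaces captured around host and user name; the key names
    # follow the example_dict in the question
    return [
        {"host": h.strip(), "user_name": u.strip(), "time": t, "request": r}
        for h, u, t, r in zip(host, username, time, request)
    ]

The above is a simple, minimal solution based on your original post. There are much cleaner and more efficient ways to solve this, but I think it is more useful to start from your existing code and show you how to correct it than to hand you a better-optimized solution that would mean comparatively little to you.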
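The effect of the anchor can be checked on a small in-memory string. In this sketch, `logdata` holds the first two sample lines from the question, and `unanchored`, `anchored`, and `hosts` are names chosen for the illustration.

```python
import re

# First two lines of the sample data, joined the way file.read() would return them
logdata = (
    '197.109.77.178 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] '
    '"DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554\n'
    '156.127.178.177 - okuneva5222 [21/Jun/2019:15:45:27 -0700] '
    '"DELETE /interactive/transparent/niches/revolutionize HTTP/1.1" 416 14701\n'
)

# Without an anchor, the dash in "-0700" produces an extra hit on every line
unanchored = re.findall(r'(.*?)-', logdata)

# With ^ and re.MULTILINE, a match may only start at the beginning of a line
anchored = re.findall(r'^(.*?)-', logdata, re.MULTILINE)

hosts = [h.strip() for h in anchored]
print(len(unanchored), len(anchored))
print(hosts)
```

With the extra hits gone, the host list has one entry per line and lines up index by index with the other three lists.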

Posted 2023-08-22

