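2 Answers
Contributor with 1995 experience points, 2+ upvotes
The problem is that genfromtxt skips rows whose length is not 24; with invalid_raise=False such rows are dropped with a warning instead of raising an error.
A possible solution is to read only the first 16 columns with usecols=np.arange(16):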
logdata = np.genfromtxt("trendx.log",
invalid_raise = False,
dtype=str,
comments=None,
usecols=np.arange(16))
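To see the effect on a small scale, here is a minimal sketch using a made-up three-line log. The SAMPLE text and the load helper are assumptions for illustration only and have nothing to do with the real trendx.log. Asking for 5 columns drops the short row, while asking for 3 keeps every row.

```python
import io
import warnings

import numpy as np

# Hypothetical log: two rows with 5 fields and one short row with 3 fields.
SAMPLE = "a b c d e\nf g h\ni j k l m\n"


def load(ncols):
    """Read the first `ncols` whitespace-separated columns of SAMPLE."""
    with warnings.catch_warnings():
        # genfromtxt warns about the rows it skips; silence that here.
        warnings.simplefilter("ignore")
        return np.genfromtxt(io.StringIO(SAMPLE),
                             invalid_raise=False,
                             dtype=str,
                             comments=None,
                             usecols=np.arange(ncols))


wide = load(5)    # the 3-field row cannot supply column 4, so it is skipped
narrow = load(3)  # every row has at least 3 fields, so all rows survive

print(wide.shape)    # (2, 5)
print(narrow.shape)  # (3, 3)
```

The smaller the column range, the fewer rows are lost, as long as the columns you need (10, 14 and 15 in this thread) still fall inside it.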
Also, if you want to check which values match:
# with usecols=np.arange(24)
print(df1[df1['SHA-1'].isin(df2['SHA1'])])
SHA-1 VSDT Unnamed: 2
19 0bed7d032d5c51f606befd2f10b94e5c75a6a1e3 WIN32 EXE 7-2 NaN
32 10d6ea590e7e31a396c0fd96cb7413c354ab4b97 WIN32 EXE 7-2 NaN
217 6010a6400d72298fb8e61bff67638da23efd0c81 MSIL 7-18 NaN
231 6614e5097a777cb2192d856c7aa99c73f9104c8a MSIL 7-18 NaN
296 84a3a384c6d61678d6e335559948cb0e2a32de0b WIN32 EXE 7-2 NaN
300 85f3b0710776b897208e88460228eab0f2b6df6a WIN32 EXE 7-2 NaN
340 94571c6299a8bb7a18e374665ff71bcdf7277fc6 MSIL 7-18 NaN
345 96e1e3d135d037696262b20b227b82f6cd3dce44 WIN32 EXE 7-2 NaN
388 acdfacefb1b97d97b896c7af6c47d87f811d7fd9 MSIL 7-18 NaN
408 b61c6e35810f9d506f17874bc1750cd90a57a434 WIN32 EXE 7-2 NaN
503 df88efb7ab874bc024c20c06c0daf8cf34a95897 MSIL 7-18 NaN
511 e1179af687feaeb5b9525df4fbb061d0f424746f MSIL 7-18 NaN
576 fcb12edabdb2e59916f2f84f204c3e8ec13d1135 WIN32 EXE 7-2 NaN
# with usecols=np.arange(16)
print(df1[df1['SHA-1'].isin(df2['SHA1'])])
SHA-1 VSDT Unnamed: 2
13 06a60c6018a42b1db22e3bf8620861711401c4bb WIN32 EXE 7-2 NaN
14 0723a895a5f8b2d5d25b4303e9f04d16551791b6 MSIL 7-18 NaN
19 0bed7d032d5c51f606befd2f10b94e5c75a6a1e3 WIN32 EXE 7-2 NaN
26 0e13d281af08954102e7caf95864ef553c7277bd Win32 DLL 7-5 NaN
32 10d6ea590e7e31a396c0fd96cb7413c354ab4b97 WIN32 EXE 7-2 NaN
33 113d53cc041fbd25b1004f68493ff1b0d0cd6c1f WIN32 EXE 7-2 NaN
34 1217b71e04c81f4c50f053793dbe60d91d39668f MSIL 7-18 NaN
36 134024d595bf9d724213f4303885f4d1e43b7a44 WIN32 EXE 7-2 NaN
37 13a508933a46ca80529145e8470a2147739d0334 WIN32 EXE 7-2 NaN
42 154985ac3d041303e3b5043e2d96e762c6a3ddd1 WIN32 EXE 7-2 NaN
57 1c453871229e8eddd7a965ec140279bb4a618b48 WIN32 EXE 7-2 NaN
61 1df0cfdee270ea0215b3a6a3e9aa2ad8bd820749 WIN32 EXE 7-2 NaN
68 23ef5c7c3384fcff3e9c3f2c647bebce5d1d7558 WIN32 EXE 7-2 NaN
93 2f7e7d2a9a44b03d9525569168bfbb604317be0e WIN32 EXE 7-2 NaN
106 327891c858ee81955c1945a2787782e958b94ab7 WIN32 EXE 7-2 NaN
111 35be3823638cfb04fbc2f6854faab4bbf1d8a627 WIN32 EXE 7-2 NaN
114 36b13a68ae6c896c68c51ebb89ffd3c484c00457 WIN32 EXE 7-2 NaN
132 3d133c7d15649d607817df5081d85f4397757c67 WIN32 EXE 7-2 NaN
135 3d7aba9ca74e368158b996057a041189b948c9fe WIN32 EXE 7-2 NaN
142 40a18adc9fdbff2b95997f0175307b76657b037c WIN32 EXE 7-2 NaN
157 48bde6c540065d04e19f22d2db8f75aca5d3d375 WIN32 EXE 7-2 NaN
158 48e0dcf8325867063619a28f837704ba8d4ce1cb WIN32 EXE 7-2 NaN
162 4b8a159a69c5ea451d62f9a480e849984687fbf7 WIN32 EXE 7-2 NaN
167 4cb7867c4edaded299199258a7d6062c1c0def89 WIN32 EXE 7-2 NaN
175 501947c29ebbad093881c92ff0c5e4cdce6de64d WIN32 EXE 7-2 NaN
180 50c8f15c8e94d60f370403a09796f9e44e90b888 WIN32 EXE 7-2 NaN
182 5141321fe113df78d41ec282e54cb49c2cc5125d WIN32 EXE 7-2 NaN
194 56ef50c4b83c17e03400d129de99869d8ab18c94 WIN32 EXE 7-2 NaN
196 57d4e8300d405655f37ae98667b76c94fc6c400c WIN32 EXE 7-2 NaN
203 5a339b555ea6c3f7ebe5d8d11890a6d0e738a734 UPX EXE 7-17 NaN
.. ... ... ...
421 be2adbdea170d0fb7012841d48aab27250a933d2 WIN32 EXE 7-2 NaN
424 bee081ba9c5eae456acfb285cd6a0ae0e289f174 WIN32 EXE 7-2 NaN
438 c4eb16a4dc44b2f2525a6296d234fc272b23454f WIN32 EXE 7-2 NaN
448 caf937c3c486236c6ec35fdf5bd8dc849ceb02b9 WIN32 EXE 7-2 NaN
453 cc53cdd86d97afbaf321d228b18d7a0ce4e8f9d1 MSIL 7-18 NaN
460 d01a707b473d2599084807e496331c5d78a394f4 WIN32 EXE 7-2 NaN
463 d131e81b35b0514fb66776e84c5f39bf0e637919 WIN32 EXE 7-2 NaN
469 d352365f415f41dced3a6dd4aa4d2c6014c70ed3 WIN32 EXE 7-2 NaN
472 d3e0e1116aa97b51d5cadee2ea50f172c603fa50 WIN32 EXE 7-2 NaN
480 d54caaf59f1294b88f7d5ceb8ae2c0784be2e272 WIN32 EXE 7-2 NaN
483 d6c9b7b47b3576017afbb974ed6b2b5d54787de5 WIN32 EXE 7-2 NaN
487 d913bed0de10c0168bc8ab733f9b5fd20bbd5472 WIN32 EXE 7-2 NaN
489 daed0b94fd0892063f8d4a91dde5e7496eed4e83 WIN32 EXE 7-2 NaN
496 de3261f839ab02e0ee128faffddd3f45e79527dd WIN32 EXE 7-2 NaN
499 defd56ebf430ac144243e7c8d36d20ea3de10bc4 WIN32 EXE 7-2 NaN
500 df44071358587c90d712b0de78bbca146e3ae223 WIN32 EXE 7-2 NaN
501 df61222fe125e56b02a2cfc797f00ce63904d8df WIN32 EXE 7-2 NaN
502 df69d622e59945e7baf124b2faf205f00769b978 WIN32 EXE 7-2 NaN
503 df88efb7ab874bc024c20c06c0daf8cf34a95897 MSIL 7-18 NaN
505 e043b9d5410458342ff7a911de699cc0aa453610 WIN32 EXE 7-2 NaN
508 e0ee714a5bd67fc6cc68f8419ae336db44fc8a8e WIN32 EXE 7-2 NaN
511 e1179af687feaeb5b9525df4fbb061d0f424746f MSIL 7-18 NaN
527 e7e4a72fb5924051a41155044f03f55aaa304266 WIN32 EXE 7-2 NaN
529 e8bc0782cec91da0044eb275db69f79542c336c1 WIN32 EXE 7-2 NaN
542 ec554c9d8c10c1dddc1a38418c627c344991f640 WIN32 EXE 7-2 NaN
544 eca602bca855cac979a99b44d3ae033daa43bc39 WIN32 EXE 7-2 NaN
547 ed66e83ae790873fd92fef146a2b70e5597792ee WIN32 EXE 7-2 NaN
548 ed6c6a9e55e501520b476087cb5eeaf820b89194 MSIL 7-18 NaN
576 fcb12edabdb2e59916f2f84f204c3e8ec13d1135 WIN32 EXE 7-2 NaN
578 fced05723f49b6d0836e065a436e8c3b8df2bc12 WIN32 EXE 7-2 NaN
[97 rows x 3 columns]
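The isin check used for both outputs above can be tried on toy data. This is only a sketch: the hash strings and the two small frames are invented, and the column names 'SHA-1' and 'SHA1' simply mirror the ones in the print calls.

```python
import pandas as pd

# Made-up stand-ins for the CSV data (df1) and the parsed log (df2).
df1 = pd.DataFrame({
    "SHA-1": ["aaa", "bbb", "ccc"],
    "VSDT": ["WIN32 EXE 7-2", "MSIL 7-18", "MIME 6010-0"],
})
df2 = pd.DataFrame({"SHA1": ["ccc", "aaa", "zzz"]})

# Boolean mask: True where the CSV hash also appears anywhere in the log.
mask = df1["SHA-1"].isin(df2["SHA1"])
matches = df1[mask]

print(matches)
```

The filtered frame keeps the original index of df1, which is why the row labels in the outputs above are not consecutive.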
Full code:
import numpy as np
import pandas as pd
# Load the log into a DataFrame with genfromtxt (first 16 columns only)
logdata = np.genfromtxt("trendx.log", invalid_raise=False, dtype=str,
                        comments=None, usecols=np.arange(16))
logframe = pd.DataFrame(logdata)
#print(logframe.head())
# Trim the DataFrame to the SHA-1, PRG and IP columns
df2 = logframe[[10, 14, 15]].rename(columns={10: 'SHA-1', 14: 'PRG', 15: 'IP'})
#print(df2.head())
# Load sha1_vsdt.csv with read_csv; on_bad_lines='skip' replaces
# error_bad_lines=False, which was removed in pandas 2.0
df1 = pd.read_csv("sha1_vsdt.csv", delimiter=",", on_bad_lines='skip',
                  engine='python', quoting=3)
# Left merge keeps every CSV row; hashes missing from the log become 'undetected'
df = pd.merge(df1, df2, on='SHA-1', how='left').fillna('undetected')
print(df[['SHA-1', 'VSDT', 'PRG', 'IP']])
SHA-1 \
0 0191a23ee122bdb0c69008971e365ec530bf03f5
1 02b809d4edee752d9286677ea30e8a76114aa324
2 0349e0101d8458b6d05860fbee2b4a6d7fa2038d
3 035a7afca8b72cf1c05f6062814836ee31091559
4 042065bec5a655f3daec1442addf5acb8f1aa824
5 04939e040d9e85f84d2e2eb28343d94a50ed46ac
6 04a1876724b53a016cd9e9c93735985938c91fa4
7 06109df23f7d5deadf0b2c158af1f71c2997d245
8 06194c240c12c51b55d2961ae287fd9628e05751
9 0665de1ad83715cc6e68d00ed700c469944a5925
10 067b448f4c9782489e5ff60c31c62b7059e500b2
11 0688e6966b0e4a1f58d2f3de48f960fce5b42292
12 0689f6f99d10dd8bf396f2d2c73ce9dcb6dcad23
13 06a60c6018a42b1db22e3bf8620861711401c4bb
14 0723a895a5f8b2d5d25b4303e9f04d16551791b6
15 07344621cf4480c430f8931af2b2b056775af7e3
16 07831df482f1a34310fc4f5a092c333eeaff4380
17 08386105057cd5867480095696a5ca6701fdb8ad
18 0ad5f62b4ec10397b7d13433a8dc794dc6d4f273
19 0bed7d032d5c51f606befd2f10b94e5c75a6a1e3
20 0c3f8d2cce9e7a6e5604b8d0c9fbe1ff6fd5cebb
21 0c793b4f4e0be7f24f93786d7d4a719a7a002a0d
22 0c7c2b2d05a5c712f4b9302b82fb54007210937f
23 0d03da55b246252fb5b440a23943426bda965bcd
24 0d592f948a4f7bfa95c7cb09faf067ce9fbc9375
25 0df65d8a57c8349e044f98deda17d70d0c4f926a
26 0e13d281af08954102e7caf95864ef553c7277bd
27 0ede12d9c17564e803f51de4d279e84623c5a8a6
28 0fc4f3a30684bb17cbcbf4e3def2ac3528a2f04c
29 0fcb475fcadd8d8e3b8dd5f4376feda48c73fd24
.. ...
553 ef90b17c18c3c5960726964cff12b6d6ef22f3f4
554 effbed4e7e619009def1c4322f68092eb9cc197f
555 f081c8a737f87167fef83d03405c1fbe55a46986
556 f1304ad198045ebb93e70252f0dda9d68acd83f1
557 f14762b5ce92f2713c584140d694ce25f7beb9c2
558 f187959d6afa483d18c69b9e334575781009cd31
559 f1ae32a92f89f54e542973a98eb3dcbe05fe9c58
560 f28217b5928e4d2fbbc5ca45bd815b1c3963bed2
561 f36687584c4bc38f2aed5511930b50eea378c1bf
562 f4846b38f52805ffa2d0ae392df05bbeb8fee2b5
563 f4b8b762feb426de46a0d19b86f31173e0e77c2e
564 f4d0cc44a8018c807b9b1865ce2dd70f027d2ceb
565 f4fcbbdf8c797c96dd1a3e76baf666c319f52aa8
566 f6c9b393b5148e45138f724cebf5b1e2fd8d9bc7
567 f8910d7869be647d2ec6c49ddf6fef49ed0f09d0
568 f90c38a3d623ea47b129b386d841614d9a290f0a
569 f99c069d5ababc7001aa46a494a0400a913a109c
570 f9d2c6e2438fc4571f7ea4f639b2950ddd1307e5
571 fa2229ef95b9e45e881ac27004c2a90f6c6e0947
572 fac66887402b4ac4a39696f3f8830a6ec34585be
573 fb2086d390c1755b53580013c727398d9fb5c01b
574 fb59aa51fec66f8caf409b1ca2b80e7fdaf33c61
575 fc -...
576 fcb12edabdb2e59916f2f84f204c3e8ec13d1135
577 fcbbfeb67cd2902de545fb159b0eed7343aeb502
578 fced05723f49b6d0836e065a436e8c3b8df2bc12
579 fd1cada68f4a9452275d292fe4b9f76a4bd8bd8b
580 fe5babc1e4f11e205457f2ec616f117fd4f4e326
581 fe8c341de79168a1254154f4e4403857c6e79c46
582 fe91021461e48fe82449d2ad73bcc66f6c508152
VSDT PRG IP
0 MIME 6010-0 undetected undetected
1 Microsoft RTF 6008-0 undetected undetected
2 Adobe Portable Document Format(PDF) 6015-0 undetected undetected
3 Adobe Portable Document Format(PDF) 6015-0 undetected undetected
4 Microsoft RTF 6008-0 undetected undetected
5 MS Office 1-0 undetected undetected
6 MS Office 1-0 undetected undetected
7 MS Office 1-0 undetected undetected
8 MS Office 2007 Excel 4045-2 undetected undetected
9 WIN32 EXE 7-2 undetected undetected
10 Adobe Portable Document Format(PDF) 6015-0 undetected undetected
11 MS Office 1-0 undetected undetected
12 ASCII text 18-0 undetected undetected
13 WIN32 EXE 7-2 172.20.4.179 Administrator
14 MSIL 7-18 172.20.4.179 Administrator
15 MIME 6010-0 undetected undetected
16 Microsoft RTF 6008-0 undetected undetected
17 ASCII text 18-0 undetected undetected
18 Java Archive (JAR) 4049-0 undetected undetected
19 WIN32 EXE 7-2 172.20.4.179 Administrator
20 ASCII text 18-0 undetected undetected
21 MS Office 1-0 undetected undetected
22 MS Office 1-0 undetected undetected
23 MIME 6010-0 undetected undetected
24 WIN32 EXE 7-2 undetected undetected
25 MS Office 1-0 undetected undetected
26 Win32 DLL 7-5 172.20.4.179 Administrator
27 MS Office 1-0 undetected undetected
28 WIN32 EXE 7-2 undetected undetected
29 MIME 6010-0 undetected undetected
.. ... ... ...
553 MIME 6010-0 undetected undetected
554 MS Office 1-0 undetected undetected
555 ASCII text 18-0 undetected undetected
556 Text (general) 28-4 undetected undetected
557 ASCII text 18-0 undetected undetected
558 MS Office 2007 Excel 4045-2 undetected undetected
559 Microsoft RTF 6008-0 undetected undetected
560 MIME 6010-0 undetected undetected
561 WIN32 EXE 7-2 undetected undetected
562 ASCII text 18-0 undetected undetected
563 MS Office 1-0 undetected undetected
564 MS Office 1-0 undetected undetected
565 ASCII text 18-0 undetected undetected
566 MS Office 1-0 undetected undetected
567 ASCII text 18-0 undetected undetected
568 User-Defined 117--1 undetected undetected
569 MS Office 1-0 undetected undetected
570 ASCII text 18-0 undetected undetected
571 MS Office 1-0 undetected undetected
572 MS Office 1-0 undetected undetected
573 WIN32 EXE 7-2 undetected undetected
574 WIN32 EXE 7-2 undetected undetected
575 MS Office 1-0 undetected undetected
576 WIN32 EXE 7-2 172.20.4.179 Administrator
577 MS Office 1-0 undetected undetected
578 WIN32 EXE 7-2 172.20.4.179 Administrator
579 RAR 25-0 undetected undetected
580 MS Office 1-0 undetected undetected
581 Java Archive (JAR) 4049-0 undetected undetected
582 MS Office 1-0 undetected undetected
[583 rows x 4 columns]
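The merge step that produces this table can be reproduced on a tiny example. The sketch below uses invented hashes and values; only the column names and the how='left' plus fillna('undetected') pattern come from the code above.

```python
import pandas as pd

# Made-up stand-ins: df1 plays the CSV, df2 the trimmed log frame.
df1 = pd.DataFrame({
    "SHA-1": ["aaa", "bbb"],
    "VSDT": ["WIN32 EXE 7-2", "MIME 6010-0"],
})
df2 = pd.DataFrame({
    "SHA-1": ["aaa"],
    "PRG": ["tool.exe"],
    "IP": ["10.0.0.1"],
})

# how='left' keeps every row of df1; rows without a partner in df2 get NaN
# in PRG and IP, and fillna then replaces those NaN values with a label.
merged = pd.merge(df1, df2, on="SHA-1", how="left").fillna("undetected")

print(merged)
```

One thing to watch for: if the same hash occurs several times in the log, a left merge repeats the CSV row once per occurrence. Calling drop_duplicates on the log frame's key column before merging is one way to avoid that.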
Contributor with 2016 experience points, 9+ upvotes
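My pseudocode would go like this:
1. Read all the SHA-1 column data from the CSV file into a Python list, sha1.
2. For each SHA-1 value in sha1, compare it against the full contents of the log file (.readlines()): walk through every line, split it into cells (e.g. row.split()), check each cell against all sha1 values, and keep track of col and row.
3. If the cell content is equal to the value, or re.search() matches, then .append([sha1match, row, column]) to a result list.
4. Print the result list or save it to a file.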
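The steps above can be sketched roughly as follows. This is one possible reading of the pseudocode, not the answerer's actual code: find_matches, csv_text and log_text are hypothetical names and sample data, and a set lookup is swapped in for the value-by-value comparison or re.search() to avoid a nested loop over all hashes.

```python
import csv
import io


def find_matches(sha1_values, log_lines):
    """Return [sha1, row_number, column_number] for every log cell
    that equals one of the given SHA-1 values."""
    wanted = set(sha1_values)  # set membership instead of looping over all hashes
    results = []
    for row_no, line in enumerate(log_lines):
        # split() copes with ragged rows, so no line is skipped for its length
        for col_no, cell in enumerate(line.split()):
            if cell in wanted:
                results.append([cell, row_no, col_no])
    return results


# Invented sample data standing in for sha1_vsdt.csv and trendx.log.
csv_text = "SHA-1,VSDT\naaa111,WIN32 EXE 7-2\nbbb222,MSIL 7-18\n"
log_text = "x y aaa111 z\nshort line\nq bbb222\nw aaa111\n"

sha1 = [rec["SHA-1"] for rec in csv.DictReader(io.StringIO(csv_text))]
results = find_matches(sha1, io.StringIO(log_text).readlines())

for match in results:
    print(match)
```

Because each line is split on its own, this approach does not depend on every row having the same number of fields, which is exactly where genfromtxt ran into trouble.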