
Pandas DataFrame: a numbering system for components and subcomponents without loops

Python

aluckdog 2024-01-27 15:28:04

import papermill as pm

# ...
# define DAG, etc.
# ...

def copy_data_from_s3(**context):
    pm.execute_notebook(
        "copy_data_from_s3_step.ipynb",
        "copy_data_from_s3_step.ipynb",
        # pass some context parameter if you need to
        parameters=dict(date=context['execution_date']),
    )

Finally, set up the step, perhaps as a PythonOperator (although you can also use a BashOperator if you want to run Papermill from the command line). To match the function above:

copy_data = PythonOperator(dag=dag, task_id='copy_data_task', provide_context=True, python_callable=copy_data_from_s3)


1 Answer

慕工程0101907

Contributed 1,887 experience points; received more than 5 likes

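Key idea

The change applied to each row of the output is fully determined by the current "level" and the previous one. Here "level" means the index of the column that holds the row's non-empty entry (the code finds it with `first_valid_index()`, i.e. the first non-null column).

In other words, a single state variable that remembers the previous row's level is enough to fill in the current row correctly.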
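The level of every row can also be read off in one step, without iterating. The snippet below is a minimal sketch, assuming each row has exactly one non-null cell; the `demo` frame and its labels are hypothetical and only stand in for the real data.

```python
import pandas as pd

# Hypothetical indented outline: each row has exactly one non-null cell.
demo = pd.DataFrame(
    [
        ["A", None, None, None],
        [None, "A.1", None, None],
        [None, None, "A.1.a", None],
        [None, "A.2", None, None],
    ]
)

# argmax on a boolean matrix returns the position of the first True per row,
# which is the column index of the first non-null cell, i.e. the row's level.
levels = demo.notna().to_numpy().argmax(axis=1)
print(levels.tolist())  # [0, 1, 2, 1]
```

Note that `argmax` returns 0 for a row with no non-null cell at all, so fully empty rows would need to be filtered out first.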

Code

import numpy as np

# the working dataset
df2 = df.iloc[:, :4].reset_index(drop=True)  # make a copy
df2.columns = range(4)  # rename columns to (0, 1, 2, 3) for convenience

# output container
arr = np.zeros(df2.shape, dtype=int)

# state variable: level of the last row
last_lv = 0

for idx, row in df2.iterrows():
    # get current indentation level (label of the first non-null column)
    lv = row.first_valid_index()

    if idx > 0:
        # case 1: same or decreased level
        if lv <= last_lv:
            # keep previous levels except current level
            arr[idx, :lv] = arr[idx-1, :lv]
            # current level++
            arr[idx, lv] = arr[idx-1, lv] + 1
        # case 2: increased level
        elif lv > last_lv:
            # keep previous levels
            arr[idx, :last_lv+1] = arr[idx - 1, :last_lv+1]
            # start counting the new levels
            arr[idx, last_lv+1:lv+1] = 1
    # the first row
    else:
        arr[0, 0] = 1

    # update state variable for next use
    last_lv = lv

# append result to dataframe
df[["Level I", "Level II", "Level III", "Level IV"]] = arr

Result

print(df[["Level I", "Level II", "Level III", "Level IV"]])

    Level I  Level II  Level III  Level IV
0         1         0          0         0
1         1         1          0         0
2         1         1          1         0
3         1         1          2         0
4         1         1          3         0
5         1         1          3         1
6         1         1          3         2
7         1         1          3         3
8         1         2          0         0
9         1         2          1         0
10        1         2          2         0
11        1         2          3         0
12        1         2          3         1
13        1         2          3         2
14        1         2          3         3
15        2         0          0         0
16        2         1          0         0
17        2         1          1         0

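Notes

This code only demonstrates the logic applied when processing each row. It is not fully optimized, so if efficiency becomes a concern, consider a more efficient data representation (for example a numpy array, or simply a list of level numbers).
I looked into tree data structure libraries such as anytree and treelib, hoping to find an automated way to output the tree hierarchy. Unfortunately, they seem to lack I/O functions suited to reading indented text files or similar formats. That is the main reason I decided to reinvent the wheel.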
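The "list of level numbers" representation mentioned in the first note could look roughly like the sketch below. It applies the same two cases as the answer's code but keeps a single running counter list instead of a DataFrame; the function name `number_levels` and the `depth` parameter are made up for illustration, and the input levels are the ones implied by the result table.

```python
def number_levels(levels, depth=4):
    """Return one counter row per entry of `levels` (0-based indentation levels)."""
    counters = [0] * depth  # running counters, one per level
    last_lv = -1            # no previous row yet
    rows = []
    for lv in levels:
        if lv <= last_lv:
            # same or decreased level: bump this level, reset deeper ones
            counters[lv] += 1
            counters[lv + 1:] = [0] * (depth - lv - 1)
        else:
            # increased level: start every newly opened level at 1
            counters[last_lv + 1:lv + 1] = [1] * (lv - last_lv)
        rows.append(counters.copy())
        last_lv = lv
    return rows


# Levels read off the result table (rows 0 to 17).
levels = [0, 1, 2, 2, 2, 3, 3, 3, 1, 2, 2, 2, 3, 3, 3, 0, 1, 2]
rows = number_levels(levels)
print(rows[8])   # [1, 2, 0, 0]
print(rows[-1])  # [2, 1, 1, 0]
```

Starting `last_lv` at -1 makes the first row fall into the "increased level" branch, so no separate first-row case is needed. Whether this is meaningfully faster than `iterrows()` on real data has not been measured here.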

2024-01-27
