Why does reading a file line by line need more memory?

Go
偶然的你 2023-07-26 17:01:37
I am trying to read a large file in which every line has the format "a string key, 200 values separated by comma", and to store the keys in a map. I wrote this code:

package main

import (
    "bufio"
    "fmt"
    "log"
    "os"
    "runtime"
    "strings"
)

func main() {
    file, err := os.Open("file_address.txt")
    if err != nil {
        log.Fatal(err)
    }
    defer file.Close()

    mp := make(map[string]float32)
    var total_size int64 = 0 // never updated in the code as posted
    scanner := bufio.NewScanner(file)
    var counter int64 = 0

    for scanner.Scan() {
        counter++
        sliced := strings.Split(scanner.Text(), ",")
        mp[sliced[0]] = 2.2
    }

    if err := scanner.Err(); err != nil {
        log.Fatal(err)
    }
    fmt.Printf("loaded: %d. Took %d Mb of memory.", counter, total_size/1024.0/1024.0)
    fmt.Println("Loading finished. Now waiting...")

    var ms runtime.MemStats
    runtime.ReadMemStats(&ms)

    fmt.Printf("\n")
    fmt.Printf("Alloc: %d MB, TotalAlloc: %d MB, Sys: %d MB\n",
        ms.Alloc/1024/1024, ms.TotalAlloc/1024/1024, ms.Sys/1024/1024)
    fmt.Printf("Mallocs: %d, Frees: %d\n",
        ms.Mallocs, ms.Frees)
    fmt.Printf("HeapAlloc: %d MB, HeapSys: %d MB, HeapIdle: %d MB\n",
        ms.HeapAlloc/1024/1024, ms.HeapSys/1024/1024, ms.HeapIdle/1024/1024)
    fmt.Printf("HeapObjects: %d\n", ms.HeapObjects)
    fmt.Printf("\n")
}

This is the output:

loaded: 544594. Took 8 Mb of memory.Loading finished. Now waiting...

Alloc: 2667 MB, TotalAlloc: 3973 MB, Sys: 2831 MB
Mallocs: 1108463, Frees: 401665
HeapAlloc: 2667 MB, HeapSys: 2687 MB, HeapIdle: 11 MB
HeapObjects: 706798

Done!

Although the keys take only about 8 MB, the program uses about 2.7 GB of memory! It seems that sliced is never removed from the heap. I tried setting sliced = nil at the end of the for loop, but it did not help. I have read that I could avoid the problem by loading the whole file into memory and then splitting it, but I have to read the file line by line because I do not have enough memory to load some of the larger files. Why is the memory being held? How can I release it after each line has been processed?

2 Answers

拉丁的传说

Contributed 1789 experience points, received more than 8 likes

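To use CPU and memory efficiently, take the key from scanner.Bytes() and convert only that part to a string (this needs the "bytes" package):

    key := string(bytes.SplitN(scanner.Bytes(), []byte(","), 2)[0])
    mp[key] = 2.2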
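For context, here is a minimal sketch of how the two lines above might sit inside a complete loading loop. The names loadKeys and loadKeysFromText are hypothetical helpers introduced only for this illustration, and the sketch uses bytes.IndexByte instead of bytes.SplitN, which should give the same key without building an intermediate slice.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"strings"
)

// loadKeys reads comma-separated lines from r and stores the first field
// of each line as a map key. (Hypothetical helper, not from the original post.)
func loadKeys(r io.Reader) (map[string]float32, error) {
	mp := make(map[string]float32)
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		// scanner.Bytes() is a view into the scanner's buffer,
		// so no string is allocated for the whole line.
		line := scanner.Bytes()
		end := bytes.IndexByte(line, ',')
		if end < 0 {
			end = len(line) // no comma: treat the whole line as the key
		}
		// string(...) copies only the key bytes into a new string.
		mp[string(line[:end])] = 2.2
	}
	return mp, scanner.Err()
}

// loadKeysFromText is a small wrapper for in-memory input, used by the demo.
func loadKeysFromText(text string) (map[string]float32, error) {
	return loadKeys(strings.NewReader(text))
}

func main() {
	mp, err := loadKeysFromText("a,1,2\nb,3,4\n")
	fmt.Println(len(mp), err)
}
```

Because the scanner reuses its buffer on the next call to Scan, the key has to be copied out before the loop continues. The string conversion does that copy.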


2023-07-26
慕仙森

Contributed 1827 experience points, received more than 7 likes

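I think I found the problem! I split every line of the large file. The []string returned by strings.Split contains substrings of the original string (the file line). The catch is that each substring is not a new string: it is just a slice of the original, and it keeps a reference to the whole unsliced string (the file line!). Because I keep sliced[0] for every line, I keep a reference to every line of the file. The garbage collector cannot free a line that has been read, because I still reference it. In effect, I read every line of the file and keep all of them in memory.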
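The sharing described above can be checked directly. The following is only an illustrative sketch: sharesMemory and firstFieldShared are hypothetical helper names, and it assumes Go 1.20 or later, where unsafe.StringData is available.

```go
package main

import (
	"fmt"
	"strings"
	"unsafe"
)

// sharesMemory reports whether sub's first byte lies inside parent's bytes.
// (Hypothetical helper for illustration; assumes Go 1.20+ for unsafe.StringData.)
func sharesMemory(sub, parent string) bool {
	if len(sub) == 0 || len(parent) == 0 {
		return false
	}
	start := uintptr(unsafe.Pointer(unsafe.StringData(parent)))
	p := uintptr(unsafe.Pointer(unsafe.StringData(sub)))
	return p >= start && p < start+uintptr(len(parent))
}

// firstFieldShared returns the first comma-separated field exactly as
// strings.Split produces it, that is, as a substring of line.
func firstFieldShared(line string) string {
	return strings.Split(line, ",")[0]
}

func main() {
	line := "some_key,0.1,0.2,0.3"
	key := firstFieldShared(line)
	// Reports whether key points into line's memory.
	fmt.Println(key, sharesMemory(key, line))
}
```

As long as such a substring is reachable, for example as a map key, the garbage collector has to keep the whole line it points into.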


The solution is to copy the part I want (sliced[0]) into a new string, which drops the reference to the whole line. This is how I did it:


    sliced := strings.Split(scanner.Text(), ",")
    key_rune_arr := []rune(sliced[0])
    key := string(key_rune_arr) // now key is a copy of sliced[0] without reference to line
    mp[key] = 2.2 // instead of mp[sliced[0]] = 2.2
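The round trip through []rune does make a copy, but it decodes and re-encodes the key, and any invalid UTF-8 bytes would come back as U+FFFD. A more direct alternative, assuming Go 1.18 or later, might be strings.Clone combined with strings.Cut. The helper name keyOf below is hypothetical and used only for this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// keyOf returns an independent copy of the text before the first comma.
// (Hypothetical helper; strings.Cut and strings.Clone need Go 1.18+.)
func keyOf(line string) string {
	key, _, _ := strings.Cut(line, ",") // key still points into line here
	return strings.Clone(key)           // fresh allocation holding only the key
}

func main() {
	mp := make(map[string]float32)
	mp[keyOf("some_key,0.1,0.2,0.3")] = 2.2
	fmt.Println(len(mp), mp["some_key"])
}
```

strings.Cut also avoids allocating the slice of roughly 200 substrings per line that strings.Split builds when only the first field is needed.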

The program now becomes:


package main

import (
    "bufio"
    "fmt"
    "log"
    "os"
    "runtime"
    "strings"
)

func main() {
    file, err := os.Open("file_address.txt")
    if err != nil {
        log.Fatal(err)
    }
    defer file.Close()

    mp := make(map[string]float32)
    var total_size int64 = 0 // never updated in the code as posted
    scanner := bufio.NewScanner(file)
    var counter int64 = 0

    for scanner.Scan() {
        counter++
        sliced := strings.Split(scanner.Text(), ",")
        key_rune_arr := []rune(sliced[0])
        key := string(key_rune_arr) // now key is a copy of sliced[0] without reference to line
        mp[key] = 2.2 // instead of mp[sliced[0]] = 2.2
    }

    if err := scanner.Err(); err != nil {
        log.Fatal(err)
    }
    fmt.Printf("loaded: %d. Took %d Mb of memory.", counter, total_size/1024.0/1024.0)
    fmt.Println("Loading finished. Now waiting...")

    var ms runtime.MemStats
    runtime.ReadMemStats(&ms)

    fmt.Printf("\n")
    fmt.Printf("Alloc: %d MB, TotalAlloc: %d MB, Sys: %d MB\n",
        ms.Alloc/1024/1024, ms.TotalAlloc/1024/1024, ms.Sys/1024/1024)
    fmt.Printf("Mallocs: %d, Frees: %d\n",
        ms.Mallocs, ms.Frees)
    fmt.Printf("HeapAlloc: %d MB, HeapSys: %d MB, HeapIdle: %d MB\n",
        ms.HeapAlloc/1024/1024, ms.HeapSys/1024/1024, ms.HeapIdle/1024/1024)
    fmt.Printf("HeapObjects: %d\n", ms.HeapObjects)
    fmt.Printf("\n")
}

The result is just what I hoped for:


loaded: 544594. Took 8 Mb of memory.Loading finished. Now waiting...

Alloc: 94 MB, TotalAlloc: 3986 MB, Sys: 135 MB
Mallocs: 1653590, Frees: 1108129
HeapAlloc: 94 MB, HeapSys: 127 MB, HeapIdle: 32 MB
HeapObjects: 545461

Done!
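The difference between the two runs can be reproduced on synthetic data without the original file. The sketch below is an assumption-laden illustration: liveHeapAfterLoad is a hypothetical helper, the lines are generated in memory rather than read with a scanner, and it forces a garbage collection before reading HeapAlloc so that only live objects are counted. It uses strings.Clone (Go 1.18+) for the copy.

```go
package main

import (
	"fmt"
	"runtime"
	"strconv"
	"strings"
)

// liveHeapAfterLoad builds a map from n synthetic lines and returns HeapAlloc
// (bytes of live heap objects) measured right after a forced collection.
// (Hypothetical helper for illustration, not from the original post.)
func liveHeapAfterLoad(n int, copyKey bool) uint64 {
	mp := make(map[string]float32, n)
	for i := 0; i < n; i++ {
		// Each line is a fresh heap string of about 4 KB,
		// standing in for the result of scanner.Text().
		line := "key" + strconv.Itoa(i) + "," + strings.Repeat("0.12345,", 500)
		key := strings.Split(line, ",")[0] // substring of line
		if copyKey {
			key = strings.Clone(key) // independent copy of the key only
		}
		mp[key] = 2.2
	}
	runtime.GC()
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	runtime.KeepAlive(mp) // keep the map and its keys live until measured
	return ms.HeapAlloc
}

func main() {
	shared := liveHeapAfterLoad(2000, false)
	copied := liveHeapAfterLoad(2000, true)
	fmt.Printf("substring keys: %d KB live, copied keys: %d KB live\n",
		shared/1024, copied/1024)
}
```

With substring keys, every 4 KB line should stay reachable through its key, so the first figure is expected to be several megabytes larger than the second. The exact numbers depend on the Go version and platform.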


2023-07-26