2 Answers

Contributed 1898 experience points, received over 8 likes
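I created a single variable of that struct type and keep filling it in from the different callbacks.
I'm not sure whether this is the best approach.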
func main() {
	// ....
	webpage := WebPage{} // Is this the right way to declare a mutable struct?

	c.OnRequest(func(r *colly.Request) { // URL
		webpage.Url = r.URL.String() // Is this the right way to mutate?
	})
	c.OnResponse(func(r *colly.Response) { // response received
		pageCount++
		log.Printf("%d DONE Visiting : %s", pageCount, webpage.Url)
	})
	c.OnHTML("head title", func(e *colly.HTMLElement) { // title
		webpage.Title = e.Text
	})
	c.OnHTML("html body", func(e *colly.HTMLElement) { // body / content
		webpage.Content = e.Text // Can URL, title and body get mixed up in a multithreaded scenario?
	})
	c.OnHTML("a[href]", func(e *colly.HTMLElement) { // follow every link
		link := e.Attr("href")
		e.Request.Visit(link)
	})
	c.OnError(func(r *colly.Response, err error) { // error handler
		log.Println("Request URL:", r.Request.URL, "failed with response:", r, "\nError:", err)
	})
	c.OnScraped(func(r *colly.Response) { // DONE: print the collected page as JSON
		enc := json.NewEncoder(os.Stdout)
		enc.SetIndent("", " ")
		enc.Encode(webpage)
	})
}
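
On the multithreading question raised in the code comments: with one shared WebPage value, callbacks for different pages can overwrite each other's fields when requests run concurrently (for example with colly's async mode). One possible way around that is to keep one record per URL behind a mutex. The sketch below uses only the standard library. `pageStore`, `update` and `take` are hypothetical names, and the WebPage fields are assumed from how the answer uses them.

```go
package main

import (
	"fmt"
	"sync"
)

// WebPage mirrors the struct from the answer (fields assumed from its usage).
type WebPage struct {
	Url     string
	Title   string
	Content string
}

// pageStore is a hypothetical helper: one WebPage per URL, guarded by a mutex.
type pageStore struct {
	mu    sync.Mutex
	pages map[string]*WebPage
}

func newPageStore() *pageStore {
	return &pageStore{pages: make(map[string]*WebPage)}
}

// update applies fn to the record for url, creating the record on first use.
func (s *pageStore) update(url string, fn func(p *WebPage)) {
	s.mu.Lock()
	defer s.mu.Unlock()
	p, ok := s.pages[url]
	if !ok {
		p = &WebPage{Url: url}
		s.pages[url] = p
	}
	fn(p)
}

// take removes and returns the record for url, if one exists.
func (s *pageStore) take(url string) (WebPage, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	p, ok := s.pages[url]
	if !ok {
		return WebPage{}, false
	}
	delete(s.pages, url)
	return *p, true
}

func main() {
	store := newPageStore()
	var wg sync.WaitGroup
	// Simulate the callbacks of two pages running at the same time.
	for _, u := range []string{"https://example.com/a", "https://example.com/b"} {
		wg.Add(1)
		go func(u string) {
			defer wg.Done()
			store.update(u, func(p *WebPage) { p.Title = "title of " + u })
			store.update(u, func(p *WebPage) { p.Content = "body of " + u })
		}(u)
	}
	wg.Wait()
	page, _ := store.take("https://example.com/a")
	fmt.Println(page.Title)
}
```

In the collector, the OnHTML callbacks would call `store.update(e.Request.URL.String(), ...)` and OnScraped would call `store.take(r.Request.URL.String())` before encoding. colly also attaches a per-request context (`r.Ctx`) that could carry the same data without a shared map.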

Contributed 1866 experience points, received over 5 likes
I built on Espresso's answer...
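c2.OnHTML("html", func(html *colly.HTMLElement) {
	// Index 4 of the "/"-split URL is the second path segment; this panics on shorter URLs.
	slug := strings.Split(html.Request.URL.String(), "/")[4]
	title := ""
	descr := ""
	h1 := ""
	// Collect the title and the meta description from <head>.
	html.ForEach("head", func(_ int, head *colly.HTMLElement) {
		title += head.ChildText("title")
		head.ForEach("meta", func(_ int, meta *colly.HTMLElement) {
			if meta.Attr("name") == "description" {
				descr += meta.Attr("content")
			}
		})
	})
	// Concatenate the text of every <h1> on the page.
	html.ForEach("h1", func(_ int, h1El *colly.HTMLElement) {
		h1 += h1El.Text
	})
	// Now you can do stuff with your elements from head and body.
	log.Printf("slug=%q title=%q descr=%q h1=%q", slug, title, descr, h1) // placeholder use
})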
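
The `strings.Split(html.Request.URL.String(), "/")[4]` expression in this answer panics whenever the URL has fewer than five "/"-separated parts. A more defensive variant could look like the sketch below. `pathSegment` is a hypothetical helper and not part of colly. It works on the URL path only, so a query string never ends up in the slug.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// pathSegment returns the i-th (0-based) non-empty segment of rawURL's path.
// The second result is false if the URL is invalid or has too few segments.
func pathSegment(rawURL string, i int) (string, bool) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", false
	}
	// Split the path on "/" and drop empty pieces.
	segments := strings.FieldsFunc(u.Path, func(r rune) bool { return r == '/' })
	if i < 0 || i >= len(segments) {
		return "", false
	}
	return segments[i], true
}

func main() {
	// For an absolute URL, index 4 of the "/"-split string is path segment 1.
	slug, ok := pathSegment("https://example.com/blog/my-post", 1)
	fmt.Println(slug, ok) // prints: my-post true

	_, ok = pathSegment("https://example.com/", 1)
	fmt.Println(ok) // prints: false
}
```

Inside the callback, `html.Request.URL` is already a `*url.URL`, so its `Path` field could be split directly without parsing the string again.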