goquery: jQuery-style HTML manipulation in Go
Writing HTML document handling code with parsers is pretty hard. Even plain DOM tree walking can make your code look like spaghetti. That is why libraries like jQuery are popular: they make it easy to do queries and all sorts of other manipulations with HTML documents.
goquery is like jQuery, but in Go.
How goquery works
goquery is based on the official Go net/html package (this isn't a standard library package, but an external package officially supported by the Go authors). It also uses cascadia, a CSS selector library.
You pass goquery a document, and then query it using selectors, similar to how you would use jQuery.
Using goquery
Install goquery with go get:
go get github.com/PuerkitoBio/goquery
Here's an example:
package main

import (
	"fmt"
	"log"

	"github.com/PuerkitoBio/goquery"
)

func main() {
	// Fetch and parse the page.
	doc, err := goquery.NewDocument("https://blog.golang.org")
	if err != nil {
		log.Fatal(err)
	}

	// Print the title and link of every element with the "article" class.
	doc.Find(".article").Each(func(i int, s *goquery.Selection) {
		title := s.Find("h3").Text()
		link, _ := s.Find("h3 a").Attr("href")
		fmt.Printf("%d) %s - %s\n", i+1, title, link)
	})
}
This example prints article titles and the corresponding links from the Go blog:
1) Errors are values - /errors-are-values
2) GothamGo: gophers in the big apple - /gothamgo
3) The Gopher Gala is the first worldwide Go hackathon - /gophergala
4) Generating code - /generate
5) Go 1.4 is released - /go1.4
Let me explain the code. First we create a new document with goquery.NewDocument. As you can see, we pass a URL into this function, and goquery fetches the document for us. Nice! You can also create a document from a Reader (NewDocumentFromReader), from an HTML node (NewDocumentFromNode), or from an HTTP response (NewDocumentFromResponse).
Then we use Find to query this document: we ask it to find everything with the article class, then call Each on the results (as you can see, we can chain functions just like in jQuery), passing it a function that says what to do with each selection found. In our case, we again use Find to first find the title and extract its text (Text), and then find a link and extract href from it using Attr (the second returned value, which we ignore with _, indicates whether the attribute exists).
That was easy! If you ever need to extract some data from an HTML document, use goquery to save time.