Scraping LINE stickers with Golang

Creating a quick script using Golang to scrape LINE stickers

Scraping the web

Most of us know that Python has great libraries for this, such as BeautifulSoup (bs4) and Selenium, and that there are Chrome extensions that can download all the media on a specific page. I would encourage the use of these popular approaches.

Why Golang

The truth is that this can be done in just about any programming language (NodeJS, C++, Python…). So there really should not be any debate about this; Golang is used here purely for entertainment and learning purposes :)

Speedrun tutorial

Understanding our target

Before we start, we need to understand a few things.
- Fortunately, the sticker shop is not client-rendered with JavaScript, so a basic curl returns all the required information without waiting for AJAX calls.
- We could parse the HTML tags to find the stickers we need, but each sticker's **raw** information is stored in a custom HTML attribute called `data-preview='{}'`. This lets us parse that information as JSON.

data-preview='{ "type" : "popup_sound", "id" : "312149456", "staticUrl" : "https://stickershop.line-scdn.net/stickershop/v1/sticker/312149456/iPhone/sticker@2x.png;compress=true", "fallbackStaticUrl" : "https://stickershop.line-scdn.net/stickershop/v1/sticker/312149456/iPhone/sticker@2x.png;compress=true", "animationUrl" : "", "popupUrl" : "https://stickershop.line-scdn.net/stickershop/v1/sticker/312149456/android/sticker_popup.png;compress=true", "soundUrl" : "https://stickershop.line-scdn.net/stickershop/v1/sticker/312149456/android/sticker_sound.m4a" }'
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strings"
)

// Entrypoint: keeps prompting for a sticker shop URL and scrapes each one.
func main() {
	consoleReader := bufio.NewReader(os.Stdin)
	for {
		fmt.Println("Enter Line Stickershop URL")
		inputUrl, err := consoleReader.ReadString('\n')
		if err != nil {
			log.Fatal(err)
		}
		// Check if input has at least a line store format
		if strings.Contains(inputUrl, "https://store.line.me") {
			// Strip the trailing line ending ("\n" or "\r\n") before using the URL
			inputUrl = strings.TrimSpace(inputUrl)
			if err := scrap(inputUrl); err != nil {
				log.Fatal(err)
			}
		} else {
			fmt.Println("Invalid format")
		}
	}
}
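The check in `main` accepts any input that merely contains the store address. A slightly stricter variant could parse the URL and compare its host instead. The sketch below is one way to do that; the helper name `normalizeStoreURL` and the sample paths are hypothetical and not part of the original script.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// normalizeStoreURL is a hypothetical helper (not from the original script).
// It trims surrounding whitespace, including "\n" and "\r\n" line endings,
// and accepts only https URLs whose host is exactly store.line.me.
func normalizeStoreURL(raw string) (string, error) {
	trimmed := strings.TrimSpace(raw)
	u, err := url.Parse(trimmed)
	if err != nil {
		return "", err
	}
	if u.Scheme != "https" || u.Host != "store.line.me" {
		return "", errors.New("not a LINE store URL")
	}
	return trimmed, nil
}

func main() {
	// Both inputs are made-up placeholders used only for illustration.
	inputs := []string{
		"https://store.line.me/stickershop/product/1/en\r\n",
		"https://example.com/?next=https://store.line.me\n",
	}
	for _, in := range inputs {
		clean, err := normalizeStoreURL(in)
		fmt.Printf("%q %v\n", clean, err)
	}
}
```

The second input would pass a plain `strings.Contains` check but is rejected here because its host is not the store.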
resp, err := http.Get(scrapUrl)
if err != nil {
	return err
}
// Close the response body once we are done reading it
defer resp.Body.Close()
var rgx = regexp.MustCompile(`(data-preview='.*?')`)
tmpExtracted := rgx.FindAllStringSubmatch(inputHtml, -1)
for i := 0; i < len(tmpExtracted); i++ {
	// Parse the JSON here
}
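The pattern above captures the whole `data-preview='...'` attribute. If only the JSON payload is wanted, the capture group can be moved inside the quotes. The sketch below shows that variant; the helper name `extractPreviews` is hypothetical, and the call to `html.UnescapeString` rests on the assumption that the page may encode quotes as `&quot;` (it leaves plain text unchanged).

```go
package main

import (
	"fmt"
	"html"
	"regexp"
)

// previewRe captures only the attribute value, without the
// data-preview='...' wrapper around it.
var previewRe = regexp.MustCompile(`data-preview='(.*?)'`)

// extractPreviews is a hypothetical helper that returns every
// data-preview payload found in the page, with HTML entities decoded.
func extractPreviews(inputHtml string) []string {
	matches := previewRe.FindAllStringSubmatch(inputHtml, -1)
	out := make([]string, 0, len(matches))
	for _, m := range matches {
		// m[0] is the full match, m[1] is the first capture group.
		out = append(out, html.UnescapeString(m[1]))
	}
	return out
}

func main() {
	// A made-up page fragment with one plain and one entity-encoded payload.
	page := `<li data-preview='{ "id" : "1" }'></li>` +
		`<li data-preview='{ &quot;id&quot; : &quot;2&quot; }'></li>`
	for _, p := range extractPreviews(page) {
		fmt.Println(p)
	}
}
```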
type DataPreview struct {
	Id           string `json:"id"`
	StickerType  string `json:"type"`
	PopupUrl     string `json:"popupUrl"`
	StaticUrl    string `json:"staticUrl"`
	AnimationUrl string `json:"animationUrl"`
	SoundUrl     string `json:"soundUrl"`
}
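With the struct in place, decoding one payload is a single `json.Unmarshal` call. The following is a minimal sketch; the helper name `parsePreview` is hypothetical, and the sample JSON is a shortened version of the attribute shown earlier.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DataPreview mirrors the struct above. Keys missing from the JSON leave
// their fields empty, and unknown keys (such as fallbackStaticUrl) are ignored.
type DataPreview struct {
	Id           string `json:"id"`
	StickerType  string `json:"type"`
	PopupUrl     string `json:"popupUrl"`
	StaticUrl    string `json:"staticUrl"`
	AnimationUrl string `json:"animationUrl"`
	SoundUrl     string `json:"soundUrl"`
}

// parsePreview is a hypothetical helper that decodes one data-preview payload.
func parsePreview(raw string) (DataPreview, error) {
	var p DataPreview
	err := json.Unmarshal([]byte(raw), &p)
	return p, err
}

func main() {
	raw := `{ "type" : "popup_sound", "id" : "312149456", "animationUrl" : "" }`
	p, err := parsePreview(raw)
	fmt.Printf("%+v %v\n", p, err)
}
```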
var wg sync.WaitGroup
for i := 0; i < len(result); i++ {
	wg.Add(1)
	go downloadImage(result[i], &wg)
}
WaitGroup basically just tells Madam Choo how many students she has assigned the task to.
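defer wg.Done() // at the top of downloadImage: marks this download as finished when the function returns
wg.Wait() // in the caller, after the loop: blocks until every download has called Done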
go build main.go

Final

There you go, a quick program to scrape GIFs for your pleasure. Although it may seem like a trivial thing to do, it makes a big difference when you are gathering data for ML training or scraping in bulk. Downloading concurrently instead of one file at a time noticeably affects how long the job takes at scale.

I'm Jonathan Law Hui Hao, a Business Intelligence analyst in Malaysia who enjoys working with tech, RPA and Machine Learning!
