Scraping LINE stickers with Golang

Let’s start by agreeing that WhatsApp does not have great stickers, but LINE does. For those who have no idea what LINE is, or wonder what is so great about its stickers, [check one out here]. Apart from that, this project could be useful for scraping information if you work in that field.

Here is the code at my [Github repo]

Scraping the web

Most of us know that there are great Python libraries such as BeautifulSoup (bs4) and Selenium, as well as Chrome extensions that can download all the media on a specific page, and I would encourage using these popular approaches.

However, the challenge of this project is to:
- Do it without third-party libraries (only Golang's standard library)
- Attempt to achieve concurrency/async (downloading multiple stickers at once without waiting)
- (Optional) Package it so our non-technical friends can use it

What does it mean to download concurrently/asynchronously? It means we do not wait for one sticker to finish downloading and converting before moving on to the next one.
Just like in school: the teacher (Madam Choo) hands everyone in the classroom one task, then sits and waits for everyone to complete their own task and hand it back to her. In a normal (sequential) program, Madam Choo would give one student a task, wait for that student to finish, and only then hand out the second task. In most cases this works fine, since that one student is good at what he does; however, Madam Choo realizes she has a class of 12 students doing nothing all the while. If she hands out a task to all 12 students (threads, or goroutines in Golang) at once, the work gets done much sooner.

Why Golang

Truth is, this can be done in just about any other programming language (NodeJS, C++, Python…). So there really should not be any debate about this; Golang was picked purely for entertainment and learning purposes :)

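However, what makes Golang stand out from the rest is how easily it achieves the 2nd and 3rd points: goroutines make concurrency cheap to write, and `go build` produces a single standalone binary that our friends can run without installing Go. More importantly, Golang is built for Madam Choo to easily assign all her students a task at the same time.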

Speedrun tutorial


Understanding our target

Before we start, we will need to understand a few things.
- Our sticker shop is fortunately not client-rendered with JavaScript, so a basic `curl` (a plain HTTP GET) returns all the required information without waiting for AJAX calls.
- We could walk through the HTML tags to find the stickers we need; however, each sticker's **raw** information is stored in a custom HTML attribute called `data-preview='{}'`. This lets us parse that information as JSON.

```
data-preview='{ "type" : "popup_sound", "id" : "312149456", "staticUrl" : "https://stickershop.line-scdn.net/stickershop/v1/sticker/312149456/iPhone/sticker@2x.png;compress=true", "fallbackStaticUrl" : "https://stickershop.line-scdn.net/stickershop/v1/sticker/312149456/iPhone/sticker@2x.png;compress=true", "animationUrl" : "", "popupUrl" : "https://stickershop.line-scdn.net/stickershop/v1/sticker/312149456/android/sticker_popup.png;compress=true", "soundUrl" : "https://stickershop.line-scdn.net/stickershop/v1/sticker/312149456/android/sticker_sound.m4a" }'
```

With that settled, let's start.

1/ First, we create an entry point that reads a sticker shop URL from the user.

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strings"
)

// Entrypoint: keep asking for a LINE sticker shop URL and scrape it
func main() {
	consoleReader := bufio.NewReader(os.Stdin)
	for {
		fmt.Println("Enter Line Stickershop URL")
		inputUrl, err := consoleReader.ReadString('\n')
		if err != nil {
			log.Fatal(err)
		}
		// Remove the trailing "\n" (or "\r\n" on Windows) left by ReadString
		inputUrl = strings.TrimSpace(inputUrl)
		// Check if input has at least a line store format
		if strings.Contains(inputUrl, "https://store.line.me") {
			// scrap (step 2) downloads the page and its stickers
			if err := scrap(inputUrl); err != nil {
				log.Fatal(err)
			}
		} else {
			fmt.Println("Invalid format")
		}
	}
}
```

2/ Then we create a `scrap` function, which uses the built-in `net/http` package to download the webpage:

```go
resp, err := http.Get(scrapUrl)
if err != nil {
	return err
}
defer resp.Body.Close() // release the connection once we are done reading
```

3/ Once the download completes, we parse the body of the downloaded webpage. Remember that the raw information can be found in a custom attribute called `data-preview`; without getting too complicated, a regex can extract each occurrence of that attribute.

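```go
// Capture only the JSON between the single quotes of each data-preview attribute
var rgx = regexp.MustCompile(`data-preview='(.*?)'`)
tmpExtracted := rgx.FindAllStringSubmatch(inputHtml, -1)
for i := 0; i < len(tmpExtracted); i++ {
	// tmpExtracted[i][1] is the captured JSON; parse it here
}
```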
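Here is a self-contained sketch of this step, run against a tiny hand-written HTML snippet rather than the real store page. Like the regex above, it assumes the attribute value is wrapped in single quotes; `extractPreviews` is a hypothetical helper name. Because the capture group sits inside the quotes, `match[1]` is already plain JSON.

```go
package main

import (
	"fmt"
	"regexp"
)

// Matches data-preview='...' and captures only the JSON inside the quotes.
// The lazy .*? stops at the first closing single quote.
var previewRgx = regexp.MustCompile(`data-preview='(.*?)'`)

// extractPreviews returns the raw JSON string of every data-preview attribute.
func extractPreviews(inputHtml string) []string {
	var previews []string
	for _, match := range previewRgx.FindAllStringSubmatch(inputHtml, -1) {
		previews = append(previews, match[1]) // match[0] is the whole attribute
	}
	return previews
}

func main() {
	// Hand-written sample, not real store markup
	sample := `<li data-preview='{"id":"1","type":"static"}'></li>` +
		`<li data-preview='{"id":"2","type":"animation"}'></li>`
	for _, p := range extractPreviews(sample) {
		fmt.Println(p)
	}
}
```

This prints `{"id":"1","type":"static"}` and then `{"id":"2","type":"animation"}`. Like any regex approach, it breaks if a value itself contains a single quote; a real HTML parser such as `golang.org/x/net/html` would be more robust, at the cost of our no-third-party-library rule.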

4/ Before parsing the JSON, let us create a struct holding the information we need from the `data-preview` attribute:

```go
type DataPreview struct {
	Id           string `json:"id"`
	StickerType  string `json:"type"`
	PopupUrl     string `json:"popupUrl"`
	StaticUrl    string `json:"staticUrl"`
	AnimationUrl string `json:"animationUrl"`
	SoundUrl     string `json:"soundUrl"`
}
```

5/ Great, now we just unmarshal the JSON into the `DataPreview` struct with `json.Unmarshal`.
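A minimal sketch of this step with the standard `encoding/json` package, decoding a trimmed-down version of the sample attribute shown earlier. `parsePreview` is a hypothetical helper name, and the struct from step 4 is repeated so the snippet compiles on its own. JSON keys that the struct does not declare (such as `fallbackStaticUrl`) are ignored, and declared fields missing from the JSON stay as empty strings.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type DataPreview struct {
	Id           string `json:"id"`
	StickerType  string `json:"type"`
	PopupUrl     string `json:"popupUrl"`
	StaticUrl    string `json:"staticUrl"`
	AnimationUrl string `json:"animationUrl"`
	SoundUrl     string `json:"soundUrl"`
}

// parsePreview decodes one data-preview JSON string into a DataPreview.
func parsePreview(raw string) (DataPreview, error) {
	var dp DataPreview
	err := json.Unmarshal([]byte(raw), &dp)
	return dp, err
}

func main() {
	raw := `{ "type" : "popup_sound", "id" : "312149456", "animationUrl" : "", ` +
		`"popupUrl" : "https://stickershop.line-scdn.net/stickershop/v1/sticker/312149456/android/sticker_popup.png;compress=true" }`
	dp, err := parsePreview(raw)
	if err != nil {
		fmt.Println("bad JSON:", err)
		return
	}
	fmt.Println(dp.Id, dp.StickerType, dp.PopupUrl)
}
```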

6/ Next, we create a function to download the stickers concurrently. With Golang, this can be as easy as a few lines:

```go
var wg sync.WaitGroup
for i := 0; i < len(result); i++ {
	wg.Add(1)                        // one more task handed out
	go downloadImage(result[i], &wg) // run each download in its own goroutine
}
```
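`WaitGroup` basically just tells Madam Choo how many students she has assigned a task to: each `wg.Add(1)` adds one to the count, and the `go` keyword starts `downloadImage` in its own goroutine without waiting for it to finish.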

7/ There are 3 URLs we can use: `PopupUrl` for [big animated stickers](https://store.line.me/stickershop/product/17300/en), `StaticUrl` for [stickers that do not move](https://store.line.me/stickershop/product/2803/en), and `AnimationUrl` for [normal-sized stickers that are animated](https://store.line.me/stickershop/product/19770/en). A simple switch rule identifies which URL we should grab, and then we use the HTTP library again to download the image (animated stickers come as APNG files).
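The switch rule could look something like this sketch. `pickUrl` is a hypothetical name, and its priority order (popup, then animation, then static) is an assumption: rather than relying on specific `type` values, it prefers whichever animated URL is non-empty and falls back to the still image. The struct from step 4 is repeated so the snippet compiles on its own.

```go
package main

import "fmt"

type DataPreview struct {
	Id           string `json:"id"`
	StickerType  string `json:"type"`
	PopupUrl     string `json:"popupUrl"`
	StaticUrl    string `json:"staticUrl"`
	AnimationUrl string `json:"animationUrl"`
	SoundUrl     string `json:"soundUrl"`
}

// pickUrl chooses which image to download for one sticker.
// Assumed priority: big popup animation, then normal animation, then static.
func pickUrl(dp DataPreview) string {
	switch {
	case dp.PopupUrl != "":
		return dp.PopupUrl
	case dp.AnimationUrl != "":
		return dp.AnimationUrl
	default:
		return dp.StaticUrl
	}
}

func main() {
	still := DataPreview{Id: "1", StaticUrl: "https://example.com/static.png"}
	animated := DataPreview{Id: "2", StaticUrl: "https://example.com/static.png", AnimationUrl: "https://example.com/anim.png"}
	fmt.Println(pickUrl(still))    // https://example.com/static.png
	fmt.Println(pickUrl(animated)) // https://example.com/anim.png
}
```

With the sample attribute from earlier (`popupUrl` set, `animationUrl` empty), this would pick the popup image.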

8/ After downloading the image, I use [APNG2GIF] to convert the APNG into a GIF. This is not the ideal solution, but it is definitely the easiest.

9/ Before we ask the user for another URL, Madam Choo wants to wait for all the students to complete their work.
We add this at the top of the goroutine function (`downloadImage`) so that, however the function returns, it tells Madam Choo the task is done:

```go
defer wg.Done()
```

And Madam Choo waits for everyone's work to be completed before continuing:

```go
wg.Wait()
```
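Putting steps 6 and 9 together, the whole Add/Done/Wait pattern looks roughly like the sketch below. `downloadImage` here is a hypothetical stand-in that only records which sticker it handled instead of doing a real HTTP download and APNG conversion, and `downloadAll` is likewise a made-up name. A `sync.Mutex` guards the shared slice because many goroutines append to it at the same time.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// downloadImage is a stand-in for the real download + convert step.
// It records the sticker ID instead of touching the network.
func downloadImage(id string, wg *sync.WaitGroup, mu *sync.Mutex, done *[]string) {
	defer wg.Done() // tell Madam Choo this student is finished, however we return
	mu.Lock()
	*done = append(*done, id)
	mu.Unlock()
}

// downloadAll starts one goroutine per sticker and blocks until all have finished.
func downloadAll(ids []string) []string {
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		done []string
	)
	for _, id := range ids {
		wg.Add(1) // one more student is working
		go downloadImage(id, &wg, &mu, &done)
	}
	wg.Wait()          // wait for every student to hand in their work
	sort.Strings(done) // goroutines finish in no particular order
	return done
}

func main() {
	fmt.Println(downloadAll([]string{"3", "1", "2"})) // [1 2 3]
}
```

Calling `wg.Add(1)` before the `go` statement, rather than inside the goroutine, matters: otherwise `wg.Wait()` could run before the counter is incremented and return too early.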

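10/ And basically we are done! We can quickly package it for Windows and send the compiled binary to our friends. On Windows, `go build main.go` produces `main.exe`; from macOS or Linux, cross-compile by setting the target OS and architecture:

```
# On Windows
go build main.go
# From macOS/Linux, cross-compile for 64-bit Windows
GOOS=windows GOARCH=amd64 go build main.go
```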

Final

There you go: a quick program to scrape GIFs for your pleasure. Although it may seem like a trivial thing to do, it makes a big difference when you are gathering data for ML training or scraping at scale, since running downloads concurrently instead of one after another can significantly reduce the total time it takes to reach our goal.

Thanks for reading this far, hope you enjoyed this post!

I'm Jonathan Law Hui Hao, a Business Intelligence analyst in Malaysia who enjoys working with tech, RPA and Machine Learning!
