Codementor Office Hours with Eric Chiang: Stdin, Stdout: pup, Go, and life at the command-line


DESCRIPTION

Codementor Office Hours: https://www.codementor.io

Pup is a flexible command-line tool written in Go for parsing HTML. It reads from stdin, prints to stdout, and lets the user filter parts of a page using CSS selectors. Inspired by jq, pup aims to be a fast and flexible way of exploring HTML from the terminal. Pup reached the top of Hacker News when it debuted. On 10/15 at 11am PDT / 2pm EDT, Pup's creator Eric Chiang hosted a Codementor Office Hours on Go and command-line programming: an intro to command-line programming and to building tools for it in Go. The session runs through some basic command-line tools (grep, awk, sed, and jq), covers curl, wget, and pup, and wraps up with a conversation about Go. Eric Chiang is a software engineer and founding member at Yhat, a NYC startup building products for enterprise data science teams. Eric enjoys Go, data analysis, JavaScript, network programming, Docker, and grilled cheese sandwiches.


stdin, stdout: pup, Go & life at the command-line


$ cd ~/talks/codementor
$ cat hello.txt
Hello, Code Mentor!
$

CLI life: Data

data

[LOG] some data
[LOG] more data
[LOG] even more

col1,col2,col3
some,data,and
even,more,data

{
  "some": "data",
  "more": "data"
}

<div>

<h1>Some</h1><p>data</p>

</div>
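Structured formats like the CSV and JSON samples on this slide already have parsers in Go's standard library. A minimal sketch, assuming the samples are inlined as string constants; the names csvSample, jsonSample, parseCSV and parseJSON are hypothetical:

```go
package main

import (
	"encoding/csv"
	"encoding/json"
	"fmt"
	"strings"
)

// Hypothetical constants holding the two samples from this slide.
const csvSample = "col1,col2,col3\nsome,data,and\neven,more,data\n"
const jsonSample = `{ "some": "data", "more": "data" }`

// parseCSV returns every record, including the header row.
func parseCSV(s string) ([][]string, error) {
	return csv.NewReader(strings.NewReader(s)).ReadAll()
}

// parseJSON decodes a flat JSON object whose values are all strings.
func parseJSON(s string) (map[string]string, error) {
	var m map[string]string
	err := json.Unmarshal([]byte(s), &m)
	return m, err
}

func main() {
	rows, err := parseCSV(csvSample)
	if err != nil {
		panic(err)
	}
	fmt.Println(rows[1]) // second record: [some data and]

	obj, err := parseJSON(jsonSample)
	if err != nil {
		panic(err)
	}
	fmt.Println(obj["more"]) // data
}
```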

grep & nl

grep

cat

grep

cat

pipes!

$ cat Shakespeare.txt | \
    sed -E 's/[[:space:]]+/\n/g' | \
    tr -d ' ' | \
    grep -e '^$' -v | \
    tr '[:upper:]' '[:lower:]' | \
    sort | uniq -c | sort -nr | \
    head -n 50


curl & wget

$ wget -O Shakespeare.txt \
    http://gutenberg.org/cache/epub/100/pg100.txt


wget =

$ wget --load-cookies cookies.txt

$ wget -O Shakespeare.txt \
    http://gutenberg.org/cache/epub/100/pg100.txt

$ curl -o Shakespeare.txt \
    http://gutenberg.org/cache/epub/100/pg100.txt


$ curl http://gutenberg.org... | \
    sed -E 's/[[:space:]]+/\n/g' | \
    tr -d ' ' | \
    grep -e '^$' -v | \
    tr '[:upper:]' '[:lower:]' | \
    sort | uniq -c | sort -nr | \
    head -n 50


I hate HTML

HTML is really hard

“Every time you attempt to parse HTML with regular expressions, the unholy child weeps the blood of virgins, and Russian hackers pwn your webapp.”

“Have you tried using an XML parser instead?”

But it gets worse

Yes, browsers will happily parse this as HTML:

<tbody>
  <tr><img src="foo"></tr>
  <tr><img/><br>
</tbody></table>

NEVER TRY TO WRITE AN HTML PARSER

Nokogiri 鋸

pup

Still HTML

$ curl -L -s reddit.com/r/programming/ | \
    pup 'p.title a[href^=http] attr{href}'

$ curl -s https://news.ycombinator.com/ | \
    pup 'td.title a[href^=http] attr{href}'


$ curl -s https://news.ycombinator.com/ | \
    pup 'td.title a[href^=http] json{}'
[
  {
    "attrs": {
      "href": "https://hacks.mozilla.org/2014/10/passwordless-authentication-secure-simple-and-fast-to-deploy/"
    },
    ...
]


$ curl -s https://news.ycombinator.com/ | \
    pup 'td.title a[href^=http] json{}'
[
  {
    "attrs": {
      "href": "https:.../"
    },
    "tag": "a",
    "text": "SHOW HN: pup"
  },
  ...
]

github.com/EricChiang/pup

Part II: Building CLI tools in Go

import java.util.Scanner;

class Hello {
    public static void main(String[] args) {
        Scanner reader = new Scanner(System.in);
        System.out.print("Enter your name: ");
        String name = reader.nextLine();
        System.out.printf("Hello, %s!%n", name);
    }
}


Why Go?

Why not?

Taken from Rob Pike’s talk public static void main (2012)


“dear god make it stop”


I suck at this -->

Go

package main

import "fmt"

func main() {
	fmt.Println("Hello, world!")
}

line

package main

import (
	"io"
	"os"
)

// line copies everything from stdin to stdout, then writes a newline.
func main() {
	io.Copy(os.Stdout, os.Stdin)
	io.WriteString(os.Stdout, "\n")
}


$ echo "Hello, World"
Hello, World
$ go get github.com/ericchiang/line
$ echo "Hello, World" | line
Hello, World
$

url-encode

Live demo!
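The url-encode demo isn't captured in the slides, so this is a hypothetical reconstruction of such a filter rather than the tool shown live: it reads all of stdin, drops one trailing newline (as added by echo), and escapes the rest with url.QueryEscape from net/url, which turns spaces into + and reserved characters into %XX escapes.

```go
package main

import (
	"fmt"
	"io"
	"net/url"
	"os"
	"strings"
)

// encode query-escapes data after removing a single trailing newline.
func encode(data string) string {
	return url.QueryEscape(strings.TrimSuffix(data, "\n"))
}

func main() {
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(encode(string(data)))
}
```

For example, `echo "a b&c=d" | go run urlencode.go` prints `a+b%26c%3Dd`.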

gox

Messing with zip

Thanks!
