Grabbing website snapshots from the Wayback Machine and saving them as PDF

I’ve recently been working on a Critical Discourse Analysis assignment. I’ve chosen a website and am trying to compare different versions of it over time using the Wayback Machine.

To get a better overview I’m capturing each version of the site as a PDF file, and I’ve found a handy tool (wkhtmltopdf) which converts a web page to PDF.

To convert a single webpage the syntax is quite simple. In Terminal, just type

wkhtmltopdf --zoom 2.8 --page-size A4 <url> <filename>

However, I’m trying to download multiple snapshots, so I found another tool: “Wayback Machine Downloader”.

I then saved all the URLs in one text file and the corresponding filenames in another text file, and wrote a simple bash script that loops over them as two arrays:

# arr holds the URLs and pdfName the matching file names
for i in {0..12}; do
  wkhtmltopdf --zoom 2.8 --page-size A4 "${arr[$i]}" "${pdfName[$i]}"
  #echo "${arr[$i]}"
done

Then simply run the script and the sites are downloaded as PDF files. Lots of fun 🙂

Took me an hour because of some bash syntax for “:” (colon) in an array…
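
For reference, here is a minimal sketch of a fuller version of that loop, one that reads the two text files line by line instead of relying on arrays filled in elsewhere. The function name convert_snapshots and the CONVERTER variable are my own assumptions for illustration (CONVERTER only exists so the loop can be dry-run without wkhtmltopdf installed); the wkhtmltopdf options are the same ones used above.

```shell
#!/usr/bin/env bash
# Sketch: convert each URL in one file to the PDF name on the same line of another file.
# CONVERTER is a hypothetical override, not a wkhtmltopdf feature.
CONVERTER="${CONVERTER:-wkhtmltopdf}"

convert_snapshots() {
  local url_file="$1" name_file="$2" url name
  # Read both files in lockstep on separate file descriptors (3 and 4),
  # so line N of the URL file is paired with line N of the name file.
  while IFS= read -r url <&3 && IFS= read -r name <&4; do
    # Skip a blank URL line (its matching name line is skipped with it).
    [ -n "$url" ] || continue
    # Quote both values so characters in the URL are passed through untouched.
    "$CONVERTER" --zoom 2.8 --page-size A4 "$url" "$name"
  done 3<"$url_file" 4<"$name_file"
}
```

Assuming the two files are called urls.txt and filenames.txt (hypothetical names), it would be run as: convert_snapshots urls.txt filenames.txt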


The cost to go backward…

This article (The Four Horsemen of the Cyber Apocalypse) talks about the possible scenarios that could paralyse digital systems. They are certainly valid, but I can’t stop thinking about this question: what is the tipping point at which people start to move back to paper-based systems?

Given current database and networking technology, there’s almost no really “safe” system that can be created and left alone without being upgraded, let alone the attacks on cloud-based services. For institutions where efficiency is not the top priority, a paper-based system could be a better solution in terms of security.

The Silo series by Hugh Howey describes a post-apocalyptic scenario in which the whole of humankind lives underground. The “history” of the earth is modified in the database so that no one really knows what happened, except for the management of the silos. The top administrator of the silo has access to physical books, which document the “real history” of the past.

Leaving aside whether there is a “real history/reality”, just thinking about this scenario of rewriting digital history is scary enough. There are legal requirements for some institutions to keep paper-based records, but is there any company or institution that wants to move back to paper-based records from a digital system? What would it cost? Is there a formula for that?

To boil it down to one question: how much does it cost to convert a database (e.g. MySQL/PostgreSQL) to a paper-based system?

How we see the world differently with mobile phones: VVS

This is probably an old video, but it is really funny. It is about shooting video in portrait with a mobile phone, also known as “Vertical Video Syndrome” (VVS).

What we can see here is the merging of 2 different technologies (video and the mobile phone). The vertical format of most mobile phones is definitely a sensible design, but it doesn’t work for video when the orientation of the camera (landscape/portrait) follows the orientation of the phone. What is the solution?

A simple approach may be a phone that shoots in landscape mode while being held upright.

Mm… why isn’t there a phone like this?

Using Google Drive to share documents

One of the problems of using GDrive in a school/company is file ownership. If someone who owned/created a file leaves and the account is deleted, the file is gone forever. In order to migrate more data to the cloud service, transferring the ownership to a “school owned” account is essential.

However, explaining all this in an email would be extremely boring… So I’ve finally created my first PowToon.

Prejudice and door lock

Which one is more powerful? A door lock or prejudice?

I encountered 2 broken door lock incidents in the past few days; luckily nothing went wrong.

One of the doors usually requires a smartcard to open. As the system was faulty that day, someone disabled the lock to make sure people could still get in. However, I still swiped my card every time I pushed the unlocked door…

Here’s an image of a classic InfoSec theory (which is about “CIA”)


It talks about confidentiality, integrity and availability. If you live in an apartment building in Hong Kong, most of the time nothing will happen if you forget to lock the door. The challenge for anyone trying to break into an apartment/system is that there’s a cost for testing each door. Most people will assume a door is “locked”.

How about when we try to share a new idea? Do we assume people will not take it, and just drop it? To me, prejudice is much more powerful than a door lock, because after all, we all assume a door lock can lock the door… but that isn’t always the case.


Printing with AppleScript + Processing/Java

For a recent project, I had to write an application that generates a QR code and sends the generated image/file to a printer.

Java printing API
Since I used Processing as the development platform, I tried the Java printing API first. The original plan was:

  1. Use the ZXing library for code generation.
  2. Use the encoder class in the core library, which returns a bit array. Basically it is a 1D array of 0s and 1s, so it can easily be translated to a PImage through pixel access.
  3. Draw the generated QRCode image to the canvas or an offscreen PGraphics.
  4. Save the image file and send the file to print.

The constraint with the Java printing API was that the default PDF output library that comes with Processing can only produce images at 72 dpi, and it is difficult to use the printer’s print options (e.g. fit to page, media size, etc.).

Thus, despite the initial success, this method was abandoned.

AppleScript and printing
The good news is that for this project we are planning to use the Mac OS X platform. After fighting with the CUPS print server back in my Ubuntu days, I am planning to use a bash script to accomplish this.

Bash + Java
Before doing anything further, the first thing to test is whether Java can trigger and execute a bash script. The next step is to write a bash script that actually prints a file.

I knew that if you type “open ~/Desktop” in the OS X Terminal, a Finder window pops up showing the contents of the Desktop folder, so I wrote a script like this.

open ~/Desktop

and saved the above text in a file. To test it, I first had to make it executable

chmod +x <filename>

Then in Processing I simply put in the following lines and ran it.

try {
  // exec() takes the command to run, e.g. the path to the script
  Process theProcess = Runtime.getRuntime().exec("java");
} catch (IOException e) {
  System.err.println("Error on exec() method");
}

And it worked!

Print script
The next step was to find references for writing a bash script that actually prints in OS X, and luckily I found some useful ones.

Basically you can simply write a bash script like the following for printing

lp -d PrinterName -o landscape -o fit-to-page path_to_file

In order to find the printer name, you can run the following command in terminal

lpstat -p -d

It worked like a charm until I tried with another printer (Hiti P510L)

Setting the media size
No matter how I changed the image size, the printer simply would not print. It kept showing an “unsupported paper size” error. I had not set the media size in the script at the beginning because I had already set the default options through the CUPS web interface (localhost:631); combined with the “-o fit-to-page” option, the image should have been converted and printed automatically. However, that wasn’t the case.

Out of desperation, I tried listing and specifying the size and orientation in the bash script.

To list the supported media size, use the following command

lpoptions -p printer -l

In the actual bash script I wrote the following

lp -d printerName -o portrait -o fit-to-page -o media=6x4 filename

And it worked!

Problem solved!
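
To keep the printer name and media size out of the application code, the final command can be wrapped in a small function. This is only a sketch under my own assumptions: print_image and LP_CMD are hypothetical names (LP_CMD is there purely so the call can be dry-run without a printer), while -d, -o fit-to-page and -o media= are the same lp options used above.

```shell
#!/usr/bin/env bash
# Sketch: wrap the lp call so printer, media size and file are parameters.
# LP_CMD is a hypothetical override for dry runs, not a CUPS setting.
LP_CMD="${LP_CMD:-lp}"

print_image() {
  local printer="$1" media="$2" file="$3"
  # Fail early with a clear message instead of handing lp a missing file.
  if [ ! -f "$file" ]; then
    echo "print_image: no such file: $file" >&2
    return 1
  fi
  # Pass the media size explicitly rather than relying on the CUPS defaults.
  "$LP_CMD" -d "$printer" -o fit-to-page -o media="$media" "$file"
}
```

With the values from this post it would be called as: print_image printerName 6x4 filename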

PD Session- Integrate EduTech in Classroom

“Open your laptop and log in to ManageBac” is almost the equivalent of “turn to page xx of the textbook” in the school. However, is the potential of laptops being fully exploited in our classrooms? Or are they being used as a “typewriter and encyclopedia”?

ICT and Technology
As a technology teacher and teacher librarian, I find this is a typical conversation in a tech lesson.

“Mr. Lin, my computer doesn’t work”
“Well, it’s not really about the subject, can we talk about it later?”
“But it is about my COMPUTER, it is TECHNOLOGY”

What does the IB say about technology and IT?
MYP Technology and ICT- OCC_m_g_mypxx_fcl_1103_1_e

From the document:
In most cases, technology teachers are given responsibility for providing students with the teaching and learning experiences to help them develop ICT literacy. The teaching of ICT skills should not be confused with or take the place of a computer technology course. ICT is simply a tool used to develop computer technology solutions using the Design Cycle.

Also from the document:
Depending upon the school resources, ICT should be used whenever appropriate:

  • As a means of expanding students’ knowledge of the world in which they live
  • As a channel for developing concepts and skills
  • As a powerful communication tool.

ICT as a powerful communication tool for collaboration for developing concepts and skills with international mindedness.

Yes, this is a long title, but it actually illustrates how we can use the laptop effectively in the classroom. I will be demonstrating how we can use Google Site and Google+ in the classroom for collaboration.

Google Site/Wordpress for teachers
Google Site can be used as a repository for sharing essential documents and lesson references, and as an electronic notice board for your class.
Here’s an example of what a subject site can look like
(Using wikispaces)
(Using Google site)

Other references
40 ways to use google app in classroom
How to insert google docs/drive on google site
Using Google+
Using Google Form

Google+ Community
Instead of using Facebook, a Google+ community is a better way to create a “learning community”.

11 steps to create Google+ community for your class

One more thing- AirServer
The above tools are very handy when it comes to sharing. AirServer allows any device that supports AirPlay to show its screen on your laptop.
*Disclaimer: I get a 3-day extension if you download from here.