I’ve been working on a project recently where I needed to take screenshots of an entire website. I’m not talking about just what fits on the screen but the entire, multiple-screens-long page.
There are a few existing tools out there, but they wouldn’t work for me for various reasons:
- I wanted to use the command line so I can script it
- It had to run on Linux
- I didn’t want to mess with installing a bunch of libraries (like KDE)
- I wanted something simple, that shouldn’t break later when upgrading things
So I rolled my own.
    #!/usr/bin/env ruby
    require 'rubygems'
    require 'trollop'
    require 'vapir'

    def filename_for_url(url) # turn a URL into a filesystem-safe PNG name
      url.
        downcase.
        gsub(/https?:\/\//,'').
        gsub(/\//, '_').
        gsub(/[^0-9A-Za-z._]/,'') + ".png"
    end

    def print_file_name(file)
      puts file
    end

    opts = Trollop::options do
      opt :url, "URL to screenshot", :default => ''
      opt :force, "force saving", :default => false
    end

    Trollop::die :url, "url required" if opts[:url].nil? || opts[:url] == ''
    file = filename_for_url(opts[:url])
    Trollop::die :url, "file exists, use --force to override" if (File.exist?(file) && !opts[:force])

    begin
      browser = Vapir::Firefox.new
      browser.goto opts[:url]
      browser.screen_capture file
      print_file_name(file)
    ensure
      browser.close if browser # browser is nil if Firefox failed to start
    end

The code is pretty simple and easy to follow with a bit of explanation.
trollop provides some simple command-line parsing and validation. I only care about the URL and whether an existing screenshot should be overwritten.
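
If trollop isn’t available, roughly the same parsing and validation can be sketched with OptionParser from Ruby’s standard library. This swaps in a different library than the script uses, and parse_screenshot_options is a hypothetical helper name, so treat it as an illustration rather than a drop-in replacement:

```ruby
require 'optparse'

# Hypothetical helper: parses an argv-style array and returns an options hash.
# Raises ArgumentError when no URL is given, roughly mirroring the
# "url required" check in the script.
def parse_screenshot_options(argv)
  opts = { :url => '', :force => false }
  parser = OptionParser.new do |o|
    o.on('-u', '--url URL', 'URL to screenshot') { |v| opts[:url] = v }
    o.on('-f', '--force', 'force saving') { opts[:force] = true }
  end
  parser.parse(argv) # non-destructive: argv itself is left untouched
  raise ArgumentError, 'url required' if opts[:url].empty?
  opts
end
```

In a real script you would pass ARGV instead of a hand-built array.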
The filename_for_url method just does some simple string substitution so the URL can be used as a safe filename,
e.g. converting http://theadmin.org/articles/something/ into theadmin.org_articles_something_.png (the trailing slash becomes the trailing underscore).
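
To see the substitutions on their own, here is a standalone sketch of the same method, split into named steps. It assumes the intent of the character class is to keep only letters, digits, dots and underscores:

```ruby
# Standalone version of the script's filename_for_url, one step per line.
def filename_for_url(url)
  without_scheme = url.downcase.gsub(/https?:\/\//, '') # drop http:// or https://
  underscored = without_scheme.gsub(/\//, '_')          # slashes become underscores
  underscored.gsub(/[^0-9A-Za-z._]/, '') + '.png'       # strip everything else unsafe
end

puts filename_for_url('http://theadmin.org/articles/something/')
# prints theadmin.org_articles_something_.png
puts filename_for_url('https://Example.com/a b?c=1')
# prints example.com_abc1.png
```

Note that query strings lose their punctuation, so two different URLs can end up with the same filename.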
The main guts are lines 27-34 where vapir does 4 things:
1. Creates a new Firefox instance, which actually opens the browser
2. Browses to the passed-in URL
3. Does a full page screen capture and saves it to a file
4. Closes the browser window
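
The reason for wrapping those steps in begin/ensure is that the browser window gets closed even when loading or capturing fails. A minimal sketch of that control flow, using a hypothetical FakeBrowser stand-in (not part of vapir) so it runs without launching Firefox:

```ruby
# FakeBrowser is a hypothetical stand-in for Vapir::Firefox. It only records
# which steps were called, and can be told to fail while loading the page.
class FakeBrowser
  attr_reader :calls

  def initialize(fail_on_goto = false)
    @fail_on_goto = fail_on_goto
    @calls = [:new]
  end

  def goto(url)
    raise "could not load #{url}" if @fail_on_goto
    @calls << :goto
  end

  def screen_capture(file)
    @calls << :screen_capture
  end

  def close
    @calls << :close
  end
end

# Hypothetical helper: runs steps 2-4 on an already created browser.
# The ensure clause runs whether or not an earlier step raised.
def capture(browser, url, file)
  browser.goto url
  browser.screen_capture file
  file
ensure
  browser.close if browser
end

ok = FakeBrowser.new
capture(ok, 'http://theadmin.org/', 'theadmin.org_.png')
p ok.calls # prints [:new, :goto, :screen_capture, :close]
```

With FakeBrowser.new(true) the error from goto still propagates to the caller, but close is recorded first.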
The process is a bit slow and it borrows my mouse focus, but it works well enough for me. I’d like to get it working with some of the QtWebKit stuff so it runs without opening a browser (headless), but I haven’t felt the need to dive into that complexity yet.
Because you can never talk about screenshots without creating one, here is what theadmin.org looks like.
All in all, it only took me about an hour to research all of the alternative solutions and throw this code together.
Is it a hack? Yes.
Is it something I want to maintain until 2021? No.
Will it save me a few minutes every couple of days? Yes.
Was it something I learned from? Yes.
Don’t be afraid of building your own tools if you need to. You might end up throwing them away when they rust but the knowledge you gain from it will stay with you for years to come.
(If you’re on OS X and don’t mind using Python, webkit2png seems to be a common alternative. Though it’s well over 280 lines of code and looks like it was last updated in April 2009.)