We recently encountered a situation where we needed to print (on paper!) the contents of an entire source code repository. I know, it sounds ridiculous, but sometimes compliance with some business or legal requirements forces programmers into doing silly things. Thus I began to look for a way to print source code to pdf.
After some quick web searches, I decided to use enscript. Enscript is a command-line tool that converts text files to PostScript. It is also fast and highly configurable. In addition, I needed a tool to produce PDF files from the PostScript files. There are lots of good tutorials online for how to do this, and eventually I settled on pstopdf, as illustrated in this post. Both of these packages were available from Homebrew (pstopdf as part of GhostScript). So all I needed to get started was this:
$ brew install enscript ghostscript
As I mentioned, enscript is highly configurable so you'll want to check the documentation to get the command exactly right for your use case. This is the command I decided on:
$ enscript -1rG --line-numbers --highlight=python -p - --color=0 somefile.py \- | pstopdf -i -o somefile.pdf
This will create a PDF called somefile.pdf in landscape orientation (-r option) with one page of source per output page (-1). In addition, the file will be created with a fancy header (-G option, see the manpage for details and configuration options), which includes the filename and a timestamp.
One potential pitfall here is the -p - option (note the trailing dash). It tells enscript to write its PostScript output to standard output instead of sending it to a printer, so the output can be piped as input to pstopdf. If you do not include it, you'll get strange errors and no output.
The rest of the options are pretty straightforward.
The local directory that houses my source code has a lot of things in it that are not my source. For instance, it has a Git repository, a virtualenv, and a host of .pyc files and other things that are not relevant. So we need a way to traverse the directory and prune files and directories that are not relevant. Python's os.walk is a great solution for this.
One of the strange things to grasp about os.walk is that it yields a 3-tuple for each directory it encounters on its recursive walk down a file path, including the start path. The members of that tuple are the current directory's path, a list of the subdirectories in the current location, and a list of the non-directory files in the current location. What's nice is that you can modify the subdirectories list in place during the iteration, and os.walk will then skip whatever you removed, which prunes the directories you search. Here's a quick example:
import os

START_DIR = '.'  # placeholder: the top of the source tree to walk

for dirname, subdirs, files in os.walk(START_DIR):
    if 'env' in subdirs:
        subdirs.remove('env')  # skip the virtualenv directory
    if '.git' in subdirs:
        subdirs.remove('.git')  # skip the git repository
    for f in files:
        if f.endswith('.py'):
            print('I am a python source file!')
This will skip the virtualenv and .git directories on the walk, leaving you with a walk that will only identify the files you want, in this case, .py source files.
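The same pruning can be wrapped in a small generator that hands back full file paths. This is only a sketch: the name iter_source_files, the SKIP_DIRS set, and the suffix parameter are illustrative choices, not taken from my actual script. It relies on os.walk honoring in-place changes to the subdirectory list when walking top-down, which is the default.

```python
import os

# Directory names to skip; an assumption for illustration.
SKIP_DIRS = {'env', '.git'}


def iter_source_files(start_dir, suffix='.py', skip_dirs=SKIP_DIRS):
    """Yield paths of files under start_dir that end in suffix."""
    for dirname, subdirs, files in os.walk(start_dir):
        # Slice assignment edits the list in place, so os.walk never
        # descends into the pruned directories. Sorting keeps the
        # traversal order predictable.
        subdirs[:] = sorted(d for d in subdirs if d not in skip_dirs)
        for name in sorted(files):
            if name.endswith(suffix):
                yield os.path.join(dirname, name)
```

Rebinding the name (subdirs = [...]) would not work here, because os.walk keeps its own reference to the original list; only in-place edits are seen.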
In the end, I wrote a quick Python script that used the subprocess module to run my enscript command for files that I identified using os.walk. You can see my complete solution here.
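For a rough idea of what the subprocess part can look like, here is a minimal sketch rather than the linked script itself. The function names enscript_cmd, pstopdf_cmd, and print_to_pdf are hypothetical; the flags mirror the command shown earlier, and both tools are assumed to be on your PATH.

```python
import subprocess


def enscript_cmd(source_path):
    """Build the enscript argument list; '-p -' sends PostScript to stdout."""
    return ['enscript', '-1rG', '--line-numbers', '--highlight=python',
            '-p', '-', '--color=0', source_path]


def pstopdf_cmd(pdf_path):
    """Build the pstopdf argument list; '-i' reads PostScript from stdin."""
    return ['pstopdf', '-i', '-o', pdf_path]


def print_to_pdf(source_path, pdf_path):
    """Pipe enscript's output into pstopdf and return both exit codes."""
    enscript = subprocess.Popen(enscript_cmd(source_path),
                                stdout=subprocess.PIPE)
    pstopdf = subprocess.Popen(pstopdf_cmd(pdf_path), stdin=enscript.stdout)
    # Close our copy of the pipe so enscript sees a broken pipe
    # if pstopdf exits early.
    enscript.stdout.close()
    pstopdf.wait()
    enscript.wait()
    return enscript.returncode, pstopdf.returncode
```

Calling print_to_pdf('somefile.py', 'somefile.pdf') for each path found by the walk would produce one PDF per source file. The exit codes are returned instead of checked, since how to interpret them is best confirmed against each tool's manpage.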
I was really pleased with the way that enscript worked. Combined with a little python scripting and the os.walk magic, I was able to create a quick script to print out all the source files to pdf.