EpicDiff is an app suite for creating illustrative and entertaining animations out of docdiff-produced diffs. It was basically written so that I could make an animation out of my NaNoWriMo novel, but you can probably modify it to fit your workflow.
NOTE: THIS SOFTWARE IS PROVIDED IN THE “ITWORKSFORME®” CONFIGURATION. SOME EXPERTISE AND PROVERBIAL TOUCHING OF POTENTIALLY LIVE WIRES IS REQUIRED. YOU MAY MISS YOUR DEADLINES IF YOU RELY ON THIS THING, AND I’M NOT TAKING RESPONSIBILITY FOR THAT. It might blow up when you try to use it. When the smoke clears, please try to make sense of what went wrong. I regrettably CANNOT guarantee that I will be able to provide personal assistance on how to use this program to achieve your goals; you may want to go and learn a bit of Ruby and Processing instead. It’s more fun that way! I also cannot guarantee that this hack is under constant development; I may update it in the future when I actually need it, and in those cases I’ll try not to strip away any of the functionality that is already present and will only add new options.
- Sources are available through my Processing GitHub repository.
- A video of the Dusts of Avalon writing process is available on DailyMotion.
- Probably some sort of POSIX userland. (Needs the wc(1) tool.)
- Ruby and RMagick [ gem install rmagick ]
Probably helpful in conversion:
- gitsplode.rb (Note that while the other tools are probably cross-platform in some sense, this tool MAY be ingloriously written, unportable and dependent on a POSIX-like environment.)
- Java SE JRE. Last tested with Sun Java 6 under Debian Linux.
- Processing. This is built with Processing 1.2.1.
- GSVideo. My runs worked with GSVideo 0.7 under Debian Linux.
The code was written in the Ruby 1.8 days and hasn’t really been used since, so there may be some small hitches with 1.9/2.x. I don’t expect anything Earth-shattering, however. <obligatoryjab>It’s not like we’d be on Python or anything.</obligatoryjab>
As detailed in the technical discussion below, Processing 1.x only shipped with the obsolete QuickTime for Java layer and required the external GSVideo library for this kind of work. The current version of Processing, 2.x, actually includes a new video API that is based on GSVideo, so when I get around to updating this app to be compatible with Processing 2, it should work without too many changes (class names are different, and that’s about it).
The workflow with the animation goes like this:
- Produce a list of files from my Git repository, using the “gitsplode.rb” tool (found in my GitHub repository). This will also generate some of the data for the animation in the form of an XML file.
- IMPORTANT: Make sure the files are presentable to docdiff. To wit, convert the files to the same line ending format. docdiff is able to figure out that the files have different types of line endings but, for reasons best known to the goddess ever-watchful, is not able to proceed with the comparison, even with that insight in hand. So, you should make sure all of the files have the same kind of line endings. This could be a problem if you’ve worked on alternating platforms like Windows and *nix and have set Git to not automatically convert the line endings. If you’ve worked on only one platform, congratulations, you may be good to go.
- Produce the data used by the animator application, using “tellthetale.rb” Ruby program:
- Produce HTML-format diff visualisations using docdiff
- Convert the HTML to PDF using wkhtmltopdf
- Analyse, resize and convert the pages using RMagick
- Produce yet another damn XML file with more summary data. Why make one XML file when you can make two =)
- Slurp the XML summary and the per-page image files into the Processing program, which will animate the resulting pages and hopefully produce a nice little video file.
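For the line-ending step above, a couple of lines of Ruby will do. This is a minimal sketch; the `revisions/` directory name is illustrative, not what gitsplode actually uses:

```ruby
# Convert CR/LF and bare CR line endings to Unix LF, as the
# "same line ending format" step above requires.
def normalise_endings(text)
  text.gsub(/\r\n?/, "\n")
end

# Rewrite every extracted revision in place; the directory name
# and glob pattern are assumptions for illustration.
Dir.glob('revisions/*.txt').each do |path|
  File.binwrite(path, normalise_endings(File.binread(path)))
end
```

Running this once over the extracted files should make docdiff happy regardless of which platform each revision was written on.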
The animation software itself is distributed under the terms of the GNU General Public License, version 3. The bundled “Anonymous Pro” font, by Mark Simonson, is distributed under the SIL Open Font License.
Below is the bulk of the original Component-Based Software Production course assignment from early 2011 that I turned in. It’s a little bit technical.
Premise of the exercise
This assignment is about problem-solving using scripting languages and pre-built open source components.
The motivation for this assignment came from my own ongoing project that was temporarily frozen. This kind of project is not a typical business application, but it is nevertheless the kind of application that might be needed in business. In other words, this is a “boss ordered this” type of project (albeit with me being my own boss): it solves a simple, one-shot problem, in a somewhat crude but effective way that works right now – the fact that it may also work tomorrow is a plus. In these kinds of situations, scripting languages and pre-built libraries are extremely effective.
I have already run into similar issues in previous courses; during the Project 1 course, the boss ordered a UML diagram of the product’s database schema. I decided to save our team a lot of tedious and error-prone work by writing a Ruby script to parse a MySQL schema dump and produce an XMI file that could be imported into a UML tool. This project is largely similar.
I participated in the 2010 National Novel Writing Month (NaNoWriMo), in which the goal is to write a novel of 50,000 words over November. The point is to exercise one’s writing habits, and the quantity of the words is more important than the quality; yet a lot of people, myself included, aimed for a sensible, well-structured novel, and the end result of the project is a rough draft. The novel itself is beside the point of this exercise, but suffice it to say that while the rough draft was completed, and I was one of the winners, the work is still going on.
In my writing projects, I use a lot of open source tools. The novel text was written in TextRoom, a plain-text based word processor, using markup language called Markdown that is designed to be as close to plain text as possible. For syncing and versioning the novel text between my laptop and desktop computer, I used Git, a distributed version control system that is widely used in software development.
The choice of the software had several advantages that inspire further processing:
Git stores every revision of the text (as often as I committed the changes, which happened about daily), and allows retrieval of each separate revision as it appeared at a specific date.
Git stores datestamps with each commit. This allows us to see just how much text we had at a specific point of time.
Because the text is in plain text format, it can be processed easily by other tools.
A further bit of inspiration is that NaNoWriMo already offers visualisations of your novel-writing progress in the form of word count graphs (a bar chart that compares your progress to the expected progress), but these rely only on the word count you report to them daily, which is subject to both honesty and the choice of word counting method. While there is a word count validation at the end of the month, NaNoWriMo does not require you to submit the actual novel text for checking at any point; the text may optionally be submitted in obfuscated form (the FAQ instructs to replace all letters in the manuscript with “a”). The word counting methods vary as well; throughout the project, I used the Unix “wc” tool, which uses a simple algorithm of counting whitespace-delimited tokens, but NaNoWriMo’s official word counter, whose algorithm is not publicly disclosed, apparently rejects non-word punctuation (like en and em dashes that are surrounded by whitespace), so I ended up with a slightly lower official word count.
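The gap between the two counting styles is easy to demonstrate in Ruby. The wc-style counter below matches what wc(1) does; the strict counter is purely a guess at the undisclosed official behaviour, included only to show why the totals diverge:

```ruby
# wc(1)-style count: number of whitespace-delimited tokens.
def wc_style_count(text)
  text.split.size
end

# A guessed approximation of a stricter counter that rejects
# tokens containing no word characters, such as a free-standing
# dash. The real NaNoWriMo algorithm is not public; this is
# illustrative only.
def strict_count(text)
  text.split.count { |token| token =~ /\w/ }
end

sample = "Night fell – and with it, silence."
wc_style_count(sample)  # => 7 (the lone dash counts as a word)
strict_count(sample)    # => 6 (the dash is rejected)
```

With thousands of dashes over a 50,000-word manuscript, a discrepancy of this kind adds up to a visibly lower official total.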
With all of my data at hand, there was the obvious thing to do: We have a repository of verbatim copies of previous revisions of text, we know exactly when those revisions were made, and we have tools to process them. The writing process isn’t exactly dramatic enough for reality TV. Fiction writing is not the most photogenic job imaginable; no one wants to watch a random computer geek typing all night long on some busted old laptop and drinking near-suicidal amounts of coffee. Yet, if the progress in the novel text itself is visualised interestingly enough, maybe there’s something to it.
Hence, the goal of the project: we use the power of scripting languages and all of the existing graphics libraries to produce an interesting animation on how the novel-writing progressed.
The gitsplode and tellthetale programs were written in Ruby, an interpreted, object-oriented, multi-paradigm programming language. It is influenced by many other scripting languages, mostly Perl, and its object-oriented programming style is largely influenced by Smalltalk. Being a multi-paradigm scripting language, Ruby does not enforce any specific style; you can implement your programs in an object-oriented fashion, but this is not required. Ruby is dynamically typed and, in general, “duck-typed”; that is, it is interface-oriented rather than strictly type-oriented, and when gluing things together, it’s more important to implement a matching interface (“If it walks like a duck, then it satisfies the requirements of our Duck-Like Walking Application”) than to provide a strict match for a specific type (“Sorry, I was looking for Anas platyrhynchos domesticus here, can’t use a wild mallard”).
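A toy example of the duck-typing point (the class and method names are made up for illustration, not taken from the actual scripts):

```ruby
# Any object that responds to #walk will do; its class is never
# checked, which is the essence of duck typing.
class Mallard
  def walk
    'waddles'
  end
end

class Robot
  def walk
    'clanks'
  end
end

def walking_application(walker)
  "it #{walker.walk} like a duck"
end

walking_application(Mallard.new)  # => "it waddles like a duck"
walking_application(Robot.new)    # => "it clanks like a duck"
```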
Ruby has an extensive ecosystem of components and libraries that can be reused in many programs. The components are packaged as “gems” and downloaded and updated using the RubyGems application. For example, if you want to provide Twitter access in your application, you just install the “twitter4r” gem, and all of the gems it requires will be installed by RubyGems as well. Some gems are written in pure Ruby; others make use of C libraries while providing a very idiomatic, “Ruby-like” variation of the C API. The RubyGems system is also used to distribute applications in addition to libraries, and it makes it easy to require a specific version of another gem, thus making deployment easier.
I could have done this part in Ruby, but in my opinion, Ruby doesn’t quite have the same visual programming power as the Java-based Processing language. Ruby has a visual framework called Shoes, but it’s not quite the same thing and is still in development. Ruby has extensive graphics frameworks, but not many that are specifically geared towards making animated content. Processing, on the other hand, is mature and well understood, and has the advantage of running on top of Java, which allows reuse of existing components and allows sketches to be deployed in web browsers.
Overview of the workflow
The big idea of the project is to start from the Git repository containing various versions of the text, and end up with a video file that shows how the text changed over time. The workflow for this is as follows, and it corresponds to the various tools used in this project:
gitsplode: Extract each individual revision of the novel text from the Git repository. At this point, the history data is analysed and an XML summary of the files is produced.
tellthetale: Do further analysis of the data and produce yet another XML file. Produce the actual graphics that the animation package will animate.
EpicDiff: Read the analysis XML and the image files, and produce the final animation. Render the animation into a video file.
Between all of these steps, there’s potential for light hand-tweaking of the input data. The files were written on a Windows XP laptop and a Linux desktop, and Windows and Linux have different ideas about line endings. The tools used for comparison are picky about little things like this.
A note on notation
This document uses UML notation as far as it was feasible. Unfortunately, the tool I had at hand used a somewhat odd mix of UML 1-style packages and UML 2-style interfaces.
As we are working with glue languages, the actual code doesn’t really specify and implement concrete interfaces so much as it implements and accesses conceptual interfaces. Hence, the interface notation used here is rather informal. The interfaces only have the nature of their API specified using UML stereotypes; for example,
<<Java>> stereotype means that the API is a native Java API.
Because the input and output of each program is very important, and the programs also use intermediate chunks of data, the diagrams shown here also include the files and devices, marked with the famed ye-olde-IBM-mainframe-hard-drive notation. Temporary files that the programs swap around are marked with dotted outlines.
Diagram 1: gitsplode
The purpose of gitsplode is to take every revision of a single file from the git repository and save it into individual files. It will also generate a summary file in XML format.
Accessing Git data
Git stores information in a repository of its own. Git is implemented as a Unix-like toolkit comprising a large number of separate programs, each doing its own small but well-defined job. There are certain tools which provide a very low-level interface to objects within a Git repository, but this program only invokes three high-level tools: git-config, which is used to get information about the git repository; git-log, which is used to obtain a list of commits; and git-show, which is used to get individual files out of the datastore. The git-log tool is used to get the datestamps and commit IDs, and git-show is used to get the actual file data.
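In Ruby this boils down to shelling out to the git tools and parsing their output. A rough sketch in that spirit, with an assumed git-log format string (the actual gitsplode may use different options):

```ruby
# List the commits touching a file. The --format string asks
# git-log for "<commit id> <unix timestamp>" per line.
def list_commits(path)
  parse_log(`git log --format='%H %at' -- #{path}`)
end

# Turn the git-log output into { id:, time: } hashes.
def parse_log(log_output)
  log_output.lines.map do |line|
    id, stamp = line.split
    { id: id, time: Time.at(stamp.to_i) }
  end
end

# Retrieve one revision's contents via git-show.
def revision_at(commit_id, path)
  `git show #{commit_id}:#{path}`
end

sample = "f00df00d 1291158000\n"  # shape of the git-log output above
parse_log(sample).first[:id]      # => "f00df00d"
```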
End results of the program run
The program generates a large number of files, plus a summary XML file.
The commit information is stored in “commit” tags. The metadata obtained from the repository includes the Git commit ID, the datestamp (stored in both textual and Unix timestamp formats), and the commit message. Filenames are based on the original filename (here “avalon_in_orbit.txt”) with the revision datestamp appended.
Interchange considerations: Is XML the correct choice?
The gitsplode and tellthetale scripts make use of Ruby’s built-in XML generation and parsing library, REXML, which allows XML to be manipulated via a DOM/XPath-based interface. While XML is well supported, in the Ruby world it seems less used than YAML for data serialisation; but since one of the components is written in Java, and Java’s support for YAML is potentially spotty (which is to say, I haven’t checked the latest situation, but it was pretty bad when I last looked years ago), I decided that XML is a good choice for data interchange.
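A minimal REXML sketch in that spirit; the element and attribute names here are reconstructions from the prose above, not the scripts’ exact schema:

```ruby
require 'rexml/document'

# A guessed shape for the commit summary: one "commit" element
# per revision, with id and timestamp attributes.
xml = <<~XML
  <commits file="avalon_in_orbit.txt">
    <commit id="f00df00d" timestamp="1291158000">
      <message>More</message>
    </commit>
  </commits>
XML

doc = REXML::Document.new(xml)
# XPath lets us walk the commit entries regardless of nesting.
REXML::XPath.each(doc, '//commit') do |commit|
  puts "#{commit.attributes['id']}: #{commit.elements['message'].text}"
end
```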
Diagram 2: tellthetale
The tellthetale script does a lot of the “interesting” things in this application. It is also interesting in that it calls different components in different ways: the script uses both external programs and Ruby libraries. The basic operation of this script is as follows:
Read the existing XML file that contains revision data.
For each revision, produce an HTML file that shows the differences between this and the previous version.
Convert this HTML file into a set of image files.
When done processing revisions, produce an additional XML file that contains information about each page, including whether the page in question contains any added or deleted bits of text.
Finding the differences
The actual difference finding is easy and straightforward enough with DocDiff. DocDiff is a program that can identify differences between natural-language text documents; it is similar to the Unix diff(1) tool, which compares text files line by line and is more suited to programming work. DocDiff can also produce an HTML file as output, allowing for a proper graphical representation of the differences. Wholly new text is marked with a blue background and deleted text with a red background. Text that has been noticeably changed is marked with a yellow background for old text parts and green for new text parts. All deleted text is also struck out.
HTML can be converted to image files by using a tool that is specifically designed for this task. Usually, this would involve some kind of web browser trickery, but since modern web browser layout engines are separate libraries that allow the engines to be embedded in other programs, I started looking for tools that can convert HTML to images. It turned out that someone had written a tool called wkhtmltopdf, which uses the open-source WebKit layout engine found in Safari and Chrome. Thus, generating a PDF file out of the HTML was a relatively simple task. While it is possible to use WebKit through Ruby, such usage is better geared towards GUI applications, and wkhtmltopdf already implements the required functionality, so I simply chose the external application instead of the Ruby library.
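The conversion step then reduces to building one command line per revision. A sketch with illustrative file names; wkhtmltopdf’s basic invocation is just input and output paths, though the real script may pass more options:

```ruby
# Derive the PDF name from the HTML name.
def pdf_path_for(html_path)
  html_path.sub(/\.html\z/, '.pdf')
end

# Build the argument vector for the external converter.
def wkhtmltopdf_command(html_path)
  ['wkhtmltopdf', html_path, pdf_path_for(html_path)]
end

# In tellthetale's per-revision loop, this would run roughly as:
#   system(*wkhtmltopdf_command('diff_2010-11-03.html')) or
#     raise 'conversion failed'
```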
Once the HTML is converted to PDF, what remains is the need to analyse the pages and convert the individual pages to image files that Processing can read. This can be done using any image processing library that can read PDF.
Analysing the pages
While on its face the whole process seems to use a lot of existing components, the analysis part was really challenging.
The basic problem arises from the fact that docdiff produces an HTML file, and HTML files are, by nature, flowing text: they are not paginated. The HTML file may have enough structural information to identify where the “added” or “deleted” bits of information are in the document, but which “pages” these additions land on depends entirely on the web browser. And even web browser layout engines don’t handle pages as pages designed for printing until preparation for printing is done.
There were therefore many ways to find the information: one would involve hooking into the WebKit engine somehow as the pages were printed (which would have required reimplementing the entire functionality of wkhtmltopdf), one would be to analyse the PDF files (which would have required encyclopædic knowledge of the guts of the PDF system, as the HTML’s structural information is lost when printing)… or, perhaps it’s just best to simply analyse the pages as a bitmap image.
The latter option is a little bit hacky, but since there’s virtually no room for error, it’s the easiest to implement. The premise is simple: the pages have a white background and black text, and all additions and deletions are marked with specific HTML colours. If we look at the page as a pixel buffer, we can simply go through every pixel and read its colour value – or, if the page is an indexed bitmap, we can just look at the colour palette. If the colour palette contains docdiff’s colour values that indicate additions or deletions, we’ve identified a page that contains additions or deletions. There are some caveats, though: if possible, the page should not be scaled beforehand (as blurring may occur, distorting colour values), and there must not be anything but the docdiff-produced HTML output in our text, or the colour detection cannot work reliably.
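Stripped of the image-library plumbing, the detection idea can be sketched over a bare pixel array; the marker colour values below are stand-ins, not docdiff’s actual colours, and in the real script the pixels (or the palette) come from RMagick:

```ruby
# Assumed stand-ins for docdiff's marker backgrounds.
ADDED_MARKER   = [0, 0, 255]   # "new text" background (assumption)
DELETED_MARKER = [255, 0, 0]   # "deleted text" background (assumption)

# Flag a page when any pixel matches a marker colour. Pages are
# treated here as arrays of [R, G, B] triples.
def page_flags(pixels)
  { added:   pixels.include?(ADDED_MARKER),
    deleted: pixels.include?(DELETED_MARKER) }
end

page = [[255, 255, 255]] * 1000 + [ADDED_MARKER]  # white page, one marker
page_flags(page)  # => {:added=>true, :deleted=>false}
```

With an indexed image, the same membership test runs over the (small) palette instead of every pixel, which is exactly why the paletted representation discussed below matters.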
The identification system works as it is, but this information is just stored in the analysis XML file; it’s not currently used in the application. However, I intend to make use of this information during further program development when I produce the “actual” video that documents my NaNoWriMo progress.
Component choice: GD v. ImageMagick
There are two major open-source image processing libraries, GD and ImageMagick, both of which have Ruby bindings.
I have used the ImageMagick library in a lot of projects, but in December, when I first worked on the conversion tool, I ran into a problem with Debian GNU/Linux’s ImageMagick libraries: a version incompatibility involving both Ruby and the ImageMagick development packages forced me to uninstall the Ruby ImageMagick bindings (RMagick), and I had to take a look at GD, the competing graphics library. The incompatibility issue was eventually resolved, and the final program uses RMagick – for the following reasons.
First, GD does not support PDF files. This is not much of a problem; I simply invoked GhostScript in the script to convert pages to PNG. Not pretty, but it did the job.
But the bigger problem is that GD does not support paletted images properly. Paletted images, also called indexed images, have their colour values stored in a separate palette, and the pixel data simply points to palette entries. This has the drawback that photographic images, which need a lot of distinct colours, need large palettes – often larger than the file format allows – but drawings (or, in our case, pages of text) that only have a small number of distinct colours can be stored efficiently. When a PNG file is opened in GD, it sees the image as a truecolor image; that is, each pixel’s RGB colour value is stored in the pixel data itself. Not only do truecolor images need much more storage space, but determining the existence of a colour value in a truecolor image is much slower than simply looking up the value in an image palette. This conversion is done automatically, even when the image is paletted to begin with. GD also seemed to fudge the colour values; a simple application that created a palette for the image based on truecolor pixel data produced a very large palette, which curiously didn’t even contain the colour values we knew would be there.
GD also made strange assumptions based on the files. For example, I used the ImageMagick command line tools to pre-process some images, and GD flatly refused to read PNG files that had the filename extension “.png8”. The GD library makes guesses based on file names, not file contents. Very odd.
RMagick, on the other hand, supports PDFs directly (by automatically invoking GhostScript as necessary). It allowed the image to be converted to indexed format, and the index contained precisely the colour values we were interested in – no curious fudging involved. The only really odd thing was that RMagick ignored my conversion request when the file was saved as “.png”; this automatically resulted in a truecolour PNG file, even when I explicitly said the file had to be saved in PNG8 format. This is, again, very odd.
tellthetale invokes the standard Unix wc(1) tool for counting words, which is essentially the same tool I had used in the NaNoWriMo project itself. The same word count algorithm is also implemented in the TextRoom application.
The wc(1) tool is not necessarily the best option, because implementing the algorithm as a native Ruby component would improve cross-platform compatibility, and it’s not very difficult. I have also written a few Ruby libraries that implement different word count methods, but regrettably, I have not yet written a tool that would be specifically aware of Markdown syntax. My libraries are also currently not in any sort of easy-to-deploy condition; before I’d include them in my projects, I should probably produce RubyGems for them. There’s also a good chance that there are better gems already in existence since I last looked for them.
In short, this is definitely a new direction of development.
End results of the program run
Again, we get a number of files – this time image files. The analysis is stored in another XML file.
The most notable parts of this XML file are the “added”, “deleted”, “new” and “old” attributes found in the “page” entities. These are part of the analysis that was done through ImageMagick. The word counts are saved in the “wordcount” attributes. Most of the commit information has moved into attributes.
Diagram 3: EpicDiff
EpicDiff is the third application in the workflow, responsible for the actual visualisation.
EpicDiff is distributed as a Processing sketch, and needs the Processing IDE to run. This is because the actual program data has to be generated by gitsplode and tellthetale, and placed in the sketch’s data/input directory first. The sketch requires data/input/page_summary.xml as well as all of the referenced image files.
An unlikely component: Anonymous Pro
One of the interesting component choices that I didn’t anticipate beforehand was the choice of fonts. While fonts are not program code, they are still part of the development process, and they are involved in the animation creation itself. Processing is able to use fonts installed on the system, but I wanted to include a font with the Processing sketch so that it can be used consistently across all platforms. Of course, the user is able to choose whatever font they want.
But the fact remains that if a font is included in a distribution package, its license must be properly observed. You cannot simply take any old .ttf file and put it in the package; font files are under copyright, too. Hence, I had to include a font that is itself under an open-source license. I chose Mark Simonson’s Anonymous Pro, a monospaced programming font, as the application font. I simply included all the font files, along with the license text for the font. The font is distributed under the SIL Open Font License, which basically allows the font to be used for any purpose; the font files must simply be accompanied by the license material.
Component evaluation problem: Java video
The final step in the video production is, obviously, encoding the final animation frames into a video stream. I knew beforehand that this was going to be a big problem. Many programming languages have modern, up-to-date support for video APIs right out of the box. Ruby, like most scripting languages, has a third-party gem that adds support for GStreamer, an open source audio and video framework. Some other platforms also have very good support for native video frameworks; I am no C# expert, but I know that it’s very easy to use Windows’ native media APIs through the .NET framework.
As is widely known, Sun’s policy has so far been that all of the built-in components of the Java platform should be cross-platform, as “pure Java” as possible, and well integrated with the existing Java APIs. Java already has fairly good support for images and audio, and it is a very mature platform. In this light, it comes as a surprise that Java has extremely poor built-in support for video. As far as Sun/Oracle offerings go, Java only has the Java Media Framework (JMF), which hasn’t been re-released as an open source project, hasn’t been updated in years, and has very limited support for any kind of modern video codec.
A small ray of hope came from Processing: the website touted Processing’s video API and its ability to encode and decode video and use video capture devices. However, a closer inspection revealed that this video support is based on Apple’s QuickTime for Java library. While QuickTime is available for both Windows and OS X, this library was apparently never supported on anything but OS X. And even though I currently have a Mac laptop with QuickTime 7, QuickTime for Java has been deprecated in newer versions of OS X and QuickTime.
Here, I recognised one alternate route of video encoding: Processing sketches are able to access the current frame’s pixel buffer directly, and the pixel buffer can also be saved into any regular image file using Processing’s builtins. It would be possible to simply save a series of PNG images, and then use the famous FFmpeg encoder program as the final step to encode the image sequence into a normal video file. This option wasn’t necessary, but it wouldn’t have added much more work.
I was able to find a third-party library called GSVideo. This was touted as a near-drop-in replacement for Processing’s antiquated video API – class names were slightly different, a different import statement was required, and due to different codecs, the encoding parameters were, of course, different too – but the API calls themselves are the same. GSVideo is also cross-platform, though it is based on a native-compiled C library – specifically, GStreamer, which, as mentioned, is a very widespread open-source library that is also available for scripting languages.
Adding the GSVideo encoder to the project couldn’t have been simpler: unpack the library into the “libraries” folder, restart Processing, add an import statement, instantiate a GSVideoMaker object with the proper video encoding parameters and call its start() method. Each time a frame is drawn, call Processing’s loadPixels() to update the pixel buffer and then call GSVideoMaker’s addFrame() method. When the animation is done, just call GSVideoMaker’s stop() method. The only hitch was that I forgot to restart Processing at first, but once the application compiled properly, it also ran perfectly on the first go. A very pleasant surprise.
I chose to encode the video into Matroska (.mkv) container using high-quality MJPEG. The MJPEG stream was then temporarily copied to RIFF (.avi) container using mencoder (because of an apparent bug in my FFmpeg configuration that prevented it from reading Matroska MJPEG files), then compressed using FFmpeg to Google VP8/WebM format for uploading to YouTube.
I had started to work on this project before the course began, and I just wanted to finish the basic functionality over the course. Once the course is over, I intend to continue the development of this program. The idea is to release a nice, informative and entertaining video about the novel development process; a friend of mine already judged the animation fascinating and somewhat hypnotic as it is, but a bit of music, and perhaps a few other interesting additions, will make the video even more interesting.
The tellthetale program currently identifies the pages that have been changed, but this information is not actually used in the application. As mentioned earlier, this is purely for future growth. The idea is to flash specific icons on each changed page once the pages come to rest on the screen, and add some sort of textual information to the video (e.g. “2 pages with new text”.)
Generating some sort of graphs would also be fruitful; these could be shown during the animation with a bit of translucency applied. There are several interesting graphing libraries for both Ruby and Java, so I imagine this will again not be much of a problem.
There are a few non-code-related problems that cannot be fixed, though – as the video shows, I do not have very good “development habits” when it comes to the novel’s commit messages; most of them were just “More”. I also do not have much other material about the novel’s development – perhaps a few photographs, but not much more. Perhaps I’ll keep these sorts of considerations in mind for this year’s NaNoWriMo.