Extensions and Enhancements for Checksims

MQP Opportunities for 2015-2016
Professor Hugh C. Lauer

The Checksims program was developed by Matt Heon and Dolan Murvihill in 2014-2015 to provide a tool for checking student programming assignments for illegal copying. The current version of Checksims is implemented in Java and has a command line interface. It has already been used to identify copying in several Computer Science courses at WPI.

For the academic year 2015-2016, I am offering several MQP opportunities to extend Checksims and make it more useful and more flexible. One or more projects can be defined in the following areas:

·         Integrate Checksims with the web-based Turnin system, so that an instructor or teaching assistant can check all of the submissions of an assignment at the touch of a button or click of a mouse. (Note that Checksims is written in Java, but Turnin is written in Racket.)

·         Provide a graphical user interface to capture all of the parameters and options necessary to run Checksims on an assignment. The purpose is to make it easy for an instructor or teaching assistant to actually use it without having to laboriously construct one or more command lines.

·         Provide a database or archive facility so that Checksims can check for similarities not only within a particular assignment but also with submissions of assignments from previous terms. It may make sense to integrate this with Turnin.

·         Provide a visual inspection tool, so that a pair of assignments identified by Checksims as “suspiciously similar” can be viewed side-by-side with suspected similarities automatically aligned and highlighted.

·         Add automatic detection of common code. Although Checksims has the ability to exclude common code, it is easy for an instructor to forget to specify all such code, which leads to too many “false positive” results.

·         Incorporate additional preprocessors into Checksims, including stripping of C/Java-style comments and of Python-style comments.

·         Extend Checksims to traverse compressed directories without requiring that they be manually expanded beforehand.
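As an illustration of the comment-stripping preprocessors mentioned in the list above, here is a minimal sketch in Java. The class name CommentStripper and its method names are hypothetical and are not part of the existing Checksims code; how such a class would plug into the Checksims preprocessor mechanism is left open. The sketch leaves string and character literals intact, so that text such as "//" inside a string is not mistaken for a comment. It does not handle Python triple-quoted strings.

```java
public final class CommentStripper {
    private CommentStripper() {}

    /** Removes // line comments and block comments from C/Java-style source. */
    public static String stripCStyle(String src) {
        StringBuilder out = new StringBuilder(src.length());
        int i = 0;
        int n = src.length();
        while (i < n) {
            char c = src.charAt(i);
            if (c == '"' || c == '\'') {
                i = copyLiteral(src, i, out);
            } else if (c == '/' && i + 1 < n && src.charAt(i + 1) == '/') {
                // Line comment: skip up to, but not including, the newline.
                while (i < n && src.charAt(i) != '\n') {
                    i++;
                }
            } else if (c == '/' && i + 1 < n && src.charAt(i + 1) == '*') {
                // Block comment: skip through the closing marker, or to end of input.
                int end = src.indexOf("*/", i + 2);
                i = (end < 0) ? n : end + 2;
                out.append(' '); // keep the tokens on either side separated
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    /** Removes # comments from Python-style source (triple-quoted strings not handled). */
    public static String stripPythonStyle(String src) {
        StringBuilder out = new StringBuilder(src.length());
        int i = 0;
        int n = src.length();
        while (i < n) {
            char c = src.charAt(i);
            if (c == '"' || c == '\'') {
                i = copyLiteral(src, i, out);
            } else if (c == '#') {
                // Comment: skip up to, but not including, the newline.
                while (i < n && src.charAt(i) != '\n') {
                    i++;
                }
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    /** Copies a quoted literal starting at index i verbatim; returns the index just past it. */
    private static int copyLiteral(String src, int i, StringBuilder out) {
        char quote = src.charAt(i);
        out.append(quote);
        i++;
        while (i < src.length()) {
            char d = src.charAt(i++);
            out.append(d);
            if (d == '\\' && i < src.length()) {
                out.append(src.charAt(i++)); // escaped character, copied as-is
            } else if (d == quote || d == '\n') {
                break; // literal closed, or unterminated at end of line
            }
        }
        return i;
    }
}
```

Newlines are preserved so that line numbers in the stripped text still correspond to the original submission, which would matter for a side-by-side inspection tool.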

In addition, a research-level task is to re-implement the algorithm used by MOSS (Measure of Software Similarity, a tool developed about 20 years ago, now hosted at Stanford and considered the benchmark for other similarity detection systems). The re-implementation should then be run on a corpus of actual assignments to compare its performance and accuracy with that of the Smith-Waterman DNA matching algorithm currently implemented in Checksims.
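To give a sense of the scale of this task, the published description of the approach behind MOSS is the "winnowing" fingerprinting algorithm of Schleimer, Wilkerson, and Aiken. The following is a minimal sketch of that idea in Java, not the actual MOSS implementation, which is not public. The class name Winnowing, the use of String.hashCode as the k-gram hash, and the Jaccard score are all assumptions made for illustration; input is assumed to have been normalized by a preprocessor already.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public final class Winnowing {
    private Winnowing() {}

    /** Hashes every k-gram (substring of length k) of the text, in order. */
    static List<Integer> kgramHashes(String text, int k) {
        List<Integer> hashes = new ArrayList<>();
        for (int i = 0; i + k <= text.length(); i++) {
            hashes.add(text.substring(i, i + k).hashCode());
        }
        return hashes;
    }

    /**
     * Slides a window of w consecutive k-gram hashes over the text and keeps
     * the minimum hash of each window. Only hash values are kept here; a full
     * implementation would also record positions to report matching regions.
     */
    public static Set<Integer> fingerprint(String text, int k, int w) {
        List<Integer> hashes = kgramHashes(text, k);
        Set<Integer> selected = new HashSet<>();
        if (hashes.isEmpty()) {
            return selected;
        }
        int window = Math.min(w, hashes.size()); // short texts get a single window
        for (int start = 0; start + window <= hashes.size(); start++) {
            int min = hashes.get(start);
            for (int j = start + 1; j < start + window; j++) {
                min = Math.min(min, hashes.get(j));
            }
            selected.add(min);
        }
        return selected;
    }

    /** Jaccard similarity of two fingerprint sets; 0.0 when both are empty. */
    public static double similarity(Set<Integer> a, Set<Integer> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<Integer> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<Integer> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }
}
```

The property that makes winnowing attractive is that any substring shared by two documents of length at least w + k - 1 is guaranteed to contribute at least one common fingerprint, while only a fraction of the k-gram hashes needs to be stored. Comparing fingerprint sets is therefore much cheaper than a pairwise alignment such as Smith-Waterman, which is what makes the accuracy comparison proposed above interesting.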