HackMIT 2018

CODIFY

During HackMIT 2018, my team built a phone app that takes a picture of code written on a whiteboard and, within seconds, opens it on your computer in the IDE or text editor of your choice. The project won Sourcegraph's Best DevTool prize.

GitHub Repository

How it Works

Phone App

You enter the target computer's IP address and take a picture of the whiteboard code.

Google Vision

The image is sent to the server, which processes it using the Google Vision API.

Post-Processing

The server cleans up the API output so that it reads as code, correcting spacing, variable names, and tabs.

Your Computer

When the server is done, the file will automatically open on your screen!

Phone App

Ionic Cordova

AngularJS

The phone app serves three main functions:

Inputting your computer's IP address
Taking a picture of the whiteboard code
Sending this data to the server
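
As a rough illustration of these three steps, the sketch below validates the typed IP address and bundles it with the photo into a POST request. It is not the app's actual code: the `SERVER_URL` placeholder, the `/upload` path, and the `targetIp` and `image` field names are all assumptions.

```javascript
// Hypothetical sketch: validate the typed IP address and build the upload request.
// SERVER_URL, the "/upload" path, and the payload field names are assumptions.
const SERVER_URL = "http://localhost:3000/upload";

// Accept only dotted-quad IPv4 addresses with each octet in 0..255.
function isValidIpv4(ip) {
  const parts = ip.trim().split(".");
  if (parts.length !== 4) return false;
  return parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
}

// Bundle the target computer's IP and the base64-encoded photo into a POST request.
function buildUploadRequest(targetIp, imageBase64) {
  if (!isValidIpv4(targetIp)) {
    throw new Error(`Invalid IP address: ${targetIp}`);
  }
  return {
    url: SERVER_URL,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ targetIp: targetIp.trim(), image: imageBase64 }),
    },
  };
}

// Usage, in an environment where fetch is available:
// const { url, options } = buildUploadRequest("192.168.1.20", photoBase64);
// fetch(url, options);
```

Rejecting a malformed address on the phone avoids a round trip to the server that could never reach the target computer.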

In the future, we hope to create a pairing process to avoid manually typing in your computer's IP address.

Server

NodeJS

The server handles the processing of the image data. First, the raw image is sent to the Google Vision API. Then, the output of the API is processed to more accurately reflect the written code.
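
The first step could look something like the sketch below, which builds a request body for the Vision REST endpoint and pulls the per-word annotations out of a response. The `TEXT_DETECTION` feature type is an assumption (the project may have used a different feature or a client library), and the function names are hypothetical.

```javascript
// Sketch of a Cloud Vision REST request body for text detection.
// The feature type is an assumption; the project may have used a client library instead.
const VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate";

function buildVisionRequest(imageBase64) {
  return {
    requests: [
      {
        image: { content: imageBase64 },
        features: [{ type: "TEXT_DETECTION" }],
      },
    ],
  };
}

// Pull the per-word annotations out of a response.
// textAnnotations[0] holds the full detected text; the rest are individual words.
function extractWords(visionResponse) {
  const annotations = visionResponse.responses?.[0]?.textAnnotations ?? [];
  return annotations.slice(1).map((a) => ({
    text: a.description,
    vertices: a.boundingPoly.vertices,
  }));
}
```

The word-level `boundingPoly` vertices are what make the later spacing and indentation fixes possible, since the plain text alone carries no layout information.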

The Google Vision API is optimized for everyday text, so we wrote several post-processing algorithms to turn its output into correct code.

Camel Case & Variable Names
- Because the API is optimized for normal text, it often reads a single variable name as several separate words. For example, a variable such as "camelCaseVariable" would likely be interpreted as "camel Case Variable", which would not compile.
- To handle this, our algorithm uses the coordinates of each word's bounding box to determine whether adjacent words are close enough to be parts of the same word or variable name.
Tabs
- The Vision API output does not preserve the indentation of the lines of code.
- We again used bounding boxes, this time of the first word in each line, to determine the correct number of tabs before each line. A recursive algorithm groups lines into categories with similar indentation.
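
Both bounding-box steps above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the simplified `{ text, left, right }` box shape, the `gapRatio` and `tolerance` thresholds, and all function names are assumptions.

```javascript
// Merge adjacent words on one line when the gap between their boxes is small.
// Each word is { text, left, right } in pixels, listed in reading order.
function mergeCloseWords(words, gapRatio = 0.5) {
  const merged = [];
  for (const word of words) {
    const prev = merged[merged.length - 1];
    if (prev) {
      // Estimate character width from the previous word to scale the threshold.
      const charWidth = (prev.right - prev.left) / prev.text.length;
      const gap = word.left - prev.right;
      if (gap < gapRatio * charWidth) {
        // Close enough to be one identifier: join the text and extend the box.
        prev.text += word.text;
        prev.right = word.right;
        continue;
      }
    }
    merged.push({ ...word });
  }
  return merged;
}

// Recursively group sorted x-offsets into clusters: an offset joins the current
// cluster when it lies within `tolerance` pixels of that cluster's first member.
function clusterOffsets(sorted, tolerance, clusters = []) {
  if (sorted.length === 0) return clusters;
  const [first, ...rest] = sorted;
  const last = clusters[clusters.length - 1];
  if (last && first - last[0] <= tolerance) last.push(first);
  else clusters.push([first]);
  return clusterOffsets(rest, tolerance, clusters);
}

// Map each line's left offset to an indentation level (the index of its cluster).
function indentLevels(lineOffsets, tolerance = 15) {
  const sorted = [...lineOffsets].sort((a, b) => a - b);
  const clusters = clusterOffsets(sorted, tolerance);
  return lineOffsets.map((x) => clusters.findIndex((c) => c.includes(x)));
}

// Usage: prefix each line with one tab per level.
// const indented = "\t".repeat(level) + lineText;
```

Scaling the merge threshold by the estimated character width, instead of using a fixed pixel count, keeps the check independent of how large the handwriting appears in the photo.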

In the future, we'd also like to implement:

Corrective measures for incorrect characters
- Using the Levenshtein distance between two strings (a measure of how "similar" they are), we can guess that a variable name that appears often in the code was simply misread by the Vision API when it appears slightly differently elsewhere.
Insertion of missing parentheses, brackets, semicolons, etc.
- The Vision API often misses the closing bracket (perhaps misinterpreting it as a stray mark), and sometimes misses colons or semicolons as well.
Better language detection
- The current implementation can only guess between Python and C, based on the presence of brackets.
- In the future, we would like to add smarter language detection using available libraries.
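
Two of the ideas above are sketched below: the Levenshtein-based correction and the bracket heuristic. Neither is taken from the project. The `maxDistance` and `minCount` thresholds are assumptions, as is reading "brackets" as curly braces.

```javascript
// Classic dynamic-programming Levenshtein distance
// (counts insertions, deletions, and substitutions).
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Replace rare identifiers that are within `maxDistance` edits of a frequent one.
function correctIdentifiers(tokens, maxDistance = 1, minCount = 2) {
  const counts = new Map();
  for (const t of tokens) counts.set(t, (counts.get(t) || 0) + 1);
  const frequent = [...counts].filter(([, n]) => n >= minCount).map(([t]) => t);
  return tokens.map((t) => {
    if (counts.get(t) >= minCount) return t;
    const match = frequent.find((f) => levenshtein(t, f) <= maxDistance);
    return match ?? t;
  });
}

// Rough stand-in for the bracket heuristic: curly braces suggest C, otherwise Python.
function guessLanguage(code) {
  return /[{}]/.test(code) ? "c" : "python";
}
```

A real version would need guards that this sketch omits. Short names such as `i` and `j` are one edit apart by nature, and Python dictionary literals contain braces, so both functions can misfire on valid code.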

Target Computer

NodeJS

The target computer is responsible for:

Listening for POST requests from the server
Converting incoming text data into a file
Appending the correct extension, such as ".py" for Python
Opening the file
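
A minimal sketch of such a listener, using only Node's built-in modules, might look like this. The JSON payload shape `{ language, code }`, the `whiteboard` base file name, and the port in the usage line are assumptions. The sketch also opens the file with the operating system's default application rather than an editor chosen by the user.

```javascript
const http = require("http");
const fs = require("fs");
const os = require("os");
const path = require("path");
const { spawn } = require("child_process");

// Hypothetical mapping from a language label to a file extension.
const EXTENSIONS = { python: ".py", c: ".c" };

function fileNameFor(language, baseName = "whiteboard") {
  return baseName + (EXTENSIONS[language] || ".txt");
}

// Open a file with the OS default application for its type.
function openFile(filePath) {
  const detachedOptions = { detached: true, stdio: "ignore" };
  if (process.platform === "win32") {
    return spawn("cmd", ["/c", "start", "", filePath], detachedOptions);
  }
  const command = process.platform === "darwin" ? "open" : "xdg-open";
  return spawn(command, [filePath], detachedOptions);
}

// Build (but do not start) a server that accepts POSTed JSON: { language, code }.
function createReceiver(outputDir = os.tmpdir()) {
  return http.createServer((req, res) => {
    if (req.method !== "POST") {
      res.writeHead(405);
      res.end();
      return;
    }
    let body = "";
    req.on("data", (chunk) => (body += chunk));
    req.on("end", () => {
      try {
        const { language, code } = JSON.parse(body);
        const filePath = path.join(outputDir, fileNameFor(language));
        fs.writeFileSync(filePath, code);
        openFile(filePath);
        res.writeHead(200);
        res.end("ok");
      } catch (err) {
        // Malformed JSON or a failed write ends up here.
        res.writeHead(400);
        res.end("bad request");
      }
    });
  });
}

// Usage: createReceiver().listen(8080);
```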

My Responsibilities

Creating the Phone App

Server communication with Google Vision API

Variable Names and Spacing Algorithm
