You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Project 1: Conditional Random Fields for Structured Output Prediction
This project aims towards implementing a conditional random field for optical character recognition (OCR) with emphasis on inference and performance test. The project is divided into four parts:
Conditional Random Fields: We derive the gradients with respect to weights Wj and transition matrix Tij. We then show the gradient of log(Z) is exactly the expectation of features. Finally, we implement a decoder with computational cost O(m(y)^2) using the max-sum algorithm. We also implemented a brute-force solution in time O((y)^m) to test that decoder implementation is correct.
Training Conditional Random Fields: We now implement a DP algorithm to compute log(p(y|X)) and its gradient with respect to Wj and Tij. We used scipy.optimize.check_grad to ensure the correctness of the computed gradient. The second part of this portion dealt with using fmin_bfgs to get the optimal values for Wj and Tij. Finally, we used the decoder from part 1 and used the new weights to predict labels for characters in test.txt.
Benchmarking with Other Methods: We benchmarked our implementation with other methods such as SVM-Struct and SVM-MC and plotted the letter-wise and word-wise prediction accuracy on test data.
Robustness to Tampering: We explore what effects do rotation and translation of images have on the decoder and its accuracy. We ran the CRF and SVM-MC algorithm on the distorted images and checked the letter-wise and word-wise prediction accuracy on test data.
Prerequisities
Packages required to run the code can be found in requirements.txt:
Run: pip install -r requirements.txt
scipy
scikit-learn
numpy
Run each script for respective question.
Comment out parts which are long running.
Note: TF is for checking with our final solution, to run, use
pip install tensorflow or
pip install tensorflow-gpu if CUDA is installed
Running the tests
Run q1.py to run the max-sum decoder on 100 characters from the initial dataset. The results are stored in result/decoder_output.txt.
Run q2.py to calculate log(p(y|X)), calculate the gradients, and start the optimizer. The gradients can be found in result/gradient.txt. The predictions from 2(b) can be found in result/prediction.txt
The optimal parameters are pre-computed based on C values and can be found under result/solution[C].txt where C=1, 10, 100, 1000.
The optimal paramters are pre-computed and stored under result/solution[C]distortion.txt where C=500, 1000, 1500, 2000.