Responsible for pushing forward ML & Data Analysis initiatives within the company.
Implemented image-recognition algorithms in Python to help cluster the same products across crawled sites.
Improved the crawling architecture by documenting the process and writing functions that persist results to the SQL database.
The program was added to the in-house admin panel and shown to investors.
Comparison of Classifiers
Using the SpamBase dataset, I compared the effectiveness of 4 classifier algorithms at classifying spam.
The program ran 10 cross-validation folds. The time shown is the total time taken to complete all 10 runs.
I was surprised to find that a vanilla random forest increased the accuracy by 10% on the dataset while decreasing the computation time. Built with scikit-learn.
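A comparison like the one above can be sketched with scikit-learn's `cross_validate`; the dataset here is a synthetic stand-in with SpamBase-like dimensions, and the exact classifier line-up is an assumption:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in data with SpamBase-like shape (the real data comes from UCI:
# 57 numeric features, binary spam/ham label).
X, y = make_classification(n_samples=1000, n_features=57, random_state=0)

# Hypothetical line-up of 4 classifiers to compare.
classifiers = {
    "naive_bayes": GaussianNB(),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "svm_rbf": SVC(),
    "random_forest": RandomForestClassifier(random_state=0),
}

results = {}
for name, clf in classifiers.items():
    cv = cross_validate(clf, X, y, cv=10)  # 10 cross-validation folds
    # Total fit + score time across all 10 folds, and mean accuracy.
    total_time = cv["fit_time"].sum() + cv["score_time"].sum()
    results[name] = (total_time, cv["test_score"].mean())

for name, (total_time, acc) in results.items():
    print(f"{name:14s} time={total_time:.2f}s acc={acc:.3f}")
```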
Created a Histogram of Oriented Gradients (HOG) + SVM shallow classifier to serve as a benchmark for classifying the CIFAR-10 dataset before using Inception-v3.
Used the HOG function from scikit-image v0.14, which takes the RGB data of an image into account, whereas the old HOG function accepted only grayscale images.
Using color images for training and testing improved the benchmark by 10 percentage points, from 32% to 42% accuracy on the test set.
Using the CIFAR-10 dataset, I trained the Inception-v3 network on 1/10th of the training dataset. I ran that 1/10th through a standard SVM classifier with an RBF kernel and got an accuracy of 82%. I set myself the task of pushing the score past the 90% threshold.
First, I ran a grid search on the SVM classifier with an RBF kernel, reaching 86.9% average accuracy on a 2-fold set with the regularization parameter C set to 2E+15 and gamma set to 2.00E-05. An improvement.
Next, I trained the Inception-v3 model on the full dataset, instead of the 1/10th.
Following that, I ran a grid search using a linear SVM. Using 1/10th of the dataset, I again ran a 2-fold cross-validation test to find the optimal regularization parameter.
The first broad run got 0.871 (+/-0.007) average accuracy with C set to 0.002.
Next, I narrowed the range for C further and further until I found the optimal value, still comparing results on a 2-fold set. It converged to C = 0.002666, which got 0.873 (+/-0.004) accuracy.
After finding the optimal parameters, I trained the SVM classifier again on 1/5th and then on the full dataset and ran the test.
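The C search above can be sketched with scikit-learn's `GridSearchCV`; the features here are a synthetic stand-in for the Inception-v3 activations, and the grid values merely echo the ones mentioned:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

# Stand-in for the Inception-v3 feature vectors used in the write-up.
X, y = make_classification(n_samples=400, n_features=50, random_state=0)

# Grid over the regularization parameter C, scored with 2-fold
# cross-validation as in the experiments above.
grid = GridSearchCV(LinearSVC(), {"C": [0.001, 0.002, 0.00266, 0.004]}, cv=2)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```

Narrowing the search then just means re-running with a tighter list of C values around the previous best.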
Small project to learn PyTorch.
Ran a CNN on the FashionMNIST dataset from Zalando to classify clothes.
Accuracy of the CNN on the 10,000-image test set: 85%.
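A minimal PyTorch sketch of the kind of CNN used; the actual architecture isn't given in the write-up, so the layer sizes here are assumptions:

```python
import torch
import torch.nn as nn

# A small CNN for 28x28 grayscale FashionMNIST images; the two-conv
# layout and channel counts are illustrative assumptions.
class SmallCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                  # 28x28 -> 14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                  # 14x14 -> 7x7
        )
        self.classifier = nn.Linear(32 * 7 * 7, 10)  # 10 clothing classes

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

model = SmallCNN()
logits = model(torch.randn(4, 1, 28, 28))  # batch of 4 fake images
print(logits.shape)  # torch.Size([4, 10])
```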
An image-classification program which, using transfer learning from Inception-v3 (trained for the ImageNet Challenge), classifies a given image as a daisy, sunflower, dandelion, tulip or rose. I used TensorFlow & Python for this project. It achieves an accuracy of ~90%. Link to code.
Image compression
The project used the k-means clustering algorithm. It scans an image and returns it using only the 16 most popular colors in the image. Every other color is assigned to whichever of those 16 colors it is closest to. The compressed image weighs 7 KB less, which means it was compressed by 64%.
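The quantization step can be sketched with scikit-learn's KMeans; note that k-means picks cluster-center colors, which approximate but are not literally the 16 "most popular" colors, and the image here is a random stand-in:

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in image; the original project scanned a real photo.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

# Cluster all pixels in RGB space into 16 representative colors.
pixels = image.reshape(-1, 3).astype(float)
kmeans = KMeans(n_clusters=16, n_init=1, random_state=0).fit(pixels)

# Rebuild the image: every pixel becomes its nearest cluster center.
compressed = kmeans.cluster_centers_[kmeans.labels_].reshape(image.shape)
print(len(np.unique(compressed.reshape(-1, 3), axis=0)))  # at most 16 colors
```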
NLP - Text Classifier
The program tried to find the most popular customer questions asked by email. It took a whole mailbox and filtered it down to only the statements sent by customers. Then it classified each statement as Accept, Bye, Clarify, Continuer, Emotion, Emphasis, Greet, No Answer, Other, Reject, Statement, System, Wh-Question, Yes Answer or Yes/No Question.
It uses a naive Bayes classifier, trained on the NPS Chat Corpus.
On the training set, it got a 63% classification accuracy. That sounds like a lot, but the results were mostly unusable. Link to code.
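A minimal sketch of naive Bayes dialogue-act classification; the original trained on the NPS Chat Corpus via NLTK, while this stand-in uses a tiny hand-made sample and scikit-learn's multinomial naive Bayes:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny hand-made sample standing in for the NPS Chat Corpus; labels
# come from the dialogue-act categories listed above.
utterances = ["hello there", "bye everyone", "what time is it",
              "yes of course", "no thanks", "see you later",
              "hi all", "where are you"]
acts = ["Greet", "Bye", "Wh-Question", "Yes Answer",
        "Reject", "Bye", "Greet", "Wh-Question"]

# Bag-of-words features fed into a naive Bayes classifier.
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(utterances, acts)
print(clf.predict(["hello there"]))  # ['Greet']
```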
SVM - Spam Classifier
This program used support vector machines (SVMs) to build a spam classifier. It achieved a 98% accuracy.
Neural Network - MNIST Dataset
The classic ML project. Built a neural network from scratch which recognized handwritten digits from the MNIST dataset. It got 97.52% accuracy on the training set.
Backpropagation for Neural Network
Expanding on the Neural Network project, I added backward propagation from scratch to the neural network.
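Forward and backward propagation for a small two-layer sigmoid network can be sketched from scratch like this; the layer sizes, squared-error loss and learning rate are illustrative assumptions, not the project's actual values:

```python
import numpy as np

# Tiny synthetic problem: 5 samples, 4 features, 3 one-hot classes.
rng = np.random.default_rng(0)
X = rng.random((5, 4))
y = np.eye(3)[rng.integers(0, 3, 5)]
W1 = rng.normal(0, 0.5, (4, 8))   # input -> hidden weights
W2 = rng.normal(0, 0.5, (8, 3))   # hidden -> output weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X):
    a1 = sigmoid(X @ W1)
    return a1, sigmoid(a1 @ W2)

_, out = forward(X)
initial_loss = np.mean((out - y) ** 2)

for _ in range(300):
    a1, a2 = forward(X)
    # Backward pass: chain rule through the sigmoid of each layer.
    d2 = (a2 - y) * a2 * (1 - a2)
    d1 = (d2 @ W2.T) * a1 * (1 - a1)
    W2 -= 0.5 * (a1.T @ d2)   # gradient-descent updates
    W1 -= 0.5 * (X.T @ d1)

_, out = forward(X)
final_loss = np.mean((out - y) ** 2)
print(initial_loss, final_loss)
```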
A simple introductory project to get acquainted with the fundamentals of machine learning through logistic regression.
The program computes the cost for logistic regression (regularized and non-regularized), its gradients (regularized and non-regularized) and a prediction function. It achieves an 89% prediction accuracy with the non-regularized functions and 83% with the regularized ones.
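The regularized cost and gradient can be sketched as follows (by convention `theta[0]`, the intercept, is left unregularized; the tiny dataset is made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grad(theta, X, y, lam=0.0):
    """Regularized logistic-regression cost and gradient.

    lam=0 gives the non-regularized versions mentioned above."""
    m = len(y)
    h = sigmoid(X @ theta)
    reg = (lam / (2 * m)) * np.sum(theta[1:] ** 2)   # skip the intercept
    cost = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h)) + reg
    grad = X.T @ (h - y) / m
    grad[1:] += (lam / m) * theta[1:]
    return cost, grad

# Tiny worked example with an intercept column of ones.
X = np.array([[1.0, 0.5], [1.0, -1.5], [1.0, 2.0]])
y = np.array([1.0, 0.0, 1.0])
cost, grad = cost_and_grad(np.zeros(2), X, y, lam=1.0)
print(cost)  # log(2) ~= 0.693 at theta = 0
```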
Anomaly Detection & Recommender Systems
In this project, I implemented an anomaly detection algorithm and applied it to detect failing servers on a network. Next, I used collaborative filtering to build a recommender system for movies.
The anomaly-detection system estimates a Gaussian fit and highlights anomalies (servers which failed). It finds the optimal anomaly threshold by maximizing precision and recall (the F1 score) on a cross-validation set.
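The threshold search can be sketched like this, with synthetic data standing in for the server features:

```python
import numpy as np

# Synthetic data: normal servers near the origin, failures far away.
rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(200, 2))
anomalies = rng.normal(6.0, 1.0, size=(10, 2))

# Fit an independent Gaussian to each feature of the normal data.
mu, var = normal.mean(axis=0), normal.var(axis=0)

def gaussian_p(X):
    # Product of per-feature Gaussian densities.
    return np.prod(np.exp(-(X - mu) ** 2 / (2 * var)) /
                   np.sqrt(2 * np.pi * var), axis=1)

# Labeled cross-validation set: 1 marks a failed server.
X_cv = np.vstack([normal, anomalies])
y_cv = np.r_[np.zeros(len(normal)), np.ones(len(anomalies))]
p = gaussian_p(X_cv)

# Scan thresholds and keep the one with the best F1 score.
best_f1, best_eps = 0.0, 0.0
for eps in np.linspace(p.min(), p.max(), 1000):
    pred = p < eps                      # low density => anomaly
    tp = np.sum(pred & (y_cv == 1))
    fp = np.sum(pred & (y_cv == 0))
    fn = np.sum(~pred & (y_cv == 1))
    if tp == 0:
        continue
    prec, rec = tp / (tp + fp), tp / (tp + fn)
    f1 = 2 * prec * rec / (prec + rec)
    if f1 > best_f1:
        best_f1, best_eps = f1, eps
print(best_f1)
```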
The movie recommender system computes recommended movies based on other users' reviews. It learns, by gradient descent, a feature vector for each movie and a preference vector for each user based on their reviews.
As Google Code-In (GCI) comes to an end, it’s a good time to reflect on my experience participating in the competition for the first time and to see what I’ve learnt over the past 7 weeks.
Before the competition, when I was looking for the organisation I’d work with for the next 7 weeks, I didn’t look at what language their code base was in. Most used C++, Python, Java and other languages that aren’t my strong suit. OpenMRS drew me in because it put a huge emphasis on the community. They had a long description of the organisation, a Talk topic asking everybody to introduce themselves and a wiki entry to get us started. The choice was so obvious to me that I even thought about switching, since this would be the most popular org, with a lot of other competitors. I’m glad I didn’t. Another thing that drew me to OpenMRS was that we were both in the same boat: this was their first time participating in GCI and mine as well, so my thinking was, we’ll learn together.
As of writing I’ve completed 6 tasks. This may not sound like a lot but everywhere you go you’ll hear “Quality over Quantity”.
### Task #1: Complete OpenMRS developer setup on Windows OS
This was my first task. I started it on the first day and it took me over 10 hours to complete, which is quite a lot. Because a large number of other students were starting with this task, most errors were solved quickly. OpenMRS had a guide on how to install it, but only then did we discover that there were build errors when using Java 1.8 instead of 1.7 or 1.6. The next error, tests failing during a clean install, was fixed by @wluyima within two days. During the install I was met with a lot of new technologies that I hadn’t come into contact with before, e.g. Tomcat, Apache Maven, Jetty, Eclipse IDE for Java EE Developers and MySQL. I had to install and set up all of them, which was time-consuming. I tried different approaches to get it up and running: when Maven wasn’t running properly from the IDE, I added it to my PATH and tried running it from the command line. Then I watched most of the videos available to try and diagnose my problem until wluyima patched the error and it compiled correctly.
On the third day I got it running.
### Task #2: CREATE A SET OF CODE FORMATTER STYLES FOR OPENMRS JAVASCRIPT FILES
This was a Level 3 task (5 being the hardest). It was very simple to complete, as there were already a lot of resources available online on the topic. I finished it in a much shorter timeframe than the previous task. I settled on the styles that Google uses.
### Task #3: RESEARCH: IMPROVE THE MODULUS SEARCH ALGORITHM
This was a Level 5 task. Around this time a lot of the tasks were gaining traction (had comments), but this one just stood in the corner, so I started on it. There were other tasks in a similar position, but Modulus drew me in because it was better aligned with my interests than the other tasks. It was more of a web application than the OpenMRS Platform or the Reference Application. I would call this task my first true experience of developing in open source, a rite of passage if you will.
Modulus was built by Elliot Williams to make modules for OpenMRS more accessible. It used MySQL, OAuth, OpenMRS ID, Grails, Groovy and Java. But that’s only the backend. The frontend is Modulus-UI, which used Angular, Node, Grunt and Bootstrap. Making all these technologies work together was very hard, especially for someone who had never even made a website. All of this was new to me, so I had a lot of questions. The set-up documentation was lacking, so what ensued was a 28+ email conversation spanning 24 days and 10 errors. This taught me patience and how much time good documentation can save. I had errors concerning OAuth, an improper .config file for Grails (with important parts of it missing), and missing git submodules. Then there was an improper clientID between the UI and backend, no MySQL database containing the information to be displayed, and some errors that were displayed even though everything worked correctly. Elliot solved all of them. Every. Single. Time.
This led me to create a whole new wiki entry on the process, which enabled further GCI participants to get Modulus set up and was linked to from the GitHub repos.
The write-up of all improvements has its own wiki page. It was my first experience with the Grails plugin Searchable, which is built on Apache Lucene. I made a few improvements and submitted a pull request, which might be merged at the end of GCI.
It would have been better if we had worked on top of each other’s improvements, which would have made it impossible to make the same changes twice and would have made our results more effective.
### Task #4: MOD-54: READ DEPENDENCIES FROM CONFIG.XML
After scrolling through the available GCI-OpenMRS tasks and finding nothing I felt would be worthwhile both for myself and the organisation, I started looking into other projects’ issues available on JIRA. I was again drawn to Modulus and reached out to Elliot, asking him if it was all right for me to do two tasks as part of the competition. He agreed and added MOD-54, MOD-40 and MOD-87. This made me realize how understanding the mentors are: if you suggest an idea, they’ll probably agree to it.
The task was quite simple, just 5 lines of code. But it was useless. It read the dependencies, returned them, and that’s it. Nothing read them, changed them, or touched them in any shape or form. I wanted to add a finished feature, not start one and give it away for somebody else to finish. Useless.
After an hour-long discussion on IRC (just imagine how long it’d take by email!) about how I should go about fully implementing it across the whole Modulus ecosystem, with Elliot giving me lines of code that were really useful, I embarked on my first ever backend job.
I’d had opportunities to work with a database before, but I shied away from them, fearing I would break something. Thankfully, git gave me the courage to do it (git revert). Now my reading from the config.xml file became useful: by adding a variable and assigning the required modules during module creation, I was able to display them on screen for the user. But after saving, they disappeared and weren’t visible in the JSON file that the server returned. This was due to not adding the variable (with a hasMany relationship) to the MySQL database.
After it was returning properly in JSON format, I added the UI components for it as well, both in the “Upload new module” section and when displaying a module. Since the dependencies were strings, I also linked them back to their respective pages to make the download process go as smoothly as possible.
### Task #5: CREATE A VIDEO TUTORIAL SHOWING HOW TO INSTALL & RUN OPENMRS
This task existed because the last tutorial was from 3 years ago and a lot had changed. A fellow participant had already made a tutorial on how to do it on Windows, but nobody had touched the Mac version.
### Task #6: COMPLETE OPENMRS DEVELOPER SETUP ON MAC OS AND REPORT OUTCOME
This task was the same as my first, but it went much more smoothly. I still hit some problems, so I debugged them and updated the wiki entry on the subject for future developers.
### Non-Tasks
There were a couple of items I did that didn’t count toward GCI.
#1: Song:
I created a song to the beat of “Uptown Funk” by Mark Ronson. I didn’t get to shoot a music video for it, but I completed the lyrics and recorded myself singing them.
This click, that upload
Burke and Paul, that white gold
This one, for them patients
each nations
Straight masterpieces
Stylin’, wilin’
Livin’ it up in a city
Got forms on with Hibernate
Gotta ‘like’ myself I’m so pretty
[Pre-Chorus]
I’m too hot (hot damn)
Call the doc and the assistant
I’m too hot (hot damn)
Make a scribe wanna retire man
I’m too hot (hot damn)
Say my name you know who I am
I’m too hot (hot damn)
And my repo ain’t ‘bout that money
Break it down
[Chorus]
Nurses hit your hallelujah (Woo!)
Users hit your hallelujah (Woo!)
Devs hit your hallelujah (Woo!)
Cause OpenMRS gon’ give it to you
Cause OpenMRS gon’ give it to you
Cause OpenMRS gon’ give it to you
Saturday night and we in the ward
Don’t believe me, just watch (come on)
Don’t believe me, just watch
Don’t believe me, just watch
Don’t believe me, just watch
Don’t believe me, just watch
Don’t believe me, just watch
Hey, hey, hey, oh!
[Verse 2]
Stop
Wait a minute
Fill his cup put some liquid in it
Take a sip, sign the note
Julio! Get the pulse!
Work in Kandy, Libya, Kenya, Indiana
If we show up, we gon’ show out
Smoother than a fresh banana
[Pre-Chorus 2]
I’m too hot (hot damn)
Call the doc and the assistant
I’m too hot (hot damn)
Make a scribe wanna retire man
I’m too hot (hot damn)
Nurse, say my name you know who I am!
I’m too hot (hot damn)
And my repo ain’t ‘bout that money
Break it down
[Chorus]
[Verse 3]
Before we leave
Lemme tell y’all a lil’ something
OpenMRS, OpenMRS
OpenMRS, OpenMRS
I said OpenMRS, OpenMRS
OpenMRS, OpenMRS
Come on, code
branch on it
If you implement then flaunt it
If you commit it then own it
Don’t brag about it, upload it
Come on, code
Branch on it
If you implement then flaunt it
Well it’s Saturday night and we in the ward
[Part-Chorus]
Don’t believe me, just watch (come on)
Don’t believe me, just watch
Don’t believe me, just watch
Don’t believe me, just watch
Don’t believe me, just watch
Don’t believe me, just watch
Hey, hey, hey, oh!
[Outro]
OpenMRS, OpenMRS (say whaa?!)
OpenMRS, OpenMRS
OpenMRS, OpenMRS (say whaa?!)
OpenMRS, OpenMRS
OpenMRS, OpenMRS (say whaa?!)
OpenMRS, OpenMRS
OpenMRS, OpenMRS (say whaa?!)
OpenMRS
#2: MOD-87 - Fix Scrolling Bug:
While I was working on MOD-54 I completed this task to a working degree, but since I hadn’t claimed it on Melange, Erway Parker claimed and completed it properly.
#3: MOD-40 - Add UI Config for Google Analytics:
This was the last task I claimed during the competition. I learnt how to use Grunt, the task runner, to add a script containing the GA code to every page. Simple and useful. I submitted a pull request before the end.
#4: MOD-66 - “Delete” link is dangerously ambiguous:
I wanted to gain a better understanding of the Modulus webpage, so I turned to the UI and completed a fairly simple task of changing the wording and position of the “Delete” button.
I gained a better understanding of Bootstrap’s classes, Angular.js templating and the DOM. It was accepted and merged into the main repository.
### What comes next…
Thanks to GCI, I had enough strength to tackle my great fear (backend) and learn a lot about production code. Most of my previous projects were my own, so diving in and getting to know an unfamiliar codebase was great! It also opened my eyes to the existence of open-source companies (my previous knowledge was only of Linux).
After such an intense 7 weeks, I’ll take a step back and focus on my own projects, which were neglected. I’ll return to OpenMRS in the near future, to continue helping out in a small but important way, just like I did before.
Google Code-in will start in 5 days. As always, I have set my hopes high, and more precisely on the grand prize: a paid trip to Mountain View, California. Knowing me, and most people, the high you get from such visions of greatness unfortunately fades rather quickly. You tell yourself you’ll do all the work well and properly, but that doesn’t always work out. A good example of this is returning to school after a term break or a summer, set on doing better than before, only to fall back into the old ways after a few weeks. I think everybody has had a similar experience in their lives.
How do I plan to combat this?
I have some plans, and we’ll see how they work out; they are similar to the tips usually given to procrastinators.
Think of what I’ll accomplish,
what a high that’ll be and what an achievement for me. Take some of that from the future to power the present day. Since some mentoring organizations are taking part in this program for the very first time, whoever is chosen as a Grand Finalist might go down in history.
It’s the journey that counts.
Although it’s cliché to say it (and is it cliché to say it’s cliché?), we humans are bad at delayed gratification, so bad that whole industries have evolved to capitalize on it (e.g. games, fashion). A constant drive, pushed by the community/software/challenges, will contribute little steps towards the final goal.
Future Plans
As of writing this, I plan on contributing to OpenMRS. I had never thought about free medical software, and I’d never encountered open-source projects with such breadth. Of course, my world of open source before this was Objective-C trending repos on GitHub: small libraries with specific uses. However, I don’t want to set anything in stone, so I’ll wait until the 1st of December and see what tasks they and others set.
Saturday couldn’t come fast enough. It was a long-awaited event among many technology-oriented students.
We came together from over 20 countries and 50 universities to meet each other and see amazing people who want to share their knowledge. We heard talks on a wide range of topics from inspirational speakers.
The event was kicked off by Matt Clifford, co-founder of Entrepreneur First (EF). In his speech he gave an overview of the participants, describing them as “the future of technology in Europe”, and revealed EF’s mascot, a honey badger named Stoffle. Why a honey badger? Because a founder cannot be imprisoned and has to be determined, like a honey badger. A member of the EF 2015 cohort came up on stage to present his app [Click](http://clickapp.co/) and invite us to test it out during the event.
The opening speaker was Ben Medlock, co-founder of SwiftKey. He talked about the evolution of SwiftKey: how it started, what he studied, and where his startup is now and how it works (>150 staff and offices on 3 continents). A short Q&A session followed.
Avid Larizadeh was the next entrepreneur to take the stage. She argued that “Europe is a great place to build a tech startup”. Her arguments were well founded, considering she founded her own startup, Boticca, in London and now works for Google Ventures. Avid shared with us the challenges and opportunities in Europe, especially in London. A short Q&A session followed.
Lunch followed, and with it an opportunity to talk to other participants more freely. Ben Goldsmith, Head of Content at Level 39, took the mic to tell us about our host (L39, an accelerator for smart cities, retail and fintech). After him came Dan Quine with his talk “A trip into a sci-fi future”. He told us about his background in both academia and industry, where he worked, and how he missed the focus and passion of startups. From his experience of six startups, he imparted what he learnt, what he believes entrepreneurs are, why London is great for startups and why VCs are great. A short Q&A session followed.
The first panel of the day came after Dan’s talk. Titled “Why here? Why now?: Starting up in London”, the panelists were Alesis Novik and Maria Stylianou (from the 2014-15 EF cohort) and Hugh Collins and Rashid Mansoor (EF alumni). They kicked off by answering what their startups are about, how they came about and why they joined EF, followed by what life is like as a startup founder. A short Q&A session followed.
The return from a short afternoon break signaled the beginning of the end. “Tech Bytes” was a series of TED-style talks from three students who wanted to share. First up was Alex Gamble from UCL, talking about Bitcoin from first principles. He was followed by Ainsley from Imperial College London, whose talk was titled “Hacking without getting hacked”; Ainsley gave tips and suggestions on how to create products with better security. The last talk was by Alex Waller of St Andrews University about “Risk Sensitive Surveillance with Optimal Sensor Quality for Distributed Robotic Systems” (simpler: drones fly where risk is greatest - explanation).
The second and last panel of the day followed the talks. “Hacking in Europe: A Student Perspective” featured students who have organized a tech-related meet-up or society in their community. They told us what kinds of events they have organized (hackathons, meetups etc.) and how they evolved into what they are today. They encouraged us to take part in all hackathons and told us about their own upcoming events.
The closing talk was by Alice Bentinck, co-founder of EF, titled “Be a Founder, not a follower”. I won’t spoil this amazing talk, one of many that day, since you can watch it online on YouTube.
“Networking Drinks” was the closing act of the day.
With inspired and motivated individuals leaving EFUnhacked, I deem the event a success.