Maxime Daigle

Graduate Student

Contact Me

About Me

I will be working at the intersection of Machine Learning and Neuroscience as a Ph.D. student at McGill and Mila. I'm currently finishing my Master's in Machine Learning at Mila and Université de Montréal. Before that, I obtained a Bachelor's in Mathematics and Computer Science. My research interests include representation learning and memory.

Work Experience

Research intern - Jacobb - Applied AI Research Centre

Jacobb is a nonprofit organization that focuses on spreading the benefits of AI. I worked on self-supervised learning for multivariate medical time series.

Data scientist - Government of Canada

I built a combinatorial optimization simulation to maximize the number of inspections done in a year. Across the different iterations of the solution, I implemented multiple metaheuristics and stochastic optimization methods such as simulated annealing, tabu search, genetic algorithms, and large neighborhood search. I also did more typical data science tasks such as cleaning, exploring, analyzing, and visualizing data, and communicating findings.
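
As a flavour of the metaheuristics involved, here is a minimal simulated annealing sketch; the cost function, neighbour move, and cooling schedule are hypothetical placeholders rather than the actual inspection model.

```python
import math
import random

def simulated_annealing(initial_solution, cost, neighbour,
                        t_start=1.0, t_end=1e-3, alpha=0.95, steps_per_t=100):
    """Generic simulated annealing loop (illustrative; not the actual inspection model).

    `cost` maps a solution to a number to minimize and `neighbour` returns a
    randomly perturbed copy of a solution.
    """
    current, best = initial_solution, initial_solution
    current_cost = best_cost = cost(initial_solution)
    t = t_start
    while t > t_end:
        for _ in range(steps_per_t):
            candidate = neighbour(current)
            candidate_cost = cost(candidate)
            delta = candidate_cost - current_cost
            # Always accept improvements; accept worse moves with probability exp(-delta / t)
            if delta < 0 or random.random() < math.exp(-delta / t):
                current, current_cost = candidate, candidate_cost
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t *= alpha  # geometric cooling schedule
    return best
```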

Research intern - Hanalog (Canada research chair in Healthcare Analytics and Logistics)

For this internship, I received the IVADO Undergraduate Research Scholarship (Canada First Research Excellence Fund). I worked on a Multi-Objective Combinatorial Optimization problem in the Hanalog laboratory, specifically on finding the best trade-offs between the multiple objectives. The research was done in partnership with AlayaCare. The algorithm that I worked on enables more home care visits.

Cloud Computing and DevOps Programmer - Government of Canada

In the Chief Information and Security Branch of Natural Resources Canada (a federal department), I reverse-engineered servers and recreated modified versions of them in the cloud.

Selected Projects

Delayed matching to test neural networks' memory

This environment makes it possible to test the memory of neural networks: the model has to remember the shape of an initial object and select the matching object after a delay. A CNN-LSTM is trained to match the objects after a random delay of 1 to 30 frames.

Code
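
A minimal sketch of how such a CNN-LSTM could be wired in PyTorch; the layer sizes, frame shape, and number of output choices are assumptions rather than the exact architecture in the repository.

```python
import torch
import torch.nn as nn

class DelayedMatchingNet(nn.Module):
    """Encode each frame with a small CNN, integrate over time with an LSTM,
    and classify from the final hidden state (illustrative sizes only)."""
    def __init__(self, n_choices=3, hidden_size=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),   # 32 * 4 * 4 features per frame
        )
        self.lstm = nn.LSTM(32 * 4 * 4, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, n_choices)

    def forward(self, frames):                       # frames: (batch, time, 1, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(feats)
        return self.head(out[:, -1])                 # decision after the delay
```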

Self-supervised learning for audio spoofing detection

Because of the rapid development of better generative algorithms, the threat of spoofing is a moving target. It is therefore critical that anti-spoofing keeps pace with spoofing techniques. One important factor slowing the development of anti-spoofing algorithms is the continuous need for labeled data for the latest types of attack and the absence of labeled data in the case of unknown attacks. This project explores the development of self-supervised methods to detect audio spoofing with less dependence on labeled data.

Paper

WGAN

This project compares a GAN trained with the squared Hellinger distance against one trained with the Wasserstein distance, and trains the WGAN on the Street View House Numbers dataset.

Bonus theoretical
Code
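
A minimal sketch of the WGAN critic update with weight clipping, as in the original WGAN formulation; the critic, generator, and optimizer here are placeholders, and the training details in the repository may differ.

```python
import torch

def wgan_critic_step(critic, generator, real, opt_critic, z_dim=100, clip=0.01):
    """One critic update: maximize E[critic(real)] - E[critic(fake)], then clip
    weights to keep the critic (approximately) Lipschitz (illustrative only)."""
    z = torch.randn(real.size(0), z_dim, device=real.device)
    fake = generator(z).detach()                      # don't backprop into the generator here
    loss = critic(fake).mean() - critic(real).mean()  # minimizing this maximizes the Wasserstein estimate
    opt_critic.zero_grad()
    loss.backward()
    opt_critic.step()
    for p in critic.parameters():
        p.data.clamp_(-clip, clip)                    # weight clipping from the original WGAN
    return loss.item()
```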

Transformer

This project implements a GRU from scratch and the multi-head attention module of a Transformer. It also compares their performance on the Penn Treebank dataset (~1 million words).

Bonus theoretical
Code
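
A compact sketch of scaled dot-product multi-head attention in PyTorch, in the spirit of the module implemented in the project; the dimensions and interface are illustrative.

```python
import math
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Scaled dot-product attention with several heads (illustrative re-implementation)."""
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        assert d_model % n_heads == 0
        self.d_k = d_model // n_heads
        self.n_heads = n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x, mask=None):                 # x: (batch, seq, d_model)
        b, s, _ = x.shape
        def split(t):                                # -> (batch, heads, seq, d_k)
            return t.view(b, s, self.n_heads, self.d_k).transpose(1, 2)
        q, k, v = split(self.q_proj(x)), split(self.k_proj(x)), split(self.v_proj(x))
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_k)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        attn = scores.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, s, -1)
        return self.out_proj(out)
```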

Forecasting solar energy with satellite imagery

The amount of solar energy available is affected by numerous factors (atmospheric conditions, clouds, location, altitude, season, etc.). This project uses satellite imagery to forecast how much solar energy is available, both at present and at future times, at any given location on Earth.

Technical report
Code

Presentation

What should not be contrastive in contrastive learning

Lecture on (Xiao, 2020), given in the Mila student-led seminar course on self-supervised learning.

Slides

Self-supervised learning through the eyes of a child

Lecture on (Emin Orhan, 2020), given in the Mila student-led seminar course on self-supervised learning.

Slides

Other Projects

Social Media user profiling

Identify the gender, age, and personality traits of social media users from their profile pictures, status updates, and likes.

Code

Graph Learning

Two ways to learn node representations: multi-hop similarity and random walks.

Code
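
A minimal sketch of the random-walk side (DeepWalk-style): sample fixed-length walks from the graph and treat them as sentences for a skip-gram model. The toy graph and hyperparameters below are placeholders, not the project's setup.

```python
import random

def random_walks(adj, walk_length=10, walks_per_node=5, seed=0):
    """Sample fixed-length random walks from a graph given as an adjacency list.
    The walks can then be fed to a skip-gram model to learn node embeddings."""
    rng = random.Random(seed)
    walks = []
    for start in adj:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adj[walk[-1]]
                if not neighbours:                  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks

# Toy example
toy_graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(random_walks(toy_graph, walk_length=5, walks_per_node=2))
```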

Hand-drawn images classification

Google has collected a series of hand-drawn images corresponding to a number of categories in their Quick, Draw! project. The dataset used in this project is a modified subset of this dataset: the images are not always centered or upright, and they contain noisy artifacts. I use a modified ResNet18 to classify the images.

Code
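
A minimal sketch of one common way to adapt ResNet18 to single-channel drawings with torchvision; the exact modifications and number of categories in the project may differ.

```python
import torch.nn as nn
from torchvision.models import resnet18

def build_sketch_classifier(n_classes):
    """Adapt ResNet18 to grayscale drawings (illustrative; the project's exact
    modifications may differ)."""
    model = resnet18(weights=None)
    # Accept 1-channel input instead of RGB
    model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
    # Replace the final fully connected layer to match the number of categories
    model.fc = nn.Linear(model.fc.in_features, n_classes)
    return model
```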

Markov Decision Process

Policy iteration is a dynamic programming method for finding the optimal policy. Here, it is used to solve a small gridworld. Gridworld is a deterministic finite Markov Decision Process in which we want to find an optimal path to one of the grey cells.

Code
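
A compact policy iteration sketch for a generic finite MDP given as transition and reward tables; the gridworld specifics (states, rewards, terminal grey cells) would live in those tables, and this interface is an assumption.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """Policy iteration for a finite MDP (illustrative interface).
    P[a] is an (S, S) transition matrix and R[a] an (S,) reward vector for action a."""
    n_actions, n_states = len(P), P[0].shape[0]
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly
        P_pi = np.array([P[policy[s]][s] for s in range(n_states)])
        R_pi = np.array([R[policy[s]][s] for s in range(n_states)])
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        # Policy improvement: act greedily with respect to V
        Q = np.array([R[a] + gamma * P[a] @ V for a in range(n_actions)])  # (A, S)
        new_policy = Q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy
```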

Inference in Bayesian Network

I use two methods to compute approximate inference in Bayesian networks: rejection sampling and likelihood weighting.

Code
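
A minimal rejection sampling sketch on a hypothetical two-node network (Rain -> WetGrass), just to illustrate the idea of sampling from the prior and discarding samples that contradict the evidence; the project handles general networks and also implements likelihood weighting.

```python
import random

def rejection_sample_rain_given_wet(n_samples=100_000, seed=0):
    """Estimate P(Rain=True | WetGrass=True) in a toy two-node network by
    keeping only the samples consistent with the evidence."""
    rng = random.Random(seed)
    p_rain = 0.2
    p_wet_given_rain = {True: 0.9, False: 0.1}   # P(WetGrass=True | Rain)
    accepted = rain_and_wet = 0
    for _ in range(n_samples):
        rain = rng.random() < p_rain
        wet = rng.random() < p_wet_given_rain[rain]
        if wet:                                   # evidence: WetGrass = True
            accepted += 1
            rain_and_wet += rain
    return rain_and_wet / accepted

print(rejection_sample_rain_given_wet())          # close to 0.18 / 0.26 ≈ 0.69
```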

Reddit comments classification

Classify comments on a forum (Reddit) according to their topic (subreddit) using Naive Bayes, a neural network, a bidirectional GRU (gated recurrent unit), GloVe word embeddings, and an ensemble method. Tools used: scikit-learn, Keras, NLTK (stemmer, stopwords, lemmatizer).

Code
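
A minimal sketch of the Naive Bayes part of such a pipeline in scikit-learn; the preprocessing and hyperparameters below are illustrative, not the exact ones used in the project.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Bag-of-words + multinomial Naive Bayes baseline (illustrative; the project
# also uses neural models, GloVe embeddings, and ensembling).
model = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english", ngram_range=(1, 2)),
    MultinomialNB(alpha=0.1),
)

# comments: list of raw comment strings, subreddits: list of topic labels
# model.fit(comments, subreddits)
# predictions = model.predict(new_comments)
```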

Monte Carlo tree search

This program uses Monte Carlo tree search (notably used in AlphaGo's algorithm) to choose the next move in a game with more than 10^50 possible move combinations (ultimate tic-tac-toe).

Code
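
A generic MCTS skeleton with UCB1 selection and random rollouts; the game-state API assumed here (legal_moves, play, is_terminal, result, current_player) is hypothetical, and the project's implementation for ultimate tic-tac-toe will differ in its details.

```python
import math
import random

class Node:
    def __init__(self, state, parent=None, move=None):
        self.state, self.parent, self.move = state, parent, move
        self.children, self.visits, self.wins = [], 0, 0.0
        self.untried_moves = state.legal_moves()

def ucb1(child, parent_visits, c=1.4):
    # Balance exploitation (win rate) and exploration (rarely visited moves)
    return child.wins / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)

def mcts(root_state, n_iterations=1000):
    """Generic MCTS loop (hypothetical state API: legal_moves(), play(move) -> new
    state, is_terminal(), result(player) in [0, 1], current_player)."""
    root = Node(root_state)
    for _ in range(n_iterations):
        node = root
        # 1. Selection: descend while fully expanded and non-terminal
        while not node.untried_moves and node.children:
            node = max(node.children, key=lambda ch: ucb1(ch, node.visits))
        # 2. Expansion: add one child for an untried move
        if node.untried_moves:
            move = node.untried_moves.pop()
            node = Node(node.state.play(move), parent=node, move=move)
            node.parent.children.append(node)
        # 3. Simulation: random rollout to the end of the game
        state = node.state
        while not state.is_terminal():
            state = state.play(random.choice(state.legal_moves()))
        # 4. Backpropagation: credit the result from the perspective of the player who moved
        while node is not None:
            node.visits += 1
            node.wins += state.result(node.parent.state.current_player) if node.parent else 0
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move
```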

Neural network from scratch

A from-scratch implementation of a neural network for multiclass classification, using only NumPy. The calculations behind forward propagation, backpropagation, and gradient descent are shown in the PDF file.

Code
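
A minimal NumPy sketch of one training step for a one-hidden-layer classifier (ReLU hidden layer, softmax output, cross-entropy loss); the shapes and learning rate are illustrative, not the repository's exact code.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)        # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_step(X, Y, W1, b1, W2, b2, lr=0.1):
    """One gradient descent step. X: (n, d) inputs, Y: (n, k) one-hot labels."""
    # Forward propagation
    h_pre = X @ W1 + b1
    h = np.maximum(0, h_pre)                    # ReLU
    probs = softmax(h @ W2 + b2)
    # Backpropagation of the cross-entropy loss
    n = X.shape[0]
    d_logits = (probs - Y) / n
    dW2, db2 = h.T @ d_logits, d_logits.sum(axis=0)
    d_h = d_logits @ W2.T * (h_pre > 0)
    dW1, db1 = X.T @ d_h, d_h.sum(axis=0)
    # Gradient descent update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
    return W1, b1, W2, b2
```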

Density estimator

I implemented two types of estimators: a parametric density estimator with a diagonal Gaussian, and a kernel density estimator with an isotropic Gaussian kernel. I then used them on the Iris flower dataset to estimate the density of different features.

Code
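
A minimal NumPy/SciPy sketch of the two estimators; the bandwidth and interface are illustrative, not the project's exact choices.

```python
import numpy as np
from scipy.special import logsumexp

def diagonal_gaussian_logpdf(X, data):
    """Fit a diagonal Gaussian to `data` by maximum likelihood and evaluate the
    log-density at the rows of X (illustrative re-implementation)."""
    mu, var = data.mean(axis=0), data.var(axis=0)
    return -0.5 * ((X - mu) ** 2 / var + np.log(2 * np.pi * var)).sum(axis=1)

def kde_logpdf(X, data, bandwidth=0.5):
    """Kernel density estimate with an isotropic Gaussian kernel of width `bandwidth`."""
    n, d = data.shape
    sq_dists = ((X[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1)   # (|X|, n)
    log_kernels = -0.5 * sq_dists / bandwidth**2 - 0.5 * d * np.log(2 * np.pi * bandwidth**2)
    return logsumexp(log_kernels, axis=1) - np.log(n)                    # log of the average kernel
```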

Cross-entropy method on Taxi-v2 from OpenAI Gym

The cross-entropy method is a Monte Carlo method for optimization. In this notebook, it is used to solve Taxi-v2 from OpenAI Gym.

Code
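
A generic cross-entropy method sketch: sample candidate parameter vectors from a Gaussian, keep the best-scoring fraction, and refit the Gaussian to them. The scoring function (e.g., the average episode return of a policy built from the parameters) and all hyperparameters are placeholders, not the notebook's exact setup.

```python
import numpy as np

def cross_entropy_method(score_fn, dim, n_iters=50, pop_size=100, elite_frac=0.2, seed=0):
    """Maximize `score_fn` over vectors of length `dim` (illustrative CEM loop)."""
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(dim), np.ones(dim)
    n_elite = int(pop_size * elite_frac)
    for _ in range(n_iters):
        candidates = rng.normal(mean, std, size=(pop_size, dim))
        scores = np.array([score_fn(c) for c in candidates])
        elite = candidates[np.argsort(scores)[-n_elite:]]   # keep the top fraction
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean
```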

More projects