My research group has an opening for one undergraduate to work on a
Machine Learning project, starting this fall. The specific area is
Non-linear dimension reduction/Manifold learning (NLD). The goals of
this project are:
(1) *efficient* implementation of NLD algorithms in Python. The
current implementations run on thousands of data points (Matlab) or
1 million (Python). Can you rewrite them to run on 100 million? On
1 billion?
(2) study of real-world data sets to discover their features, using
the algorithms you implement:
– spectra of galaxies from large sky surveys
– the benchmark image data sets CIFAR-10 and CIFAR-100
www.cs.toronto.edu/~kriz/
– recordings of brain activity
The software will ultimately (possibly as soon as the end of the
fall quarter) become a component of scikit-learn.
Requirements. To participate, you MUST:
– be an expert with Cython, NumPy and other Python scientific
computing libraries (when you apply, send me the name of a GitHub
repository containing your code, or equivalent proof of expertise)
Highly desirable (you will gain more from the experience):
– basic notions of probability, statistics and mathematics
– a course in algorithms and data structures
– a curious mind
Rewards for you:
– experience with modern machine learning
– experience with the statistical study of large real data sets
– co-authorship of the package
– 2-4 credit hours
[- depending on your diligence: co-authorship of research papers
resulting from this project]
What if you are interested but are not a Python expert? I cannot work
with you until the Python project is underway. But if I do find a
person for this first-priority project, I may then have 1-2 openings
in the same area. So drop me a line.
Marina Meila
,_ o Marina Meila Dept of Statistics Padelford B – 321
/ //\ Associate Professor U of Washington Box 354322
__\>>_|__ mmp@stat.washington.edu Seattle WA 98195-4322
\\, www.stat.washington.edu/mmp phone: 206-543-8484