Meta-Learning: Learning to Learn
Configuring and using machine learning systems is difficult. It requires considerable expertise to select an appropriate algorithm, transform and clean the data, and set the parameters of the learning system. The goal of the Meta project is to understand this process better and, using meta-learning, to automate significant parts of it. Elements of this understanding include how learning system parameters affect learning performance, and which algorithms and parameter settings are right for a given problem, all based on the data and goals of the learning problem.
Learning requires data, and meta-learning requires data about the learning process. A key element of this work is capturing the performance and behavior of learning system runs in a database, so that the stored data can be used for meta-learning experiments. This data collection must be independent of the underlying learning system and capable of recording arbitrary information about learning executions. Other important elements of the Meta system are MetaLang, a scripting language for knitting together sequences of learning system executions (using tools such as designed experiments and searches over a learning parameter space) and for providing visualizations and tabular presentations of the meta-learning results, and an interactive GUI for configuring learning experiments and visualizing their results.
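To make the idea of a learner-independent run database concrete, here is a minimal sketch in Python. It is not the Meta system's schema or MetaLang: the table layout, the function names (`open_run_store`, `record_run`, `grid_search`, `best_run`), and the `toy_learner` stand-in are all assumptions made for illustration. Parameters and results are stored as JSON text so that any learning system can record arbitrary information, and a plain grid search plays the role of a search over a learning parameter space.

```python
import itertools
import json
import sqlite3


def open_run_store(path=":memory:"):
    """Create a store whose schema knows nothing about any particular learner."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS runs ("
        "id INTEGER PRIMARY KEY, learner TEXT, params TEXT, results TEXT)"
    )
    return conn


def record_run(conn, learner, params, results):
    """Store one execution; params and results are arbitrary JSON-able dicts."""
    cur = conn.execute(
        "INSERT INTO runs (learner, params, results) VALUES (?, ?, ?)",
        (learner, json.dumps(params, sort_keys=True),
         json.dumps(results, sort_keys=True)),
    )
    conn.commit()
    return cur.lastrowid


def grid_search(conn, learner, run_fn, grid):
    """Call run_fn on every parameter combination and record each execution."""
    names = sorted(grid)
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        record_run(conn, learner, params, run_fn(**params))


def best_run(conn, metric):
    """Return (params, results) of the recorded run with the lowest metric."""
    rows = conn.execute("SELECT params, results FROM runs").fetchall()
    decoded = [(json.loads(p), json.loads(r)) for p, r in rows]
    return min(decoded, key=lambda pair: pair[1][metric])


def toy_learner(depth, rate):
    """Hypothetical stand-in for a real learning system (made-up error surface)."""
    return {"mse": (depth - 3) ** 2 + (rate - 0.1) ** 2, "complexity": depth}


conn = open_run_store()
grid_search(conn, "toy", toy_learner,
            {"depth": [1, 2, 3, 4], "rate": [0.01, 0.1, 1.0]})
params, results = best_run(conn, "mse")
```

Because each row holds free-form JSON, a meta-learning experiment can later query the same table for any quantity a learner chose to log, such as error against structure complexity, without changing the schema.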
Below we show the relationship between fitness, measured by mean squared error, and the complexity of the structure learned by the machine learning algorithm.
Equipment Health Monitor (EHM)
The Equipment Health Monitor (EHM) is a generic tool that can learn to monitor arbitrary sensor streams. Given a small set of training streams labeled as good, the EHM learns the expected normal time evolution and statistical properties of the sensor readings. The EHM can then begin real-time monitoring of sensors and raise alarms when unexpected behavior is detected. While monitoring, the tool refines the learned model of normal behavior. Depending on the configuration, a certain rate of drift is tolerated.
The EHM is fundamentally different from classical model-based FDI (fault detection and isolation) in that machine learning techniques are used to learn a model that can be significantly more complex than manually constructed ones. The EHM can detect shifts in sensor values, changes in sensor noise, spikes, and other failure symptoms.
The EHM is being used to monitor hundreds of sensors in production runs of several kinds of semiconductor manufacturing tools.
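The learn, monitor, and refine loop described above can be illustrated with a deliberately simple sketch. This is not the EHM's actual model, which the text says is considerably more complex and also captures time evolution. The sketch assumes a single sensor, a Gaussian model of normal readings, a z-score alarm threshold, and an exponentially weighted update whose rate sets how much drift is tolerated. The class name `DriftTolerantMonitor` and the parameters `threshold` and `adapt_rate` are hypothetical.

```python
import math


class DriftTolerantMonitor:
    """Toy single-sensor monitor: Gaussian model of normal readings."""

    def __init__(self, threshold=4.0, adapt_rate=0.01):
        self.threshold = threshold    # alarm when |z| exceeds this (assumed)
        self.adapt_rate = adapt_rate  # 0 disables drift tolerance (assumed)
        self.mean = 0.0
        self.var = 1.0

    def fit(self, good_streams):
        """Estimate mean and variance from training streams labeled good."""
        values = [x for stream in good_streams for x in stream]
        self.mean = sum(values) / len(values)
        self.var = sum((x - self.mean) ** 2 for x in values) / len(values)
        self.var = max(self.var, 1e-12)  # guard against zero variance
        return self

    def update(self, x):
        """Score one reading; return True if it should raise an alarm."""
        z = (x - self.mean) / math.sqrt(self.var)
        if abs(z) > self.threshold:
            return True  # anomalous readings are not used to refine the model
        # Exponentially weighted refinement of the model of normal behavior.
        a = self.adapt_rate
        delta = x - self.mean
        self.mean += a * delta
        self.var = max((1 - a) * (self.var + a * delta * delta), 1e-12)
        return False


good = [[10.0, 10.2, 9.8, 10.1, 9.9], [10.1, 9.9, 10.0, 10.2, 9.8]]
monitor = DriftTolerantMonitor(threshold=4.0, adapt_rate=0.05).fit(good)
alarms = [monitor.update(x) for x in [10.0, 10.1, 14.0, 9.9]]
```

In this sketch only the spike at 14.0 raises an alarm. A larger `adapt_rate` lets the model follow slow changes in the sensor more quickly, at the cost of being slower to flag a gradual shift as a fault.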
Computer Vision/Analytics for Stem Cell Production
We are working to take mass production and banking of stem cells from science fiction to reality for commercial applications such as fundamental research, curing diseases, tissue/organ replacement, and personalized drug testing. Our work has focused on induced pluripotent stem cells, which are stem cells made by "reprogramming" more readily accessible cells such as skin cells. These cells can then be used to create any cell in the human body through the proper differentiation process. One of the advantages of inducing stem cells is that the cells should be 100% biologically compatible with the donor, so patients seeking stem cell treatments may one day donate their own cells for the treatment they need. We are creating new ways to assess stem cell quality and differentiation capability using ideas from data science, computer vision, and machine learning.
We are mining hundreds of gigabytes of data, including enzyme, metabolome, medium concentration, karyotype, and image data. The stem cells are grown over several days before they are ready to be used for seeding new cell colonies or for other purposes. The image data has been gathered across days at multiple scales and includes brightfield, phase contrast, and fluorescent imagery (such as DAPI, OCT4, and SOX2), as well as images from a custom imaging system. The problems we are solving span the full range of machine learning, from unsupervised to supervised and reinforcement learning, including both regression and classification.
We are seeking individuals to join our team who have strong backgrounds in areas such as Machine Learning, Computer Science, Statistics, Mathematics, Computer Vision, Image Processing, Data Mining, Data Visualization, and Database Management, and who are familiar with techniques such as deep learning, genetic programming, support vector machines, Gaussian mixture models, k-means clustering, data mining, and principal components analysis.
Below we show the results of automated object recognition and color coding using deep learning.