College of Liberal Arts & Sciences

Iowa State University
  • Bringing science to society

    Learning from disparate data sources may become more manageable thanks to the work of computer science's Vasant Honavar

    The amount of data available to researchers and the general public is mind-boggling.

    The development of high-throughput data acquisition technologies, together with advances in computing and communications, has resulted in explosive growth in the number, size and diversity of potentially useful information sources.

    Examples of such data repositories in the biological sciences include GenBank (a database of genome sequences) and the Protein Data Bank (a database of protein structures). NASA maintains large repositories of data gathered from satellites, while the U.S. Census Bureau and the Environmental Protection Agency maintain information that is accessible to the public.

    In principle, scientists or interested laypersons should be able to use such data to explore specific scientific questions. But in practice, our ability to exploit disparate, autonomously maintained data sources is hindered by the massive size of the data repositories and unavoidable semantic differences among them.

    "If you are a scientist, you don't want to spend months writing code," said Vasant Honavar, professor of computer science. "If you have to spend months writing code in order to extract the data that you need in the form that you want from existing data repositories before you can analyze the data, it hinders your ability to use available data effectively to explore scientific hypotheses. If you had the right tools, you could potentially pose a question and get an answer in 30 seconds instead of two years.

    "And because of the large amount of data, ideally you would want to do the analysis where the data and computational resources are available, instead of retrieving huge amounts of data when all you are interested in are results of analysis."

    Honavar is currently conducting research to make that task a little easier. He has received funding from several sources, including a four-year, $1 million grant from the National Institutes of Health to develop and use computational tools for data-driven characterization of protein sequence-structure-function relationships (in collaboration with Iowa State faculty members Robert Jernigan and Drena Dobbs), and a three-year, $210,000 Information Technology Research grant from the National Science Foundation (NSF) to develop some of the necessary algorithms and software.

    Honavar's research over the past several years has been supported by a number of sources, including the NSF, the Department of Defense, the Carver Foundation, Pioneer Hi-Bred, IBM, John Deere, and Iowa State.

    "Our research is aimed at overcoming some of the challenges in data-driven scientific discovery through the design, analysis and implementation of algorithms and software for knowledge acquisition from heterogeneous distributed data," Honavar said. "The challenge is to extract, integrate, and learn from semantically heterogeneous data."

    But it's not just researchers and scientists that Honavar hopes to be able to assist with the new algorithms and software.

    "A longer-term goal would be for a layperson, such as a high school student or a journalist, to examine if certain findings are supported by data," he said. "We want to develop the software infrastructure that can engage all interested individuals in discovery.

    "This type of technology can make scientific data and analysis tools available not only to specialists but to anyone that is interested in it," he said.

    Honavar and his research group are planning to customize information extraction agents that can effectively exploit domain or context-specific ontologies supplied by the users to extract the information needed for learning from distributed data sources. They hope to accomplish this regardless of differences in query capabilities, interfaces, and ontologies, and under privacy constraints.
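
    The idea in the paragraph above, translating each source's terms into the user's vocabulary and sending back only summary results, can be illustrated with a toy Python sketch. It is not Honavar's software: the two data sources, the term mappings, and the function names are all hypothetical, and simple counting stands in for a real learning algorithm.

```python
from collections import Counter

# Hypothetical data: two sources record the same attribute using different terms.
SOURCE_A = ["helix", "sheet", "helix", "coil"]
SOURCE_B = ["H", "E", "E", "C", "H"]

# Hypothetical user-supplied mappings from each source's terms to the user's terms.
MAPPING_A = {"helix": "helix", "sheet": "strand", "coil": "coil"}
MAPPING_B = {"H": "helix", "E": "strand", "C": "coil"}


def local_counts(records, mapping):
    """Runs where the data lives: translate local terms and return only counts."""
    return Counter(mapping[record] for record in records)


def combine(count_list):
    """Runs on the user's side: merge the per-source counts into one summary."""
    total = Counter()
    for counts in count_list:
        total.update(counts)
    return total


combined = combine([
    local_counts(SOURCE_A, MAPPING_A),
    local_counts(SOURCE_B, MAPPING_B),
])
print(dict(combined))
```

    In this sketch only the small count tables leave each source, never the raw records, which is also why an approach of this kind could suit settings where access to the underlying data is restricted.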

    "We think you can do this in restricted access settings, such as hospital records," Honavar said. "There may be data available in hospital records that researchers could use to analyze any number of diseases, but they can't get to it because of privacy issues."

    Honavar would like to develop privacy-preserving data-mining algorithms for applications in such areas.

    Honavar is working on these projects with a team of nine graduate students and two undergraduate students, along with collaborators from several other disciplines. His team is working with Dobbs and Jernigan to develop a test bed for knowledge acquisition from heterogeneous distributed data in computational molecular biology, aimed at the discovery of protein sequence-structure-function relationships.

    Honavar's group collaborates with computer science faculty members Johnny Wong, Les Miller and Robyn Lutz on applications in security informatics. He is also working with Iowa State faculty members Heather Greenlee and Jan Buss on applications in gene expression analysis, and with James McCalley on applications in power systems.

    Honavar leads the Computational Intelligence, Learning, and Discovery (CILD) Program, which aims to foster cross-disciplinary research on applications of artificial intelligence, and in particular machine learning, in scientific discovery.

    He will discuss his team's work in an Institute of Science and Society seminar on Tuesday, Feb. 24, in 302 Catt Hall at 12:10 p.m.

Photo: Vasant Honavar in front of a computer

Around LAS
February 23 to March 7, 2004
