TeraFlow Testbed TeraFlow Projects

Teraflow Data Services for the Sloan Digital Sky Survey (SDSS)

Astronomical data is growing at an exponential rate, doubling approximately every year. The main reason for this trend is Moore's Law: the power of the hardware used for data collection and processing grows at that pace. As one example, the Sloan Digital Sky Survey (SDSS) is mapping one-quarter of the entire sky in detail, determining the positions and brightnesses of more than 300 million celestial objects. It will also measure the distances to more than a million galaxies and quasars.

A research group led by Alex Szalay from Johns Hopkins University, in collaboration with Jim Gray from Microsoft, is building the science archive for the project. The first data from this project was released in 2001 and was about 80 GB in size. The second release of data (DR1) took place in 2003 and consisted of about 1 TB of data. The third release of data (DR2) took place in 2004 and consists of about 1.7 TB.
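As a rough check on the growth rate, the release sizes quoted above can be turned into implied doubling times. The sketch below assumes steady exponential growth between releases and takes 1 TB as 1000 GB; the function name and the list of releases are illustrative only, and release sizes are only a proxy for the total volume of astronomical data.

```python
import math

# Release sizes as quoted in the text, in gigabytes.
# Assumption: 1 TB is taken as 1000 GB.
releases = [(2001, 80.0), (2003, 1000.0), (2004, 1700.0)]


def doubling_time_years(year_a, size_a, year_b, size_b):
    """Years needed to double, assuming steady exponential growth
    between the two (year, size) points."""
    return (year_b - year_a) / math.log2(size_b / size_a)


# Doubling time implied by each pair of consecutive releases.
for (y0, s0), (y1, s1) in zip(releases, releases[1:]):
    t = doubling_time_years(y0, s0, y1, s1)
    print(f"{y0} -> {y1}: doubling every {t:.2f} years")

# Doubling time implied by the first and last release together.
overall = doubling_time_years(*releases[0], *releases[-1])
print(f"{releases[0][0]} -> {releases[-1][0]}: doubling every {overall:.2f} years")
```

Under these assumptions the releases grew faster than one doubling per year between 2001 and 2003 and slower between 2003 and 2004, so three data points give only a loose estimate of the trend.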

At a technical demonstration at the SC 04 meeting in 2004, this data was distributed using the UDT high-performance data transport protocol (part of the teraflow data services). This was the first time the data was distributed over the network instead of by shipping disks. With UDT and high-performance networks, the data could be transported over 1000x faster than with TCP as it is typically deployed on today's networks.
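To give a feel for what a speedup of this size means for a release the size of DR2, here is a back-of-the-envelope sketch. It uses the standard bound that a single TCP connection cannot send more than one window of data per round trip. The 64 KB window and 100 ms round-trip time are assumptions chosen for illustration, not measurements from the demonstration, and the 1000x factor is simply applied to that baseline.

```python
def tcp_window_limit_bps(window_bytes, rtt_seconds):
    """Upper bound on single-connection TCP throughput in bits per second:
    at most one window of data can be in flight per round trip."""
    return window_bytes * 8 / rtt_seconds


def transfer_time_seconds(size_bytes, rate_bps):
    """Time to move size_bytes at a sustained rate of rate_bps."""
    return size_bytes * 8 / rate_bps


DR2_BYTES = 1.7e12          # about 1.7 TB, from the text
WINDOW_BYTES = 64 * 1024    # assumption: 64 KB window, no window scaling
RTT_SECONDS = 0.1           # assumption: 100 ms round-trip time
SPEEDUP = 1000              # "over 1000x" from the text, taken as exactly 1000

tcp_rate = tcp_window_limit_bps(WINDOW_BYTES, RTT_SECONDS)
fast_rate = tcp_rate * SPEEDUP  # hypothetical rate, not a measured UDT figure

tcp_days = transfer_time_seconds(DR2_BYTES, tcp_rate) / 86400
fast_minutes = transfer_time_seconds(DR2_BYTES, fast_rate) / 60

print(f"window-limited TCP: {tcp_rate / 1e6:.1f} Mbit/s, about {tcp_days:.0f} days")
print(f"1000x faster:       {fast_rate / 1e9:.1f} Gbit/s, about {fast_minutes:.0f} minutes")
```

With these assumed values the transfer drops from roughly a month to under an hour, which is the difference between shipping disks and distributing a release over the network.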

The goals of this project are 1) to use teraflow data services to distribute SDSS data; and 2) to use teraflow data services to process the data continuously, so that the release of data becomes a continuous process instead of an episodic one.

Up-to-date information on the SDSS Project can be found at the SDSS Wiki.

Pantheon Gateway Testbed

Today, research in data integration and data assimilation is hindered because researchers lack access to large collections of heterogeneous data that can be used for developing and testing new technologies. In this project, we are archiving highway sensor data, overhead imagery, text-based data about special events that may affect traffic, and weather-related data. These resources will be archived each day and made available to the community for testing novel data integration and assimilation strategies.

Today, this data is collected, but not archived, by the Gateway System that covers the three-state, fifteen-county Gary-Chicago-Milwaukee (GCM) corridor. The Gateway System uses fixed traffic sensors in addition to other data sources to compute real-time traffic congestion data and displays this data to the public at two websites: http://www.gcmtravel.com and http://www.travelinfo.org.

The Pantheon Gateway Testbed archives this data, overlays additional data, and makes the combined archive available to the community as a resource.

Chicago Biomedical Consortium
Bioinformatics Data Integration Testbed

The Chicago Community Trust and Searle Family Foundation have recently awarded funds to the University of Illinois at Chicago (UIC), the University of Chicago (UC), and Northwestern University (NW) to create the Chicago Biomedical Consortium (CBC). The focus of the CBC will be on proteomics. In the first year of the project, the Chicago Biomedical Consortium will purchase advanced mass spectrometers, such as Time-of-Flight or Fourier-Transform Mass Spectrometers.

In this project, we are developing a data integration infrastructure for mass spectrometer data. It will archive the data and make it available as an open community resource, in a format that makes it easy to integrate with other data and so more likely to contribute to new discoveries. In particular, we are developing open source repositories for mass spectrometer data, teraflow-based services for the real-time discovery of proteins, and the integration of third-party protein, text, and pathway databases.

telephone (312) 996-0305
e-mail staff@teraflowtestbed.net
address 700 SEO MC 249, 851 S. Morgan St., Chicago, IL 60607