UDT
UDT, the UDP-based Data Transport protocol, is an application-level
transport protocol designed for distributed, data-intensive
applications. The new protocol is motivated by the growing importance
of wide area high-speed optical networks, in which applications
employing TCP generally fail to utilize the available bandwidth.
UDT demonstrates 1) good efficiency (it utilizes available
bandwidth quickly); 2) good friendliness (UDT is friendly to flows
independently of their RTT and also friendly to TCP flows sharing the
same bandwidth); and 3) good fairness (UDT is fair to other UDT-based
teraflows). UDT is designed to be deployed in high-performance
computing environments in which a small number of teraflows share
bandwidth with each other along with accompanying TCP control flows.
It combines both rate-based and window-based control and uses
bandwidth estimation to determine the control parameters
automatically.
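
The combination of rate-based and window-based control can be illustrated with a toy sender that may transmit only when both limits allow it. This is a minimal sketch, not UDT's actual algorithm: the class name, its fields, and the rule in update_from_estimate (pacing interval as the inverse of the estimated bandwidth, window as the bandwidth-delay product) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class HybridSender:
    """Toy sender limited by a pacing interval AND a flow window (hypothetical)."""

    packet_interval: float      # seconds between packets (rate-based control)
    window: int                 # max unacknowledged packets (window-based control)
    in_flight: int = 0
    next_send_time: float = 0.0

    def can_send(self, now: float) -> bool:
        # Both conditions must hold: the pacing timer has expired
        # and the window still has room.
        return now >= self.next_send_time and self.in_flight < self.window

    def on_send(self, now: float) -> None:
        self.in_flight += 1
        self.next_send_time = now + self.packet_interval

    def on_ack(self, acked: int) -> None:
        self.in_flight = max(0, self.in_flight - acked)

    def update_from_estimate(self, bandwidth_pps: float, rtt: float) -> None:
        # Assumed rule: derive both parameters from an estimated bandwidth
        # (packets per second) and a measured round-trip time (seconds).
        self.packet_interval = 1.0 / bandwidth_pps
        self.window = max(1, int(bandwidth_pps * rtt))
```

In this sketch the pacing interval smooths the sending pattern, while the window bounds how much unacknowledged data can be outstanding if acknowledgements stall.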
UDT is open source and available from SourceForge. The current
release is Version 2. Detailed information on UDT can be found at
udt.sf.net.
Version 3 will be developed using a framework for high performance
protocol development called the Composable Protocol Development
Framework (CPDF). Using CPDF, protocols with varying and specialized
congestion control mechanisms can be developed easily.
SOAP*
SOAP* is an open source library for high performance web services.
SOAP* combines a TCP/XML based control channel with a separate data
channel that can employ: 1) specialized protocols such as UDT; and 2)
alternatives to XML that provide greater efficiency for large data
sets.
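
To see why an alternative to XML can pay off on the data channel, the following sketch encodes the same list of floating-point values once as XML and once as packed binary. It does not reproduce SOAP*'s wire format; the element names, the length-prefixed binary layout, and the function names are all assumptions chosen for illustration.

```python
import struct
import xml.etree.ElementTree as ET


def encode_xml(values):
    """Encode values as a small XML document (hypothetical element names)."""
    root = ET.Element("values")
    for v in values:
        ET.SubElement(root, "v").text = repr(v)
    return ET.tostring(root)


def encode_binary(values):
    """Encode values as a uint32 count followed by IEEE 754 doubles,
    all in network byte order (an assumed layout, not SOAP*'s)."""
    return struct.pack(f"!I{len(values)}d", len(values), *values)


def decode_binary(payload):
    """Inverse of encode_binary."""
    (count,) = struct.unpack_from("!I", payload, 0)
    return list(struct.unpack_from(f"!{count}d", payload, 4))


if __name__ == "__main__":
    data = [0.5, 1.25, 3.0]
    print(len(encode_xml(data)), "bytes as XML")
    print(len(encode_binary(data)), "bytes as packed binary")
```

The binary form costs a fixed 8 bytes per value plus a 4-byte header, whereas the XML form adds markup around every value and also requires text parsing on the receiving side.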
An open source implementation of SOAP* is available from
SourceForge as part of the DataSpace Transfer Protocol (DSTP) framework.
DSTP is a framework designed for exploring and analyzing remote and
distributed data. SOAP* web services using the current version of
DSTP have been used successfully in applications employing 1 Gbps data
flows.
The next release of SOAP* will be independent of DSTP and will be
designed to scale to 10 Gbps data flows.
Teraflow services for processing, exploring, and analyzing data are
generally built over SOAP*.
High Performance Scoring Engines
The Predictive Model Markup Language (PMML) is an open standard for
statistical and data mining models that is supported by over two
dozen vendors. Traditionally, deploying data mining into operational
systems has been very labor intensive. This is especially true of
high performance or distributed applications.
Over the past few years, lightweight, high-performance PMML-based
scoring engines have been developed. Once integrated into an
operational system, a new statistical model can be deployed simply by
updating the PMML file. This is beginning to change dramatically how
statistical models are deployed.
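
The idea of deploying a model by swapping the PMML file can be sketched with a very small scoring function for a linear regression model. The embedded document is a stripped-down fragment rather than a schema-valid PMML file (it omits the Header, DataDictionary, and MiningSchema elements), and the function names and field names x1 and x2 are hypothetical; a production scoring engine would handle many more model types and validation steps.

```python
import xml.etree.ElementTree as ET

# Simplified PMML-style fragment; a real file carries more required elements.
PMML_TEXT = """<PMML version="3.0">
  <RegressionModel functionName="regression">
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x1" coefficient="2.0"/>
      <NumericPredictor name="x2" coefficient="-0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""


def local_name(tag):
    """Strip an XML namespace prefix such as '{uri}' from a tag, if present."""
    return tag.rsplit("}", 1)[-1]


def load_linear_model(pmml_text):
    """Read the intercept and coefficients from the first RegressionTable."""
    root = ET.fromstring(pmml_text)
    table = next(el for el in root.iter() if local_name(el.tag) == "RegressionTable")
    intercept = float(table.get("intercept"))
    coefficients = {
        el.get("name"): float(el.get("coefficient"))
        for el in table
        if local_name(el.tag) == "NumericPredictor"
    }
    return intercept, coefficients


def score(model, record):
    """Apply the linear model to one record given as a dict of field values."""
    intercept, coefficients = model
    return intercept + sum(c * record[name] for name, c in coefficients.items())


if __name__ == "__main__":
    model = load_linear_model(PMML_TEXT)
    print(score(model, {"x1": 2.0, "x2": 4.0}))
```

Because the scoring code reads its parameters from the document at load time, replacing the PMML text with a new set of coefficients changes the deployed model without touching the surrounding application code.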
High performance open source scoring engines are an important part
of the teraflow services middleware. An initial version of a scoring
engine is available now and a full release is expected in 3Q
2005.