
Overview of GridFTP in the Open Science Grid


GridFTP is an extension of the well-known File Transfer Protocol (FTP). The Globus Alliance describes GridFTP as a high-performance, secure, reliable data transfer protocol optimized for high-bandwidth wide-area networks. It is based on the Internet FTP protocol and implements extensions for high-performance operation that were either already specified in the FTP specification but not commonly implemented, or that were proposed as extensions by the Globus team. The GridFTP protocol specification is a 'proposed recommendation' document in the Global Grid Forum (GFD-R-P.020).

GridFTP extends the FTP standard with:

  • Strong authentication and encryption via Globus GSI on both the control (command) and data channels
  • Multiple data channels for parallel transfers
  • Third-party transfers: a third party, C, can initiate a transfer from A to B
  • Partial file transfers
  • Reusable data channels
  • Tunable network & I/O parameters
  • Server side processing
  • Command pipelining

GridFTP is the protocol. A server or client that implements it is GridFTP-enabled (or Grid-enabled). Although you will hear about "GridFTP servers", the correct term is "GridFTP-enabled servers". Groups other than Globus can release GridFTP-enabled clients and servers, e.g., UberFTP, which is included with the OSG release (see also the UberFTP how-to).

Simple File Transfer through GridFTP

In a basic file transfer, we would move a source file from Site A to Site B, perhaps so that we could do some operation on it on Site B's computers.

GridFTP works slightly differently from simple FTP. In Figure 2, Site A, which holds the archived data we want to use, runs a GridFTP-enabled server. Site B, where we want to run our job, runs a GridFTP-enabled client. We want to move the file from Site A to Site B.

The data channel is the communication link(s) over which the data of interest (in this case, the file we wish to move) flows from one place to another. This high-bandwidth link is authenticated by default, with the option of providing encryption and integrity protection.

The control channel is a low-bandwidth TCP link over which commands and responses flow, and is encrypted and integrity-protected by default.

Direction of Control

In GridFTP, the control channel can go either way, depending on which site initiates the transfer. Site A can initiate the transfer to Site B, or Site B can initiate the transfer from Site A. The direction of control depends on which site acts as the GridFTP-enabled client and which as the GridFTP-enabled server. Figure 3 shows the control channel when Site A initiates the transfer.

Even though Site A initiates the transfer, meaning that Site B is acting as the "server", the data channel is still in the same direction as in Figure 2.

Third party transfer

GridFTP also allows the controller to be at a location separate from both the source and the destination. Such a third-party transfer lets a researcher at a remote site initiate a data transfer from Site A to Site B, where they have access to computing facilities.
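
As a minimal sketch of a third-party transfer, globus-url-copy can be given gsiftp:// URLs for both the source and the destination, so that the client only drives the two control channels while the data flows directly between the servers. The host names and path below are hypothetical placeholders, and the helper function third_party_cmd is an illustration that only prints the command; nothing is transferred until you run the printed line yourself.

```shell
#!/bin/sh
# Print (do not run) a globus-url-copy command for a third-party transfer.
# The hosts and path passed in are placeholders, not real OSG endpoints.
third_party_cmd() {
    src_host=$1; dst_host=$2; path=$3
    # Both URLs use gsiftp://, so the data moves from server to server.
    printf 'globus-url-copy -vb gsiftp://%s%s gsiftp://%s%s\n' \
        "$src_host" "$path" "$dst_host" "$path"
}

third_party_cmd siteA.example.org siteB.example.org /data/largefile
```

The controlling machine needs valid grid credentials accepted by both sites, since it authenticates to each server over its own control channel.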

Speeding Up GridFTP

There are several ways to increase GridFTP transfer speeds. However, speeds are always affected by factors that you can't control, such as the network weather.

Parallel Streams

Where the installations allow it, GridFTP can transfer data over parallel streams. Instead of a single data channel, it uses several at once, up to the limit set by the installation and its connections.

Striped Transfers

GridFTP also supports striped transfers of data, where both the source and destination use multiple servers all with separate GridFTP data channels.


Of course, for truly speedy transfers, use parallel streams with striped transfers.
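
The combination can be sketched with globus-url-copy, whose -p option sets the number of parallel streams and whose -stripe option requests striping on servers that support it. The helper name striped_parallel_cmd and the example.org URLs are assumptions made for illustration; the function only prints the command so that it can be inspected before use.

```shell
#!/bin/sh
# Print (do not run) a striped, parallel globus-url-copy command.
# Arguments: number of parallel streams, source URL, destination URL.
striped_parallel_cmd() {
    streams=$1; src=$2; dst=$3
    # Reject anything that is not a positive integer stream count.
    case $streams in
        ''|*[!0-9]*|0) echo "streams must be a positive integer" >&2; return 1 ;;
    esac
    printf 'globus-url-copy -vb -stripe -p %s %s %s\n' "$streams" "$src" "$dst"
}

striped_parallel_cmd 4 \
    gsiftp://siteA.example.org/data/largefile \
    gsiftp://siteB.example.org/data/largefile
```

Whether striping actually takes effect depends on how the servers at each end are configured; against a single non-striped server the transfer may simply run with parallel streams only.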

Use large TCP windows

$ globus-url-copy -vb -p 4 -tcp-bs 1048576 gsiftp://ldas-cit.ligo.caltech.edu:15000/usr1/grid/largefile file:/tmp/largefile
    514392064 bytes      6609.67 KB/sec avg      8639.71 KB/sec inst
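
A common rule of thumb, which is an addition here and not part of the original lecture, is to size the TCP buffer near the bandwidth-delay product of the path: the link bandwidth multiplied by the round-trip time. The sketch below computes that value in bytes; the function name bdp_bytes and the sample bandwidth and round-trip figures are assumptions chosen for illustration.

```shell
#!/bin/sh
# Bandwidth-delay product in bytes.
# Arguments: bandwidth in Mbit/s, round-trip time in milliseconds.
bdp_bytes() {
    # Mbit/s * ms * 125 = bytes
    # (1,000,000 bits per Mbit / 8 bits per byte / 1000 ms per second = 125)
    echo $(( $1 * $2 * 125 ))
}

bdp_bytes 1000 50   # 1 Gbit/s path with a 50 ms round trip: prints 6250000
```

The result is a candidate value for -tcp-bs. The operating system may cap the buffer below the requested size, so the effective window can be smaller than what is passed on the command line.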

Use large memory buffers

$ globus-url-copy -vb -p 4 -bs 1048576 -tcp-bs 1048576 gsiftp://ldas-cit.ligo.caltech.edu:15000/usr1/grid/largefile file:/tmp/largefile
    523304960 bytes      7300.56 KB/sec avg      9311.99 KB/sec inst

Debugging

Use the -dbg option of globus-url-copy to see control channel communication.

Hints for Experts

To make GridFTP go really fast
  • Use fast disks and filesystems: your filesystem should read & write >30 MB/s
  • Configure TCP for performance
  • Patch your Linux kernel with the web100 patch, as this has an important work-around for Linux
  • Understand your network path


Based on "Lecture 4: Grid Data Management" (July 2005: OSG Summer School) by Ben Clifford (UChi) from work by Bill Allcock (ANL), Jaime Frey (UWisc) & Scott Koranda (UWMilw)

-- ForrestChristian - 17 Nov 2006

Topic revision: r23 - 06 Dec 2016 - 18:12:39 - KyleGross