Apache Hadoop

Accelerate Hadoop Over Distance with Silver Peak

Get more out of your distributed Hadoop implementation. Silver Peak addresses the networking challenges that undermine Hadoop performance over distance. Remote users can run Hadoop queries faster, and IT can back up and distribute data in a fraction of the time. With Silver Peak, Hadoop performance over distance can improve by as much as 20X.

Apache Hadoop is one of the most popular frameworks for processing large-scale datasets. While effective within a location, Hadoop operations often suffer across the wide area network (WAN). The throughput across large-scale networks is insufficient for many Hadoop tasks. Common instances where the WAN may undermine Hadoop performance include:

  • Importing data from existing SQL tables into the Hadoop Distributed File System (HDFS).
  • Delivering post-processed Hadoop results to data analysts in remote locations. 
  • Processing high volumes of Hadoop jobs (hundreds of jobs per hour) when some or all of those jobs depend on data from remote locations.
  • Replicating Hadoop offsite for disaster recovery purposes.

In all of these cases, the added delay of long distances, routing policies, peering practices and congestion interferes with underlying network operations. This impact is independent of bandwidth. The theoretical throughput of a 100 Mbps connection within an office can approach 100 Mbps; across a connection with 50 ms of latency, though, only about 10.5 Mbps is achievable.
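
The 10.5 Mbps figure is consistent with a simple window-limited model of TCP, in which a sender can have at most one window of data in flight per round trip. The sketch below is an illustration under stated assumptions rather than Silver Peak's own calculation: it assumes the 50 ms is a round-trip time and that the sender uses the largest TCP window available without window scaling (65,535 bytes). The function names are hypothetical.

```python
def window_limited_mbps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on single-flow TCP throughput: one window per round trip."""
    return window_bytes * 8 / rtt_seconds / 1_000_000


def effective_mbps(link_mbps: float, window_bytes: int, rtt_seconds: float) -> float:
    """Throughput is capped by whichever is lower: the link or the window limit."""
    return min(link_mbps, window_limited_mbps(window_bytes, rtt_seconds))


# Assumption: largest TCP window without window scaling.
WINDOW_BYTES = 65_535

# Assumption: roughly 1 ms round trip inside an office, 50 ms across the WAN.
lan_mbps = effective_mbps(100, WINDOW_BYTES, 0.001)
wan_mbps = effective_mbps(100, WINDOW_BYTES, 0.050)

print(f"LAN: {lan_mbps:.1f} Mbps")  # link-limited
print(f"WAN: {wan_mbps:.1f} Mbps")  # window-limited
```

Under these assumptions the office connection is limited by the 100 Mbps link, while the WAN connection is limited by the window and the round-trip time, which is why adding bandwidth alone does not help.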

Silver Peak addresses these underlying networking problems. Customer experiences show that Silver Peak shortens Hadoop data replication times by as much as 20X. Silver Peak achieves these kinds of results by addressing the following challenges:

  • Bandwidth – Silver Peak byte-level deduplication removes redundant data from the WAN. The first time Hadoop sends data, it is fingerprinted and compressed by Silver Peak. Subsequent requests for the same data are fulfilled by the local Silver Peak instance.
  • Latency – Silver Peak mitigates latency, enabling Hadoop to operate more efficiently over distance. TCP Acceleration includes window scaling, selective acknowledgements, and HighSpeed TCP. Where applications must access a CIFS/SMB volume, performance is improved through CIFS Acceleration, which includes CIFS read-ahead, CIFS write-behind, and CIFS metadata optimizations. Latency is also reduced by re-packaging multiple smaller packets into a single larger one, and through Dynamic Path Control, which selects the fastest path to a remote location.
  • Congestion – Silver Peak makes Hadoop performance more predictable across congested WANs. Lost packets are reconstituted in real time at the far end of a WAN link, avoiding delays that come with multiple round-trip retransmissions. Out-of-order packets are resequenced, avoiding retransmission and processing delays that occur when packets arrive out of order.
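
The fingerprinting idea behind deduplication can be sketched in a few lines. This is a generic illustration, not Silver Peak's implementation: it assumes fixed-size 4 KB chunks and SHA-256 fingerprints for simplicity, whereas the product is described as working at the byte level, and the class and function names are hypothetical. The sender transmits a chunk in full the first time it is seen and only a short fingerprint afterwards; the receiver rebuilds the data from its local store.

```python
import hashlib

CHUNK_SIZE = 4096  # assumption: fixed-size chunks, chosen only for illustration


class DedupSender:
    """Replaces chunks it has already sent with their fingerprints."""

    def __init__(self):
        self.seen = set()

    def encode(self, data: bytes):
        tokens = []
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            fingerprint = hashlib.sha256(chunk).digest()
            if fingerprint in self.seen:
                tokens.append(("ref", fingerprint))  # send 32 bytes, not the chunk
            else:
                self.seen.add(fingerprint)
                tokens.append(("raw", chunk))  # first sighting: send in full
        return tokens


class DedupReceiver:
    """Stores raw chunks locally and resolves fingerprints against that store."""

    def __init__(self):
        self.store = {}

    def decode(self, tokens) -> bytes:
        out = bytearray()
        for kind, payload in tokens:
            if kind == "raw":
                self.store[hashlib.sha256(payload).digest()] = payload
                out += payload
            else:
                out += self.store[payload]
        return bytes(out)


def wire_bytes(tokens) -> int:
    """Payload bytes that would cross the WAN for a list of tokens."""
    return sum(len(payload) for _, payload in tokens)


# Four distinct 4 KB chunks, sent twice over the same sender/receiver pair.
data = b"".join(bytes([i]) * CHUNK_SIZE for i in range(4))
sender, receiver = DedupSender(), DedupReceiver()

first = sender.encode(data)
second = sender.encode(data)
first_ok = receiver.decode(first) == data
second_ok = receiver.decode(second) == data

print(wire_bytes(first), wire_bytes(second), first_ok, second_ok)
```

In this toy run the second transfer carries only fingerprints, so far fewer payload bytes cross the WAN while the receiver still reconstructs identical data.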

Silver Peak protects Hadoop data in-flight between locations with an Accelerated IPSec VPN running AES-256, the enterprise standard for data encryption. Data-at-rest is also encrypted with AES. End-to-end encryption is provided by SSL/TLS.

Silver Peak offers the most scalable and cost-effective data acceleration software for connecting data centers, remote offices and the cloud. You can download and deploy it as a free, self-service trial in just 20 minutes or contact us to see how Silver Peak can help your organization.

Featured Products

  • VX WAN Optimization Software

    VX Virtual WAN Optimization Software

    Silver Peak VX WAN optimization software supports the full list of Silver Peak WAN optimization features. Use Silver Peak VX software to build a Silver Peak Unity WAN fabric for integrating your enterprise WAN, the Internet and the cloud, and to optimize every SaaS application.

  • NX WAN Optimization Appliances

    NX Appliances for Application Acceleration and Replication Acceleration

    Silver Peak NX WAN optimization appliances are the industry's highest-performance WAN optimization solutions, delivering three times the WAN performance and capacity of the nearest competing products. The Silver Peak Software for Life program means you can convert NX appliances to VX software at any time for free.

  • Unity Architecture: Building an SD-WAN Fabric

    Silver Peak Unity is the premier solution for broadband and hybrid WANs.

Benefits

  • Import data from remote data stores faster by dramatically improving the throughput of Hadoop over distance.
  • Increase Hadoop’s value by gathering data faster, increasing the number of Hadoop jobs per hour.
  • Hire the best talent regardless of location by passing MapReduce results quickly and efficiently to data analysts wherever they work.
  • Avoid or delay expensive WAN bandwidth upgrades by eliminating redundant Hadoop traffic from the WAN.
  • Protect Hadoop more efficiently with off-site data replication by eliminating redundant Hadoop traffic from the WAN.

Resources

  • Networks have never been more critical to the success of IT and the business. New virtualization and Cloud technologies and services are remaking the face of IT and the way in which infrastructure is architected. But the common thread throughout is the network, which must be both highly available and high performing. The tools, technologies, and practices of network monitoring and management address these needs, and are thus essential to the success of every enterprise and governmental organization. This Enterprise Management Associates (EMA) research report takes a detailed look at the current state of networks and network management, and examines five major areas of change and evolution affecting network management, including Cloud and virtualization, Software Defined Networking (SDN), big data, the rise of log data and APIs as management data sources, and the ongoing convergence of network operations teams and tools.
  • Data Center Interconnect (DCI) initiatives use technologies such as Cisco's Overlay Transport Virtualization (OTV) to extend data centers across locations. But for DCI to be effective, IT must address the bandwidth limitations, high rates of packet loss, and increased delay of the WAN separating those locations. Data acceleration and WAN optimization, in particular, compensate for those challenges, enabling architects of software-defined data centers to meet their DCI initiatives.