How to move big data into Amazon S3 using Signiant Flight
In this SlideShare, Matt Yanchyshyn, AWS Solutions Architect, and Nelson Hsu, Signiant VP of Business Development & Alliances, explain why companies are moving large data volumes to cloud storage and how to use Signiant Flight to accelerate content transfers into and out of Amazon S3.
SlideShare Transcript
Moving Big Data Into Cloud Object Storage | Signiant
Matt Yanchyshyn, AWS Solutions Architect
Nelson Hsu, Signiant VP Business Development & Alliances
How do we define Big Data?
- When your data sets become so large that you have to start innovating around how to store, organize and transfer them.
- Anytime you are gathering data or using data someone else has gathered for analytics.
Unconstrained growth: Big Data is moving fast
- From application server logs, web sites and mobile apps to sensor output, high definition film and satellite imagery, data is growing at an unconstrained and exponential rate
Why is Amazon S3 good for Big Data?
- There is no limit on the number of objects
- Objects can be up to 5 TB in size
- Central data storage for all of your systems
- High bandwidth
- 99.999999999% durability
- Versioning, Lifecycle Policies and Glacier Integration
Moving Big Data to and from Amazon S3
Signiant launched a new product last year called Flight that provides an easy way for AWS customers to push large amounts of data into Amazon S3 (and easily pull it back out) without worrying about managing cloud infrastructure.
Signiant Flight is an easy way to move data to and from S3 at high speeds:
- When users frequently move large data sets into Amazon S3, for example for processing with Amazon EMR or Amazon Redshift.
- For batch file transfers using manifests. For example, if you’ve pre-aggregated and compressed your data in order to optimize Hadoop.
A few ways businesses are using Flight:
- On-premises storage optimization
- Large-scale ingest
- Accelerate big data analytics
- Enable cloud-based workflows
Flight is hybrid Software-as-a-Service (SaaS)
Flight is the only SaaS solution on the market for accelerated file transfers to and from cloud object storage. The SaaS component makes Flight unique for several reasons:
- High Availability
- Global Performance
- Elasticity
- Cost Effectiveness
- Easy to Deploy & Use
- Rapid Innovation
Not all SaaS is created equal
Subscription + Management vs. BYOL
While BYOL (bring your own license) models certainly have their place, there is a significant difference between them and Subscription + Management payment models like Signiant’s.
Namely, with BYOL, you still have to manage, maintain and support your own servers.
With Signiant’s subscription service, Signiant covers all of that for you, significantly reducing both OpEx and CapEx.
A fully managed service
Signiant Flight eliminates the overhead of managing compute resources in the cloud. Signiant manages the server-side component – the Amazon EC2 instances running Flight servers and the Amazon S3 transfer components – while end users run a lightweight, client-side agent.
All you have to do is:
- Install the local client
- Authenticate with AWS and set which Amazon S3 bucket to use
- Start transferring files
Highly reliable without a complex setup
When you use Signiant Flight to send files to Amazon S3, its backend automatically scales during high-volume transfer cycles.
Flight’s backend is load-balanced across multiple Amazon EC2 instances spread across multiple AWS Availability Zones, so it is highly reliable without passing the complexity of configuration management on to you.
Encryption and Checkpoint Restart
Importantly, Signiant’s file transfer protocol also supports two features that are not supported in Tsunami UDP:
- 256-bit AES encryption
- Intelligent file transfer retries
If a transfer is interrupted for any reason, the transfer is restarted (using numerous file retry algorithms) and continues transferring from the point of interruption. If a file already exists in Amazon S3 and hasn’t been changed, Flight won’t upload the file.
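The two behaviors described above, skipping unchanged files and resuming from the point of interruption, can be illustrated with a minimal Python sketch. This is not Signiant's algorithm, which the source does not describe. The digest comparison, the `remote_digests` mapping, and the chunk-based resume offset are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_to_upload(local_files, remote_digests):
    """Keep only files whose content differs from what is already stored.

    remote_digests is a hypothetical mapping of file name -> stored digest.
    """
    return [
        path
        for path in local_files
        if remote_digests.get(path.name) != file_digest(path)
    ]


def resume_offset(bytes_acknowledged: int, chunk_size: int) -> int:
    """Return the byte offset at which an interrupted transfer would resume.

    Only whole acknowledged chunks count; a partially sent chunk is resent.
    """
    return (bytes_acknowledged // chunk_size) * chunk_size
```

With 1,000,000-byte chunks, a transfer interrupted after 2,500,000 acknowledged bytes would resume at offset 2,000,000 instead of starting over.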
Why is Flight so fast?
Signiant’s patented accelerated file transfer protocol is often called UDP acceleration, but it actually implements both an advanced TCP on top of UDP and an advanced FTP. If that intrigues you, read more about it here.
Basically, this minimizes the impact of WAN latency on throughput, which results in considerably faster transfers, especially for large files transferred over long distances. Once files arrive on Signiant Flight’s AWS-based backend, servers managed by Signiant write the data directly into Amazon S3 over HTTPS with the multipart upload API.
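To give a sense of what a multipart upload involves, here is a small Python sketch that plans how an object could be split into parts. The limits used (at most 10,000 parts, at least 5 MiB per part) are Amazon S3's documented multipart limits. The 64 MiB preferred part size and the `plan_parts` helper are assumptions for illustration, not how Flight's backend is configured.

```python
MIN_PART_SIZE = 5 * 1024**2   # S3 minimum part size (all parts except the last)
MAX_PARTS = 10_000            # S3 maximum number of parts per upload


def plan_parts(object_size: int, preferred_part_size: int = 64 * 1024**2):
    """Return (part_size, part_count) for a multipart upload of object_size bytes.

    The part size grows beyond the preferred value only when the object
    would otherwise need more than MAX_PARTS parts.
    """
    smallest_allowed = (object_size + MAX_PARTS - 1) // MAX_PARTS  # ceiling division
    part_size = max(preferred_part_size, MIN_PART_SIZE, smallest_allowed)
    part_count = max(1, (object_size + part_size - 1) // part_size)
    return part_size, part_count


for label, size in [("1 GiB", 1024**3), ("5 TiB", 5 * 1024**4)]:
    part_size, part_count = plan_parts(size)
    print(f"{label}: {part_count} parts of up to {part_size} bytes")
```

A 1 GiB object fits in 16 parts of 64 MiB, while an object at the 5 TB size limit needs a larger part size to stay within 10,000 parts.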
Over long distances, Signiant technology minimizes the impact of latency while still capitalizing on increases in bandwidth. Standard TCP-based protocols gain little from increases in bandwidth on such links and are very inefficient over long distances, because latency, rather than link capacity, limits their throughput.
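The effect of latency on a single TCP stream can be estimated with a well-known bound: a sender can have at most one window of data in flight per round trip, so throughput cannot exceed window size divided by round-trip time. The Python sketch below applies that bound. The 64 KiB window (the classic limit without window scaling) and the round-trip times are illustrative assumptions, not measurements from the presentation.

```python
def max_tcp_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on single-stream throughput: one window per round trip."""
    bits_per_round_trip = window_bytes * 8
    round_trips_per_second = 1000 / rtt_ms
    return bits_per_round_trip * round_trips_per_second / 1_000_000


WINDOW_BYTES = 64 * 1024  # assumed window size for illustration

for label, rtt_ms in [("same region", 2), ("cross-country", 70), ("intercontinental", 200)]:
    bound = max_tcp_throughput_mbps(WINDOW_BYTES, rtt_ms)
    print(f"{label:>16} ({rtt_ms:>3} ms RTT): at most {bound:7.1f} Mbps")
```

Under these assumptions the bound drops from a few hundred Mbps at 2 ms to under 3 Mbps at 200 ms, no matter how much bandwidth the link offers. Larger windows and parallel streams raise the bound, but the dependence on round-trip time remains.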
Setting up Signiant Flight
- Sign up for Signiant Flight via the AWS Marketplace
- Create an IAM user with read/write permissions to the Amazon S3 bucket where you would like to upload your files
- Install the Flight client and add the IAM credentials of the user you just created, plus the Amazon S3 bucket where you would like to upload your files.
Note: Flight comes with a command-line interface and other client options. To learn more, check out Flight’s client options.
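For the IAM step, a read/write policy scoped to a single bucket might look like the one generated by the Python sketch below. The source does not list the exact permissions Flight requires, so the chosen actions (`s3:ListBucket`, `s3:GetObject`, `s3:PutObject`) are an assumption about what "read/write" means here, and the bucket name is a placeholder.

```python
import json


def s3_read_write_policy(bucket: str) -> dict:
    """Build an IAM policy document granting read/write access to one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Reading and writing apply to objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


# "example-flight-bucket" is a hypothetical bucket name.
print(json.dumps(s3_read_write_policy("example-flight-bucket"), indent=2))
```

The printed JSON can be attached to the IAM user as an inline policy. Check Signiant's documentation for the permissions Flight actually needs before relying on this set.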
Setting up the Command Line Interface (CLI)
- Configure the Flight CLI by adding your credentials, target Amazon S3 bucket, and key to the config.cfg file.
- To transfer a single file with the CLI, use the -d upload parameters. In this example, I used an m3.xlarge Amazon EC2 instance located in us-east-1 running the AWS base Amazon Linux AMI with no additional tuning. I transferred a 1 GB uncompressed file, generated using dd, to an Amazon S3 bucket located in US Standard. Importantly, this file was located on EC2 instance storage, so that Amazon Elastic Block Store (Amazon EBS) throughput wouldn’t become a bottleneck and skew the testing. The average transfer rate in this case was around 630 Mbps.
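To put the measured rate in perspective, the short Python sketch below converts a sustained rate in Mbps into transfer time. The 630 Mbps figure comes from the test above. Whether "1 GB" meant 10^9 or 2^30 bytes is not stated, so both are shown, and the 1 TB row is an extrapolation that assumes the same rate could be sustained.

```python
def transfer_seconds(size_bytes: int, rate_mbps: float) -> float:
    """Time to move size_bytes at a sustained rate given in megabits per second."""
    return (size_bytes * 8) / (rate_mbps * 1_000_000)


RATE_MBPS = 630  # average rate reported for the single-file test

for label, size in [("1 GB", 10**9), ("1 GiB", 2**30), ("1 TB", 10**12)]:
    print(f"{label}: {transfer_seconds(size, RATE_MBPS):,.1f} s")
```

At this rate the 1 GB test file takes roughly 13 to 14 seconds, and a full terabyte would take about three and a half hours.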
A more complex file transfer may involve a large number of files listed, one file per line, in a manifest:
- flight -d upload @manifest.txt -z -l
In this case, we use interactive mode (-i) to see file transfer statistics in real time and generate detailed transfer statistics (-z) at the end of the transfer.
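A manifest like the one used above can be generated with a few lines of Python. The one-file-per-line layout comes from the description above. The use of absolute paths, the recursive directory walk, and the `write_manifest` helper name are assumptions, so check the Flight CLI documentation for the path format it expects.

```python
from pathlib import Path


def write_manifest(source_dir: str, manifest_path: str = "manifest.txt") -> int:
    """Write every file under source_dir to a manifest, one path per line.

    Returns the number of files listed. The manifest itself is excluded
    in case it lives inside source_dir.
    """
    manifest = Path(manifest_path).resolve()
    files = sorted(
        path.resolve()
        for path in Path(source_dir).rglob("*")
        if path.is_file() and path.resolve() != manifest
    )
    manifest.write_text("".join(f"{path}\n" for path in files), encoding="utf-8")
    return len(files)
```

For example, `write_manifest("/data/aggregated")` (a hypothetical directory) would produce a `manifest.txt` in the current directory that can then be passed to the CLI as `@manifest.txt`.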
Conclusion
Signiant’s Flight is an easy way to move big data into the cloud at high speed. Because it’s a SaaS solution, its highly available, high-performance file transfer system is deployed and maintained for you.
Flight’s encryption in transit and intelligent file transfer retries mean that you can send files securely and reliably. It’s also easy to use and to get started with.
Just look up Signiant Flight on the AWS Marketplace. A free trial is available.