15

How To Download Secure Files

The Problem

In a previous tip we covered how to write tests that download files in a browser-agnostic way by using Selenium WebDriver and an HTTP library in tandem.

This approach works well, but the file you want to download is often behind authentication, which presents another hurdle to overcome.

A Solution

In order to access secure files with an HTTP library, we want to pull the authenticated session information out of Selenium's cookie store and pass it into the HTTP library when we perform the request.
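As a rough sketch of that hand-off, the snippet below converts cookies into a plain Hash that an HTTP library could send along with a request. It assumes the cookies arrive as hashes with :name and :value keys (the shape the selenium-webdriver Ruby bindings return from the cookie store). The helper name cookies_for_http and the sample values are hypothetical, made up for illustration.

```ruby
# Hypothetical helper: turn Selenium-style cookie hashes into a
# { name => value } Hash that an HTTP library can send as cookies.
def cookies_for_http(selenium_cookies)
  selenium_cookies.each_with_object({}) do |cookie, jar|
    jar[cookie[:name]] = cookie[:value]
  end
end

# Stand-in for what @driver.manage.all_cookies would return.
# These values are invented; a real session cookie comes from the browser.
sample = [
  { name: 'rack.session', value: 'abc123', path: '/' },
  { name: 'theme', value: 'dark', path: '/' }
]

jar = cookies_for_http(sample)
```

In a real test the sample Array would be replaced by the cookies pulled from the running browser session.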

Let's dig in with an example.

An Example

We start by requiring our libraries (selenium-webdriver to drive the browser, rspec-expectations for our assertions, and rest-client for our HTTP requests) and wire up some simple setup, teardown, and run methods.

# filename: secure_download.rb

require 'selenium-webdriver'
require 'rspec/expectations'
require 'rest-client'

include RSpec::Matchers

def setup
  @driver = Selenium::WebDriver.for :firefox
end

def teardown
  @driver.quit
end

def run
  setup
  yield
ensure
  # Quit the browser even when an assertion fails
  teardown if @driver
end

Next we'll wire up our test actions.

We first access a page of download links that is behind Basic HTTP Authentication on the-internet.

Once the page loads, we grab the authentication session cookie and the URL for the first file listed. Once we have that we fire up RestClient and perform a HEAD request using both the download link and the session cookie. We then check the response headers to make sure the file is the correct type and that it is not empty.

run do
  @driver.get 'http://admin:admin@the-internet.herokuapp.com/download_secure'
  cookie = @driver.manage.cookie_named 'rack.session'
  link = @driver.find_element(css: '.example a').attribute('href')
  # rest-client reads cookies from the :cookies key
  response = RestClient.head link, cookies: { cookie[:name] => cookie[:value] }
  expect(response.headers[:content_type]).to eql('image/jpeg')
  expect(response.headers[:content_length].to_i).to be > 0
end

It's worth noting that we are using a HEAD request instead of a GET request. Since we only care about the header information, this asks the server for the response headers alone, with no body, so the file itself is never downloaded.

If we run this it will pass. But it only checks the first download link, and it's unfortunately quite brittle (since the test assumes that the file will always be an image). Let's update our example to remedy this.
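To make the difference concrete, here is a small sketch that builds (but never sends) a HEAD and a GET request. It uses Ruby's standard Net::HTTP library rather than rest-client, purely so that it runs without extra gems. The URL and cookie value are placeholders.

```ruby
require 'net/http'
require 'uri'

# Placeholder URL; no request is actually sent in this sketch.
uri = URI('http://example.com/download_secure/some-file.jpg')

head = Net::HTTP::Head.new(uri)
get  = Net::HTTP::Get.new(uri)

# A made-up session cookie, attached the same way for either request type.
head['Cookie'] = 'rack.session=abc123'

# Net::HTTP expects no body in the response to a HEAD request,
# but does expect one in the response to a GET request.
head_has_body = head.response_body_permitted?
get_has_body  = get.response_body_permitted?
```

Because the response to a HEAD request has no body, the cost of the check stays roughly the same no matter how large the file is.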

An Improved Example

First, let's create a helper method to tease out the filename from a given URL and return the correct content type. If we don't recognize the file type, then we'll stop the test and raise an exception.

def content_type(file)
  file = File.basename(file)
  if file.include? '.jpg'
    'image/jpeg'
  elsif file.include? '.pdf'
    'application/pdf'
  else
    raise 'Unknown file type'
  end
end

Now we can update our test to use this method in addition to grabbing all download links from the page and iterating through them.

run do
  @driver.get 'http://admin:admin@the-internet.herokuapp.com/download_secure'
  cookie = @driver.manage.cookie_named 'rack.session'
  links = @driver.find_elements(css: '.example a')
  links.map! { |link| link.attribute('href') }
  links.each do |link|
    # rest-client reads cookies from the :cookies key
    response = RestClient.head link, cookies: { cookie[:name] => cookie[:value] }
    expect(response.headers[:content_type]).to eql(content_type(link))
    expect(response.headers[:content_length].to_i).to be > 0
  end
end

By using find_elements we get all of the download links returned in an Array. We then use map! to update the collection to give us just the URLs (instead of a collection of Selenium objects which contain URLs).

After that, we're able to iterate over the Array of URLs, perform a HEAD request, and perform our assertions just like before (but this time, using our new content_type helper method).

Expected Behavior

If we save this file and run it (e.g., ruby secure_download.rb from the command-line), here is what will happen.

  • Browser opens
  • The secure file downloads page loads
  • The authenticated cookie information is retrieved
  • All download links are retrieved
  • An HTTP library performs a HEAD request against the download link using the retrieved cookie information
  • The response headers are checked to make sure the file is the correct type and not empty
  • The previous two steps are repeated until all download links are verified

Outro

From here, it's simple enough to add more file types and their matching content types. And while this example demonstrates accessing files behind Basic HTTP Authentication, it should also work with files behind form-based authentication.

Hopefully this saves you some time and helps you build a leaner, faster set of download tests.
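One possible way to make room for more file types is to swap the if/elsif chain for a lookup table. This is only a sketch: the helper name content_type_for, the CONTENT_TYPES constant, and the '.png' entry are assumptions added for illustration, and the values must match whatever your server actually sends in its Content-Type header.

```ruby
require 'uri'

# Hypothetical mapping; extend it with the file types your app serves.
CONTENT_TYPES = {
  '.jpg' => 'image/jpeg',
  '.pdf' => 'application/pdf',
  '.png' => 'image/png'
}.freeze

# Look up the expected content type from the URL's file extension.
# Any query string is ignored and the match is case-insensitive.
def content_type_for(url)
  extension = File.extname(URI(url).path).downcase
  CONTENT_TYPES.fetch(extension) { raise "Unknown file type: #{extension}" }
end
```

With this in place, supporting a new file type is a one-line addition to the Hash, and unknown types still stop the test with an exception.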

Happy Testing!
