Part I — E2E Testing and Selenium

What is E2E Testing?

E2E (end-to-end) testing, as defined here, means testing your system from the perspective of its users. The key difference between E2E tests and other functional tests (namely unit and integration tests) is as follows:

  • In a unit or integration test, the testing framework plays the role of software CODE that is RUNNING the tested code. It calls functions in the tested code and waits for their return values. The framework is closely tied to the internals of your code, including the language it is written in.
  • In an E2E test, the testing framework plays the role of a USER who is INTERACTING with the application as a person would: clicking on the screen, typing text, and expecting the screen to change in response.

[Figure: Unit and integration vs. E2E testing]

How do we E2E test a system?

Here is some good news: if you are testing an application from the end user’s perspective, you do not really care about the system’s internals. For a browser-based app (or a mobile app, but we will get to that later), all you have to do is run a browser and interact with it: find a certain input field, type in a value, click a button, and wait until the system has reacted.
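The last step, waiting until the system has reacted, is where many E2E tests go wrong: a fixed sleep is either too short (flaky tests) or too long (slow tests). A common alternative is to poll a condition until it holds or a timeout expires, which is the idea behind Selenium’s explicit waits. Below is a minimal Python sketch of such a polling helper; the name `wait_until` and its default timeout and interval are illustrative assumptions, not a Selenium API.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], T], timeout: float = 10.0, interval: float = 0.5) -> T:
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass.

    Returns the first truthy value; raises TimeoutError if none arrives in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        # Back off briefly before asking again instead of sleeping for a fixed long time.
        time.sleep(interval)
```

In a real test, `condition` would typically look up the element you expect to appear and return it, or return a falsy value while it is not there yet.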

Was the front end developed in React, Angular or Vue? Who cares. Is the backend based on Node.js, Java or PHP? As long as it works, it is fine. All that is required is a magic system that can find that input field and button and knows how to interact with them.

Meet Selenium

This is where Selenium comes into the picture. Selenium (more precisely, Selenium WebDriver) is a tool for automating web application testing, in particular for verifying that web applications work as expected.

Selenium started in 2004. Its original version was called Selenium RC (Remote Control), or Selenium 1.0. It worked by injecting JavaScript into the browser and executing the injected functions.

Later, Selenium evolved into Selenium WebDriver, which drives each browser through the browser’s own automation APIs instead of injected JavaScript. Its architecture is described below.

How Does it Work?

  • The Selenium language bindings (in our favorite language) send a JSON request over HTTP to the Selenium Server.
  • The server receives the request and translates it into the browser-specific API.
  • The server sends the request to the browser.
  • The browser executes the request and sends the response back to the Selenium Server.
  • The Selenium Server returns the result as the response to the original HTTP request.
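To make the server’s translation step concrete, here is a toy Python model of it. It is an assumption-heavy sketch, not how Selenium Server or any real driver is implemented: `dispatch`, the routing table and the `browser` object (with hypothetical `navigate`, `find_elements` and `click` methods standing in for a browser-specific API) are all invented for illustration. It only shows the shape of the job: match the HTTP method and path, call the browser, and wrap the result in a JSON reply.

```python
import re
from typing import Any, Dict

# Hypothetical routing table: (HTTP method, path pattern, action name).
ROUTES = [
    ("POST", re.compile(r"^/session/[^/]+/url$"), "navigate"),
    ("POST", re.compile(r"^/session/[^/]+/elements$"), "find_elements"),
    ("POST", re.compile(r"^/session/[^/]+/element/([^/]+)/click$"), "click"),
]


def dispatch(browser: Any, method: str, path: str, body: Dict[str, Any]) -> Dict[str, Any]:
    """Translate one WebDriver-style HTTP request into a call on a hypothetical browser adapter."""
    for route_method, pattern, action in ROUTES:
        match = pattern.match(path)
        if method != route_method or match is None:
            continue
        if action == "navigate":
            browser.navigate(body["url"])
            return {"value": None}
        if action == "find_elements":
            # This toy ignores body["using"] and treats body["value"] as a CSS selector.
            ids = browser.find_elements(body["value"])
            # Legacy "ELEMENT" key, matching the log in this article.
            return {"value": [{"ELEMENT": element_id} for element_id in ids]}
        if action == "click":
            browser.click(match.group(1))
            return {"value": None}
    # Simplified error reply; a real server would also set an HTTP error status.
    return {"value": {"error": "unknown command"}}
```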

Let’s make this a bit more real:

COMMAND POST "/session"
DATA {"desiredCapabilities":{"javascriptEnabled":true,"locationContextEnabled":true...}
INFO SET SESSION ID 67379a3fab9310256090bd4fbe8839df
RESULT {"acceptInsecureCerts":false...}

Language bindings (LB): Create a new session for me with the following characteristics: JavaScript enabled, location context enabled, and more…

Selenium Server (SS): I hear you. Here is the session ID I created for you!

COMMAND POST "/session/67379a3fab9310256090bd4fbe8839df/url"
DATA {"url":"https://somewhere.example.com"}

LB: Great. Can you now go to the URL I am sending you?

SS: Yep! (No result worth mentioning, just a success status code.)

COMMAND POST "/session/67379a3fab9310256090bd4fbe8839df/elements"
DATA {"using":"css selector","value":"#welcome"}
RESULT [{"ELEMENT":"0.7584651621036613-3"}]

LB: Can you find me the elements on the page that match the CSS selector #welcome?

SS: Found one. Here is its ID; use it whenever you want to do anything with this element.

COMMAND POST "/session/67379a3fab9310256090bd4fbe8839df/element/0.7584651621036613-3/click"
DATA {}

LB: Click on it!

SS: ditto!

And so on and on…
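The conversation above can be replayed with a few lines of Python. This is a minimal sketch, not a real language binding: `MiniWebDriver` and `http_transport` are hypothetical names, and the request bodies follow the W3C shapes (`capabilities`/`alwaysMatch` for a new session, `value.sessionId` in the reply) rather than the older `desiredCapabilities` format shown in the log. The HTTP transport is passed in as a function, so the same client can talk to a real endpoint or to a fake one in a unit test.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, List, Optional

# A transport takes (method, path, body) and returns the decoded JSON reply.
Transport = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def http_transport(base_url: str) -> Transport:
    """Real transport: send JSON over HTTP to a WebDriver endpoint."""
    def send(method: str, path: str, body: Optional[Dict[str, Any]]) -> Dict[str, Any]:
        data = json.dumps(body).encode("utf-8") if body is not None else None
        request = urllib.request.Request(
            base_url + path, data=data, method=method,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request) as response:
            return json.loads(response.read().decode("utf-8"))
    return send


class MiniWebDriver:
    """Hypothetical minimal client mirroring the LB/SS conversation."""

    def __init__(self, send: Transport) -> None:
        self.send = send
        self.session_id: Optional[str] = None

    def new_session(self) -> str:
        reply = self.send("POST", "/session", {"capabilities": {"alwaysMatch": {}}})
        session_id = reply["value"]["sessionId"]
        self.session_id = session_id
        return session_id

    def get(self, url: str) -> None:
        self.send("POST", f"/session/{self.session_id}/url", {"url": url})

    def find_elements(self, css: str) -> List[Dict[str, str]]:
        reply = self.send("POST", f"/session/{self.session_id}/elements",
                          {"using": "css selector", "value": css})
        return reply["value"]

    def click(self, element_id: str) -> None:
        self.send("POST", f"/session/{self.session_id}/element/{element_id}/click", {})
```

Against a locally running ChromeDriver (see Selenium-less Selenium below), usage would look like `driver = MiniWebDriver(http_transport("http://localhost:9515"))` followed by `driver.new_session()`, `driver.get(...)` and so on. Error replies (HTTP 4xx/5xx) are deliberately not handled here.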

What are those APIs?

These APIs are now part of a specification governed by the W3C (World Wide Web Consortium). They are called the WebDriver APIs and are defined in the W3C WebDriver specification, which became an official W3C Recommendation in June 2018. It is also worth checking the implementation status of these APIs in the major browsers; it is a great source to consult when your tests do not work as expected.
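One detail that is easy to trip over when reading the specification next to older logs: the log above returns element references under the key `ELEMENT`, which comes from the older, pre-W3C JSON Wire Protocol, whereas the W3C WebDriver specification uses the fixed key `element-6066-11e4-a52e-4f4ce68ad66b`. A small Python helper (the name `element_id` is hypothetical) that accepts either form might look like this:

```python
from typing import Any, Dict

# Key defined by the W3C WebDriver specification for web element references.
W3C_ELEMENT_KEY = "element-6066-11e4-a52e-4f4ce68ad66b"
# Key used by the older JSON Wire Protocol, as in the log in this article.
LEGACY_ELEMENT_KEY = "ELEMENT"


def element_id(reference: Dict[str, Any]) -> str:
    """Extract the element ID from a W3C or legacy element reference."""
    for key in (W3C_ELEMENT_KEY, LEGACY_ELEMENT_KEY):
        if key in reference:
            return reference[key]
    raise ValueError(f"not an element reference: {reference!r}")
```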

But What Are Selenium Language Bindings, Anyway?

Because the APIs are a standard HTTP and JSON protocol, you can drive them from any language that can send HTTP requests. Want to write your tests in Java? Go ahead. C++ or C#? Sure. The language binding runs as part of your tests and translates each command into the corresponding JSON request. Selenium was originally developed in Java, but language bindings were later added for other languages. You can find the official language bindings in the official Selenium repository.
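Much of what a binding does is mechanical: map each command in your test to an HTTP method, a URL path and a JSON body. The following Python sketch shows that translation as a lookup table. The command names and the `translate` function are hypothetical (real bindings have their own internal names), but the paths themselves are WebDriver endpoints, with `{session_id}` and `{element_id}` as placeholders.

```python
import json
from typing import Any, Dict, Tuple

# Hypothetical command table: command name -> (HTTP method, WebDriver path template).
COMMANDS: Dict[str, Tuple[str, str]] = {
    "newSession": ("POST", "/session"),
    "get": ("POST", "/session/{session_id}/url"),
    "findElements": ("POST", "/session/{session_id}/elements"),
    "click": ("POST", "/session/{session_id}/element/{element_id}/click"),
    "getTitle": ("GET", "/session/{session_id}/title"),
    "deleteSession": ("DELETE", "/session/{session_id}"),
}


def translate(command: str, path_params: Dict[str, str], body: Dict[str, Any]) -> Tuple[str, str, str]:
    """Turn one test command into (method, path, JSON body) ready to send over HTTP."""
    method, template = COMMANDS[command]
    path = template.format(**path_params)
    return method, path, json.dumps(body)
```

For example, `translate("get", {"session_id": "s1"}, {"url": "https://example.com"})` yields the same kind of request as the second exchange in the log above.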

Selenium-less Selenium

So, to test in a browser, we need language bindings that call the APIs, and a Selenium Server that receives the WebDriver API requests and dispatches them to the right browser.

This is true, BUT there is also a shortcut: the major browsers (read: Firefox, Chrome and, more recently, Safari) provide their own driver executables (geckodriver, ChromeDriver and safaridriver) that accept the WebDriver HTTP API calls directly, without installing a Selenium Server. This means that to test Chrome you can simply install ChromeDriver locally and send the requests to its default port, 9515. ChromeDriver has Chrome execute the commands and responds without going through the Selenium Server.
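Here is a minimal Python sketch of the first request in that direct setup: a W3C New Session request addressed straight to ChromeDriver’s default port, with no Selenium Server in between. The function name `new_session_request` and the choice to run Chrome headless are assumptions for illustration; `goog:chromeOptions` is ChromeDriver’s vendor-specific capability for passing Chrome options. The function only builds the request, so it can be inspected without a driver running.

```python
import json
import urllib.request

CHROMEDRIVER_URL = "http://localhost:9515"  # ChromeDriver's default port


def new_session_request(base_url: str = CHROMEDRIVER_URL, headless: bool = True) -> urllib.request.Request:
    """Build (but do not send) a W3C New Session request aimed straight at ChromeDriver."""
    args = ["--headless"] if headless else []
    payload = {
        "capabilities": {
            "alwaysMatch": {
                "browserName": "chrome",
                # Vendor-specific capability understood by ChromeDriver.
                "goog:chromeOptions": {"args": args},
            }
        }
    }
    return urllib.request.Request(
        base_url + "/session",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With ChromeDriver running locally, `urllib.request.urlopen(new_session_request())` would send it; the JSON reply carries the new session ID under `value.sessionId`.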

So if you use an unofficial binding such as WebdriverIO together with a direct connection to, say, Chrome, you are running “Selenium”-like tests (in fact, WebDriver tests) without using or installing Selenium itself.

Selenium Grid

One more term you may encounter is Selenium Grid. It is needed when you run tests across multiple browsers or multiple operating systems, typically spread over several machines. Selenium Grid keeps track of all your sessions and routes each session’s requests to the relevant browser. If you are testing against a single browser, you do not need it; see Selenium-less Selenium above.
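The core of that bookkeeping can be pictured as a session-to-node lookup. The following Python class is a toy model under that assumption, not Selenium Grid’s actual implementation: `ToyGrid`, the node URLs and the port are hypothetical, and it skips the interesting part of a real grid, choosing a node whose browser and OS match the requested capabilities when a new session is created.

```python
import re
from typing import Dict

# Captures the session ID from paths like "/session/<id>/url".
SESSION_PATH = re.compile(r"^/session/([^/]+)")


class ToyGrid:
    """Toy model of a grid hub: remembers which node owns each session."""

    def __init__(self) -> None:
        self.nodes_by_session: Dict[str, str] = {}

    def register(self, session_id: str, node_url: str) -> None:
        """Record that `session_id` lives on the node at `node_url`."""
        self.nodes_by_session[session_id] = node_url

    def route(self, path: str) -> str:
        """Return the full URL on the owning node for this request path."""
        match = SESSION_PATH.match(path)
        if match is None:
            raise ValueError(f"no session id in path: {path}")
        # Raises KeyError for sessions the grid has never seen.
        node_url = self.nodes_by_session[match.group(1)]
        return node_url + path
```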

Need help finding your way around the E2E testing world? Read the whole series.