Thursday, 12 November 2015

Want to be a data scientist? Start learning Linux in the cloud!

Cloud computing offers newcomers a great way to learn Linux. Why would you want to learn Linux? Because if you want to run large-scale computations, data mining or big data analytics, there is virtually no other reasonable option.

What makes cloud technology great is the minimal effort required to set up and start using a fully functional Linux machine. In this tutorial I will show how to set up a Linux server in the Amazon Web Services cloud and how to connect to it. I will present how to set up a terminal connection (command line) and how to browse files on your Linux server when connected to your cloud instance.


The following prerequisites are required for this tutorial:
  1. Open an Amazon Web Services (AWS) account on the AWS website. Detailed instructions can be found in my previous post.
  2. Download PuTTY from its download page. You will need two files: putty.exe and puttygen.exe. Please note that no installation is required.
  3. Download WinSCP from its download page. Choose either “Installation package” or “Portable executables” depending on your preference.

Setting up a Linux machine in the cloud

Please follow these steps to start an EC2 instance in the cloud:
  1. Log in to your Amazon Web Services (AWS) account.
  2. From the available services select “EC2 - Virtual Servers in the cloud”.
  3. Press the “Launch Instance” button.
  4. Find “Ubuntu Server 14.04 LTS (HVM), SSD Volume (64-bit)” in the list and press the button next to it to select it.
  5. In the “Choose an Instance Type” screen select the hardware to run your Linux on. For computational purposes you usually want c4.* instances, but this depends on the specific use case. For demo and tutorial purposes, or if you do not know what to choose, select t2.micro.
  6. Click “Review and Launch”.
  7. Click “Launch” in the bottom right corner.
  8. Create a new key pair that will be used to connect to your instance (or use an existing key pair if you already have one). If you choose to create a key pair, download the *.pem key file and store it in a secure location. Once the key pair is set, click “Launch instances”.

  9. You will see the message “Your instances are now launching”. Click the instance ID starting with “i-”.

  10. After clicking the link you will be taken to the “Instance list” screen.

  11. Copy the “Public DNS” address shown in the console. You will need it to connect to the server.
  12. Now you are ready to connect to your new Linux server! Please go to the next section of this tutorial to see the instructions on how to connect.
  13. The setup is complete! Remember to shut down the instance after finishing your computations. To shut down an instance, right-click it and select either “stop” (to continue using it later) or “terminate” (this deletes the instance and you lose the main partition).
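
The launch steps above can also be scripted. The following is only a minimal sketch, assuming the third-party boto3 library is installed and AWS credentials and a default region are already configured. The AMI ID and key pair name are hypothetical placeholders (the AMI ID for Ubuntu Server 14.04 differs per region), and the helper names are made up for this illustration.

```python
def build_launch_request(ami_id, key_name, instance_type="t2.micro"):
    """Return the keyword arguments for the EC2 RunInstances call.

    ami_id and key_name are placeholders you must supply yourself;
    t2.micro mirrors the instance type suggested in step 5.
    """
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "KeyName": key_name,
        "MinCount": 1,  # launch exactly one instance
        "MaxCount": 1,
    }


def launch_instance(ami_id, key_name, instance_type="t2.micro"):
    """Launch one instance and return its ID (the string starting with "i-")."""
    import boto3  # third-party; imported here so the helper above works without it

    ec2 = boto3.client("ec2")
    response = ec2.run_instances(
        **build_launch_request(ami_id, key_name, instance_type)
    )
    return response["Instances"][0]["InstanceId"]


def stop_instance(instance_id):
    """Stop (not terminate) an instance so it can be started again later."""
    import boto3

    boto3.client("ec2").stop_instances(InstanceIds=[instance_id])


if __name__ == "__main__":
    # Hypothetical values: replace with your own AMI ID and key pair name.
    print(build_launch_request("ami-00000000", "somekey"))
```

Calling `launch_instance` starts a real, billable instance, so remember to call `stop_instance` (or terminate the instance in the console) when you are done.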


Connecting to an EC2 Linux virtual machine

 Converting your key to PuTTY/WinSCP format

 PuTTY and WinSCP use the *.ppk file format rather than the *.pem format that was downloaded from the AWS web site in the previous section. Hence, you need to convert the key file.
  1. Run puttygen.exe
  2. Press “Load”.
  3. In the “Load private key” dialog select “All file types” and select the *.pem file.
  4. Now press “Save private key” and save the private key as a *.ppk file.
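
If puttygen.exe refuses to load the file, it may help to confirm that the download really is a PEM private key. The snippet below is a rough sanity check, not a full parser: it only inspects the first and last lines of the text. The function name and the sample strings are made up for this illustration.

```python
import re

# Matches headers such as "-----BEGIN RSA PRIVATE KEY-----"
# or "-----BEGIN PRIVATE KEY-----".
PEM_HEADER = re.compile(r"^-----BEGIN (?:[A-Z0-9]+ )*PRIVATE KEY-----$")


def looks_like_pem_private_key(text):
    """Return True if text starts and ends with matching PEM private key markers."""
    lines = [line.strip() for line in text.strip().splitlines()]
    if len(lines) < 3:
        return False  # need a header, at least one body line, and a footer
    if not PEM_HEADER.match(lines[0]):
        return False
    expected_footer = lines[0].replace("BEGIN", "END", 1)
    return lines[-1] == expected_footer


if __name__ == "__main__":
    # Made-up, non-functional key body used only to show the expected layout.
    fake_pem = (
        "-----BEGIN RSA PRIVATE KEY-----\n"
        "AAAA\n"
        "-----END RSA PRIVATE KEY-----\n"
    )
    print(looks_like_pem_private_key(fake_pem))
    print(looks_like_pem_private_key("PuTTY-User-Key-File-2: ssh-rsa\n"))
```

To check a real file, read it with `open(path).read()` and pass the text to the function. Never paste a real private key into a script or share it.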

Terminal connection

 Once you have created the *.ppk private key, you can use it to connect with a PuTTY terminal.
  1. Run putty.exe
  2. Copy the computer address from the “Public DNS” field in the previous section into the “Host Name” field. Use the address of your own server.

  3. Now go to Connection -> SSH -> Auth and set the “Private key file for authentication”. Use the *.ppk file.

  4. Press “Open” to open the connection.
  5. During the first connection you will be informed that “the server host key is not cached in the registry”. Click “Yes” to add the instance to the registry.

    Security note: in production environments you should always check the server’s fingerprint BEFORE initiating the connection for the first time. To do that, right-click the server in the AWS management console and select Instance settings -> Get system log. The log lists the SSH host key fingerprints, which you can compare with the one PuTTY displays.
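
The system log is long, so here is a small sketch of how the fingerprints could be pulled out of it once you have copied the log text. It assumes the log contains the BEGIN/END SSH HOST KEY FINGERPRINTS markers that Ubuntu cloud images print at first boot. The sample log and the fingerprints in it are made up for this illustration.

```python
BEGIN_MARKER = "-----BEGIN SSH HOST KEY FINGERPRINTS-----"
END_MARKER = "-----END SSH HOST KEY FINGERPRINTS-----"


def extract_fingerprints(log_text):
    """Return the fingerprint lines found between the two markers."""
    fingerprints = []
    inside = False
    for line in log_text.splitlines():
        if BEGIN_MARKER in line:
            inside = True
            continue
        if END_MARKER in line:
            break
        if inside:
            # Drop the "ec2:" prefix that precedes each entry.
            entry = line.split("ec2:", 1)[-1].strip()
            if entry:
                fingerprints.append(entry)
    return fingerprints


# Made-up sample; a real log has many more lines and different values.
SAMPLE_LOG = """\
[   10.000000] made-up boot message
ec2: -----BEGIN SSH HOST KEY FINGERPRINTS-----
ec2: 2048 00:11:22:33:44:55:66:77:88:99:aa:bb:cc:dd:ee:ff root@ip-10-0-0-1 (RSA)
ec2: 256 ff:ee:dd:cc:bb:aa:99:88:77:66:55:44:33:22:11:00 root@ip-10-0-0-1 (ECDSA)
ec2: -----END SSH HOST KEY FINGERPRINTS-----
"""

if __name__ == "__main__":
    for fingerprint in extract_fingerprints(SAMPLE_LOG):
        print(fingerprint)
```

Compare the printed values with the fingerprint shown in the PuTTY warning dialog before clicking “Yes”.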

Explore files in your cloud Linux server

Use WinSCP to explore files on your virtual server.
  1. Start WinSCP and click Tools -> Run Pageant

  2. A small Pageant icon will appear in your taskbar somewhere near your clock. Right-click it and select “Add keys”.
  3. Select the *.ppk file that you created earlier with puttygen.exe (e.g. somekey.ppk).
  4. Now in WinSCP type the host name (the “Public DNS” address of your server) and the user name (ubuntu), then click “Login” to connect. Leave the password field blank; Pageant will handle the authentication for you.
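
Both tools can also be started from the command line instead of filling in the dialogs each time. The sketch below only builds the argument lists, assuming putty.exe and winscp.exe are on your PATH and Pageant already holds the key for WinSCP. The host name is a hypothetical placeholder and the function names are made up for this illustration.

```python
def putty_command(host, ppk_path, user="ubuntu"):
    """Arguments for opening an SSH terminal with PuTTY using a *.ppk key."""
    return ["putty.exe", "-ssh", "-i", ppk_path, f"{user}@{host}"]


def winscp_command(host, user="ubuntu"):
    """Arguments for opening an SFTP session in WinSCP via a session URL."""
    return ["winscp.exe", f"sftp://{user}@{host}/"]


if __name__ == "__main__":
    # Hypothetical address: use the "Public DNS" value of your own instance.
    host = "ec2-198-51-100-1.compute-1.amazonaws.com"
    print(" ".join(putty_command(host, "somekey.ppk")))
    print(" ".join(winscp_command(host)))
```

To actually start a program, pass the list to `subprocess.Popen`, for example `subprocess.Popen(putty_command(host, "somekey.ppk"))`.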