Thursday, 12 November 2015

Want to be a data scientist? Start learning Linux in the cloud!


Cloud computing offers great opportunities for Linux newcomers. Why would you want to learn Linux? Because if you want to do large-scale computations, data mining or big data analytics, there is virtually no other reasonable option.

What makes cloud technology great is the minimal effort required to set up and start using a fully functional Linux machine. In this tutorial I will show how to set up a Linux server in the Amazon Web Services cloud and how to connect to it. I will present how to open a terminal connection (command line) and how to browse the files on your Linux server.

Prerequisites

The following prerequisites are required for this tutorial:
  1. Open an Amazon Web Services (AWS) account. Go to https://aws.amazon.com/ to set up an account. Detailed instructions can be found in my previous post.
  2. Go to http://www.putty.org/ and download PuTTY. You will need two files: putty.exe and puttygen.exe. No installation is required.
  3. Go to http://winscp.net/ and download WinSCP. Choose either “Installation package” or “Portable executables”, depending on your preference.

Setting up a Linux machine in the cloud


Please follow these steps to start an EC2 instance in the cloud:
  1. Log in to your Amazon Web Services (AWS) account at http://console.aws.amazon.com/
  2. From the available services, select EC2 (Virtual Servers in the Cloud).
  3. Press the “Launch Instance” button.
  4. Select “Ubuntu Server 14.04 LTS (HVM), SSD Volume (64-bit)” and press “Select”.
  5. In the “Choose an Instance Type” screen, select the hardware to run your Linux machine on. For computational purposes you usually want c4.* instances, but this depends on your specific use case. For demos and tutorials use t2.micro; if you do not know what to choose, select t2.micro.
  6. Click “Review and Launch”
  7. Click “Launch” in the bottom right corner.
  8. Create a new key pair that will be used to connect to your instance (or use an existing key pair if you already have one). If you choose to create a key pair, download the *.pem key file and store it in a secure location. Once the key pair is set, click “Launch Instances”.
  9. You will see the message “Your instances are now launching”. Click the instance ID starting with “i-”.
  10. After clicking the link you will be taken to the instance list.
  11. Copy the address from the “Public DNS” field (in this example ec2-52-23-164-62.compute-1.amazonaws.com).
  12. Now you are ready to connect to your new Linux server! Go to the next section of this tutorial for instructions on how to connect.
  13. The setup is complete! Remember to shut down the instance after finishing your computations. To do so, right-click it and select either “Stop” (to continue using it later) or “Terminate” (deletes the instance; you lose the main partition).
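
If you later want to script the console steps above, they map onto calls of the EC2 API. The following is a minimal Python sketch, assuming the third-party boto3 library is installed and your AWS credentials are configured. The function names are hypothetical, and the AMI ID and key name you pass in are placeholders: Ubuntu AMI IDs differ between regions, so you would have to look up the one matching step 4.

```python
"""Sketch: launching an EC2 instance from Python instead of the web console.

Assumes a boto3 EC2 client is passed in, which keeps these helpers testable.
"""


def launch_params(ami_id, key_name, instance_type="t2.micro"):
    """Arguments for run_instances, mirroring steps 4, 5 and 8 above."""
    return {
        "ImageId": ami_id,  # the Ubuntu image chosen in step 4 (placeholder ID)
        "InstanceType": instance_type,  # t2.micro, as suggested for this tutorial
        "KeyName": key_name,  # an existing key pair created in step 8
        "MinCount": 1,
        "MaxCount": 1,
    }


def launch_instance(ec2, ami_id, key_name, instance_type="t2.micro"):
    """Start one instance and return its ID (the name starting with "i-")."""
    response = ec2.run_instances(**launch_params(ami_id, key_name, instance_type))
    return response["Instances"][0]["InstanceId"]


def public_dns(ec2, instance_id):
    """Return the instance's Public DNS address, as shown in the instance list."""
    response = ec2.describe_instances(InstanceIds=[instance_id])
    return response["Reservations"][0]["Instances"][0]["PublicDnsName"]
```

With boto3 installed you would create the client with `ec2 = boto3.client("ec2", region_name="us-east-1")`, call `launch_instance(ec2, "ami-xxxxxxxx", "my-key")`, and then wait with `ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])` before calling `public_dns`, because the address may still be empty while the instance is starting.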

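Step 13 can be scripted in the same way. A minimal sketch, again assuming a boto3 EC2 client is passed in (the `shut_down` helper is hypothetical): stopping keeps the instance's disk so you can start it again later, while terminating deletes the instance.

```python
def shut_down(ec2, instance_id, terminate=False):
    """Stop (keep for later) or terminate (delete) one EC2 instance.

    ec2 is assumed to be a boto3 EC2 client. Returns the new state name
    reported by AWS, e.g. "stopping" or "shutting-down".
    """
    if terminate:
        # Deletes the instance; by default its root volume is deleted with it
        response = ec2.terminate_instances(InstanceIds=[instance_id])
        changes = response["TerminatingInstances"]
    else:
        # Stops the instance; its disk is kept so it can be started again
        response = ec2.stop_instances(InstanceIds=[instance_id])
        changes = response["StoppingInstances"]
    return changes[0]["CurrentState"]["Name"]
```

Making `terminate` default to `False` mirrors the advice above: stopping is the safer choice when you are not sure you are done with the machine.
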
Connecting to an EC2 Linux virtual machine


Converting your key to PuTTY/WinSCP format

PuTTY and WinSCP use the *.ppk key format rather than the *.pem file downloaded from the AWS website in the previous section. Hence, you need to convert the key file.
  1. Run puttygen.exe.
  2. Press “Load”.
  3. In the “Load private key” dialog, select “All file types” and select the *.pem file.
  4. Now press “Save private key” and save the private key as a *.ppk file.
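
PuTTY is a Windows tool. If you ever connect from Linux or macOS instead, the OpenSSH client reads the *.pem file directly, provided that only you can read it, and the command-line puttygen from the putty-tools package (Debian/Ubuntu) can perform the same conversion as the steps above. The helpers below are a hypothetical Python sketch of both; they only build or check things locally and do not contact AWS.

```python
import os
import stat


def looks_like_pem_key(pem_path):
    """Rough check that a file starts like a PEM private key (e.g. the AWS *.pem)."""
    with open(pem_path, encoding="ascii", errors="replace") as f:
        first_line = f.readline().strip()
    return first_line.startswith("-----BEGIN") and first_line.endswith("PRIVATE KEY-----")


def restrict_key_permissions(pem_path):
    """Equivalent of `chmod 400`: make the key readable by its owner only.

    OpenSSH ignores private keys that other users can read.
    Returns the resulting permission bits.
    """
    os.chmod(pem_path, stat.S_IRUSR)
    return stat.S_IMODE(os.stat(pem_path).st_mode)


def puttygen_convert_command(pem_path, ppk_path):
    """Command line for the Unix puttygen that converts *.pem to PuTTY's *.ppk."""
    return ["puttygen", pem_path, "-O", "private", "-o", ppk_path]
```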

Terminal connection


Once you have created the *.ppk private key, you can use it to connect with a PuTTY terminal.
  1. Run putty.exe.
  2. In the Host Name field, enter the user name ubuntu followed by @ and the address from the “Public DNS” field (see the previous section). In this example we typed ubuntu@ec2-52-23-164-62.compute-1.amazonaws.com. Use the address of your own server.
  3. Now go to Connection -> SSH -> Auth and set “Private key file for authentication” to your *.ppk file.
  4. Press “Open” to open the connection.
  5. During the first connection you will be informed that “the server host key is not cached in the registry”. Click “Yes” to add the server's host key to the registry.


    Security note: in production environments you should always check the server’s fingerprint BEFORE connecting for the first time. To do so, right-click the instance in the AWS Management Console and select Instance Settings -> Get System Log; the host key fingerprints are printed in that log.
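
On Ubuntu images the fingerprints are, as far as I know, written to the system log by cloud-init between `-----BEGIN SSH HOST KEY FINGERPRINTS-----` and `-----END SSH HOST KEY FINGERPRINTS-----` markers, usually with an `ec2:` prefix on each line; treat this exact format as an assumption and check it against your own log. The following minimal Python sketch (hypothetical helper names) pulls those lines out of a copied log and checks whether the fingerprint PuTTY shows you, for example its colon-separated hex part, appears among them.

```python
import re

# Assumed log format: cloud-init prints the host key fingerprints between
# these markers, usually with an "ec2:" prefix on every line.
_FINGERPRINT_BLOCK = re.compile(
    r"-----BEGIN SSH HOST KEY FINGERPRINTS-----(.*?)-----END SSH HOST KEY FINGERPRINTS-----",
    re.DOTALL,
)


def host_key_fingerprints(system_log):
    """Return the fingerprint lines from an EC2 system log ([] if none found)."""
    match = _FINGERPRINT_BLOCK.search(system_log)
    if match is None:
        return []
    # Drop surrounding whitespace and the assumed "ec2:" prefix from each line
    lines = (re.sub(r"^ec2:\s*", "", line.strip()) for line in match.group(1).splitlines())
    return [line for line in lines if line]


def fingerprint_listed(system_log, fingerprint):
    """True if the fingerprint shown by PuTTY appears in the log's fingerprint block."""
    if not fingerprint:
        return False
    return any(fingerprint in line for line in host_key_fingerprints(system_log))
```

If `fingerprint_listed` returns False for the value PuTTY displays, do not click “Yes”; the mismatch is exactly what the security note warns about.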

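For comparison, the terminal connection described in this section can be opened from a Linux or macOS terminal with a single OpenSSH command, where `-i` plays the role of PuTTY's private key setting and the *.pem file is used directly. A hypothetical Python helper that builds and runs that command:

```python
import subprocess


def ssh_command(pem_path, host, user="ubuntu"):
    """Build the OpenSSH equivalent of the PuTTY session configured above."""
    # -i selects the private key, like "Private key file for authentication"
    return ["ssh", "-i", pem_path, f"{user}@{host}"]


def open_terminal(pem_path, host, user="ubuntu"):
    """Open an interactive session on the server and return ssh's exit code."""
    return subprocess.call(ssh_command(pem_path, host, user))
```

On the first connection OpenSSH asks a similar yes/no question about the unknown host key, and the same fingerprint check applies before you answer it.
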
Explore files in your cloud Linux server


Use WinSCP (https://winscp.net/eng/download.php) to explore the files on your virtual server.
  1. Start WinSCP and click Tools -> Run Pageant.
  2. A small Pageant icon will appear in your taskbar, near the clock. Right-click it and select “Add Key”.
  3. Select the *.ppk file that you created earlier with puttygen.exe (e.g. somekey.ppk).
  4. Now type the host name (e.g. ec2-52-23-164-62.compute-1.amazonaws.com) and the user name (ubuntu) in WinSCP and click “Login” to connect. Leave the password field blank; Pageant will handle the authentication for you.
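
WinSCP is a Windows program. On Linux or macOS, one-off transfers can be done with OpenSSH's scp command, again using the *.pem key directly. A hypothetical Python sketch that builds such commands (an empty remote path means the remote user's home directory):

```python
import subprocess


def scp_download_command(pem_path, host, remote_path, local_path=".", user="ubuntu"):
    """scp command that copies a file from the server to a local directory."""
    return ["scp", "-i", pem_path, f"{user}@{host}:{remote_path}", local_path]


def scp_upload_command(pem_path, host, local_path, remote_path="", user="ubuntu"):
    """scp command that copies a local file to the server.

    An empty remote_path copies into the remote user's home directory.
    """
    return ["scp", "-i", pem_path, local_path, f"{user}@{host}:{remote_path}"]


def run(command):
    """Run a built command, raising CalledProcessError if scp fails."""
    subprocess.run(command, check=True)
```

For example, `run(scp_upload_command("somekey.pem", "ec2-52-23-164-62.compute-1.amazonaws.com", "data.csv"))` would copy data.csv into /home/ubuntu on the example server.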

Saturday, 10 October 2015

Setting-up an account with Amazon Web Services for your cloud computing needs



On this blog I will publish information and tutorials for beginners on how to start doing large-scale computation in the cloud. I will focus mainly on Amazon Web Services, since it is the largest cloud computing provider.
The target audience is MS Windows users with no experience of Linux systems who need to handle large-scale computation or data processing in the cloud and do not have time to spend several hours finding out how to do things right.

So let's start the blog with some notes about setting up an Amazon Web Services account.

In order to start your journey with cloud computing, you need to register with Amazon Web Services (AWS) at https://aws.amazon.com/.

Please note the following:

  • A credit card is required for registration; however, you will not be billed until you use paid resources.
  • If you do not have a credit card, buy a prepaid one! For example, in my country (Poland, BZ-WBK bank) a virtual credit card can be bought for as little as $1.5 and is emailed almost immediately after payment. You do not need any funds on the card until you actually decide to use non-free AWS services. If you have similar options in your country and have checked that they work with AWS, please let me know and I will update this post.
  • If you are affiliated with an academic institution, remember to apply for free computing credits at https://aws.amazon.com/education/awseducate/apply/. Students can get either USD 35 or USD 100 of credit, depending on whether their university participates in the AWS for education program.
  • Amazon makes a great effort to ensure that users are aware when they take an action that results in billing. Hence, there is almost no chance that you will spend money accidentally.