Friday, 18 September 2015

Developer’s Guide to Install Elasticsearch, Logstash and Kibana


In this post I will talk about how to perform log analytics using Elasticsearch, Logstash and Kibana. To start with, we will see how to install these tools on Windows.

Prerequisites:
  1. elasticsearch-1.4.4
  2. kibana-4.0.1-windows
  3. logstash-1.5.0.rc2


Install Elasticsearch on Windows
Elasticsearch is a search engine platform which stores documents in an indexed format and provides APIs for full-text search over them. Because it is open source, scalable and easy to use, it has recently become very popular in the developer community.

Installing Elasticsearch is very easy; here are the steps.

For this demo, we are going to use “elasticsearch-1.4.4”. Unzip and extract the content to a suitable directory, then run bin\elasticsearch.bat from that directory.

This will start the Elasticsearch service at http://localhost:9200.

Install Logstash on Windows

Logstash is a useful utility when it comes to working with logs. It has built-in features to read from various file formats and to perform operations on what it reads. One of its best features is that it can read logs in a known format (e.g. Apache logs, syslog etc.) and put them into Elasticsearch.
Unzip the downloaded “logstash-1.5.0.rc2” into any folder.

To use Logstash from any directory, add its bin folder to the PATH environment variable. The commands below do this for the current command prompt session.

>set LOGSTASH_HOME=D:\ELK\logstash-1.5.0.rc2
>set PATH=%PATH%;D:\ELK\logstash-1.5.0.rc2\bin

And that's it, Logstash is ready to use.



Install Kibana 4 on Windows

Kibana is a browser-based visualization tool which allows us to create dashboards and reports from Elasticsearch data.

Here we are going to use “kibana-4.0.1-windows”, as it is compatible with the release of Elasticsearch that we are using.
Prior to Kibana 4, we needed a separate web server to host it, but Kibana 4 ships with an embedded one.
Unzip the “kibana-4.0.1-windows” file at any location.

Kibana configuration is very easy: open config/kibana.yml and set the property elasticsearch_url: "http://localhost:9200".

To start Kibana, execute bin\kibana.bat from the Kibana directory.

This starts a server, and you can open the GUI at http://localhost:5601/
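Before moving on, it can help to confirm that both services are actually listening. The following is a minimal Python sketch, not part of the original setup: it only checks whether the two ports mentioned above (9200 and 5601) accept TCP connections. The names SERVICES, port_open and report are made up for this illustration.

```python
import socket

# Ports taken from the URLs used in this post.
SERVICES = {"Elasticsearch": 9200, "Kibana": 5601}


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


def report(host="localhost"):
    """Map each service name to whether its port is reachable."""
    return {name: port_open(host, port) for name, port in SERVICES.items()}
```

Calling report() returns a dictionary with one True/False entry per service. A False value usually means the service has not finished starting or is running on a different port.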



Developer’s Guide to Analyse Logs Using Elasticsearch, Logstash and Kibana


In this post I will talk about how to analyse logs using Elasticsearch, Logstash and Kibana.


Prerequisites:
  • Installation of Elasticsearch, Logstash and Kibana is complete, as per the previous article.


Load logs data into Elasticsearch using Logstash
We are going to write a Logstash configuration which reads data from an Apache log file.


Create a sample log file as below and save it in the root of the “D:” drive.

ApacheLogs.log
71.141.244.242 - kurt [18/May/2011:01:48:10 -0700] "GET /admin HTTP/1.1" 301 566 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3"
134.39.72.245 - - [18/May/2011:12:40:18 -0700] "GET /favicon.ico HTTP/1.1" 200 1189 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.2; .NET4.0C; .NET4.0E)"


The next step is to write the Logstash conf file as shown below.

Name this file "logstash-apache.conf" and save it in the bin folder of the Logstash installation.


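input {
  # Read the sample file from the start instead of tailing only new lines
  file {
    type => "apache"
    path => [ "D:/ApacheLogs.log" ]
    start_position => "beginning"
  }
}

filter {
  # Split each line into named fields using the combined Apache log pattern
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  # Use the request time from the log line as the event timestamp
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}

output {
  # Index events into the local Elasticsearch and echo them to the console
  elasticsearch { host => "localhost" }
  stdout { codec => rubydebug }
}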


The above configuration reads the Apache log file at the given path, parses each line according to the combined Apache log format and adds the resulting events to Elasticsearch.
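To make the grok and date steps more concrete, here is a rough Python sketch of the same idea. It is only an approximation: the regular expression is a simplified stand-in for grok's COMBINEDAPACHELOG pattern, and the function name parse_line is made up for this illustration. The field names (clientip, verb, response and so on) follow the ones grok produces for this pattern.

```python
import re
from datetime import datetime

# Simplified stand-in for grok's COMBINEDAPACHELOG pattern (an assumption,
# not the real grok definition).
COMBINED_LOG = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = COMBINED_LOG.match(line)
    if match is None:
        return None
    event = match.groupdict()
    # Rough equivalent of the date filter pattern "dd/MMM/yyyy:HH:mm:ss Z".
    event["@timestamp"] = datetime.strptime(
        event["timestamp"], "%d/%b/%Y:%H:%M:%S %z"
    )
    return event


# First line of the sample ApacheLogs.log file from this post.
SAMPLE = (
    '71.141.244.242 - kurt [18/May/2011:01:48:10 -0700] '
    '"GET /admin HTTP/1.1" 301 566 "-" '
    '"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) '
    'Gecko/20100401 Firefox/3.6.3"'
)

if __name__ == "__main__":
    event = parse_line(SAMPLE)
    print(event["clientip"], event["verb"], event["request"], event["response"])
```

Logstash does considerably more than this (type conversion, tagging of lines that fail to parse, timezone normalisation), so treat the sketch as a picture of the field extraction only.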

Now let's run Logstash with this config file (passed using the -f option) to insert the records into Elasticsearch.

Logstash will now start reading from the file and inserting the data into Elasticsearch.

Now go to the Google Chrome "Sense" extension and execute the following command to view the above log data.
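The same data can also be fetched outside Sense. The Python sketch below is one possible way to do it, under two assumptions: Elasticsearch is running at http://localhost:9200 as set up earlier, and Logstash wrote to its default daily indices (named logstash-YYYY.MM.dd), so the pattern logstash-* matches them. The helper names search_url and fetch_hits are made up for this illustration.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # Elasticsearch address used in this post
INDEX_PATTERN = "logstash-*"      # assumption: Logstash default index naming


def search_url(base=ES_URL, index=INDEX_PATTERN, size=10):
    """Build the URL of a search request that returns up to `size` documents."""
    return f"{base}/{index}/_search?size={size}"


def fetch_hits(url):
    """Run the search and return the list of matching documents."""
    with urllib.request.urlopen(url) as response:
        body = json.load(response)
    return body["hits"]["hits"]
```

With Elasticsearch running, fetch_hits(search_url()) should return one entry per indexed log line, each carrying the parsed fields under its "_source" key.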
















Now go to the Kibana dashboard and configure the index pattern and the timestamp attribute as shown in the screenshot below.







Now simply go to the Visualize tab to create graphs as you like and save them. The saved visualizations can then be viewed in the Dashboard tab.

Saturday, 17 January 2015

Setup Apache Hadoop in a Standalone Mode







Apache Hadoop is an open source framework for writing and running distributed applications that process large amounts of data.
Hadoop is a rapidly evolving ecosystem of components for implementing the Google MapReduce algorithms in a scalable fashion on commodity hardware.
Hadoop enables users to store and process large volumes of data and analyze it in ways not previously possible with less scalable solutions or standard SQL-based approaches.
In this tutorial I will provide a step-by-step guide on how to configure Apache Hadoop in Standalone Mode.

Hadoop can be configured to run in the following modes:

Standalone Mode- In standalone mode, we will configure Hadoop on a single machine (e.g. an Ubuntu machine on the host VM). The configuration in standalone mode is quite straightforward and does not require major changes.
Pseudo-Distributed Mode- In pseudo-distributed mode, all the Hadoop daemons (Name Node, Data Node, Job Tracker, Task Tracker) run on a single machine, each in its own Java process, so that one machine simulates a small cluster.
Fully Distributed Mode- In fully distributed mode, the daemons run on separate machines/nodes in a real cluster, typically with one machine acting as the master and the rest as slaves.



Installing & Configuring Hadoop in Standalone Mode
You might want to create a dedicated user for running Apache Hadoop but it is not a prerequisite. In our setup, we will be using a default user for running Hadoop.



Environment:
  • Ubuntu 10.10
  • JDK 6 or above
  • Hadoop-1.1.2 (Any stable release)




Follow these steps for installing and configuring Hadoop on a single node:

Step-1. Install Java
In this tutorial, we will use Java 1.6.

Use one of the commands below to begin the installation of Java
$ sudo apt-get install openjdk-6-jdk
or
$ sudo apt-get install sun-java6-jdk



With the Sun package, this will install the full JDK under the /usr/lib/jvm/java-6-sun directory.



Step-2. Verify Java installation
You can verify the Java installation using the following command
$ java -version

On executing this command, you should see output similar to the following:
java version "1.6.0_45"
Java(TM) SE Runtime Environment (build 1.6.0_45-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01, mixed mode)



Step-3. Configure JAVA_HOME
Hadoop needs to know where Java is installed. For this we will set the JAVA_HOME environment variable to point to our Java installation directory.
JAVA_HOME can be configured in the ~/.bash_profile or ~/.bashrc file. Alternatively, you can let Hadoop know by setting JAVA_HOME in Hadoop's conf/hadoop-env.sh file.

Use the command below to set JAVA_HOME on Ubuntu
export JAVA_HOME=/usr/lib/jvm/java-6-sun



JAVA_HOME can be verified with the command
echo $JAVA_HOME
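Printing the variable only shows that it is set, not that it points at a usable Java installation. A slightly stricter check could look like the Python sketch below. It is an illustration only: the function name check_java_home is made up, and the rule it applies (a bin/java file must exist under JAVA_HOME) is an assumption about what a valid installation looks like.

```python
import os


def check_java_home(environ):
    """Return a problem description, or None when JAVA_HOME looks usable."""
    java_home = environ.get("JAVA_HOME")
    if not java_home:
        return "JAVA_HOME is not set"
    # Assumption: a usable JDK or JRE has a java launcher under bin/.
    if not os.path.isfile(os.path.join(java_home, "bin", "java")):
        return "no bin/java under " + java_home
    return None


if __name__ == "__main__":
    print(check_java_home(os.environ) or "JAVA_HOME looks fine")
```

The function takes the environment as a parameter so that it can be tried against any mapping, not only the real os.environ.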



Step-4. SSH configuration
  • Install SSH using the command
    sudo apt-get install ssh
  • Generate an SSH key (press Enter when asked for a file name; this generates a key without a passphrase)
    ssh-keygen -t rsa -P ""
  • Now copy the public key (id_rsa.pub) of the current machine to authorized_keys. The command below appends the generated public key to the .ssh/authorized_keys file:
    cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
  • Verify the SSH configuration using the command
    ssh localhost
    Answering yes at the prompt will add localhost to the known hosts.



Step-5. Download Hadoop
Download the latest stable release of Apache Hadoop from http://hadoop.apache.org/releases.html.
Unpack the release:
tar -zxvf hadoop-1.1.2.tar.gz
Save the extracted folder to an appropriate location; HADOOP_HOME will point to this directory.



Step-6. Configure HADOOP_HOME & Path environment
Use the following command to create an environment variable that points to the Hadoop installation directory (HADOOP_HOME)
export HADOOP_HOME=/home/user/hadoop

Now place the Hadoop binary directory on your command-line path by executing the command
export PATH=$PATH:$HADOOP_HOME/bin

Use this command to verify your Hadoop installation:
hadoop version
The output should be similar to the following:
Hadoop 1.1.2



Step-7. Create Data Directory for Hadoop
An advantage of Hadoop is that it needs only a handful of directories to work correctly. Let us create a directory named hdfs with three sub-directories: name (for the Name Node), data (for the Data Node) and tmp.
  • mkdir /home/ja/hdfs
  • mkdir /home/ja/hdfs/tmp
  • mkdir /home/ja/hdfs/name
  • mkdir /home/ja/hdfs/data



Since the Hadoop user needs to read and write to these directories, change their permissions to 755 or 777 for that user.
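The directory layout and permissions can also be set up in one go from a script. The Python sketch below shows one way to do it. The function name create_hdfs_dirs is made up for this illustration, and the demo at the bottom deliberately runs against a throwaway temporary directory rather than /home/ja.

```python
import os
import stat
import tempfile


def create_hdfs_dirs(base, mode=0o755):
    """Create <base>/hdfs with tmp, name and data sub-directories."""
    root = os.path.join(base, "hdfs")
    created = []
    for sub in ("tmp", "name", "data"):
        path = os.path.join(root, sub)
        os.makedirs(path, exist_ok=True)
        # Set the mode explicitly so the result does not depend on the umask.
        os.chmod(path, mode)
        created.append(path)
    os.chmod(root, mode)
    return created


if __name__ == "__main__":
    demo_base = tempfile.mkdtemp()  # throwaway stand-in for /home/ja
    for path in create_hdfs_dirs(demo_base):
        print(path, oct(stat.S_IMODE(os.stat(path).st_mode)))
```

To apply it to the layout used in this tutorial, the call would be create_hdfs_dirs("/home/ja"), run as the user that will start Hadoop.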



Step-8. Configure Hadoop XML files
Next, we will configure the Hadoop XML files. The Hadoop configuration files are in the HADOOP_HOME/conf directory.


conf/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/ja/hdfs/tmp</value>
  </property>
</configuration>


conf/hdfs-site.xml
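<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/ja/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/ja/hdfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>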




conf/mapred-site.xml
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>

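All three files share the same shape: a configuration element holding name/value property pairs. As an illustration of that structure, the Python sketch below builds such a document from a dictionary. It is not how Hadoop itself handles these files; the names build_site_xml and SITE_FILES are made up for this example, and the values are the ones used in this tutorial.

```python
import xml.etree.ElementTree as ET


def build_site_xml(properties):
    """Return a <configuration> document with one <property> per entry."""
    configuration = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(configuration, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(configuration, encoding="unicode")


SITE_FILES = {
    "core-site.xml": {
        "fs.default.name": "hdfs://localhost:9000",
        # hadoop.tmp.dir is the property name Hadoop 1.x reads for its
        # temporary directory.
        "hadoop.tmp.dir": "/home/ja/hdfs/tmp",
    },
    "hdfs-site.xml": {
        "dfs.name.dir": "/home/ja/hdfs/name",
        "dfs.data.dir": "/home/ja/hdfs/data",
        "dfs.replication": 1,
    },
    "mapred-site.xml": {
        "mapred.job.tracker": "localhost:9001",
    },
}

if __name__ == "__main__":
    for filename, properties in SITE_FILES.items():
        print(filename)
        print(build_site_xml(properties))
```

The output is unindented XML. Hadoop does not care about whitespace in these files, but for files that are edited by hand the indented form is easier to read.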

Step-9. Format Hadoop Name Node:
Execute the command below to format the Name Node
$ ~/hadoop/bin/hadoop namenode -format



Step-10. Start Hadoop daemons
$ ~/hadoop/bin/start-all.sh



Step-11. Verify the daemons are running
$ $JAVA_HOME/bin/jps
The output will look similar to this, showing that all the daemons are running:
9316 SecondaryNameNode
9203 DataNode
9521 TaskTracker
9403 JobTracker
9089 NameNode
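If you want to check the jps listing programmatically rather than by eye, a small Python sketch along these lines could do it. It only inspects text that you pass in: the function name missing_daemons is made up for this illustration, and the expected set is simply the five daemons listed above.

```python
# The five daemons a single-node Hadoop 1.x setup is expected to run.
EXPECTED_DAEMONS = {
    "NameNode",
    "SecondaryNameNode",
    "DataNode",
    "JobTracker",
    "TaskTracker",
}


def missing_daemons(jps_output):
    """Return the expected daemons that do not appear in the jps listing."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        # Each jps line has the form "<pid> <process name>".
        if len(parts) == 2:
            running.add(parts[1])
    return EXPECTED_DAEMONS - running


# Sample listing copied from the output shown in this step.
SAMPLE_OUTPUT = """9316 SecondaryNameNode
9203 DataNode
9521 TaskTracker
9403 JobTracker
9089 NameNode"""

if __name__ == "__main__":
    print(sorted(missing_daemons(SAMPLE_OUTPUT)))
```

An empty result means every expected daemon was found. Any name that is returned is a good starting point for looking at the corresponding log file under the Hadoop logs directory.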



Step-12. Verify Admin Page UI of Name Node & Job Tracker
Open a browser window and type the following URLs:
Name Node UI: http://localhost:50070
Job Tracker UI: http://localhost:50030



Now you have successfully installed and configured Hadoop on a single node.



Keep posting your queries. I will try my best to share my opinion on them.
Till then, Happy Reading!!!