Saturday, 17 January 2015

Setup Apache Hadoop in a Standalone Mode







Apache Hadoop is an open source framework for writing and running distributed applications that process large amounts of data.
Hadoop is a rapidly evolving ecosystem of components for implementing the Google MapReduce algorithms in a scalable fashion on commodity hardware.
Hadoop enables users to store and process large volumes of data and analyze it in ways not previously possible with less scalable solutions or standard SQL-based approaches.
In this tutorial I provide a step-by-step guide to configuring Apache Hadoop in Standalone Mode.

Hadoop can be configured to run in the following modes:

Standalone Mode - Hadoop runs on a single machine (e.g. an Ubuntu machine on the host VM). The configuration in standalone mode is quite straightforward and does not require major changes.
Pseudo-Distributed Mode - Hadoop still runs on a single machine, but each daemon (NameNode, DataNode, JobTracker, TaskTracker) runs in its own Java process, so one machine simulates a small cluster.
Fully Distributed Mode - The daemons are spread across separate machines/nodes in a real cluster, with one node acting as the master and the rest as slaves.



Installing & Configuring Hadoop in Standalone Mode
You might want to create a dedicated user for running Apache Hadoop but it is not a prerequisite. In our setup, we will be using a default user for running Hadoop.



Environment:
  • Ubuntu 10.10
  • JDK 6 or above
  • Hadoop-1.1.2 (Any stable release)




Follow these steps for installing and configuring Hadoop on a single node:

Step-1. Install Java
In this tutorial, we will use Java 1.6.

Use the command below to begin the installation of Java:
$ sudo apt-get install openjdk-6-jdk
or
$ sudo apt-get install sun-java6-jdk



With the Sun package, the full JDK is installed under the /usr/lib/jvm/java-6-sun directory.



Step-2. Verify Java installation
You can verify the Java installation using the following command:
$ java -version

On executing this command, you should see output similar to the following:
java version "1.6.0_27"
Java(TM) SE Runtime Environment (build 1.6.0_45-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01, mixed mode)



Step-3. Configure JAVA_HOME
Hadoop needs to know where Java is installed. For this we set the JAVA_HOME environment variable to point to the Java installation directory.
JAVA_HOME can be set in the ~/.bash_profile or ~/.bashrc file. Alternatively, you can set JAVA_HOME in Hadoop's conf/hadoop-env.sh file.

Use the command below to set JAVA_HOME on Ubuntu:
$ export JAVA_HOME=/usr/lib/jvm/java-6-sun



JAVA_HOME can be verified with the command:
$ echo $JAVA_HOME



Step-4. SSH configuration
  • Install SSH using the command:
$ sudo apt-get install ssh
  • Generate an SSH key (press Enter when asked for a file name; this generates a key pair with an empty passphrase):
$ ssh-keygen -t rsa -P ""
  • Now copy the public key (id_rsa.pub) of the current machine to authorized_keys. The command below appends the generated public key to the .ssh/authorized_keys file:
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
  • Verify the SSH configuration using the command:
$ ssh localhost
Typing yes at the prompt adds localhost to the list of known hosts.



Step-5. Download Hadoop
Download the latest stable release of Apache Hadoop from http://hadoop.apache.org/releases.html.
Unpack the release:
$ tar -zxvf hadoop-1.1.2.tar.gz
Move the extracted folder to an appropriate location; HADOOP_HOME will point to this directory.



Step-6. Configure HADOOP_HOME & Path environment
Use the following command to create an environment variable that points to the Hadoop installation directory (HADOOP_HOME):
$ export HADOOP_HOME=/home/user/hadoop

Now place the Hadoop binary directory on your command-line path by executing the command:
$ export PATH=$PATH:$HADOOP_HOME/bin

Use this command to verify your Hadoop installation:
$ hadoop version
The output should be similar to the following:
Hadoop 1.1.2



Step-7. Create Data Directory for Hadoop
Hadoop needs only a few directories to work correctly. Let us create a directory named hdfs with three sub-directories: name (for the Name Node), data (for the Data Node) and tmp.
  • $ mkdir /home/ja/hdfs
  • $ mkdir /home/ja/hdfs/tmp
  • $ mkdir /home/ja/hdfs/name
  • $ mkdir /home/ja/hdfs/data



Since the Hadoop user needs to read from and write to these directories, change their permissions to 755 (or 777) for that user.
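If you prefer to script this step, the same directory layout can be sketched in Java. This is only an illustration, assuming Java 7 or later (for java.nio.file); the class name HdfsDirs is my own and the base directory is whatever you pass in. It sets 755 on each sub-directory where the file system supports POSIX permissions.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermissions;

// Hypothetical helper, not part of Hadoop.
class HdfsDirs {
    /** Creates base/hdfs/{tmp,name,data} and returns the hdfs directory. */
    static Path create(Path base) throws IOException {
        Path hdfs = base.resolve("hdfs");
        boolean posix = FileSystems.getDefault()
                .supportedFileAttributeViews().contains("posix");
        for (String sub : new String[] {"tmp", "name", "data"}) {
            // createDirectories also creates the parent hdfs directory if needed
            Path dir = Files.createDirectories(hdfs.resolve(sub));
            if (posix) {
                // rwxr-xr-x is the symbolic form of 755
                Files.setPosixFilePermissions(dir, PosixFilePermissions.fromString("rwxr-xr-x"));
            }
        }
        return hdfs;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the current user's home directory when no argument is given
        Path base = Paths.get(args.length > 0 ? args[0] : System.getProperty("user.home"));
        System.out.println("Created " + create(base));
    }
}
```

Running it with /home/ja as the argument would produce the same four directories as the mkdir commands.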



Step-8. Configure Hadoop XML files
Next, we will configure the Hadoop XML files. The Hadoop configuration files are in the HADOOP_HOME/conf directory.


conf/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/ja/hdfs/tmp</value>
  </property>
</configuration>



conf/hdfs-site.xml
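<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/ja/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/ja/hdfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>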




conf/mapred-site.xml
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
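
A typo in one of these site files is easy to miss, so it can help to print the name/value pairs back before formatting the Name Node. The following is a minimal sketch using only the JDK's XML parser, assuming Java 7 or later; the class name SiteConfigReader is my own and it does not use Hadoop's Configuration class.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for sanity-checking a Hadoop-style site file.
class SiteConfigReader {
    /** Parses a <configuration> document into name -> value pairs, in file order. */
    static Map<String, String> read(InputStream xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
        Map<String, String> props = new LinkedHashMap<String, String>();
        NodeList nodes = doc.getElementsByTagName("property");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element property = (Element) nodes.item(i);
            String name = property.getElementsByTagName("name").item(0).getTextContent().trim();
            String value = property.getElementsByTagName("value").item(0).getTextContent().trim();
            props.put(name, value);
        }
        return props;
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path of a site file, e.g. conf/core-site.xml
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            for (Map.Entry<String, String> e : read(in).entrySet()) {
                System.out.println(e.getKey() + " = " + e.getValue());
            }
        }
    }
}
```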


Step-9. Format Hadoop Name Node:
Execute the command below from the Hadoop home directory:
$ ~/hadoop/bin/hadoop namenode -format



Step-10. Start Hadoop daemons
$ ~/hadoop/bin/start-all.sh



Step-11. Verify the daemons are running
$ /usr/java/latest/bin/jps
The output will look similar to this:
9316 SecondaryNameNode
9203 DataNode
9521 TaskTracker
9403 JobTracker
9089 NameNode
All the daemons are now running.



Step-12. Verify Admin Page UI of Name Node & Job Tracker
Open a browser window and type the following URLs:
Name Node UI: http://localhost:50070
Job Tracker UI: http://localhost:50030
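
If no browser is at hand (for example on a headless VM), a quick TCP check tells you whether the two web UIs are listening. This is just a sketch, assuming Java 7 or later; the class name DaemonPortCheck and the 2-second timeout are my own choices, and the ports are the ones listed above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Hadoop.
class DaemonPortCheck {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused or timed out: nothing is listening on that port
            return false;
        }
    }

    public static void main(String[] args) {
        int[] ports = {50070, 50030}; // Name Node UI and Job Tracker UI
        for (int port : ports) {
            System.out.println("localhost:" + port + " listening: "
                    + isListening("localhost", port, 2000));
        }
    }
}
```

A successful connection only shows that something is bound to the port; the jps output from Step-11 remains the check that the daemons themselves are up.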



Now you have successfully installed and configured Hadoop on a single node.



Keep posting me your queries. I will try my best to share my opinion on them.
Till then, Happy Reading!!!

Wednesday, 31 December 2014

How to detect and fix Memory Leak – OutOfMemoryError

How to detect and fix Memory Leak – OutOfMemoryError – Heap Space

Step 1: To detect and fix a memory leak in Java, we will use JVisualVM, a free, open-source tool that comes bundled with the JDK. To open this application, refer to the path in the following snapshot:



Step 2: Let us create a Java program that causes a memory leak. In this program we create a POJO class, instantiate it endlessly in a loop, and keep adding the instances to an ArrayList.
After running for a few seconds, the program will throw the famous OutOfMemoryError: Java heap space.
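
A minimal sketch of such a program could look like the one below. MyPOJO is the class name used later in this post; LeakDemo, the fields of the POJO and the batch size of 100000 are my own assumptions for illustration. Running main is expected to end in an OutOfMemoryError, so run it only when you want to reproduce the problem.

```java
import java.util.ArrayList;
import java.util.List;

class LeakDemo {
    // The list is never cleared, so every instance stays reachable
    // and the garbage collector cannot reclaim it.
    static final List<MyPOJO> RETAINED = new ArrayList<MyPOJO>();

    /** Adds count new instances to the retained list and returns the new list size. */
    static int allocate(int count) {
        for (int i = 0; i < count; i++) {
            RETAINED.add(new MyPOJO(RETAINED.size()));
        }
        return RETAINED.size();
    }

    public static void main(String[] args) {
        // Unbounded loop: the heap eventually fills up
        while (true) {
            allocate(100000);
        }
    }
}

class MyPOJO {
    private final long id;
    private final String name;

    MyPOJO(long id) {
        this.id = id;
        this.name = "pojo-" + id;
    }

    long getId() { return id; }
    String getName() { return name; }
}
```

Starting it with a small heap, for example -Xmx64m, makes the error appear sooner.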



Step 3: JVisualVM shows the status of the heap. To view it, click on the respective Java program in the Applications sidebar of JVisualVM.
Refer to the following snapshot:





Step 4: Now that you have seen the error, it is time to find its root cause. To do so, enable the ‘Heap Dump on OOME’ option for your Java program in JVisualVM: right-click your program and click ‘Enable Heap Dump on OOME’.

Refer to the following snapshot:




Enabling this ensures that a heap dump is generated when an OutOfMemoryError occurs in your program.
Heap dumps are usually used to analyze the root cause of memory-related errors. A heap dump is basically a snapshot of JVM internals, such as the classes loaded during program execution, the total instances created, and how many instances each class had.
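
A heap dump can also be triggered from code, which is handy when JVisualVM cannot attach to the process. The sketch below assumes a HotSpot-based JVM on Java 7 or later, since it relies on com.sun.management.HotSpotDiagnosticMXBean; the class name HeapDumper and the output location are my own. The target file must not exist yet, and newer JDKs expect it to end in .hprof.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; works only on HotSpot-based JVMs.
class HeapDumper {
    /** Writes a heap dump of the current JVM to target and returns the path. */
    static Path dump(Path target, boolean liveObjectsOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(target.toString(), liveObjectsOnly);
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Assumed location: the system temp directory, with a timestamp in the name
        Path target = Paths.get(System.getProperty("java.io.tmpdir"),
                "manual-" + System.currentTimeMillis() + ".hprof");
        dump(target, true);
        System.out.println("Heap dump written to " + target
                + " (" + Files.size(target) + " bytes)");
    }
}
```

The resulting file can be opened in JVisualVM in the same way as a dump generated on OutOfMemoryError.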



Step 5: Re-run your Java program to reproduce the problem; this time we will use the generated heap dump to analyze the cause of the error.





Step 6: To open the heap dump, go to JVisualVM and click File -> Load. Browse to the location of the heap dump.

For example, in our case the dumps were created at location:

C:\Users\User1\AppData\Local\Temp\visualvm.dat\localhost_688\heapdump-1420002430567.hprof

Refer to snapshot below for the same:





Step 7: The Summary tab of the heap dump displays basic information such as the number of classes loaded, the total instances created, and the environment in which the program was executed.

Refer to the snapshot below:



Step 8: To find the root cause of the error, click on the Classes tab. It displays the number of instances created for each loaded class.
In our case, the OOM error was caused by the high number of instances of class MyPOJO.

Refer to the snapshot below for the same:



There was a huge number of instances, around 13,845,151, for class MyPOJO.

Step 9: Now that we have discovered the root cause of the problem, the next step is to resolve it. In our case we can modify the program so that it does not run an infinite loop, which lets us control the number of instances created.
Similarly, you can dry-run your own code to work out a fix once the root cause is known.
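
One possible shape of the corrected program is sketched below. The class name BoundedDemo and the limit of 100000 are assumptions of mine; the right limit depends on your own workload. The point is that the loop is bounded and the references are dropped once the batch is no longer needed.

```java
import java.util.ArrayList;
import java.util.List;

class BoundedDemo {
    static class MyPOJO {
        final int id;
        MyPOJO(int id) { this.id = id; }
    }

    /** Creates exactly limit instances; the caller decides how long they stay alive. */
    static List<MyPOJO> create(int limit) {
        List<MyPOJO> list = new ArrayList<MyPOJO>(limit);
        for (int i = 0; i < limit; i++) {
            list.add(new MyPOJO(i));
        }
        return list;
    }

    public static void main(String[] args) {
        int limit = 100000; // hypothetical cap, tune it to the real workload
        List<MyPOJO> batch = create(limit);
        System.out.println("Created " + batch.size() + " instances");
        batch.clear(); // drop the references so the instances become eligible for GC
    }
}
```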

Wednesday, 16 July 2014

Configure HTTPS (Self-Signed Certificate) on Tomcat


Step 1: First, use the “keytool” command to create a self-signed certificate. During keystore creation, you need to assign a password and fill in the certificate’s details.




Above, “imfluxkeystore” is the keystore file that gets created at location “d:\”.
Press Enter and the following prompts will be shown. Provide a password and leave the rest of the options blank by pressing Enter.
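
Before pointing Tomcat at the keystore, it can be worth confirming that the file opens with the password you chose. The following is a small sketch, assuming Java 7 or later and that the keystore is in the JDK's default keystore format; the class name KeystoreCheck is my own. A wrong password makes the load call fail with an exception.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.KeyStore;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Tomcat.
class KeystoreCheck {
    /** Loads a keystore with the given password and returns its aliases. */
    static List<String> aliases(InputStream in, char[] password) throws Exception {
        KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
        ks.load(in, password); // throws if the password or the file is wrong
        return Collections.list(ks.aliases());
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path of the keystore file, args[1]: keystore password
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println("Aliases: " + aliases(in, args[1].toCharArray()));
        }
    }
}
```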






  













Step 2: Next, configure conf/server.xml of Tomcat to treat port 8080 as a secured port.
To do this, comment out the existing <Connector> tag that uses port 8080. Then uncomment the existing <Connector> tag that uses port 8443, change its port to 8080, and add the “keystoreFile” and “keystorePass” attributes (the password is the one you entered in Step 1).



Step 3: Save the file and restart Tomcat, then access https://localhost:8080/YOUR_APPLICATION_NAME


Tuesday, 3 June 2014

JVM Memory Leak




In this article, I will take a deep dive into one of the crucial areas of any Java application: memory. Any Java application where Garbage Collection (GC) behaviour or memory sizing has not been looked into is bound to suffer from a memory leak one day.

The following sections are covered in this article to give both a broad and an in-depth picture of memory leaks and the surrounding concepts:

1) Overview of the Garbage Collection (GC) mechanism
2) Overview of memory allocation in the JVM
3) What is a memory leak?
4) How does a memory leak happen, and what are its symptoms?
5) How to prevent or avoid a memory leak

1) Overview of Garbage Collection mechanism(GC) -
Garbage Collection is an automatic JVM mechanism that looks at the heap memory, identifies objects that are no longer in use, and deletes them.
Its basic purpose is to reclaim heap memory from unused objects and make space for future objects.
So basically, GC is a two-step process: first mark the unused (non-referenced) objects, then delete them.

2) Overview of Memory Allocation to JVM -
JVM Heap space is divided into following categories:
a) Young Generation
                i) Eden Space
                ii) Survivor '0' Space (S0)
                iii) Survivor '1' Space (S1)
b) Old Generation
c) Permanent Generation(Perm Gen)



When a new object is created, it is first allocated space in the Young Generation. At GC time, depending on whether the object is still referenced, it is either moved to the next heap space or deleted.
If an object still has a valid reference at the time of GC, it is moved from Eden Space to S0 space. An object that keeps a valid reference across further GC cycles is gradually moved forward to the Old Generation.

So the lifecycle of an object looks like:
new Object -> Eden Space -> S0 space -> S1 space -> Old Generation.
At any time during GC, if object is unused, it will be deleted from its present heap space and memory will be reclaimed.

The Permanent Generation contains metadata required by JVM to describe the classes and methods.
The Permanent Generation space contains the following:
a) Methods in a class
b) Class Name
c) Constant Pool
d) Internal objects used by JVM

Classes that are no longer used, or that have been unloaded, are removed from Perm Gen space and the memory is reclaimed.
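
To see how your own JVM divides its memory, you can list its memory pools through the standard java.lang.management API. This is only a sketch; the class name MemoryPools is my own, and the pool names it prints depend on the JVM version and the garbage collector in use. On Java 8 and later, for instance, you will see a Metaspace pool rather than a Perm Gen pool, because the Permanent Generation was removed in Java 8.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that prints the memory pools of the running JVM.
class MemoryPools {
    /** Returns one line per memory pool: name, type (HEAP or NON_HEAP) and bytes used. */
    static List<String> describe() {
        List<String> lines = new ArrayList<String>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // may be null if the pool is not valid
            long used = (usage != null) ? usage.getUsed() : -1;
            lines.add(pool.getName() + " [" + pool.getType().name() + "] used=" + used + " bytes");
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describe()) {
            System.out.println(line);
        }
    }
}
```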

3) What is Memory Leak -
A memory leak happens when objects that are no longer needed remain referenced, so GC cannot reclaim them. Put simply, when memory cannot be reclaimed even after multiple GC runs and the heap fills up, the JVM can no longer allocate memory to new objects.

4) How Memory Leak happens or Symptoms of Memory Leak -
One of the common symptoms of a memory leak is the JVM throwing OutOfMemoryError.
Another symptom is that the responsiveness or performance of an application degrades, recovers after a server restart,
and then degrades again after some time.

This usually happens because the JVM cannot keep up with the space required by new objects. Once the heap is full and space is needed for a new object, OutOfMemoryError is thrown.

5) How to prevent or avoid Memory Leak -
The first step is to identify the problem area and then use profiling tools to tackle it. Start from the errors in the log files, if any.

Common error traces found in log files during a memory leak are as follows:
a) java.lang.OutOfMemoryError: Java heap space
b) java.lang.OutOfMemoryError: PermGen space
c) java.lang.OutOfMemoryError: Requested array size exceeds VM limit
d) java.lang.OutOfMemoryError: request <size> bytes for <reason>. Out of swap space?

In this article, I will focus on the first two errors, i.e. Java heap space and PermGen space.

Let's look into Java Heap space error first:

a) java.lang.OutOfMemoryError: Java heap space -
As the error name itself says, this error is thrown when the heap space is full and GC cannot reclaim memory.

A few of the causes of this OutOfMemoryError are as follows:
i) GC not able to reclaim heap memory - This happens when the application keeps holding references to objects, often because of poor coding practices.
To tackle this error, the following steps can be followed (not in order):
                Step 1: Dry run of code - Look out for major bottlenecks in the code, especially object creation in loops.
                Step 2: Use profiling tools - A free and easy-to-use profiling tool shipped with the Sun JDK itself is JVisualVM (executable available in the bin dir of the JDK). It presents a graphical view of the classes, threads and objects loaded in your application. Focus on the CPU and Memory profiling tabs; they show which threads/objects take up the most CPU time and memory.

ii) JVM heap space allocation configuration - Sometimes the number of objects created cannot be reduced below a certain limit; then it is time to increase the heap space allocated to the JVM.
The following are the heap space allocation parameters:
                a) -Xms - Initial (minimum) heap size
                b) -Xmx - Maximum heap size
                Specify the same size for both a) and b)
                c) -XX:+HeapDumpOnOutOfMemoryError - Generates a heap dump when an out-of-memory error occurs
                d) -XX:OnOutOfMemoryError=%ACTION_RESTART% - Specifies an action, such as restarting the server, to run after an out-of-memory error
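
After changing these parameters, it is worth checking that the running JVM actually picked them up, since start-up scripts sometimes override them. A minimal sketch using standard JDK calls follows; the class name HeapSettings is my own.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper that reports the heap limit and JVM options in effect.
class HeapSettings {
    /** Maximum heap the JVM will attempt to use, in megabytes (reflects -Xmx). */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    /** JVM options passed on the command line, e.g. -Xms512m or -XX:+HeapDumpOnOutOfMemoryError. */
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MB): " + maxHeapMb());
        System.out.println("JVM arguments: " + jvmArguments());
    }
}
```

Launching it with the options you intend to use shows both the options and the resulting maximum heap.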
               
b) java.lang.OutOfMemoryError: PermGen space -         
As the error name itself says, this error is thrown when the Permanent Generation space is full and GC cannot reclaim memory.
This happens when a classloader and its classes cannot be garbage collected after they have been modified/unloaded. Sometimes this error appears when the application uses 3rd-party jars like Spring, Hibernate or CGLIB and the space allocated to PermGen is not sufficient.
These jars create lots of proxy classes at runtime, which can fill up your PermGen space without you noticing.

To avoid this error, follow the steps below (not in order):
i) Configure JVM Options
                a) -XX:MaxPermSize=128m or higher - Increase as per need
                b) -XX:+UseConcMarkSweepGC
                c) -XX:+CMSClassUnloadingEnabled  - Enables unloading of classes that are no longer in use.
ii) Do not pass references of your application to utility jars in your application.
iii) Before selecting any 3rd party jar, evaluate its pros and cons on performance.
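
To get a feel for how proxy classes add to the number of loaded classes, the sketch below creates a dynamic proxy and reads the loaded-class count from the JVM. It uses the JDK's own java.lang.reflect.Proxy as a stand-in for the proxy generation that libraries such as Spring or Hibernate perform; it is not how those libraries work internally. The class name ProxyClassCount is my own.

```java
import java.lang.management.ManagementFactory;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Hypothetical demo of runtime-generated classes.
class ProxyClassCount {
    /** Number of classes currently loaded in this JVM. */
    static int loadedClasses() {
        return ManagementFactory.getClassLoadingMXBean().getLoadedClassCount();
    }

    /** Creates a JDK dynamic proxy for Runnable; its class is generated at runtime. */
    static Runnable noOpProxy() {
        InvocationHandler handler = new InvocationHandler() {
            public Object invoke(Object proxy, Method method, Object[] args) {
                return null; // run() returns void, so there is nothing to do
            }
        };
        return (Runnable) Proxy.newProxyInstance(
                ProxyClassCount.class.getClassLoader(),
                new Class<?>[] {Runnable.class},
                handler);
    }

    public static void main(String[] args) {
        int before = loadedClasses();
        Runnable proxy = noOpProxy();
        proxy.run();
        System.out.println("Generated class: " + proxy.getClass().getName());
        System.out.println("Loaded classes before: " + before + ", after: " + loadedClasses());
    }
}
```

Watching the same counter in a real application over time is one way to notice class metadata growing.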

Friday, 30 May 2014

Spring Security Hands-On

Spring Security – Authentication and Authorization

In this article we will learn how to set up the Spring Security framework to cover both authentication and authorization. I have explained each step in detail, along with a snapshot for each entity created. I hope it gets you hands-on with Spring Security quickly.
Tools to be used:
  1. Eclipse 3.5
  2. Spring 3.0.5
  3. JBoss v5.0

We will use the following major libraries (refer to the Libraries snapshot for the complete list of jars required at both compile time and runtime):

  1. Spring Core 3.0.5
  2. Spring Security 3.0.5
  3. Spring MVC 3.0.5
  4. Spring AOP 3.0.5

Steps to be followed:
  1. Create a dynamic web project in Eclipse and configure libraries.
  2. Create a package structure.
  3. Configure Spring related details in web.xml.
  4. Create a spring-security.xml (can be renamed as per convenience).
  5. Create a mvc-dispatcher-servlet.xml (can be renamed as per convenience).
  6. Create a Controller class to secure methods.
  7. Create a welcome page with links to your Controller request mapping.
  8. Create an authorization/unauthorization page where user will be redirected in case of authentication success/failure respectively.
  9. Create a logout page where user will be redirected in case of logout.
  10. Create an EAR project, link with your webapp project and deploy in JBoss v5.0









Step 1: Create a dynamic web project in Eclipse and configure libraries
In Eclipse, go to File -> New -> Dynamic Web Project (refer snapshot below)






  
Enter Project Name, such as SpringSecure (refer snapshot below)

This will create a project named SpringSecure in eclipse workspace (refer snapshot below)



To configure libraries, copy and paste following jars to your project WEB-INF -> lib folder (refer snapshot below):



Step 2: Create a package structure
In Project Explorer frame, select src folder, right click it, go to New -> Package and enter package name there. Click on Finish button (refer snapshot below)



Step 3: Configure Spring related details in web.xml
Enter details with regard to Welcome Page, Spring MVC, Spring Context Loader and Spring Security Filter in web.xml (refer snapshot below)







Step 4: Create a spring-security.xml
In Project Explorer frame, go to your project, right click WEB-INF folder and create a new XML file “spring-security.xml” (refer snapshot below). It will create a blank xml file.



Enter details in spring-security.xml regarding which URLs you want to secure and the role authority required for each.
Also configure the username and password for an authorized user. Spring Security itself will match the username and password entered on the login screen against the details provided in spring-security.xml.

Following is the snapshot of spring-security.xml:



The <intercept-url> tag indicates the URL to be secured. In our case, we secure any URL matching the pattern “/adminUser*”.
The access parameter lists the role authorized for this URL.
In our case, a user with the role “ROLE_ADMIN” is allowed to access URLs matching “/adminUser*”.
The tag <user name="xxxxx" password="xxxx" authorities="xxxx"/> tells the Spring Security framework to authenticate a user with these details.

In our case, a user with username “admin” and password “admin” will be successfully authenticated. This user is granted the authority listed in the “authorities” parameter.
The Spring Security framework loads the above user details into an object and matches them against the details entered by the user on the login screen. Only if the details match is authorization checked; otherwise the user is shown an appropriate error message on the login screen.
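
The order of these checks can be modelled in a few lines of plain Java. Please read this as a simplified illustration of the flow for a request to a secured URL, not as Spring Security's API or implementation: the class SecurityFlowSketch, its Outcome enum and the “guest” user with “ROLE_USER” are all hypothetical, and the “*” pattern is approximated by a simple prefix match. Only the admin user and the “/adminUser*” rule come from the configuration described above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Plain-Java model of "authenticate first, then authorize"; not Spring code.
class SecurityFlowSketch {
    enum Outcome { AUTHENTICATION_FAILED, ACCESS_DENIED, ACCESS_GRANTED }

    static final class User {
        final String password;
        final Set<String> authorities;

        User(String password, String... authorities) {
            this.password = password;
            this.authorities = new HashSet<String>(Arrays.asList(authorities));
        }
    }

    static final Map<String, User> USERS = new HashMap<String, User>();
    static final Map<String, String> SECURED_PREFIXES = new HashMap<String, String>();

    static {
        // Mirrors the user tag: name admin, password admin, authority ROLE_ADMIN
        USERS.put("admin", new User("admin", "ROLE_ADMIN"));
        // Hypothetical second user, added only to show a denied request
        USERS.put("guest", new User("guest", "ROLE_USER"));
        // Mirrors the intercept-url rule: /adminUser* requires ROLE_ADMIN
        SECURED_PREFIXES.put("/adminUser", "ROLE_ADMIN");
    }

    /** Checks credentials first; only then compares authorities with the required role. */
    static Outcome check(String url, String username, String password) {
        User user = USERS.get(username);
        if (user == null || !user.password.equals(password)) {
            return Outcome.AUTHENTICATION_FAILED;
        }
        for (Map.Entry<String, String> rule : SECURED_PREFIXES.entrySet()) {
            if (url.startsWith(rule.getKey()) && !user.authorities.contains(rule.getValue())) {
                return Outcome.ACCESS_DENIED;
            }
        }
        return Outcome.ACCESS_GRANTED;
    }

    public static void main(String[] args) {
        System.out.println(check("/adminUser", "admin", "admin"));
        System.out.println(check("/adminUser", "admin", "wrong"));
        System.out.println(check("/adminUser", "guest", "guest"));
    }
}
```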


Step 5: Create a mvc-dispatcher-servlet.xml
In Project Explorer frame, go to your project, right click WEB-INF folder and create a new XML file “mvc-dispatcher-servlet.xml”. It will create a blank xml file.
Enter details in “mvc-dispatcher-servlet.xml” file regarding your base package and view resolver.
Following is the snapshot of “mvc-dispatcher-servlet.xml” file:




  


Step 6: Create a Controller class to secure methods.
In Project Explorer frame, go to your project, right click package “com.secure” and create a new Java class “AdminController” (refer snapshot below)









Enter details in “AdminController.java” regarding request mappings, method bodies and their return values.
Note – We secured following two urls in spring-security.xml. Request mapping is entered in AdminController.java for the same. This will secure methods such as “welcomeAdminUser” and “welcomeSupportUser”.
1) “/adminUser”
2) “/supportUser”

Refer to snapshot below:




Step 7: Create a welcome page with links to your Controller request mapping.
To request a page with a secured URL, we need to create a JSP page containing hyperlinks that send requests to our application.
Right click the “WebContent” folder in the project and create a new folder “jsp”. In this folder we will create “home.jsp”.
Following is the content to be added in “home.jsp”.



Note – We have already added “/jsp/home.jsp” in our web.xml as welcome page.
This page will open by default when we hit our application url.

Step 8: Create an authorization/unauthorization page where user will be redirected in case of authentication success/failure respectively
After successful authorization, we will redirect the user to authorize.jsp and display a message there.
In case of an unauthorized user, we will redirect the user to unauthorize.jsp.
In the “jsp” folder we will create “authorize.jsp” and “unauthorize.jsp”.
Following is the content to be added in “authorize.jsp”:



Following is the content to be added in “unauthorize.jsp”:



Note – In our “AdminController.java”, we have already added code statement to redirect to “authorize.jsp”. See snapshot below:



Step 9: Create a logout page where user will be redirected in case of logout
Once a user has been successfully authenticated and authorized, the Spring Security framework keeps the client-related information in the HTTP session, which is tracked via a cookie. So in order to log in again, we first need to log out of the application.
We will create a new JSP with name “logout.jsp”.
Following is the snapshot of “logout.jsp”:



Note – We have already provided “Logout” hyperlink to user in our “authorize.jsp”.
Refer snapshot below for the same:



That’s all for the coding; the Spring Security framework itself takes care of the login page. Yes, if we do not specify a custom login page, the Spring Security framework displays a login page of its own.

Step 10: Create an EAR project, link with your webapp project and deploy in JBoss v5.0
Go to File -> New -> Select ‘Enterprise Application Project’ and enter EAR name like ‘SpringSecureEAR’.



Do remember to choose ‘SpringSecure’ (webapp project) as dependency when creating EAR. Click on Finish button.


Add the newly created “SpringSecureEAR” resource on the server (refer snapshot below)



After this click on Finish button and then start server.

  
To Test the Security, we need to hit the following URL:


This will send the request to our application, and a home page will be displayed as following:



Click on first link, this will trigger the Spring Security as request is for “/adminUser” page.

The inbuilt Login Page will be displayed for authentication of the user. Refer to the snapshot below:



Enter User as “admin” and Password as “admin” and submit the page.
As we have already defined a user tag in “spring-security.xml”, authentication is done by comparing what we entered on the login page with the details present in the XML file.

Upon successful authentication, the framework checks for authorization.
The admin user has the role “ROLE_ADMIN”, as already defined in the user tag in the spring-security.xml file. Refer to the snapshot below:

<user name="admin" password="admin" authorities="ROLE_ADMIN" />

As the authority of the admin user matches the role needed to access the “/adminUser” URL, the user is successfully authorized.
An authorization page will be displayed as per the return value specified in our “AdminController.java”.



On clicking the “Logout” link, the user is redirected to our controller. There we can build the logic to clear any session data, if available.



In case user is not authorized, then following page is displayed: