Friday, March 2, 2012

Practical challenges of profiler integration with Java/J2EE applications

Profiling is a technique used to identify code-level performance bottlenecks in an application and tune application code accordingly. In the context of Web-based and multi-tiered applications, the profiling needs to be done in the Application Server/Middleware tier as most of the business logic processing is centered there. The focus of this paper is to briefly describe the components and process of profiling; share some practical challenges of integrating profiling tools with Java applications specifically; and suggest some options to overcome these issues so that profiling and tuning activities can be carried out effectively. 
The term “Profiling” refers to understanding and analyzing an application's programs in terms of the time spent in various methods, calls and/or sub-routines, as well as the memory usage of the code. This technique is used to find the expensive methods/calls/subroutines of a given application program and tweak them for performance optimization and tuning. The amount of time the code takes to run is also referred to as “Latency”; in typical multi-tiered applications, the time spent on the server side is called “Server-Side Latency”.
Profiling can also be used to gain insights into the application's memory usage -- such as how objects/variables are created in the memory provided by the runtime (JVM Heap, .NET CLR, Native Memory, or others) -- which can then be used to optimize the memory footprint of the application. For multi-threaded applications, profiling can also uncover issues related to thread synchronization and show the status of each thread.
The objective of this article is to enumerate some of the most common practical challenges in integrating Profiling Tools with Java/J2EE Applications (standalone or Application Server-based) and discuss workarounds that can help significantly reduce turnaround time required for profiling integration activity in the Application Performance Management process.
For instance, in a 3-tiered Web application composed of a Web Server, an Application Server and a Database Server, consider an online business use case or business transaction that is slow to respond to an end user accessing it through a browser. The high end-to-end response time of the transaction could be attributed to Client-Side Rendering, Web Server Request/Response handling, Business Logic Processing on the Application Server, or Query Execution on the Database Server. To find the tier where the latency is high, profiling can be used to uncover where the server-side time is being spent.
Types of Profiling
Profiling can be broadly categorized into two types: CPU Profiling and Memory Profiling.
CPU Profiling
CPU Profiling mainly focuses on identifying the high-latency methods of the application code; in a Java context, it provides the call graph for a given business process with a break-up of the time spent in each method. Method latency can be measured in two ways: Elapsed Time and CPU Time.
Elapsed Time is the total time taken by a method. This includes the time spent within the method, its sub-methods, and any time spent in Network I/O or Disk I/O. Essentially, the elapsed time is the duration between the entry and the exit of a method as measured by a wall clock. For instance, if a method involves the execution of business logic in Java code as well as SQL calls to a database (which are Network I/O), the elapsed time covers the total time for all of that to run.
Elapsed Time = CPU Time + DISK I/O + Network I/O 
CPU Time refers to the time a method/function/routine spends exclusively on the CPU executing its logic. It does not include time spent waiting on Network or Disk I/O or any other delays. For the same example mentioned above, CPU time does not include the time spent in database execution, because that is Network I/O.
CPU Time = Time exclusively spent on CPU
(without I/O or any Interrupts Delay)
In general, it is the Elapsed Time that will be of most interest; however, both measures provide valuable information about the application's processing.
If the CPU time is very high for a given method/routine/function, it indicates that the method is processing-intensive and involves little I/O.
Conversely, if the Elapsed Time is very high and the CPU Time is low for a given method/routine/function, it indicates that the method has significant I/O activity.
In extreme cases, the CPU time and the wall clock time can differ by a very large factor, especially if the executing thread has a low priority, since the OS can preempt the thread multiple times during the method's execution.
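The difference between the two measurements can be observed directly from Java code. The sketch below uses the standard ThreadMXBean API to compare wall-clock (elapsed) time against CPU time for a block that mixes computation with a sleep; the sleep stands in for I/O wait, which counts toward elapsed time but not CPU time:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class LatencyDemo {
    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean tmx = ManagementFactory.getThreadMXBean();

        long wallStart = System.nanoTime();
        long cpuStart  = tmx.getCurrentThreadCpuTime();

        // Computation: consumes both CPU time and elapsed time.
        long sum = 0;
        for (int i = 0; i < 10_000_000; i++) sum += i;

        // Sleep stands in for I/O wait: consumes elapsed time only.
        Thread.sleep(200);

        long cpuMs  = (tmx.getCurrentThreadCpuTime() - cpuStart) / 1_000_000;
        long wallMs = (System.nanoTime() - wallStart) / 1_000_000;

        // Elapsed time will exceed CPU time by roughly the 200 ms sleep.
        System.out.println("Elapsed: " + wallMs + " ms, CPU: " + cpuMs + " ms, sum=" + sum);
    }
}
```

This is exactly the distinction a CPU profiler reports per method; a method dominated by sleep/wait shows high elapsed time but low CPU time.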
Memory Profiling
Memory profiling refers to analyzing how objects are created in the memory provided by the runtime (such as the JVM for Java/J2EE applications and the .NET CLR for .NET applications), and thereby optimizing the memory footprint of the application. Memory profiling can also be used to find critical issues such as memory leaks in the application.
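As an illustration of the kind of defect memory profiling uncovers, the hypothetical snippet below (class and method names are illustrative, not from any real application) shows a classic Java memory-leak pattern: entries added to a static collection are never removed, so they stay reachable and the heap footprint grows with every call. A memory profiler would show this collection and its contents dominating the heap over time:

```java
import java.util.ArrayList;
import java.util.List;

public class LeakyCache {
    // Static collection: entries stay reachable for the life of the JVM.
    private static final List<byte[]> CACHE = new ArrayList<>();

    public static void handleRequest() {
        // 1 MB allocated per request and never released -- a leak that a
        // memory profiler would surface as ever-growing heap usage.
        CACHE.add(new byte[1024 * 1024]);
    }

    public static int cachedEntries() {
        return CACHE.size();
    }
}
```

In a heap snapshot, the tell-tale sign is an object count for one class that only ever increases across successive snapshots taken under steady load.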
Generic Architecture of Profilers
This section describes the generic architecture of a profiling tool. Typically, any profiling tool has two major components: an Agent and a Console.
An Agent, sometimes also called a Probe or Profiler, is a component that runs on the server (typically an Application Server) where the code is deployed and running. The agent is attached to the JVM, collects performance metrics using the JVMPI/JVMTI interfaces, and pushes the data to a pre-configured port on the host machine where the application is running.
A Console is a Java program that captures the data from the pre-configured port and displays the metrics in a dashboard view. It is used to view the metrics and also to capture snapshots for offline analysis. The diagram below illustrates the profiler's architecture in general.
Figure. 1. Generic Architecture & Components of Java Profilers
Process Steps in Profiling
In the context of Java Profilers used with Java Application Servers or standalone Java programs, the Java Agent needs to be attached to the JVM so it can profile the application. This process is referred to as “Agent Integration” or “Profiler Integration”. It is done by adding specific arguments to the JVM in the Application Server's startup scripts, or when invoking a standalone Java program.
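As a sketch of what such agent integration typically looks like, the fragment below adds the standard JVM agent switches to a startup script; the option names (-javaagent for Java-based agents, -agentpath for native JVMTI agent libraries) are standard JVM switches, but the agent paths and the port option are placeholders, not those of any specific profiler:

```shell
# In the Application Server's startup script (or the standalone launch command):
# Java-based agent (java.lang.instrument):
JAVA_OPTS="$JAVA_OPTS -javaagent:/opt/profiler/lib/agent.jar"

# Or a native JVMTI agent library, with a hypothetical port option:
JAVA_OPTS="$JAVA_OPTS -agentpath:/opt/profiler/lib/libagent.so=port=2006"

echo "$JAVA_OPTS"
```

The exact switch and options to use are documented by each profiler vendor; the point is only that integration happens through JVM arguments at startup.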
Figure. 2. Process Steps in Profiling Activity
The sections below focus on profiling applications running in Java-based Application Servers.
Practical Challenges in Profiler Integration & Remediation
In this section, I would like to present the practical challenges of integrating Java Profiling Agents with Java/J2EE Applications and Java Application Servers, and suggest ways to overcome these issues so that profiling activity can be carried out effectively. The issues are broadly divided into several categories, as illustrated in Fig. 3 below. The various categories and remedies are discussed in detail in the coming sections.
Figure. 3. Profiling Integration Issues

-XX Debugging/Diagnostic Java Parameters
Integration of Java profilers with Java-based Application Servers might not be stable or successful if the JVM arguments contain -XX arguments, especially some of the “Debugging” or “Diagnostic” flags. It should be understood that JVM options specified with -XX are not stable and are not recommended for casual use; they are also subject to change without notice.
Hence, an attempt is made here to list some specific -XX flags to look out for while integrating profilers with Java Application Servers, in case any unexpected behavior is observed with Profiler Agents.

JVM -XX Flag: -XX:+AggressiveOpts
Description: Turns on “point performance optimizations” that are expected to be ON by default in SUN JDK releases 1.5 and above. This flag is meant to try the JVM's latest performance tweaks; however, a word of caution: the option is experimental, and the specific optimizations it enables can change from release to release, so it should be evaluated prior to deploying the application to Production. (NOTE: this should be used with -XX:+UnlockDiagnosticVMOptions and -XX:-EliminteZeroing.)
Issue: JVM might not start up with a Java Agent/Probe; the application might crash.
Recommended Solution: Disable the option -- remove the switch from the JVM arguments list in order to enable the profiling activity.

JVM -XX Flag: -XX:-EliminteZeroing
Description: Disables the initialization (zeroing) of newly created character array objects. Typically used along with -XX:+UnlockDiagnosticVMOptions if -XX:+AggressiveOpts is to be used.
Issue: JVM might not start up with a Java Agent/Probe; the application might crash.
Recommended Solution: Remove the switch from the JVM arguments list while profiling.

JVM -XX Flag: -XX:+UnlockDiagnosticVMOptions
Description: Any “diagnostic” flag of the JVM must be preceded by this flag.
Issue: JVM might not start up with a Java Agent/Probe; the application might crash.
Recommended Solution: Remove the diagnostic switches (and this flag) from the JVM arguments list while profiling.

JVM -XX Flag: -XX:+ExtendedDTraceProbes
Description: Enables performance-impacting “dtrace” probes that can be used to monitor JVM internal state and activities as well as the Java application that is running. (Introduced in JDK 1.6; relevant to Solaris 10 and above only.)
Issue: If this option is already enabled, adding another Profiling Agent makes the JVM behavior unpredictable.
Recommended Solution: Disable the option while a separate profiler is attached.

JVM Flag: -agentlib:hprof (HPROF)
Description: HPROF is a JVM native agent library that is dynamically loaded through a command-line option at JVM startup and becomes part of the JVM process. By supplying HPROF options at startup, users can request various types of heap and/or CPU profiling features from HPROF.
Issue: If this option is already enabled, adding another Profiling Agent makes the JVM behavior unpredictable.
Recommended Solution: Remove the HPROF switch from the JVM arguments while a separate profiler is attached.
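For reference, HPROF is typically enabled with JVM arguments such as the following (standard HPROF options from the JDK tooling documentation; the application class name is a placeholder):

```
# CPU profiling by sampling:
java -agentlib:hprof=cpu=samples,interval=20,depth=3 MyApp

# Heap allocation-site profiling:
java -agentlib:hprof=heap=sites MyApp
```

If any such switch is present in an Application Server's startup scripts, it should be removed before attaching another profiling agent.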
Non-tuned application
In certain scenarios, where the application is not performance-tuned or the business scenarios have high response times, attaching a profiler to some applications (Application Servers) will cause time-outs, and the Application Server sometimes cannot even start up. This is attributed to the additional overhead profiling tools create due to byte-code instrumentation; the overhead increases linearly with the methods' execution time, pushing the overall business scenario execution time even higher and resulting in time-outs.
Recommended Solution
The recommendation for such scenarios is to identify the high-latency code components by introducing Java code into the application to capture the time taken in the crucial methods. However, this approach has the disadvantage that the critical classes and methods involved in a business transaction must be chosen manually and instrumented with logging statements, which requires a code change and redeployment.
The following is a code snippet that illustrates this concept:

class A {
                public void method1() {
                                long startTime = System.currentTimeMillis();
                                // ... business logic of method1 ...
                                long endTime = System.currentTimeMillis();
                                // assuming a logging utility is available in the application
                                <logUtility>.log("Time spent in " + this.getClass().getName()
                                                + ".method1 in msec: " + (endTime - startTime));
                }
}

Security privileges
This section highlights typical security or access-privilege issues encountered while integrating Java profiler agents with Java applications. The list below describes the issues and the recommendations to overcome them.

Issue: Owner permissions for the Agent installation folder
Description: If the Agent/Probe installation folder is not owned by the user under which the Java process is started, the Probe/Agent libraries cannot be loaded by the Java process due to insufficient security privileges.
Recommended Solution: The agent installation folder on the file system should recursively have “owner permissions” matching those of the Java process. For instance, if the Java process is started by the “wasadmin” user, the agent installation folder and all its sub-directories/folders should have “wasadmin” as the owner. (NOTE: On UNIX-based platforms, the primary user ID and primary group should be the same as those of the Java process.)
E.g.: chown -R <uid>:<Primary Group> <Agent Installation Folder>   (UNIX-based platforms)

Issue: Execute permissions for the Agent installation folder
Description: If the Agent folder has only Read and Write permissions, the Agent/Probe libraries cannot be executed within the Java process.
Recommended Solution: Recursively assign proper access permissions to the Probe installation folder.
E.g.: chmod -R 775 <Probe Installation Directory>   (UNIX/LINUX/SOLARIS/AIX/HP-UX platforms); Read/Write/Execute permissions on the WINDOWS platform

Issue: Java Applications/Application Servers enabled with Java Security
Description: For Java applications that have Java Security enabled, profilers might not start due to missing directives in the server.policy file.
Recommended Solution: Grant all security permissions to the Agent/Probe's jar file by adding directives to the server.policy file. E.g., when integrating the HP Diagnostics Java Probe with WAS 6.1, the profiler could not start and gather all the required metrics until the following was added to the server.policy file:
grant codeBase "file:/opt/MercuryDiagnostics/JavaAgent/DiagnosticsAgent/lib/../lib/probe.jar" { permission java.security.AllPermission; };
(NOTE: For standalone applications with Java Security enabled, identify the properties file in which the required security directive needs to be added.)

Application Caching
For certain applications where caching is used (either custom caching or distributed cache products) to improve performance, profilers might not work as expected. This is attributed to the increased overhead of tracking and probing every object held in the cache.
Since some Profilers/Probes ship with default instrumentation points for layers like JSP/Servlet, EJB and DAO, for standard Java frameworks like Struts and Spring, and for ORM tools like Hibernate, the overhead will be very high if the application under consideration uses huge object caches, even without adding any custom instrumentation points.
Recommended Solution
In such scenarios, it is suggested to follow one of the options highlighted below:
Disable all the default instrumentation points that come with the Probe/Profiler and observe the application behavior under the profiler.
Disable caching only while profiling the application, to find any bottlenecks in the Java code and other layers such as the database or external systems.
Alternately, reduce the cache size so that the impact of object tracking and probing is reduced (this works in most cases, since it is desirable to design any cache to be configurable in terms of its size and cache-invalidation policies).

Default instrumentation of COTS products running in J2EE Servers
If the application that needs to be profiled includes COTS products that run in the JVM process space, the JVM might not start up when invoked with a profiling agent.
I would like to share my experience with an industry-leading Rules Engine product hosted in a WebSphere Application Server instance, which could not be started successfully after integrating a Java profiler. With the profiler attached, the application server used to crash, showing an issue in the Java CompilerThread, which is part of a JVM library; there was no sign of any error coming from the profiler's libraries. Without the profiler, no JVM crash was observed at any time.
Recommended Solution
Ensure that the COTS product running in Java process does not have any probing or monitoring mechanism enabled by DEFAULT. If enabled, try to disable and run with the profiler.
Check whether the COTS product is compatible with the profiler in use. It is often quite difficult to find documentation confirming the product's compatibility with the profiler; hence, it is suggested to reach out to the tool vendor or COTS vendor for the required support.

Operating System Specific Issues
In certain cases, the file descriptor limit set on UNIX-based Operating Systems (especially UNIX/SOLARIS/LINUX/HP-UX) will have an impact on the profiling process.
Since profiling loads additional libraries and binaries, application servers running with a Probe/Profiler attached might not start up properly if the number of file descriptors allowed is too low.
Recommended Solution
It is recommended to check the current file descriptor limit (using ulimit -n) and increase the number of file descriptors allowed on the system. Note that this is not a mandatory change whenever a profiler is used with an application; rather, it is one of the checklist items to verify in case of unexpected errors while working with profiler-attached applications.
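A quick way to inspect the limits on a UNIX-like system is shown below; whether and how far the limit should be raised depends on the environment:

```shell
# Current soft limit on open file descriptors for this shell/process
ulimit -n

# Hard limit (the ceiling up to which the soft limit may be raised)
ulimit -Hn
```

To raise the limit persistently on Linux, per-user entries in /etc/security/limits.conf are the usual mechanism; other UNIX variants have their own equivalents, so the OS documentation should be consulted.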
