The following is a paragraph I wrote for a recent support engagement. At first glance it may not seem like much and may even seem a bit arrogant, but I think in many ways it sums up the diagnostic process of troubleshooting an Agile environment. However, it does omit one critical element of troubleshooting.
“I bring over 7 years of working with Agile systems in a variety of environments and operating systems combined with over 30 years of computer, network and systems experience on operating systems ranging from MS-DOS, ALL flavors of Windows, OS X, BSD Unix, Linux, Oracle Solaris and IBM AIX. I will perform the following tasks to further assess the current state of your Agile system:
• Install Oracle Support Tools
• Evaluate the output of Oracle Support Tools
• Review Agile logs
I will then combine and summarize all the information gathered with my experience and develop a plan for addressing system anomalies.”
Before you can perform any troubleshooting, you must clearly define and understand the issue you are investigating. You must consider the various symptoms as reported by system users or administrators and from that data form a hypothesis that describes, at least in your mind, the actual problem. So often in support we are deluged with symptoms of a problem that can cloud our recognition and drastically reduce our chances of success.
Sorting through the symptoms and seeing the pattern is not a skill that is easily learned. It is the cumulative result of education, experience and a dash of intuition.
The process can be enhanced by applying a few simple rules of diagnosis, adapted here from the process employed by physicians.
Gather as much data as you can. Ignore anecdotal misinformation and focus on what is reproducible, measurable and documented. This involves gathering log files and documenting steps to reproduce errors. Anyone who has worked in support for any length of time knows the difficulty of diagnosing intermittent or non-reproducible errors. This article ignores those errors, as they present a unique challenge to support professionals.
The skills required at this stage of the process come down to knowledge of the system: where the logs are, how the system communicates across the network (for these purposes we will assume a networked application environment, since stand-alone environments are currently the exception), how the system interacts with the host OS and what debugging tools are available.
Typical log file repository on an Agile system
To properly diagnose any problem in an Agile environment you MUST be familiar with the logging system and where the log files are located. Invest some time to locate the extensive logs produced by Agile and familiarize yourself with stdout.log, the father of all Agile logs and the first place to look when an error is reported. The evidence will, of course, not always be in this log (that would be too easy), but it is a good place to start. Other Agile logs carry more task- or subsystem-specific information, so learn them and use them as appropriate.
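As a rough illustration of a first pass over a log such as stdout.log, the following Python sketch counts lines that contain common severity keywords or Java exception names and records where they occur. The keyword list and the function names are assumptions made for this example, not part of Agile or any Oracle tool. Nothing here depends on the exact format of the Agile logs, so adjust the pattern to match what your logs actually contain.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed keywords for illustration; real logs may use other markers.
SEVERITY_PATTERN = re.compile(r"\b(ERROR|FATAL|SEVERE|\w*Exception)\b")


def summarize_log(lines):
    """Return (counts per keyword, list of (line_number, text)) for matching lines."""
    counts = Counter()
    hits = []
    for number, line in enumerate(lines, start=1):
        match = SEVERITY_PATTERN.search(line)
        if match:
            # Only the first keyword found on each line is counted.
            counts[match.group(1)] += 1
            hits.append((number, line.rstrip("\n")))
    return counts, hits


def summarize_log_file(path):
    """Read a log file leniently (bad bytes are replaced) and summarize it."""
    with Path(path).open(encoding="utf-8", errors="replace") as handle:
        return summarize_log(handle)
```

Running summarize_log_file against a copy of the log, rather than the live file, gives a quick count of error types and the line numbers worth reading first. The location of the log depends on your installation.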
Essential to gathering information effectively is the toolset you use to comb through the acquired data or to summarize data from multiple sources. Fortunately, Oracle offers a couple of very good tools for such analysis.
First among these is ACollect. In its default form, ACollect gathers a snapshot of your system summarizing memory usage, configurations and potential improvements. ACollect can also be run in interactive mode wherein you start ACollect, it pauses and then you recreate the error or issue and restart ACollect to create a differential diagnosis of the system state. You can find out more about ACollect and the interactive mode as well as get the most recent version on Oracle Support by searching for this document: Use ACollect to Collect Info and to Diagnose Problems for Agile PLM (Doc ID 1518524.1).
Example of ACollect Summary page
ACollect is also used extensively to troubleshoot LDAP integration issues and issues related to notifications. It is a good investment of time to become familiar with installing and running ACollect before you need it.
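The idea behind a differential diagnosis of system state can be illustrated independently of ACollect. The Python sketch below compares a "before" and an "after" snapshot and reports what was added, removed or changed. It does not read or reproduce ACollect output. The snapshots are plain dictionaries, and the key names in the usage example are hypothetical.

```python
def diff_snapshots(before, after):
    """Compare two flat dicts of settings or metrics and report the differences."""
    added = {key: after[key] for key in after.keys() - before.keys()}
    removed = {key: before[key] for key in before.keys() - after.keys()}
    changed = {
        key: (before[key], after[key])
        for key in before.keys() & after.keys()
        if before[key] != after[key]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical values captured before and after reproducing an issue.
before = {"heap_used_mb": 512, "ldap_connected": True, "open_sessions": 14}
after = {"heap_used_mb": 1890, "ldap_connected": True, "last_error": "timeout"}
result = diff_snapshots(before, after)
```

Anything that appears under "changed" or "added" after the error is recreated is a candidate for closer inspection. Values that stayed the same can usually be set aside.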
For the more adventurous there is also this document on collecting and analyzing core dumps:
How to analyze core dumps. (Doc ID 1487728.1).
Using core dumps in Agile troubleshooting is not for the faint of heart or those without strong technical skills, and it can be safely delegated to Oracle Support when necessary. It is actually rare to need this depth of analysis to resolve the majority of system-related issues, but you should be familiar with the process for when it is needed.
Speaking of Oracle Support, this vast resource is often overlooked by support professionals when working on an Agile system issue. That is unfortunate, because the Oracle Support website not only contains a vast array of knowledge articles that can be easily searched by keyword or phrase, it is also the gateway to some of the most knowledgeable support personnel I have ever worked with. Learning to use Oracle Support resources effectively, including the Knowledgebase and the Support Request system, will expand your system knowledge and result in faster, more accurate resolution of Agile system issues. A note of caution is in order here: I have too often seen inexperienced support personnel search the Oracle Knowledgebase, identify a system patch (or patches) and immediately begin patching their system, with unpredictable results. I make it a policy never to apply a patch unless I either have experience with it or have consulted with Oracle Support, and I ALWAYS apply patches in a test environment before deploying them to production.
After you have gathered all the relevant information, sorted out the chaff and developed a reasonable hypothesis, the next step is to identify a solution and then test it in a controlled environment. The importance of thoroughly testing ANY patch or configuration change cannot be emphasized enough. It is not uncommon to create another, perhaps worse, issue when attempting to ‘fix’ an existing error or issue. Agile is a complex system with myriad data interactions defined by user operations, database-defined relationships and procedures as well as the data itself.
If you got the correct solution on the first try, congratulations. If not, go back and take a look at the data you gathered and question your own assumptions to see if a different solution is possible. Remember, diagnosing system issues is a process where the journey is as important as the destination.
Author Bob McDuffee, Certified Ethical Hacker (CEH), has over 30 years of experience and is a System Engineer for Zero Wait-State. He is responsible for installing software for clients and overseeing hosted and virtual environments. He provides configuration information for customers and debugs hardware issues both for clients and for the company internally. His experience includes implementing, troubleshooting and upgrading PDM systems on Linux, Solaris and Windows servers utilizing both WebLogic and Oracle Application Server.