Memory exhaustion, long garbage collection time and the Hibernate query plan cache

Recently, we released a new Spring Boot webservice. It uses Hibernate for database queries and persistence. (Spring Boot 2.1.10 and Hibernate 5.3.13). Within a week of release, Dynatrace alerted us to memory exhaustion and long garbage collection time on all the deployments. The app was started with the -XX:+HeapDumpOnOutOfMemoryError command line option so heap dumps were generated automatically. While the heap dumps were analysed for root cause of the memory problem, the apps were restarted with double the memory limits to keep things running.

I loaded the heap dump with Eclipse memory analyzer and its Leak Suspects Report. The tool identified one problem suspect, which used 321.5MB out of 360MB of heap memory.

One instance of “org.hibernate.internal.SessionFactoryImpl” loaded by “org.springframework.boot.loader.LaunchedURLClassLoader @ 0xe0035958” occupies 337,113,320 (89.29%) bytes. The memory is accumulated in one instance of “org.hibernate.internal.util.collections.BoundedConcurrentHashMap$Segment[]” loaded by “org.springframework.boot.loader.LaunchedURLClassLoader @ 0xe0035958”.

When I looked at the ‘Accumulated Objects in the Dominator Tree’, I noticed there were a large number of Bounded Concurrent Hash Map entries. Clicked on an individual hash map entry and use the list objects with incoming reference option, I could see that the segments of the hash map are query plan cache objects.

To look at what is stored inside the query plan cache, I used the OQL function in the memory analyzer with the follow query.

SELECT l.query.toString() 
FROM INSTANCEOF
org.hibernate.engine.query.spi.QueryPlanCache$HQLQueryPlanKey l

By inpsecting the strings returned by the query, I spotted the problem. The hibernate query cache plan cached a lot of identical queries with the numeric primary key parameter in the where clause. Like this

generatedAlias0.numberDetails as generatedAlias59 left join fetch
generatedAlias59.ddiRange6 as generatedAlias60 where generatedAlias0.id=500685

Reading the hibernate documentation, I learned that by default Criteria queries use bind parameters for literals that are not numeric only. My application used only the numeric primary key for access. Hibernate was therefore caching a new plan for every individual row in the database. As a result, the application was performing worse than not having a query cache plan at all!

To fix this, set the parameter hibernate.criteria.literal_handling_mode to BIND. This instructs hibernate to use bind variables for any literal value. In Spring Boot, the variable is spring.jpa.properties.hibernate.criteria.literal_handling_mode.

To confirm that the query cache plan problem was fixed, I used jconsole to take a heap dump

Running the OQL query again and I can see the query plans have changed to using bind parameters in the where clause.

generatedAlias0.numberDetails as generatedAlias59 left join fetch
generatedAlias59.ddiRange6 as generatedAlias60 where generatedAlias0.id=:param0

Indexed computed column for frequenty used XML values

If you query an XML element or attribute value frequently, you can promote the value into a column and create an index on the promoted column. This makes the code easier to read and queries faster.

As an example, you have a XML column called requestXML in the table Orders. Within the requestXML is a XPath element /Order/accountId which your application queries frequently. If you try to create a computed column for the element value as follow, SQL server will return the error message ‘Xml data type methods are not supported in computed column definitions. Create a scalar user-defined function to wrap the method invocation’.

ALTER TABLE dbo.Orders
  ADD accountId AS requestXML.value('(/Order/accountId)[1]', 'varchar(6)')

This is SQL server way of saying you need to create a function

create function fnGetAccountIdFromRequestXML(@request xml)
 returns varchar(6)
 with schemabinding
 begin
     declare @accountId varchar(6)
     select @accountId = @request.value('(/Order/accountId)[1]', 'varchar(6)')
     return @accountId 
 end

Don’t forget the ‘with schemabinding’ part if you want to create an index on the computed column. Now you are ready to create the computed column and then the index

ALTER TABLE dbo.Orders
ADD accountId AS dbo.fnGetAccountIdFromRequestXML(requestXML);

CREATE NONCLUSTERED INDEX IX_Orders_accountId  ON  dbo.Orders(accountId)

The accountId element in requestXML has now been promoted to an indexed computed column. It can now be used in queries just like any normal SQL database table columns.

Computing the prefix from a alphanumeric varchar column in SQL Server

Recently, I needed to do some data analysis for a project I was working on. The order references are alphanumeric strings, with a variable length prefix that indicates the origin system. For example, order references AB123456789 and MOB98765 have prefixes AB and MOB.

I have pulled the data into a SQL database table for analysis. I wanted to use the prefix in SQL group by and aggregrate functions. To accomplish this, I created a computed column for the order prefix.

ALTER TABLE datatable
ADD order_ref_prefix 
AS (stuff(order_ref, patindex('%[0-9]%', order_ref), 10000, ''))

This computed column make use of two SQL server functions patindex and stuff. Patindex returns the starting position of the first occurrence of a pattern in a string. In this example, it returns the first occurence of a number in the order reference. Stuff replaces part of a string with another. Here, we replace the substring between the start of the first number in the order reference to position 10000 with an empty string. This effectively truncates the order reference from the first number found.

Once I have the order prefix in its own column, I can use them in queries like these:

select order_ref_prefix, count(*) as occurrence
from datatable
group by order_ref_prefix
order by occurrence desc
select *
from database
where order_ref_prefix = 'MOB' and <other predicates>

Super handy!!

Logging outbound HTTP requests from JAX-RS client

In order to track down a bug, I needed to log HTTP requests sent from one of our web services to another third party web service. (We hosted the service, but the software was not developed in house).

Our web service was written in resteasy, a framework I was not especially familiar with. (I prefer to use the Spring stack, and always create new web services using Spring Boot). The code to call the third party web service looked like this

import javax.ws.rs.client.Invocation.Builder;
builder.buildPost(Entity.form(form)).invoke();

Surprisingly, there wasn’t an obvious way to get the request body sent. From various stackoverflow Q&A, the way to log JAX-RS outbound client requests was to create an implementation of ClientRequestFilter, and register it as a Provider in the container.

@Provider
public class MyClientRequestLoggingFilter implements ClientRequestFilter {
  private static final Logger LOG = LoggerFactory.getLogger(MyClientRequestLoggingFilter.class);	
    @Override
    public void filter(ClientRequestContext requestContext) throws IOException {
      LOG.info(requestContext.getEntity().toString());
    }
}

You then configure your web.xml to scan for providers

<context-param>
  <param-name>resteasy.scan.providers</param-name>
  <param-value>true</param-value>
</context-param>

There are quite a few warnings that because the function ClientRequestContext.getEntity() returns an Object, the default toString() may not work as expected. Unmarshalling of the object is required to log the request body.

After banging my head against a wall for an afternoon, I decided to take a completely different approach to the problem. I googled on how to enable request logging in apache httpd instead. This turned out to be a much more straightforward way to achieve what I needed. The module mod_dumpio can used to dump all input and output requests to the server into a log file. You need mod_dumpio present in the apache httpd installation. (In windows, check to see if mod_dumpio.so is in c:\apache-install-dir\modules). Stop the service, edit the httpd.conf file to include the following lines

LoadModule dumpio_module modules/mod_dumpio.so

ErrorLog "logs/error.log"
LogLevel debug
DumpIOInput On
DumpIOOutput On
LogLevel dumpio:trace7

The ErrorLog and LogLevel lines are already present in my httpd.conf. I changed the LogLevel to debug, and added the follwoing three lines to turn on the dumpio module. After server restart, all HTTP requests and responses were successfully logged to the file logs/error.log.

Lesson learnt here – if an approach turned out to be more complicated than expected, it’s worth taking a step back and rethink.

Using two datasources in a Spring Boot application

Using one datasource only in a Spring Boot application is very straight forward. However, using multiple datasources in an application is anything but! It took me quite a bit of googling and fiddling to find a solution that worked.

To use two datasources, you need to set one up as primary. The second datasource will then become the secondary. You set a datasource as the primary by using the primary attribute. Below is an example using XML based configuration

<bean id="greenDataSource" primary="true" class="org.apache.commons.dbcp2.BasicDataSource">
    <property name="driverClassName" value="com.microsoft.sqlserver.jdbc.SQLServerDriver"/>
    <property name="url" value="${db.green.url}"/>
    <property name="username" value="${db.green.username}"/>
    <property name="password" value="${db.green.password}"/>
</bean>

Then define the secondary datasource like this:

   
<bean id="purpleDataSource" class="org.apache.commons.dbcp2.BasicDataSource">
    <property name="driverClassName" value="com.microsoft.sqlserver.jdbc.SQLServerDriver"/>
    <property name="url" value="${db.purple.url}"/>
    <property name="username" value="${db.purple.username}"/>
    <property name="password" value="${db.purple.password}"/>
</bean>

You can then wire them into your Java classes using the @Autowired and @Primary annotations:

@Repository
public class AwesomeDaoImpl implements AwesomeDao {
    private JdbcTemplate greenJdbcTemplate;
    private JdbcTemplate purpleJdbcTemplate;

    @Autowired
    @Primary
    public void setGreenDataSource(DataSource greenDataSource) {
        this.greenJdbcTemplate = new JdbcTemplate(greenDataSource);
    }

    @Autowired
    public void setPurpleDataSource(DataSource purpleDataSource) {
        this.ipdcJdbcTemplate = new JdbcTemplate(purpleDataSource);
    }
}

I haven’t figured out how to plumb in more than two datasources without using JNDI. If JNDI is available, then your Spring Boot application can access all the JDNI datasources using the @Resource annotation.

@Repository
public class ColourDaoImpl implements ColourErrorDao {
    private JdbcTemplate jdbcTemplate;

    @Resource(mappedName = "java:jboss/datasources/Green")
    public void setGreenDataSource(DataSource greenDataSource) {
        this.jdbcTemplate = new JdbcTemplate(greenDataSource);
    }
}

Dropping unknown default constraints in SQL server

For releases, we had to provide SQL scripts to install database changes. This had to be done outside of SQL server management studio and run on multiple environments. Occasionally, I had to drop unnamed constraints in the script. (Most developers wrote their install scripts with named constraints to avoid the difficulty). Every time I encountered this problem, I googled and followed this excellent blog post by Rob Farley. It gave the SQL commands to query for the name of an unnamed constraint, given the table and column name. It stopped short of giving you the SQL code to actually drop the constraint.

So here is my take:

declare @df_name varchar(max)
select @df_name = d.name from sys.tables t
    join sys.default_constraints d
        on d.parent_object_id = t.object_id
    join sys.columns c
        on c.object_id = t.object_id
        and c.column_id = d.parent_column_id
    where t.name = 'some_db_table'
    and c.name = 'some_column_in_table'

if @df_name is not null 
begin
	declare @sql varchar(max) = 'alter table some_db_table ' + ' drop constraint ' + @df_name 
	exec (@sql)
end

Java 8 Date-Time API and good old java.util.Date

Am I the only one who prefer Joda Time over the new Java 8 java.time package? I find the official Oracle documentation poor, and the API is not as intuitive.

No matter which high level datetime library is used in an application, be it java.util.Calendar, Joda Time or java.time, developers still often have to work with old fashion java.util.Date. This is because java.sql.Date is a subclass of java.util.Date and therefore most, if not all, data access layer code expects or returns java.util.Date.

To convert a datetime such as 2016-11-21 09:00 to java.util.Date is very simple in Joda Time.

// from Joda to Date
DateTime dt = new DateTime();
Date jdkDate = dt.toDate();

// from Date to Joda
dt = new DateTime(jdkDate);

Java 8 java.time has two separate ways to represent time – human time vs machine time. Classes such as LocalDateTime and LocalDate represents human time. The Instant class represents machine time. Conversions between date time and java.util.Date must be done via an Instant.

// from LocalDateTime to Date
LocalDateTime dt = LocalDateTime.of(2016, 11, 21, 09, 00);
Instant i = dt.atZone(ZoneOffset.UTC).toInstant());
Date d = Date.from(i);

// from Date to LocalDateTime
i = d.toInstant();
dt = LocalDateTime.ofInstant(i, ZoneOffset.UTC);

You can also compare the documenation of the two libraries on interoperability with java.util.time. The Joda Time one is much shorter and easier to read.

Mule flow variables to JSON payload for REST requests

I was working on a mule flow that submit a static JSON request to a REST endpoint. (The variable part was the id in the URL). My first attempt was to set the JSON request directly using <set-payload>.

<set-variable variableName="orderId" value="#[payload.id]" doc:name="set orderId"/>
<set-payload value="{'note' : 'Order auto-approved by X', 'sendEmail' : true}" doc:name="Set Payload"/>
<http:request config-ref="WS_CONFIG" path="/order/#[flowVars.orderId]/approve" method="POST" doc:name="REST approve request">
  <http:request-builder>
    <http:header headerName="Content-Type" value="application/json"/>
  </http:request-builder>
</http:request>

However, mule refused to submit this request, complaining about ‘Message payload is of type: String’. Most pages I found from googling suggested using the DataWeave Transformer. It can transform data to and from a large range of format, including flow variables into JSON. But the DataWeave Transformer was only available in the enterprise edition. After a frustrating hour of more googling and testing various different transformer, I found another way to achieve this easily by using a expression transformer:

<set-variable variableName="orderId" value="#[payload.id]" doc:name="set orderId"/>
<expression-transformer expression="#[['note' : 'Order auto-approved by X', 'sendEmail' : true]]" doc:name="set payload"/>
<json:object-to-json-transformer doc:name="Object to JSON"/>
<http:request config-ref="WS_CONFIG" path="/order/#[flowVars.orderId]/approve" method="POST" doc:name="REST approve request">
  <http:request-builder>
    <http:header headerName="Content-Type" value="application/json"/>
  </http:request-builder>
</http:request>

The flow I worked on didn’t need the order id in the JSON request. But you can reference flow variables in the payload like this:

<set-variable variableName="orderId" value="#[payload.id]" doc:name="set orderId"/>
<expression-transformer expression="#[['note' : 'Order auto-approved by X', 'id':flowVars.orderId, 'sendEmail' : true]]" doc:name="set payload"/>