Last week I was working on a web crawler that needs to get through 250,000 URLs each night.  We had 5 client crawlers running, all connecting to a master MSSQL 2013 database.  FusionGronker in ##coldfusion on freenode recommended that I try FusionReactor, a tool that lets you monitor EVERYTHING that’s going on in your Java (ColdFusion) servers in real time.

After many hours of debugging, I noticed that every hour or so a thread was getting stuck in a native method call.  There was no way to kill the thread, since it never came back to Java code where I could communicate with it.  Using the stack trace tool that comes with FusionReactor, I saw that after several hours of crawling more and more threads were stuck in a [Native Method].  It looked something like this: java.net.SocketInputStream.socketRead0(SocketInputStream.java:???)[Native Method].  Several lines below that one, there would be a com.microsoft.sqlserver.jdbc.sqlserverdriver frame.  Once 10 or more of these threads were stuck, the server would just crash and burn.  I knew that I was creating a lot of SQL connections, but it was nothing that I hadn’t done before.
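If you don't have FusionReactor handy, the same kind of check can be roughed out in plain Java.  This is only a sketch of the idea, not what FusionReactor does internally: the class name StuckThreadFinder and the method findThreadsInSocketRead are made up for this example, and it assumes the stuck frame is named exactly as in the trace above (newer JVMs may show a different frame for socket reads).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class StuckThreadFinder {

    // Hypothetical helper: returns the names of threads whose top stack frame
    // is the native java.net.SocketInputStream.socketRead0 call.
    static List<String> findThreadsInSocketRead(Map<Thread, StackTraceElement[]> traces) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> entry : traces.entrySet()) {
            StackTraceElement[] frames = entry.getValue();
            if (frames.length == 0) {
                continue; // thread has no stack to inspect
            }
            StackTraceElement top = frames[0];
            if (top.isNativeMethod()
                    && "java.net.SocketInputStream".equals(top.getClassName())
                    && "socketRead0".equals(top.getMethodName())) {
                names.add(entry.getKey().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Thread.getAllStackTraces() takes a snapshot of every live thread in this JVM.
        for (String name : findThreadsInSocketRead(Thread.getAllStackTraces())) {
            System.out.println("Blocked in socketRead0: " + name);
        }
    }
}
```

A thread sitting in socketRead0 is not necessarily stuck, since any normal blocking read shows the same frame.  The useful signal is the same thread names showing up in that state across snapshots taken minutes apart.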

I realized that Railo could also create an MSSQL datasource using jTDS, an open source JDBC driver, instead of the Microsoft driver, so I switched to that.  This worked great!  I don’t know what my next step would have been had this driver not worked.
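For anyone wiring up jTDS outside of the Railo administrator, here is a rough sketch of what the connection URL looks like.  The host, port, and database name are placeholders, the class and method names are made up for this example, and the socketTimeout property is an assumption on my part: I did not set it in the crawler, and it should be checked against the jTDS documentation before relying on it.

```java
public class JtdsUrlExample {

    // Hypothetical helper: builds a jTDS-style JDBC URL for SQL Server.
    // socketTimeoutSeconds is passed as the jTDS "socketTimeout" connection
    // property (an assumption for this sketch, not something from the crawler).
    static String buildJtdsUrl(String host, int port, String database, int socketTimeoutSeconds) {
        return "jdbc:jtds:sqlserver://" + host + ":" + port + "/" + database
                + ";socketTimeout=" + socketTimeoutSeconds;
    }

    public static void main(String[] args) {
        // Placeholder values; swap in your own server details.
        System.out.println(buildJtdsUrl("db.example.com", 1433, "crawler", 60));
    }
}
```

The idea behind a socket timeout is that a read that never returns eventually fails with an exception instead of holding a thread forever, which is the failure mode described above.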

Our crawler successfully crawls 250,000 URLs in under 5 hours, all on Railo!  I used FW/1 for the crawler administration tool.


This blog is meant for Corporate Zen employees to write about a variety of topics. Posts may contain information and views not directly related to Corporate Zen!