Last week I was working on a web crawler that crawls 250,000 URLs each night. We had 5 client crawlers running and connecting to a master MSSQL 2013 database. FusionGronker in ##coldfusion on freenode recommended that I try FusionReactor, a tool that allows you to monitor EVERYTHING that's going on in your Java (ColdFusion) servers in real time.
After many hours of debugging, I noticed that every hour or so a thread was getting stuck in a native method call. There was no way to kill the thread, since it never returned to Java code where I could reach it. I used the stack trace tool that comes with FusionReactor, and after several hours of crawling I noticed more and more of these threads stuck in a [Native Method]. It looked something like this: java.net.SocketInputStream.socketRead0(SocketInputStream.java:???)[Native Method]. Several lines below that one, there would be a com.microsoft.sqlserver.jdbc.sqlserverdriver frame. After 10+ of these threads got stuck, the server would just crash and burn. I knew that I was creating a lot of SQL connections, but nothing I hadn't done before.
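FusionReactor showed these stuck threads in its stack trace view. Without it, a rough plain-Java approximation is to scan Thread.getAllStackTraces() for threads whose top frame is the native SocketInputStream.socketRead0 call. This is a minimal sketch, not how FusionReactor works; the class and method names are hypothetical, and the frame names assume an older JDK (JDK 13 reimplemented the classic socket classes, so newer JVMs show different frames).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: lists threads currently blocked in a native socket read.
// Assumes an older JDK where blocking reads sit in java.net.SocketInputStream.socketRead0.
public class StuckThreadFinder {

    // True if the top frame is the native SocketInputStream.socketRead0 call.
    public static boolean isBlockedInSocketRead(StackTraceElement[] stack) {
        if (stack.length == 0) {
            return false;
        }
        StackTraceElement top = stack[0];
        return top.isNativeMethod()
                && top.getClassName().equals("java.net.SocketInputStream")
                && top.getMethodName().equals("socketRead0");
    }

    // True if any frame belongs to the Microsoft SQL Server JDBC driver package.
    public static boolean mentionsSqlServerDriver(StackTraceElement[] stack) {
        for (StackTraceElement frame : stack) {
            if (frame.getClassName().startsWith("com.microsoft.sqlserver.jdbc")) {
                return true;
            }
        }
        return false;
    }

    // Names of all live threads whose top frame is the native socket read.
    public static List<String> findStuckThreads() {
        List<String> names = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            StackTraceElement[] stack = entry.getValue();
            if (isBlockedInSocketRead(stack)) {
                String label = entry.getKey().getName();
                if (mentionsSqlServerDriver(stack)) {
                    label += " (SQL Server JDBC driver below)";
                }
                names.add(label);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> stuck = findStuckThreads();
        System.out.println("Threads blocked in socketRead0: " + stuck.size());
        for (String name : stuck) {
            System.out.println("  " + name);
        }
    }
}
```

Run periodically inside the same JVM, this would only report the count growing; it cannot free those threads, since a thread blocked in a native socket read does not respond to Thread.interrupt().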
I realized that Railo could also create an MSSQL datasource using jTDS, an open source JDBC driver, instead of the Microsoft driver. I switched the datasource over to jTDS, and it worked great! I don't know what my next step would have been had this driver not worked.
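For reference, a jTDS datasource points at SQL Server with a URL of the form jdbc:jtds:sqlserver://host[:port][/database][;property=value...] and the driver class net.sourceforge.jtds.jdbc.Driver. The sketch below only assembles such a URL; the host, database name, and the socketTimeout value are placeholders I made up, not the settings used for this crawler. socketTimeout is a jTDS property (in seconds) that bounds how long a network read may block; check the jTDS documentation before relying on it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that builds a jTDS JDBC URL for SQL Server.
public class JtdsUrl {
    // Driver class a datasource would load when using jTDS.
    public static final String DRIVER_CLASS = "net.sourceforge.jtds.jdbc.Driver";

    // Format: jdbc:jtds:sqlserver://host:port/database;key=value;...
    public static String build(String host, int port, String database, Map<String, String> props) {
        StringBuilder url = new StringBuilder("jdbc:jtds:sqlserver://")
                .append(host).append(':').append(port)
                .append('/').append(database);
        for (Map.Entry<String, String> e : props.entrySet()) {
            url.append(';').append(e.getKey()).append('=').append(e.getValue());
        }
        return url.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("socketTimeout", "300"); // placeholder value, in seconds
        System.out.println(build("dbhost", 1433, "crawler", props));
    }
}
```

A LinkedHashMap keeps the properties in insertion order, so the generated URL is predictable.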
Our crawler successfully crawls 250,000 URLs in under 5 hours, all on Railo! I used FW/1 for the crawler administration tool.