Implementation of crawler4j



I am attempting to get the basic form of crawler4j running as seen here. I have modified the first few lines by defining the rootFolder and numberOfCrawlers as follows:

public class BasicCrawlController {

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.println("Needed parameters: ");
            System.out.println("\t rootFolder (it will contain intermediate crawl data)");
            System.out.println("\t numberOfCralwers (number of concurrent threads)");
            return;
        }

        /*
         * crawlStorageFolder is a folder where intermediate crawl data is
         * stored.
         */
        String crawlStorageFolder = args[0];

        args[0] = "/data/crawl/root";

        /*
         * numberOfCrawlers shows the number of concurrent threads that should
         * be initiated for crawling.
         */
        int numberOfCrawlers = Integer.parseInt(args[1]);

        args[1] = "7";

        CrawlConfig config = new CrawlConfig();

        config.setCrawlStorageFolder(crawlStorageFolder);

No matter how I define them, I still receive the error:

Needed parameters: 
 rootFolder (it will contain intermediate crawl data)
 numberOfCralwers (number of concurrent threads)

I think that I need to "set the parameters in the Run Configurations" window, but I do not know what that means. How can I properly configure this basic crawler to get it up and running?

2012-04-03 20:54
by KDEx



After you compile the program with the javac command, you need to run it by typing the following:

java BasicCrawlController "arg1" "arg2"

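The error is telling you that you aren't supplying args[0] and args[1] when you run the program: the args.length != 2 check at the top of main prints the message and returns before any later line runs. Also, why is there an args[1] = "7"; after you have already read the number of crawlers from args[1]? Assigning to args at that point does not change numberOfCrawlers.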
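To make the ordering concrete, here is a minimal sketch that mirrors the first lines of your main. The class name ArgsOrderDemo and the resolve method are made up for illustration and are not part of crawler4j. It shows that the length check fires before anything else, and that assigning to args[0] after copying it into a local variable leaves that variable unchanged.

```java
public class ArgsOrderDemo {

    // Hypothetical helper mirroring the start of main in the question.
    static String resolve(String[] args) {
        if (args.length != 2) {
            // With no arguments, execution stops here, as in the question.
            return "Needed parameters";
        }
        String crawlStorageFolder = args[0]; // copies the reference held in args[0]
        args[0] = "/data/crawl/root";        // too late: the local variable is unchanged
        return crawlStorageFolder;
    }

    public static void main(String[] args) {
        System.out.println(resolve(new String[0]));
        System.out.println(resolve(new String[] {"/tmp/crawl", "7"}));
    }
}
```

Run without arguments, this prints "Needed parameters" and then "/tmp/crawl"; the hard-coded "/data/crawl/root" never reaches the returned value.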

For what it looks like you are trying to do, remove the argument check at the top of main (the if (args.length != 2) block), because you are using hard-coded values anyway. Then set the crawlStorageFolder String to your directory path and numberOfCrawlers to 7, and you won't have to specify command-line parameters. If you would rather use command-line parameters, get rid of the hard-coded values and specify them on the command line.
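As a rough sketch of both options in one place, the class below uses the command-line values when exactly two are given and otherwise falls back to the hard-coded ones from the question. CrawlSettings and fromArgs are hypothetical names chosen for this example, not crawler4j API. The fallback is a variation on the advice above rather than something crawler4j requires.

```java
public class CrawlSettings {

    // Hard-coded defaults taken from the question; adjust to your machine.
    static final String DEFAULT_STORAGE_FOLDER = "/data/crawl/root";
    static final int DEFAULT_NUMBER_OF_CRAWLERS = 7;

    final String crawlStorageFolder;
    final int numberOfCrawlers;

    CrawlSettings(String crawlStorageFolder, int numberOfCrawlers) {
        this.crawlStorageFolder = crawlStorageFolder;
        this.numberOfCrawlers = numberOfCrawlers;
    }

    // Use the two command-line values if present, otherwise the defaults.
    static CrawlSettings fromArgs(String[] args) {
        if (args.length == 2) {
            return new CrawlSettings(args[0], Integer.parseInt(args[1]));
        }
        return new CrawlSettings(DEFAULT_STORAGE_FOLDER, DEFAULT_NUMBER_OF_CRAWLERS);
    }

    public static void main(String[] args) {
        CrawlSettings settings = fromArgs(args);
        System.out.println("crawlStorageFolder = " + settings.crawlStorageFolder);
        System.out.println("numberOfCrawlers = " + settings.numberOfCrawlers);
        // With crawler4j on the classpath, these values would feed the call
        // already shown in the question:
        // config.setCrawlStorageFolder(settings.crawlStorageFolder);
    }
}
```

Running java CrawlSettings with no arguments uses the defaults; running java CrawlSettings "/tmp/crawl" "3" uses the supplied values instead.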

2012-04-03 21:15
by user1288802
That worked. I had to hard-code the directory and just get rid of the exception catch altogether. Thanks - KDEx 2012-04-03 21:55