Program that reads web page content

Jsoup


Jsoup is a Java library for parsing HTML from a URL, file, or string, and for finding and extracting data from it.
The latest version of the jsoup jar at the time of writing is 1.8.1.
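
Before fetching a live page, it can help to see jsoup working on a plain string. The following is only a minimal sketch: the class name ParseString and the sample markup are made up for illustration, and it assumes the jsoup jar is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ParseString {
    // Hypothetical sample markup, used only for illustration.
    static final String HTML =
        "<html><head><title>Sample page</title></head>"
      + "<body><p>Hello, <a href=\"http://example.com/\">example</a>.</p></body></html>";

    public static void main(String[] args) {
        // Parse the HTML held in a string (no network access needed)
        Document doc = Jsoup.parse(HTML);

        // Read the contents of the <title> element
        System.out.println(doc.title());

        // Find every <a> element that has an href attribute
        for (Element link : doc.select("a[href]")) {
            System.out.println(link.attr("href") + " -> " + link.text());
        }

        // Extract the visible text of the <body> element
        System.out.println(doc.body().text());
    }
}
```

The same Document methods apply when the page comes from Jsoup.connect(...).get() instead of a string.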

Program that reads web page content


import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

public class ParseHTML {
    public static void main(String[] args) {
        try {
            // Fetch the page and parse it into a Document
            Document doc = Jsoup.connect("http://en.wikipedia.org/wiki/Abstraction_(computer_science)").get();

            // Extract the visible text of the <body> element
            String text = doc.body().text();

            // Output file; adjust the path for your system
            File file = new File("E://mahi.txt");

            // FileWriter creates the file if it does not exist;
            // try-with-resources closes the writer even if write() fails
            try (BufferedWriter bw = new BufferedWriter(new FileWriter(file))) {
                bw.write(text);
            }

            System.out.println("Done");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
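
FileWriter uses the platform default character encoding, so non-ASCII text from a page may be saved differently on different machines. As a possible variation, here is a sketch of the file-writing step with an explicit UTF-8 charset, using only the standard library. The class name SaveText, the helper save, and the temp-file name are assumptions for illustration, not part of the original program.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SaveText {
    // Writes text to the given path as UTF-8, creating or overwriting the file.
    static void save(Path target, String text) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            writer.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical output location in the system temp directory
        Path target = Paths.get(System.getProperty("java.io.tmpdir"), "page-text.txt");
        save(target, "caf\u00e9");
        System.out.println("Wrote " + Files.size(target) + " bytes to " + target);
    }
}
```

In the program above, the string passed to save would be the result of doc.body().text().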


