Posts Tagged ‘www’

Find me a Roof !

My last finals at university are passing by, and the time to choose a subject for my thesis is approaching.
Anyway, today it was time to present the final project for the “Web Information Management” class. A couple of colleagues and I developed a vertical search engine for real estate advertisements. Here are the slides we prepared for the presentation.




How to use XPath in Java

Need to use XPath in Java?
Here is a quick example, to get something from an XHTML file:

import javax.xml.xpath.*;
import org.w3c.dom.*;
import org.w3c.tidy.Tidy;
import java.io.FileInputStream;
import java.io.InputStream;

public class XPathExample {
    public static void main(String[] args) throws Exception {
        Document doc;
        // Let JTidy turn the (possibly untidy) XHTML into a DOM tree
        try (InputStream xhtml = new FileInputStream("/abs/path/to/file")) {
            Tidy tidy = new Tidy();
            tidy.setQuiet(true);
            tidy.setShowWarnings(false);
            doc = tidy.parseDOM(xhtml, null);
        }

        XPathFactory xpf = XPathFactory.newInstance();
        XPath xpath = xpf.newXPath();

        // "XPath Expr" is a placeholder: put your own expression here
        NodeList nodes = (NodeList) xpath.evaluate("XPath Expr", doc,
                                                   XPathConstants.NODESET);

        // nodes now contains nodes.getLength() nodes (at least 3 assumed below), e.g.
        System.out.println(nodes.item(0).getNodeValue());   // null for element nodes
        System.out.println(nodes.item(1).getTextContent());
        String firstChild = nodes.item(2).getChildNodes().item(0).getNodeValue();
    }
}

For more, take a look at the javax.xml.xpath package javadoc and at the NodeList interface.
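
If you just want to play with XPath expressions without installing JTidy, here is a minimal sketch that parses a well-formed XML string with the JDK's own DocumentBuilder instead. The class name XPathSketch, the selectText helper and the sample markup are made up for illustration, and this only works on well-formed input (real-world HTML usually still needs a cleaner such as JTidy).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathSketch {

    // Hypothetical helper: evaluates an XPath expression against a
    // well-formed XML string and returns the text of every matching node.
    static List<String> selectText(String xml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc,
                                                   XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Sample markup, invented for this example
        String xml = "<html><body><ul>"
                + "<li class=\"ad\">Flat in Rome</li>"
                + "<li class=\"ad\">House in Milan</li>"
                + "<li class=\"note\">Sold</li>"
                + "</ul></body></html>";
        // prints [Flat in Rome, House in Milan]
        System.out.println(selectText(xml, "//li[@class='ad']"));
    }
}
```

Looping up to nodes.getLength() instead of indexing item(0), item(1), item(2) directly avoids a NullPointerException when the expression matches fewer nodes than expected.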


EH Miracle on Thirty-Hack Street A&W out

Answers and winners for the Ethical Hacker Network Christmas contest Miracle on Thirty-Hack Street are out. Unfortunately my answer was not complete enough :D . Anyway, this is a nice chance to share something about privacy management for the users of the most famous social network.

This is the answer I submitted on Jan 6 2010:

Read more


Overview of the ExAlg Approach

In the Information Management for the Web class I’m currently attending, students are occasionally asked to review a related or interesting scientific publication. This time, since we had been introduced to the ROADRUNNER project maintained by the database group at Roma Tre University, a couple of colleagues and I volunteered to review an approach for Extraction of Structured Data from Web Pages developed at Stanford University by Arvind Arasu and Hector Garcia-Molina. Here is the presentation we prepared for the talk.



The original paper on this work can be reached here.


Download YouTube videos with a shell script

In my last post I wrote about extracting audio from videos in streaming formats, which people often get from content streaming services such as YouTube. As I said, those videos can be downloaded in several ways, which may differ depending on the hosting service. A pretty effective way is to use a browser plugin that shows the URL the Flash/HTML5 player downloads the video from, and lets you get the content from there. Anyway, just for fun, here is a small shell script that can be used to download video content from YouTube.

#!/bin/bash
#
################################################################
# --- YouTube Downloader ------------------------------------- #
# --- http://www.n0on3.net ----------------------------------- #
################################################################
#
if ! command -v wget &>/dev/null; then
	echo "> Please install wget =)!"; exit 1
fi
if [ $# -ne 1 ]; then
	read -r -p "> Enter Youtube video ID: " VID
else
	VID=$1
fi
read -r -p "> Do you want me to try to get the HD version ? [y|N]: " HD
case $HD in Y|y|yes) FMT=22 ;; *) FMT=18 ;; esac
OK=1; PAGE=$(mktemp)
# Fetch the watch page, then scrape the "t" signature and the title from it
wget -q -O "$PAGE" "http://www.youtube.com/watch?v=$VID"
SIG=$(grep -E -o "\"t\": \"[^\"]+\"" "$PAGE" | \
      sed 's/"//g' | awk '{print $2}')
URL="http://www.youtube.com/get_video?fmt=$FMT&video_id=$VID&t=$SIG"
TITLE=$(grep VIDEO_TITLE "$PAGE" | awk '{$1="";print $0}')
# $TITLE is left unquoted on purpose, so that echo squeezes extra whitespace
TITLE=$(echo $TITLE | sed 's/.\(.*\)../\1/' | sed 's/\\//g')
rm -f "$PAGE"
read -r -p "> Is the title \" $TITLE \" right for output file ? [Y|n]: " TOK
case $TOK in
	n|N|no) read -r -p "> Enter filename: " TITLE ;;
	*) ;;
esac
echo "> Saving file to $TITLE.mp4 ... "
if ! wget -q -O "$TITLE.mp4" "$URL" &>/dev/null; then
	echo "> Download failed!"
	OK=0; rm -f "$TITLE.mp4"
	if [ "$FMT" -eq 22 ]; then
		# Fall back to the non-HD format before giving up
		echo "> Maybe the HD version isn't available.."
		FMT=18; echo "> Downloading the non-HD version ..."
		URL="http://www.youtube.com/get_video?fmt=$FMT&video_id=$VID&t=$SIG"
		if wget -q -O "$TITLE.mp4" "$URL" &>/dev/null; then
			OK=1
		else
			echo "> Download failed!"; rm -f "$TITLE.mp4"
		fi
	fi
fi
if [ $OK -eq 1 ]; then
	echo "> Successfully Downloaded to $TITLE.mp4 !"
fi
#

It must be said that the YouTube terms of service, at point 6.C, state that by using YouTube you agree not to access the content with any technology other than the YouTube website player or other means explicitly authorized by YouTube. (That means, yeah, you really should not use this u.u” ).


Extract mp3 audio from streaming videos

Sometimes you may just want to get the soundtrack of a video. And you know, today most videos are available as streams on the web. There are several ways to download those videos, depending on the service that hosts them, and there are also many websites that explain how to do it. To give an example, the YouTube player retrieves a video by requesting a URL whose parameters include the ID of the video and a signature it obtained when the page was loaded. To emulate this behaviour and get the video to download yourself, you can e.g. use one of the appropriate Firefox plugins, or get the URL from one of the many websites that build it for you. Anyway, streaming videos usually come in .flv or .mp4 format. If you want to get the audio from an .flv video, you just need ffmpeg (the command below copies the audio stream as it is, without re-encoding, so it assumes the audio track is already MP3), running

ffmpeg -i "video.flv" -f mp3 -vn -acodec copy "audio.mp3"

If you want to get the .mp3 audio from an .mp4 video instead, you can use faad2 and lame to decode the .mp4 audio to .wav and re-encode it to .mp3.

faad -o - "video.mp4" | lame - "audio.mp3"

You can get faad2, lame and ffmpeg via apt or whatever package manager you use on Linux, and via MacPorts or Fink on Mac OS X.

