Overview of the ExAlg Approach

In the Information Management for the Web class I’m currently attending, students are occasionally asked to review a related or interesting scientific publication. This time, since we had been introduced to the ROADRUNNER project maintained by the database group at Roma Tre University, a couple of colleagues and I volunteered to review an approach to extracting structured data from Web pages, developed at Stanford University by Arvind Arasu and Hector Garcia-Molina. Here is the presentation we prepared for the talk.



The original paper on this work can be found here.


Sushi (27-03-2010)

I’m getting better at making sushi u.u









Great dinner ^^


Himitsu Bako

This is an amazing gift I just received for my birthday. It looks just like a wooden box, and many people might not find it interesting at all, but it’s actually a Himitsu-Bako, a traditional Japanese puzzle box that can only be opened by applying a specific sequence of moves to its pieces.

Here is a video that shows what it’s like:

Of course this won’t always stop someone from getting at what’s inside: the box can easily be broken with a hammer or smashed against something hard, and the number of possible move combinations isn’t that large for someone with plenty of time. But if whoever wants to open it doesn’t want to break it, or doesn’t want the owner to know it has been opened, and can’t try many combinations, it’s pretty effective. Obviously, publishing a video that shows how to open this one is not the best way to keep it closed, but I really like this object and wanted to share something about it.



NTFS read/write on Mac OS X

I’m restoring my OS X machine, and it seems Snow Leopard is able to read and write NTFS filesystems without ntfs-3g or any other third-party software. It just mounts NTFS volumes read-only by default. Searching the web, it turns out that for each volume you can get its name or UUID by plugging in the drive and running:

diskutil info /Volumes/volume_name | egrep "Volume (UUID|Name)"

and then add one of the following lines to your /etc/fstab:

UUID=your_volume_uuid none ntfs rw
LABEL=your_volume_name none ntfs rw
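If you do go the fstab way, the two steps can be glued together. Here is a minimal sketch (the ntfs_fstab_line name is made up) that reads the output of diskutil info on stdin and prints the matching fstab line, preferring the UUID and falling back to the volume name. It assumes the name contains no spaces, since fstab fields are separated by whitespace:

```shell
#!/bin/bash
# Hypothetical helper: turn `diskutil info` output (on stdin) into an
# /etc/fstab line that mounts the NTFS volume read/write.
# Assumption: the volume name has no spaces (fstab splits on whitespace).
ntfs_fstab_line() {
	local info uuid name
	info=$(cat)
	# diskutil prints "Key: value" lines; keep the value, trimmed
	uuid=$(printf '%s\n' "$info" | sed -n 's/^[[:space:]]*Volume UUID:[[:space:]]*//p' |
		sed 's/[[:space:]]*$//' | head -n 1)
	name=$(printf '%s\n' "$info" | sed -n 's/^[[:space:]]*Volume Name:[[:space:]]*//p' |
		sed 's/[[:space:]]*$//' | head -n 1)
	if [ -n "$uuid" ]; then
		echo "UUID=$uuid none ntfs rw"
	elif [ -n "$name" ]; then
		echo "LABEL=$name none ntfs rw"
	else
		echo "no Volume UUID or Volume Name found" >&2
		return 1
	fi
}
```

Used as `diskutil info /Volumes/volume_name | ntfs_fstab_line`, the printed line can be checked by eye and then appended with `| sudo tee -a /etc/fstab`. The UUID is preferred because it doesn’t change if you rename the volume.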

Doing that every time you plug in a new volume gets boring quickly…
You might think of editing ‘/System/Library/Filesystems/ntfs.fs/Contents/Info.plist’ to add the ‘-o rw’ option to the FSMountArguments element, but that setting seems to be completely ignored…

Well, you know what? You can at least wrap /sbin/mount_ntfs so that it always adds the rw option:

  • move /sbin/mount_ntfs to something like /sbin/mount_ntfs_original
  • write a shell script at /sbin/mount_ntfs, like:

      #!/bin/bash
      exec /sbin/mount_ntfs_original -o rw "$@"
  • give it the right permissions (it must be executable, e.g. chmod 755) and enjoy ;)
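The three steps above can be sketched as one function. The install_mount_wrapper name and its directory argument are my own additions, so that it can be tried on a scratch directory with a fake mount_ntfs before touching /sbin (where it would have to run as root):

```shell
#!/bin/bash
# Hypothetical helper: wrap DIR/mount_ntfs so it always mounts read/write.
# DIR defaults to /sbin; pass a scratch directory to try it safely first.
install_mount_wrapper() {
	local dir=${1:-/sbin}
	local tool="$dir/mount_ntfs" orig="$dir/mount_ntfs_original"
	# A second run would move the wrapper over the real binary: refuse
	if [ -e "$orig" ]; then
		echo "$orig already exists, the wrapper seems installed" >&2
		return 1
	fi
	# 1. keep the original tool under a new name
	mv "$tool" "$orig" || return 1
	# 2. the wrapper adds "-o rw" and passes every other argument through
	printf '#!/bin/bash\nexec "%s" -o rw "$@"\n' "$orig" > "$tool" || return 1
	# 3. make the wrapper executable
	chmod 755 "$tool"
}
```

Using exec replaces the wrapper process with the real mount_ntfs, so its exit status goes straight back to whoever asked for the mount.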

Snow in Rome

Well, this deserves a post. Check out what the view was like this morning.
I don’t remember it ever snowing like this in Rome before :)

Snow in Rome 12/02/2010

Download YouTube videos with a shell script

In my last post I wrote about extracting audio from videos in streaming formats, which people often get from streaming services such as YouTube. As I said, those videos can be downloaded in several ways, which may vary depending on the hosting service. A pretty effective way is to use a browser plugin that shows the URL the Flash/HTML5 player downloads the video from and lets you fetch the content from there. Anyway, just for fun, here is a small shell script that can be used to download videos from YouTube.

#!/bin/bash
#
################################################################
# --- YouTube Downloader ------------------------------------- #
# --- http://www.n0on3.net ----------------------------------- #
################################################################
#
# wget fetches both the watch page and the video stream
if ! command -v wget &>/dev/null; then
	echo "> Please install wget =)!"; exit 1
fi
# Take the video ID from the first argument, or ask for it
if [ $# -ne 1 ]; then
	read -p "> Enter Youtube video ID: " VID
else	VID=$1; fi
read -p "> Do you want me to try to get the HD version? [y|N]: " HD
# fmt=22 asks for the HD MP4 stream, fmt=18 for the standard MP4 one
case $HD in Y|y|yes) FMT=22;; N|n|*) FMT=18;; esac
OK=1; PAGE=$(mktemp)
# Remove the temporary copy of the watch page when the script exits
trap 'rm -f "$PAGE"' EXIT
if ! wget -q -O "$PAGE" "http://www.youtube.com/watch?v=$VID"; then
	echo "> Could not fetch the video page!"; exit 1
fi
# The "t" token found in the watch page is passed to get_video as t=
SIG=$(grep -Eo "\"t\": \"[^\"]+\"" "$PAGE" | \
     sed 's/"//g' | awk '{print $2}')
URL="http://www.youtube.com/get_video?fmt=$FMT&video_id=$VID&t=$SIG"
# Default file name: the VIDEO_TITLE value, without quotes, comma and backslashes
# ($TITLE is left unquoted on purpose: it trims the space left by awk)
TITLE=$(grep VIDEO_TITLE "$PAGE" | awk '{$1="";print $0}')
TITLE=$(echo $TITLE | sed 's/.\(.*\)../\1/' | sed 's/\\//g')
read -p "> Is the title \"$TITLE\" right for the output file? [Y|n]: " TOK
case $TOK in 	n|N|no) read -p "> Enter filename: " TITLE ;;
		Y|y|yes|*) ;;
esac; echo "> Saving file to $TITLE.mp4 ... "
if ! wget -q -O "$TITLE.mp4" "$URL" &>/dev/null; then
	echo "> Download failed!"
	OK=0; rm -f "$TITLE.mp4"
	# The HD stream may not exist for this video: retry with fmt=18
	if [ "$FMT" -eq 22 ]; then
		echo "> Maybe the HD version isn't available.."
		FMT=18; echo "> Downloading the non-HD version ..."
		URL="http://www.youtube.com/get_video?fmt=$FMT&video_id=$VID&t=$SIG"
		if wget -q -O "$TITLE.mp4" "$URL" &>/dev/null; then
			OK=1
		else
			echo "> Download failed!"; rm -f "$TITLE.mp4"
		fi
	fi
fi
if [ $OK -eq 1 ]; then echo "> Successfully Downloaded to $TITLE.mp4 !"; fi
#
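If you want to check what the two parsing pipelines in the script actually pull out of the page, here they are wrapped as functions (extract_token and extract_title are names I made up), so you can feed them a single line of text. The line shapes in the comments are only assumptions inferred from what the patterns expect, not copied from a real YouTube page:

```shell
#!/bin/bash
# Hypothetical helpers mirroring the SIG and TITLE pipelines of the script.

# Reads text on stdin and prints the value of each "t": "..." pair,
# e.g.  ..."t": "abc123"...  ->  abc123  (assumed page format)
extract_token() {
	grep -Eo '"t": "[^"]+"' | sed 's/"//g' | awk '{print $2}'
}

# Reads text on stdin and prints the title from the VIDEO_TITLE line,
# assumed to look like   'VIDEO_TITLE': 'Some title',
extract_title() {
	local t
	# drop the first field (the key)
	t=$(grep VIDEO_TITLE | awk '{$1="";print $0}')
	# unquoted on purpose: word splitting trims the leading space left by awk,
	# then sed strips the opening quote, the closing quote plus comma,
	# and any backslashes used to escape quotes inside the title
	echo $t | sed 's/.\(.*\)../\1/' | sed 's/\\//g'
}
```

For example, `echo "'VIDEO_TITLE': 'My first video'," | extract_title` prints `My first video`. If the page layout changes, these are the two spots in the script that break first.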

It must be said that YouTube’s Terms of Service, at point 6.C, state that by using YouTube you agree not to access its content through any technology other than the YouTube website player or other means explicitly authorized by YouTube. (That means, yeah, you should really not use this u.u)


