Genetic Shaping Layout

It’s been a long time since I decided to try using genetic algorithms to optimize a web page’s style, and here’s my first attempt. The idea is simple: generate some random styles and let the user choose the one they prefer, then use genetic rules to combine the chosen CSS with the others. Admittedly, with a ‘population’ of only six elements it doesn’t have real genetic value, but the principle is the same.
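As a rough sketch of the idea (not the code behind the page), a style can be treated as a flat set of CSS properties, combined by crossover and occasionally mutated. The property list, the value pools, the 10% mutation rate and all function names below are assumptions made for illustration.

```javascript
// Hypothetical gene pool: each CSS property can take one of a few values.
const GENE_POOL = {
  "background-color": ["#ffffff", "#222222", "#f4e9d8", "#dde8f0"],
  "color": ["#000000", "#eeeeee", "#333366"],
  "font-family": ["serif", "sans-serif", "monospace"],
  "font-size": ["14px", "16px", "18px"],
};

// Pick a random element of an array using the supplied random source.
function pick(values, rng) {
  return values[Math.floor(rng() * values.length)];
}

// Build one random individual (a style object).
function randomStyle(rng = Math.random) {
  const style = {};
  for (const prop of Object.keys(GENE_POOL)) {
    style[prop] = pick(GENE_POOL[prop], rng);
  }
  return style;
}

// Uniform crossover: each property is inherited from either parent.
function crossover(parentA, parentB, rng = Math.random) {
  const child = {};
  for (const prop of Object.keys(GENE_POOL)) {
    child[prop] = rng() < 0.5 ? parentA[prop] : parentB[prop];
  }
  return child;
}

// Mutation: with probability `rate`, replace a property with a random value.
function mutate(style, rate = 0.1, rng = Math.random) {
  const mutated = { ...style };
  for (const prop of Object.keys(GENE_POOL)) {
    if (rng() < rate) mutated[prop] = pick(GENE_POOL[prop], rng);
  }
  return mutated;
}

// The style the user chose survives; every other slot becomes a mutated
// child of the chosen style and the style that used to occupy that slot.
function nextGeneration(population, chosenIndex, rng = Math.random) {
  const chosen = population[chosenIndex];
  return population.map((other, i) =>
    i === chosenIndex ? chosen : mutate(crossover(chosen, other, rng), 0.1, rng)
  );
}

// Render a style object as the body of a CSS rule.
function toCss(style) {
  return Object.entries(style).map(([p, v]) => `${p}: ${v};`).join(" ");
}
```

Starting from `Array.from({ length: 6 }, () => randomStyle())`, each click on a preferred style would call `nextGeneration(population, index)` and re-render the six previews with `toCss`.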

The problem with this kind of implementation is that you need a human to select the best generated styles, which rules out large populations and many rounds of selection. One possibility, however, would be to implement this server side, taking advantage of the selections made by multiple users.

Here’s the link.

Implementation of a K Means Clustering to classify documents by language

Recently I’ve been interested in machine learning, and I’ve made some sample implementations to better understand the subject. In particular, I implemented a simplified version of the K-Means Clustering classifier and then decided to apply the algorithm to a more practical task.

I’ve used K-Means Clustering to classify texts by the language they’re written in. In brief, you provide several texts to the algorithm, decide how many groups you want them split into, and run it. It partitions the texts according to the relative frequency of the letters in each one, trying to recognize some structure in the different languages. It’s not perfect, but it works, and there’s a live version to try here.

The longer the texts, the more accurate the algorithm will be. With short texts the result is quite random.
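To make the "relative frequency of the letters" concrete, here is a minimal sketch of how a text could be turned into a point that a clustering algorithm can work on. It assumes only the 26 unaccented letters a to z are counted, which may differ from what the live version does; the function names are made up for this example.

```javascript
// Turn a text into a 26-element vector: the share of each letter a-z
// among all the letters found. Anything else (digits, spaces, accented
// characters) is ignored in this simplified version.
function letterFrequencies(text) {
  const counts = new Array(26).fill(0);
  let total = 0;
  for (const ch of text.toLowerCase()) {
    const index = ch.charCodeAt(0) - 97; // 'a' has char code 97
    if (index >= 0 && index < 26) {
      counts[index] += 1;
      total += 1;
    }
  }
  return counts.map((c) => (total === 0 ? 0 : c / total));
}

// Euclidean distance between two frequency vectors of the same length.
function distance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += (a[i] - b[i]) ** 2;
  }
  return Math.sqrt(sum);
}
```

A short text contains few letters, so each single letter moves the vector a lot, which is one way to see why short inputs give noisy results.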

Graphical representation of a sample K Means Clustering classifier

Moving on to the second lesson of this tutorial, I’ve learnt about the K Means Clustering classifier. Basically, we give the algorithm some points in space and it partitions them into K different sets. The algorithm is really easy; I suggest you read the tutorial for further information.

The only thing I want to explain here is that, given a certain dataset (in our case a set of points), you should already know how many sets to partition it into. Otherwise, the algorithm will reach a solution that may be inaccurate. To better understand this, try my little implementation (the link is below): make 6 groups of close points, then run the algorithm with K different from 6. You’ll see why it’s important to have an accurate guess of the K value.
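For reference, a compact sketch of the usual K-Means loop (assign each point to its nearest centroid, move each centroid to the mean of its points, repeat until nothing changes). This is an illustrative version, not the code behind the linked page; here the caller passes the starting centroids, so K is simply their number, and the function names are assumptions.

```javascript
// Squared Euclidean distance between two points given as arrays.
function squaredDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return sum;
}

// Index of the centroid closest to a point.
function nearestIndex(point, centroids) {
  let best = 0;
  for (let i = 1; i < centroids.length; i++) {
    if (squaredDistance(point, centroids[i]) < squaredDistance(point, centroids[best])) {
      best = i;
    }
  }
  return best;
}

// K-Means: K is the number of initial centroids supplied by the caller.
function kMeans(points, initialCentroids, maxIterations = 100) {
  let centroids = initialCentroids.map((c) => c.slice());
  let labels = new Array(points.length).fill(-1);
  for (let iter = 0; iter < maxIterations; iter++) {
    // Assignment step: label every point with its nearest centroid.
    const newLabels = points.map((p) => nearestIndex(p, centroids));
    const changed = newLabels.some((label, i) => label !== labels[i]);
    labels = newLabels;
    if (!changed) break; // converged
    // Update step: move each centroid to the mean of its members.
    centroids = centroids.map((old, k) => {
      const members = points.filter((_, i) => labels[i] === k);
      if (members.length === 0) return old; // empty cluster: leave it in place
      return old.map((_, d) => members.reduce((s, p) => s + p[d], 0) / members.length);
    });
  }
  return { centroids, labels };
}
```

Calling it on two well-separated groups with a single starting centroid lumps everything into one set, which is the kind of inaccurate partition a wrong K produces.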

I modified my recent implementation of the K Nearest Neighbour to use this algorithm.

Here’s the link.

Graphical representation of a sample K Nearest Neighbour classifier

Today I read the first lines of a promising tutorial about machine learning. It starts by introducing the subject and showing a first JavaScript example of a classifier. I’ve never read anything about this before, so my knowledge is still very limited (yes, I stopped before the end of the first lesson because I wanted to implement this). For technical details, I refer you to the tutorial.

Anyway, what I’ve understood so far is that classifiers are an important part of machine learning. Classifiers are algorithms meant for “recognizing” (classifying) objects by some of their features. You feed them a known dataset (already classified) and hope they will be able to classify any new object you throw at them by comparing its features with the known features in the dataset. I won’t go technical on this (I can’t yet); I suggest you read the linked tutorial if you’re really interested.

There are plenty of classifiers, but one of the simplest is the K Nearest Neighbour Classifier. It works by representing the n features (which must be numeric) on the axes of an n-dimensional space. When you give it an element to classify, it finds the K nearest known elements (from the given dataset) and determines which class occurs most often among these K elements. The element is then assumed to belong to this class, and thus is classified.
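The description above fits in a few lines. This is a minimal sketch for the two-feature case (x and y), assuming each known element is an object with `x`, `y` and a `label`; those field names and the tie-breaking rule are choices made for this example.

```javascript
// Classify a point by majority vote among its k nearest known elements.
// On a tie, the class whose first neighbour is closest wins, because
// votes are counted from the nearest neighbour outwards.
function knnClassify(dataset, point, k = 5) {
  const neighbours = dataset
    .map((item) => ({
      label: item.label,
      dist: Math.hypot(item.x - point.x, item.y - point.y),
    }))
    .sort((a, b) => a.dist - b.dist)
    .slice(0, k);

  // Count how many of the k neighbours belong to each class.
  const votes = new Map();
  for (const n of neighbours) {
    votes.set(n.label, (votes.get(n.label) || 0) + 1);
  }

  // Return the class with the most votes.
  let bestLabel = null;
  let bestCount = 0;
  for (const [label, count] of votes) {
    if (count > bestCount) {
      bestLabel = label;
      bestCount = count;
    }
  }
  return bestLabel;
}
```

With a handful of "red" points near the origin and "blue" points far away, `knnClassify(dataset, { x: 1, y: 1 }, 3)` returns the class of the nearby group.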

I wanted to implement my own version of this simple algorithm, so I wrote a little JavaScript app which takes some input points (x and y are the numeric features), each with a color (the known class), then generates new random elements (x and y) and classifies them with K Nearest Neighbour (with K=5 for now, but I’m changing that often), coloring the points (putting them in a class). After some time, the entire space ends up colored in a way that depends on the dataset (your original input). That’s not very useful, but it’s surely fun. At least it has been for me.

Notice that at each step this implementation uses every element on the canvas as its dataset, even those generated randomly and classified in previous steps. This is of course not very clever for this kind of classifier, but it makes the result less predictable, and better fits my purposes (I have no purposes).

Here’s the link, and here’s a screenshot:
[Screenshot: knearest_example]

Why unencrypted wireless networks are bad

Often, when I talk to people about home wifi and security, I hear something like this: “Why should I encrypt my home wireless network? I don’t mind sharing my internet connection; I’m ok with it as long as I don’t need my whole bandwidth”. And I can’t say that’s entirely wrong.

But what these people don’t think about, and it’s not obvious to those who don’t know how it works, is the whole question of security: when you connect to a wifi network, you exchange data over the air with an access point, which is the only device supposed to receive and process it.

But when you send data through your wireless card, you’re just broadcasting it over the air, and every device close enough to your computer which is capable of receiving wireless data could potentially receive it.

As you can imagine, when you establish a connection through an access point, you’re sending data but you’re also receiving data from it. That means every wireless card is capable of receiving the data of a wireless connection. Obvious.

But then, how come I can surf the Internet without seeing the traffic of all the other people on the network? Of course, the wireless protocol guarantees that only the data meant for my wireless card will be processed, ignoring all the packets sent to others. That’s clear and reasonable.

But what you should ask now is: who guarantees that the wireless protocol is working that way?

Nice question. It’s the operating system. It speaks directly to the hardware, whose job is to receive bits correctly and not much more. The card then hands the received bytes to the operating system, which interprets them and decides what to do. Operating systems are usually built to keep people from messing with the hardware itself, which is usually a good idea, so the protocols are deeply integrated with them. This is where you come in and say: “The hardware is mine and I want to do what I want with it”. That’s why you should use Linux. Linux is free, and so are you when you run it on your computer hardware. Linux is programmed to work as you’d expect, but it always lets you do what you want if you know how (and have the right permissions).

That means yes, you can actually receive the data packets other people are broadcasting, as long as you’re close enough to the source of the wireless signal and you have a wireless card capable of doing that (most cards will work, but some cards block this possibility in hardware).

Now that you understand that you can receive other people’s data, let’s go back to the encryption problem. Of course, since you’re broadcasting data over the air, it can always be received by someone else’s wireless card. But if you’re connected to an unencrypted network, you are also sending data in clear. That means you’re broadcasting everything you send to the network to potentially anyone, and they can read it in clear. The funny thing is that someone who is only receiving data can’t be noticed, since they’re not transmitting anything themselves. They don’t even need to be connected to the same network; they just have to listen on the right “channel”.

Fortunately, you’re not beaten yet in this privacy war. If you’re connected to an unencrypted wireless network but you’re using an encrypted service, such as https, you’re still transmitting data in clear, but that data is https data, which has already been encrypted by the https protocol that you and the endpoint are using (and want to use). So when the malicious listener receives the wireless data, he can see it, but he’ll find it’s encrypted data.

You can now understand that all the unencrypted traffic sent through an unencrypted connection can be intercepted and read in clear by a potential attacker.

Some examples of services that transmit unencrypted data are ftp, pop3, smtp and http. If you use one of these protocols over an unencrypted connection, your traffic can easily be read by someone else’s computer nearby.

I’ve created a little bash script which looks for an unencrypted wireless network and starts listening for packets sent through it. Then you can use your preferred packet sniffer software to display and analyze the packets received from your wireless card.

Here’s the code:

 

#!/bin/bash
# Looks for an unencrypted wireless network, then puts the card in monitor
# mode on that network's channel so a packet sniffer can capture its traffic.

dev=wlan0

echo "Setting $dev to managed mode"
sudo rfkill unblock wifi
sudo ifconfig "$dev" down; sudo iwconfig "$dev" mode managed
sudo ifconfig "$dev" up

channel=""
ssid=""
unencryptedchannel=""
sleep 2
echo "Searching for unsecured network channels"
# Walk through the scan output one word at a time. The loop relies on each
# cell listing its channel and encryption state before its ESSID.
for word in $(sudo iwlist "$dev" scan); do
    # A new cell starts: forget the values of the previous one
    if [ "$word" == "Cell" ]; then
        channel=""
        ssid=""
    fi

    # Words like "Channel:6" carry the channel number
    buf=$(echo "$word" | grep "Channel:" | cut -d':' -f 2)
    #echo "Buf: $buf"
    if [ "$buf" != "" ]; then
        echo "I've got a channel! The channel $buf"
        channel="$buf"
    fi

    # Words like ESSID:"name" carry the network name
    essid=$(echo "$word" | grep "ESSID:" | cut -d':' -f 2)
    if [ "$essid" != "" ]; then
        echo "The essid is $essid"
        ssid="$essid"
        # Stop at the first cell already found to be unencrypted
        if [ "$unencryptedchannel" != "" ]; then
            break
        fi
    fi

    # "key:off" (from "Encryption key:off") marks an open network
    enc=$(echo "$word" | grep "key:" | cut -d':' -f 2)
    #echo "enc: $enc"
    if [ "$enc" == "off" ]; then
        echo "The channel $channel has no encryption!"
        unencryptedchannel="$channel"
    fi
done

if [ "$unencryptedchannel" == "" ]; then
    echo "No unencrypted network. Quitting"
    exit 1
fi

echo "Your channel is $unencryptedchannel, on wifi network $ssid. Proceeding with sniffing"
sleep 1

echo "Putting $dev in monitor mode"
sudo ifconfig "$dev" down; sudo iwconfig "$dev" mode monitor
sudo ifconfig "$dev" up
sleep 1
# Retry until iwconfig reports that the card is really in monitor mode
while ! iwconfig "$dev" | grep -q Monitor; do
    echo "Monitor mode not set, retrying"
    sudo ifconfig "$dev" down; sudo iwconfig "$dev" mode monitor
    sudo ifconfig "$dev" up
    sleep 1
done

echo "Setting $dev to channel $unencryptedchannel"
sudo iwconfig "$dev" channel "$unencryptedchannel"
echo "Interface $dev ready for sniffing."

This code is not intended to be used for malicious purposes; it’s just a proof of concept to help understand the real risks of transmitting through an unencrypted network. Use it to try to intercept your own traffic while sending emails with smtp, retrieving them with pop3, or connecting to your ftp host. You’ll better understand what I explained in this post.

It’s even possible to intercept images you are viewing in your browser over http, yes, like Facebook photos and similar. That’s because Facebook by default avoids using https after the login due to its bandwidth cost.

I’ll conclude by encouraging you to encrypt your wireless connection if you want to protect your privacy, or at least to be aware of the risks you take by using unencrypted services over it.

Installing EPSON Stylus SX235W on Linux Ubuntu/Mint

Since I had some difficulty finding support for this printer model on the internet, I decided to write this short post about it.

I had trouble installing this printer on Linux Mint 14, but I guess it’s the same on various Debian-based distributions.

To get to the solution, I followed some of the instructions in this post.

The first thing to do is download the right package from the EPSON website, here, searching for the model sx235 (without the ‘W’).

Then look for the package for the ‘Linux’ operating system with the name ‘ESC/P Driver (full feature)’.

Download it and install it.

Installing this package (epson-inkjet-printer-201108w_1.0.0-1lsb3.2_i386.deb), I got a dependency error about lsb >= 3.2, even though I had version 4.0 installed.

I then tried to force the installation with

sudo dpkg --force-depends -i epson-inkjet-printer-201108w_1.0.0-1lsb3.2_i386.deb

Then go to the system “Add printer” dialog and select Network Printer while connected to the same network as the printer; you should see the printer show up. Just select it and the system will do the rest. It should download the drivers and set up the printer, which should then work fine.

Now the package manager should complain about the missing dependencies, but you can resolve the error with

sudo apt-get -f install

which should just remove the previously installed epson package.

Even though it’s now removed, and the package manager is working again, the printer should still work fine, and you can still print over the network, so everything is ok 🙂

This worked for me, I hope it helps.

SecretChat

Over the past few months I attended a course on Computability and Complexity, which ended with some concepts of cryptography.

It was really interesting, so I wanted to experiment with it.

I created a little web page using AngularJS and just a little PHP to implement a kind of “secure chat”, which uses a given passphrase to encrypt every message directly on the client and then stores the messages in an online database. The chat can host multiple users with the same passphrase, who will be able to see each other’s messages, with all the encryption/decryption done on the client side.

This means that all the traffic going through the internet has already been encrypted, so everything stored in the database must be decrypted with the key to be read. In theory, even gaining access to the database wouldn’t reveal any information about the messages sent through the chat.

I don’t really know how commercial messaging systems work, but I’ve always suspected that if the provider of the service wanted to, it would be able to read the messages going through its servers, because the ‘secure’ connection (if any) is established between the server and the clients, and not between the two actual endpoints of the communication.

In the system I implemented, instead, the server only ‘buffers’ the data; it never comes into contact with the encryption keys, so it can’t read the messages it saves to the database.

I don’t really believe commercial systems work exactly that way, but the doubt was enough to get me developing this little project 🙂

The software uses the AES encryption standard, implemented in JavaScript with Crypto-JS.

AES is a symmetric-key cipher, which means that all users share a secret key and use it to encrypt the data sent through an insecure channel.
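To illustrate the shared-passphrase idea, here is a small sketch that runs under Node.js. Note that it swaps in Node's built-in `crypto` module rather than Crypto-JS, so it is not the chat's actual code; the choice of AES-256-GCM, the scrypt key derivation, the dot-separated payload format and the function names are all assumptions made for this example.

```javascript
const crypto = require("crypto");

// Derive a 256-bit AES key from the shared passphrase and a random salt.
function deriveKey(passphrase, salt) {
  return crypto.scryptSync(passphrase, salt, 32);
}

// Encrypt on the sender's side. The result contains only the salt, the IV,
// the authentication tag and the ciphertext, all base64 encoded, so it can
// be stored by a server that never sees the passphrase.
function encryptMessage(passphrase, plaintext) {
  const salt = crypto.randomBytes(16);
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv("aes-256-gcm", deriveKey(passphrase, salt), iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  return [salt, iv, tag, ciphertext].map((b) => b.toString("base64")).join(".");
}

// Decrypt on the receiver's side with the same passphrase.
// A wrong passphrase makes the authentication check fail and throws.
function decryptMessage(passphrase, payload) {
  const [salt, iv, tag, ciphertext] = payload
    .split(".")
    .map((part) => Buffer.from(part, "base64"));
  const decipher = crypto.createDecipheriv("aes-256-gcm", deriveKey(passphrase, salt), iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

Everyone who knows the passphrase can call `decryptMessage(passphrase, payload)` on a stored payload; anyone who only has the payload, including the server, cannot.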

Here’s the link to the chat. If you want to be sure I’m not reading your messages in the database, you should inspect the JavaScript of the page and verify that the encryption is done with your passphrase, on the client side of the application. This should be enough to show that I can’t read anything without the key.

Here’s the (still in development) link.

Genetic Programming (2) – Santa Claus

My experiments with genetic programming are going on. I’m trying to focus on something more useful, so I’ve made a little page which uses gp to search for the shortest path connecting a number of points, either randomly generated or specified by the user.
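One common way to set this up, sketched below, is to make each individual a visiting order of the points and to use the total path length as the fitness to minimise. This is an illustration under assumptions, not the page's code: it treats the path as open (no return to the start), and the operators, option names and default values are all invented for the example.

```javascript
// Total length of an open path visiting the points in the given order.
function pathLength(points, order) {
  let total = 0;
  for (let i = 1; i < order.length; i++) {
    const a = points[order[i - 1]];
    const b = points[order[i]];
    total += Math.hypot(a.x - b.x, a.y - b.y);
  }
  return total;
}

// Random permutation of 0..n-1 (Fisher-Yates shuffle).
function shuffled(n, rng) {
  const order = Array.from({ length: n }, (_, i) => i);
  for (let i = n - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [order[i], order[j]] = [order[j], order[i]];
  }
  return order;
}

// Swap mutation: exchange two positions, so the result is still a permutation.
function mutate(order, rng) {
  const copy = order.slice();
  const i = Math.floor(rng() * copy.length);
  const j = Math.floor(rng() * copy.length);
  [copy[i], copy[j]] = [copy[j], copy[i]];
  return copy;
}

// Order crossover: keep a slice of parent A in place and fill the
// remaining positions with the missing points in parent B's order.
function crossover(a, b, rng) {
  const start = Math.floor(rng() * a.length);
  const end = start + Math.floor(rng() * (a.length - start)) + 1;
  const slice = a.slice(start, end);
  const taken = new Set(slice);
  const rest = b.filter((gene) => !taken.has(gene));
  return [...rest.slice(0, start), ...slice, ...rest.slice(start)];
}

// Evolve a population of orders; the better half survives each generation.
function evolve(points, options = {}, rng = Math.random) {
  const { populationSize = 50, generations = 200, mutationRate = 0.2 } = options;
  const byLength = (p, q) => pathLength(points, p) - pathLength(points, q);
  let population = Array.from({ length: populationSize }, () => shuffled(points.length, rng));
  for (let g = 0; g < generations; g++) {
    population.sort(byLength);
    const parents = population.slice(0, Math.ceil(populationSize / 2));
    const children = [];
    while (parents.length + children.length < populationSize) {
      const a = parents[Math.floor(rng() * parents.length)];
      const b = parents[Math.floor(rng() * parents.length)];
      let child = crossover(a, b, rng);
      if (rng() < mutationRate) child = mutate(child, rng);
      children.push(child);
    }
    population = parents.concat(children);
  }
  population.sort(byLength);
  return population[0];
}
```

A call such as `evolve(points, { populationSize: 100, generations: 500 })` returns the best order found, which is a good path but not necessarily the shortest one.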

It is possible to modify the parameters of the genetic algorithm by editing the JSON object in the textarea.

The page is here.

Genetic Programming – Just for fun

I started reading about genetic programming, and I must say it’s really interesting. I decided to implement a genetic algorithm just to go deeper and better understand how it works, so I developed a little gp evolving environment.

Everything on that page is really messy, and it doesn’t completely work for now; anyway, if you want to give it a try, it’s here.