sudo apt-get install gocr
yum install gocr
IMAGE FILES
To generate a satisfactory output, we need to use image processing to prepare the images (e.g. intensify colors, remove undesired lines, dots, etc.) and so facilitate character recognition. We will use simple captchas, so the only image processing required is a change of colorspace: if the image is colored, the script will give GOCR a grayscale copy, which is preferable for a better result.
Our images will have normal characters from the Latin alphabet, uppercase, normal and/or bold but not italic, in a traditional font (e.g. Arial, Times New Roman, Verdana).


We will use ImageMagick to process the images. You can use it to handle complex images and try to generate the best possible input, but our Captcha Bypasser will just create a grayscale copy.
sudo apt-get install imagemagick
yum install imagemagick
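Before going further, it can help to confirm that both tools are actually on the PATH. The following is a minimal sketch, not part of the original script: missing_tools is a hypothetical helper name, and it assumes that ImageMagick provides the convert command used later in this article.

```shell
#!/bin/bash
# missing_tools is a hypothetical helper, not part of the original script.
# It prints the name of every given command that is not found on the PATH.
missing_tools ()
{
    local tool
    for tool in "$@"
    do
        # command -v exits with a non-zero status when the command is unknown
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Report anything the script would need but cannot find
missing=$(missing_tools gocr convert)
if [ -n "$missing" ]
then
    echo "Missing tools:" $missing >&2
fi
```

The sketch only reports what is missing; whether the script should stop at that point is left to you.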
To create a grayscale copy of the image we will create a function:
# ---- Applying grayscale
grayscale ()
{
    source=$img
    id=`date +%N`
    img="$temp_dir"/img_$id.jpg
    # The source path is quoted so that file names with spaces still work
    convert "$source" -type Grayscale -despeckle -enhance "$img"
    convert "$img" +level-colors black, "$img"
}

THE PROCESS
We will create a temporary folder to save the grayscale copy of the image while running the script:
# ---- Creating temporary directory
making_env ()
{
    dd=`date +%N`
    temp_dir=decaptcha_temp_$dd
    mkdir "$temp_dir"
}

You can use GOCR as below:
gocr [OPTION] [-i] pnm-file
I advise you to read the manual page to understand and improve your script. However, the options we will use here are:
gocr -l 70 -C [A-Z] -i "$img"
-l level
    set grey level to level (0<160<=255, default: 0 for autodetect), darker
    pixels belong to characters, brighter pixels are interpreted as background
    of the input image.
-C string
    only recognise characters from string, this is a filter function in cases
    where the interest is only to a part of the character alphabet, you can
    use 0-9 or a-z to specify ranges, use -- to detect the minus sign.
-i file
    read input from file (or stdin if file is a single dash).
If the image has text with different grayscale levels, it can be a problem to discern every character. We will therefore use 3 grayscale levels (you can use more, even all): standard, 70 and 85.
gocr -C [A-Z] -i "$img" #standard level: 0
gocr -l 70 -C [A-Z] -i "$img"
gocr -l 85 -C [A-Z] -i "$img"
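The three calls above differ only in the level, so they can also be driven by a loop. This is a sketch under stated assumptions: recognize_levels and the GOCR_CMD variable are hypothetical names introduced here (GOCR_CMD exists only so that a different command can be substituted), and level 0 is passed explicitly because the manual page describes it as autodetect.

```shell
#!/bin/bash
# Hypothetical helper: run the recognizer once per grey level and print
# each result on its own line, in the order the levels were given.
# GOCR_CMD is an assumption of this sketch; it defaults to the real gocr.
recognize_levels ()
{
    local image=$1
    shift
    local level result
    for level in "$@"
    do
        # '[A-Z]' is quoted so the shell cannot expand it as a file pattern
        result=$("${GOCR_CMD:-gocr}" -l "$level" -C '[A-Z]' -i "$image")
        printf '%s\n' "$result"
    done
}

# Example call (requires gocr and an existing image):
# recognize_levels captchas/1captcha.jpg 0 70 85
```

Adding a level then means adding one argument rather than one more line of code.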
I decided on 70 and 85 after testing many levels and checking the results, but we should leave the option to pass these levels as arguments if needed (you can see this in the complete code).
GOCR displays an underscore "_" for unrecognized characters by default. We will store the results in variables and compare them position by position: wherever one result has a "_", the character at the same position in the other result is used instead.
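Before the full function, the replacement rule can be tried in isolation. The sketch below uses a hypothetical helper, fill_unrecognized, and two made-up strings instead of real GOCR output. It keeps each character of the first string unless that character is "_", in which case it takes the character at the same position in the second string.

```shell
#!/bin/bash
# Hypothetical helper: print $1 with every "_" replaced by the character
# found at the same position in $2.
fill_unrecognized ()
{
    local primary=$1 fallback=$2 merged="" i
    for (( i=0; i<${#primary}; i++ ))
    do
        if [ "${primary:$i:1}" = "_" ]
        then
            merged="$merged${fallback:$i:1}"
        else
            merged="$merged${primary:$i:1}"
        fi
    done
    printf '%s\n' "$merged"
}

# Made-up strings standing in for two GOCR results
fill_unrecognized "A_CD" "ABC_"    # prints ABCD
fill_unrecognized "X__Z" "_Y_Z"    # prints XY_Z
```

The second call shows the limit of the approach: a position that neither pass recognized stays "_".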
# ---- "Decaptching"
dcap ()
{
    # '[A-Z]' is quoted so the shell cannot expand it as a file pattern
    recog1=$(gocr -C '[A-Z]' -i "$img")
    recog2=$(gocr -l "$number" -C '[A-Z]' -i "$img")
    # Clear the characters left over from a previous call
    array1=(); array2=()
    for (( i=0; i<${#recog2}; i++ ))
    do
        array2[$i]=${recog2:$i:1}
    done
    #-----
    for (( i=0; i<${#recog1}; i++ ))
    do
        array1[$i]=${recog1:$i:1}
    done
    # Use the result at level $number, falling back to the standard one on "_"
    for (( i=0; i<${#recog1}; i++ ))
    do
        if [ "${array2[$i]}" = "_" ]
        then
            cdecp="$cdecp${array1[$i]}"
        else
            cdecp="$cdecp${array2[$i]}"
        fi
    done
}

We will call our functions grayscale and/or dcap based on command-line arguments:
decaptcha INPUT [OPTIONS]
These options are:
-c colored
    Must have the -c option if the image is colored.
-l level
    To change the standard grayscale levels.
    Must be followed by at least 1 and at most 2 levels.
img="$1"
shift
levels=()
while [ $# -gt 0 ]
do
    case $1 in
        -c)
            shift
            # grayscale writes into $temp_dir, so create it first if needed
            [ -n "$temp_dir" ] || making_env
            grayscale "$img"
            ;;
        -l)
            shift
            # Collect the 1 or 2 numeric levels that follow -l
            while [ $# -gt 0 ] && [ ${#levels[@]} -lt 2 ]
            do
                case $1 in
                    ''|*[!0-9]*) break ;;
                    *) levels+=("$1"); shift ;;
                esac
            done
            ;;
        *)
            shift
            ;;
    esac
done
# Fall back to the standard levels when -l did not supply them
number=${levels[0]:-70}
dcap "$img" "$number"
f1=$cdecp; cdecp=""
number=${levels[1]:-85}
dcap "$img" "$number"
f2=$cdecp

After that we will compare the results again to find the correct one:
for ((i=0; i<${#f1}; i++))
do
    if [ "${f1:$i:1}" == "${f2:$i:1}" ]
    then
        string="$string""${f1:$i:1}"
    else
        # The results differ: take f2 only where f1 has no character
        case ${f1:$i:1} in
            _)
                string="$string""${f2:$i:1}"
                ;;
            *)
                string="$string""${f1:$i:1}"
                ;;
        esac
    fi
done
echo "CAPTCHA: $string"
exit 0

Below is the complete script I created under the GPL license:
Some examples:
decaptcha captchas/1captcha.jpg

decaptcha captchas/2captcha.jpg -c
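One detail the snippets in this article leave open is cleanup: making_env creates a directory, but none of the code shown here removes it. A possible variation, assuming the mktemp utility is available, lets mktemp pick a unique name and registers a hypothetical cleanup_env function to run when the script exits. This is a sketch, not the code from the attached archive, and unlike the original it creates the directory under $TMPDIR (or /tmp) instead of the current directory.

```shell
#!/bin/bash
# Variation on making_env: mktemp -d creates the directory itself and
# replaces the XXXXXX part of the template with random characters.
making_env ()
{
    temp_dir=$(mktemp -d "${TMPDIR:-/tmp}/decaptcha_temp_XXXXXX")
}

# cleanup_env is a hypothetical helper: remove the directory if one was made.
cleanup_env ()
{
    if [ -n "$temp_dir" ] && [ -d "$temp_dir" ]
    then
        rm -rf -- "$temp_dir"
    fi
    temp_dir=""
}

# Run the cleanup automatically whenever the script exits
trap cleanup_env EXIT
```

Because the trap is tied to EXIT, the directory is removed on the normal "exit 0" path as well as when the script stops early.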

Attached File(s)
-
decaptcha-0.1.2.tar.gz (7.77K)

