Replaced ack with ag

I use ack 100 times a day. It’s written in Perl and a bit slow when grepping a big directory. I recently switched to ag (aka the_silver_searcher). It’s much faster and ignores file patterns from your .gitignore. I see no reason to use ack anymore, so:

$ brew install the_silver_searcher
$ brew remove ack
$ echo "alias ack=ag" >> ~/.zshrc # my muscle memory doesn't want to switch

I don’t work for MtGox and I can’t help you recover your funds

On February 25th, 2014, at ~2:00 am GMT, MtGox died. I’m sorry for all the bitcoin users who lost fiat money or bitcoins there, and especially for those who were using the bitcoin arbitrage toy I wrote.

During the last 6 hours, I received ~20 emails, some of them threatening me, most of them insulting me (+1 for the Turkish insults). To be clear: I don’t work and have never worked for MtGox, and I can’t help you recover your funds. I even warned about the risks of arbitraging bitcoin markets in a previous post. I repeat: arbitraging is not risky, but putting fiat money or bitcoins in any exchange is risky. Don’t trust bitcoin exchanges, even those labeled “too big to fail”.

Note: this actually revealed that the tool is used by far more people than I was expecting.

Use Genetic Programming To Generate File Converters


Here is an old article I wrote in January 2009. Since then, I received some emails and a lot of 404 errors, so I decided to repost it on this “new” blog. You can find the code here on github.

A genetic algorithm is another metaheuristic inspired by the evolution of species. You can read the description of the algorithm in the genetic algorithm article on Wikipedia.

I applied the algorithm to a concrete problem: helping lazy programmers and non-programmers write file converters. I had to write file converters to unify all the (more or less) formatted input files into one kind of CSV file. Each of these converters is made from the right combination of filtering / regexp matching / line splitting.


So I wrote a program based on a genetic algorithm where an individual is composed of 3 “genes”:

  • a list of filters (remove consecutive spaces, replace tabs by spaces, …)
  • a list of basic regexps to extract formatted basic blocks (int, float, date, line separator, …)
  • a list of cleaning functions (remove empty cells, merge cells, …)

The fitness of an individual is calculated by running a sample file through the genes (filters / regexps / cleaning functions). The result of this process is a CSV file that can be compared to the expected sample result (which I wrote manually). The comparison uses the SequenceMatcher from the difflib Python standard module; it returns the fitness: a float. A fitness close to 1.0 means the result is close to the expected CSV format. When the fitness is exactly 1.0, the converter works perfectly on the sample.
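A minimal sketch of how such a fitness function can work with difflib (the gene functions here are hypothetical stand-ins, not the ones from the original code):

```python
import difflib
import re

# Hypothetical stand-in genes: each gene is just a function str -> str.
def str_strip(text):
    return "\n".join(line.strip() for line in text.splitlines())

def str_remove_consecutive_spaces(text):
    return re.sub(r" {2,}", " ", text)

def fitness(individual, sample, expected_csv):
    """Run the sample through every gene, then compare to the expected CSV."""
    result = sample
    for gene in individual:
        result = gene(result)
    # ratio() returns a float in [0, 1]; 1.0 means a perfect converter.
    return difflib.SequenceMatcher(None, result, expected_csv).ratio()

print(fitness([str_strip], "  a,b  \n", "a,b"))  # 1.0: the converter is perfect for this sample
```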

Here is a sample file I wanted to convert:

08/12        Hello world 06/12
Paris 121231231         - 22,29
08/12   Something something ... 04/12
1111 1111 1111 1111         - 14,35
08/12   something else
        - 12,96
26/11   Vir AAAAA
AAAAAA 2008     264,51

into this csv file:

"08/12","Hello world 06/12","Paris 121231231","-22,29"
"08/12","Something something ... 04/12","1111 1111 1111 1111","-14,35"
"08/12","something else","","-12,96"
"26/11","Vir AAAAA","AAAAAA 2008","264,51"

The program prints, for each generation, the 5 best individuals (I’m using elitism, so the best individual is always kept from one generation to the next). A sample run:

$ python tests/sample1.txt tests/sample1.expected result1.pickled
Generation: 0 (mutation rate=10)
Generation: 1 (mutation rate=10)


Generation: 271 (mutation rate=15)
"08/12","Hello world 06/12","Paris 121231231","-22,29"
"08/12","Something something ... 04/12","1111 1111 1111 1111","-14,35"
"08/12","something else","","-12,96"
"26/11","Vir AAAAA","AAAAAA 2008","264,51"

filters: str_remove_consecutive_spaces, str_remove_somespaces, str_strip, str_remove_consecutive_spaces
regex: ([0-9]{2}/[0-9]{2})(.+?)(
cleaners: clean_strip, _in, _in, _in


The good:

  • It’s fun to watch your program evolve ;)
  • You are lazy and don’t want to write _many_ of these file converters. This was used on a real-life problem, with more than 200 different input file formats.
  • With a good interface, and if the basic functions and gene sizes are well-defined, a non-programmer can create their own parser.

The bad:

  • The resulting parsers are not optimal

The ugly:

  • Of course, it’s quite easy to write such genes by hand
  • It may never converge to 1.0, and generated parsers with fitness != 1.0 are useless (this may not be the case for other kinds of GA applications)
  • You need many well-chosen primitives to cover a wide set of solutions

What next? Write a simple Web interface to help people generate their own converters. Add new construction blocks. Use the same idea to automatically plot any kind of formatted data.

Git tip: pre-commit hook – avoid committing FIXMEs

In my code, I sometimes add a FIXME tag (just the FIXME string in a comment or in a debug print). Before staging changes in git, I review my diff to check for them. Rarely, I miss a FIXME line or debug print, stage it and then commit it. To avoid that, I wrote a quick pre-commit hook that greps for the regexp "[Ff][Ii][Xx][Mm][Ee]" in the staged diff using the command git diff --cached -G "[Ff][Ii][Xx][Mm][Ee]".

Here is the full script:

#!/bin/sh
FORBIDDEN_RE="[Ff][Ii][Xx][Mm][Ee]"
exec 1>&2
if [ $(git diff --cached -G "$FORBIDDEN_RE" | wc -l) != 0 ]; then
    git --no-pager diff --cached -G "$FORBIDDEN_RE"
    echo "ERROR: You're trying to commit a line matching the $FORBIDDEN_RE regexp"
    exit 1
fi

Put that in the file .git/hooks/pre-commit (and make it executable), and the next time you try to commit a file containing “FIXME”, the commit will abort and print the offending diff.

Note that you can bypass the pre-commit hook with the following command:

$ git commit --no-verify

Self Hosted Encrypted Backups

Note: it’s not really an alternative to Dropbox, since you can’t share your files with others. But personally, I also use Dropbox to back up some of my important data (encrypted data like Bitcoin wallets or KeePass / 1Password files).

I was looking to set up a private incremental encrypted backup, like rsync but with encryption, and I found duplicity. So I set up my cron to run this command every night:

$ duplicity --ssh-options="-oIdentityFile=~/.ssh/max.example.priv" \
  /home/max/wallets/ scp://
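For reference, a nightly crontab entry for this could look like the sketch below (the host and remote path are hypothetical placeholders, the key and source directory follow my layout above):

```shell
# m h dom mon dow  command — run the encrypted backup every night at 3:00 am
0 3 * * * duplicity --ssh-options="-oIdentityFile=~/.ssh/max.example.priv" /home/max/wallets/ scp://backup.example.com/wallets/
```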

Install it on OS X:

$ brew install duplicity
$ pip install paramiko

On Ubuntu:

$ sudo apt-get install duplicity

System Wide gitignore

I just added my specific temporary folder to my system-wide gitignore. A small impact for me, a huge impact for the multi-project people I work with ;)

$ sudo git config --system core.excludesfile /etc/gitignore
$ sudo sh -c "(echo '*~'; echo '.DS_Store'; echo 'tmp-max/') > /etc/gitignore"

You should buy some bitcoins now (July 8, 2013)

Until now, I have bought bitcoins for these reasons:

  • First, to learn. I’ve always thought the easiest way to learn about something is to experiment with it. With my first bitcoins (bought at $2 and then at $5.5 each) I learned the basics: where to buy bitcoins, what a transaction is, what a confirmation is, what mining is, what the bitcoin economy is. I tried 5 bitcoin clients: web, mobile, native. I wrote some scripts and plugins, and I installed woocommerce + bitcoin plugins to test the ease of opening a bitcoin eshop. I read to learn more about projects, IPOs, and the economy.

  • A dream: my life without being a bank customer. Because to live in any developed country, you need to be a bank customer. I’m 99.99% sure that dream will never come true, but at least if I could avoid Paypal, VISA and Mastercard, that would be great.

  • I want bitcoin to be the future of money. If you want that too, you should start using bitcoin and contribute to the economy by selling and buying things. I was pleased:

    • to pay for Reddit Gold with BTC
    • to buy my last Humble Bundles with BTC
    • to sell and buy stuff on
  • To spread the word: give small amounts to your friends and family. It works: my friends know about bitcoins and some of them own bitcoins.

After that, I started to work for bitcoins. I’m a software developer working as a freelancer, and I found a couple of contracts paid in bitcoins.

And I sold some because:

  • When you’re paid in bitcoins, you still need to pay bills and taxes in $/€/¥, depending on where you live.

  • I needed some cash to travel (it’s hard to find hotels and restaurants that accept bitcoins in China)

  • I did arbitrage, which involves buying and selling bitcoins. I earned some BTC doing this, and it helps exchanges stay aligned.

If I had cash, I would buy some today as an investment. When I write “investment”, that does not mean “to sell later at a higher price”; it means “to not have to buy them later at a higher price”. No one knows if this is a good moment to buy, but the price seems low compared to the growing economy, the number of future bitcoin startups ready to launch, and the very fragile banking system the world has.

Stay Away From Bitcoin Arbitrage If You’ve Got A Heart Condition

This must be true for most financial activities. But since I’m receiving something like 5 emails per day from random people [1] who want to run my bitcoin-arbitrage tool, most of the time I reply with some warnings:

  • Can you trust the exchanges? The answer is NO, you can’t:

    • Trade engines can be laggy (up to 1 hour of lag).

    • APIs are buggy. get-order-book replying with a 2-hour delay is unacceptable, and some API functions fail from time to time.

    • Bad “cloudflare” configs sometimes block normal API calls (Yes, I’m looking at you, Bitstamp).

    • The site closes but part of the API keeps working (Yes, I’m looking at you, Bitcoin24).

    • And the worst thing: they can close at any moment and block your money (for 1 hour to infinity). History proves this one; see what happened to Bitcoinica, Bitfloor, Bitcoin24, Bitcoin-Central. Who’s next?

  • Can you trust the tools? Of course NOT. If you can’t read the code (blind, not a developer, or closed source), don’t use the tool. A malicious dev can steal your money so easily.

  • To trade you need money, fiat AND bitcoin. If you don’t have bitcoins, are you ready to buy some? Yes, risks everywhere!

  • Moving fiat is so slow… SEPA transfers? Sure, let me wait a week (I wonder why it’s so slow; do banks have computers?). That means you’ll need A BUNCH of money (fiat AND bitcoin) on every exchange you’re trading on, then rebalance them from time to time when they drift. And that opens a big risk if one exchange closes and blocks your money.

So yes, Bitcoin arbitrage seems easy and very lucrative, but it’s very risky. Not because of the arbitrage principle itself, but mainly because you can’t trust the exchanges.

[1] Random people includes: developers (who want to rewrite the tool in Ruby, Java, C, Javascript, Python 2.x), students (finance, math), home traders (that one is scary), and unknowns (“hey, I stumbled upon your program, I’ll give you $500 to set it up on my computer”).

Redis ZSET Underlying Datastructure

Sorted Sets: ZSET

Redis ZSETs are very good for storing ordered data like leaderboards:

  • You need to get the current score for a userid (every SQL or NoSQL store can do that efficiently)
  • You need to get the current rank for a userid
  • You need to get a list of userid/score pairs sorted by score


In the Redis documentation, we can read that the complexity is O(lg(n)) for ZADD / ZRANK / ZREM and O(1) for ZSCORE. How are they implemented? O(lg(n)), you said… hmmm, smells like a tree or binary search. ZSCORE (a kind of search) in constant time… hmmm, smells like a hashtable. So how is it implemented? A mix of hashtable and tree? No, but that’s close: it’s a mix of a hashtable and a skip list. Skip lists are an alternative to balanced binary trees.

You should read the original Skip List paper.

From the redis documentation:

ZSETs are ordered sets using two data structures to hold the same elements in order to get O(log(N)) INSERT and REMOVE operations into a sorted data structure.

The elements are added to an hash table mapping Redis objects to scores. At the same time the elements are added to a skip list mapping scores to Redis objects (so objects are sorted by scores in this “view”).

So, how do you perform a ZRANGEBYSCORE in linear time?

  1. Search for the min element: O(lg(n)) complexity (with n = |set|)
  2. Start from min and follow links to the next element until you reach max: O(m) complexity (with m elements returned)

Also, there is a modification from the original skip list implementation: the level-1 list is doubly linked, so it gives you ZREVRANGE. Yeah!
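The two-structure idea is easy to sketch. Here is a toy in Python (not Redis’s actual code) that pairs a dict with a sorted list, using bisect as a stand-in for the skip list, so member lookups are O(1) and range queries walk the sorted side:

```python
import bisect

class ToyZSet:
    """Toy sorted set: a dict gives O(1) ZSCORE, a sorted list stands in
    for the skip list (bisect.insort is O(n); a real skip list is O(lg n))."""
    def __init__(self):
        self.scores = {}      # member -> score (the hash table side)
        self.sorted = []      # [(score, member), ...] (the "skip list" side)

    def zadd(self, member, score):
        if member in self.scores:
            self.sorted.remove((self.scores[member], member))
        self.scores[member] = score
        bisect.insort(self.sorted, (score, member))

    def zscore(self, member):
        return self.scores.get(member)   # O(1), like Redis ZSCORE

    def zrank(self, member):
        return self.sorted.index((self.scores[member], member))

    def zrangebyscore(self, lo, hi):
        # find the first element with score >= lo, then walk until score > hi
        i = bisect.bisect_left(self.sorted, (lo,))
        out = []
        while i < len(self.sorted) and self.sorted[i][0] <= hi:
            out.append(self.sorted[i][1])
            i += 1
        return out

z = ToyZSet()
z.zadd("alice", 100); z.zadd("bob", 250); z.zadd("carol", 180)
print(z.zscore("bob"))             # 250
print(z.zrank("alice"))            # 0 (lowest score)
print(z.zrangebyscore(150, 300))   # ['carol', 'bob']
```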

Algorithm Complexity

I started to solve some random problems from Project Euler. To get an extra hint about which kind of algorithm I’m looking for, I made the following table.

Read it like: “Assuming a computer can do 10,000,000 operations/sec, with an O(n^2) algorithm, you can solve for n ~ 3000 under a second”

Complexity      |   n for 1 second  |   n for 1 minute
O(1)            |   inf             |   inf
O(lg(n))        |   2^(10^7)        |   2^(6*10^8)
O(sqrt(n))      |   10^14           |   36*10^16
O(n)            |   10 000 000      |   600 000 000
O(n*lg(n))      |   526 172         |   24 446 750
O(n*sqrt(n))    |   46 415          |   711 378
O(n^2)          |   3162            |   24 494
O(2^n)          |   23              |   29
O(n!)           |   10              |   12

Project Euler problems should be solvable in under 1 minute (even in Python; most problems I solved ran in < 2 secs), so if the input N is ~10,000,000, you should look for an O(n*lg(n)) solution or better.

Also, if you’re developing Python code, you should have a look at these Python Time Complexity tables. I think other languages’ libraries use the same underlying data structures and algorithms.

Game Design Lesson – Super Mario Bros (NES)

I read this very interesting snippet on the Wikia Super Mario Bros page:

“Interestingly, in the very first portion of World 1-1, the developers designed it so that a newcomer almost always gets a Mushroom. In the first level, there are blocks that the player goes under. A menacing Goomba approaches the player, and instinctively the player jumps over it. By the time the player reaches the Goomba and jumps, they will hit a ? block above that would reveal a mushroom. The mushroom goes to the right, hits a pipe and comes towards the player. Since the mushroom resembles the Goomba, the player thinks to jump over it again. Doing this, however, will almost always lead the player to jump right into the Mushroom, since after they jump they hit another block from above which causes them to come back to the ground and hit the mushroom. This was to teach players that Mushrooms were a positive thing in the game.”

POST 10 HTTP requests, a maximum of 5 in parallel, with POSTed data id=1, 2, …

$ seq 1 10 | xargs -t -P 5 -I % curl -s --data id=% > /dev/null
curl -s --data id=1
curl -s --data id=2
curl -s --data id=3
curl -s --data id=4
curl -s --data id=5
curl -s --data id=6
curl -s --data id=7
curl -s --data id=8
curl -s --data id=9
curl -s --data id=10