1. Perl 5.18 will find your bugs, whether you want it to or not

    Here at Nestoria we like to run a fairly up to date Perl 5. We don’t use the bleeding edge - we have a business to run, and the .0 release that rolls around every May usually has a few issues that need ironing out - but on the other hand I start to get a little nervous if we’re more than a year behind. We want to have the opportunity to use the latest and greatest language features and CPAN modules after all.

    So in August we spent a few days upgrading from Perl 5.16.3 to Perl 5.18.2.

    Understanding the differences between those two versions means reading the relevant perl5delta documents:

    • perl5180delta describes differences between the v5.16.0 release and the v5.18.0 release.
    • perl5181delta describes differences between the 5.18.0 release and the 5.18.1 release.
    • perl5182delta describes differences between the 5.18.1 release and the 5.18.2 release.

    The main new features were:

    • Smart Match (~~) marked “experimental” and should not be used
    • Hash Key Randomization made much, much more random
    • Support for all the new characters from Unicode 6.2.0 - Yay ❤
    • ${^LAST_FH} for accessing the last read file-handle
    • Set operations for character classes
    • Lexical subroutines (my sub { ... })
    • Computed labels (next $foo;)
    • Negative signal names (kill -INT, $pid)
    • Vertical tabs are now matched by \s (yes this actually affects us for parsing XML feeds for nestoria.in!)

    With today’s post I’d like to talk about those first two - smart match and hash key randomization - and how those changes helped us find a number of nasty bugs in our code base.

    Smart Match

    The Smart Match operator (~~) was a great idea, but unfortunately poorly executed. All we really wanted was $x ~~ \@xs, wasn’t it? But instead we got something so nasty that the Perl 5 team correctly decided to officially deprecate its usage.

    For us that meant going around replacing all of our existing uses of smart match. Some were easy:

    $a_keyword ~~ @$ra_keywords
    

    Becomes:

    any { $a_keyword eq $_ } @$ra_keywords
    

    Others were trickier:

    @four_octets ~~ [0, 0, 0xFE, 0xFF]
    

    Got me to reach for CPAN:

    $ArrayCompare->compare(\@four_octets, [0, 0, 0xFE, 0xFF])
    

    (100 points to anybody who correctly recognised that as a check for the Byte-Order Mark for UCS-4BE.)
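    ($ArrayCompare there is an Array::Compare object constructed once elsewhere in the module; a minimal sketch, assuming the standard Array::Compare API:)

    use Array::Compare;

    my $ArrayCompare = Array::Compare->new;

    # compare() returns true if both arrays hold the same
    # elements in the same order
    if ($ArrayCompare->compare(\@four_octets, [0, 0, 0xFE, 0xFF])) {
        # the input starts with a UCS-4BE Byte-Order Mark
    }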

    My personal favourite tricky case was:

    my @countries_to_run = map { $_ if $_ ~~ @live_countries } @arg_countries;
    

    A smart match inside a map… using the simple List::MoreUtils::any() solution goes out the window somewhat… or does it?

    my @countries_to_run = map { my $c = $_; (any { $c eq $_ } @live_countries) ? $_ : () } @arg_countries;
    

    Nasty, but it works.
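    (In hindsight, a map whose block yields either $_ or the empty list is really a grep in disguise; with the same any import, the logic reads a little more naturally as:)

    my @countries_to_run = grep {
        my $c = $_;
        any { $c eq $_ } @live_countries;
    } @arg_countries;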

    Don’t forget that you need to remove all uses of the given/when switch statement as well, because when implicitly uses smart match.

    To check that our work was done, and to guard against regressions, we added a test that scans all of our Perl code for ~~ and given and fails if it finds any.
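    Here’s a minimal sketch of such a test, assuming a conventional lib/bin/t layout (our real one knows about our actual repository structure, and the file name used to exclude the test itself is hypothetical):

    use strict;
    use warnings;
    use File::Find;
    use Test::More;

    my @violations;
    find(sub {
        return unless /\.(?:pl|pm|t)\z/;
        # skip this test itself, which necessarily mentions the banned patterns
        return if $File::Find::name =~ m{no_smartmatch\.t\z};
        open my $fh, '<', $_ or die "Can't open $File::Find::name: $!";
        while (my $line = <$fh>) {
            push @violations, "$File::Find::name line $."
                if $line =~ /~~/ or $line =~ /\bgiven\s*\(/;
        }
        close $fh;
    }, qw(lib bin t));

    ok(!@violations, 'no smart match or given/when in the code base')
        or diag join("\n", @violations);

    done_testing();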

    We call these types of tests “infrastructure” tests. Other examples include Perl::Critic, checking that all .pl scripts are executable in version control, and that our crontab files contain valid bash commands. If you’d like to hear more about our suite of infrastructure tests let us know in the comments.

    Hash Key Randomization

    This is a big one.

    In Perl 5.16 and before, your hash key ordering was arbitrary (it never represented alphabetical order, or insertion order, or anything like that), but it was quite consistent. Specifically, if you used the same Perl executable on the same machine to run the same script multiple times, you would probably get consistent results.

    But with Perl 5.18 and beyond, the hash key ordering is much more random. Specifically, when your Perl process starts up it seeds the hashing algorithm with a random value, so the chance of consistent results over multiple runs of the same script drops dramatically.

    By way of an example:

    ~$ perl5.16.3 -E 'my %h = (a => 1, b => 2, c => 3); say join(" ", keys %h) for 1 .. 3;'
    c a b
    c a b
    c a b
    ~$ perl5.16.3 -E 'my %h = (a => 1, b => 2, c => 3); say join(" ", keys %h) for 1 .. 3;'
    c a b
    c a b
    c a b
    ~$ perl5.18.4 -E 'my %h = (a => 1, b => 2, c => 3); say join(" ", keys %h) for 1 .. 3;' 
    c a b
    c a b
    c a b
    ~$ perl5.18.4 -E 'my %h = (a => 1, b => 2, c => 3); say join(" ", keys %h) for 1 .. 3;'
    b a c
    b a c
    b a c
    

    This change was made primarily to fight against a possible hash-related security hole, which you can read more about in this post by Yves Orton on Booking.com’s blog. Now, that security concern was unlikely to affect us, but as Yves alluded to in that post, randomization is good anyway: it finds bugs!
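    (As an aside: when you need to reproduce one of these bugs, perlrun documents the PERL_HASH_SEED environment variable, and setting it to 0 disables the randomization for that run. Strictly a debugging aid - never set it in production, since it reopens the security hole:)

    ~$ PERL_HASH_SEED=0 perl5.18.4 -E 'my %h = (a => 1, b => 2, c => 3); say join(" ", keys %h);'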

    As most CPAN authors discovered when the CPAN Testers bots started using Perl 5.18 and above, we had a lot of hash ordering assumptions in our test code.

    Another important change: when sorting the keys of a hash, it became very important to add secondary and tertiary sort orders to keep the results completely deterministic.

    For example:

    my @sorted_places = sort {
        distance_from($a, $g) <=> distance_from($b, $g)
    } keys %places_found;
    

    Becomes:

    my @sorted_places = sort {
        distance_from($a, $g) <=> distance_from($b, $g)
        || $a cmp $b
    } keys %places_found;
    

    Or better yet, shout out to Module of the Month July 2014 winner Sort::Key, which lets you write the following (the i_s prefix declares the key types: an integer key, then a string tie-breaker):

    use Sort::Key::Multi qw(i_s_keysort);
    
    my @sorted_places = i_s_keysort { distance_from($_, $g), $_ } keys %places_found;
    

    The second trickiest case I had to solve was related to our use of -content-type and -charset in CGI.pm. To quote my git commit message:

    CGI::headers() accepts both ‘-content-type’ and ‘-type’/’-charset’ separately, and it appears that which one it counts as the default and which it counts as the override is now subject to hash key ordering randomness. Bah.

    This led to some mojibake showing up on http://www.nestoria.es for a few hours before we figured out it was a hash key ordering problem. Bah indeed.

    Aside: yes we are still using mod_perl and CGI.pm :-( It’s on the long long to do list… I expect 2015 will be the year of PSGI at Nestoria. Are we fashionably late to the party?

    And finally, the trickiest case I had to solve: to keep Nestoria fast, one of the tricks we use is generating a sprite image and a corresponding CSS sprite file. One of our developers wrote and released CSS::SpriteMaker and a corresponding grunt task grunt-css-spritemaker, which uses node-perl to interface directly from Node.js to Perl.

    It does this by linking against libperl.

    And when we upgraded to Perl 5.18 we forgot to re-link.

    So weeks after we thought we’d cleared out all the hash order bugs, we upgraded Node.js and re-linked node-perl against Perl 5.18’s libperl. And suddenly our sprite kept re-ordering itself!

    Which wouldn’t actually be a problem in most cases: if the sprite image is consistent with the CSS file, who cares what order things are in? Well, unfortunately we do. You see, our sprite file contains our map pins, and their locations are calculated in JavaScript rather than using CSS classes, which means we need our map pins to always be at the top of the sprite image.

    A lot of s/keys/sort keys/g and a GitHub pull request later, the last Nestoria hash order bug was solved… or at least so we hope!


    Thanks for reading! Let us know in the comments what your favourite new or upcoming Perl 5 feature is. I for one can’t wait to use Perl 5.20’s postfix dereferencing!

    Alex Balhatchet (@kaokun)

     
  2. Good looking addresses

    The team at our sister brand OpenCage Data have rolled something out that we thought Nestoria dev readers might find relevant.

    Given a bunch of geodata about a location (for example, what you might get from OpenStreetMap), how can we form that data into an address or placename that makes sense to a consumer? This challenge is non-trivial, as address formats differ from country to country. So we’ve released an open source set of templates (in the widely used Mustache format) that can be used to programmatically format addresses correctly.

    It’s all in the “address-formatting” repository over on GitHub. Not all countries are done yet, but bit by bit they are being added.

    The templates themselves are programming-language independent. As regular readers of this blog will know, we’re heavy users of Perl, and as such have written a Perl implementation of a parser. Geo::Address::Formatter is over on CPAN, just waiting for you to use it (and the code’s of course on GitHub, where we welcome your pull requests). That said, other parser implementations would be very welcome.
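    Usage is pleasantly simple. Here’s a sketch based on the module’s synopsis (the conf_path and the address components are illustrative):

    use Geo::Address::Formatter;

    # point conf_path at a checkout of the address-formatting repository
    my $GAF = Geo::Address::Formatter->new(
        conf_path => '/path/to/address-formatting/conf',
    );

    my $formatted = $GAF->format_address({
        house_number => 301,
        road         => 'Hamilton Avenue',
        city         => 'Palo Alto',
        postcode     => 94303,
    }, { country => 'US' });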

    There’s a post up on the OpenCage blog that goes into the motivations of the project, and explains the problem set in more detail. There have already been a few pull requests coming in, and more are of course welcome.

    In other geo news, we continue our long-running streak by sponsoring WhereCamp Berlin, which is coming up in mid-November. If you can’t make it to Berlin, hopefully you can join us at the next #geomob in London on the 4th of November.

     
  3. Maintaining a consistent linear history for git log --first-parent

    A merge commit in Git is simply a commit with two or more parents. What you may not realise is that the parents of a commit are ordered, and you can use this property in your Git workflow.

    Let’s take this example starting point. We have two branches, master and example-branch, represented by orange rectangles, and a few commits, represented by yellow rectangles. Arrows point from commits to their parent commits. If you learnt Git by reading books like Pro Git, this kind of diagram should be familiar to you.

    From this starting point, we run these commands:

    $ git checkout master
    $ git merge example-branch
    

    After running these commands and resolving any conflicts, a new commit is created with two parents, the first parent being the tip commit of master at the time, and the second parent being the tip commit of example-branch. The parents are guaranteed to be in that order. The tip of master is then updated to be the newly created commit, of course.

    End point diagram

    What most books fail to mention is that parent commits are ordered, which is why I numbered the arrows leaving C5 in my diagram. (To be consistent with commands like git revert -m, I began the numbering of parents at 1, not 0.)

    git log --graph respects the order of parent commits:

    $ git log --graph --oneline master
    * C5 # merge commit
    |\
    * | C4
    | * C3
    | * C2
    | * C1
    |/
    * C0
    

    It is possible to create the same graph as in the previous diagram, but where the order of the parents of C5 is reversed. I’ll leave you to work it out and move on to my next point.

    Why is the order of parent commits important?

    It’s often useful to view a repository’s history as if it were linear, flattening the graph and hiding certain commits in the process. For example, you can hide merge commits with git log --no-merges.

    The other option is git log --first-parent, which we use extensively at Lokku:

    $ git log --first-parent --oneline master
    C5 # merge commit
    C4
    C0
    

    git log starts at master, and then follows each commit’s parents. With the --first-parent option, it only follows the first parent, thus skipping C3, C2 and C1 in this example.

    We use --first-parent for our continuous integration testing infrastructure. Only commits that are in master's --first-parent history are tested. This allows us to keep a simple testing infrastructure that expects a linear history.
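    (Concretely, enumerating the commits our CI will test is a one-liner:)

    $ git rev-list --first-parent master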

    Once I got used to --first-parent, I started to think of merge commits as summary commits. A merge commit brings into the main branch a bunch of smaller commits that may be quite trivial, but they are hidden by --first-parent, and only the merge commit with its summarising message is displayed. I found it helped to think of it like folds in my favourite text editor. Once this mentality sinks in, options like --no-ff make more sense, as you may still want to create a merge/summary commit even when a fast-forward merge is possible.

    $ git log --first-parent --oneline master
    C5 # merge commit, which hides commits C3, C2 and C1
    C4
    C0
    

    The “git pull” with no “--rebase” pitfall

    Much has been written about this pitfall, but it has another dimension if you want to maintain a consistent linear history using --first-parent, like we do.

    For this example, let’s take a repo published at company.com with just one commit. Alice clones it, makes some changes and commits the change C2 locally:

    When she attempts to push her commit, she discovers that her colleague Bob has already pushed a commit himself (C1), so she must merge her changes. To do so, she just runs git pull, which fetches and merges the published changes:

    Notice the order of the parent commits of C3. Because Alice merged origin/master into master, her commits are in the --first-parent history.

    At this point in time, the --first-parent published history looks like this:

    $ git log --first-parent --oneline origin/master
    C1
    C0
    

    Then she pushes her changes, publishing the commits C2 and C3:

    Now, the --first-parent published history looks like this:

    $ git log --first-parent --oneline origin/master
    C3
    C2
    C0
    

    C1 has disappeared from the published --first-parent history!

    For our purposes, this is not ideal. It now looks like C3 is the merge commit that brought in Bob’s commit C1, when really, we would like C3 to look like it brought in Alice’s commit C2.

    One solution is to recommend that people use git pull --rebase, which would rebase the commits onto published history, avoiding the creation of a merge commit altogether. Among the dev team, we’ve made git pull --rebase the default behaviour, only creating merge commits explicitly and only when we want the merge commit to act as a summary commit.
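    (Assuming a reasonably recent Git, making --rebase the default behaviour of git pull is a single config setting:)

    $ git config --global pull.rebase true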

    Wouldn’t it be nice, though, if there was a server-side hook that enforced the rule that published --first-parent history must not change? We wouldn’t need to worry about users accidentally running git pull instead of git pull --rebase, for one. Well, we’ve written one, and it’s battle-tested.

    The hook to enforce consistent --first-parent history

    Here’s the server-side update hook. Install it on your server in the usual manner. It will reject any pushes that change published --first-parent history, with a helpful error message, but only on master.

    #!/bin/bash
    
    # This update hook's purpose is to make sure that commits don't disappear from
    # "git log --first-parent master" server-side. This often happens when people
    # run "git pull" without --rebase.
    
    # TODO: make it work with empty repo
    
    
    refname="$1"
    oldrev="$2"
    newrev="$3"
    
    # --- Safety check
    if [ -z "$GIT_DIR" ]; then
        echo "Don't run this script from the command line." >&2
        echo " (if you want, you could supply GIT_DIR then run" >&2
        echo "  $0 <ref> <oldrev> <newrev>)" >&2
        exit 1
    fi
    
    if [ -z "$refname" -o -z "$oldrev" -o -z "$newrev" ]; then
            echo "Usage: $0 <ref> <oldrev> <newrev>" >&2
            exit 1
    fi
    
    # We only want to enforce this hook on master:
    if [ "$refname" != "refs/heads/master" ] ; then
        exit 0
    fi
    
    # Just in case:
    if [ "$oldrev" = "$newrev" ] ; then
        exit 0
    fi
    
    # Now we want to check if $oldrev has disappeared from 'git log --first-parent
    # "$newrev"' by running:
    #
    #   git rev-list --first-parent "$newrev" | grep -q "^$oldrev$"
    #
    # This is slow, because git rev-list prints out a large number of commit
    # SHA1s. Running this command should be equivalent, but optimized:
    #
    #   git rev-list --first-parent "$oldrev^..$newrev" | grep -q "^$oldrev$"
    #
    # Note the ^ in the rev-list command.
    
    if git rev-list --first-parent "$oldrev"^.."$newrev" | grep -q "^$oldrev$" ; then
        true
    else
        echo "Error: this push hides some commits previously displayed in \"git log --first-parent $refname\" on the server's side" >&2
        echo "" >&2
        branchname="$(printf %s "$refname" | sed 's!^refs/heads/!!')"
        echo "This probably happened because you ran "git pull" without the --rebase flag (or because you ran "git push -f" after deleting previously published commits)." >&2
        echo "To fix the first problem, run these two commands client-side before pushing, replacing \"origin\" with the appropriate remote name:" >&2
        echo "    # Update refs/remotes/origin/$branchname on the local machine to match the server's value" >&2
        echo "    git fetch origin +refs/heads/$branchname:refs/remotes/origin/$branchname" >&2
        echo "    # Rebase unpublished commits on published history:" >&2
        echo "    git rebase origin/$branchname" >&2
        echo "This will linearize the unpublished history, there are other solutions if you want to keep merge commits." >&2
        echo "Check the result with \"git log --graph\" before pushing again." >&2
        exit 1
    fi
    
    # Do not allow "Merge branch 'master'" commit messages either in the --first-parent history.
    #
    # Hopefully, the vast majority of these cases would have been caught by the earlier check.
    
    if git log --first-parent --format=%s "$oldrev".."$newrev" | grep -q "^Merge branch 'master'" ; then
        echo "Error: this push includes some commits in master's --first-parent history with the commit message \"Merge branch 'master'\"" >&2
        exit 1
    fi
    
    exit 0
    

    You can suggest changes by sending pull requests to our GitHub repo.

    How do you linearise your Git history, or have you freed yourself from that mindset? Let us know in the comments.

    David Lowe

     
  4. Module of the month September 2014: DBIx::Connector

    Welcome to another Module of the Month blog post, a recurring post in which we highlight particular modules, projects or tools that we use here at Nestoria.

    This month’s award goes to DBIx::Connector, a great tool in the grand UNIX tradition of doing one thing well; in this case, managing a connection to a database.

    Like many things in programming, connecting to a database should be something that’s trivially easy - but then the real world intervenes!

    Database connections…

    • are often over the network, which is not always reliable
    • occur in forking processes (eg. a forking web server)
    • are an “edge” from your Perl process to the outside world, so encoding becomes important
    • need to handle normal query/response, but also warnings and errors from the database server
    • need to handle different servers as seamlessly as possible

    DBIx::Connector does it all!

    There are drivers for MSSQL, Oracle, PostgreSQL, SQLite and MySQL. I can only comment on the MySQL one at the moment, but I have nothing but good things to say about it.

    I have used it in heavily forking environments with no trouble, both in web servers and in basic parallel scripts using Parallel::ForkManager.

    One feature I’ve not yet tried, but I’m excited to give a go at some point, is the Connection Modes.

    We run all of our queries in the safest mode, “ping”, which pings via the database handle to make sure the connection is alive before every query.

    Another available mode is “fixup”, where the query is sent with no ping, and the connection is only re-established (and the query re-sent) on failure. In most cases this means many, many fewer pings with no downside, which can be a great performance improvement.
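    For the uninitiated, here’s roughly what usage looks like, following the module’s synopsis (the DSN, credentials and query are placeholders):

    use DBIx::Connector;

    my $conn = DBIx::Connector->new(
        'dbi:mysql:database=listings', $user, $password,
        { RaiseError => 1, AutoCommit => 1 },
    );

    # run the block with a guaranteed-usable handle; in "ping" mode the
    # connection is checked first, in "fixup" mode it would instead be
    # re-established and the block re-run on failure
    $conn->run(ping => sub {
        $_->do('UPDATE listings SET status = ? WHERE id = ?', undef, 'live', 42);
    });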

    That’s pretty much all I have to say about DBIx::Connector - like I said, it does one thing well. So without further ado let me congratulate David Wheeler and bestow upon him our traditional $1/week (for one year) donation. DWHEELER is a prolific and talented CPAN author and I highly recommend you check out some of his modules, especially DBIx::Connector! Thanks David!

     
  5. Nestoria Summer Team Day

    Some photos from our Summer Team Day. A chance for us to step away from our terminals, and look at our challenges from a slightly different perspective.

    The morning of our team days is usually taken up with the “work part”, where our management team shares some of the ups and downs of the last few months. However, we’ve started having much more regular meetings on business performance this year, so the work half of the team day was free for some new exercises.

    We gathered at the Dial Arch pub in Woolwich and took part in two exercises: a Nestoria Metrics themed quiz, and a team brainstorming exercise.

    I’m quite proud of the quiz, because I made it. It was called “Flags ‘n’ Numbers” and involved matching different metrics to different countries that we operate in. It proved to be quite challenging, and hopefully opened some eyes to the vast array of metrics we have available through our internal dashboards.

    After those two exercises, and some very tasty pizza provided by the pub, we headed off to our first of two afternoon “fun part” activities: Laser Tag at Bunker 51!

    This was absolutely fantastic, and I highly recommend it to anybody too wussy to try paintball (myself included!). We played four different 15-minute games: teams, agents, one-hit-kill free-for-all, and regular free-for-all. My favourite was agents, where two players are scored by how long they stay alive - but if you kill an agent you become one, so the teams are constantly changing every 10 seconds or so. We explored every corner of that bunker trying to hide and/or find those with red LEDs flashing on their chests.

    Our second afternoon activity was at Up at the O2, a company that literally allows you to climb up and over the O2 in Greenwich!

    The climb takes about an hour, but in reality it’s more like 15 minutes up each side and 30 minutes at the top on the viewing platform. I would definitely recommend it to tourists who are only in London for a few days - much more hands-on than the London Eye.

    To finish up the day we had drinks and dinner together at the Blueprint Cafe, which is above the Design Museum. The food was amazing… so amazing that I forgot to take any photos of it! We were too busy eating ham, risotto, lamb, salmon, brownies, crumble and drinking delicious wine and beer I’m afraid.

    You come away from a day like that with two things: a stronger camaraderie with your colleagues, and an attitude that you’re ready for anything! A great way to embark upon the remainder of 2014 if you ask me!

     
  6. 7 strategies to quickly become productive in an unfamiliar codebase

    When starting a new job or moving on to a new project, you will rarely be working on a completely greenfield codebase. Getting to grips with unfamiliar code is a difficult process, and the amount of new information to take in can feel overwhelming. Coming to Nestoria from a Ruby background, this was doubly so for me: I was not only learning a new codebase but also learning Perl at the same time. Here are seven strategies that I used to get productive as quickly as possible.

    Be humble

    Humility might not be the first thing you think of when it comes to programming. After all, hubris is one of the Three Virtues of a Great Programmer. However, when confronted with unfamiliar legacy code you are likely to get demoralised by how often you come across things you don’t understand and by the number of mistakes you make. Humility is required to accept this and just get stuck in. Sometimes you will not understand a piece of code because it is hacky and badly written, and sometimes you will not understand it because you are not familiar with the domain, or simply because the underlying algorithm is inherently complex. Mistaking the latter for the former can waste a lot of time and will probably annoy your coworkers as well! Have the humility to assume that the original developer knew what they were doing, and wait until you really understand something before being too critical or making significant changes.

    Test first

    If your codebase is perfectly covered with unit and integration tests then you probably don’t need this guide. More likely, though, there are areas of the code that could do with better test coverage, or more reliable tests. By adding or improving tests you can improve the reliability of the code base without the risk of breaking production code. Writing or rewriting tests forces you to really understand code in a way that is not possible when just reading it. Whatever you will be working on next, first spend some time reading the tests and then add new tests where you see gaps. It might be boring but it will help a lot when you start writing production code. A good resource on how to work with unfamiliar code in this way is Therapeutic Refactoring by Katrina Owen.

    Make something

    As soon as possible get started shipping code. There is nothing that will get you familiar with the code quicker than working on it. Perhaps start with a small project and don’t worry too much about how long it takes to complete, concentrating just as much on learning as on completing the task at hand.

    Ask questions

    You could struggle along trying to work everything out for yourself, but you will get up to speed much quicker by simply asking questions. Try to do a little research first, but asking dumb questions is better than struggling unnecessarily. The rest of the team can help by being open to questions and responding promptly. Even if a coworker’s question seems trivial, or you are not sure of the answer, new team members are more likely to ask questions if you respond to them in an encouraging way. It might be frustrating to be interrupted, but getting new team members up to speed quickly is beneficial for everyone in the long term. If you adopt an RTFM culture in your team, people are more likely to struggle in silence when a simple question could have saved them hours of work.

    Pair with someone

    Even better than asking questions is to pair with someone who is already familiar with the code. You get instant feedback and will absorb the numerous conventions that can be hard to pick up just from reading code. This requires discipline to be effective, and experienced team members have to make sure they don’t end up just taking over. It is particularly useful for areas that you are completely unfamiliar with. For example, I didn’t know very much about Linux sysadmin, so when doing tasks that required Linux knowledge other team members would sit with me as I completed them. Code reviews can also provide useful feedback.

    Write the docs

    Once you have some experience with the code start writing documentation when you feel it is needed. At first do this just for yourself, perhaps in a personal wiki page or even just in a text file. Other forms of documentation like creating and answering StackOverflow questions and then linking back in your documentation can also be useful. Once you start to feel sure your documentation is correct and would be useful to others then start adding it to the code and/or official documentation.

    Zoom out

    Right at the beginning, it is beneficial to get a high level overview of the architecture of your system. Get someone to draw and explain a diagram of the architecture of the new system you will be working on and then try to map the different modules in the code to the diagram yourself.

    After a few months you will probably start to feel quite comfortable in your new codebase. You might only have touched a fraction of it, but you understand the conventions that the team uses and where all the most important pieces are. Now is the time to zoom out again and think about whether the architecture makes sense. How might you have done it differently and how can you use your previous experience to improve it? Is the architecture as explained to you in the beginning actually how the code works in your experience, how would you explain it differently?

    I’m sure there are many other tactics to help get to know a code base, but these are the ones that have helped me the most. Let us know about yours in the comments.

    Samuel Scully

     
  7. Nestoria Devs at YAPC

    YAPC::EU, the European edition of the Yet Another Perl Conference, was this past weekend. As mentioned in a previous post, we sent along four of our developers: Alex, Sam, Ignacio and Tim. Here’s a brief (and photo-filled) summary of our time in София, България (that’s Sofia, Bulgaria, for those of you who don’t read Cyrillic).

    I (Alex, Nestoria CTO) have been to a lot of YAPCs, but for my team mates Sam, Tim and Ignacio it was their first one. A great opportunity for them to dive deep into the Perl community and learn a huge amount in a short time.

    Thursday

    Unfortunately this year our flight was too late in the evening, and we ended up missing the traditional pre-conference drinks. I won’t be making this mistake again with any future YAPCs we attend - it sounded like we definitely missed out on a fun night.

    On the plus side our flight from London to Sofia was pleasantly uneventful, and our hotel - 10 minutes from the airport, 1 minute from the conference venue - was very nice. I think we all slept well and were ready for the conference to begin on Friday morning.

    Friday

    Our 1-minute walk from the hotel to the conference venue was nice - no danger of getting lost, just follow the nerdy T-shirts. Unsurprisingly, a lot of other Perl Mongers were staying at our hotel, and the hotel breakfasts got more social as the week went on.

    The venue was nice, especially the large room set aside for keynotes, lightning talks and the talks expected to be the most popular. Good chairs, good audio/visual equipment, and very helpful conference staff.

    A huge thank you and shout out to Marian Marinov and his team!

    And a smaller, but more personal, thank you to Marian for getting our banner printed in time for Friday despite me emailing him the PDF on Thursday morning :-) What do you think? I’m quite proud of it.

    Speaking of things I’m proud of, on Friday I spoke about the Nestoria Geocoder and the new OpenCage Data API that allows people outside Nestoria to take advantage of it. I think the talk was quite well received, although everybody’s geocoding challenges are a bit different so some audience members who wanted exact house-number addressing were disappointed.

    The scheduling committee had done a nice job this year of grouping together similar talks, which meant that my talk kicked off an afternoon of Geo-related presentations. I particularly enjoyed Hakim Cassimally’s talk on Civic Hacking. I hadn’t realised that MySociety’s projects were being used in Africa and Asia as well as within the UK - very cool!

    As usual after the main tracks ended we had the lightning talks. I spoke again - this time about Test Kit 2.0, a slightly shorter version of a talk I gave at a recent London.pm Technical Meeting. Hopefully I convinced a few other developers to delete all the boilerplate from their .t files.

    After the lightning talks, Curtis “Ovid” Poe gave a fantastic keynote about managerless companies. He started out comparing the extremely hierarchical companies of the 90s and 00s with feudal society centuries ago in Britain, and then went on to give some great real-world examples of companies being run differently and how they are succeeding. As well as the usual tech examples of Valve Software and GitHub, he mentioned some non-tech companies, such as Semco in Brazil, which was certainly eye-opening for me. At Nestoria we are pretty good at hiring smart people and giving them the freedom to solve problems in whatever way they see fit; but going truly managerless is a big step up from that, and it led to some great discussions between me and my devs.

    Friday ended with the traditional conference dinner, with the traditional challenges of getting a few hundred developers onto a few coaches and to a very very large restaurant. The food was very tasty, and very plentiful; we had fresh bread rolls, two starters, then some Bulgarian folk dance as entertainment, followed by a large main and a very tasty dessert. But the food was definitely topped by the view: the restaurant was on a lake in the Bulgarian countryside, and the sight was stunning.

    Saturday

    Saturday morning started out with a small Dev Ops track for me, while Sam and Ignacio went to some Web talks, and Tim saw some presentations about search and data.

    For my part I really enjoyed Marian’s talk about creating Linux containers with Perl, and look forward to his libraries being finished and up on CPAN.

    After lunch was pretty much an MST-fest, as Matt S Trout gave a 50 minute talk on Devops Logique and a 50 minute keynote on The State of the Velociraptor. Both were very interesting, and I had to smile when the topic of Prolog came up - back in university we studied Prolog and Haskell in our first year, quite an unusual introduction to programming I think.

    Before the keynote came the second day of lightning talks, and the second day where I gave a talk. This time around I talked about this very blog - and announced live that this month’s Module of the Month winner was Tim Bunce for Devel::NYTProf. Unsurprisingly Tim got a thunderous round of applause, despite not being there this year.

    Dan Muey of cPanel gave a great talk about Unicode and Perl which definitely resonated with me; by which I mean it exactly matched our unicode style guide :-)

    Sunday

    Sam and Ignacio were very excited for Sunday, as that seemed to be where all the web related talks went. Sawyer X gave a particularly good introduction to Plack and PSGI, and then went on to share how Booking.com has managed to gradually shift over to PSGI running on uWSGI. I learned a huge amount, and I hope we can make a similar shift at Nestoria sometime soon.

    Susanne Schmidt (Su-Shee) also gave a wonderful introduction to the wide and not-so-varied world of web frameworks. In preparation she had built the same application - a cat GIF browser, naturally - in about 10-20 different frameworks across 5-10 different languages! Unsurprisingly a lot of them are almost identical - Dancer, Sinatra, Django, Rails and Mojolicious all seem to have borrowed ideas from one another over the years. I had no idea, though, that R (yes, the statistics language) has a web framework! And it’s pretty nice too: you can produce some really great graphs and charts with very little code.

    I’d also like to shout out Tatsuro Hisamori (aka まいんだー) for coming over from Japan to tell us how he sped up his test suite from 40 minutes to 3 minutes. We actually have a pretty similar set-up here at Nestoria - spreading different groups of tests over different VMs, with lots of parallelisation and a home-grown web interface to the results. Their project Ukigumo’s web interface looks scarily similar to ours.

    To round out the day, Sawyer reprised his The Joy In What We Do keynote from YAPC::NA. It’s a touching tale of how he learned programming, and Perl, and how we should all take time to reflect on how fun programming can be. The talk ends with some of the Perl language features and CPAN libraries we should be proud of, and should be talking about in the wider programming community. All in all I left feeling pretty happy to be a Perl dev.


    So that was YAPC::EU 2014! It was an absolute blast, and we can’t wait to sponsor and send some devs over to Granada for YAPC::EU 2015 next September.

    Of course we don’t have to wait that long for the next Perl event. We’re sponsoring and attending The London Perl Workshop 2014 in November - hope to see you there!

     
  8. Open Geo interview series

    A quick post to let you know that our sister brand, OpenCage Data, has launched an interview series with thought leaders from the Open Geo world over on their blog. It’s a fast moving space, and they’re hoping to provide a forum to feature some of the various innovations. Give it a read (and check out OpenCage in general), and let them know who else should be interviewed.

     
  9. Module of the month August 2014: Devel::NYTProf

    Welcome to another Module of the Month blog post, a recurring post in which we highlight particular modules, projects or tools that we use here at Nestoria.

    This month’s award goes to the amazing Devel::NYTProf, simply the best Perl code profiler there is and one of the most powerful tools you can reach for when working on a large and complex code base.

    Let’s start out with some quotes from some rather intelligent blokes:

    Prove where the bottleneck is

    "Bottlenecks occur in surprising places, so don’t try to second guess and put in a speed hack until you have proven that’s where the bottleneck is." — Rob Pike

    Don’t do it yet

    "The First Rule of Program Optimization: Don’t do it. The Second Rule of Program Optimization (for experts only!): Don’t do it yet." — Michael A. Jackson

    … but only after the code has been identified

    "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified" — Donald Knuth

    All three of these quotes point towards a single truth: when your code is slow, and you suspect it could be faster, reach for your profiler!

    And of course, the more powerful and feature-rich your profiler is, and the more code you can easily point it at, the more bottlenecks and potential optimization sites you will find.
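    Getting started takes all of two commands (the script name here is illustrative):

    $ perl -d:NYTProf some_slow_script.pl
    $ nytprofhtml    # writes an HTML report into ./nytprof/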


    At Nestoria we have wielded the great Devel::NYTProf against most areas of our code. It’s helped us get our internal geocoding down to 100ms per listing, which means we can re-geocode an entire country of listings in less than 24 hours. It’s helped us respond to over 90% of website requests in less than 200ms. And it’s helped us process all of our metrics logs before 8am every day, so that our commercial team can quickly act on those numbers and do their jobs well.

    Most recently we have been using Devel::NYTProf::Apache (also from TIMB!) to profile our website in production. By using the addpid option we have each Apache child process write its own nytprof.out file, which we then merge together with nytprofmerge. We have around 30 Apache children at any given time, so we end up with 5 hours of information from only 10 minutes of real time in which our site is slower for our users.

    (Note: we do turn off statement level profiling with stmts=0 and make sure to write the nytprof.out files to a ramdisk. Without those two hacks the site falls over.)
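    For reference, the moving parts look roughly like this (the ramdisk path is ours; check the Devel::NYTProf::Apache docs before copying anything):

    # httpd.conf
    PerlPassEnv NYTPROF
    PerlModule Devel::NYTProf::Apache

    # environment, set before starting Apache
    export NYTPROF="addpid=1:stmts=0:file=/mnt/ramdisk/nytprof.out"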


    So thank you Tim, for Devel::NYTProf and for everything else you’ve done, and for being one of the nicest people in the Perl community :-)

    Enjoy your $1 per week Gittip donation from us!

     
  10. Happy CPAN Day!

    Saturday August 16th 2014 is CPAN Day, 19 years since Andreas König uploaded Symdump 1.20 to our favourite comprehensive archive network.

    Neil Bowers has been writing all about it on blogs.perl.org over the last few weeks. His posts have been about improving your CPAN distributions with better documentation, test coverage, and community involvement with his “thank a CPAN author” suggestion.

    I thought we could join in by giving a quick run-down of some of the distributions we’ve released to CPAN over the years. These were often released under different CPAN author accounts, but they can all be found in one place on our company GitHub page: https://github.com/lokku.

    Geo::What3Words

    As we blogged about just a couple of days ago, we wrote the Perl library for interfacing with the What3Words API.

    Geo::Coder::OpenCage

    The Perl interface to our sister company OpenCage Data's geocoder API.

    I will be talking about this in detail at YAPC::EU in Sofia next Friday :-)

    Geo::Coder::Many

    Can you tell we’re big geo nerds?

    Geo::Coder::Many is a way to multiplex requests between multiple remote geocoding APIs, such as Yahoo!’s PlaceFinder, Google’s Geocoder v3, and of course our own OpenCage geocoder.

    It can handle caching, and adjust the likelihood of hitting a particular API based on that API’s daily limits.

    File::CleanupTask

    This is a very powerful configuration-based tool for handling “cleaning up” (moving, archiving, or just deleting) of old unwanted files. It’s great for keeping all your logs around for as long as you need them, but automating the job of tidying them up when they are no longer useful.

    We like to use symbolic links a lot to handle atomic changes of code, configuration, and data and so we built into File::CleanupTask the ability to avoid deleting a file if it is symlinked from another directory, no matter how old it is. That way you get the automated cleanup without the danger of deleting something which is still in use.

    CSS::SpriteMaker

    Create sprite images and their associated CSS files to speed up your website and make your users happier. This is a great technique which can shave a huge amount of per-image overhead off your file sizes and your request times.

    Savio Dimatteo, who has sadly since left Nestoria and London (we miss you Savio!), spoke about this at YAPC::EU 2013 in Kiev. Here are some slides and a video.

    Algorithm::DependencySolver

    This is a very abstract and algorithmic module which takes operations with dependencies, things which will be affected by the operation, and prerequisites for the operation, and then attempts to automatically derive an ordering for those operations to be run in.

    It can be very useful when dealing with long complicated pipelines which manipulate objects. We used to use it in our Geobuild to help determine that the URL Creator needed to be run before the URL Deduplicator which needed to be run before the Place Deduplicator.

    It greatly aids debugging by outputting ASCII or PNG graphs of the operations.

    Number::Format::SouthAsian

    Did you know that in South Asia the number ten million is called “one crore” and is written “1,00,00,000”?

    We didn’t before we launched Nestoria India, but when we found out it immediately seemed like something that belongs on the CPAN.

    Big thanks to Wikipedia for basically writing my tests for me, to CPAN Testers for catching 32 bit bugs in earlier versions, and to Larry for making sure numbers like 1000000000000000000000000000000000000000 just work in my favourite language ;-)
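    A quick taste (assuming the API is as I remember it - do check the docs):

    use v5.10;
    use Number::Format::SouthAsian;

    my $formatter = Number::Format::SouthAsian->new;
    say $formatter->format_number(10_000_000);              # 1,00,00,000
    say $formatter->format_number(10_000_000, words => 1);  # "1 crore", give or take formatting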

    WebService::Nestoria::Search

    Last but not least, we of course have our module for interfacing with the Nestoria Search API.

    This gives you access to our property listings, of course, but also has an interface to our average house price data. Do with it what you like :-) More details at http://www.nestoria.co.uk/help/api-tech.
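    A rough sketch of a query (parameter names from memory of the docs, so treat them as illustrative):

    use WebService::Nestoria::Search;

    my $NS = WebService::Nestoria::Search->new(
        Country => 'uk',
    );

    # fetch rental listings around Soho
    my @results = $NS->results(
        place_name   => 'soho',
        listing_type => 'rent',
    );
    printf "got %d listings\n", scalar @results;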


    So my suggestion, adding to Neil’s long list, is: open up some source code and release a CPAN distribution from your work’s code base this CPAN Day!

     