
  • kmitov 7:06 am on January 22, 2019 Permalink |
    Tags: cache, Software Development

    Implementation of Multi-levels caching. Example for Rails 

    There are two difficult things in Computer Science. Choosing a name for a variable and cache invalidation.

    That being said, I went on a journey to implement multi-level caching in one of our platforms. Cache should be fast. Fast cache is expensive. If you can use 1M of very fast and expensive cache, why not add a second-level cache of 10M that is not that fast and not that expensive, and a third level of 100M of normal cache that is cheap but still faster than going to the db?


    I decided to implement a DbCache that stores rendered HTML values directly in the db and reads them from there instead of from the fast Redis/Memcachier cache. All in the name of saving a few dollars on expensive fast cache that I do not really need.

    <% cache(cache_key) do # this is the redis cache %>
      <%= db_cache(cache_key) do # this is the db cache %>
        <% # actual calculation of value %>
      <% end %>
    <% end %>


    There is no need to constantly render the HTML/JSON of the content that we would like to serve to the client. It could be rendered once and served until updated. We are using Memcachier for very fast access to cached values, but it costs us more money. And in many cases we do not need access that fast.

    That’s why there is a DbCache implementation.

    It works in the following way. It has a key and a value.

    In a view you can use it like this:

     <% cache(cache_key) do %>
       <%= db_cache(cache_key) do %>
       <% end %>
     <% end %>

    In this way, if something is in the cache we take it. This is the L1 cache if you like. If it is not in the L1 cache (which is stored in memory) then we ask db_cache. db_cache is our second-level cache, L2. If the value is not in db_cache then we render it. The same principle could be applied for an L3 cache, although as a platform we are not yet at the point of needing one.
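
    The lookup order described above can be sketched in plain Ruby. This is only an illustration: TwoLevelCache and its two hashes are hypothetical stand-ins for the fast cache and the DB table, not the platform's code.

```ruby
# Hypothetical stand-ins: plain hashes play the roles of the fast cache (L1)
# and the database-backed cache (L2).
class TwoLevelCache
  attr_reader :l1, :l2, :renders

  def initialize
    @l1 = {}      # small, fast, expensive (Memcachier/Redis in the post)
    @l2 = {}      # larger, slower, cheap (a DB table in the post)
    @renders = 0  # how many times the expensive block actually ran
  end

  def fetch(key)
    return @l1[key] if @l1.key?(key)   # L1 hit

    value = if @l2.key?(key)
              @l2[key]                 # L2 hit
            else
              @renders += 1
              @l2[key] = yield         # miss on both levels: render, store in L2
            end
    @l1[key] = value                   # populate L1 on the way out
  end
end

cache = TwoLevelCache.new
cache.fetch("page/1") { "<p>hello</p>" }
cache.l1.clear                         # simulate eviction from the small L1
html = cache.fetch("page/1") { "<p>rendered again</p>" }
puts html           # the value comes from L2, the block is not run again
puts cache.renders
```

    Even after L1 loses the value, the expensive block runs only once, because L2 still holds the rendered result.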

    But it is db_cache. It is accessing the db. Why do you even call it cache?

    When the db_cache is accessed we make a single query to the db and retrieve a single record. For an indexed, not very large table this is fast. Rendering the value again would mean making several more requests for different objects and their associations, and then going through all the trouble of rendering the views. By retrieving the HTML/JSON directly from the DB we can serve it as it is.

    How is db_cache implemented?

    There is a DbCache model that stores the values in the db. It has a key column and a value column. That’s it. Creating/retrieving/updating DbCache records is what is interesting.

    The key is a column in the DB that is an integer. NOT a string. This integer is generated with a hash function and is then shifted right by one bit. The hash produces an unsigned 64-bit integer, but the Postgres column is a signed integer, so the top half of the unsigned range does not fit. Shifting right makes every value fit. In this way the cache key given by the developer is transformed to an internal key that is used for finding the record. And of course there is an index on this hash64 column.

    def db_cache key, &block
      # This will give us a 64 bit hash
      # Shift the value to reduce it because the column is signed and there is no room for
      # an unsigned value
      internal_key = Digest::XXH64.hexdigest(key).to_i(16) >> 1
      db_cache = DbCache.find_by(hash64: internal_key)
      # ... the rest of the method is shown in the full code
    end
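
    To see why the shift is enough, here is a small sketch using only Ruby's standard library. It assumes a stand-in hash (the first 64 bits of SHA-256) in place of Digest::XXH64, which is not part of the standard library; the helper names are made up for the illustration.

```ruby
require "digest"

INT64_MAX = 2**63 - 1  # largest value a signed 64-bit column can hold

# Stand-in for Digest::XXH64: the first 16 hex digits (64 bits) of SHA-256.
def unsigned_hash64(key)
  Digest::SHA256.hexdigest(key)[0, 16].to_i(16)
end

# Dropping the lowest bit leaves at most 63 significant bits,
# so the result always fits in the signed column.
def internal_key(key)
  unsigned_hash64(key) >> 1
end

key = internal_key("views/courses/19")
puts key.between?(0, INT64_MAX)
puts ((2**64 - 1) >> 1) == INT64_MAX  # even the largest hash fits after the shift
```

    The price is one bit of the hash: two keys whose hashes differ only in the lowest bit map to the same internal key.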

    How are keys expired?

    If a key has expired we must create a new record for this key. Expiring keys could be difficult. Every record has an updated_at value. Every day a job runs on the db, and any record whose updated_at value is more than a set number of days old is expired automatically. This controls the number of records in the DB. I am interested in storing only records that are regularly accessed. I think that if a page was not accessed in a couple of days, you generally do not need a cached value for it.
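
    The daily expiry job is not shown in the post, so the following is only a sketch of the idea in plain Ruby. CachedRecord, expire_stale and the 3-day limit are assumptions made for the illustration.

```ruby
# Hypothetical in-memory stand-in for rows of the cache table.
CachedRecord = Struct.new(:hash64, :value, :updated_at)
SECONDS_PER_DAY = 24 * 60 * 60

# Keep only the records that were touched within the last max_age_days.
def expire_stale(records, now:, max_age_days:)
  cutoff = now - max_age_days * SECONDS_PER_DAY
  records.reject { |record| record.updated_at < cutoff }
end

now = Time.utc(2019, 1, 22)
records = [
  CachedRecord.new(1, "<p>fresh</p>", now - 1 * SECONDS_PER_DAY),
  CachedRecord.new(2, "<p>stale</p>", now - 10 * SECONDS_PER_DAY)
]
kept = expire_stale(records, now: now, max_age_days: 3)
puts kept.map(&:hash64).inspect
```

    Against a real table the same rule would probably be a single delete, something like DbCache.where("updated_at < ?", 3.days.ago).delete_all, with the number of days chosen for the platform.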

    This opens the next question:

    How are keys marked not to expire? If we change updated_at for a record on every read, that will be slow because of the write

    True. It is important to expire old keys, but it is also important not to touch records on every read request. A touch on every request to the cache means an update on every request, and that slows the method down. So the update happens at most once a day. See the db_cache.touch call below. The record could be accessed thousands of times today, but there will be only one write to update the updated_at value, that is, to touch the record.

    def db_cache key, &block
      internal_key = Digest::XXH64.hexdigest(key).to_i(16) >> 1
      db_cache = DbCache.find_by(hash64: internal_key)
      if db_cache.nil?
        # create cache value
      else
        # touch at most once a day, no matter how often the record is read
        db_cache.touch if db_cache.updated_at < Time.now - 1.day
      end
      # ...
    end
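
    A small simulation shows how few writes this rule produces. TouchedRecord is a hypothetical stand-in for a DbCache row; only the once-a-day condition is taken from the code above.

```ruby
ONE_DAY = 24 * 60 * 60

# Hypothetical stand-in for a cache row that counts its own writes.
class TouchedRecord
  attr_reader :updated_at, :writes

  def initialize(updated_at)
    @updated_at = updated_at
    @writes = 0
  end

  def touch(now)
    @updated_at = now
    @writes += 1
  end

  # Reading only writes when the last touch is more than a day old.
  def read(now)
    touch(now) if updated_at < now - ONE_DAY
    :cached_value
  end
end

start = Time.utc(2019, 1, 22, 7, 0, 0)
record = TouchedRecord.new(start - 2 * ONE_DAY)
1000.times { |i| record.read(start + i) }  # a thousand reads, one second apart
puts record.writes
```

    The first read finds a record older than a day and touches it; the remaining reads see a fresh updated_at and skip the write.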

    How fast is DbCache?

    These are just reference values, and of course they depend on the size of your cached values, the db and the load. In our specific case on Heroku we’ve found that the DbCache generally retrieves values in the range of 2 to 8 ms. In comparison the very fast Memcachier does this in the range of 2 to 8 ms.

    We also used NewRelic to look at the performance that end users are experiencing. And there was a large improvement because we could cache hundreds of MB of records in DB compared to the few MB for Memcachier that we are paying for.

    Rails specific details

    Since this code has to live in our platform and is bound to use some other Rails objects, there are a few more things that I’ve done. Here is the full code that I hope gives a complete picture.

    # Author::    Kiril Mitov
    # Copyright:: Copyright (c) 2018 Robopartans Group
    # License::   MIT
    module DbCacheHelper
      def db_cache key, &block
        # puts "db_cache: key: #{key}"
        result = nil
        if controller.respond_to?(:perform_caching) && controller.perform_caching
          # This will give us a 64 bit hash
          # Shift the value to reduce it because the column is signed and there is no room for
          # an unsigned value
          internal_key = Digest::XXH64.hexdigest(key).to_i(16) >> 1
          db_cache = DbCache.find_by(hash64: internal_key)
          if db_cache.nil?
            # puts "DBCache Miss: #{key}, #{internal_key}"
            Rails.logger.info "DBCache Miss: #{key}, #{internal_key}"
            content = capture(&block)
            # Use a rescue. This will make sure that if
            # a race condition occurs between the check for
            # existence of the db_cache and the actual write
            # we will still be able to find the key.
            # This happens when two or more people access the site at exactly the
            # same time.
            begin
              # puts "DBCache: Trying to create"
              # puts "DBCache Count before find or create: #{DbCache.count}"
              db_cache = DbCache.find_or_create_by(hash64: internal_key)
              # puts "DBCache Count after find or create: #{DbCache.count}"
              # puts "DBCache: Found or record is with id:#{db_cache.id}"
            rescue ActiveRecord::RecordNotUnique
              # Assumed here: try again, so that the record created by the
              # competing request is found on the second pass.
              retry
            end
            # The update is after the create because the value should not be part of the
            # create.
            db_cache.update(value: content)
          else
            # puts "DBCache Hit: #{key}, #{internal_key}"
            Rails.logger.info "DBCache Hit: #{key}, #{internal_key}"
            db_cache.touch if db_cache.updated_at < Time.now - 1.day
          end
          result = db_cache.value
        else
          # Caching is disabled, so just render the block
          result = capture(&block)
        end
        # Result could be nil if we've cached nil. So don't return nil,
        # but return an empty string
        result ? result.html_safe : ""
      end
    end
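
    The rescue on ActiveRecord::RecordNotUnique covers the window between "the record is not there" and "create it". The sketch below replays that window without a database. RacyStore and DuplicateKey are hypothetical stand-ins, and retrying after the rescue is one common way to handle it, assumed here for the illustration.

```ruby
class DuplicateKey < StandardError; end

# Hypothetical store with a unique key. On the first create a "competing
# request" sneaks its row in, so our own insert hits the unique constraint.
class RacyStore
  def initialize
    @rows = {}
    @competitor_pending = true
  end

  def find(key)
    @rows[key]
  end

  def create(key)
    if @competitor_pending
      @competitor_pending = false
      @rows[key] = { key: key, value: nil }  # the other request wins the race
    end
    raise DuplicateKey, key.to_s if @rows.key?(key)

    @rows[key] = { key: key, value: nil }
  end

  def find_or_create(key)
    find(key) || create(key)
  end
end

store = RacyStore.new
attempts = 0
begin
  attempts += 1
  row = store.find_or_create(42)
rescue DuplicateKey
  retry  # second pass: find now returns the competitor's row
end
row[:value] = "<p>rendered</p>"
puts attempts
```

    The second pass never reaches create, so the loop ends as soon as either request has written the row.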

  • kmitov 2:02 pm on January 9, 2019 Permalink |
    Tags: i18n, parallel, Software Development

    i18n locales and the pass of rspec parallel specs and 

    First rule of good unit testing is: each test should be independent of the other tests.

    But if there is a global variable like I18n.locale, then one spec could change it and another spec would run in a locale different from the default.


    Before each spec set the locale to the default. This ensures that the specs are run against the same locale each time. The specific code is:

    # spec/rails_helper.rb
    RSpec.configure do |config|
      config.before :each do
        # reset the global locale so no spec inherits it from a previous one
        I18n.locale = Globalize.locale = I18n.default_locale
      end
    end

    i18n breaks spec isolation

    Internationalization, or i18n, should be part of most platforms. This means that i18n features should be properly tested. In suites where one of the specs modifies the i18n locale, the specs that run after it depend on this new locale.

    This seems trivial, but we only got to explore it the moment we started running specs in parallel on different CPU cores. The specs were started and run at different times and in a different order on each run.
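
    The order dependence can be reproduced without RSpec. In this sketch FakeI18n is a hypothetical stand-in for the global I18n.locale, and the two lambdas play the role of specs.

```ruby
# Hypothetical stand-in for a global locale setting.
module FakeI18n
  DEFAULT = :en
  class << self
    attr_accessor :locale
  end
  self.locale = DEFAULT
end

greeting_spec  = -> { FakeI18n.locale == :en ? :pass : :fail }  # assumes the default
bulgarian_spec = -> { FakeI18n.locale = :bg; :pass }            # changes the global

# Run the specs in the given order, optionally resetting the locale before each.
def run(specs, reset:)
  FakeI18n.locale = FakeI18n::DEFAULT
  specs.map do |spec|
    FakeI18n.locale = FakeI18n::DEFAULT if reset
    spec.call
  end
end

puts run([greeting_spec, bulgarian_spec], reset: false).inspect
puts run([bulgarian_spec, greeting_spec], reset: false).inspect
puts run([bulgarian_spec, greeting_spec], reset: true).inspect
```

    Without the reset the outcome depends on the order; with the reset before each spec both orders pass, which is what the before :each hook provides.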

  • kmitov 7:21 am on January 8, 2019 Permalink |
    Tags: capybara, chromedriver, feature tests, google-chrome, Software Development, tests

    Chromedriver not filling the whole password field in automated RSpec, Capybara, Feature Tests

    This is such a lovely story. You will like it.

    When using Google chromedriver to run Capybara tests, sometimes, just sometimes, and especially if the tests are run in parallel, a test that has to fill a text field like a password fills only part of it. Last checked with chromedriver 2.45.

    TL; DR;

    Solution – Google does not care, or it seems it is too difficult for the chromedriver team to resolve, so there simply is no solution.

    What is the test?

    We are using Rails, Rspec, Capybara, Google chromedriver. We are developing feature tests. Tests are run in parallel with

    rake parallel:spec

    Here is the test in question. Simply fill a password on the form for upgrading a subscription, click on Confirm and expect to be redirected to a page that says – “You’ve upgraded your subscription”

    def submit_and_expect_success password
      # dialog opens to confirm with password
      fill_in "owner_password", with: password
      click_on "Confirm"
      expect_redirect_to "/subscriptions/#{subscription.to_param}"
      # If it redirects to this page, it means that the change was successful
      expect(page).to have_current_path "/subscriptions/#{subscription.to_param}"
    end

    And the tests are failing with a timeout at expect_redirect_to. Now, expect_redirect_to is a custom method, because we are using ActionCable to wait for the subscription upgrade to finish. Because of the payment service at the back this sometimes takes a while, we want to show nice progress, and for that we need a websocket. But that being said, the method is nothing special.

    module ExpectRedirect
      def expect_redirect_to url
        # If the path doesn't change before the timeout passes,
        # the test will fail, because there will be no redirect
        puts "expect url: #{url}"
        begin
          Timeout.timeout(Capybara.default_max_wait_time) do
            # poll the current path until it matches the expected url
            sleep(0.1) until url == URI(page.current_url).path
          end
        rescue Timeout::Error => e
          puts "Current url is still: #{page.current_url}"
          puts page.body
          raise e
        end
      end
    end

    If we are redirected to the url within Capybara.default_max_wait_time then everything is fine. If not, we raise the Timeout::Error.
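
    The same polling pattern, stripped of Capybara, looks like this. wait_until is a hypothetical helper name; only Timeout.timeout and the sleep/until loop come from the method above.

```ruby
require "timeout"

# Poll the block until it returns a truthy value.
# Raises Timeout::Error if that does not happen within max_wait seconds.
def wait_until(max_wait, interval: 0.01)
  Timeout.timeout(max_wait) do
    sleep(interval) until yield
  end
  true
end

# The condition becomes true after about 50 ms, well inside the limit.
deadline = Time.now + 0.05
reached = wait_until(1) { Time.now >= deadline }

# A condition that never becomes true ends in Timeout::Error.
timed_out = begin
  wait_until(0.05) { false }
rescue Timeout::Error
  :timed_out
end

puts reached
puts timed_out
```

    The error says only that the condition never held, not why, which is the reason the original method prints the current url and the page body before re-raising.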

    Parallel execution

    For some reason the test in question fails only when we are doing a parallel execution. Or at least mostly when we are doing parallel execution of the tests. So we moved through some nice articles to revise our understanding of Timeout again and again.


    But nevertheless the tests were failing with Timeout::Error on waiting for a redirect and in the html we could see the error returned by the server:

    <div><p>Invalid password</p></div>

    How come the password is invalid?

    Now, this took a while to debug and it seems rather mysterious, but this is what we got:

    User password in DB is: 10124ddeaf1a69e3748e308508d916b6

    The server receives from the html form: 10124ddeaf1a69e3748e30850

    User password in DB is: 74c2a3e926420e1a30363423f121fc1e

    The server receives from the html form: 74c2a3e926420e1a3

    and so on and so on.

    Sometimes the difference is 8 symbols. Sometimes it is 2. Sometimes it is 16.

    It seems to be a client side issue

    Like. JavaScript. If there is an error this strange it has to be in the JavaScript. Right. There in the javascript we see:

    let form_data = form.serializeObject();
    this.perform('start_change', form_data);

    The form gets serialized. Probably it is not serialized correctly. Probably the values that we are sending are just not the values on the form. So I revised my knowledge on serializing objects in JavaScript with


    So far so good. But the serialization was not the problem. Here is what I did. I fixed all the passwords to be 32 symbols.

    let form_data = form.serializeObject();
    // Every test password is 32 symbols long, so anything else means the field was not filled completely
    if (form_data["owner_password"].length !== 32) {
      form_data["owner_password"] = "this was:" + form_data["owner_password"] + " which is less than 32 symbols";
    }
    this.perform('start_change', form_data);

    And it happened. The value of the password field was simply not 32 symbols long. It was not completely filled during the test.

    A little bit of search and we arrive at:


    and there at the bottom of the issue is the standard “Not our problem” resolution, with the link to:


    It seems that Google chromedriver is not filling in all the characters of the password field. It happens at random and is completely unpredictable.
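
    This is not a fix, but reading the field back right after filling it at least turns the mysterious timeout into a clear message. The sketch below is an assumption, not something from the original tests: FlakyField is a hypothetical stand-in for the browser field, and fill_and_verify is a made-up helper.

```ruby
# Hypothetical stand-in for a browser input that drops the last characters
# on the first fill and behaves correctly afterwards.
class FlakyField
  attr_reader :value

  def initialize(drop_on_first: 7)
    @drop = drop_on_first
    @value = ""
  end

  def fill(text)
    @value = @drop.positive? ? text[0...-@drop] : text
    @drop = 0
  end
end

# Fill the field, read it back, and try again if characters are missing.
# Returns the number of attempts that were needed.
def fill_and_verify(field, text, attempts: 3)
  attempts.times do |i|
    field.fill(text)
    return i + 1 if field.value == text
  end
  raise "field holds #{field.value.length} of #{text.length} characters"
end

password = "10124ddeaf1a69e3748e308508d916b6"
field = FlakyField.new
used = fill_and_verify(field, password)
puts used
```

    In a Capybara test the read-back would go through something like find_field("owner_password").value, and whether a second fill actually helps with chromedriver is untested here.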

    Issue still exists on:

    Chrome: Version 71.0.3578.98 (Official Build) (64-bit)
    Chromedriver chromedriver --version
    ChromeDriver 2.45.615279 (12b89733300bd268cff3b78fc76cb8f3a7cc44e5)
    Linux kireto-laptop3 4.4.0-141-generic #167-Ubuntu x86_64 GNU/Linux
    Description:	Ubuntu 16.04.5 LTS
    Release:	16.04
    Codename:	xenial

    Today we would try Firefox driver.

  • kmitov 6:00 pm on January 3, 2019 Permalink |
    Tags: http, Software Development   

    The curious case of …http status code 505 

    Today I learned that there is such a thing as http status code 505. Well, I’ve been in software development for quite some time, but 505 was a new one for me.

    TL; DR;

    There was a space in the URL and this results in an error 505


    The curious case started with an error when we were checking that some links on the platform are valid. This is what happened:

    Message: "Error: The link https://www.fllcasts.com/competitions/ returned status code 505! 
    Error: The link https://www.fllcasts.com/competitions/ returned status code 505! 
    Error: The link http://www.fllcasts.com/courses/19-moving-straight-with-lego-mindstorms-ev3-robots returned status code 505! 
    Error: The link http://www.fllcasts.com/courses/6-box-robot-two-fewer-parts-and-one-motor-simplifying-a-robot returned status code 505! "

    Strange. A quick curl revealed that the url was correct and curl was returning a correct result. wget also showed that it is working.

    It took me about one hour. One hour looking and the problem was the following:

    When the HTML document was constructed it had the following content

    <a href="{{ competitions_link }} "><img..

    Note the space. The space after }} and before the closing quote. The space right here: <a href="{{ competitions_link }}HERE">. This took an hour today. And the solution is just to strip the link before checking it.
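
    Why would a trailing space give a 505, which means "HTTP Version Not Supported"? The post does not say. One plausible explanation, and it is only an assumption, is that the unencoded space ends up in the HTTP request line, where spaces separate the method, the target and the version. The sketch uses made-up helper names to show how the extra space shifts the tokens.

```ruby
# Strip surrounding whitespace from a link taken out of an href attribute.
def clean_href(raw)
  raw.strip
end

# Build a request line the naive way and split it on single spaces,
# the way a strict parser would see it.
def request_line_tokens(method, target)
  "#{method} #{target} HTTP/1.1".split(/ /)
end

raw = "/competitions/ "  # note the trailing space
broken = request_line_tokens("HEAD", raw)
fixed  = request_line_tokens("HEAD", clean_href(raw))

puts broken.inspect  # four tokens: the third one is empty, where the version is expected
puts fixed.inspect   # three tokens: method, target, version
```

    With the stripped link the version is back in third position, which fits the observation that curl and wget, given the clean URL, had no problem.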


    If interested here is the test for this problem:

    it "strips url and then checks them to avoid error 505" do
      # the link has leading and trailing spaces on purpose
      link = ' https://www.fllcasts.com '
      stub_request(:head, "https://www.fllcasts.com").to_return(status: 201)
      success, message = UrlChecker::check_link link, true
      expect(success).to be_truthy
      expect(message).to be_empty
    end

  • kmitov 10:02 am on December 31, 2018 Permalink |
    Tags: Software Development, Software Planning, Trello   

    How to plan with Trello? Part 1 – backlog and sprint board 

    I recently shared this with a friend who is constantly getting lost in Trello and in how exactly to structure his software project plan. I shared my experience with him and he kind of liked it, so here is my story and the few rules that have kept me sane for the past 2 years of following them.

    The main issues with planning a software project in Trello are deciding:

    • are different features in different boards?
    • why do you need labels? Are different features marked with labels?
    • are different features in lists?
    • how do you set the priority for a task? Do you have a list for priority, or a label for priority?

    Because of these questions I’ve started and stopped using Trello many times over the last 4-5-6 years.

    These are all difficult questions. Here are my simple solutions.


    Create two boards: Backlog and SprintXX, where XX is the number of the sprint. In SprintXX you have three lists: “SPXX Planned”, “Ongoing” and “Done SPXX December 01 – December 15”. When the sprint, which lasts two to three weeks, finishes, you archive “Done SPXX December 01 – December 15” and create a new “Done SPXX+1 December 16 – December 31” list. Then you rename the list “SPXX Planned” to “SPXX+1 Planned”.

    This keeps the Trello clear.

    Create two boards

    Board one is the Sprint board
    Board two is the Backlog board

    If you are currently not working on a task and there is little to no chance you will work on it in the next 3-4 weeks, then it is in the Backlog. This means it will be handled later.

    Sprint Board

    The Sprint board has the name of the current Sprint. I like sprints that are 2-3 weeks long. It has three lists:

    SPXX Planned

    The list has all the tasks that are planned for the current sprint or probably the next one. These are tasks that you are genuinely planning to do something about.

    Ongoing

    These are all the tasks that we are currently working on. If we have even a single line of code for a task then we are working on it.

    Done SPXX December 01 – December 15

    These are all the tasks completed in sprint XX. Note that the list has the name “Done SPXX December 01 – December 15”. This is the full name of the sprint.

    At the end of the sprint

    When the sprint ends you archive “Done SPXX December 01 – December 15”. You do not archive the tasks. You archive the whole list. This gives you a chance to get back to the list at the regular reviews that you have with the team and actually review what has happened in this sprint.
