Tagged: cache

  • kmitov 3:19 pm on May 31, 2021
    Tags: cache

    When caching is bad and you should not cache. 

    (Everyday Code – instead of keeping our knowledge in a README.md let’s share it with the internet)

    On Friday we did some refactoring at FLLCasts.com. We removed Refinery CMS, which is a topic for another article, but one issue popped up: on a specific page caching was used in a way that made the page very slow. This article is about how and why. It is mainly for our team as a way to share the knowledge among ourselves, but I think the whole community could benefit, especially the Ruby on Rails community.

    TL;DR;

    When you make a request to a cache service, be it Memcachier, Redis or any other, you are making a call to an external service. This includes a get(key) method call and, if the value is not stored in the cache, a set(key, value) method call as well. When the calculation is simple, caching its result takes more time than doing the calculation again, especially if the calculation is a simple string concatenation.

    Processors (CPUs) are really good at string concatenation and can do it in well under a millisecond. So before you cache something, make sure it is worth caching. There is absolutely no reason to cache the result of:

    # Simple string concatenation. You calculate the value. No need to cache it.
    value = "<a href=#{link}>Text</a>"
    
    # The same result, but with caching.
    # There isn't a universe in which the code below will be faster than the code above.
    # `cache` and `calculate_hash` stand for whatever cache client and key function you use.
    hash = calculate_hash(link)
    cached_value = cache.get(hash)
    if cached_value.nil?
      cached_value = "<a href=#{link}>Text</a>"
      cache.set(hash, cached_value)
    end
    
    value = cached_value
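
    To get a feeling for the difference, here is a minimal benchmark sketch. It does not talk to a real cache: SimulatedRemoteCache is a hypothetical stand-in, and the 0.5 ms per call is an assumed round-trip time, not a measurement of Memcachier or Redis.

```ruby
require "benchmark"

# Hypothetical stand-in for a remote cache such as Memcached or Redis.
# The sleep models one network round trip; 0.5 ms is an assumed figure.
class SimulatedRemoteCache
  ROUND_TRIP_SECONDS = 0.0005

  def initialize
    @store = {}
  end

  def get(key)
    sleep(ROUND_TRIP_SECONDS)
    @store[key]
  end

  def set(key, value)
    sleep(ROUND_TRIP_SECONDS)
    @store[key] = value
  end
end

# The "calculation": a plain string interpolation.
def build_link(link)
  "<a href=#{link}>Text</a>"
end

# The same value, but fetched through the cache first.
def cached_link(cache, link)
  cached = cache.get(link)
  return cached unless cached.nil?

  value = build_link(link)
  cache.set(link, value)
  value
end

links = Array.new(200) { |i| "/pages/#{i}" }
cache = SimulatedRemoteCache.new

direct_seconds = Benchmark.realtime { links.each { |l| build_link(l) } }
cold_seconds   = Benchmark.realtime { links.each { |l| cached_link(cache, l) } }
warm_seconds   = Benchmark.realtime { links.each { |l| cached_link(cache, l) } }

puts format("direct: %.4fs, cold cache: %.4fs, warm cache: %.4fs",
            direct_seconds, cold_seconds, warm_seconds)
```

    Even with a warm cache every lookup still pays for one round trip, so the direct version should win by a wide margin in this sketch.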

    Context for Rails

    Rails makes caching painfully easy. Any server side generated HTML could be cached and returned to the user.

    <% # The call below will render the partial "page" for every page and will cache the result %>
    <% # Pretty simple, and yet there is something wrong %>
    <%= render partial: "page", collection: @pages, cached: true %>

    What’s wrong is that when we open the page in the browser it takes more than 15 seconds to load.

    Here is a profile result from New Relic.

    As you can see there are a lot of Memcached get calls – like 10, and a lot of set calls. There are also a lot of Postgres find methods. All of this is because of how caching was set up in the platform. The whole “page” partial, after a decent amount of refactoring, turns out to be a simple string concatenation:

    <a href="<%= page.path%>"><%= page.title %></a>

    That’s it. We were caching the result of a simple string concatenation, which the CPU does very quickly. Because there were a lot of pages and we were doing the call for every one of them, when opening the page in the browser for the first time it took too long to make all the get(key) and set(key) calls, and the page returned a “Time out”.
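
    A back-of-the-envelope estimate shows how quickly this adds up. The numbers below are made up for illustration (2,000 pages, one get and one set per page on a cold cache, 4 ms per call); they are not measurements from FLLCasts.com.

```ruby
# Back-of-the-envelope cost of filling a cold cache, in seconds.
# All three inputs are assumptions chosen for illustration.
def cold_cache_seconds(pages:, calls_per_page:, ms_per_call:)
  pages * calls_per_page * ms_per_call / 1000.0
end

estimate = cold_cache_seconds(pages: 2_000, calls_per_page: 2, ms_per_call: 4)
puts estimate # prints 16.0
```

    Under these assumptions the cache traffic alone is in the range of the load time we saw, while rendering the links directly costs almost nothing.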

    Conclusion

    You should absolutely use caching and cache the values of your calculations, but only if those calculations take more time than asking the cache for a value. Otherwise it is just not useful.

     
  • kmitov 7:06 am on January 22, 2019
    Tags: cache

    Implementation of Multi-levels caching. Example for Rails 

    There are two difficult things in Computer Science. Choosing a name for a variable and cache invalidation.

    That being said, I went on a journey to implement multi-level caching in one of our platforms. Cache should be fast. Fast cache is expensive. If you can use 1 MB of very fast and expensive cache, why not implement a second level cache of 10 MB that is not that fast and not that expensive, and a third level of 100 MB of normal cache that is cheap but still faster than going to the db?

    TL;DR;

    I decided to implement a DbCache that will store cached html rendered values directly in db and will access them from the DB instead of the fast Redis/Memcachier cache. All in the name of saving a few dollars on expensive fast cache that I do not really need.

    <% cache(cache_key) do # this is the redis cache %>
      <%= db_cache(cache_key) do # this is the db cache %>
        <% # actual calculation of value %>
      <% end %>
    <% end %>

    Implementation

    There is no need to constantly render the html/json of the content that we would like to serve to the client. It could be rendered once and served until updated. We are using Memcachier for very fast access to cached values, but it is costing us more money. And in many cases we do not need this fast access.

    That’s why there is a DbCache implementation

    It works in the following way. It has a key and a value.

    In a view you can use it like this:

     <% cache(cache_key) do %>
       <%= db_cache(cache_key) do %>
       <% end %>
     <% end %>

    In this way if something is in the cache we take it. This is the L1 cache if you like. If it is not in the L1 cache (which is stored in memory) then we ask db_cache. db_cache is our second level cache – L2. If the value is not in db_cache then we render it. The same principle could be applied for an L3 cache, although as a platform we do not need an L3 cache yet.
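
    The lookup order can be sketched in plain Ruby. This is only an illustration of the idea: TwoLevelCache is a hypothetical class, and both levels are Hashes here, while in our platform L1 is the in-memory cache and L2 is the db_cache table.

```ruby
# A minimal two-level lookup. Both levels are plain Hashes in this sketch.
class TwoLevelCache
  attr_reader :l1, :l2

  def initialize
    @l1 = {}
    @l2 = {}
  end

  # Returns the cached value, filling the faster level on the way back.
  # The block is called only when both levels miss.
  def fetch(key)
    return @l1[key] if @l1.key?(key)

    value = @l2.key?(key) ? @l2[key] : (@l2[key] = yield)
    @l1[key] = value
  end
end

renders = 0
cache = TwoLevelCache.new
first = cache.fetch("page/1") { renders += 1; "<a href=\"/p/1\">One</a>" }
cache.l1.clear # simulate the small L1 evicting the entry
second = cache.fetch("page/1") { renders += 1; "never rendered" }
puts renders # prints 1
```

    After L1 loses the entry, the second fetch is served from L2 and the value is not rendered again.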

    But it is db_cache. It is accessing the db. Why do you even call it cache?

    When the db_cache is accessed we make a single query to the db and retrieve a single record. For an indexed, not very large table this is fast. If the value has to be rendered again, it means making a few more requests for different objects and their associations and going through all the trouble of rendering, which involves views. By retrieving the HTML/JSON directly from the DB we can serve it directly.

    How is db_cache implemented?

    DbCache is a model that stores the values in the db. It has key and value columns. That’s it. Creating, retrieving and updating DbCache records is the interesting part.

    The key is an integer column in the DB, NOT a string. This integer is generated with a hash function and is then shifted right by one bit. The Postgres column is a signed integer and the hash function generates an unsigned 64-bit integer, so the full unsigned value does not fit; dropping one bit makes it fit. In this way the cache key given by the developer is transformed into an internal key that is used for finding the record. And of course there is an index on this hash64 column.

    def db_cache key, &block
      # This will give us a 64 bit hash
      # Shift the value to reduce it because the column is signed and there is no room for
      # an unsigned value
      internal_key = Digest::XXH64.hexdigest(key).to_i(16) >> 1
    
      db_cache = DbCache.find_by(hash64: internal_key)
    
      ...
    end
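
    The effect of the shift can be checked in isolation. The sketch below swaps in SHA-256 truncated to 64 bits instead of XXH64, only so that it runs with the Ruby standard library; internal_key_for and SIGNED_INT64_MAX are names made up for the example.

```ruby
require "digest"

# Largest value a signed 64-bit integer can hold.
SIGNED_INT64_MAX = 2**63 - 1

# Assumption: SHA-256 truncated to 64 bits stands in for XXH64 so that the
# sketch needs no extra gem. The shift is the same as in db_cache.
def internal_key_for(key)
  unsigned_64 = Digest::SHA256.hexdigest(key)[0, 16].to_i(16)
  unsigned_64 >> 1 # drop the lowest bit so the value needs at most 63 bits
end

key = internal_key_for("pages/42-20210531")
puts key <= SIGNED_INT64_MAX # prints true

# Worst case: the largest unsigned 64-bit value still fits after the shift.
puts ((2**64 - 1) >> 1) == SIGNED_INT64_MAX # prints true
```

    The price of the shift is one bit of the hash, so two keys whose hashes differ only in the lowest bit map to the same internal key.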

    How are keys expired?

    If a key has expired we must create a new record for this key. Expiring keys could be difficult. Every record has an updated_at value. Every day a job is run on the db, and if the updated_at value is more than a specific number of days old the record is automatically expired. This controls the number of records in the DB. I am interested in storing only records that are regularly accessed. I think that if a page was not accessed in a couple of days, you generally do not need a cached value for it.
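
    The daily job is not shown in this article, so here is only a sketch of the idea on in-memory rows. CacheRow, expire_old_rows! and the 7 day limit are assumptions for the example. With ActiveRecord the same idea would probably be a single query along the lines of DbCache.where("updated_at < ?", cutoff).delete_all.

```ruby
# In-memory stand-in for rows of the DbCache table.
CacheRow = Struct.new(:hash64, :value, :updated_at)

SECONDS_PER_DAY = 24 * 60 * 60

# Removes rows not updated within max_age_days.
# Returns the number of removed rows.
def expire_old_rows!(rows, max_age_days:, now: Time.now)
  cutoff = now - max_age_days * SECONDS_PER_DAY
  before = rows.size
  rows.reject! { |row| row.updated_at < cutoff }
  before - rows.size
end

now = Time.now
rows = [
  CacheRow.new(1, "<p>fresh</p>", now - 1 * SECONDS_PER_DAY),
  CacheRow.new(2, "<p>stale</p>", now - 10 * SECONDS_PER_DAY)
]
removed = expire_old_rows!(rows, max_age_days: 7, now: now)
puts removed # prints 1
```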

    This opens the next question:

    How are keys marked not to expire? If we change the updated_at for a record on every read that will be slow because of the write.

    True. It is important to expire old keys, but it is also important not to touch records on every read request. A touch on every request to the cache involves an update, which would slow down the method. So the update happens at most once a day. See the db_cache.touch call below. The record could be accessed thousands of times today, but there will be only one write to update the updated_at value, that is, to touch the record.

    def db_cache key, &block
    
      internal_key = Digest::XXH64.hexdigest(key).to_i(16) >> 1
    
      db_cache = DbCache.find_by(hash64: internal_key)
      if db_cache.nil?
        # create cache value
      else
        db_cache.touch if db_cache.updated_at < Time.now - 1.day
      end
    
      ...
    end
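
    To see that the condition really limits the writes, here is a small simulation. FakeRecord is a made-up stand-in for a DbCache record that only counts how many times it was written.

```ruby
SECONDS_PER_DAY = 24 * 60 * 60

# In-memory stand-in for one DbCache record.
# `writes` counts the simulated UPDATE statements.
class FakeRecord
  attr_reader :updated_at, :writes

  def initialize(updated_at)
    @updated_at = updated_at
    @writes = 0
  end

  def touch(now)
    @updated_at = now
    @writes += 1
  end
end

# Touch only when the last write is more than a day old.
def read_with_throttled_touch(record, now)
  record.touch(now) if record.updated_at < now - SECONDS_PER_DAY
  record
end

now = Time.now
record = FakeRecord.new(now - 3 * SECONDS_PER_DAY)

# 1000 reads spread over the next 1000 seconds.
1000.times { |i| read_with_throttled_touch(record, now + i) }
puts record.writes # prints 1
```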

    How fast is DbCache?

    These are just reference values and of course they depend on the size of your cached values, the db and the load. In our specific case on Heroku we’ve found that the DbCache generally retrieves values in the range of 2 to 8 ms. In comparison the very fast Memcachier does this in the range of 2 to 8 ms.

    We also used New Relic to look at the performance that end users are experiencing. And there was a large improvement because we could cache hundreds of MB of records in DB compared to the few MB for Memcachier that we are paying for.

    Rails specific details

    Since this code has to live in our platform and is also bound to use some other Rails objects, there are a few more things that I’ve done. Here is the full code, which I hope gives a complete picture.

    # Author::    Kiril Mitov  
    # Copyright:: Copyright (c) 2018 Robopartans Group
    # License::   MIT
    module DbCacheHelper
    
      def db_cache key, &block
        # puts "db_cache: key: #{key}"
        result = nil
        if controller.respond_to?(:perform_caching) && controller.perform_caching
    
          # This will give us a 64 bit hash
          # Shift the value to reduce it because the column is signed and there is no room for
          # an unsigned value
          internal_key = Digest::XXH64.hexdigest(key).to_i(16) >> 1
    
          db_cache = DbCache.find_by(hash64: internal_key)
          if db_cache.nil?
            # puts "DBCache Miss: #{key}, #{internal_key}"
            Rails.logger.info "DBCache Miss: #{key}, #{internal_key}"
    
            content = capture(&block)
    
            # Use a rescue. This will make sure that if
            # a race condition occurs between the check for
            # existence of the db_cache and the actual write
            # we will still be able to find the key.
            # This happens when two or more people access the site at exactly the
            # same time.
            begin
              # puts "DBCache: Trying to create"
              # puts "DBCache Count before find or create: #{DbCache.count}"
              db_cache = DbCache.find_or_create_by(hash64: internal_key)
              # puts "DBCache Count after find or create: #{DbCache.count}"
              # puts "DBCache: Found or record is with id:#{db_cache.id}"
            rescue ActiveRecord::RecordNotUnique
              retry
            end
    
            # The update is after the create because the value should not be part of the
            # create.
            db_cache.update(value: content)
          else
            # puts "DBCache Hit: #{key}, #{internal_key}"
            Rails.logger.info "DBCache Hit: #{key}, #{internal_key}"
            db_cache.touch if db_cache.updated_at < Time.now - 1.day
          end
    
          result = db_cache.value
        else
          result = capture(&block)
        end
        # Result could be nil if we've cached nil. So don't return nil,
        # return an empty string instead.
        result ? result.html_safe : ""
      end
    
    end
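
    The rescue and retry around find_or_create_by can be tried out without a database. The sketch below is an assumption-heavy model: FakeTable plays the table with a unique index on hash64, and DuplicateKeyError plays the role of ActiveRecord::RecordNotUnique. The first insert is forced to lose the race against another request.

```ruby
# Stand-in for ActiveRecord::RecordNotUnique.
class DuplicateKeyError < StandardError; end

# In-memory table with a unique key. With lose_first_race the first insert
# fails because a competing request has just written the same key.
class FakeTable
  def initialize(lose_first_race: false)
    @rows = {}
    @lose_first_race = lose_first_race
  end

  def find(key)
    @rows[key]
  end

  def insert(key)
    if @lose_first_race
      @lose_first_race = false
      @rows[key] = { key: key, value: "written by the other request" }
      raise DuplicateKeyError, "key #{key} already exists"
    end
    raise DuplicateKeyError, "key #{key} already exists" if @rows.key?(key)

    @rows[key] = { key: key, value: nil }
  end
end

# Same shape as the begin/rescue/retry around find_or_create_by.
def find_or_create(table, key)
  attempts = 0
  begin
    attempts += 1
    row = table.find(key) || table.insert(key)
  rescue DuplicateKeyError
    retry # the row exists now, so the next find succeeds
  end
  [row, attempts]
end

row, attempts = find_or_create(FakeTable.new(lose_first_race: true), 123)
puts attempts # prints 2
```

    The retry is safe here because the error itself proves that the row now exists, so the second attempt finds it instead of inserting again.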


     