Recent Updates

  • kmitov 9:32 pm on January 11, 2021 Permalink |
    Tags: acts_as_paranoid, globalize-rails

    The problems with acts_as_paranoid and generally with Soft Delete of records in Rails 

    I am about to refactor and completely remove acts_as_paranoid from one of the models in our platform. Certain issues have been piling up over the last few years and it has become difficult to support. As I am about to tell this to colleagues tomorrow, I thought I would first structure the summary in an article and send the article directly to the whole team.

    If you are thinking about using acts_as_paranoid, or about soft deleting your records in general, this article could give you a good understanding of what to expect.

    Disclaimer: I am not trying to roast acts_as_paranoid here. I think it is a great gem that I’ve used successfully for years and it has helped me save people’s work when they accidentally deleted something. It could be exactly what you need. For us it got too entangled and dependent on workarounds with other gems, and I am planning to remove it.

    acts_as_paranoid VS Globalize – problem 1

    We have a model called Material. Material has a title. The title is managed by globalize as we are delivering the content in a number of languages.

    class Material < ApplicationRecord
      acts_as_paranoid
      translates :title
    end

    The problem is that Globalize knows nothing about acts_as_paranoid. You can delete a Material, and it also deletes the translations, but when you try to recover the Material there is an error, because of how the translations are implemented and the order in which the translations and the Material are recovered. Which record should be recovered first? Details are at https://github.com/ActsAsParanoid/acts_as_paranoid/issues/166#issuecomment-650974373, but to quote here:

    Ok, then I think I see what happens: ActsAsParanoid tries to recover your translations but their validation fails because they are recovered before the main object. When you just call recover this means the records are just not saved and the code proceeds to save the main record. However, when you call recover!, the validation failure leads to the exception you see.

    In this case, it seems, the main record needs to be recovered first. I wonder if that should be made the default.

    I worked around this and used the workaround like this for a year:

    module TranslatableSoftdeletable
      extend ActiveSupport::Concern
    
      included do
        # Make the translation records soft-deletable as well.
        translation_class.class_eval do
          acts_as_paranoid
        end
    
        before_recover :restore_translations
      end
    
      def restore_translations
        # Restore the translations before the main record is recovered,
        # skipping the validations that would fail at this point.
        translations.with_deleted.each do |translation|
          translation.deleted_at = nil
          translation.save(validate: false)
        end
        self.translations.reload
      end
    end
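
    A usage sketch – the concern is included after translates, so that translation_class is already defined:

    class Material < ApplicationRecord
      acts_as_paranoid
      translates :title
      # Include last, after translates has defined translation_class.
      include TranslatableSoftdeletable
    end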

    acts_as_paranoid VS Globalize – problem 2

    Let’s say that you delete a Material. What is the title of this deleted material?

    material = Material.find(1)
    material.destroy
    material.title == ?

    Remember that the Material has all of its translations for the title in a table that just got soft deleted. So the correct answer is “nil”. The title of the deleted material is nil.

    material = Material.find(1)
    material.destroy
    material.title == nil # is true

    You can work around this with

    material.translations.with_deleted.where(locale: I18n.locale).first.title

    But this is quite ugly.

    acts_as_paranoid VS cancancan

    The material could have authors. An author is the connection between the User and the Material. We are using cancancan for the controllers.

    In the controller for a specific author we show only the models that are for this author. We have the following rule:

    can [:access, :read, :update, :destroy], clazz, authors: { user_id: user.id }
    

    Here the problem is that the rule can only return records that are not deleted. If you would like to implement a Trash/Recycle Bin functionality for the Materials, you can not reuse the rule. The reason is that cancancan can not answer can? and cannot? checks for rules defined with an SQL statement, and “authors: { user_id: user.id }” is such an SQL statement.

    What we could do is to use scopes

    # in the ability
    can [:access, :read, :update, :destroy], Material, Material.with_authors(user.id)
    
    # In the Material
    scope :with_authors, -> (user_ids) {
        joins(:authors).where(authors: {user_id: user_ids})
    }
    

    We move the authorization logic from the ability to the model. I can live with that. It is not that outrageous. But it does not work, because the authors association returns no records for a soft-deleted Material, as its Author records were soft deleted along with it.

    acts_as_paranoid vs cancancan – case 2

    When declaring load_and_authorize_resource in the controller, it will not fetch records that are already deleted. So you can not use cancancan to load a deleted record in the controller – you must load it yourself.

    class MaterialsTrashController < ApplicationController
    
      before_action only: [:show, :edit, :update, :destroy] do
        @content_picture = Material.only_deleted.find(params[:id])
      end
    
      load_and_authorize_resource :content_picture, class: "Material", parent: false
    
    end

    This was ugly, but as we had one trash controller it was kind of acceptable. With a second one it got more difficult and deviated from the logic of the other controllers.

    acts_as_paranoid vs ContentPictures

    Every Material on our platform has many ContentPictures, connected through a ContentPictureRef model. One of the pictures could be the thumbnail. Once you delete a Material, what thumbnail can we show for it?

    As the content_picture_refs relation is also soft deleted, we should change the logic for returning the thumbnail. We change it from

    def thumbnail
        thumbnail_ref = self.content_picture_refs.where(content_picture_type:  :thumbnail).first
    end

    to

    def thumbnail
        thumbnail_ref = self.content_picture_refs.with_deleted.where(content_picture_type:  :thumbnail).first
    end

    I can live with this. Every relation on the deleted record must be called with “with_deleted”. ContentPictures, Authors and all the other relations had to be changed for us to be able to see the deleted materials in a table that represents the Trash.

    acts_as_paranoid vs Active Record Callbacks

    There is another issue right at https://github.com/ActsAsParanoid/acts_as_paranoid/issues/168

    It took me hours to debug a few months back. The order in which before_destroy and acts_as_paranoid are declared should not really matter – but it does:

    class Episode < ApplicationRecord
       acts_as_paranoid
       before_destroy do
          puts self.authors.count
       end
       has_many :authors, dependent: :destroy
    end
    
    # The same class with acts_as_paranoid declared after the callback
    # behaves differently:
    class Episode < ApplicationRecord
       before_destroy do
          puts self.authors.count
       end
       acts_as_paranoid
       has_many :authors, dependent: :destroy
    end

    acts_as_paranoid vs belongs_to

    When using acts_as_paranoid, the belongs_to declarations should be:

    class Author < ApplicationRecord
       belongs_to :material, with_deleted: true
    end

    I could also live with this. At a certain stage it was easier to just add with_deleted: true to the belongs_to relations.

    acts_as_paranoid vs Shrine

    So the Material has a Shrine attachment stored on Amazon S3. What should happen when you delete the Material? Should you delete the file from S3? Of course not. If you delete the file you will not be able to recover it.

    The solution was to modify the Shrine Uploader

    class Attacher < Shrine::Attacher
    
      def activerecord_after_destroy
        # Just don't call super and keep the file.
        # Since objects at BuildIn3D & FLLCasts are soft deleted we want to
        # handle the deletion logic in another way.
        true
      end
    
    end
    

    I was living with this for months and did not have many issues.

    What Pain and Problem are we addressing?

    The current problem, for which I can not find a workaround, is the MaterialsTrashController that is using CanCanCan. All the solutions would require this controller to be different from the rest of the controllers, and my biggest concern is that this will later result in issues. I would like to have a single place where we check whether a User has access to a Material and whether they could read, update or recover it. If we split the logic of the MaterialsController and the MaterialsTrashController, we would end up with a hidden and difficult to maintain duplication.

    But what is the real problem that we want to solve?

    On our platform we have authors for the instructions and we work closely with them. I imagine one particular author that I will call TM (Taekwondo Master). TM uploads a material and from time to time he could accidentally delete one. That’s it. When he deletes a Material it should not be deleted, but rather put in a trash. Then, when he deletes it from the trash, he must confirm with a blood sample. That’s it. I just want to stop TM from losing any Material by accident.

    The solution is pretty simple.

    In the MaterialsController just show all the Materials that do not have the :deleted_at column set.

    In the MaterialsTrashController just show only the Materials that have :deleted_at set.

    I can solve the whole problem with one simple filter that would take me about a minute to implement. We don’t need any of the problems above. They simply will not exist.
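
    A minimal sketch of that filter – the index actions here are illustrative, assuming a deleted_at column on materials, and are not our exact code:

    class MaterialsController < ApplicationController
      def index
        # Show only the materials that are not in the trash.
        @materials = Material.where(deleted_at: nil)
      end
    end

    class MaterialsTrashController < ApplicationController
      def index
        # Show only the materials that are in the trash.
        @materials = Material.where.not(deleted_at: nil)
      end
    end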

    That’s it. I will start with the Material and I will move through the other models as we are implementing the additional Trash controllers for the other models.

     
  • kmitov 4:19 pm on June 13, 2021 Permalink

    Dependencies – one more variable adding to the “cost of the code” 

    One thing I have to explain a lot is the cost of software development. Why are things taking so long? Why is there any need for maintenance and support? Why do developers spend a significant amount of their time looking over the existing code base, and why can we not just add the next and the next feature?

    Today I have an example of this – dependencies.

    The goal of this article is to give people a better understanding of how the tech works. I’ve seen that every line of code and every dependency we add to a project will inevitably result in further costs down the road, so we should really keep the project free of unnecessary dependencies and features.

    Daily builds

    Many contemporary professional software projects have a daily build. This means that every day at least once the project is “built” from zero, all the tests are run and we automatically validate that the customers could use it.

    Weekly dependencies updates

    Every software project depends on libraries that implement common functionality and features. Having few dependencies is healthy for the project, but having no dependencies and implementing everything on your own is just not viable in today’s world.

    These libraries and frameworks that we depend on also regularly release new versions.

    My general rule that I follow in every project is that we check for new versions of the dependencies every Wednesday at around 08:00 in the morning. We check for new dependencies, we download them, we build the project and we run the specs/tests. If the tests fail this means that the new dependencies that we’ve downloaded have somehow changed the behavior of the project.

    Dependencies change

    Most of the time dependencies change in a way that does not break any of the functionality of your project. This week was not such a week. New versions came along and broke a few of the projects.

    The problem came from a change in two dependencies:

    Fetching websocket-driver 0.7.5 (was 0.7.4)
    Fetching mustache-js-rails 4.2.0.1 (was 4.1.0)
    Installing mustache-js-rails 4.2.0.1 (was 4.1.0)
    Installing websocket-driver 0.7.5 (was 0.7.4) with native extensions
    

    We have installed new versions of two of the dependencies – “websocket-driver” and “mustache-js-rails”.

    These two dependencies broke the builds.

    Why should we keep up to date

    Now, out of the blue, we have to resolve this problem. This takes time. Sometimes it is 5 minutes, sometimes an hour or two. If we don’t do it, it will probably cost more time at a later stage. As the change in ‘mustache-js-rails’ is fresh, we have the chance to get in touch with the developers of the library and resolve the issue while they are still “in the context” of what they were doing.

    Given the large number of dependencies that each software project has there is a constant need to keep up to date with new recent versions of your dependencies.

    What if we don’t keep up to date?

    I have one such platform. We decided 6-7 years ago not to invest any further in it. It is still working, but it is completely out of date. Any new development would cost about the same as developing the platform from scratch. That’s the drawback of not keeping up to date. It happens even with larger systems at the state level – remember the famous search for COBOL developers, after a state did not invest in keeping its platform up to date for some 30+ years.

     
  • kmitov 6:41 am on June 5, 2021 Permalink

    Yet another random failing spec 

    (Everyday Code – instead of keeping our knowledge in a README.md let’s share it with the internet)

    This article is about a random failing spec. I spent more than 5 hours tracking it down, so I decided to share with our team what happened and what the stupid mistake was.

    Random failing

    Random failing specs pass most of the time and fail sometimes. The context in which they fail seems to be random.

    Context

    At FLLCasts.com we have categories. There was an error when people were visiting the categories. We receive each and every error in an email, and some of the categories had stopped working because of a wrong SQL query. After the migration from Rails 6.0 to Rails 6.1 some of the queries started working differently, mostly because of eager loading, and we had to change them.

    The spec

    This is the code of the spec

     scenario "show category content" do
        category = FactoryBot.create(:category, slug: SecureRandom.hex(16))
        episode = FactoryBot.create(:episode, :published_with_thumbnail, title: SecureRandom.hex(16))
        material = FactoryBot.create(:material, :published_with_thumbnail, title: SecureRandom.hex(16))
        program = FactoryBot.create(:program, :published_with_thumbnail, title: SecureRandom.hex(16))
        course = FactoryBot.create(:course, :published_with_thumbnail, title: SecureRandom.hex(16))
    
        category.category_content_refs << FactoryBot.create(:category_content_ref, content: episode, category: category)
        category.category_content_refs << FactoryBot.create(:category_content_ref, content: material, category: category)
        category.category_content_refs << FactoryBot.create(:category_content_ref, content: program, category: category)
        category.category_content_refs << FactoryBot.create(:category_content_ref, content: course, category: category)
    
        expect(category.category_content_refs.count).to eq 4
        visit "/categories/#{category.to_param}"
    
        find_by_xpath_with_page_dump "//a[@href='/tutorials/#{episode.to_param}']"
        find_by_xpath_with_page_dump "//a[@href='/materials/#{material.to_param}']"
        find_by_xpath_with_page_dump "//a[@href='/programs/#{program.to_param}']"
        find_by_xpath_with_page_dump "//a[@href='/courses/#{course.to_param}']"
    
      end

    We add a few objects to the category and then we check that we see them when we visit the category.

    The problem

    Sometimes, while running the spec, only one of the objects in the category is shown. Sometimes none. Most of the time all of them are shown.

    The debug process

    The controller

    def show
      @category_content_refs ||= @category.category_content_refs.published
    end

    In the controller we just call published to get all the published content in this category. There are other things in the show action, but they are not relevant – we were using apply_scopes, we were using other concerns.

    The model

      scope :published, lambda {
        include_contents.where(PUBLISHED_OR_COMING_WHERE_SQL)
      }

    The scope in the model makes a query for published or coming.

    And the query – I kid you not, it was committed in 2018 and we’ve had it for that long – was:

    class CategoryContentRef < ApplicationRecord
       
        PUBLISHED_OR_COMING_WHERE_SQL = [' (category_content_refs.content_type = \'Episode\' AND (episodes.published_at <= ? OR episodes.is_visible = true) ) OR
         (category_content_refs.content_type = \'Course\' AND courses.published_at <= ?) OR
         (category_content_refs.content_type = \'Material\' AND (materials.published_at <= ? OR materials.is_visible = true) ) OR
         category_content_refs.content_type=\'Playlist\'', *[Time.now.utc.strftime("%Y-%m-%d %H:%M:%S")]*4].freeze
    
    end
    

    I will give you a hint: the problem is with this query.

    You can take a moment and try to see where the problem is.

    The query problem

    The problem is with the .freeze and the constant in the class. The query is initialized when the class is loaded. Because of this it captures the time at the moment the class is loaded, and not the time when the query runs.

    Because the specs are fast, sometimes the class is loaded right before the spec and sometimes there are other specs executed in between.
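
    One possible fix (a sketch, keeping the same SQL and scope names from above): leave only the SQL string in the frozen constant and compute the time values inside the scope, so they are evaluated at query time and not at class load time.

    # The constant keeps only the SQL; no values are bound at class load time.
    PUBLISHED_OR_COMING_WHERE_SQL = '...the same SQL with the ? placeholders...'.freeze

    scope :published, lambda {
      # Time.now is now evaluated on every call of the scope.
      now = Time.now.utc.strftime("%Y-%m-%d %H:%M:%S")
      include_contents.where(PUBLISHED_OR_COMING_WHERE_SQL, *[now] * 4)
    }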

    It seems simple once you see it, but these are the kind of things you keep missing while debugging. They are right in front of your eyes and yet sometimes you just can’t see them – until you finally see them, and then you can not unsee them.

     
  • kmitov 3:19 pm on May 31, 2021 Permalink

    When caching is bad and you should not cache. 

    (Everyday Code – instead of keeping our knowledge in a README.md let’s share it with the internet)

    On Friday we did some refactoring at FLLCasts.com. We removed Refinery CMS, which is a topic for another article, but one issue popped up – on a specific page caching was used in a way that made the page very slow. This article is about how and why. It is mainly for our team, as a way to share the knowledge among ourselves, but I think the whole community could benefit, especially the Ruby on Rails community.

    TL;DR;

    When you make a request to a cache service, be it Memcached, Redis or any other, that request has a cost. It will include a get(key) call and, if the value is not stored in the cache, a set(key, value) call. When the calculation you are caching is simple, it will take more time to fetch and store the result than to do the calculation again, especially if the calculation is a simple string concatenation.

    Processors (CPUs) are really good at string concatenation and can do it in single-digit milliseconds. So if you are about to cache something, make sure that it is worth caching. There is absolutely no reason to cache the result of:

    # Simple string concatenation. You calculate the value. No need to cache it.
    value = "<a href=#{link}>Text</a>"
    
    # The same result, but with caching.
    # There isn't a universe in which the code below will be faster than the code above.
    hash = calculate_hash(link)
    cached_value = cache.get(hash)
    if cached_value == nil
       cached_value = "<a href=#{link}>Text</a>"
       cache.set(hash, cached_value)
    end
    
    value = cached_value

    Context for Rails

    Rails makes caching painfully easy. Any server side generated HTML could be cached and returned to the user.

    <% # The call below will render the partial "page" for every page and will cache the result %>
    <% # Pretty simple, and yet there is something wrong %>
    <%= render partial: "page", collection: @pages, cached: true %>

    What’s wrong is that we open the browser and the page takes more than 15 seconds to load.

    Here is a profile result from New Relic.

    As you can see there are a lot of Memcached calls – like 10 – and a lot of set calls. There are also a lot of Postgres find calls. All of this is because of how caching was set up on the platform. The whole “page” partial, after a decent amount of refactoring, turns out to be a simple string concatenation:

    <a href="<%= page.path%>"><%= page.title %></a>

    That’s it. We were caching the result of a simple string concatenation that the CPU is quite fast at doing. Because there were a lot of pages and we were making the call for every page, when opening the browser for the first time it just took too long to make all the get(key) and set(key) calls and the page was returning a “Time out”.
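
    In a case like this the fix is simply to drop the caching and let the CPU do the concatenation on every request. A sketch, using the same render call from above without the cached option:

    <%# No caching - the partial is a simple string concatenation %>
    <%= render partial: "page", collection: @pages %>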

    Conclusion

    You should absolutely use caching and cache the values of your calculations, but only if those calculations take more time than asking the cache for a value. Otherwise it is just not useful.

     
  • kmitov 9:14 am on May 7, 2021 Permalink

    “[DOM] Input elements should have autocomplete attributes” 

    (Everyday Code – instead of keeping our knowledge in a README.md let’s share it with the internet)

    This is one of the things that could make a platform better. Here is how the warning looks in the browser console.

    More information at – https://developer.mozilla.org/en-US/docs/Web/HTML/Attributes/autocomplete

    The autocomplete attributes allow browsers, extensions and other agents to guess what the user should do on a page, which could make things easier for the user. For example, an extension could suggest a new password for a password field, or could know to fill the user’s name in the “name” field.

    Additionally we don’t like warnings.
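
    A minimal sketch of how the attributes could look in a Rails form – the form builder and field names are illustrative, and the autocomplete values (“name”, “email”, “new-password”) are the standard tokens from the MDN page above:

    <%= form.text_field :name, autocomplete: "name" %>
    <%= form.email_field :email, autocomplete: "email" %>
    <%= form.password_field :password, autocomplete: "new-password" %>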

    To check out the behavior, if you have a password manager for example, go to

    https://www.fllcasts.com/users/sign_in

    or

    https://www.buildin3d.com/users/sign_in

     
  • kmitov 8:39 am on May 7, 2021 Permalink |
    Tags: csrf, security

    [Rails] Implementing an API token for post requests 

    (Everyday Code – instead of keeping our knowledge in a README.md let’s share it with the internet)

    At the BuildIn3D platform we provide clients with an API for sending certain HTTP POST requests. The question is – how do we authenticate them?

    Here is one of the authentication steps – we implemented our own build_token. When the authenticity_token for CSRF is available we use it as well. But it is not always available, because the authenticity_token depends on the session and the session cookie, and in some cases there is no session and no cookie, yet we still need some authentication. Here is how we do it.

    Generate a Unique encrypted token on the server side

    The server generates a token based on passed params. This could be a username, a password or other info.

        def to_build_token
          len   = ActiveSupport::MessageEncryptor.key_len
          # The salt must be the same as the one used when decrypting the token.
          salt  = Rails.application.secret_salt_for_build_token
          key   = ActiveSupport::KeyGenerator.new(Rails.application.secret_key_base).generate_key(salt, len)
          crypt = ActiveSupport::MessageEncryptor.new(key)
          encrypted_data = crypt.encrypt_and_sign(self.build_id)
          Base64.encode64(encrypted_data)
        end

    This will return a new token that has encrypted the build_id.

    encrypted_data = crypt.encrypt_and_sign(self.build_id)
    # We could easily add more things to encrypt, like user, or some params or anything you need to get back as information from the token when it is later submitted

    Given this token, we can pass it to the client. The token could expire after some time.

    We would require the client to send us this token on every request from now on. In this way we know that the client has authenticated with our server.

    Decryption of the token

    What we are trying to extract is the build_id from the token. The token is encrypted so the user can not know the secret information that is the build_id.

    def self.build_id_from_token token
      len   = ActiveSupport::MessageEncryptor.key_len
      salt  = Rails.application.secret_salt_for_build_token
      key   = ActiveSupport::KeyGenerator.new(Rails.application.secret_key_base).generate_key(salt, len)
      crypt = ActiveSupport::MessageEncryptor.new(key)
      crypt.decrypt_and_verify(Base64.decode64(token))
    end
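
    A usage sketch of the round trip, with the names assumed from the snippets above:

    # On the server, when preparing a response for the client:
    token = record.to_build_token

    # Later, when the client submits the token with a POST request:
    build_id = Record.build_id_from_token(token)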

    Requiring the param in each post request

    When a POST request is made, we check that the token is available and that it was generated by our server:

      def create
          build_token = params.require("build_token")
          build_id_from_token = Record.build_id_from_token(build_token)
          .... # other logic that now has the build_id from the token
      end

    The build token is one of the things we use with the IS at BuildIn3D and FLLCasts.

    Polar bear approves of our security.

     
  • kmitov 8:15 am on May 7, 2021 Permalink

    [Rails] Crawlers you don’t want to cause a notification for an error 

    (Everyday Code – instead of keeping our knowledge in a README.md let’s share it with the internet)

    We are using the exception_notification gem. It is great because it sends us a direct email when there is an error on the platform. This means we don’t have to watch the logs – we get the notification directly by email.

    But some of the crawlers (like all of them) cause errors, mainly cross-origin ones. The JavaScript we deliver is not allowed for their domain and this generates an error. One way to stop these errors is to ignore the errors caused by crawlers in the exception_notification configuration:

    Rails.application.config.middleware.use ExceptionNotification::Rack,
        ignore_crawlers: %w{Googlebot bingbot YandexBot MJ12bot facebookexternalhit SEMrushBot AhrefsBot},
        :email => {
          :email_prefix => ...,
          :sender_address => ...,
          :exception_recipients => ...
        }

     
  • kmitov 8:05 am on May 7, 2021 Permalink

    [Rails] Disabling Forgery Protection on API controllers 

    Forgery protection comes for free in Ruby on Rails and is described in the security guide – https://guides.rubyonrails.org/security.html#cross-site-request-forgery-csrf.

    You don’t want forgery protection on some API controllers. If the API controllers extend ActionController::Base, forgery protection can be disabled in the following way. Here is an example for a controller accepting requests from Amazon AWS SNS.

    class SnsController < ActionController::Base
    
      # Disable it for this controller. 
      # If there is no session it is just a null session
      protect_from_forgery with: :null_session
    
      http_basic_authenticate_with :name => ENV["SNS_USERNAME"], :password => ENV["SNS_PASSWORD"]
    
      ...
    end
    

    An even better approach would be not to extend ActionController::Base, but ActionController::API. But then we would have to include the modules for HttpAuthentication, which is a topic for another article.
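
    For completeness, here is a sketch of what that could look like – an assumption on my side, not code from our platform. ActionController::API does not include forgery protection at all, and the Basic HTTP authentication module has to be included explicitly:

    class SnsController < ActionController::API
    
      # No forgery protection to disable here - ActionController::API does not include it.
      include ActionController::HttpAuthentication::Basic::ControllerMethods
    
      http_basic_authenticate_with :name => ENV["SNS_USERNAME"], :password => ENV["SNS_PASSWORD"]
    
      ...
    end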

     
  • kmitov 7:56 am on May 7, 2021 Permalink

    [Rails] Please use symbols for polymorphic route arguments 

    This error occurred today on our platform. The issue is tracked at https://github.com/rails/rails/issues/42157

    The issue occurs because of a new security patch in Rails, tracked as CVE-2021-22885.

    You can no longer call polymorphic_path or other dynamic path helpers with strings, because if the strings are provided by the user this could result in unwanted route helper calls.

    # previous call
    polymorphic_path([article, "authors"])
    # should now be
    polymorphic_path([article, "authors".to_sym])
    # or better
    polymorphic_path([article, :authors])

    Migration effort

    A perspective on how serious it is to upgrade – tl;dr – it is not serious – it took about half an hour.

    All the calls to polymorphic_path in our platform:

    $ git grep  "polymorphic_path" | wc -l
    321

    All the files that have calls to polymorphic_path:

    $ git grep -l  "polymorphic_path" | wc -l
    143

    Number of files that I changed – 13
    Time it took me to review all of them – 16 minutes, from 18:24 to 18:40

    Now I am waiting for specs to pass.

    I guess it is not a big change and could be migrated in half an hour. Most of our calls were already using symbols and only about 15 of the 321 calls used strings. That is about 4% of all the calls.

     
  • kmitov 5:03 am on April 14, 2021 Permalink

    Rendering plain text from the server is not plain text in the browser 

    (Everyday Code – instead of keeping our knowledge in a README.md let’s share it with the internet)

    One thing that is good to know:

    If you render plain text from a web framework like Rails you will not have plain text in the browser. You will still have HTML in the browser.

    This bit us in one of the RSpec scenarios we were running, and I thought I should share it with the team and the whole community.

    IRL

    The real life scenario is the following.

    In the Rails controller we do

    class DigestMessagesController < ApplicationController
        def show
            render plain: "Text"
        end
    end
    

    What we would expect to see in the browser is the simple text “Text”. In fact this is exactly what is returned by the server – you can see it in the “Response” tab.

    You can see that the response from the server is just “Text”.

    But if you now Inspect the content of the page in the browser you will see that there is HTML inside.

    
    <html>
     <head></head>
     <body>
     <pre style="word-wrap: break-word; white-space: pre-wrap;">Text</pre>
     </body>
    </html>
    

    This HTML was not returned by the framework. This HTML was added by the browser. This is not good, because an RSpec system scenario with Capybara will pass with the rack_test driver but will fail with a Selenium driver. The rack_test driver will not add this HTML, while all the major browsers will add it.

    scenario 'for text' do
          visit "/digest_messages/#{digest_message.id}?preview=plain"
          expect(page.body.to_s).to include("Text")
          # This expect here will fail for Selenium and will pass for rack_test driver.
          expect(page.body.to_s).not_to include("html")
    end
    

    I hope this is helpful and could save you a couple of minutes of debugging.

     
  • kmitov 6:28 pm on April 7, 2021 Permalink |
    Tags: rails-ujs

    From a ticket to deploy in an hour. jQuery runs the scripts, DOM does not. 

    (Everyday Code – instead of keeping our knowledge in a README.md let’s share it with the internet)

    This is an article about a production incident with Rails/Stimulus/rails-ujs/DOM and jQuery.

    We made a mistake that cost us 31 registrations on the FLLCasts platform and a few more on the BuildIn3D platform. The stack involved includes Ruby on Rails with rails-ujs, Stimulus, the DOM and jQuery. I will go into some detail about how it occurred, why, and how we resolved it in under an hour. I am writing this article mainly to share it with our team, but I am sure it could be useful for other teams as well.

    FLLCasts registration form as of April 2021

    The incident – people could not Sign Up on the platforms

    On the Sign up page there is a captcha. Years ago we found out that the captcha helps us reduce invalid registrations.
    There is also one more thing on this platform – usernames. As the FLLCasts platform is used by students and academies, one of the very useful features on the platform is the automatically generated username. Users type their names in the “Full name” field and a username is automatically generated for them. This way there is a unique username for each user, because the email is not unique – there could be many users with the same email.

    To generate a new username we send a request to the server and refresh the form. We use Stimulus and rails-ujs for this.

    There is a “change->registrations#refreshForm” action on the input field for the name. When the full name is changed, the form is refreshed and a unique username is returned from the server.

    <%= user_form.fields_for :data, {parent_builder: user_form} do |data_form| %>
      <%= data_form.text_field :name_attribute,
        {
          icon_prepend: "icon-finance-067 u-line-icon-pro",
          autofocus: true,
          data: {
            registrations_target: "name",
            action: "change->registrations#refreshForm"
          },
          autocomplete: "off"
        }
      %>
    
      <%= data_form.email_field :email %>
    <% end %>

    Here is how it should work:

    How the form is supposed to work

    The incident was that when the form was refreshed, the captcha got lost and we were not showing it after that. This means that on the server we were checking the captcha, but there was no captcha on the client.

    The ticket

    A wild ticket appears from a user.

    Hi,
    
    We wanted to enroll our Kid for Personal B programme and we were trying to Sign Up to pay online , but we are unable to sign up using our details.
    We get an error message though we tried a few times.
    
    Please help to sort this out.
    
    Thanks
    
    Regards

    This is a really wild ticket. The user wants to buy and they can not sign up. 1, 2, 3, go.

    The problem

    We’ve made the following commit in the registration form

    --- a/app/views/devise/registrations/new.js.erb
    +++ b/app/views/devise/registrations/new.js.erb
    @@ -1 +1 @@
    -$("#<%= @form_id %>").replaceWith("<%= escape_javascript(render 'form') %>")
    \ No newline at end of file
    +document.getElementById("<%=form_id%>").outerHTML= '<%= escape_javascript(render "#{partial_name}") %>';
    \ No newline at end of file

    We are in the process of removing jQuery from the code. I saw this jQuery call and decided to change it to a simple DOM call. It should be the same, right? No…

    jQuery executes scripts while document.outerHTML does not

    I knew this. But I did not consider it in this commit. What happens is that we receive the new form from the server and replace the form on the page with the new form, which contains the generated username.

    But in this form (and this form only on the whole platform) there is a recaptcha. This recaptcha is JavaScript, and when dynamically changing the DOM this JavaScript must be executed. Well, jQuery automatically executed it for us. document.outerHTML does not.

    Developing an RSpec system spec

    First I deployed the fix to production. It took about 5 minutes.

    After that I developed the spec. This is how the spec looks:

    scenario "shows recaptach when dynamically reloading the form", js: true  do
        with_recaptcha_enabled do
          visit_sign_up
    
          expect(page).to have_recaptcha
    
          name_field.set unique_name
          email_field.set "#{SecureRandom.hex(16)}@example.com"
    
      # This means we have reloaded with js, because we've set the email field
      # and this changed the focus, firing a change event for the name_field.
          #
          # We are now waiting for the username to appear on the screen
          expect(page).to have_username_filled
    
          expect(page).to have_recaptcha
        end
      end

    We expect that there is a recaptcha before and after refreshing the form.

    What surprised us

    Looking at the data for the time frame in which this commit was on production – we received 9 registrations on the platform. In the same period before that, we had received 40 registrations.

    This means that we’ve lost 31 registrations on the platform.

    What surprised us is that 9 people managed to register. The only way you could register on the platform during this period was to first enter your username. If you enter the username on your own, we do not refresh the form, as we don’t have to generate a username for you. This means that about 25% of the people start with their username, and about 75% of the users at FLLCasts start the registration with their name or email.

    Good to know. Good to know.

    Conclusion

    Looking at the issue, we could have prevented it in a number of ways. In a test environment it is difficult to have a recaptcha, because there is no way to test that the recaptcha works – after all, the whole purpose of a recaptcha is to prevent bots, and a Selenium driver is exactly a bot. But it turned out we had missed it as a scenario in the specs. It is always a missing spec.

     