We had this strange case of a spec that was randomly failing
scenario "generate a subscribe link for not logged in users" js: true do
visit "/page_url"
expect(page).to have_xpath "//a[text()='Subscribe']"
click_link "Subscribe"
...
end
When a user is logged in we generate a button that subscribes them immediately. But when a user is not logged in we generate a link that will direct the users to the subscription page for them to learn more about the subscription.
This works well, but the spec is randomly failing sometimes.
We expect there to be a link, eg. “//a” but on the page there is actually a button, eg. “//button”
What this means is that when the spec started there was a logged in user. The user was still not logged out from the previous spec. This explains why sometimes the spec fails and why not – because we are running all the specs with a random order
There is a Warden.test_reset! that is supposed to reset the session, but it seems for js: true cases where we have a Selenium driver the user is not always reset before the next test starts.
# spec/rails_helper.rb
RSpec.configure do |config|
...
config.after(:each, type: :system) do
Warden.test_reset!
end
end
Logout before each system spec that is js: true
I decided to try to explicitly log out before each js: true spec that is ‘system’ so I improved the RSpec.configuration
RSpec.configure do |config|
config.before(:each, type: :system, js: true) do
logout # NOTE Sometimes when we have a js spec the user is still logged in from the previous one
# Here I am logging it out explicitly. For js it seems Warden.test_reset! is not enough
#
# When I run specs without this logout are
# Finished in 3 minutes 53.7 seconds (files took 28.79 seconds to load)
# 383 examples, 0 failures, 2 pending
#
# With the logout are
#
# Finished in 3 minutes 34.2 seconds (files took 21.15 seconds to load)
# 383 examples, 0 failures, 2 pending
#
# Randomized with seed 26106
#
# So we should not be losing on performance
end
end
Conclusion
Warden.test_reset! does not always logout the user successfully before the next spec when specs are with Selenium driver – eg. “js: true”. I don’t know why, but that is the observed behavior. I’ve added a call to “logout” before each system spec that is js: true to make sure the user is logged out.
This is some serious advanced stuff here. You should share it.
A colleague, looking at the git logs
I recently had to create a spec with Capybary+RSpec where I refresh the page and wait for a value to appear on this page. It this particular scenario there is no need for WebSockets or and JS. We just need to refresh the page.
But how to we test it?
# Expect that the new records page will show the correct value of the record
# We must do this in a loop as we are constantly refreshing the page.
# We need to stay here and refresh the page
#
# Use the Tmeout.timeout to stop the execution after the default Capybara.default_max_wait_time
Timeout.timeout(Capybara.default_max_wait_time) do
loop do
# Visit the page. If you visit the same page a second time
# it will refresh the page.
visit "/records"
# The smart thing here is the wait: 0 param
# By default find_all will wait for Capybara.default_max_wait_time as it is waiting for all JS methods
# to complete. But there is no JS to complete and we want to check the page as is, without waiting
# for any JS, because there is no JS.
#
# We pase a "wait: 0" which will check and return
break if find_all(:xpath, "//a[@href='/records/#{record.to_param}' and text()='Continue']", wait: 0).any?
# If we could not find our record we sleep for 0.25 seconds and try again.
sleep 0.25
end
end
I hope it is helpful.
Want to keep it touch – find me on LinkedIn or Twitter.
(Everyday Code – instead of keeping our knowledge in a README.md let’s share it with the internet)
This article is about a random failing spec. I spent more than 5 hours on this trying to track it down so I decided to share with our team what has happened and what the stupid mistake was.
Random failing
Random failing specs are most of the time passing and sometimes failing. The context of their fail seems to be random.
Context
At FLLCasts.com we have categories. There was an error when people were visiting the categories. We receive each and every error on an email and some of the categories stopped working, because of a wrong sql query. After migration from Rails 6.0 to Rails 6.1 some of the queries started working differently mostly because of eager loads and we had to change them.
We add a few objects tot he category and then we check that we see them when we visit the category.
The problem
Sometime while running the spec only 1 of the objects in the category are shown. Sometimes non, most of the time all of them are shown.
The debug process
The controller
def show
@category_content_refs ||= @category.category_content_refs.published
end
In the category we just call published to get all the published content that is in this category. There are other things in the show but they are not relevant. We were using apply_scopes, we were using other concerns.
The scope in the model makes a query for published or coming.
And the query, i kid you not, that was committed in 2018 and we’ve had this query for so long was
class CategoryContentRef < ApplicationRecord
PUBLISHED_OR_COMING_WHERE_SQL = ' (category_content_refs.content_type = \'Episode\' AND (episodes.published_at <= ? OR episodes.is_visible = true) ) OR
(category_content_refs.content_type = \'Course\' AND courses.published_at <= ?) OR
(category_content_refs.content_type = \'Material\' AND (materials.published_at <= ? OR materials.is_visible = true) ) OR
category_content_refs.content_type=\'Playlist\'', *[Time.now.utc.strftime("%Y-%m-%d %H:%M:%S")]*4].freeze
end
I will give you a hit that the problem is with this query.
You can take a moment a try to see where the problem is.
The query problem
The problem is with the .freeze and the constant in the class. The query is initialized when the class is loaded. Because of this it takes the time at the moment of loading the class and not the time of the query.
Because the specs are fast sometimes the time of loading of the class is right before the spec and sometimes there are specs executed in between.
It seems simple once you see it, but these are the kind of things that you keep missing while debugging. They are right in-front of your eyes and yet again sometimes you just can’t see them, until you finally see them and they you can not unsee them.
(Everyday Code – instead of keeping our knowledge in a README.md let’s share it with the internet)
One thing that is good to know:
If you render plain text from a web framework like Rails you will not have plain text in the browser. You will still have HTML in the browser.
This bite us with one of the RSpec scenarios that we were running and I thought I should share with the team and the whole community.
IRL
The real life scenario is the following.
In the Rails controller we do
class DigestMessagesController < ApplicationController
def show
render plain: "Text"
end
end
What we would expect to see in the browser is a simple “Text” text. If fact this is exactly what is returned by the server – you can see in the “Response” tab.
But if you now Inspect the content of the page in the browser you will see that there is HTML inside.
This HTML was not returned by the framework. This HTML was added by the browser. This is not good because an RSpec system scenario with Capybara will pass with rack_test driver, but will fail with a Selenium driver. rack_test driver will not add this HTML while all the major browsers will add it.
scenario 'for text' do
visit "/digest_messages/#{digest_message.id}?preview=plain"
expect(page.body.to_s).to include("Text")
# This expect here will fail for Selenium and will pass for rack_test driver.
expect(page.body.to_s).not_to include("html")
end
I hope this is helpful and could save you a couple of minutes debugging.
(Everyday Code – instead of keeping our knowledge in a README.md let’s share it with the internet)
This is an article of a production incidents with Rails/Stimulus/Rails-ujs/DOM and jQuery.
We made mistake that made us lose 31 registrations on the FLLCasts and a few more on the BuildIn3D platform. The stack involved includes Ruby on Rails with Rails-ujs, Stimulus, DOM and jQuery. I will enter into some details about how it occurred, why and how we resolved it in under an hour. I am writing this article mainly to share with my our team, but I am sure it could be useful for other teams.
The incident – people could not Sign Up on the platforms
On the Sign up page there is a captach. Years ago we found our that the captcha helps us reduce invalid registartion. There is also one more thing on this platform – usernames. As the FLLCasts platforms is used by students and academies, one of the very useful features on the platform is the automatically generated username. Users type their names in the “Full name” field and a username is automatically generated for them. In this way there is a unique username for users because email is not unique – there could be many users with the same email.
To generate a new username we send a request to the server and refresh form. We use Stimulus and rails-ujs for this.
The is an “change->registrations#refreshForm” for the input field for the name. When the full name is changed the form is refreshed and a unique username is returned from the server.
The incident was that when the form was refresh the captcha got lost and we were not showing it after that. This means that on the server we were checking the captcha, but there was not captcha on the client.
The ticket
A wild ticket appears from a user.
Hi,
We wanted to enroll our Kid for Personal B programme and we were trying to Sign Up to pay online , but we are unable to sign up using our details.
We get an error message though we tried a few times.
Please help to sort this out.
Thanks
Regards
This is a really wild ticket. The user wants to buy and they can not sign up. 1,2,3 go.
The problem
We’ve made the following commit in the registration form
--- a/app/views/devise/registrations/new.js.erb
+++ b/app/views/devise/registrations/new.js.erb
@@ -1 +1 @@
-$("#<%= @form_id %>").replaceWith("<%= escape_javascript(render 'form') %>")
\ No newline at end of file
+document.getElementById("<%=form_id%>").outerHTML= '<%= escape_javascript(render "#{partial_name}") %>';
\ No newline at end of file
We are in the process of removing jQuery from the code. I saw this jQueyr call and decided to change it to a simple DOM call. It should be the same, right? No…
jQuery executes scrips while document.outerHTML does not
I knew this. But I did not consider it in this commit. What happens is that we receive the new form from the server and replace the form on the page with the new form received from the server that contains the generate username.
But in this form (and this form only on the whole platform) there is a recaptach. This recaptcha is a JavaScript and when dynamically changing the DOM the JavaScript must be executed. Well, jQuery automatically executed this for us. document.outerHTML does not.
Developing and RSpec system spec.
First I deployed the fix to production. It took about 5 minutes.
After that I developed the spec. This is how the spec looks like:
scenario "shows recaptach when dynamically reloading the form", js: true do
with_recaptcha_enabled do
visit_sign_up
expect(page).to have_recaptcha
name_field.set unique_name
email_field.set "#{SecureRandom.hex(16)}@example.com"
# This means we have reloaded with js because we've set the email field
# and this has change the focus and there was a fire event for the name_field.
#
# We are now waiting for the username to appear on the screen
expect(page).to have_username_filled
expect(page).to have_recaptcha
end
end
We expect that there is a recaptcha before and after refreshing the form.
What surprised us
Looking at the data in the time frame this commit was on production – we’ve received 9 registrations on the platform. In the same previous period we’ve receive 40 registrations.
This means that we’ve lost 31 registrations on the platform.
What surprised us is that 9 people managed to register. The only way you could register on the platform during this period is to first enter your username. If you enter the username on your own we do not refresh the form as we don’t have to generate a username for you. This means that about 25% of the people start with their username and about 75% of the users at FLLCasts start the registration with their name or email.
Good to know. Good to know.
Conclusion
Looking at the issue we could have prevented it in a number of ways. In a test environment it is difficult to have a recaptcha because there is no way to test that the recaptcha works. After all that is the whole purpose of a recaptcha – to prevent bots and a Selenium driver is exactly a bot. But it turned out we’ve missed it as a scenario in the specs. It is always a missing spec.
(Everyday Code – instead of keeping our knowledge in a README.md let’s share it with the internet)
TL; DR; – JSON.parse(“1”) works. JSON.parse(“k1”) throws an exception
This is a valid JSON
"1"
This is NOT a valid JSON
"k1"
The code
$ irb
2.6.5 :001 > require 'json'
=> true
2.6.5 :002 > JSON.parse("1")
=> 1
2.6.5 :003 > JSON.parse("k1")
Traceback (most recent call last):
6: from /home/kireto/.rvm/rubies/ruby-2.6.5/bin/irb:23:in `<main>'
5: from /home/kireto/.rvm/rubies/ruby-2.6.5/bin/irb:23:in `load'
4: from /home/kireto/.rvm/rubies/ruby-2.6.5/lib/ruby/gems/2.6.0/gems/irb-1.0.0/exe/irb:11:in `<top (required)>'
3: from (irb):3
2: from /home/kireto/.rvm/gems/ruby-2.6.5/gems/json-2.5.1/lib/json/common.rb:216:in `parse'
1: from /home/kireto/.rvm/gems/ruby-2.6.5/gems/json-2.5.1/lib/json/common.rb:216:in `parse'
JSON::ParserError (809: unexpected token at 'k1')
2.6.5 :004 > JSON.parse("\"1\"")
It is always mistakes as simple as that, that are causing the bugs.
Details
I am sharing the details mostly for our team.
When users use the Instructions Steps framework on the BuildIn3D and FLLCasts platforms we have a communication between the client and the server. The client sends information as JSON, the server parses it. For a few days some of the communication was missed. Not everything was recorded on the server. This was not anything critical, but about 360,000 events were not recorded and it shows how we did not understand JSON.parse entirely.
The commit
The commit to cause this was
else
+ data = JSON.parse(events_array[0]["data"])
+ referrer = data["referrer"] if data.is_a? Hash
+
We get a string from events_array[0][“data”] and we parse this string. But this string is not always a JSON. It is sometimes just pure string. All the specs pass and we are committing the code.
In the specs suite all the data that we are testing is “1”,”,2″- generally numbers. Why? Because this makes sense looking at the data of the spec and in the spec we’ve had only numbers as the string, but not a general string like “k1”.
(Everyday Code – instead of keeping our knowledge in a README.md let’s share it with the internet)
Today on a review I saw an RSpec Matcher that was more complex than it should be. We’ve discussed it internally. I think this article could be useful to our team and to the community as a whole.
An RSpec Matcher should be simple. It should return just ‘true’ or ‘false’
IRL example of a complex RSpec Matcher
This is the real live example of an RSpec Matcher from todays review
RSpec::Matchers.define :have_content_picture_type_set_to do |picture_ref, content_picture_type, content_object|
match do |page|
content_object_undersore = content_object.model_name.to_s.underscore
content_object_undersore = "tutorial" if content_object_undersore == "episode"
content_object_undersore = "courses/#{content_object.course.to_param}/section" if content_object_undersore == "course_section"
picture_ref_row = page.find(:xpath, "//tr[.//img[@src='#{picture_ref.content_picture.image_url}']]")
js_agnostic_click_table_action picture_ref.to_param, "/author/#{content_object_undersore}s/#{content_object.to_param}/pictures/#{picture_ref.to_param}/edit", "Edit Picture"
page.select content_picture_type, from: "content_picture_ref[content_picture_type]"
click_on_and_expect "Update Picture", "Picture successfully updated"
expect(page).to have_picture_ref_with_content_picture_type picture_ref, content_picture_type
end
failure_message do |page|
"expected page to have content_picture with the specified type, but was #{page.body}"
end
end
There is a lot to unpack here.
We do some calculations of paths with the whole content_object_underscore logic
Then we find something on the page with the page.find method
Then we call ‘js_agnostic_click_table_action‘ which is a complex method
Then we select something from an HTML select
Then we click a button with the click_on_and_expect
At the end we expect something
We should have only step 6. That’s the purpose of an RSpec Matcher. Return true or false if the page contains something or not.
Why has this happened?
Because RSpec.define :matcher is a function like any other ruby function. And as a ruby function you can do whatever you want with it. You can call anything you want in this function.
Why should an RSpec Matcher be simple?
That’s the convention for RSpec Matchers. Imagine the following spec:
visit_pictures material
expect(page).to have_picture_ref_with_content_picture_type material.content_picture_refs.first, "Thumbnail"
With this spec you would expect visit a page and check if there is something on the page. But now have_pictuer_ref_with_content_picture_type makes calls, changes the db, modifies the current page and there is no way for you to understand it from reading the spec. Let alone reuse the Matcher in another situation.
What feature is this about?
The feature above is about showing a picture on the animated 3D models and building instructions that we show at BuildIn3D. Authors could choose which picture is the Thumbnail – example is this FabBrix Hen :).
Today the specs for an index page failed and I had to improve it. I decided to share a little trick for when we depend on the order of the objects.
The specs is for the /subscriptions page where we show an index of all the subscriptions. We order the subscriptions by created_at in DESC. There are 20 subscriptions on the page. Then you must got to the next page. In the spec we open the page and check that there is a link to our subscription.
The problem was that there are other specs which for some reason create subscriptions into the future. This means that at a certain point in time when more than 20 subscriptions are created into the future in the DB, then our spec will fail. It will fail because the newly created subscription is on the second page.
All the subscriptions on this page are into the future as today is 2020-11-30. So our newly created subscription is not here.
What are our options?
Move to the correct page in the spec
This is an option. It will require us to have a loop in the spec that would more to the next page in the index and search for each subscription. This is too much logic for a spec.
Delete all the future subscriptions before starting the spec
Could be done. But is more logic for the spec. It actually needs to delete subscriptions and this is not the job of this specs.
Create a subscription that is with the most recent created_at
It is simple.
let(:subscription) {FactoryBot.create(:subscription, created_at: subscription_created_at_time)}
def subscription_created_at_time
(Subscription.order(:created_at).last.try(:created_at) || Time.now)+1.second
end
It is the simpler change for this spec. Just create a subscription that’s last. In this way we know it will appear on the first page even when other specs have created subscriptions into the future.
One common “mistake” I’ve seen a couple of times is to clean the Database before/after specs are run. It seems to be a common practice with reasonable arguments. I think this is a bad idea. Here is why and what we should do instead.
Why is the DB cleared before/after the specs
When running specs that need access to a DB we might have to create a User or an Article or a Project model, then connect them in a certain way and test the business logic of our spec. After the spec is finished it is not wise to delete these objects from the DB directly in the spec. Sometimes it takes additional time, sometimes it executes additional logic. In most cases you don’t clear the DB after each and every spec.
It is a good idea to clean the db before all the specs or after all the specs if they are successful. In this way we reset the DB only once, it saves some time and is much cleaner because you can plug in this behavior if you want to.
Why the DB should not be cleared before/after the specs
The simple answer is that our code will never, absolutely never work on a clean db in a production. If we have a test procedure that runs the specs against a clean and empty db they might pass when the db is clean. But what use do we have from code that could work in a clean environment, but could not work in a real production environment. The answer is – non.
We don’t clean our db before/after each spec. In this way we’ve been able to track some really nasty bugs. Like slow queries that are slow only when you have too many users. Other cases involve special relations that are built in time. Like users that are part of an organization and the organization was once having one check for uniqueness of the user and now it has another check. Because the db is not cleared every time we make sure that it is properly migrated with all the needed migrations.
We found out that a 7 years out test db that is not cleared is closer to a 7 years old production db.
The test db is not the production db
The test db is not the production db. It might have the same scheme, that is for sure, but the amount of data in them and the complexity of this data is different. What we need is code that could run on a production db. There is no use of any code that could run only in test environment.
So here is what we do:
We export the production db, we change some data like user emails, names and any other sensitive data and we import it as a test db. We run the specs on this db.
In this way we make sure that the code could actually run on a real db before deploying it.
Each time there is an if-else logic in your code there is some logic branching. A state is checked and based on the result from the check different path are taken. This means that two scenarios are implemented and two scenarios should be support from now on. This could quickly become a problem because the number of supported cases grows exponentially.
At the end you start from Process1 but you have 4 different cases and path that you should support. These are 4 different tests that should be written and these are for different places where things could go wrong.
The four different paths are Process 1,2,4; 1,2,5; 1,3,6; 1,3,7
Demonstration of a live example and a real production bug
We have Subscriptions in the platform. Each subscription has a max_users field that shows how many users you could add to this subscription. If max_users=1 this means you can have only one user using the subscription.
When adding people to the subscription you have two cases
You successfully add someone to the subscription and you show a message “Success”
The max users is reached and you show a message “Error”
The code in a simplified manner looks something like this:
if subscription.save
form.show_message "Success"
else
form.show_message "Error"
While developing we’ve changed the code for the error from form.show_message “Error” to modal_dialog.show_message “Error”
After that we’ve changed the implementation further and the code for modal_dialog.show_message “Error” was no longed called.
As a result when the use tries to add someone to his subscription, for which they’ve payed, the app freezes and nothing happens. There is no error displayed and no user is added to the subscription.
The bug occurred because with the latest changes we’ve forgot to manually check the second case of the if-else clause and there is was not test developed for this case.
The subscription.save knows what message to set based on whether the subscription was successfully saved. It could set Error or it could set Success. After that we just show whatever message the subscription has to the form. If it is empty that’s great. This means no errors have occurred. subscriptions.get_save_message could be implemented in many different ways. It could be on the subscription object or another object, but this depends on the technology and framework used. But after the method save is called the message is set and there is a single flow and now branches in our code. The method form.show_message is called a single time on a single place in our code. If we change the API of this method we would change in a single place and will not forget about the second place. There is always a single scenario. Save->Show message.
Reply
You must be logged in to post a comment.