Tagged: security

  • kmitov 10:49 am on September 3, 2021 Permalink |
    Tags: aws, cloudflare, nginx, security   

    When the policeman becomes the criminal – how Cloudflare attacks my machines. 

    On the Internet you are nobody until someone attacks you.

    It gets even more interesting when the attack comes from someone with practically unlimited resources and when these are the same people that are supposed to protect you.

    This article is the story of how Cloudflare started an “attack” on a machine at the FLLCasts platform. This increased the traffic of the machine about 10x and AWS started charging the account 10x more. I managed to stop them and I hope my experience is useful for all CTOs, sysadmins, devops and others that would like to understand more and look out for such cases.

    TL;DR

    The current, up-to-date status is: after all the investigation it turns out that when a client makes a HEAD request for a file, the request hits the Cloudflare infrastructure. Cloudflare then sends a GET request to the account machine and caches the file. This behavior changed on 28 August. Before 28 August, when clients were sending HEAD requests, Cloudflare was also sending HEAD requests (which generate almost no traffic). After 28 August clients are still sending HEAD requests, but Cloudflare is now sending GET requests, generating terabytes of additional traffic that is not needed.

    Increase of the Bill

    On 28 August 2021 I got a notification from AWS that the account was close to surpassing its budget for the month. This is not surprising at the end of the month, but I nevertheless decided to check. It turned out that the traffic to one of the machines had increased 10x in a day. Nothing else had increased. No visits, no other resources, just the traffic to this one particular machine.
    That was strange. It has been going on for 7 days now, and this is the increase in traffic.

    AWS increase of the bill

    Limit billing on AWS

    My first thought was: “How can I set a global limit on AWS spending for this account? I don’t want to wake up with $50K in traffic charges the next day.”

    The answer is “You can’t”. There is no way to set a global spending limit for an AWS account. This was something I already knew, but I decided to check again with support, and yes, you can’t set such a limit. This means that AWS provides all the tools for you to be bankrupted by a third party, and they are not willing to limit it.

    Limit billing on Digital Ocean

    I have some machines on Digital Ocean and I checked there. “Can I set a global spending limit for my account where I will no longer be charged and all my services will stop if my spending is above X amount of dollars?”.
    The answer was again – “No. Digital Ocean does not provide it”.

    Should there be a global limit on spending on cloud providers?

    My understanding is – yes. There is a break-even point: users come to your service and generate revenue, while delivering the service costs you money. Once it costs you more to deliver the service than the revenue the service generates, I would personally prefer the service to stop. There is no need for it to be running. Otherwise you could wake up with a $50K bill.

    AWS monitoring

    I had the bill from AWS so I tried looking at the monitoring.
    There is a spike every day between 03:00 AM UTC and 05:00 AM UTC. Each spike adds hundreds of gigabytes of traffic. It could easily be terabytes next time.
    The conclusion is that the machine is heavily loaded during this window.

    AWS monitoring

    Nginx access.log

    Looking at the access log I see that there are a lot of requests by machines that are using a user agent called ‘curl’. ‘curl’ is a popular tool for accessing files over HTTP and is heavily used by different bots. But bots tend to identify themselves.

    This is what the access.log looks like:

    172.68.65.227 - - [30/Aug/2021:03:26:02 +0000] "GET /f9a13214d1d16a7fb2ebc0dce9ee496e/file1.webm HTTP/1.1" 200 27755976 "-" "curl/7.58.0"

    Parsing the log file

    I have years of experience with bash, and a couple of commands later I had a list of all the IPs and how many requests we’ve received from each of them.

    grep curl access.log | cut -d ' ' -f 1 | sort | uniq -c | sort -n

    The result is 547 machines. The full log file is available at – Full list of Cloudflare IPs attacking my machine. The top 20 are below (a few of the IPs are not from Cloudflare). The first column is the number of requests, the second is the IP of the machine.
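The pipeline above can be sanity-checked on a few synthetic log lines. The IPs, paths, and sizes below are made up for the illustration; only the log line format matches the access.log shown earlier:

```shell
# Build a tiny synthetic access.log (made-up IPs and file names).
printf '%s\n' \
  '172.70.34.19 - - [30/Aug/2021:03:26:02 +0000] "GET /a/file1.webm HTTP/1.1" 200 100 "-" "curl/7.58.0"' \
  '172.70.34.19 - - [30/Aug/2021:03:26:03 +0000] "GET /a/file2.webm HTTP/1.1" 200 100 "-" "curl/7.58.0"' \
  '172.69.63.18 - - [30/Aug/2021:03:26:04 +0000] "GET /a/file1.webm HTTP/1.1" 200 100 "-" "curl/7.58.0"' \
  '10.0.0.1 - - [30/Aug/2021:03:26:05 +0000] "GET /a/file1.webm HTTP/1.1" 200 100 "-" "Mozilla/5.0"' \
  > access.log

# Same pipeline as above: keep curl requests, take the IP (field 1),
# count occurrences per IP, sort by count.
grep curl access.log | cut -d ' ' -f 1 | sort | uniq -c | sort -n
# 172.69.63.18 appears once, 172.70.34.19 twice; the Mozilla line is excluded.
```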

    NumberOfRequest IP
        113 172.69.63.18
        117 172.68.65.107
        150 172.70.42.135
        158 172.70.42.161
        164 172.69.63.82
        167 172.70.42.129
        169 172.69.63.116
        170 172.68.65.231
        173 172.68.65.101
        178 172.69.63.16
        178 172.70.42.143
        188 173.245.54.236
        264 172.70.134.69
        268 172.70.134.117
        269 172.70.134.45
        287 172.70.134.153
        844 172.70.34.131
        866 172.70.34.19
        904 172.70.34.61
        912 172.70.34.69

    These are Cloudflare machines!

    Looking at the machines making the requests, there are 547 different machines, most of which belong to Cloudflare. These appear to be servers that Cloudflare itself is running.

    How does Cloudflare work?

    For this particular FLLCasts account with this particular machine, years ago I set up Cloudflare to sit in front of the machine and help protect the account from internet attacks.

    The way Cloudflare works is that only Cloudflare knows the IP address of our machine. This is the promise Cloudflare makes: only they know which IP address sits behind a given domain. When a user points their browser at “http://domainname”, the internet directs the request to Cloudflare, Cloudflare checks whether the request is ok, and only then forwards it to our machine. In the meantime Cloudflare also tries to help businesses like the platform by caching content. This means that when Cloudflare receives a request for a file, it checks its own infrastructure for a cached copy and sends a request to the account machine only if there is no cache.

    In a nutshell Cloudflare maintains a cache for the content the platform is delivering.

    Image is from Cloudflare support at https://support.cloudflare.com/hc/en-us/articles/205177068-How-does-Cloudflare-work-

    What is broken?

    Cloudflare maintains a cache of the platform resources. Every night between 03:00 AM UTC and 05:00 AM UTC some 547 Cloudflare machines decide to update their cache and start sending requests to our server. These are 10x more requests than the machine generally receives from all users. The content on the server does not change. It has been the same content for years. But for the last 7 days Cloudflare has been re-caching the same content every night on 547 machines.

    And AWS bills us for this.

    Can Cloudflare help?

    I created a ticket. The response was along the lines of “You are not subscribed for our support, you can get only community support”. Fine.
    I called them on the phone early in the morning.
    I called enterprise sales and I asked them.

    Me - "Hi, I am under attack. Can you help?"
    They - "Yes, we can help. Who is attacking you?"
    Me - "Well, you are. Is there an enterprise package I could buy so that you can protect me against your attack?"

    Luckily the guy on the phone caught my sense of humor and urgency and quickly organized a meeting with a product representative. Regrettably there were no solution engineers on this call.

    Both guys were very knowledgeable, but I had difficulties explaining that it was actually Cloudflare causing the traffic increase. I had all the data from AWS, from the access.log files, but the support agents still had some difficulty accepting it.

    To be clear – I don’t think that Cloudflare is maliciously causing this. There is no point. What I think has happened is some misconfiguration on their side that caused this for the last 7 days.

    What I think has happened

    I tried to explain to the support agents that there are four possible scenarios, and Cloudflare is responsible for three of them.

    1. Option 1 – “someone with 547 machines is trying to attack the FLLCasts account and Cloudflare is failing to stop it”. First, this is very unlikely. Nobody would invest in starting 547 machines just to make the platform pay a few dollars more this month. And even if this were the case, this is exactly what Cloudflare should prevent, right? Option 1: “Cloudflare is failing to prevent attacks” (unlikely)


    2. Option 2 – “only Cloudflare knows the IP of this domain name and they have been compromised”. The connection between domain name and IP address is something that only Cloudflare knows about. If a third party knows the domain name and is able to find the IP address, this means they have compromised Cloudflare. Option 2: “Cloudflare is compromised” (possible, but again, unlikely)

    3. Option 3 – “there is a misconfiguration in some of the Cloudflare servers”. I don’t like looking for malicious activity where everything could be explained with simple ignorance or a mistake. Most likely there is a misconfiguration in the Cloudflare infrastructure that is causing these servers to behave in this way. Option 3: “There is a misconfiguration in Cloudflare infrastructure”

    4. Option 4 – “there is a mistake on our end”. As there basically is nothing on our end and this nothing has not changed in years, the possibility for this to be the case is minimal. 

    On a support call we set a plan with the support agents to investigate. I would change the public IP of the AWS machine and reconfigure it on Cloudflare. In this way we hoped to stop some of the requests. We had no plan for what to do after that.

    Can I block it on the Nginx level?

    Nginx is an HTTP server serving the files. There are a couple of options to explore, but the most reasonable one was to stop all curl requests to the Nginx server. This was the shortest path. There was no need to protect against other attacks; there was only the need to protect against the Cloudflare attack, and the Cloudflare attack was using “curl” as a tool. I decided to stop ‘curl’.

      # Surely not the best solution, but the simplest, and it will get the job done for now.
      if ($http_user_agent ~ 'curl') {
          return 444; # 444 is a non-standard nginx code that drops the connection without sending a response.
      }
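Note that nginx’s `~` operator is a case-sensitive regular-expression match, so `'curl'` matches any User-Agent that merely contains that substring. A quick way to see which agents the rule would drop, using grep to mimic the same match on a few made-up User-Agent strings:

```shell
# Made-up sample User-Agent strings; grep mimics nginx's "~ 'curl'" match:
# any UA containing the substring "curl" would receive the 444.
printf '%s\n' \
  'curl/7.58.0' \
  'Mozilla/5.0 (X11; Linux x86_64)' \
  'libcurl-agent/1.0' \
  'Googlebot/2.1' \
  | grep curl
# Matching (blocked) agents: curl/7.58.0 and libcurl-agent/1.0
```

This also shows the trade-off of such a broad rule: any legitimate client whose User-Agent contains “curl” is blocked as well.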

    Resolution

    I am now waiting to see whether the change of the public IP of the AWS machine has any impact. If not, I will just reject all “curl” requests, since that seems to be what Cloudflare is using.

    Update 1

    The first solution that we decided to implement was to change the public IP of the AWS machine and update it in the DNS settings at Cloudflare. In this way we would make sure that only Cloudflare really knows this IP.

    Resolution is – It did not work!

    I knew it wouldn’t, because it was just another way for support to get me to do something without really looking into the issue, but I went along with it. Better to exhaust these options and be sure.

    The traffic of a machine attacked by Cloudflare. Changing the IP address on 03 September had no effect.

    Update 2

    Adding the CF-Connecting-IP header

    Cloudflare support was really helpful. They asked me to include the CF-Connecting-IP header in the logs. In this way we would know the real IP that is making the requests and whether these are in fact Cloudflare machines.

    The header is described at https://support.cloudflare.com/hc/en-us/articles/200170986-How-does-Cloudflare-handle-HTTP-Request-headers-

    I went on and updated the Nginx configuration:

    log_format  cloudflare_debug     '$remote_addr - $remote_user [$time_local] "$request" '
                          '$status $body_bytes_sent "$http_referer" '
                          '"$http_user_agent" "$http_x_forwarded_for" "$http_cf_connecting_ip"';
    
    access_log /var/log/nginx/access.log cloudflare_debug;
    

    Now the log file contained the original IP.
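With the cloudflare_debug format the connecting IP is the last double-quoted field on each line, so it can be pulled out with awk. The log line below is a synthetic sample in that format (made-up IPs), not a real log entry:

```shell
# A synthetic line in the cloudflare_debug format defined above;
# the CF-Connecting-IP value is the last double-quoted field.
line='162.158.94.236 - - [04/Sep/2021:07:09:12 +0000] "GET /file1.webm HTTP/1.1" 200 2256504 "-" "curl/7.68.0" "188.254.161.195" "188.254.161.195"'

# Split on double quotes; the connecting IP is the second-to-last field.
echo "$line" | awk -F'"' '{print $(NF-1)}'
# → 188.254.161.195
```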

    Cloudflare makes a GET request when the client makes a HEAD request

    This is what I found out. The platform has a daily job that checks the machine and makes sure the files are ok. This integrity check is left over from the times, years ago, when we had to do it. It still runs every night, checking the machine with HEAD requests. But on 28 August 2021 Cloudflare started making GET requests in response, and this increased the traffic to the machine.

    Steps to reproduce

    Here are the steps to reproduce:

    1. I am sending a HEAD request with ‘curl -I’

    2. Cloudflare has not cached the file so there is “cf-cache-status: MISS”

    3. Cloudflare sends a GET request and gets the whole file

    4. Cloudflare responds to the HEAD request.

    5. I send a HEAD request again with ‘curl -I’

    6. Cloudflare has the file cached and there is a “cf-cache-status: HIT”

    7. The account server is not hit.

    The problem here is that I am sending a HEAD request for my file, while Cloudflare is sending a GET request for the whole file in order to cache it.

    Commands to reproduce

    This is a HEAD request:

    $ curl -I https://domain.com/file1.webm
    HTTP/2 200
    date: Sat, 04 Sep 2021 07:09:11 GMT
    content-type: video/webm
    content-length: 2256504
    last-modified: Sat, 04 Jan 2014 14:24:01 GMT
    etag: "52c81981-226e78"
    cache-control: max-age=14400
    cf-cache-status: MISS
    accept-ranges: bytes
    expect-ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
    report-to: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v3?s=Xg9TLgssa5Gm6j1fRlJZH8VahaoY21LdCE1W1JqVueu49mzdiTmh9MZp4pFZDsVeSmRg%2Bc%2FMryoN7tgmKUmdxhWzE7UZdVvgG%2FRxHSZ%2FYS6pDtxLwpXSD71jo5ADNyT4TSpKXtE%3D"}],"group":"cf-nel","max_age":604800}
    nel: {"success_fraction":0,"report_to":"cf-nel","max_age":604800}
    server: cloudflare
    cf-ray: 689564111e594ee0-FRA
    alt-svc: h3-27=":443"; ma=86400, h3-28=":443"; ma=86400, h3-29=":443"; ma=86400, h3=":443"; ma=86400

    This is the log entry right after the HEAD request. Note that I am sending a HEAD request to domain.com while Cloudflare is sending a GET request for the file.

    162.158.94.236 - - [04/Sep/2021:07:09:12 +0000] "GET /file1.webm HTTP/1.1" 200 2256504 "-" "curl/7.68.0" "188.254.161.195" "188.254.161.195"

    Then I send a second HEAD request:

    $ curl -I https://domain.com/file1.webm
    HTTP/2 200
    date: Sat, 04 Sep 2021 07:09:53 GMT
    content-type: video/webm
    content-length: 2256504
    last-modified: Sat, 04 Jan 2014 14:24:01 GMT
    etag: "52c81981-226e78"
    cache-control: max-age=14400
    cf-cache-status: HIT
    age: 42
    accept-ranges: bytes
    expect-ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
    report-to: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v3?s=CKSvpGGHoj5LfV6xXpPUK5kHJtdsX3fylgt%2F2%2B6G94oUsdAd8FnHmUgEUIgnj5dd2Vvsv%2BKQxxgsHdHA0RvpjTxATakFKFuirMeI%2FS3lAdDX5VA0tY74z0CRYEHM2rS%2Fld6K738%3D"}],"group":"cf-nel","max_age":604800}
    nel: {"success_fraction":0,"report_to":"cf-nel","max_age":604800}
    server: cloudflare
    cf-ray: 689565175dffc29f-FRA
    alt-svc: h3-27=":443"; ma=86400, h3-28=":443"; ma=86400, h3-29=":443"; ma=86400, h3=":443"; ma=86400

    And then there is NOTHING in the log file.

    Note that for the last HEAD request there is a “cf-cache-status: HIT”.

    Status and how it could be resolved

    Yes, we do send HEAD requests to all the files every day in order to check that they are working. This has been going on for years and is a leftover from an integrity check we implemented in 2015.

    What changed on 28 August 2021 is that when Cloudflare receives a HEAD request for a file, it now sends a GET request to our machine in order to cache the file. This is what is generating all the traffic.
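One way to spot such a behavior change in the logs is to tally the request methods per log file. The lines below are synthetic samples (made-up paths and sizes), trimmed to the quoted request and the fields after it:

```shell
# Synthetic access.log fragments: before 28 August the forwarded
# requests were HEAD, after that date they became GET.
printf '%s\n' \
  '"HEAD /file1.webm HTTP/1.1" 200 0' \
  '"HEAD /file2.webm HTTP/1.1" 200 0' \
  '"GET /file1.webm HTTP/1.1" 200 27755976' \
  > sample.log

# Count the requests per HTTP method (the method is the first word
# inside the quoted request string).
awk -F'"' '{split($2, r, " "); count[r[1]]++} END {for (m in count) print count[m], m}' sample.log | sort
# 1 GET, 2 HEAD
```

Run daily against the real access.log, a sudden jump in GET counts with an unchanged HEAD count would have flagged the change immediately.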

    We send the HEAD requests with ‘curl -I’. I have 30 weeks of log files showing that before 28 August Cloudflare was also forwarding them as HEAD requests.

    I have asked Cloudflare:

    Could you please roll back this change in your infrastructure and not send a GET request to our machine when you receive a HEAD request from a client?

    Let’s see how this will be resolved.

    Up to date conclusion

    Check your machines from time to time. I hope you don’t get in this situation.

    Want to keep in touch? – find me on LinkedIn or Twitter

     
  • kmitov 8:39 am on May 7, 2021 Permalink |
    Tags: csrf, security

    [Rails] Implementing an API token for post requests 

    (Everyday Code – instead of keeping our knowledge in a README.md let’s share it with the internet)

    At the BuildIn3D platform we provide clients with an API for sending certain HTTP POST requests. The question is: how do we authenticate them?

    Here is one of the authentication steps – we implemented our own build_token. When an authenticity_token for CSRF is available we use it as well. But the authenticity_token is not always available, because it depends on the session and the session cookie. In some cases there is no session and no cookie, yet we still need some authentication. Here is how we do it.

    Generate a Unique encrypted token on the server side

    The server generates a token based on passed params. These could be a username, a password, or other info.

        def to_build_token
          len   = ActiveSupport::MessageEncryptor.key_len
          salt  = Rails.application.secret_build_token_salt
          key   = ActiveSupport::KeyGenerator.new(Rails.application.secret_key_base).generate_key(salt, len)
          crypt = ActiveSupport::MessageEncryptor.new(key)
          encrypted_data = crypt.encrypt_and_sign(self.build_id)
          Base64.encode64(encrypted_data)
        end

    This will return a new token that has encrypted the build_id.

    encrypted_data = crypt.encrypt_and_sign(self.build_id)
    # We could easily add more things to encrypt, like user, or some params or anything you need to get back as information from the token when it is later submitted

    We can then pass this token to the client. The token could expire after some time.

    We would require the client to send us this token on every request from now on. In this way we know that the client has authenticated with our server.

    Decryption of the token

    What we are trying to extract is the build_id from the token. The token is encrypted, so the user cannot learn the secret information, which is the build_id.

    def self.build_id_from_token token
      len   = ActiveSupport::MessageEncryptor.key_len
      salt  = Rails.application.secret_build_token_salt # must be the same salt used when generating the token
      key   = ActiveSupport::KeyGenerator.new(Rails.application.secret_key_base).generate_key(salt, len)
      crypt = ActiveSupport::MessageEncryptor.new(key)
      crypt.decrypt_and_verify(Base64.decode64(token))
    end
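The two methods form an encrypt/decrypt roundtrip. Here is a self-contained sketch of the same roundtrip using only Ruby’s standard library – OpenSSL’s AES-GCM in place of ActiveSupport::MessageEncryptor and PBKDF2 in place of ActiveSupport::KeyGenerator. The secret, salt, and iteration count are made-up values for the illustration, not the platform’s real configuration:

```ruby
require "openssl"
require "base64"

# Made-up stand-ins for Rails.application.secret_key_base and the salt.
SECRET_KEY_BASE = "not-a-real-secret-key-base"
SALT            = "build_token"

# Derive a 32-byte key from the secret and salt, similar in spirit
# to ActiveSupport::KeyGenerator.
KEY = OpenSSL::PKCS5.pbkdf2_hmac_sha1(SECRET_KEY_BASE, SALT, 1_000, 32)

# Encrypt and authenticate the build_id, then Base64-encode the result.
def to_build_token(build_id)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
  cipher.key = KEY
  iv = cipher.random_iv
  ciphertext = cipher.update(build_id.to_s) + cipher.final
  # Pack iv, auth tag and ciphertext so decryption can verify the token.
  Base64.strict_encode64(
    [iv, cipher.auth_tag, ciphertext].map { |p| Base64.strict_encode64(p) }.join("--")
  )
end

# Reverse the steps; raises OpenSSL::Cipher::CipherError on a forged token.
def build_id_from_token(token)
  iv, tag, ciphertext =
    Base64.strict_decode64(token).split("--").map { |p| Base64.strict_decode64(p) }
  decipher = OpenSSL::Cipher.new("aes-256-gcm").decrypt
  decipher.key = KEY
  decipher.iv = iv
  decipher.auth_tag = tag
  decipher.update(ciphertext) + decipher.final
end

puts build_id_from_token(to_build_token(42)) # prints "42"
```

Because the IV is random, two tokens for the same build_id differ, yet both decrypt to the same value, which is also the behavior you get from MessageEncryptor.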

    Requiring the param in each post request

    When a POST request is made we should check that the token is available and that it was generated by our server:

      def create
          build_token = params.require("build_token")
          build_id_from_token = Record.build_id_from_token(build_token)
          .... # other logic that now has the build_id from the token
      end

    The build token is one of the things we use with the IS at BuildIn3D and FLLCasts.

    Polar bear approves of our security.

     