Caution: my English is far from perfect. (My Russian is not always good either.)

blog

Monday 26 September 2022

Only Persistent LogIn

The Google login dialog no longer has the "stay signed in" checkbox: your session is always persistent. Once you log in, this web browser will have access to your account even after a reboot.

 

Monday 15 August 2022

A robots.txt Problem

To prevent a part of our web application from being scanned by search engines and other web crawlers, we add a robots.txt like

    User-agent: *
    Disallow: /path


It's so simple, what can go wrong?

Here is a real story that happened to me.

It turns out that my cloud platform - Google App Engine - has a caching and compression layer between the application and the Internet. It can gzip content for one client, cache it, and then return the same gzipped response to other clients, even if they haven't sent the Accept-Encoding: gzip header, or have explicitly requested uncompressed content.

This behaviour, unwise in my opinion, is documented here: https://cloud.google.com/appengine/docs/legacy/standard/java/how-requests-are-handled#response_caching

Example:

# Force a gzipped response

$ curl -v -H 'Accept-Encoding: gzip' -H 'User-Agent: gzip' https://yourapp.appspot.com/robots.txt
...
content-encoding: gzip
...
Warning: Binary output can mess up your terminal. Use "--output -" to tell
Warning: curl to output it to your terminal anyway, or consider "--output
Warning: <FILE>" to save to a file.


# Now explicitly request uncompressed robots.txt

$ curl -v -H 'Accept-Encoding: identity' https://yourapp.appspot.com/robots.txt
...
content-encoding: gzip
...
Warning: Binary output can mess up your terminal. Use "--output -" to tell
Warning: curl to output it to your terminal anyway, or consider "--output
Warning: <FILE>" to save to a file.


(BTW, although the doc says the default caching duration is 10 minutes, I observed Google App Engine returning gzipped responses for at least 30 minutes.)

A web crawler (Dotbot from moz.com) encountered such a gzipped robots.txt response and was unable to parse it, so it considered all the URLs in the app domain allowed for crawling. Moreover, the crawler cached this gzipped response. All its subsequent requests to robots.txt were conditional (ETag based, I think) and resulted in 304 Not Modified, so the crawler kept relying on the gzipped version it could not parse and regularly visited the unwanted URLs.
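A crawler could protect itself from this situation by checking the body for the gzip magic bytes instead of trusting that it got what it asked for. The following is only a minimal sketch for Node.js, not Dotbot's actual code; the function names looksGzipped and robotsTxtText are made up for illustration.

```javascript
const zlib = require('zlib');

// A gzip stream always starts with the two magic bytes 0x1f 0x8b.
function looksGzipped(body) {
    return body.length >= 2 && body[0] === 0x1f && body[1] === 0x8b;
}

// Hypothetical helper: decode a robots.txt body to text, decompressing
// it first when the server sent gzip regardless of what was requested.
function robotsTxtText(body) {
    const raw = looksGzipped(body) ? zlib.gunzipSync(body) : body;
    return raw.toString('utf8');
}

const plain = Buffer.from('User-agent: *\nDisallow: /path\n');
// Both the compressed and the uncompressed body decode to the same text.
console.log(robotsTxtText(zlib.gzipSync(plain)) === robotsTxtText(plain)); // prints true
```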

Luckily, Dotbot clearly identifies itself in the User-Agent header, and they have a working support email, so after five months of communication in a ticket I discovered the reason.

I fixed the Google App Engine behaviour by adding an explicit configuration to appengine-web.xml:

  <static-files>
    <include path="/**">
      <http-header name="Vary" value="Accept-Encoding"/>
    </include>
    <exclude path="/**.jsp"/>
  </static-files>

I also made a small modification to robots.txt, to be sure the ETag changes.
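Why the Vary header helps can be shown with a toy cache. This is a sketch of the general idea only, under the assumption that the cache builds its key from the URL plus the request headers named in Vary; it says nothing about how Google App Engine is really implemented, and makeCache and origin are invented names.

```javascript
// Toy shared cache. varyOn lists the request headers that become part
// of the cache key, which is what a "Vary" response header asks for.
function makeCache(varyOn) {
    const store = new Map();
    return function fetchThroughCache(url, headers, origin) {
        const key = [url].concat(varyOn.map(h => headers[h] || '')).join('|');
        if (!store.has(key)) store.set(key, origin(url, headers));
        return store.get(key);
    };
}

// Hypothetical origin server: compresses only when the client accepts gzip.
function origin(url, headers) {
    const gzip = /gzip/.test(headers['accept-encoding'] || '');
    return {encoding: gzip ? 'gzip' : 'identity'};
}

// Without Vary, the response cached for the first client is reused.
const noVary = makeCache([]);
noVary('/robots.txt', {'accept-encoding': 'gzip'}, origin);
console.log(noVary('/robots.txt', {'accept-encoding': 'identity'}, origin).encoding);
// prints "gzip"

// With Vary: Accept-Encoding, each encoding gets its own cache entry.
const withVary = makeCache(['accept-encoding']);
withVary('/robots.txt', {'accept-encoding': 'gzip'}, origin);
console.log(withVary('/robots.txt', {'accept-encoding': 'identity'}, origin).encoding);
// prints "identity"
```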


Saturday 1 May 2021

Stack Overflow cookies

Stack Overflow and related sites repeatedly display their cookie confirmation dialog.


So many times I have had to press "Customize settings" and then select "Confirm my choices".

 

This time I mistakenly pressed "Accept all cookies".

How do I undo that? Why do they show it every time? Will they continue showing it, or does it only annoy you until you press "Accept all cookies"?


Monday 25 March 2019

- What did the elephant do when Napoleon came?
- Ate fish soup.
(A Russian pun that does not survive translation.)

Friday 21 July 2017

Kopichatka (копичатка)

A kopichatka (from "copy-paste" and "опечатка", the Russian for "typo") is a small error made as a result of copy-paste. For example, forgetting to replace everything that needed replacing.

A line of code was copied. The variable passed as a parameter was replaced with the right one, but the object the method is called on was not.
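A copy-paste slip of exactly this kind might look like the following. The objects source and target and their get method are invented purely for illustration.

```javascript
// Two hypothetical objects with the same method.
const source = {name: 'source', get(key) { return this.name + ':' + key; }};
const target = {name: 'target', get(key) { return this.name + ':' + key; }};

const sourceId = source.get('id');
// The line above was copied. The argument was changed to 'title',
// but the receiver should have become `target` and was left as `source`.
const targetTitle = source.get('title');

console.log(targetTitle); // prints "source:title", not the intended "target:title"
```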

Monday 26 September 2016

Partitioned, Available and Consistent?

A database storing a bank account with a balance of $100 is replicated to two nodes. A network partition happens between the nodes. I am withdrawing $20 in cash from an ATM which has access to the replica on one side of the partition. At the same moment I'm charged $10 for hosting, and the payment gateway has access to the replica on the other side of the partition.

That's OK: each DB node processes withdrawals as long as their total stays within half of the balance at the time of the last synchronization. (Each node can spend up to $50.)

Does it qualify as simultaneous availability, consistency and partition tolerance?

The account log says, depending on the node you're connected to, either:

4. final balance: between $30 and $80
3. spent $20
2. network partition 50 / 50
1. initial balance: $100

or

4. final balance: between $40 and $90
3. spent $10
2. network partition 50 / 50
1. initial balance: $100
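The scheme can be sketched in a few lines. This is a toy model under the assumptions stated in the post (the last synchronized balance is split equally between the replicas); makeReplica and its methods are invented names, not a real database API.

```javascript
// Each replica may spend at most its equal share of the balance
// seen at the last synchronization.
function makeReplica(syncedBalance, replicaCount) {
    const budget = syncedBalance / replicaCount;
    let spent = 0;
    return {
        withdraw(amount) {
            if (spent + amount > budget) return false; // could overdraw the account
            spent += amount;
            return true;
        },
        // Balance range as seen from this replica while partitioned:
        // the other replicas may have spent anything from 0 up to their budgets.
        balanceRange() {
            const max = syncedBalance - spent;
            const min = max - budget * (replicaCount - 1);
            return [min, max];
        },
        totalSpent() { return spent; }
    };
}

const atm = makeReplica(100, 2);
const gateway = makeReplica(100, 2);
atm.withdraw(20);
gateway.withdraw(10);
console.log(atm.balanceRange());     // between 30 and 80
console.log(gateway.balanceRange()); // between 40 and 90
// After the partition heals, the replicas merge what they spent.
console.log(100 - atm.totalSpent() - gateway.totalSpent()); // 70
```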

Thursday 10 March 2016


A dashing troika races along,
A Stalinist troika


Friday 2 October 2015

International Politics News


(I suggest watching 40 seconds starting from this point.)

Saturday 5 September 2015

Imperative to Functional

How do you automatically transform any imperative code into functional code?

Think for a moment before reading the answer:

Copy all the data before invoking the imperative code.

That's not a joke. There are techniques (e.g. copy-on-write) and data structures that can make it efficient. I have no time to write down all the analogies that come to my mind now. What I want to say is that the imperative and functional approaches are, in many important aspects, not that different. We can think of an imperative assignment to a variable as a pure function which computes a new world where this variable has a new value.
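In its most naive form the trick looks like this. It is a minimal sketch that assumes a flat array, so a shallow copy with slice() is enough; nested data would need a deep copy or a persistent data structure. The function names are invented for the example.

```javascript
// Imperative: sorts the array it is given, in place.
function sortInPlace(numbers) {
    numbers.sort((a, b) => a - b);
}

// Functional wrapper: copy the data, run the imperative code on the copy,
// and return the copy. The caller's data is never modified.
function sorted(numbers) {
    const copy = numbers.slice();
    sortInPlace(copy);
    return copy;
}

const input = [3, 1, 2];
const output = sorted(input);
console.log(input.join(','));  // prints "3,1,2"
console.log(output.join(',')); // prints "1,2,3"
```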

Wednesday 12 August 2015

JavaScript: LexicalEnvironment vs VariableEnvironment

One JavaScript feature I learned about while working on POCL.

Consider:

function foo() {
    try {
        throw 'hello'
    } catch (e) {
        return function inner() {return e}
    }
}

foo()()
=> 'hello'

function foo2() {
    try {
        throw 'hello'
    } catch (e) {
        function inner() {return e}
        return inner;
    }
}

foo2()()
Uncaught ReferenceError: e is not defined

Why do the results differ?

In the first case INNER is a function expression, while in the second case INNER is a function declaration.

According to the standard (ECMAScript 5):

A JavaScript execution context has two lexical environments. One is called LexicalEnvironment (he-he), and the other is VariableEnvironment [§10.3].

The CATCH statement introduces a new LexicalEnvironment where E is defined, and leaves the VariableEnvironment untouched.

A function expression receives the current LexicalEnvironment as its scope, while a function declaration receives the current VariableEnvironment as its scope [§13]; therefore E is invisible to the function declaration.

Why is everything so complicated? Why do two lexical environments exist?

My guess is that it's because we can call a function before its declaration:

(function () {
    return inner();
    function inner() {return 1};
})()

=> 1

(function () {
    var x = inner();
    return x;
    function inner() {return 1};
})()

=> 1

(function () {
    var x = inner();
    try {
       throw 'error'
    } catch (unused) {
       function inner() {return 1};
    };
    return x;
})()

=> 1

To support this behavior, the variable INNER is created and bound to the function object early, when control enters the surrounding function and before any CATCH is executed. Therefore it doesn't see the variable introduced by the CATCH clause.

Unlike normal variables, which are visible throughout the whole enclosing function, the variable introduced by CATCH is only visible inside the CATCH clause:

(function() {
    var x = 1;
    console.log(x);
    try {
        throw 2
    } catch (x) {
        console.log(x)
    }
    console.log(x)
})()

// log output:
1
2
1


That's why (I guess) JavaScript needs to distinguish LexicalEnvironment from VariableEnvironment.

In years of JS programming I never hit this difference in practice: I never captured a catch variable in a closure. I only noticed this strange thing when reading the standard in order to implement some code transformations needed for POCL, and wondered why it needs to distinguish LexicalEnvironment from VariableEnvironment.

BTW, another case where names are bound within a scope smaller than the enclosing function is the WITH statement. So it has the same problem regarding function expressions and function declarations:

var o = {x: 1, y: 2};
(function() {with (o) {return function inner() {return x}}})()()
=> 1

(function() {with (o) {function inner() {return x}; return inner}})()()
Uncaught ReferenceError: x is not defined


Friday 7 August 2015

Predictive Optimizing Code Loading

An idea I had kept in mind for several years. I finally experimented with it: https://github.com/avodonosov/pocl.

I consider the experiment successful. It could be a useful technique for accelerating applications.

Saturday 20 December 2014

Web Design Advice

Don't just use gray text on a white background. If you really want to make your text difficult to read, you will achieve even better results with white text on a white background.

Upd: I am not the only one who thinks so: http://contrastrebellion.com/

Friday 22 August 2014

Semantic Versioning is Not the Solution

People often think they can introduce incompatible changes in their library's API and just increase the major version number, as semantic versioning proposes, to save the library's clients from problems.

It is not true.

Consider a dependency tree:
my-application
  web-server 1.1.1
    commons-logging 1.1.1
  db-client 1.1.1
    commons-logging 1.1.1
  authentication 1.1.1
    commons-logging 1.1.1

Now commons-logging changes its API incompatibly and is released as commons-logging 2.0.1. The authentication library adopts commons-logging 2.0.1, while the other libraries still depend on 1.1.1:

my-application
  web-server 1.1.1
    commons-logging 1.1.1
  db-client 1.1.1
    commons-logging 1.1.1
  authentication 1.1.2
    commons-logging 2.0.1

Now my-application is broken, because the dependency tree includes two versions of commons-logging which share package and class/function names, and thus cannot be loaded simultaneously.

When you release an incompatible API this way, you essentially split the world of dependent libraries into two parts: the ones depending on the old version, and the ones depending on the new version. Libraries from the first part cannot be used together with libraries from the second part.

A better way to introduce an incompatible API is to release it as a new library, for example commons-logging2, or new-logging. Make it possible to use the new library simultaneously with the old one, e.g. it should have a new package name. Doing so will protect clients in the majority of cases.
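The difference between the two release strategies can be shown with a toy model of a flat namespace, where only one library can be loaded per package name. This is only an illustration: loadAll, the package names and the 1.0.0 version of the renamed library are all made up.

```javascript
// Toy model of a flat namespace: one loaded library per package name.
function loadAll(libraries) {
    const loaded = new Map(); // package name -> version
    for (const {packageName, version} of libraries) {
        const existing = loaded.get(packageName);
        if (existing !== undefined && existing !== version) {
            throw new Error(packageName + ' ' + existing + ' conflicts with ' + version);
        }
        loaded.set(packageName, version);
    }
    return loaded;
}

const oldLogging = {packageName: 'commons.logging', version: '1.1.1'};

// Same package name, incompatible version: the two cannot coexist.
try {
    loadAll([oldLogging, {packageName: 'commons.logging', version: '2.0.1'}]);
} catch (e) {
    console.log(e.message); // prints "commons.logging 1.1.1 conflicts with 2.0.1"
}

// A new library with a new package name: both load side by side.
const both = loadAll([oldLogging, {packageName: 'commons.logging2', version: '1.0.0'}]);
console.log(both.size); // prints 2
```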

If we release a new library for the new API, there is no need for such a thing as a "major version number".

NB: with some module managers, most notably in JavaScript, there are no global package/class names through which different versions of a library can interfere. But in the majority of programming languages the problem exists.

Saturday 25 January 2014

What similar pictures. I very much hope it won't come to a similar result.






