Caution: my English is far from perfect. (My Russian isn't always good either.)

blog

Thursday, 5 September 2024

Scientific Method for News Reading and Political Thought

I drafted this text around 10 years ago, but wasn't satisfied with it and hoped to finish it. However, I didn't have the time and energy to complete it, so I'm just posting it as is. Maybe I will edit it later.

Disclaimer: banalities and naivety to be expected.

People fall into all kinds of beliefs:
one race is superior to another, private property is the root of evil,
Trump is a Russian spy, the 2020 US election was rigged,
Putin is behind the migrant crisis on the Poland-Belarus border,
the financial system is a conspiracy run against us all,
global warming is caused by human activity, the COVID-19 pandemic was
intentionally induced by the world's rulers.

How to know what to believe?

It is often said that the most efficient way of studying reality is the
Scientific Method, which in just a few hundred years has allowed us to acquire
more knowledge than in all the previous tens of thousands of years of human history.

The essence of the Scientific Method is that even if we have an explanation
consistent with all the observed facts in the area of interest, this explanation is not
considered true. It's merely a hypothesis. Why? Because there might be other explanations
of the same facts.

[TODO: Illustrations]

After coming up with a hypothesis, scientists try to test it by looking for
new facts that contradict it. Often this is done by conducting experiments
that check predictions following from the hypothesis.

Example: if light has mass, then during a solar eclipse the stars near the Sun
should be observed at positions different from those seen when the Sun is in another
part of the sky, because the Sun's gravity should bend the trajectory
of the light from those stars. Try to observe this: if it does not happen in reality,
the hypothesis is disproved; if it happens, it may speak in favor of the hypothesis,
but it may also have some other explanation.

Only after a hypothesis has withstood extensive testing of this kind does it become a theory
(but never an absolute truth).

Someone may object: "in many cases, experiments on global social and
political reality are impossible, because we don't control it".
But once the scientific mindset is adopted, clever tricks can be invented.
At first thought, how could one weigh the Earth?
And yet: https://en.wikipedia.org/wiki/Cavendish_experiment


Moreover, today's Internet is an ocean of facts. Experiments may not even be needed:
just search for facts that can correct your current understanding.

And that's the main practical takeaway for me. When reading
about a news topic that interests me, I intentionally look for information
that contradicts my current view of the topic.

Then, of course, comes fact-checking.

But not only fact-checking of the supporting evidence that formed my current view.
It's important to search specifically for contradicting information.

Classic example: if the hypothesis is that all swans are white,
looking for more and more white swans is almost useless.
Instead, we should intentionally search for swans of other colors.

Such corrective facts are the most informative (https://en.wikipedia.org/wiki/Quantities_of_information).
Therefore, the scientific method allows one to maximize
the correctness of one's understanding of reality while minimizing
the effort needed.
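To make "most informative" a bit more concrete: in information theory, observing an event of probability p yields -log2(p) bits of self-information. The sketch below plugs in a made-up belief (an assumption for illustration only) that the next swan we see is white with probability 0.99:

```javascript
// Self-information (surprisal) of an observation with probability p, in bits.
function surprisalBits(p) {
  return -Math.log2(p);
}

// Hypothetical belief: 99% sure that the next swan we see will be white.
const pWhite = 0.99;

// Another white swan: expected, so it tells us almost nothing new.
const whiteSwan = surprisalBits(pWhite);
// A black swan: unexpected, and it refutes "all swans are white".
const blackSwan = surprisalBits(1 - pWhite);

console.log(whiteSwan.toFixed(4), blackSwan.toFixed(2)); // 0.0145 6.64
```

With these numbers, one contradicting swan carries several hundred times more information than one more confirming swan.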

Even if it is the shortest path to the truth, it's still a difficult
exercise for a single person to study every question until it is completely
understood.

A collaborative medium would probably be useful (a wiki with certain agreed
rules, or even a specialized structured discussion system).

Still, even when practiced individually, this approach often makes it easy to
reject some false views imposed by propaganda,
and to keep questioning the remaining views critically.

Selective presentation of facts is one of the main methods of
manipulation and propaganda today.
It makes it possible to maintain a false picture using true facts (by letting
people see only the facts consistent with the false hypothesis).

Reality

Propaganda: Don't you see? That's a triangle! Go fact-check our reporting!


Another propaganda: Don't you see? That's a rectangle! Go fact-check our reporting!



The Scientific Method allows one to navigate the information space consciously,
instead of passively consuming the huge flood of data poured on us
by the media, advertising and our social media circles.

Maybe the general public does not need to know everything,
if there is good leadership with an adequate world view and values,
which then uses the media for the "Manufacture of consent"
(https://en.wikipedia.org/wiki/Public_Opinion_(book))
based on false explanations that are easier for the public to accept.

But how can we be sure the leaders are acting in the public interest?

In addition, history knows cases when rulers promoted
crazy ideas and actions.

And I have the impression that today's top politicians and decision makers
often start to believe their own propaganda and lose
touch with reality.

History also knows cases when masses gripped by ideas overthrew their rulers.

And today's information technologies make it possible to amplify false
ideas in social networks and increase people's ability for collective action
and coordination, including when pursuing wrong goals.
So the danger of self-induced mass hysteria, xenophobia, instability and violent
uprisings is increasing.


-----------
todo (nuances to cover):
- The most efficient, but still difficult. At least helps to quickly reject many wrong hypotheses,
  and to keep questioning the remaining ones.
- general public probably does not need to know everything.
  wise leadership, consent formation.
- should all children know Santa Claus does not exist?
- Mass hysteria, lynch law
- collecting ideas
- collaborative medium
- shortcomings
  - Pure rational approach
  - "showing the instruments of control"
  - core, fundamental values
  - proportion of attention to different topics
  - are human societies suited to living with knowledge of the truth? Or are we better suited to living with illusions?
- irrefutable hypotheses
- genetic algorithms, diversity.
- double-blind method for courts







Monday, 26 September 2022

Only Persistent LogIn

The Google login dialog no longer has the "stay signed-in" checkbox: your session is always persistent. Once you log in, this web browser will have access to your account even after a reboot.

 

Monday, 15 August 2022

A robots.txt Problem

To prevent part of our web application from being scanned by search engines and other web crawlers, we can add a robots.txt like this:

    User-agent: *
    Disallow: /path
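For context, crawlers apply such a rule by path prefix (see RFC 9309), so Disallow: /path also covers /path/page and even /pathology. Below is a deliberately simplified sketch of that check; it is my own illustration, not any real crawler's code, and it handles only "User-agent: *" groups and Disallow lines (Allow, wildcards and longest-match precedence are ignored):

```javascript
// Collect Disallow prefixes from groups addressed to every crawler ("User-agent: *").
function parseDisallows(robotsTxt) {
  const disallows = [];
  let inStarGroup = false;
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*/, '').trim(); // drop comments and whitespace
    const colon = line.indexOf(':');
    if (colon < 0) continue;                         // not a "field: value" line
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    if (field === 'user-agent') {
      inStarGroup = value === '*';
    } else if (field === 'disallow' && inStarGroup && value !== '') {
      disallows.push(value);
    }
  }
  return disallows;
}

// A path is allowed unless it starts with one of the disallowed prefixes.
function isAllowed(robotsTxt, path) {
  return !parseDisallows(robotsTxt).some((prefix) => path.startsWith(prefix));
}

const robots = 'User-agent: *\nDisallow: /path\n';
console.log(isAllowed(robots, '/path/page')); // false
console.log(isAllowed(robots, '/pathology')); // false (prefix match)
console.log(isAllowed(robots, '/other'));     // true
```

Note the failure mode: a body this parser cannot make sense of yields no rules at all, so every path is allowed.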


It's so simple, what can go wrong?

Here is a real story that happened to me.

It turns out that my cloud platform, Google App Engine, has a caching and compression layer between the application and the Internet. It can gzip content for one client, cache it, and then return the same gzipped response to other clients, even if they haven't sent the Accept-Encoding: gzip header, or have even explicitly requested uncompressed content.

This behaviour, unwise in my opinion, is documented here: https://cloud.google.com/appengine/docs/legacy/standard/java/how-requests-are-handled#response_caching

Example:

# Force a gzipped response

$ curl -v -H 'Accept-Encoding: gzip' -H 'User-Agent: gzip' https://yourapp.appspot.com/robots.txt
...
content-encoding: gzip
...
Warning: Binary output can mess up your terminal. Use "--output -" to tell
Warning: curl to output it to your terminal anyway, or consider "--output
Warning: <FILE>" to save to a file.


# Now explicitly request uncompressed robots.txt

$ curl -v -H 'Accept-Encoding: identity' https://yourapp.appspot.com/robots.txt
...
content-encoding: gzip
...
Warning: Binary output can mess up your terminal. Use "--output -" to tell
Warning: curl to output it to your terminal anyway, or consider "--output
Warning: <FILE>" to save to a file.


(BTW, although the doc says the default caching duration is 10 minutes, I observed Google App Engine returning gzipped responses for at least 30 minutes.)

A web crawler (Dotbot from moz.com) encountered such a gzipped robots.txt response and was unable to parse it, so it considered all the URLs in the app's domain allowed for crawling. Moreover, the crawler cached this gzipped response. All its subsequent requests for robots.txt were conditional (ETag-based, I think) and resulted in 304 Not Modified, so the crawler kept relying on the gzipped version it could not parse, and regularly visited the unwanted URLs.
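A crawler could guard against this by checking for the gzip magic bytes (0x1f 0x8b, which start every gzip stream) before parsing. A minimal Node.js sketch; decodeRobotsBody is a hypothetical helper, and I don't know how Dotbot actually handles responses:

```javascript
const zlib = require('zlib');

// Decode a robots.txt body that may have been gzipped without being requested.
function decodeRobotsBody(body) {
  // Every gzip stream begins with the two magic bytes 0x1f 0x8b.
  const isGzip = body.length >= 2 && body[0] === 0x1f && body[1] === 0x8b;
  const raw = isGzip ? zlib.gunzipSync(body) : body;
  return raw.toString('utf8');
}

// Simulate the App Engine case: the same robots.txt served gzipped to everyone.
const plain = Buffer.from('User-agent: *\nDisallow: /path\n');
const served = zlib.gzipSync(plain);

console.log(decodeRobotsBody(served) === plain.toString('utf8')); // true
console.log(decodeRobotsBody(plain) === plain.toString('utf8'));  // true
```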

Luckily, Dotbot clearly identifies itself in the User-Agent header, and they have a working support email, so after five months of communication in a support ticket I discovered the reason.

I fixed the Google App Engine behaviour by adding an explicit configuration to appengine-web.xml:

  <static-files>
    <include path="/**">
      <http-header name="Vary" value="Accept-Encoding"/>
    </include>
    <exclude path="/**.jsp"/>
  </static-files>

I also made a small modification to robots.txt, to be sure its ETag changes.
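Why changing the file helps: a conditional request sends the cached ETag in If-None-Match, and the server answers 304 as long as the ETag still matches. Below is a hypothetical server sketch, assuming (as many servers do, though I don't know App Engine's exact scheme) that the ETag is derived from a hash of the content; for simplicity it compares a single ETag value rather than a list:

```javascript
const crypto = require('crypto');

// Hypothetical content-based ETag: any change to the bytes changes the tag.
function etagFor(content) {
  const hash = crypto.createHash('sha256').update(content).digest('hex');
  return '"' + hash.slice(0, 16) + '"';
}

// Answer a GET that may carry the ETag of the client's cached copy.
function respond(content, ifNoneMatch) {
  const etag = etagFor(content);
  if (ifNoneMatch === etag) {
    return { status: 304, etag };              // "Not Modified": keep the cached body
  }
  return { status: 200, etag, body: content }; // send a fresh body and tag
}

const oldRobots = 'User-agent: *\nDisallow: /path\n';
const first = respond(oldRobots, undefined);        // crawler caches body and ETag
console.log(respond(oldRobots, first.etag).status); // 304

const newRobots = oldRobots + '# touched to change the ETag\n';
console.log(respond(newRobots, first.etag).status); // 200
```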


Saturday, 1 May 2021

Stackoverflow cookies

Stackoverflow and related sites repeatedly display their cookie confirmation dialog.


So many times I had to press "Customize settings" and then select "Confirm my choices" there.

 

This time I mistakenly pressed "Accept all cookies".

How do I undo that? Why do they show it every time? Will they keep showing it, or does it only annoy you until you press "Accept all cookies"?


Monday, 25 March 2019

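- What was the elephant doing when Napoleon came?
- Eating fish soup. (In Russian, "уху ел", "was eating ukha", sounds like a vulgar slang word for "was dumbfounded".)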

Friday, 21 July 2017

Copy-typo (копичатка)

A copy-typo (копичатка, from "copy" + "опечатка", Russian for "typo") is a small mistake made as a result of copy-paste. For example, forgetting to replace everything that needed replacing.

You copied a line of code. You replaced the variable passed as an argument with the right one, but forgot to replace the object on which the method is called.
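A made-up illustration of the kind of copy-typo described (all names here are hypothetical):

```javascript
// Two similar objects that get configured with similar lines of code.
const left = { width: 0, setWidth(w) { this.width = w; } };
const right = { width: 0, setWidth(w) { this.width = w; } };
const leftWidth = 100;
const rightWidth = 200;

left.setWidth(leftWidth);
left.setWidth(rightWidth); // copy-typo: the argument was updated, the receiver was not
// Intended: right.setWidth(rightWidth);

console.log(left.width, right.width); // 200 0
```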

Monday, 26 September 2016

Partitioned, Available and Consistent?

A database storing a bank account with a balance of $100 is replicated to two nodes. A network partition happens between the nodes. I am withdrawing $20 in cash from an ATM that has access to the replica on one side of the partition. At the same moment I'm charged $10 for hosting, and the payment gateway has access to the replica on the other side of the partition.

That's OK: each DB node processes withdrawals as long as their total does not exceed half of the balance at the time of the last synchronization. (Each node can spend up to $50.)

Does it qualify as simultaneous availability, consistency and partition tolerance?

Account log says, depending on the node you're connected to, either:

4. final balance: between $30 and $80
3. spent $20
2. network partition 50 / 50
1. initial balance: $100

or

4. final balance: between $40 and $90
3. spent $10
2. network partition 50 / 50
1. initial balance: $100
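A minimal sketch of the scheme as I read it from the post: at the last synchronization each of the two replicas receives half of the balance as its own spending allowance. Replica and its methods are hypothetical names, and refusing an over-limit withdrawal is just one possible policy (the post doesn't say what happens beyond $50):

```javascript
class Replica {
  constructor(balanceAtSync, replicaCount) {
    this.balanceAtSync = balanceAtSync;
    // Each replica may spend only its own share of the synchronized balance.
    this.allowance = balanceAtSync / replicaCount;
    this.spentLocally = 0;
  }

  // Approve a withdrawal only if it fits into this replica's remaining share.
  withdraw(amount) {
    if (this.spentLocally + amount > this.allowance) {
      return false;
    }
    this.spentLocally += amount;
    return true;
  }

  // During the partition a replica knows its own spending exactly; the other
  // replicas may have spent anything from 0 up to their combined allowance.
  balanceBounds() {
    const max = this.balanceAtSync - this.spentLocally;
    const othersAllowance = this.balanceAtSync - this.allowance;
    return [max - othersAllowance, max];
  }
}

const atmSide = new Replica(100, 2);
const gatewaySide = new Replica(100, 2);

atmSide.withdraw(20);     // true: $20 cash
gatewaySide.withdraw(10); // true: $10 hosting fee

console.log(atmSide.balanceBounds());     // [ 30, 80 ]
console.log(gatewaySide.balanceBounds()); // [ 40, 90 ]
```

Each replica stays available for small withdrawals and combined spending can never exceed the $100, though neither replica knows the exact balance until the partition heals.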

Thursday, 10 March 2016


The dashing troika races on,
The Stalinist troika


Friday, 2 October 2015

International politics news


(I suggest watching 40 seconds starting from this point.)

Saturday, 5 September 2015

Imperative to Functional

How can any imperative code be automatically transformed into functional code?

Think for a moment before reading the answer.

Copy all the data before invoking the imperative code.

That's not a joke. There are techniques (e.g. copy-on-write) and data structures that can make it efficient. I have no time to write down all the analogies that come to my mind now. What I want to say is that in many important respects the imperative and functional approaches are not that different. We can think of an imperative assignment to a variable as a pure function that computes a new world in which this variable has the new value.
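A minimal JavaScript sketch of the trick, assuming the state can be copied with structuredClone (available in Node.js 17+ and modern browsers). purify, depositTen and assign are hypothetical names. This version copies eagerly; copy-on-write or persistent data structures would avoid copying everything:

```javascript
// Wrap an imperative (in-place) procedure into a pure function:
// run it on a deep copy, so the caller's data is never modified.
function purify(imperativeProc) {
  return function pure(state) {
    const copy = structuredClone(state);
    imperativeProc(copy); // mutations happen only on the private copy
    return copy;          // the mutated copy is the function's result
  };
}

// An ordinary imperative procedure that mutates its argument.
function depositTen(account) {
  account.balance += 10;
  account.history.push('deposit 10');
}

const depositTenPure = purify(depositTen);
const before = { balance: 100, history: [] };
const after = depositTenPure(before);
console.log(before.balance, after.balance); // 100 110

// Assignment "x = value" seen as a pure function from an old world to a new one.
function assign(world, name, value) {
  return { ...world, [name]: value }; // shallow copy; the old world is untouched
}

const w1 = { x: 1 };
const w2 = assign(w1, 'x', 2);
console.log(w1.x, w2.x); // 1 2
```

The result is only pure if the procedure touches nothing but its argument (no I/O, no globals).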

Wednesday, 12 August 2015

Javascript: LexicalEnvironment vs VariableEnvironment

A JavaScript feature I learned about while working on POCL.

Consider:

function foo() {
    try {
        throw 'hello'
    } catch (e) {
        return function inner() {return e}
    }
}

foo()()
=> 'hello'

function foo2() {
    try {
        throw 'hello'
    } catch (e) {
        function inner() {return e}
        return inner;
    }
}

foo2()()
Uncaught ReferenceError: e is not defined

Why do the results differ?

In the first case INNER is a function expression, while in the second case INNER is a function declaration.

According to the standard (ECMAScript 5.1):

A JavaScript execution context has two Lexical Environments. One is called LexicalEnvironment (he-he), and the other VariableEnvironment [§10.3].

The CATCH clause introduces a new LexicalEnvironment in which E is defined, and leaves the VariableEnvironment untouched.

A function expression receives the current LexicalEnvironment as its scope, while a function declaration receives the current VariableEnvironment as its scope [§13]; therefore E is invisible to the function declaration.

Why is everything so complicated? Why do two Lexical Environments exist?

My guess is that it's because we can call a function before its declaration:

(function () {
    return inner();
    function inner() {return 1};
})()

=> 1

(function () {
    var x = inner();
    return x;
    function inner() {return 1};
})()

=> 1

(function () {
    var x = inner();
    try {
       throw 'error'
    } catch (unused) {
       function inner() {return 1};
    };
    return x;
})()

=> 1

To support this behavior, the variable INNER is created and bound to the function object early, when control enters the surrounding function, before any CATCH is executed; therefore the function doesn't see the variable introduced by the CATCH clause.

Unlike normal variables, which are visible throughout the whole enclosing function, the variable introduced by CATCH is only visible inside the CATCH clause:

(function() {
    var x = 1;
    console.log(x);
    try {
        throw 2
    } catch (x) {
        console.log(x)
    }
    console.log(x)
})()

// log output:
1
2
1


That's why (I guess) JavaScript needs to distinguish LexicalEnvironment and VariableEnvironment.

In years of JS programming I never hit this difference in practice: I never captured a catch variable in a closure. I only noticed this strange thing when reading the standard in order to implement some code transformations needed for POCL, and wondered why it needs to distinguish LexicalEnvironment from VariableEnvironment.

BTW, another case where names are bound within a scope smaller than the enclosing function is the WITH statement. So it has the same problem regarding function expressions vs function declarations:

var o = {x: 1, y: 2};
(function() {with (o) {return function inner() {return x}}})()()
=> 1

(function() {with (o) {function inner() {return x}; return inner}})()()
Uncaught ReferenceError: x is not defined
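The results above are ES5-era. In ECMAScript 5.1, function declarations inside blocks were not standardized at all, and engines of the time simply hoisted them to the function level. ES2015 standardized block-level function declarations: the function object is created when the block is entered, with the block's environment as its scope. As far as I understand the current spec, including the sloppy-mode rules of Annex B.3.3, modern engines therefore return 'hello' from foo2()() (and 1 from the WITH example). Calling inner() before its CATCH block, as in the third example, now fails instead. A small check to run in your own engine:

```javascript
// Same as foo2 above: a function *declaration* inside a catch block.
function foo2() {
  try {
    throw 'hello';
  } catch (e) {
    // Since ES2015 this function object is created when the block is entered,
    // with the block's environment (which contains e) as its scope.
    function inner() { return e; }
    return inner;
  }
}

// Like the third example: inner is called before its block has been entered.
function callBeforeCatch() {
  var x;
  try {
    x = inner();
  } catch (err) {
    // Sloppy mode (Annex B): the function-level inner is still undefined -> TypeError.
    // Strict mode: no function-level inner exists at all -> ReferenceError.
    return err.name;
  }
  try {
    throw 'error';
  } catch (unused) {
    function inner() { return 1; }
  }
  return x;
}

console.log(foo2()());          // hello (an ES5-era engine gave a ReferenceError)
console.log(callBeforeCatch()); // TypeError in sloppy mode, ReferenceError in strict mode
```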

