Web Protocol Idea
Fri Nov 27 10:41:31 AM UTC 2020
Introduction
I have been reading about and exploring the WWW a lot recently, and learning Erlang gave me ideas for server/client applications to write as soon as I am done with the learning phase. I have been thinking about this project for quite a while, so I thought I should write a blog post to lay out all my design ideas.
I will try not to get too ambitious and limit myself to what I can probably do. The point of this project is to learn more about network (especially TCP) protocols by making my own, just as I learned how CPUs and VMs work with the VICERA project.
My goal is to make this protocol simple, as easy to understand as it is to parse in any programming language. As a reference limitation, it should be parseable from a shell script; that is an extreme limitation, but it will help me make something good out of it.
Everything I say here is just design ideas that I wanted to speak out. I have not planned to build this project yet, and these design ideas may never become a real thing.
This blog will go through 3 parts: the first is all about the protocol itself, the second is all about the language used to format pages, and the last is about some standards I thought about and the possible implementations of this protocol.
Have a good read!
Web Protocol Design
KISS: This protocol will be dead simple to parse as well as dead simple to read. The header will fit on a single line (or maybe two, we will see later, as the design is not completely clear yet). The current format is:
-- Request --
PROTOCOL HOSTNAME URL KEYWORDS...
[ content... ]
-- Response --
PROTOCOL/VERSION STATUS HOSTNAME URL KEYWORDS...
[ content... ]
PROTOCOL/VERSION describes the name of the protocol and the version. This can help in case the browser is outdated and does not support the latest protocol.
STATUS is, like in HTTP, an integer defining the current status. However, to simplify everything, the status will not be followed by a message as it is in HTTP (for example 200 OK or 502 Bad Gateway). I will also try to reduce the number of error codes as much as I can.
To provide a convenient way to host multiple subdomains without needing a different IP address for each subdomain (or using some proxy software or whatever), I have included HOSTNAME, which specifies the hostname the client would like to send its request to.
URL is, like in HTTP, the path to the file you want to request.
KEYWORDS... are specifications of the request, letting the client know what it is currently receiving. See the example below.
For example, the client requests a page at test.h3liu.ml/test.bin. The server will have to tell the client that test.bin is not a webpage but a compressed octet stream. The response will then look like this:
NOTE In this example we will assume that the status code equivalent to HTTP's "OK" is 200.
PROTOCOL/1.0 200 test.h3liu.ml /test.bin binary compressed
[ Yum data... ]
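Since the whole header is one line of blank-separated fields, parsing it takes very little code. Here is a minimal Python sketch of that idea, not a reference implementation: the `ResponseHeader` class and `parse_response_header` function are names made up for this illustration, and it assumes the fields never contain blanks themselves.

```python
from dataclasses import dataclass


@dataclass
class ResponseHeader:
    # Hypothetical container; the field names simply mirror the format above.
    protocol: str
    version: str
    status: int
    hostname: str
    url: str
    keywords: list


def parse_response_header(line: str) -> ResponseHeader:
    """Split a one-line response header into its blank-separated fields."""
    fields = line.split()
    if len(fields) < 4:
        raise ValueError("expected PROTOCOL/VERSION STATUS HOSTNAME URL")
    # "PROTOCOL/1.0" -> ("PROTOCOL", "1.0")
    protocol, _, version = fields[0].partition("/")
    return ResponseHeader(
        protocol=protocol,
        version=version,
        status=int(fields[1]),
        hostname=fields[2],
        url=fields[3],
        keywords=fields[4:],  # everything after the URL is a keyword
    )


header = parse_response_header(
    "PROTOCOL/1.0 200 test.h3liu.ml /test.bin binary compressed"
)
print(header.status, header.keywords)  # 200 ['binary', 'compressed']
```

The same split-on-blanks approach is what makes the format workable from a shell script.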
For now I have only thought about a few keywords, which are
binary page text compressed api post get tui
All unknown keywords must be ignored. If keywords don't provide enough information about the request/response, abort.
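A receiver could apply these two rules roughly as in the Python sketch below. What counts as "enough information" is not defined yet, so this sketch assumes (and it is only an assumption) that at least one of binary, page or text must be present. The names `KNOWN_KEYWORDS`, `CONTENT_KEYWORDS` and `interpret_keywords` are invented for the example.

```python
# The keyword list as it currently stands in this post.
KNOWN_KEYWORDS = {"binary", "page", "text", "compressed", "api", "post", "get", "tui"}

# Assumption: one of these is needed to know how to read the content.
CONTENT_KEYWORDS = {"binary", "page", "text"}


def interpret_keywords(keywords):
    """Drop unknown keywords, then abort if nothing describes the content."""
    known = [k for k in keywords if k in KNOWN_KEYWORDS]
    if not any(k in CONTENT_KEYWORDS for k in known):
        raise ValueError("keywords do not describe the content, aborting")
    return known


print(interpret_keywords(["binary", "compressed", "shiny-new-thing"]))
```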
API system: HTTP APIs are widely used and I find them very convenient for making all kinds of online applications. First off, I am getting rid of this kind of ugly format: /url/page?blah=blahblah. All the data will be specified inside the request using plain text or using the markup language designed for this protocol (we will talk about it later).
For instance, let's say we have made an awesome search engine using this new protocol. A person could send a search query like this:
PROTOCOL awesomesearch.com /search api get page
[[ "Cheap VPS" ] search ] body
And then the search engine returns what you asked for:
PROTOCOL/1.0 awesomesearch.com /search api page
[
[ Result 1 data... ] result
[ Result 2 data... ] result
...
] body
By the way, api specifies that it's an API and page specifies the markup language. You could, for instance, make a plain text API and specify it like this:
PROTOCOL/1.0 awesomesearch.com /search api text
Result 1 data
Result 2 data
...
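On the client side, building a request like the search query shown earlier is mostly string joining. The Python sketch below is only an illustration: `build_request` is a made-up helper, and it assumes the header line and the content are separated by a single newline, which the examples suggest but the design does not fix yet.

```python
def build_request(hostname, url, keywords, body=""):
    """Assemble a request: one header line, then the optional content."""
    header = " ".join(["PROTOCOL", hostname, url, *keywords])
    # Assumption: a single "\n" separates the header from the content.
    return header + "\n" + body


request = build_request(
    "awesomesearch.com",
    "/search",
    ["api", "get", "page"],
    '[[ "Cheap VPS" ] search ] body',
)
print(request)
```

Because the query lives in the content instead of the URL, there is no query-string escaping to worry about in the header.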
Pages: What would a web protocol be without any web pages? Nothing much. For pages, as for the API, we provide a markup language, a stylesheet and a simple preprocessor in one language. Like in HTML, a page will contain a head and a body: the head will contain metadata and the body will contain the content ready to be displayed.
Here is an example response for a web page:
PROTOCOL/1.0 h3liu.ml /index page
[
[ "Welcome!" ] title
[ "Welcome to my website" ] description
] head
[
[ "Hiya!" ] big-text
[ "Welcome to my awesome page" ] paragraph
( Example of a preprocessor )
[ [ "This browser is TUI" ] paragraph ] if-tui
] body
The preprocessor will be simple and Turing-incomplete, allowing some basic conditional logic and the like. Client-side scripting is not going to be a thing, so no JavaScript or anything similar. A client could still implement client-side scripting on top of the language, as it is totally possible to do so, but the result would be a non-standard client.
I won't prevent anyone from non-standard use of the protocol or the language, as people can do whatever they want with it. It is just not guaranteed that a page will work on every client.
File transfer: There are keywords for that: image and binary (for now).
To reduce download time, the keyword compressed will probably be a thing. This keyword tells the client that the current response is gzip-compressed.
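On the client, honoring that keyword could be as small as the following Python sketch. `decode_body` is a hypothetical helper name, and the sketch assumes the whole content (and only the content, not the header line) is one gzip stream.

```python
import gzip


def decode_body(keywords, body: bytes) -> bytes:
    """Return the response content, gunzipped when 'compressed' is set."""
    if "compressed" in keywords:
        return gzip.decompress(body)
    return body


# Simulate what a server would send for a compressed binary file.
wire_body = gzip.compress(b"Yum data...")
print(decode_body(["binary", "compressed"], wire_body))  # b'Yum data...'
```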
Language Design
As I have said earlier, the protocol comes with a language that works as API data structure, markup language, stylesheet and preprocessor.
You have probably noticed earlier that the preprocessor looks like a stack-based language due to its reverse Polish notation. Well, this early design of the language is inspired by Forth and Adobe's PostScript. Everything will be stack-based, as I find it convenient in this case.
My goal is to make one language that fills the purpose of 3 different languages used with HTTP (PHP, HTML and CSS).
At first I thought about making it a Lisp dialect but eh, it turns out it's better as a stack-based Forth-like language (everything is separated by blanks!).
The language has different data types: Datapacks, words, numbers and strings.
Datapacks are written between [ and ]; I took this from the FALSE esoteric programming language. A datapack can contain data as well as code (like a lambda). A word can then just pop a datapack from the stack to get what it needs.
Example : [ "Hello" [ ", World!" ] bold ] body
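To get a feel for how little machinery these four data types need, here is a rough Python sketch of a reader that turns source text into nested lists. It is a guess at one workable approach, not the design itself. Several things are assumptions of the sketch: strings have no escape sequences, only integers count as numbers, text in ( parentheses ) is a comment as in the page example, and the tagged tuples such as `("word", "bold")` are just a representation picked for the illustration.

```python
import re

TOKEN = re.compile(r'''
    "(?P<string>[^"]*)"       # string literal (no escapes: an assumption)
  | (?P<open>\[)              # start of a datapack
  | (?P<close>\])             # end of a datapack
  | \((?P<comment>[^)]*)\)    # ( comment ), skipped by the reader
  | (?P<atom>[^\s\[\]]+)      # anything else: a number or a word
''', re.VERBOSE)


def parse(source):
    """Parse source text; datapacks become Python lists, the rest tagged tuples."""
    open_packs = [[]]  # open_packs[0] is the top-level program
    for match in TOKEN.finditer(source):
        kind = match.lastgroup
        if kind == "comment":
            continue
        if kind == "open":
            open_packs.append([])
        elif kind == "close":
            if len(open_packs) == 1:
                raise SyntaxError("unexpected ]")
            finished = open_packs.pop()
            open_packs[-1].append(finished)
        elif kind == "string":
            open_packs[-1].append(("string", match.group("string")))
        else:
            text = match.group("atom")
            try:
                open_packs[-1].append(("number", int(text)))
            except ValueError:
                open_packs[-1].append(("word", text))
    if len(open_packs) != 1:
        raise SyntaxError("missing ]")
    return open_packs[0]


tree = parse('[ "Hello" [ ", World!" ] bold ] body')
print(tree)
```

For the example above, the result is one datapack followed by the word body, and the inner datapack sits right before the word bold that will consume it.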
Words are, like in Forth, definitions. There will be a few primitives included in the standard, but the rest will be defined using the language.
Words can contain any type of data except words, which can be handy for repetitive tasks. For instance, let's say I have to make a site that says "Hello, (name)" for multiple names, I would do:
[ "Hello, " pop cr ] "hello-name" define
[
"John" hello-name
"Matthilde" hello-name
...
] body
pop is one of the primitives, allowing you to pop something from the stack inside a datapack. This can be handy for passing arguments to user-defined words.
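The exact semantics of define and pop are not settled, so the Python sketch below shows only one possible reading. It assumes that a user-defined word runs its datapack on a fresh local stack, that pop moves one value from the caller's stack onto that local stack, that cr pushes a newline, and that the local stack is appended to the caller's stack when the word finishes. The program is written directly as Python data (datapacks as lists, everything else as tagged tuples), which is a representation chosen for this example only.

```python
def run(items, stack, words, outer=None):
    """Run `items` against `stack`; `words` maps names to datapacks."""
    for item in items:
        if isinstance(item, list):       # datapack: pushed without running it
            stack.append(item)
            continue
        kind, value = item
        if kind != "word":               # string or number literal
            stack.append(value)
        elif value == "define":          # [ body ] "name" define
            name = stack.pop()
            words[name] = stack.pop()
        elif value == "pop":             # fetch one argument from the caller
            if not outer:
                raise IndexError("pop: nothing on the caller's stack")
            stack.append(outer.pop())
        elif value == "cr":              # assumed to mean "newline"
            stack.append("\n")
        elif value in words:             # user-defined word
            local = []
            run(words[value], local, words, outer=stack)
            stack.extend(local)
        # any other word is unknown and ignored in this sketch
    return stack


program = [
    [("string", "Hello, "), ("word", "pop"), ("word", "cr")],
    ("string", "hello-name"), ("word", "define"),
    ("string", "John"), ("word", "hello-name"),
    ("string", "Matthilde"), ("word", "hello-name"),
]
result = run(program, [], {})
print("".join(map(str, result)), end="")
```

Under these assumptions the program leaves "Hello, John" and "Hello, Matthilde" on the stack, each followed by a newline.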
Numbers are going to be handy for a few preprocessor words and for the stylesheet. You can also use them in pages for display, but I recommend using a string instead if no arithmetic is required. For instance, let's say the word code is a formatting word for pages that displays code in a monospace font. It would look like this:
[ 12 "monospace" font ] "code" style-def
Strings are just strings, handy for putting content in pages or for defining a word.
Uses for page formatting
As I said earlier, this language is an all-in-one language for markup, stylesheet and preprocessor. This part will cover the use for markup and page formatting.
All pages are divided into two parts, head and body. The head is not required but is recommended, as it provides metadata to the client such as the title or the description of the page. The client must receive a program containing only "style words" and stylesheet. Preprocessing words and unknown words should be ignored at this point.
Here are some words I thought about for now:
bold italic underline title description meta
link code h1 h2 h3
Uses for stylesheet
The stylesheet will allow us to make our page look more beautiful. The stylesheet system should be able to provide a maximum of cross-compatibility between GUI and TUI clients, providing the same comfort on a GUI as well as on a TUI application because terminals are cool.
As mentioned before, the principal word will be style-def, which lets you provide a style definition. The language will also provide other words to tell the client how to format a style word (obviously). Here are a few:
font decoration margin padding
Some words will obviously be ignored by TUI clients but it shouldn't change much from GUI.
Preprocessing
Preprocessing runs server-side and outputs static markup and stylesheet. It will allow finite loops and if conditions. The preprocessor will be designed to be Turing-incomplete for the simple reason that it should never get stuck in an infinite loop.
Not much is clear yet, so I can't say more about it.
Implementation and Standards
The server side will probably be written in Erlang, as this language is really good at server-side work and because Erlang is a cool programming language.
The client-side is not planned yet.
There will also be some standards regarding the use of the protocol, so that client authors are not confused about what to implement or not. They will also help webmasters know what to do and what not to do. Here are a few I thought about.
- The client should be able to handle a few primitives and pre-defined words specified by this standard.
- The webmaster should maximize cross-compatibility between TUI and GUI clients as much as they can and minimize the rendering differences between the two.
- The server should handle the whole preprocessor. No garbage preprocessor words should be left in the response sent to the client; the client should only receive static content.
- NO CLIENT-SIDE SCRIPTING. It is not required at all.
- Try to avoid providing heavy pages, keep it simple and lightweight.
Conclusion
These are a few ideas I have for a protocol project. I thought about it because I find the current web kinda complicated when it could be kept simple and lightweight; pages are way too heavy nowadays.
I don't want to replace the current web, just wanna make this project for fun.
If I ever get to implement this, the project will probably be licensed under a FOSS license, as will the standards, and pull requests will be open.
Anyways that's all I had to say, see you for another blog!