
HTTP 완벽 가이드 (O'Reilly - HTTP: The Definitive Guide Part 1 - 5)

daylee de vel 2021. 6. 27. 23:03

This post summarizes the key points of HTTP: The Definitive Guide.

Index
• Chapter 1 is a rapid-paced overview of HTTP.
HTTP, Clients and Servers
Resources: MIME, URIs, URLs, URNs
Transactions
Messages
Connections
Architectural Components of the Web: Proxies, Caches, Gateways, Tunnels, Agents

• Chapter 2 details the formats of uniform resource locators (URLs) and the various types of resources that URLs name across the Internet. It also outlines the evolution to uniform resource names (URNs).
URL, URI, URN
URL Syntax
Relative URLs
Expandomatic URLs
Shady Characters (encoding)
A Sea of Schemes
The Future

• Chapter 3 details how HTTP messages transport web content.
The Flow of Messages
The Parts of a Message
Start Lines
Headers
Entity Bodies
Start Lines: Methods, Status Codes
Details of Headers

• Chapter 4 covers HTTP connection management.
TCP Connections
TCP Performance Considerations
HTTP Connection Handling


Part I: HTTP: The Web's Foundation

The world's web browsers, servers, and related web applications all talk to each other through HTTP, the Hypertext Transfer Protocol. HTTP is the common language of the modern global Internet. Because HTTP uses reliable data-transmission protocols, it guarantees that your data will not be damaged or scrambled in transit.

HTTP clients and HTTP servers make up the basic components of the World Wide Web. The most common client is a web browser.

Browsers, servers, and web applications communicate using HTTP, a protocol built on reliable data transmission. Clients and servers are the basic building blocks of the World Wide Web (WWW).

Resources

Web servers host web resources. A web resource is the source of web content. The simplest kind of web resource is a static file on the web server's filesystem, such as an HTML file or a JPEG image file. However, resources can also be software programs that generate content on demand. They can show you a live image from a camera, or let you trade stocks, search real estate databases, or buy gifts from online stores.

In summary, a resource is any kind of content source. A file containing your company's sales forecast spreadsheet is a resource. A web gateway to scan your local public library's shelves is a resource. An Internet search engine is a resource.

A resource is anything that serves as a source of web content: a static file, a program that generates data on demand (stock trading, e-commerce, and so on).

MIME (Multipurpose Internet Mail Extensions)

A MIME type is a textual label, represented as a primary object type and a specific subtype, separated by a slash. For example: • An HTML-formatted text document would be labeled with type text/html. • A JPEG version of an image would be image/jpeg.
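The type/subtype labels above can be checked quickly with Python's standard `mimetypes` module (a convenience for illustration; the filenames are the book's examples):

```python
import mimetypes

# Guess the MIME label (primary type "/" subtype) from a filename --
# the same label a web server would send in the Content-Type header.
for name in ("index.html", "photo.jpeg"):
    mime_type, _encoding = mimetypes.guess_type(name)
    print(name, "->", mime_type)
# index.html -> text/html
# photo.jpeg -> image/jpeg
```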

URIs (Uniform Resource Identifiers)

URIs are like the postal addresses of the Internet, uniquely identifying and locating information resources around the world. For example: http://www.joes-hardware.com/specials/saw-blade.gif shows how the URI specifies the HTTP protocol to access the saw-blade GIF resource on Joe's store's server.

URIs come in two flavors, called URLs and URNs.

A URI is like a postal address for the Internet: it identifies a resource and tells you where to find it.

Figure 1-4. URLs specify protocol, server, and local resource

URLs (Uniform Resource Locators)

URLs describe the specific location of a resource on a particular server. They tell you exactly how to fetch a resource from a precise, fixed location.

Most URLs follow a standardized format of three main parts:
• The first part of the URL is called the scheme, and it describes the protocol used to access the resource. This is usually the HTTP protocol (http://).
• The second part gives the server's Internet address (e.g., www.joes-hardware.com).
• The rest names a resource on the web server (e.g., /specials/saw-blade.gif).

A URL describes the location of a specific resource. It follows a fixed pattern: a scheme naming the protocol, the server's Internet address, and a path identifying the resource on the web server.
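The three parts can be pulled apart with Python's standard `urllib.parse` (a sketch using the book's saw-blade URL):

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.joes-hardware.com/specials/saw-blade.gif")
print(parts.scheme)  # http  (the protocol)
print(parts.netloc)  # www.joes-hardware.com  (the server address)
print(parts.path)    # /specials/saw-blade.gif  (the resource path)
```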

URNs (Uniform Resource Names)

A URN serves as a unique name for a particular piece of content, independent of where the resource currently resides. These location-independent URNs allow resources to move from place to place.

For example, the following URN might be used to name the Internet standards document "RFC 2141" regardless of where it resides (it may even be copied in several places): urn:ietf:rfc:2141

URNs are still experimental and not yet widely adopted. To work effectively, URNs need a supporting infrastructure to resolve resource locations; the lack of such an infrastructure has also slowed their adoption. But URNs do hold some exciting promise for the future.

A URN is a unique name with no location constraint, but resolving a URN to a resource's location requires special supporting infrastructure.

Transactions

An HTTP transaction consists of a request command (sent from client to server) and a response result (sent from the server back to the client). This communication happens with formatted blocks of data called HTTP messages.

Every HTTP response message comes back with a status code. HTTP also sends an explanatory textual "reason phrase."

A "web page" often is a collection of resources, not a single resource.

An HTTP transaction is carried out with HTTP messages and consists of a request and a response. Every response includes a status code. A web page is usually a collection of many resources.

HTTP Messages

HTTP messages have a simple, line-oriented text structure.

HTTP messages consist of three parts:

Start line: indicates what to do for a request or what happened for a response.

Headers: each header field consists of a name and a value, separated by a colon (:) for easy parsing.

Body: contains any kind of data. Request bodies carry data to the web server; response bodies carry data back to the client. The body can contain arbitrary binary data (e.g., images, videos, audio tracks, software applications). Of course, the body can also contain text.

An HTTP message is text, made up of a start line, headers, and a body. The body may carry binary data, text, and so on.
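A minimal sketch, assembling such a request message by hand in Python (the target resource and Host value reuse the book's Joe's Hardware example):

```python
# Start line, then headers, then a blank line (bare CRLF), then the body.
CRLF = "\r\n"
start_line = "GET /specials/saw-blade.gif HTTP/1.1"
headers = ["Host: www.joes-hardware.com", "Accept: */*"]
body = ""  # GET requests usually carry no entity body

message = start_line + CRLF + CRLF.join(headers) + CRLF + CRLF + body
print(repr(message))
```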

Connections

This section describes how messages move from place to place, across Transmission Control Protocol (TCP) connections.

TCP/IP

HTTP is an application-layer protocol; it leaves the details of networking to TCP/IP, the popular, reliable Internet transport protocol.

The Internet itself is based on TCP/IP, a popular layered set of packet-switched network protocols spoken by computers and network devices around the world. Once a TCP connection is established, messages exchanged between the client and server computers will never be lost, damaged, or received out of order.

In networking terms, the HTTP protocol is layered over TCP. HTTP uses TCP to transport its message data. Likewise, TCP is layered over IP

The Internet runs on TCP/IP, a packet-switched network protocol suite. HTTP uses TCP to transport its message data.

Connections, IP Addresses, and Port Numbers

Before an HTTP client can send a message over TCP, it needs the server computer's IP address and the TCP port number of the software program running on the server. Both pieces of information can be found in the URL.

Before an HTTP client can send a message to a server, it needs to establish a TCP/IP connection between the client and server using Internet protocol (IP) addresses and port numbers.

In TCP, you need the IP address of the server computer and the TCP port number associated with the specific software program running on the server.

URLs are the addresses for resources, so naturally enough they can provide us with the IP address for the machine that has the resource. Let's take a look at a few URLs:

http://207.200.83.29:80/index.html - The first URL has the machine's IP address, "207.200.83.29", and port number, "80".

http://www.netscape.com/index.html - The second URL has a textual domain name, or hostname ("www.netscape.com"). Hostnames can easily be converted into IP addresses through a facility called the Domain Name Service (DNS).

Here are the steps:
URL hostname → IP address/port number → TCP connection → HTTP transaction → close and display the document
(a) The browser extracts the server's hostname from the URL.
(b) The browser converts the server's hostname into the server's IP address.
(c) The browser extracts the port number (if any) from the URL.
(d) The browser establishes a TCP connection with the web server.
(e) The browser sends an HTTP request message to the server.
(f) The server sends an HTTP response back to the browser.
(g) The connection is closed, and the browser displays the document
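Steps (a) and (c) can be sketched with `urllib.parse` (the helper name `host_and_port` is mine, not the book's):

```python
from urllib.parse import urlsplit

def host_and_port(url):
    """Extract the hostname and port; default to port 80 for HTTP."""
    parts = urlsplit(url)
    return parts.hostname, parts.port or 80

print(host_and_port("http://207.200.83.29:80/index.html"))  # ('207.200.83.29', 80)
print(host_and_port("http://www.netscape.com/index.html"))  # ('www.netscape.com', 80)
```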

You can use the Telnet utility to talk directly to web servers. Telnet lets you open a TCP connection to a port on a machine and type characters directly into the port. The web server treats you as a web client, and any data sent back on the TCP connection is displayed onscreen.
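The same raw conversation can be scripted with Python sockets. The sketch below stands up a throwaway local server (the handler class and its "hello" body are invented for the demo) and types an HTTP request straight into the TCP connection, Telnet-style:

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
threading.Thread(target=server.handle_request, daemon=True).start()

# Open a TCP connection and type the request in, character by character.
conn = socket.create_connection(server.server_address)
conn.sendall(b"GET /index.html HTTP/1.0\r\nHost: 127.0.0.1\r\n\r\n")
reply = b""
while chunk := conn.recv(4096):
    reply += chunk
conn.close()
print(reply.decode().splitlines()[0])  # HTTP/1.0 200 OK
```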

Current protocol version: HTTP/3

Architectural Components of the Web: Proxies, Caches, Gateways, Tunnels, Agents

In this overview chapter, we've focused on how two web applications (web browsers and web servers) send messages back and forth to implement basic transactions. There are many other web applications that you interact with on the Internet.

Proxies HTTP intermediaries that sit between clients and servers. Proxies are often used for security, acting as trusted intermediaries through which all web traffic flows. Proxies can also filter requests and responses; for example, to detect application viruses in corporate downloads or to filter adult content away from elementary-school students.

Caches (A web cache or caching proxy) HTTP storehouses that keep local copies of popular documents close to clients to improve performance.

Gateways Special web servers that act as intermediaries for other servers. They are often used to convert HTTP traffic to another protocol. A gateway always receives requests as if it was the origin server for the resource. The client may not be aware it is communicating with a gateway.

For example, an HTTP/FTP gateway receives requests for FTP URIs via HTTP requests but fetches the documents using the FTP protocol (see Figure 1-13). The resulting document is packed into an HTTP message and sent to the client.

Tunnels Special proxies that blindly forward HTTP communications. One popular use of HTTP tunnels is to carry encrypted Secure Sockets Layer (SSL) traffic through an HTTP connection, allowing SSL traffic through corporate firewalls that permit only web traffic.

Agents Semi-intelligent web clients that make automated HTTP requests. So far, we've talked about only one kind of HTTP agent: web browsers. Machine-automated user agents such as "web robots" and search-engine "spiders" are also agents, fetching web pages around the world.

Chapter 2. URLs and Resources p.25

Uniform resource locators (URLs) are the standardized names for the Internet's resources. URLs have provided a means for applications to be aware of how to access a resource.

URLs are a subset of a more general class of resource identifier called a uniform resource identifier, or URI. URLs identify resources by describing where resources are located, whereas URNs identify resources by name, regardless of where they currently reside.

A URL tells an application how to access a resource. A URI is the broader concept that includes both URLs and URNs.

URLs follow a "scheme://server location/path" structure.

1) Scheme: the http part. Tells a web client how to access the resource, i.e., which protocol to use to reach the server (e.g., the HTTP protocol).

2) Server location/host: where the resource is hosted (e.g., www.joes-hardware.com).

3) Resource path: which particular local resource on the server is requested (e.g., /seasonal/index-fall.html).

URL Syntax

Resources can be accessed through different schemes (e.g., HTTP, FTP, SMTP).

Most URL schemes base their URL syntax on this nine-part general format: <scheme>://<user>:<password>@<host>:<port>/<path>;<params>?<query>#<frag>

Parameters:

Used to specify input parameters, separated by ';'.

This component is just a list of name/value pairs in the URL, separated from the rest of the URL (and from each other) by ";" characters.

http://www.joes-hardware.com/hammers;sale=false/index.html;graphics=true

In this example there are two path segments, hammers and index.html. The hammers path segment has the param sale, and its value is false. The index.html segment has the param graphics, and its value is true.

Query Strings:

Used to pass parameters to active applications (databases, boards, search engines, Internet gateways), **separated by '?'**.

Some resources, such as database services, can be asked questions or queries to narrow down the type of resource being requested.

http://www.joes-hardware.com/inventory-check.cgi?item=12731

What is new is everything to the right of the question mark (?). The query component of the URL is passed along to a gateway resource, with the path component of the URL identifying the gateway resource. Basically, gateways can be thought of as access points to other applications

many gateways expect the query string to be formatted as a series of "name=value" pairs, separated by "&" characters: http://www.joes-hardware.com/inventorycheck.cgi?item=12731&color=blue In this example, there are two name/value pairs in the query component: item=12731 and color=blue.

The query passes parameters along to an application (a database, board, search engine, Internet gateway, etc.).

The query component is passed along to the gateway resource, while the path component identifies the gateway resource itself. A gateway is an access point to another application.
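The name=value pairs to the right of the "?" can be decoded with `urllib.parse.parse_qs` (using the book's inventory-check URL):

```python
from urllib.parse import parse_qs, urlsplit

url = "http://www.joes-hardware.com/inventory-check.cgi?item=12731&color=blue"
query = urlsplit(url).query  # everything to the right of the "?"
print(parse_qs(query))       # {'item': ['12731'], 'color': ['blue']}
```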

Fragments '#'

To allow referencing of parts or fragments of a resource, URLs support a fragment component to identify pieces within a resource. A fragment is preceded by a # character. For example: http://www.joes-hardware.com/tools.html#drills

The URL fragment is used only by the client, because the server deals with entire objects.

A URL can be broken into as many as nine components: beyond the scheme, host, port, and path, there are also parameters, the query string, the fragment, and so on.

Relative URLs

Unlike an absolute URL (a full URL), a relative URL must be interpreted relative to another URL, called its base, to get all the information needed to access a resource.

Applications that process URLs (such as your browser) need to be able to convert between relative and absolute URLs.
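That conversion is exactly what `urllib.parse.urljoin` does (the tools/hammers paths below are made up in the spirit of the book's examples):

```python
from urllib.parse import urljoin

base = "http://www.joes-hardware.com/tools/index.html"

# A relative path is resolved against the base URL's directory.
print(urljoin(base, "hammers.html"))
# http://www.joes-hardware.com/tools/hammers.html

# A rooted path replaces the base URL's whole path.
print(urljoin(base, "/specials/saw-blade.gif"))
# http://www.joes-hardware.com/specials/saw-blade.gif
```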

Expandomatic URLs

Some browsers try to expand URLs automatically, either after you submit the URL or while you're typing. This provides users with a shortcut.

*1) Hostname expansion*

For example if you type "yahoo" in the address box, your browser can automatically insert "www." and ".com" onto the hostname, creating "www.yahoo.com".

*2) History expansion*

The browser suggests completions from URLs that you have visited in the past. Be aware that URL auto-expansion may behave differently when used with proxies.

Browsers can auto-complete URLs: 1) typing just a hostname can automatically add "www." and ".com", or 2) the browser suggests URLs from your history.

Shady Characters p.35

This section summarizes the universal alphabet and encoding rules for URLs.

URLs are permitted to contain only characters from a relatively small, universally safe alphabet.

In addition to wanting URLs to be transportable by all Internet protocols, designers wanted them to be readable by people. So invisible, nonprinting characters also are prohibited in URLs.

So, an escape mechanism was added, allowing unsafe characters to be encoded into safe characters for transport.

This section describes the characters URLs may contain and how they are encoded. URLs must be made of a safe alphabet so that they can be transported by any protocol and remain human-readable; invisible and nonprinting characters are therefore prohibited. An escape mechanism was devised so that even disallowed characters can be encoded safely.

The URL Character Set

Historically, many computer applications have used the US-ASCII character set. US-ASCII uses 7 bits to represent most keys available on an English typewriter. However, it doesn't support the inflected characters common in European languages or the hundreds of non-Roman languages.

Furthermore, some URLs may need to contain arbitrary binary data. Recognizing the need for completeness, the URL designers have incorporated escape sequences. Escape sequences allow the encoding of arbitrary character values or data using a restricted subset of the US-ASCII character set, yielding portability and completeness.

US-ASCII, the character set most computers have used, cannot represent other languages. When a URL needs to carry binary data or restricted characters, escape sequences make that possible.

Encoding Mechanisms

To get around the limitations of a safe character set representation, an encoding scheme was devised to represent characters in a URL that are not safe. The encoding simply represents the unsafe character by an "escape" notation, consisting of a percent sign (%) followed by two hexadecimal digits that represent the ASCII code of the character.

Encoding replaces a restricted character with its ASCII code in hex, e.g. '%' → '%25'.
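Percent-encoding in practice, via `urllib.parse` (the sample strings are mine):

```python
from urllib.parse import quote, unquote

# Unsafe characters become "%" plus two hex digits of their ASCII code.
print(quote("Joe's Hardware", safe=""))  # Joe%27s%20Hardware
print(unquote("100%25"))                 # 100%
```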

Character Restrictions

Several characters have been reserved to have special meaning inside of a URL. Others are not in the defined US-ASCII printable set. And still others are known to confuse some Internet gateways and protocols, so their use is discouraged.

A Bit More (? - confusing part)

For example, an application might use http://www.joes-hardware.com/~joe without encoding the "~" character. For some transport protocols this is not an issue, but it is still unwise for application developers not to encode unsafe characters.

Each component of the URL may have its own safe and unsafe characters, and which characters are safe is scheme-dependent, so only the application receiving the URL from the user is really in a position to determine what needs to be encoded.

Sometimes, malicious folks encode extra characters in an attempt to get around applications that are doing pattern matching on URLs—for example, web filtering applications. Encoding safe URL components can cause pattern-matching applications to fail to recognize the patterns for which they are searching. In general, applications interpreting URLs must decode the URLs before processing them.
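A toy illustration of why filters must decode before matching ("%63" is the hex ASCII code for "c"; the path is invented):

```python
from urllib.parse import unquote

raw = "/%63gi-bin/evil.cgi"
print("cgi-bin" in raw)           # False: a naive substring filter misses it
print("cgi-bin" in unquote(raw))  # True: decode first, then match
```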

A Sea of Schemes p.38


The Future

URLs are really addresses, not true names. This means that a URL tells you where something is located, for the moment. The downfall of this scheme is that if the resource is moved, the URL is no longer valid. And at that point, it provides no way to locate the object.

The Internet Engineering Task Force (IETF) has been working on a new standard, uniform resource names (URNs). URNs provide a stable name for an object, regardless of where that object moves (either inside a web server or across web servers).
Persistent uniform resource locators (PURLs) are an example of how URN functionality can be achieved using URLs. PURLs use a resource locator server to name the current location of a resource.

https://archive.org/services/purl/

A URL is an address, not a name, so when a resource moves the URL is no longer valid. URNs and PURLs have emerged to address this: a URN locates a resource by name regardless of location, while a PURL achieves URN-like behavior using URLs.

If Not Now, When?

Support for URNs will require many changes—consensus from the standards bodies, modifications to various HTTP applications, etc.

Currently, and for the foreseeable future, URLs are the way to name resources on the Internet. It is likely that new standards (possibly URNs) will emerge and be deployed to address some of these limitations.

A move toward URNs is expected to address the URL limitations discussed above.

Chapter 3. HTTP Messages (p. 43)

HTTP messages are the blocks of data sent between HTTP applications. These blocks of data begin with some text meta-information describing the message contents and meaning, followed by optional data. These messages flow between clients, servers, and proxies. The terms "inbound," "outbound," "upstream," and "downstream" describe message direction.

HTTP messages are blocks of data sent between HTTP applications. The direction of flow between clients, servers, and proxies is described with the terms inbound, outbound, upstream, and downstream.

The Flow of Messages

Messages Commute Inbound to the Origin Server

Messages travel inbound to the origin server and outbound back to the client

Messages flow inbound to the server and outbound back to the client.

All messages flow downstream, regardless of whether they are request messages or response messages. The sender of any message is upstream of the receiver.

All messages flow downstream; the sender of any message is upstream of its receiver.

The Parts of a Message

The start line and headers are just ASCII text; the body can contain text or binary data, or can be empty.

All HTTP messages fall into two types: request messages and response messages.

**Message Syntax**

Note that a set of HTTP headers should always end in a blank line (bare CRLF), even if there are no headers and even if there is no entity body.

All HTTP messages begin with a start line. The start line for a request message says what to do. The start line for a response message says what happened.

Message syntax: the headers always end with a blank line (a bare CRLF).

Start line: a request message's start line says what to do; a response message's start line says what happened.

Start Lines

Methods

Status code

If you receive a status code that you don't recognize, chances are someone has defined it as an extension to the current protocol. You should treat it as a general member of the class whose range it falls into. For example, if you receive status code 515 (which is outside of the defined range for 5XX codes listed in Table), you should treat the response as indicating a server error

If you receive an unrecognized status code, someone has probably defined it as an extension to the current protocol; treat it as a general member of its class.
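The fallback rule can be written as a one-liner (the helper name is mine):

```python
def status_class(code):
    """Fold any status code into its generic class, e.g. 515 -> 500."""
    return (code // 100) * 100

print(status_class(515))  # 500: treat as a generic server error
print(status_class(203))  # 200: treat as a generic success
```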

Reason phrases

A reason phrase is a human-readable version of the status code that application developers can pass along to their users to indicate what happened during the request.

Version numbers

Version numbers are intended to provide applications speaking HTTP with a clue about each other's capabilities and the format of the message. An HTTP Version 1.2 application communicating with an HTTP Version 1.1 application should know that it should not use any new 1.2 features, as they likely are not implemented by the application speaking the older version of the protocol.

When comparing HTTP versions, each number must be compared separately in order to determine which is the higher version. For example, HTTP/2.22 is a higher version than HTTP/2.3, because 22 is a larger number than 3.

The version number signals capabilities and the message format.

When comparing versions, each dot-separated number must be compared separately: 2.22 is a higher version than 2.3, because 22 > 3.
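Comparing each dot-separated number separately is exactly Python's tuple comparison (the parsing helper is mine):

```python
def http_version(token):
    """Parse a token like 'HTTP/2.22' into the tuple (2, 22)."""
    major, minor = token.split("/")[1].split(".")
    return int(major), int(minor)

# Tuples compare element by element, so 2.22 > 2.3 because 22 > 3.
print(http_version("HTTP/2.22") > http_version("HTTP/2.3"))  # True
```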

Headers

Each HTTP header has a simple syntax: a name, followed by a colon (:), followed by optional whitespace, followed by the field value, followed by a CRLF.
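That syntax makes header parsing a simple split on the first colon (a sketch; the helper is mine):

```python
def parse_header(line):
    """Split 'Name: value' at the first colon, stripping optional whitespace."""
    name, _, value = line.partition(":")
    return name, value.strip()

print(parse_header("Content-Type: text/html; charset=iso-latin-1"))
# ('Content-Type', 'text/html; charset=iso-latin-1')
```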

Entity Bodies

HTTP messages can carry many kinds of digital data: images, video, HTML documents, software applications, credit card transactions, electronic mail, and so on.

The entity body carries the digital data.

------------------------------------------------------------------------------------------------------------------------------

Start Lines: Methods

These methods most likely have restricted uses: servers that support DELETE or PUT (described later in this section) would not want just anyone to be able to delete or store resources. These restrictions generally are set up in the server's configuration, so they vary from site to site and from server to server.

Safe Methods: GET, HEAD

Safe methods are defined to cause no action: nothing will happen on the server as a result of the HTTP request.

There is no guarantee that a safe method won't cause an action to be performed (in practice, that is up to the web developers). Safe methods are meant to allow HTTP application developers to let users know when an unsafe method that may cause some action to be performed is being used. In our Joe's Hardware example, your web browser may pop up a warning message letting you know that you are making a request with an unsafe method and that, as a result, something might happen on the server (e.g., your credit card being charged).

A safe method causes nothing to happen on the server as a result of the request. Developers should warn users when an unsafe request (one that may change something on the server) is being made, e.g., "as a result of this request, your credit card will be charged."

GET method

It usually is used to ask a server to send a resource.

HEAD method

The HEAD method behaves exactly like the GET method, but the server returns only the headers in the response. No entity body is ever returned

Use cases: • Find out about a resource (e.g., determine its type) without getting it. • See if an object exists, by looking at the status code of the response. • Test if the resource has been modified, by looking at the headers.

Server developers must ensure that the headers returned are exactly those that a GET request would return. The HEAD method also is required for HTTP/1.1 compliance.

HEAD returns only the header portion of what GET would return. It is used to learn about a resource without fetching it, check whether an object exists via the status code, or quickly see whether a resource has changed.

PUT method

The PUT method writes documents to a server.

The semantics of the PUT method are for the server to take the body of the request and either use it to create a new document named by the requested URL or, if that URL already exists, use the body to replace it. Because PUT allows you to change content, many web servers require you to log in with a password before you can perform a PUT.

PUT is used to create a new document or to replace an existing one.

POST method

The POST method was designed to send input data to the server. In practice, it is often used to support HTML forms. The data from a filled-in form typically is sent to the server.

POST is mainly used to submit HTML forms, sending input data to the server.

TRACE method

When a client makes a request, that request may have to travel through firewalls, proxies, gateways, or other applications. Each of these has the opportunity to modify the original HTTP request. The TRACE method allows clients to see how its request looks when it finally makes it to the server.

The TRACE method is used primarily for diagnostics; i.e., verifying that requests are going through the request/response chain as intended. It's also a good tool for seeing the effects of proxies and other applications on your requests.

TRACE is useful when a request travels through firewalls, proxies, gateways, or other applications; it is used primarily for diagnostics.

OPTIONS method

The OPTIONS method asks the server to tell us about the various supported capabilities of the web server.

This provides a means for client applications to determine how best to access various resources without actually having to access them.

OPTIONS asks a web server what capabilities it supports, letting a client application decide how best to access a resource without actually requesting it.

DELETE method

it asks the server to delete the resources specified by the request URL. However, the client application is not guaranteed that the delete is carried out. This is because the HTTP specification allows the server to override the request without telling the client.

DELETE does not guarantee the client that the deletion actually happened, because the HTTP specification allows the server to override the request without telling the client.

Extension Methods

They provide developers with a means of extending the capabilities of the HTTP services their servers implement on the resources that the servers manage.

It's important to note that not all extension methods are defined in a formal specification. If you define an extension method, it's likely not to be understood by most HTTP applications. Likewise, it's possible that your HTTP applications could run into extension methods being used by other applications that it does not understand.

In these cases, it is best to be tolerant of extension methods. Proxies should try to relay messages with unknown methods through to downstream servers if they are capable of doing that without breaking end-to-end behavior. Otherwise, they should respond with a 501 Not Implemented status code. Dealing with extension methods (and HTTP extensions in general) is best done with the old rule, "be conservative in what you send, be liberal in what you accept."

Extension methods let developers extend the capabilities of their HTTP services, but methods that are not widely understood can cause confusion in communication.

*Rule of thumb: be conservative in what you send, be liberal in what you accept.*

WebDAV HTTP extension

Status Codes p.59

100-199: Informational Status Codes

The 100 Continue code, for example, is intended to optimize the case where an HTTP client application has an entity body to send to a server but wants to check that the server will accept the entity before it sends it. It tends to confuse HTTP programmers.

200-299: Success Status Codes

300-399: Redirection Status Codes: If a resource has moved, a redirection status code and an optional Location header can be sent to tell the client that the resource has moved and where it can now be found

400-499: Client Error Status Codes: Sometimes a client sends something that a server just can't handle, such as a badly formed request message or, most often, a request for a URL that does not exist (the familiar 404 Not Found error).

500-599: Server Error Status Codes: the server itself has an error. Proxies often run into problems when trying to talk to servers on a client's behalf.

Headers

General headers

Request headers : For example, the following Accept header tells the server that the client will accept any media type that matches its request: Accept: */*

Response headers

Entity headers: Because both request and response messages can contain entities, these headers can appear in either type of message.

For example, the following Content-Type header lets the application know that the data is an HTML document in the iso-latin-1 character set: Content-Type: text/html; charset=iso-latin-1

Chapter 4. Connection Management p.74

• How HTTP uses TCP connections
• Delays, bottlenecks, and clogs in TCP connections
• HTTP optimizations, including parallel, keep-alive, and pipelined connections
• Dos and don'ts for managing connections

TCP Connections

Web browsers talk to web servers over TCP connections.

Just about all of the world's HTTP communication is carried over TCP/IP, a popular layered set of packet-switched network protocols spoken by computers and network devices around the globe. A client application can open a TCP/IP connection to a server application, running just about anywhere in the world. Once the connection is established, messages exchanged between the client's and server's computers will never be lost, damaged, or received out of order

TCP/IP is a packet-switched network protocol suite. A client application can open a TCP/IP connection to a server application; once the connection is established, messages are delivered safely and in order.

Steps for creating a TCP connection

When given a URL, your browser performs the steps shown in Figure 4-1.
In Steps 1-3, the IP address and port number of the server are pulled from the URL (the browser resolves the hostname via DNS).
A TCP connection is made to the web server in Step 4,
and a request message is sent across the connection in Step 5.
The response is read in Step 6,
and the connection is closed in Step 7.


TCP Reliable Data Pipes

HTTP connections really are nothing more than TCP connections. To send data accurately and quickly, you need to know the basics of TCP. (If you are trying to write sophisticated HTTP applications, and especially if you want them to be fast, you'll want to learn a lot more about the internals and performance of TCP than we discuss in this chapter. We recommend the "TCP/IP Illustrated" books by W. Richard Stevens (Addison Wesley).)

TCP gives HTTP a reliable bit pipe. TCP carries HTTP data in order, and without corruption.

TCP delivers HTTP data reliably, in order, and without corruption.

TCP Streams Are Segmented and Shipped by IP Packets

TCP sends its data in little chunks called IP packets (or IP datagrams). In this way, HTTP is the top layer in a "protocol stack" of "HTTP over TCP over IP," as depicted in Figure 4-3a. A secure variant, HTTPS, inserts a cryptographic encryption layer (called TLS or SSL) between HTTP and TCP (Figure 4-3b).

TCP chops the data stream into small chunks called segments, and each segment is carried inside an IP packet (or IP datagram); the detailed structure is shown below.

The protocol stack layers HTTP over TCP over IP. HTTPS adds SSL between HTTP and TCP to secure the data.

IP packets carry TCP segments, which carry chunks of the TCP data stream.

*An IP packet carries a TCP segment, and the TCP segment carries a chunk of the TCP data stream.*

Figure 4-4

When HTTP wants to transmit a message, it streams the contents of the message data, in order, through an open TCP connection. TCP takes the stream of data, chops up the data stream into chunks called segments, and transports the segments across the Internet inside envelopes called IP packets (see Figure 4-4). This is all handled by the TCP/IP software; the HTTP programmer sees none of it. Each TCP segment is carried by an IP packet from one IP address to another IP address.

Containment: HTTP messages are streamed through an open TCP connection, chopped into segments, and carried across the Internet inside IP packets (the envelopes).

Each of these IP packets contains:
• An IP packet header (usually 20 bytes)
• A TCP segment header (usually 20 bytes)
• A chunk of TCP data (0 or more bytes)

The IP header contains the source and destination IP addresses, the size, and other flags. The TCP segment header contains TCP port numbers, TCP control flags, and numeric values used for data ordering and integrity checking.

The IP header carries the source and destination IP addresses, the size, and other flags. The TCP segment header carries the TCP port numbers, TCP control flags, and numeric values used for data ordering and integrity checking.

Keeping TCP Connections Straight p.77

A computer might have several TCP connections open at any one time. TCP keeps all these connections straight through port numbers. The IP address gets you to the right computer, and the port number gets you to the right application.

A TCP connection is distinguished by four values:

TCP connection values

A computer can handle several TCP connections at once. The IP address finds the computer; the port number finds the application. A TCP connection is distinguished by four values: source IP address, source port, destination IP address, destination port.
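The four distinguishing values can be read straight off a live socket. A small Python sketch over the loopback interface (the port numbers are assigned by the OS):

```python
import socket

# A listening endpoint on an ephemeral loopback port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
server_port = server.getsockname()[1]

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", server_port))
conn, _addr = server.accept()

# The connection's distinguishing four-tuple, seen from the client side:
src_ip, src_port = client.getsockname()
dst_ip, dst_port = client.getpeername()
print((src_ip, src_port, dst_ip, dst_port))

client.close(); conn.close(); server.close()
```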

Programming with TCP Sockets

This sockets API hides all the details of TCP and IP from the HTTP programmer. The sockets API was first developed for the Unix operating system, but variants are now available for almost every operating system and language.

The sockets API lets you create TCP endpoint data structures, connect these endpoints to remote server TCP endpoints, and read and write data streams. The TCP API hides all the details of the underlying network protocol handshaking and the segmentation and reassembly of the TCP data stream to and from IP packets.

The sockets API hides the details of TCP and IP from the HTTP programmer. It lets you create TCP endpoint data structures, connect them to remote server TCP endpoints, and read and write data streams. Underneath, the TCP API hides all the network protocol handshaking and the segmentation and reassembly of the TCP data stream.

Establishing a connection can take a while, depending on how far away the server is, the load on the server, and the congestion of the Internet.

TCP Performance Considerations

Because HTTP is layered directly on TCP, the performance of HTTP transactions depends critically on the performance of the underlying TCP plumbing. Understanding these TCP characteristics lets you design and implement higher-performance HTTP applications.

HTTP Transaction Delays

Unless the client or server is overloaded or executing complex dynamic resources, most HTTP delays are caused by TCP network delays.

1. If the hostname in the URI was not recently visited, it may take tens of seconds to resolve the hostname into an IP address using the DNS resolution infrastructure.

2. Connection setup delay occurs for every new TCP connection. This usually takes at most a second or two, but it can add up quickly when hundreds of HTTP transactions are made.

3. It takes time for the request message to travel over the Internet and get processed by the server.

4. The web server then writes back the HTTP response, which also takes time to travel back to the client.

The magnitude of these TCP network delays depends on hardware speed, the load of the network and server, the size of the request and response messages, and the distance between client and server. The delays also are significantly affected by technical intricacies of the TCP protocol.

Performance Focus Areas

If you are writing high-performance HTTP software, you should understand each of these factors. If you don't need this level of performance optimization, feel free to skip ahead.

1. TCP Connection Handshake Delays

When you set up a new TCP connection, even before you send any data, the TCP software exchanges a series of IP packets to negotiate the terms of the connection.

1. To request a new TCP connection, the client sends a small TCP packet (usually 40-60 bytes) to the server. The packet has a special "SYN" flag set, which means it's a connection request. (SYN = synchronize)

2. If the server accepts the connection, it computes some connection parameters and sends a TCP packet back to the client, with both the "SYN" and "ACK" flags set, indicating that the connection request is accepted

3. Finally, the client sends an acknowledgment back to the server, letting it know that the connection was established successfully (see Figure 4-8c). Modern TCP stacks let the client send data in this acknowledgment packet.

The SYN/SYN+ACK handshake (Figure 4-8a and b) creates a measurable delay when HTTP transactions do not exchange much data, as is commonly the case.

The end result is that small HTTP transactions may spend 50% or more of their time doing TCP setup.

Even before any data is sent, the endpoints exchange IP packets to negotiate the terms of the new TCP connection.

The three-way handshake can take up 50% or more of the time of a small HTTP transaction, adding noticeable delay.

1) The client sends a SYN flag; 2) the server replies with SYN+ACK; 3) the client sends an ACK, possibly together with data.

2. TCP's delayed acknowledgment algorithm for piggybacked acknowledgments

Because the Internet itself does not guarantee reliable packet delivery (Internet routers are free to destroy packets at will if they are overloaded), TCP implements its own acknowledgment scheme to guarantee successful data delivery.

Each TCP segment gets a sequence number and a data-integrity checksum. The receiver of each segment returns small acknowledgment packets back to the sender when segments have been received intact. If a sender does not receive an acknowledgment within a specified window of time, the sender concludes the packet was destroyed or corrupted and resends the data.

TCP has its own scheme to guarantee reliable delivery: if the receiver does not return an acknowledgment, after a certain time the sender assumes the data did not arrive and resends it.

Because acknowledgments are small, TCP allows them to "piggyback" on outgoing data packets heading in the same direction.

To increase the chances that an acknowledgment will find a data packet headed in the same direction, many TCP stacks implement a "delayed acknowledgment" algorithm. But there just aren't many packets heading in the reverse direction when you want them, so delayed acknowledgment algorithms frequently introduce significant delays.

Acknowledgments are small, so TCP lets them "piggyback" on data packets heading in the same direction. But usually there isn't much data going that way, so the delayed acknowledgment algorithm waits briefly for such a packet before sending the acknowledgment alone, and this waiting delays the TCP connection.

3. TCP slow-start congestion control

New connections are slower than "tuned" connections that have already exchanged a modest amount of data. Slow start is used to prevent sudden overloading and congestion of the Internet.

Each time a packet is received successfully, the sender gets permission to send two more packets. If an HTTP transaction has a large amount of data to send, it cannot send all the packets at once: it must send one packet and wait for an acknowledgment; then it can send two packets, each of which must be acknowledged, which allows four packets, and so on.

A newly created connection is slower than an old connection that has already exchanged data successfully, because TCP applies slow-start congestion control to avoid congesting the Internet. Each successful delivery earns the sender permission to send more the next time.

4. Nagle's algorithm for data aggregation (TCP_NODELAY)

Because each TCP segment carries at least 40 bytes of flags and headers, network performance can be degraded severely if TCP sends large numbers of packets containing small amounts of data.

Nagle's algorithm (named for its creator, John Nagle) attempts to bundle up a large amount of TCP data before sending a packet, aiding network efficiency. Nagle's algorithm discourages the sending of segments that are not full-size (a maximum-size packet is around 1,500 bytes on a LAN, or a few hundred bytes across the Internet).

But Nagle's algorithm causes several HTTP performance problems. First, small HTTP messages may not fill a packet, so they may be delayed waiting for additional data that will never arrive. Second, Nagle's algorithm interacts poorly with delayed acknowledgments.

HTTP applications often disable Nagle's algorithm to improve performance, by setting the TCP_NODELAY parameter on their stacks. If you do this, you must ensure that you write large chunks of data to TCP so you don't create a flurry of small packets.

Nagle's algorithm is used to reduce congestion: it gathers TCP data into fuller packets before sending. But this batching can delay performance, so an HTTP application can bypass Nagle's algorithm by setting the TCP_NODELAY parameter.
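Setting TCP_NODELAY is a single socket option. A minimal Python sketch:

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Disable Nagle's algorithm: writes go out immediately instead of being
# batched into full-size segments, so the application itself must write
# large chunks of data to avoid a flurry of small packets.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
print(nodelay)   # nonzero once the option is set
sock.close()
```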

5. TIME_WAIT Accumulation and Port Exhaustion

(This part is hard to understand ㅜㅜ p.84)

TIME_WAIT port exhaustion is a serious performance problem that affects performance benchmarking but is relatively uncommon in real deployments. -> What is performance benchmarking? Benchmarking is a process of measuring performance and comparing it to a reference.

Even if you do not suffer port exhaustion problems, be careful about having large numbers of open connections or large numbers of control blocks allocated for connection in wait states. Some operating systems slow down dramatically when there are numerous open connections or control blocks.

My rough understanding: when a TCP connection is closed, the endpoint records the recent IP address and port number in memory so that, for about two minutes or less ("2MSL"), no two connections are created with the same IP addresses and port numbers. Usually this is not a problem, but during a benchmark test the same port numbers cannot be reused, and because the number of source ports is limited, you can run into TIME_WAIT port exhaustion.

HTTP Connection Handling p.85

This chapter explains the HTTP technology for manipulating and optimizing connections.

The Oft-Misunderstood Connection Header

The HTTP Connection header field has a comma-separated list of connection tokens that specify options for the connection that aren't propagated to other connections. For example, a connection that must be closed after sending the next message can be indicated by Connection: close.

The Connection header sometimes is confusing, because it can carry three different types of tokens:
• HTTP header field names, listing headers relevant for only this connection
• Arbitrary token values, describing nonstandard options for this connection
• The value close, indicating the persistent connection will be closed when done

The HTTP Connection header carries connection tokens, and it is confusing because the tokens come in three different forms, including the value close.

Serial Transaction Delays

TCP performance delays can add up. If each transaction requires a new connection, the connection-setup and slow-start delays add up too.

Several techniques are available to improve HTTP connection performance:

1. Parallel connections: concurrent HTTP requests across multiple TCP connections (several TCP connections opened at once)

2. Persistent connections: reusing already-opened TCP connections to eliminate connect/close delays

1) HTTP/1.0+ keep-alive connections: deprecated

2) HTTP/1.1 persistent connections: the client assumes an HTTP/1.1 connection will remain open after a response, unless the response contains a Connection: close header.

3. Pipelined connections: concurrent HTTP requests across a shared TCP connection (p.97; skipped)

4. Multiplexed connections: Interleaving chunks of requests and responses (experimental)
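Persistent-connection reuse can be sketched with Python's http.client against a throwaway local server: two requests travel over the same HTTPConnection (the server and handler here are illustrative, not from the book):

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class Hello(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"          # enables persistent connections
    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):          # silence per-request logging
        pass

server = ThreadingHTTPServer(("127.0.0.1", 0), Hello)
threading.Thread(target=server.serve_forever, daemon=True).start()

# One HTTPConnection corresponds to one TCP connection; both requests reuse it.
conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
statuses = []
for _ in range(2):
    conn.request("GET", "/")
    resp = conn.getresponse()
    resp.read()                            # drain the body so the connection can be reused
    statuses.append(resp.status)
print(statuses)                            # [200, 200]
conn.close()
server.shutdown()
```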

Part II: HTTP Architecture

• Chapter 5 gives an overview of web server architectures.

Chapter 7 delves into the science of web caches—devices that improve performance and reduce traffic by making local copies of popular documents.

Chapter 5. Web Servers

Web Servers Come in All Shapes and Sizes

A web server processes HTTP requests and serves responses. The term "web server" can refer either to web server software or to the particular device or computer dedicated to serving the web pages.

A web server can take the form of software or of a physical computer.

Web Server Implementations

Web servers implement HTTP and the related TCP connection handling. They also manage the resources served by the web server and provide administrative features to configure, control, and enhance the web server.

1) General-Purpose Software Web Servers
You can choose open source software (such as Apache or W3C's Jigsaw) or commercial software (such as Microsoft's and iPlanet's web servers). Web server software is available for just about every computer and operating system.

2) Web Server Appliances
Web server appliances are prepackaged software/hardware solutions. The vendor preinstalls a software server onto a vendor-chosen computer platform and preconfigures the software. e.g., the IBM Whistle web server appliance (no longer available)

3) Embedded Web Servers
Embedded servers are tiny web servers intended to be embedded into consumer products (e.g., printers or home appliances).

What Real Web Servers Do

Steps of a basic web server request

1. Set up connection—accept a client connection, or close if the client is unwanted.
2. Receive request—read an HTTP request message from the network.
3. Process request—interpret the request message and take action.
4. Access resource—access the resource specified in the message.
5. Construct response—create the HTTP response message with the right headers.
6. Send response—send the response back to the client.
7. Log transaction—place notes about the completed transaction in a log file.
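The seven steps can be sketched as a toy single-transaction server over raw sockets (loopback only; a real server adds robust parsing, resource mapping, and error handling):

```python
import socket
import threading

def serve_once(server_sock):
    conn, addr = server_sock.accept()                  # 1. Set up connection
    request = conn.recv(4096).decode("ascii")          # 2. Receive request
    method, path, _version = request.split("\r\n")[0].split(" ")  # 3. Process request
    body = f"you asked for {path}".encode("ascii")     # 4. "Access" the resource
    response = (                                       # 5. Construct response
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/plain\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii") + body
    conn.sendall(response)                             # 6. Send response
    print(f"{addr[0]} {method} {path} 200")            # 7. Log transaction
    conn.close()

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
threading.Thread(target=serve_once, args=(server,), daemon=True).start()

client = socket.create_connection(server.getsockname())
client.sendall(b"GET /index.html HTTP/1.1\r\nHost: localhost\r\n\r\n")
reply = b""
while True:                    # read until the server closes the connection
    chunk = client.recv(4096)
    if not chunk:
        break
    reply += chunk
client.close(); server.close()
print(reply.split(b"\r\n")[0])   # HTTP/1.1 200 OK
```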

Step 1: Accepting Client Connections

1) Handling New Connections

When requested by a client, the web server establishes the connection and determines which client is on the other side of the connection, extracting the IP address from the TCP connection. Once a new connection is established and accepted, the server adds the new connection to its list of existing web server connections and prepares to watch for data on the connection.

The web server is free to reject and immediately close any connection. Some web servers close connections because the client IP address or hostname is unauthorized or is a known malicious client.

When a new connection is requested, the server extracts the client's IP address from the TCP connection. If it accepts the request, it adds the new connection to its list of existing web server connections.

2) Client Hostname Identification

Most web servers can be configured to convert client IP addresses into client hostnames, using "reverse DNS." Web servers can use the client hostname for detailed access control and logging. Be warned that hostname lookups can take a very long time, slowing down web transactions. Many highcapacity web servers either disable hostname resolution or enable it only for particular content.

The server can convert a client IP address into a hostname using reverse DNS. Hostnames allow detailed access control and logging, but hostname lookup takes a long time, so many high-capacity web servers turn hostname resolution off.
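Reverse DNS is one library call in Python; the result depends on the local resolver configuration:

```python
import socket

# "Reverse DNS": map an IP address back to a hostname. The result depends
# on the resolver configuration; loopback usually maps to localhost.
hostname, aliases, addresses = socket.gethostbyaddr("127.0.0.1")
print(hostname)
```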

3) Determining the Client User Through ident

Some web servers also support the IETF ident protocol. Use the ident protocol to determine HTTP client username

Step 2: Receiving Request Messages p.112

As the data arrives on connections, the web server reads out the data from the network connection and parses out the pieces of the request message. (parse = analyze)

When parsing the request message, the web server analyzes the start line, headers, and body:

• Parses the request line looking for the request method, the specified resource identifier (URI), and the version number, each separated by a single space, and ending with a carriage-return line-feed (CRLF) sequence. (Footnote: many web servers support LF or CRLF as end-of-line sequences, because some clients mistakenly send LF as the end-of-line terminator.)

Stack Overflow note on the history of CRLF (end-of-line sequences): CR = Carriage Return (moves the cursor to the beginning of the line without advancing to the next line); LF = Line Feed (moves the cursor down to the next line without returning to the beginning of the line).

• Reads the message headers, each ending in CRLF

• Detects the end-of-headers blank line, ending in CRLF (if present)

• Reads the request body, if any (length specified by the Content-Length header)
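The parsing steps above can be sketched as a small Python function: take the request line, read headers up to the blank line, then read Content-Length bytes of body (a simplified sketch that assumes a well-formed message):

```python
def parse_request(raw: bytes):
    """Split a raw HTTP request into request line, headers, and body."""
    head, _, rest = raw.partition(b"\r\n\r\n")    # end-of-headers blank line
    lines = head.decode("ascii").split("\r\n")
    method, uri, version = lines[0].split(" ")    # request line
    headers = {}
    for line in lines[1:]:                        # each header ends in CRLF
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    length = int(headers.get("content-length", 0))
    body = rest[:length]                          # request body, if any
    return method, uri, version, headers, body

msg = (b"POST /search HTTP/1.1\r\n"
       b"Host: example.com\r\n"
       b"Content-Length: 5\r\n"
       b"\r\n"
       b"q=cat")
method, uri, version, headers, body = parse_request(msg)
print(method, uri, body)   # POST /search b'q=cat'
```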

Internal Representations of Messages

Some web servers also store the request messages in internal data structures that make the message easy to manipulate.

Connection Input/Output Processing Architectures

High-performance web servers support thousands of simultaneous connections. Web servers constantly watch for new web requests, because requests can arrive at any time. Different web server architectures service requests in different ways

1) Single-threaded web servers: process one request at a time until completion

2) Multiprocess and multithreaded web servers

dedicate multiple processes or higher-efficiency threads to process requests simultaneously

A process is an individual program flow of control, with its own set of variables. A thread is a faster, more efficient version of a process. Both threads and processes let a single program do multiple things at the same time.

3) Multiplexed I/O servers

all the connections are simultaneously watched for activity. When a connection changes state (e.g., when data becomes available or an error condition occurs), a small amount of processing is performed on the connection; when that processing is complete, the connection is returned to the open connection list for the next change in state.

4) Multiplexed multithreaded web servers

Some systems combine multithreading and multiplexing to take advantage of multiple CPUs in the computer platform.
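The multiplexed model can be sketched with Python's selectors module: a single loop watches every connection and runs a small handler whenever one changes state (an illustrative echo server, not an HTTP server):

```python
import selectors
import socket

sel = selectors.DefaultSelector()

def accept(server):
    conn, _addr = server.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, handle)  # watch the new connection

def handle(conn):
    data = conn.recv(4096)            # a small amount of processing per event
    if data:
        conn.sendall(b"echo: " + data)
    else:                             # peer closed: drop it from the watch list
        sel.unregister(conn)
        conn.close()

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ, accept)

# Drive one client through the event loop.
client = socket.create_connection(server.getsockname())
client.sendall(b"hi")
for _ in range(2):                    # first pass accepts, second handles the data
    for key, _events in sel.select(timeout=1):
        key.data(key.fileobj)         # dispatch to accept() or handle()
reply = client.recv(4096)
print(reply)                          # b'echo: hi'
client.close(); sel.close(); server.close()
```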

Step 3: Processing Requests

Once the request has been received, the server can process it using the method, resource, headers, and optional body.

Step 4: Mapping and Accessing Resources

Web servers are resource servers. They deliver precreated content, such as HTML pages or JPEG images, as well as dynamic content from resource-generating applications running on the servers.

A web server is a resource server: it delivers content such as HTML and JPEG files.

Before the web server can deliver content to the client, it needs to identify the source of the content, by mapping the URI from the request message to the proper content or content generator on the web server.

Before delivering a resource, the server maps the URI of the request message to the matching content on the server to identify the content source; in other words, it must find where the content lives on the web server.

Docroots

Web servers support different kinds of resource mapping. Typically, a special folder in the web server filesystem is reserved for web content. This folder is called the document root, or docroot.

1)Virtually hosted docroots

A virtually hosted web server identifies the correct document root to use from the IP address or hostname in the URI or the Host header.

2)User home directory docroots

Directory Listings

A web server can receive requests for directory URLs, where the path resolves to a directory, not a file. Most web servers can be configured to take a few different actions when a client requests a directory URL:

• Return an error.
• Return a special, default, "index file" instead of the directory.
• Scan the directory, and return an HTML page containing the contents.

Most web servers look for a file named index.html or index.htm inside a directory to represent that directory.
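That index-file lookup can be sketched as a small mapping function under a docroot (the INDEX_NAMES tuple and the resolve helper are illustrative names, not from the book):

```python
import os
import tempfile

INDEX_NAMES = ("index.html", "index.htm")

def resolve(docroot: str, path: str):
    """Map a URL path to a file, substituting an index file for directories."""
    target = os.path.join(docroot, path.lstrip("/"))
    if os.path.isdir(target):
        for name in INDEX_NAMES:
            candidate = os.path.join(target, name)
            if os.path.isfile(candidate):
                return candidate
        return None          # a real server would list the directory or return an error
    return target if os.path.isfile(target) else None

# Throwaway docroot containing one index file.
docroot = tempfile.mkdtemp()
with open(os.path.join(docroot, "index.html"), "w") as f:
    f.write("<h1>home</h1>")

resolved = resolve(docroot, "/")
print(resolved)              # path ending in index.html
```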

Dynamic Content Resource Mapping

Web servers also can map URIs to dynamic resources—that is, to programs that generate content on demand.

A web server can serve static resources as well as dynamic resources.

CGI is an early, simple, and popular interface for executing server-side applications. Modern application servers have more powerful and efficient server-side dynamic content support, including Microsoft's Active Server Pages and Java servlets.

 

Server-Side Includes (SSI)

If a resource is flagged as containing server-side includes, the server processes the resource contents before sending them to the client.

The contents are scanned for certain special patterns (often contained inside special HTML comments), which can be variable names or embedded scripts. The special patterns are replaced with the values of variables or the output of executable scripts. This is an easy way to create dynamic content.

Access Controls

When a request arrives for an access-controlled resource, the web server can control access based on the IP address of the client, or it can issue a password challenge to get access to the resource. Refer to Chapter 12 for more information about HTTP authentication.

 

Step 5: Building Responses p.120

Once the web server has identified the resource, it performs the action described in the request method and returns the response message.

If the transaction generated a response body, the content is sent back with the response message, which typically includes:

• A Content-Type header, describing the MIME type of the response body

• A Content-Length header, describing the size of the response body

• The actual message body content

 

MIME Typing

The web server is responsible for determining the MIME type of the response body.

A web server uses MIME types file to set outgoing Content-Type of resources
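Python's mimetypes module performs the same extension-to-type lookup, driven by the system's mime.types tables:

```python
import mimetypes

# Guess the Content-Type from the filename extension, as servers commonly do.
for name in ("report.html", "photo.jpeg", "archive.tar.gz"):
    content_type, encoding = mimetypes.guess_type(name)
    print(name, "->", content_type, encoding)
```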

Redirection

Web servers sometimes return redirection responses instead of success messages: a web server can redirect the browser to go elsewhere to perform the request, using a 3XX return code. The Location response header contains a URI for the new or preferred location of the content.
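A redirection response is just a status line plus a Location header; a minimal sketch (the URL is a made-up example):

```python
def redirect(location: str) -> bytes:
    """Build a minimal 301 redirect response pointing at `location`."""
    return (
        "HTTP/1.1 301 Moved Permanently\r\n"
        f"Location: {location}\r\n"
        "Content-Length: 0\r\n"
        "\r\n"
    ).encode("ascii")

resp = redirect("http://www.example.com/new-home/")
print(resp.split(b"\r\n")[0])   # b'HTTP/1.1 301 Moved Permanently'
```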

 

Step 6: Sending Responses

For persistent connections, the connection may stay open, in which case the server needs to be extra cautious to compute the Content-Length header correctly, or the client will have no way of knowing when a response ends.

 

Step 7: Logging

Finally, when a transaction is complete, the web server notes an entry into a log file, describing the transaction performed.


I'd like to summarize up through p.123 (Chapter 5, Web Servers). Plan: read and summarize 10 pages a day.

Source: http://www.staroceans.org/e-book/O'Reilly%20-%20HTTP%20-%20The%20Definitive%20Guide.pdf


What to Read and What to Skip

I. HTTP: The Web's Foundation
It is best to read all of Part I. A solid foundation makes the later material much easier to understand.

Chapter 1, "Overview of HTTP," gives a broad overview of HTTP and helps with understanding everything that follows.

Chapter 2, "URLs and Resources," and Chapter 3, "HTTP Messages," are must-reads. Without them the later chapters are hard to follow. Chapter 3 in particular, as its title says, explains HTTP itself.

Chapter 4, "Connection Management," is also worth reading. It explains how TCP operates underneath HTTP. After reading it, you will understand why HTTP is slow.

II. HTTP Architecture

Part II contains a lot of useful material. Chapter 5 is the most important, and Chapter 7 is the next most useful.

Chapter 5, "Web Servers," explains how web servers work. Every web programmer should understand it.

Chapter 6, "Proxies," is worth reading too. Proxies keep coming up in later chapters, and a basic understanding of them helps when talking with network engineers.

Chapter 7, "Caching," is also worth reading. Despite the title, it covers not only caches but also conditional requests (the ones answered with 304). Web programmers will definitely put this to use. Chapter 15 covers the same material, but Chapter 7 is more detailed.

Chapter 8, "Integration Points: Gateways, Tunnels, and Relays," is not essential. It is fine to read it later if you become curious.

Chapter 9, "Web Robots," is for readers interested in robots or search engines. If you ever operate a web service, you will wonder what principles and rules web robots follow. Or just read it as general web literacy.

Chapter 10, "HTTP/2.0," is for readers curious about HTTP/2. Since the goal of HTTP/2 is better performance, anyone frustrated with HTTP/1's slowness should read it too. It was written before HTTP/2 was finalized, but it should not differ much from the final specification.

III. Identification, Authorization, and Security

Part III is mostly useful, except for Chapter 13.

Chapter 11, "Client Identification and Cookies," gives you a correct understanding of cookies.

Chapter 12, "Basic Authentication," is also worth reading, because Basic Authentication is still used from time to time. It is also very easy, so it reads quickly.

Chapter 13, "Digest Authentication," you do not need to read. I have almost never seen digest authentication used, so studying it will probably never pay off. The material is also complex and hard to follow. If you are studying in a group, just skip this chapter.

Chapter 14, "Secure HTTP," covers HTTPS and is worth reading. You can use HTTPS without it, but read it if you want to understand why HTTPS is secure.

IV. Entities, Encodings, and Internationalization

If Part I was needed for a basic understanding of HTTP, Part IV is needed to use HTTP properly. Much of what it covers is not handled automatically by web servers or frameworks, so web programmers need to understand it themselves. Try to read all of it.

In particular, Chapter 16, "Internationalization," covers material that applies not only to web programming or HTTP but to every programmer who handles multilingual text, so its range of application is very broad.

V. Content Publishing and Distribution

Read this part selectively, as needed. If you are in a study group, it is fine to skip the entire part.

Chapter 18, "Web Hosting," can wait until you actually start operating a web service.

Chapter 19, "Publishing Systems," covers FrontPage and WebDAV. If you have no use for FrontPage or WebDAV right now, you can skip it.

Chapter 20, "Redirection and Load Balancing," you can skim the beginning of and move on. I have never used things like the Cache Array Routing Protocol covered in this chapter. If you also play the role of a network engineer you might need it, but I have no such experience, so I cannot say.

Chapter 21, "Logging and Usage Tracking," can likewise be read as needed. For log formats, the manual of the web server you use will be enough, and usage tracking may not be necessary at all.

If you are very busy, read only Chapters 1-3; even that much helps a lot.
If you are somewhat busy, read Chapters 1-5, 7, 11, 12, 14, 15, 16, and 17.

Source: How to Read "HTTP: The Definitive Guide" for Web Programmers