Designing Data-Intensive Applications

Created at 9pm, Feb 24

furkan

Software Development

Contract ID

M3Mlh0XD_IIgzVnMyqLwhIZRc5sK4S1cU10-4_HDpo0

File Type

PDF

Entry Count

1821

Embed. Model

jina_embeddings_v2_base_en

Index Type

hnsw

Martin Kleppmann - Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems (2017)

An emerging idea is to treat GC pauses like brief planned outages of a node, and to let other nodes handle requests from clients while one node is collecting its garbage. If the runtime can warn the application that a node soon requires a GC pause, the application can stop sending new requests to that node, wait for it to finish processing outstanding requests, and then perform the GC while no requests are in progress. This trick hides GC pauses from clients and reduces the high percentiles of response time [70, 71]. Some latency-sensitive financial trading systems use this approach.

id: 54895827f02c4bc8483e25b76a715961 - page: 321
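To make the drain-then-collect sequence concrete, here is a toy bash simulation, not a real runtime hook: the node names, the `accepting` and `inflight` arrays, and the `gc_warning`, `request_done`, and `route` functions are all hypothetical. A warned node stops receiving new requests, and the (simulated) GC runs only once its last outstanding request has finished.

```bash
#!/usr/bin/env bash
# Toy, single-process model of treating a GC pause as a planned outage.
# Hypothetical state (not a real runtime or load-balancer API):
#   accepting[n]=1 -> the router may send new requests to node n
#   inflight[n]    -> requests node n is still processing
nodes=(a b c)
declare -A accepting=([a]=1 [b]=1 [c]=1)
declare -A inflight=([a]=1 [b]=0 [c]=0)
gc_log=()   # nodes on which a GC has run, in order

# Pick the first node that is accepting traffic. (Only chooses a target;
# it does not record the new request in inflight.)
route() {
  local n
  for n in "${nodes[@]}"; do
    if (( accepting[$n] )); then
      echo "$n"
      return 0
    fi
  done
  return 1   # every node is currently paused
}

# The runtime warns that node $1 will soon need a GC pause:
# stop sending it new requests.
gc_warning() {
  accepting[$1]=0
}

# A request on node $1 completes. If the node is draining and now idle,
# run the GC with no requests in progress, then return it to rotation.
request_done() {
  local n=$1
  inflight[$n]=$(( inflight[$n] - 1 ))
  if (( !accepting[$n] && inflight[$n] == 0 )); then
    gc_log+=("$n")      # simulated GC while the node has no work
    accepting[$n]=1     # node rejoins the pool
  fi
}
```

For example, after `gc_warning a` with one request still in flight on `a`, `route` prints `b`; calling `request_done a` then records the collection and puts `a` back in rotation, so `route` prints `a` again. Clients only ever see nodes that are not paused.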

A variant of this idea is to use the garbage collector only for short-lived objects (which are fast to collect) and to restart processes periodically, before they accumulate enough long-lived objects to require a full GC of long-lived objects [65, 73]. One node can be restarted at a time, and traffic can be shifted away from the node before the planned restart, like in a rolling upgrade (see Chapter 4). These measures cannot fully prevent garbage collection pauses, but they can usefully reduce their impact on the application.

id: 6a7c1d78c3330a0bf5ab7cbb4f72d297 - page: 321
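A rolling restart of the kind described above can be sketched as a loop that handles one node at a time. The `drain`, `restart`, and `undrain` functions are hypothetical placeholders that only print what they would do; a real version would call a load balancer and a process supervisor, and would be triggered periodically (for instance by a scheduler) before long-lived objects pile up.

```bash
#!/usr/bin/env bash
# Rolling restart: shift traffic away, restart, restore, one node at a time.
# drain/restart/undrain are hypothetical stand-ins that only print the step.
drain()   { echo "drain $1"; }
restart() { echo "restart $1"; }
undrain() { echo "undrain $1"; }

rolling_restart() {
  local node
  for node in "$@"; do
    drain "$node"     # move traffic off the node before the planned restart
    restart "$node"   # fresh process: no accumulated long-lived objects
    undrain "$node"   # back in rotation before the next node is touched
  done
}
```

Running `rolling_restart n1 n2` prints the drain, restart, and undrain steps for `n1` before touching `n2`, so at most one node is out of rotation at any moment.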

Knowledge, Truth, and Lies

So far in this chapter we have explored the ways in which distributed systems are different from programs running on a single computer: there is no shared memory, only message passing via an unreliable network with variable delays, and the systems may suffer from partial failures, unreliable clocks, and processing pauses.

The consequences of these issues are profoundly disorienting if you're not used to distributed systems. A node in the network cannot know anything for sure: it can only make guesses based on the messages it receives (or doesn't receive) via the network. A node can only find out what state another node is in (what data it has stored, whether it is correctly functioning, etc.) by exchanging messages with it. If a remote node doesn't respond, there is no way of knowing what state it is in, because problems in the network cannot reliably be distinguished from problems at a node.

id: 7f59c324935ee9d4f334317905d92190 - page: 321
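The point that a silent node's state is unknowable shows up in even the simplest health check. This sketch probes a peer with `curl` and maps its exit status to what the caller can actually conclude; `probe` and `interpret` are illustrative names, and the 2-second `--max-time` is an arbitrary assumption. curl documents exit status 28 as "operation timed out" and 7 as "failed to connect to host".

```bash
#!/usr/bin/env bash
# One node asking about another: all it has is a reply, or the lack of one.

# Translate a curl exit status into what the caller can actually conclude.
interpret() {
  case $1 in
    0)  echo "responded" ;;          # the peer was alive when it answered
    7)  echo "could not connect" ;;  # connection refused or host unreachable
    28) echo "unknown" ;;            # timed out: crashed, paused, slow, or lost in the network
    *)  echo "error $1" ;;           # any other curl failure
  esac
}

# Probe a remote node over HTTP (URL supplied by the caller) and report.
probe() {
  local status=0
  curl --silent --output /dev/null --max-time 2 "$1" || status=$?
  interpret "$status"
}
```

Note the asymmetry: status 0 shows the peer was working when it answered, but 28 (a timeout) cannot separate a crashed node, a node stuck in a GC pause, and a network that dropped the request or the reply.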

Discussions of these systems border on the philosophical: What do we know to be true or false in our system? How sure can we be of that knowledge, if the mechanisms for perception and measurement are unreliable? Should software systems obey the laws that we expect of the physical world, such as cause and effect? Fortunately, we don't need to go as far as figuring out the meaning of life. In a distributed system, we can state the assumptions we are making about the behavior (the system model) and design the actual system in such a way that it meets those assumptions. Algorithms can be proved to function correctly within a certain system model. This means that reliable behavior is achievable, even if the underlying system model provides very few guarantees.

id: 1f5f63e4167e81a015a5c65a8275fd42 - page: 322

How to Retrieve?

# Search

curl -X POST "https://search.dria.co/hnsw/search" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"rerank": true, "top_n": 10, "contract_id": "M3Mlh0XD_IIgzVnMyqLwhIZRc5sK4S1cU10-4_HDpo0", "query": "What is alexanDRIA library?"}'
        
# Query

curl -X POST "https://search.dria.co/hnsw/query" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"vector": [0.123, 0.5236], "top_n": 10, "contract_id": "M3Mlh0XD_IIgzVnMyqLwhIZRc5sK4S1cU10-4_HDpo0", "level": 2}'