Networking notes - the lower three levels



These are primarily notes
It won't be complete in any sense.
It exists to contain fragments of useful information.

See also


  • 802.3u is Fast Ethernet (100Mbit), 802.3ab is gigabit ethernet over copper, 802.3ah added gigabit over fiber


This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Layers, frames, and packets

Networking is modelled using layers.

Analogies

Consider that when you send paper mail or packages, you only have to care about addresses, and where to drop them off.

You don't have to care about sorting centers, how it gets moved, about redelivery of packages when people are not home, about strange numbering on streets, about the barcode systems used and changes in that over time, about international transfer, about returning to sender, about what boxes it is transported in, about the agreements in the sorting center that mean it won't lie around for too long.


Those are controlled processes that other people do have to worry about.

And because they do, you rarely need to care about more than addresses. You see a mailbox and the rest can be done by gnomes for all we care - as long as they're responsible gnomes.


Networking does something similar with responsibilities. (If a bit more strictly)

So, when you know a website's address, all the transport is done for you. You don't have to care that it goes through fourteen routers and across the Atlantic Ocean.

Just moving it around already involves a lot of details. Roughly speaking,

layer 1 (physical) deals with how to put bits on copper or fiber
layer 2 (data link) deals with moving structured data between devices (that can directly see each other)
layer 3 (network) deals with the same when devices are part of a larger and changeable network, and cannot directly see each other.
layer 4 (transport) deals with segmentation, acknowledgement, traffic control


In analogy, layer 1 and 2 give you envelopes and postmen in cars carrying them, layer 3 gives you country-wide addresses and sorting centers that understand them, layer 4 gives you things on top like trying to deliver the next day if you weren't at home, "sign for this please" and why things don't fall over at christmas (but you have to accept ).


The layering means that you don't have to worry about how the lower layers actually do their job.

It also means that things can be interchangeable, can be replaced with different technologies not only without complete overhaul, but transparently (if they live up to the same responsibilities).


You can put things on various types of Layer 2, including Ethernet, FDDI, Fiber Channel, InfiniBand, DSL, ATM, T1, pigeon, or what have you. Well, the last one has horrible latency, but you know.


The higher you go in levels, the more you start dealing in abstractions - do you want to guarantee the data is verified? Guarantee it will arrive in order? Try for the lowest latency? These sorts of properties are also at odds, which is e.g. why within IP you have both TCP (ordered, verified) and UDP (lower latency, but fewer guarantees, so more rough edges).


Network people often stop talking at some layer. If we're network support then we care about topology and stability and bandwidth. We're here to help you move data, not to tell you what to use that for. When you have working IP (TCP/IP and UDP/IP) then it's up to your computer to figure out what to slap on top of that.

In fact, that's a reason why the original seven layers (OSI model) is also seen simplified to four (DOD model), which basically says "how you use it is up to you."


Encapsulation

In terms of transmitted bytes of data, each networking layer is generally only concerned with the layer directly under it. As long as that lower layer lives up to its promises, the higher layer doesn't have to care about how it does, and can be very useful by just fulfilling its own.


Each layer is said to encapsulate the layer above it. For example, visiting a web page is data transferred by HTTP, which is wrapped in TCP, in IP, and (at your end) most likely Ethernet or WiFi for delivery to your broadband modem, which may then go via one of the various DSL or cable technologies, which then probably goes via copper and fiber at different stages, and whatnot.

In terms of protocols, the encapsulation at your end, assuming you're using TCP/IP and have Ethernet hardware, goes something like:

Ethernet header (14 bytes)
Ethernet payload data (0-1500 bytes, see note 1), in this case:
    IP header (20 bytes)
    IP payload (0-1480 bytes), which in the case of TCP/IP is:
        TCP header (20 bytes)
        TCP application data (HTTP) (0-1460 bytes)
Ethernet checksum (4 bytes)


...where Ethernet refers to IEEE 802.3 (although WiFi, 802.11, is now also a common encapsulator). The maximum frame size imposed by Ethernet is 1518 bytes (14-byte header, 1500-byte payload, 4-byte checksum), which implies a maximum of 1500 bytes of payload in IP-over-Ethernet (or anything else embedded in Ethernet), and at most 1460 bytes of application data per packet in TCP/IP-over-Ethernet.


1 - actually, there is a minimum size for Ethernet frames: 64 bytes. If smaller, the frame will be considered a framing error called a runt, and discarded. Protocols like ARP have to pad their frames with filler data to avoid this. Also, with jumbo frames[1], the maximum size can be larger.
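The byte counts above are plain subtraction. A minimal sketch of that arithmetic, assuming option-free 20-byte IP and TCP headers and no VLAN tag (the constant and function names are made up for this illustration):

```python
ETH_HEADER = 14       # destination MAC, source MAC, EtherType
ETH_CHECKSUM = 4      # frame check sequence at the end of the frame
ETH_MAX_FRAME = 1518  # classic (non-jumbo) Ethernet maximum
ETH_MIN_FRAME = 64    # anything smaller is a runt
IP_HEADER = 20        # assumes no IP options
TCP_HEADER = 20       # assumes no TCP options


def max_payloads(max_frame=ETH_MAX_FRAME):
    """Return the largest (Ethernet, IP, TCP) payload that fits in one frame."""
    eth_payload = max_frame - ETH_HEADER - ETH_CHECKSUM
    ip_payload = eth_payload - IP_HEADER
    tcp_payload = ip_payload - TCP_HEADER
    return eth_payload, ip_payload, tcp_payload


def padding_needed(payload_len):
    """Return how many filler bytes a short Ethernet payload needs to avoid being a runt."""
    min_payload = ETH_MIN_FRAME - ETH_HEADER - ETH_CHECKSUM
    return max(0, min_payload - payload_len)


print(max_payloads())      # (1500, 1480, 1460)
print(padding_needed(28))  # a 28-byte ARP message needs 18 bytes of padding
```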

Cooperation

In practice, layers may cooperate somewhat. One of the more basic examples is ARP, which connects link layer addressing (which is fairly local) with network layer addressing: It maps IP addresses to MACs (and in many uses also remembers what port/interface it's on, for routing/switching practice). Yes, such cooperation means more work, and is often specific to specific combinations, but is sometimes very useful and sometimes simply necessary.

Note that the IP suite describes mainly layer 4 and 5. From its perspective, lower is the stuff that gets its messages delivered, and higher is the data that applications choose to interchange via IP. Since IP is practical enough to span the globe, most programmers can choose to deal with just one of these two layers, depending on what amount of control they want. Layer 3 is useful to some, for example network troubleshooters who need to dig around.


Packets and frames (also datagrams)

In practice, 'packet' and 'frame' are used interchangeably in the loose meaning of 'a chunk of data', and the meaning usually comes only from context.

Somewhat more strictly, frames refer to low levels only concerned with delivery, while packets are often semantically significant units of data to be moved. In the OSI/DoD layer model, you can say that layer 1 deals in bits, layer 2 deals with frames, layer 3 with packets.

'Ethernet frame', 'IP packet', 'TCP/IP packet'.

And e.g. 'UDP/IP datagram.' - datagrams are almost synonymous with packets, but the word is used to suggest cases of not-strictly-reliable delivery.


Packets are handed to lower layers as units, to be delivered as units. The lower layers will end up using frames for this. A packet could be larger than a frame allows, which means a packet is delivered using multiple frames.


Strictly speaking, you can still encapsulate packets in packets, and also frames in frames. The latter occurs because that which we refer to as an 'Ethernet frame,' is what we see at link layer, while on the wire (which is also often called a frame) it will be encapsulated in whatever the physical protocol dictates, which may add things like synchronization bytes.

Layer models

There are actually two layer models, the OSI model being the formal one, and the one from the IP suite (quite similar to the DoD model) the practical one - for IP use. It mostly ignores higher layers such as OSI's 6 and 7 - which are relatively rare to see used anyway.


Layer 1 (physical) and 2 (data link):

  • Handled by hardware protocols, which for users is usually Ethernet, or more specifically, various sections of the IEEE 802.3 standard.
  • Layer 1 and 2 are often linked because hardware and its management are often designed together for efficiency.
  • implicit data overhead at these levels (e.g. 14 bytes for the Ethernet II header, 4 for the ethernet checksum) is usually not counted in packet sizes, since it is unavoidable.


Layer 3 (network): (in the IP suite also called 'internet layer')

  • Networking logic: node addressing and basic/local routing are usually addressed here, often with some basic error control and segmentation
  • IP Suite protocol: IP (20-byte header)
  • Other protocols: IPX, ARP (IP-address-to-Ethernet-address lookups), RARP (its reverse), IGMP (multicast support protocol), RIP and other routing-related protocols


Layer 4 (transport):

  • Decisions such as whether to be connection-based or not, and may introduce concepts such as ports, in-order guarantees, flow control, more error handling and such.
  • IP Suite protocols: TCP (20-byte header), UDP (8-byte), ICMP (4-byte).
  • Other protocols: SPX (resembling TCP), NetBEUI, SCTP


Layer 5 is seen as 'application' in the IP suite, which there is the highest layer. That makes it everything you as a coder work with, and anything higher-level is just a program doing its thing. UDP and particularly TCP are rather practical for networking, so applications often build straight onto them, including everything from HTTP to SIP to SSH to DHCP to FTP to RTP to SMTP to SNMP to DNS to SOAP.



OSI, however, goes on up to seven layers and calls layer 5 session. In its view:

OSI layer 5 (session)

  • is usually not necessary, but protocols at this level tend to be protocols that add some functionality for a program to use, dealing with concepts like that of (simultaneous) streams, or:
  • describes tunneling and VPN provisions, such as in PPTP
  • may provide security and sessions, such as in SSL and SSH
  • may do low-latency provisions for multimedia, such as in RTP / RTCP
  • may make session setup easier to handle (more robust, more featured), such as in SIP
  • includes NetBIOS, which adds its own host naming, name service, connectionful and connectionless transfers. (though not so much large-scale routing, so IP is categorically more interesting)


OSI layer 6 (presentation)

  • Seems intended as a canonicalization step between applications on different platforms, such as in:
  • NCP (Netware Core Protocol) which drives most netware applications; it provides file access, access control, printing, statistics, and more


OSI layer 7 (application)

  • In the OSI view, this is all applications. That is, those that in the IP view would have been called 'layer 5, application.'

Bridging notes


Bridging interfaces in a node means things attached to either end will look, to each other, to be part of the same Ethernet segment.

Bridging can be contrasted with routing. Routing means two separate segments, a node that has an interface on both, and some explicit routing-rule reason to send each packet between them (or not to).

Since in a bridge all Ethernet traffic makes it back and forth (though you can add filtering of ethernet frames), it's nontrivial to determine that there is a bridge in a network at all.


Bridges speak STP to other bridges, so that you don't get weird loops and such.

What bridging code does with each frame depends on its table of learned MACs (the forwarding database, which is not the same thing as the ARP table). Basically, if the bridge knows a frame is already on the right side, it won't copy it to the other:

  • if the MAC is on the same side of the bridge, it ignores the frame
  • if the MAC is on another side of the bridge, it sends it there
  • if the MAC is unknown, it's sent to all parts of the bridge
...and we would usually see some traffic from that MAC soon (e.g. ARP packets), which lets us do better
  • if the MAC is us, it's passed to our own network stack (usually IP)
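The decision list above can be sketched as a tiny simulation. This is an illustration only, not how the Linux bridge code is written: the class and method names are made up, broadcast/multicast handling is simplified to "unknown destination", and entries never age out.

```python
class Bridge:
    """Toy model of a learning bridge's per-frame forwarding decision."""

    def __init__(self, own_mac, ports):
        self.own_mac = own_mac.lower()
        self.ports = list(ports)
        self.mac_table = {}  # MAC -> port it was last seen on

    def handle(self, in_port, src_mac, dst_mac):
        """Return the list of places a frame goes: port names, or 'local'."""
        src_mac, dst_mac = src_mac.lower(), dst_mac.lower()
        # Learn: the sender is evidently reachable via the port the frame came in on
        self.mac_table[src_mac] = in_port

        if dst_mac == self.own_mac:
            return ["local"]  # for us: hand to our own network stack
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood to every port except the incoming one
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            return []  # already on the right side, ignore
        return [out_port]


bridge = Bridge("02:00:00:00:00:01", ["eth0", "eth1"])
print(bridge.handle("eth0", "aa:aa:aa:aa:aa:01", "aa:aa:aa:aa:aa:02"))  # unknown, so flooded
print(bridge.handle("eth1", "aa:aa:aa:aa:aa:02", "aa:aa:aa:aa:aa:01"))  # now known to be on eth0
```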

(You can inspect details: brctl showmacs br0)


The bridge itself does not need to participate in the network, but it can if you see some reason. For example, in linux I usually configure the bridge device for DHCP, so that I can log in remotely via SSH.


The effects of using linux as a bridge (rather than choosing a hardware bridge):

  • you can use firewalling
  • you can monitor traversing traffic
  • you can use traffic shaping ((verify))
  • the bridge can participate in the network (however...)
  • the bridge can eat CPU on busy networks, because it handles pretty much all traffic it sees (NICs have to be in promiscuous mode)


Manual setup:

brctl addbr br0           # create bridge interface, arbitrary name (this is just a convention)
brctl addif br0 eth0      # add interfaces you want to bridge
brctl addif br0 eth1

# in case your node was not set up for routing before, you may want some of:
echo 1 > /proc/sys/net/ipv4/ip_forward
for devfile in br0 eth0 eth1;
do
  echo 1 > /proc/sys/net/ipv4/conf/${devfile}/proxy_arp;
  echo 1 > /proc/sys/net/ipv4/conf/${devfile}/forwarding;
done;
# if running multiple or redundant bridges on the same network, you likely want STP:
brctl stp br0 on
# you can inspect STP details with:
brctl showstp br0
# As for IP configuration, you likely want none on the bridged interfaces
ifconfig eth0 down
ifconfig eth1 down
ifconfig eth0 0.0.0.0 up
ifconfig eth1 0.0.0.0 up
# If you want to be able to connect to the bridge via SSH (or anything else),
#  then configure an IP on the bridge interface itself.
# In practice it may be preferable to configure DHCP. For manual config:
ifconfig br0 192.168.1.222 broadcast 192.168.1.255 netmask 255.255.255.0 up
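# and possibly a default route via your gateway
#  (192.168.1.1 is only an example; it should not be the bridge's own address):
route add default gw 192.168.1.1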


Automatic, debian/ubuntu /etc/network/interfaces

auto lo br0
iface lo inet loopback

iface eth0 inet manual

iface eth1 inet manual

# Bridge setup
iface br0 inet dhcp
    bridge_ports eth0 eth1


Notes:

  • The bridge will drop all packets for (by default) the first 30 seconds. This is meant for complex topologies, to avoid creating loops before STP can resolve them. (note: forwarding delay is enabled even when STP isn't) (verify)
    • When you know this won't be a problem (e.g. when you make a single bridge on your home LAN), you can lower or disable that delay. For example, to disable (for the debian-style config, look at up / post-up):
brctl setfd br0 0
  • when there is any possibility of creating loops, enable STP:
brctl stp br0 on
  • sethello 10


  • bridging with WiFi devices is possible but apparently somewhat complex
  • The bridge will take the MAC of one of its members (presumably the first?(verify))


Inspection

MAC knowledge of the bridge:

brctl showmacs br0


http://ebtables.sourceforge.net/br_fw_ia/br_fw_ia.html#section2

http://www.tldp.org/HOWTO/Ethernet-Bridge-netfilter-HOWTO-3.html


Fancier setups

https://wiki.debian.org/NetworkConfiguration#Bridging


Monitoring

Errors

Add bridge failed: Package not installed

You don't have kernel support for bridging.

Compile it in. It should be in 'Networking' → 'Networking options'.

can't add ppp0 to bridge br0: Invalid argument

Typically means that interface can't carry Ethernet.


See also


Semi-sorted

Glossary

  • NIC - Network Interface Card. Used whenever it's not a given that a single computer has only one connection (and in documentation).
  • interface usually refers to how drivers provide (and how software uses) a NIC. Regularly used synonymously with NIC.
  • adapter - Regularly used synonymously with NIC.
  • node - usually means 'a single box of hardware', or something else coherent enough to be seen as a single actor. May have multiple NICs (that are or aren't related to each other). Examples include computers, routers, and more.
  • frame, packet, segment, PDU
    • Theoretically/formally, frame means layer 2, packet means layer 3, segment means layer 4, and the generalizing term is PDU (Protocol Data Unit)
    • In practice, the terms are fuzzy and there is no hard difference, except that 'frame' suggests lower layers (physical and link) and 'packet' suggests higher levels (layer 3 and above).

On MACs and EUIs


A MAC address (Media Access Control address) is a 48-bit address, now called MAC-48, or rather EUI-48.

Also commonly called something like hardware address, adapter address, physical address, Ethernet Hardware Address (EHA) and probably some others.

Mostly known for its use in Ethernet cards, and actually somewhat broader: some other IEEE 802 networking requires use of MACs in the frames, including 802.11 (WiFi) and IEEE 802.5 (token ring). Others use it too, including Bluetooth, FDDI, Fibre Channel, and ATM.



EUIs are a somewhat wider concept.

They are also used by software and non-networking hardware.

From that perspective, EUIs are also MACs when they are used by networking hardware as node identifiers. Since they are assigned from the same numbering space, not many people or situations really care for the distinction.


The newer 64-bit addressing is usually called EUI-64 (not MAC-64).

EUI-64 is used by FireWire and ZigBee/802.15.4, and can be used in IPv6.


Most real-world MACs are enumerations within company-specific ranges, and companies assign each within their range at most to one physical device, which makes MACs globally unique.

There are also some MACs that are purely local - much like IP private net addresses are local (and independent from the same used in an unrelated network).


Further notes:

  • MAC-48 - the original scheme (IEEE considers MAC-48 an obsolete term)
  • EUI-48 can be seen as a later standardization(verify)
  • The 48-bit address is split either into:
    • 24 and 24: a 24-bit OUI (Organizationally Unique Identifier) with a 24-bit identifier from the organization
    • 36 and 12: a 36-bit IAB (Individual Address Block) or OUI-36, with a 12-bit identifier from the organization
  • EUI-64 is a larger address space
    • ...also a superset of EUI-48: there is a simple rule for converting existing EUI-48s to EUI-64, in which you can actually distinguish between EUI-48 and MAC-48 (middle octets are FFFE and FFFF, respectively), though RFC5342 notes that IETF uses FFFE for both sources.
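The conversion rule mentioned above amounts to inserting two octets in the middle. A minimal sketch, assuming the plain insertion only (IPv6's "modified EUI-64" additionally flips a bit, which is not done here); the function name is made up for this illustration:

```python
import re


def eui48_to_eui64(mac, filler="fffe"):
    """Insert the two filler octets between the two 24-bit halves of a 48-bit address.

    filler is 'fffe' for EUI-48 sources, or 'ffff' for what used to be called MAC-48.
    """
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError("%r does not look like a 48-bit address" % mac)
    eui64 = digits[:6] + filler + digits[6:]
    # Format as colon-separated octets again
    return ":".join(eui64[i:i + 2] for i in range(0, 16, 2))


print(eui48_to_eui64("00:11:22:33:44:55"))  # 00:11:22:ff:fe:33:44:55
```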


In the 48-bit addresses (EUI-64 has the same two bits) there are two special bits, both in the first octet:

  • lowest bit of the first octet (0x01; on Ethernet this is the first bit on the wire, since each octet is sent lowest bit first)
    • 0 means unicast
    • 1 means multicast (special handling, used in ethernet, FDDI)
  • second-lowest bit of the first octet (0x02)
    • 0 means universal - assigned through the OUI/IAB system
    • 1 means local - if MAC was assigned based on administration on the local network
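A quick way to check those two bits on a given address. In IEEE 802 addresses they are the two lowest bits of the first octet: 0x01 marks multicast (group) and 0x02 marks locally administered. This is a small sketch; the function name and the returned dict layout are made up for illustration.

```python
import re


def mac_flags(mac):
    """Report the multicast and locally-administered bits of a MAC address."""
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac)
    if len(digits) != 12:
        raise ValueError("%r does not look like a valid MAC address" % mac)
    first_octet = int(digits[:2], 16)
    return {
        "multicast": bool(first_octet & 0x01),  # individual/group bit
        "local": bool(first_octet & 0x02),      # universal/local bit
    }


print(mac_flags("01:00:5e:00:00:01"))  # an IPv4 multicast MAC: multicast, universal
print(mac_flags("02:00:00:00:00:01"))  # unicast, locally administered
```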


See also:

Wake On Lan

Hardware-wise
  • On-motherboard Ethernet that supports WOL should just work
  • PCI cards
    • may need a cable, which signals the motherboard to boot. Not all motherboards have such a connector.
    • PCI 2.2 can send it via power management events (support may vary?(verify))


Configuration:

  • PCs: Your BIOS should mention Wake on Lan (or some similar wording), usually in a power-related part of the BIOS.
  • Macs: ?


Triggers

Include...

  • Magic Packet (most common)
    • Contains the MAC address of the computer to wake, typically broadcast on the subnet that host is on
    • When WOL configuration doesn't mention any of this list, it'll likely be Magic Packet.
  • Specified pattern within packet
    • like magic packet, but configurable
  • SecureOn(verify), which is like Magic Packet, but adds 4 or 6 extra bytes to match
    • that password (-of-sorts) is not secure against eavesdropping. It seems mostly intended to make accidental waking and brute force waking harder(verify)
  • Link Change (e.g. "computer turns on when you plug its LAN cable into a switch"(verify))
  • There may be further options, e.g. reacting to specific types of traffic, but this is often impractical.


On Magic Packet, and the variant with password

A magic packet is any layer-2 packet that contains:

  • a synchronisation stream, meaning six 0xFF bytes, directly followed by...
  • sixteen copies of the target NIC's MAC, in binary form (6 bytes each)
    • I've seen code with more than sixteen copies of the MAC, which works just as well (though may conflict with the password feature).
  • Optional: If both the NIC and the packet sender support passwords, this means that 4 or 6 pre-determined bytes should follow the above sequence.

The above sequence may appear anywhere in the packet (presumably to be robust to varying protocol encapsulation), though in practice it's often much simpler - for example, in most cases it's a simple UDP/IP/Ethernet thing so the placement is very predictable (and local broadcast tools may specifically use EtherType 0x0842).
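To illustrate the "anywhere in the packet" matching, here is a sketch of how a receiver could scan raw bytes for the pattern. It is a software imitation of what the NIC does in hardware, not taken from any particular implementation, and the function name is made up.

```python
def find_magic_packet_target(payload):
    """Scan bytes for six 0xFF bytes followed by sixteen copies of one 6-byte MAC.

    Returns that MAC as a colon-separated string, or None if the pattern is absent.
    """
    sync = b"\xff" * 6
    start = payload.find(sync)
    while start != -1:
        mac = payload[start + 6:start + 12]
        repeated = payload[start + 6:start + 6 + 16 * 6]
        if len(mac) == 6 and repeated == mac * 16:
            return ":".join("%02x" % byte for byte in mac)
        # Not a match here; try the next possible position of the sync stream
        start = payload.find(sync, start + 1)
    return None


mac = bytes.fromhex("001122334455")
packet = b"some header" + b"\xff" * 6 + mac * 16 + b"optional password"
print(find_magic_packet_target(packet))  # 00:11:22:33:44:55
```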


While NICs look at all layer 2 frames they get, in practice a magic packet is

  • usually Ethernet frames, because that's what most end networks use
  • usually IP, convenient for its routing
  • often UDP/IP, because it's a connectionless fire-and-forget protocol
(regularly UDP/IP broadcast -- related to routing behaviour, see below)
  • regularly on port 7 or 9 (or sometimes 0), primarily because they are unused ports you can use (forward) for this specific functionality (and even if they are used it's fine, because historically port 7 is echo, 9 is discard)


Networking / routing, in general and particularly for web based WOL


Broadcasting is simple enough for same-subnet WOL, but things get more complex when you send WOL packets between subnets, which includes using web-based tools for WOL.


In general, your options for getting WOL packets where you want them to go, depending on your networking details, include:

  • broadcast the packet within the subnet
    • usually the easiest option within the same subnet
    • however
      • most network stacks will only create such a packet if they are themselves (currently) configured to be on the subnet mentioned in the packet (for fairly good reason)
      • devices acting as routers (and, in some cases, bridges) tend to filter out broadcast packets; broadcast packets won't work between (sub)nets (again, for fairly good reason)
  • Get a host on the target subnet to generate a broadcast packet for you. A number of modems/APs can do this, or be easily made to. Other always-on hosts (e.g. LAN servers) can be helpful too, and sometimes more flexible.
  • using a routable unicast packet for a specific host (unicast)
    • requires the router to have a static ARP entry for the target (because without it, that router will consider the packet non-deliverable when the host is off)
    • for LANs with private IP addresses, this also means a forwarding rule (which itself means one port can serve at most one host)
  • use VPN as a way of belonging to the specific subnet - which can in some situations be an easy solution, but requires the VPN endpoint to always be active. If that's your modem/AP that's easy, but that may be harder to configure than VPN in general.


Slightly more concretely, assuming you want to use IP for routing between subnets, you have roughly these options:

  • route as unicast to the intended host
    • depends on a static ARP entry (one for each WOL-wakeable host) so that the host is considered routeable on that subnet when the host is off
    • you must know the IP at sending time (not all WOL clients let you do this)
    • if you are on a private-IP subnet (10.*, 192.168.*, 172.16-31.*) this won't work as-is; you'd need to...
  • route as unicast, forward (DNAT) all use of a single port to a specific host
    • doesn't require you to know the IP, works for private-IP subnets
    • requires a forwarding route per host
    • requires that host having a non-changing IP (usually; depends on WOL-related features of forwarder)
    • needs a static ARP entry (per host) so that the host is considered routable on that subnet even when/though it is off
  • route as unicast until you get to the target network, then rewrite into a broadcast
    • Most devices and network stacks won't forward (DNAT) to a broadcast address, so you can rarely do this using just port forwarding (It's too easy to do bad/stupid things this way. A few devices are not clever enough to realize the risk, and allow it)(verify)
    • ...so usually needs a program to be the endpoint for WOL, and broadcast them locally - fairly easy to write, but it needs to be on an always-on host
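The "program to be the endpoint" in the last option could look roughly like the following: a small relay, meant to run on an always-on host inside the target subnet, that accepts unicast UDP and re-broadcasts anything that looks like a magic packet. This is a sketch under assumptions: the listening port 9999 is arbitrary, the names are made up, and there is no authentication or rate limiting, which you would probably want.

```python
import socket


def looks_like_magic_packet(data):
    """True if data starts with six 0xFF bytes followed by sixteen copies of one MAC."""
    if len(data) < 102 or data[:6] != b"\xff" * 6:
        return False
    return data[6:102] == data[6:12] * 16


def run_relay(listen_port=9999, broadcast_port=9):
    """Receive unicast UDP and re-broadcast valid magic packets on the local subnet."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("", listen_port))
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    while True:
        data, sender = rx.recvfrom(1024)
        # Only relay things that are actually magic packets, to avoid
        # becoming a general-purpose broadcast amplifier
        if looks_like_magic_packet(data):
            tx.sendto(data, ("<broadcast>", broadcast_port))
```

You would forward a single port on the router to this host and call run_relay(); the relay then serves any number of wakeable hosts, since the MAC is in the packet itself.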


WOL senders


It's easy to generate magic packets - given socket-level networking. It takes at most a few dozen lines of python, perl, ruby, PHP, etc.

For example, in python:

def broadcast_wol(mac, password_bytes=b'', ports=9):
    ''' Locally broadcasts a Wake-on-LAN packet (Magic Packet) with the given MAC.
        Written for Python 3.
        Arguments: 
         mac             should be a string. Will be stripped of everything not [0-9A-Fa-f], 
                         so you can use most any form of MAC strings.
         password_bytes  if used, should be a bytes object (typically 4 or 6 bytes)
         ports           can be an integer, or iterable of integers, e.g. [7,9]
    '''
    import socket, re
 
    # Remove all but hex digits so that we're robust to MACs with separator characters
    mac = re.sub(r'[^0-9A-Fa-f]', '', mac)
    if len(mac) != 12:
        raise ValueError('%r does not look like a valid MAC address' % mac)
 
    # Construct packet in hex form. It's easier and a little more readable
    hex_data = 'FF'*6         # synchronisation stream
    hex_data += mac*16        # sixteen copies of the target MAC
    packet_data = bytes.fromhex(hex_data)  # ...then turn into the bytes we want
    packet_data += password_bytes
 
    # Use low-level socket interface to broadcast the result via UDP
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)    # UDP socket
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1) # Broadcast socket
    if isinstance(ports, int):
        ports = [ports]
    for port in ports:
        sock.sendto(packet_data, ('<broadcast>', port)) # '<broadcast>' refers to INADDR_BROADCAST, 255.255.255.255
    sock.close()



NIC, protocol address, and name resolution


There are different things that resolve.

Ethernet and IP; ARP

Say our LAN is Ethernet, as most are.

The unit of communication at ethernet level is the ethernet frame, which has a source MAC and destination MAC, where a MAC is a unique identifier for a physical network port.

So when you want to send to another ethernet node, you must first learn that unique identifier. There were a few different ways of doing this (all basically lookups of something more convenient than MACs), but we generally landed on doing everything with IP.


So say we want IP on top of it.

This means ARP, essentially the glue between ethernet (layer 2) and IP (layer 3). Conceptually, ARP consists mostly of two messages:

"hey guys, what's the MAC for this IP address?"
"that would be me - this IP and this MAC"


For hosts

The question is a broadcast message, meaning everyone gets it - at least within the same Ethernet segment (which typically corresponds to an IP subnet). The answer is usually sent directly to whoever asked. This means

  • you don't get all the world's ARP messages, which is sane and good
  • that you need a way to send messages to other ethernet segments
    • which is mostly solved at IP level -- but relevant in some of the below, it means you'll have an IP (and soon the according MAC) usually used for "all other traffic".
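For a feel of what the broadcast "what's the MAC for this IP address?" question looks like on the wire, here is a sketch that builds (but does not send) an ARP request as an Ethernet frame. The field layout follows the standard ARP-over-Ethernet format; the function names and example addresses are made up, and the 4-byte Ethernet checksum is left out because the NIC normally adds it.

```python
import socket
import struct


def mac_to_bytes(mac):
    """Turn 'aa:bb:cc:dd:ee:ff' (or with dashes) into 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", "").replace("-", ""))


def build_arp_request(sender_mac, sender_ip, target_ip):
    """Build an Ethernet frame (without checksum) asking who has target_ip."""
    sender_hw = mac_to_bytes(sender_mac)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,            # hardware type: Ethernet
        0x0800,       # protocol type: IPv4
        6, 4,         # MAC length, IP address length
        1,            # operation: 1 = request ("who has?")
        sender_hw, socket.inet_aton(sender_ip),
        b"\x00" * 6,  # target MAC: unknown, that's the question
        socket.inet_aton(target_ip),
    )
    # Ethernet header: broadcast destination, our MAC, EtherType 0x0806 (ARP)
    header = b"\xff" * 6 + sender_hw + struct.pack("!H", 0x0806)
    frame = header + arp
    # Pad to the 60-byte minimum (64 minus the 4-byte checksum) so it's not a runt
    frame += b"\x00" * max(0, 60 - len(frame))
    return frame


frame = build_arp_request("02:00:00:00:00:01", "192.168.1.10", "192.168.1.1")
print(len(frame), frame.hex())
```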


For networking hardware

An ethernet node isn't necessarily aware of IP.

In particular for networking hardware. Before switches were the norm, you had hubs.

Hubs were Ethernet-only and not IP-aware - and didn't need to be, remember how layers make life simpler. But because a hub doesn't know on which port the target host is connected, it just duplicates all incoming ethernet frames on all other ports.

Hubs are rarely made anymore, because ethernet switches are much more efficient, for very little extra effort.

Switches don't need to be IP-aware either. They look at the source MAC of every frame that comes in (ARP responses included, where they can ignore the IP part), because they just want to learn which physical port each MAC is behind.

Switches act like hubs only for packets where the destination MAC isn't yet known to be on a port. Which is fine, both because that's a small part of all traffic, and because they will typically learn it very soon.


Uses and abuses

Beyond making IP over ethernet work, there are ARP-related tricks and attacks, such as

  • having multiple IPs on a physical port
  • having multiple hosts have the same IP
though only under very specific conditions
  • claiming all unused IPs for sniffing purposes
  • sending many random ARP responses, implicitly flushing real entries out of the limited-size address tables
basically reducing the switch to a hub, making it easier to snoop on traffic from hosts that are not the packet's actual target (but only from another host on the same switch/segment).
  • learning the IP of the gateway, then sending around ARP answers with incorrect MACs
so that you're the only device left for which IP routing still works.


A lot of these are possible because ARP is connectionless and stateless, implying answers without questions are trusted. ARP spoofing is abusing that fact.

There are switches that alleviate many of these attacks, because some things (e.g. a port basically claiming to have thousands of hosts) are pretty easy to detect.



Server names; DNS

Name to IP address resolution that particularly enables you to use human-rememberable names in your browser address bar (but can potentially serve anything IP-based), is actually fairly unrelated to basic networking. IP numbers are used for actual communication, and this service simply tells you the IP(s) for a name.

In fact, there are tricks that allow one IP to have multiple names, and one name to cycle through pointing to various nodes (a type of load balancing).

There are other ways to do the same resolution step, for example the hosts file on at least unices (/etc/hosts) and windows (under the system directory, typically %SystemRoot%\System32\drivers\etc\hosts), which essentially hardcodes a name to an IP address. This can interfere when incorrectly used, though. Some protocols, such as SMB (windows sharing / CIFS / samba) also allow netbios, wins and lmhosts to provide addresses for names - IP or netbios.


Netbios names; WINS

In some way analogous to both of the above are netbios names. Netbios was used by the early windows versions to provide naming on local networks without DNS. NetBIOS names work something like ARP, but a level higher.

NetBIOS could also be used on top of IPX/SPX, but when it is used with IP, the resolution system is WINS, which resolves NetBIOS names to IP addresses. There is an lmhosts file that allows you to hardcode these. This is used in file sharing based on UNC paths, in which you can use IP addresses as well as netbios names.

Packet size, MTU and MSS, fragmenting

On the wire, the packet/frame's size is everything together, though people often list a size at the layer they are dealing with, or most commonly deal with.


Classical Ethernet's frames are at most 1518 bytes, with 18 bytes for its own purposes, so each IP packet can be at most 1500 bytes.


MTU (Maximum Transmission Unit) and MSS (Maximum Segment Size) refer to upper limits on sizes. MTU refers to the encapsulated whole, while MSS refers to how much payload is in it.

Strictly speaking, MTU is not tied to one layer (MSS is in practice a TCP term), so ideally it should be mentioned at what layer it reports. In practice, you often have to guess from context.

An MTU of 1500 suggests an Ethernet device.

For example, TCP (20 byte header) over IP (20 byte header) over Ethernet (IP MTU 1500) means the TCP MSS for this case is 1460 bytes.
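The same arithmetic as a small sketch. The header sizes assume no IP or TCP options; the IPv6 header size (40 bytes) is added here for comparison and is not from the text above.

```python
ETHERNET_IP_MTU = 1500   # largest IP packet a classic Ethernet frame carries
IPV4_HEADER = 20         # bytes, without IP options
IPV6_HEADER = 40         # bytes, fixed header only (added for comparison)
TCP_HEADER = 20          # bytes, without TCP options


def tcp_mss(ip_mtu, ip_header=IPV4_HEADER, tcp_header=TCP_HEADER):
    """Largest TCP payload that fits in one unfragmented IP packet."""
    return ip_mtu - ip_header - tcp_header


if __name__ == "__main__":
    print("IPv4 over Ethernet:", tcp_mss(ETHERNET_IP_MTU))
    print("IPv6 over Ethernet:", tcp_mss(ETHERNET_IP_MTU, IPV6_HEADER))
    print("IPv4, 9000-byte jumbo MTU:", tcp_mss(9000))
```

TCP options (timestamps, for example) take their space out of this payload, so the usable size per segment is often a little lower in practice.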


While we often think of IP having an MTU of 1500, this is more about typical backwards-compatible setups and the internet than about the IP protocol.

Technically, IPv4 has a maximum packet size of 65535 bytes, though no one really wants to use that, partly because over almost any real network it will get fragmented, and partly because the effectiveness of the CRC check (Ethernet's CRC-32; IP itself only has a simple header checksum) becomes low when frames go over ~11K (see [2]).

On a gigabit LAN, though, jumbo frames of something like 9000 bytes can mean your practical speed gets a lot closer to 1Gbit/s than without them (at best; it also depends on other efficiency issues, including the program/protocol's cleverness).


Using 1Gbit/s at full speed with 1.5K frames means ~80k packets/sec, which means a lot of driver overhead, and in one-packet-per-interrupt setups is actually quite heavy (interrupts by their nature have to pause other work, so make for general sluggishness). Jumbo frames are one way to lessen that, but in the real world often help only a few percent. Other offloading methods tend to work better (and most are standard now).
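A back-of-the-envelope version of that packet rate, as a sketch. It counts the 18 bytes of Ethernet header and checksum mentioned earlier, plus the preamble (8 bytes) and inter-frame gap (12 bytes), which are standard Ethernet values assumed here rather than taken from these notes.

```python
HEADER_AND_FCS = 18      # Ethernet header plus frame check sequence
PREAMBLE_AND_SFD = 8     # sent before every frame (assumed standard value)
INTERFRAME_GAP = 12      # idle time between frames, in byte times (assumed)


def wire_bytes(payload_bytes):
    """Bytes of link time one frame occupies, including per-frame overhead."""
    return payload_bytes + HEADER_AND_FCS + PREAMBLE_AND_SFD + INTERFRAME_GAP


def frames_per_second(link_bps, payload_bytes):
    """Frame rate when the link is completely full of same-sized frames."""
    return link_bps / (wire_bytes(payload_bytes) * 8)


def payload_efficiency(payload_bytes):
    """Fraction of link time that carries payload rather than overhead."""
    return payload_bytes / wire_bytes(payload_bytes)


if __name__ == "__main__":
    for payload in (1500, 9000):
        rate = frames_per_second(1_000_000_000, payload)
        print(payload, int(rate), "frames/s,",
              round(payload_efficiency(payload) * 100, 1), "% payload")
```

The frame rate drops by roughly a factor of six with 9000-byte frames, while the share of the link spent on payload rises by only about two percentage points. That fits the observation that the gain is mostly in per-packet CPU work, not in raw throughput.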


Fragmenting means that packets (usually network layer) are too big for a lower layer's frames (usually the link and/or physical layer, since those are most likely to impose size limits) and need to be split up before they can be sent.


Usually there is a common maximum transmission unit (the MTU; for Ethernet it's 1500) that cannot be exceeded at all, or not without all parties agreeing on it. When an IPv4 packet is too large for a particular link, the router in front of that link either fragments it into smaller packets or rejects it (if the packet's don't-fragment flag is set, it is dropped and an ICMP error is sent back). Fragments are reassembled only at the final destination, not at the other end of that link. (IPv6 routers never fragment; only the sender may.) The data arrives, but more work has to be done, and transmission time is wasted.
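How an oversized IPv4 packet gets split can be sketched as follows. This is a simplified model, not a real IP stack: it assumes a 20-byte header without options and only computes sizes and offsets. The one real-protocol detail it keeps is that fragment offsets are counted in 8-byte units, so every fragment but the last carries a payload that is a multiple of 8 bytes.

```python
def fragment_ipv4(total_length, mtu, header=20):
    """Split one IPv4 packet into fragments that fit within `mtu`.

    Returns a list of (offset_in_8_byte_units, fragment_length, more_fragments).
    """
    if total_length <= mtu:
        return [(0, total_length, False)]       # fits, no fragmentation needed

    # Largest payload per fragment, rounded down to a multiple of 8 bytes.
    max_payload = (mtu - header) // 8 * 8
    if max_payload <= 0:
        raise ValueError("MTU too small to carry any payload")

    payload = total_length - header
    fragments = []
    sent = 0
    while sent < payload:
        size = min(max_payload, payload - sent)
        more = sent + size < payload            # the "more fragments" flag
        fragments.append((sent // 8, header + size, more))
        sent += size
    return fragments


if __name__ == "__main__":
    for fragment in fragment_ipv4(4000, 1500):
        print(fragment)
```

Each fragment repeats the IP header, which is part of the wasted transmission time: the 4000-byte packet above becomes 4040 bytes of IP packets in total.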

Fast Ethernet can be set up to 1546, gigabit Ethernet up to 9000. The internet used to have 576 in the modem days, but is mostly at 1500 now(verify). And, frankly, all of those are rather low for anything faster than slow broadband (but unless raising it is standardized across the board, it is pointless, as it would just lead to fragmentation).

Incidentally, the overhead involved in decoding and ACKing many small frames rather than fewer big frames has been shown to be considerable, which is one of the reasons for larger MTUs. See e.g. [3]


Fragmenting also happens when you tunnel protocols via higher-level layers: you wrap lower-layer packets in higher-layer packets, which adds some extra size through the encapsulation. If the packet was exactly right for the MTU before, it will now fragment into one fully used frame and one almost empty frame.

In this sort of situation, it's best to lower the MTU of the interface that is in reality tunneled (to, say, 1450), to avoid that inefficiency.
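The effect can be sketched with some arithmetic. The 50 bytes of encapsulation overhead is a hypothetical figure chosen to match the 1450 example (real tunnels differ), and it is assumed to include the outer 20-byte IPv4 header.

```python
def outer_packets(inner_packet, overhead, link_mtu, outer_ip_header=20):
    """Sizes of the IP packets actually sent for one tunneled inner packet.

    `overhead` is the total encapsulation size, outer IP header included.
    """
    outer = inner_packet + overhead
    if outer <= link_mtu:
        return [outer]                          # fits in a single packet

    # Otherwise the outer packet is fragmented; payload per fragment must
    # be a multiple of 8 bytes (except in the last fragment).
    per_fragment = (link_mtu - outer_ip_header) // 8 * 8
    payload = outer - outer_ip_header
    sizes = []
    while payload > 0:
        chunk = min(per_fragment, payload)
        sizes.append(outer_ip_header + chunk)
        payload -= chunk
    return sizes


if __name__ == "__main__":
    print("tunnel MTU 1500:", outer_packets(1500, 50, 1500))
    print("tunnel MTU 1450:", outer_packets(1450, 50, 1500))
```

With the tunnel interface left at 1500, every full-sized packet turns into a full packet plus a tiny one. With it lowered to 1450, each one fits the underlying link exactly.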


On jumbo frames

Jumbo packets / jumbo frames mean using frames of ~4KByte or ~9KByte instead of 1.5KByte.

The 1.5KByte size is much internet-friendlier, in that larger frames would each get fragmented as soon as they leave the local network.


Jumbo frames upsides:

  • lower CPU use
  • lower interrupt load
  • both of which can lead to closer-to-theoretical performance

Downsides:

  • slightly higher latency (a packet takes a little longer to transmit)
    • at 100Mbit (and lower) this is more noticeable
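To put numbers on "a little longer", here is a sketch that computes how long it takes to put one frame on the wire. It looks only at this serialization time and ignores queueing, switching and propagation delays.

```python
def serialization_delay_us(frame_bytes, link_bps):
    """Microseconds needed to clock one frame onto the wire."""
    return frame_bytes * 8 * 1_000_000 / link_bps


if __name__ == "__main__":
    for name, bps in (("100Mbit", 100_000_000), ("1Gbit", 1_000_000_000)):
        for size in (1500, 9000):
            delay = serialization_delay_us(size, bps)
            print(name, size, "bytes:", delay, "microseconds")
```

At 1Gbit the difference is 12 versus 72 microseconds per frame. At 100Mbit it is 120 versus 720 microseconds, which starts to matter for latency-sensitive traffic that has to wait behind a large frame.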

Pragmatics:

  • use jumbo packets only if all sides support it (and are using same-sized jumbo packets)
    • including managed switches, and you may wish to pay attention to VLAN tag size and such

On congestion


QoS

In a broad sense, Quality of Service (QoS) is the concept of trying for certain types of guarantees, in telephony as well as in data networking.

Apparently the term QoS was first used in ITU X.902, but the term is used in various standards now, with varying and sometimes broad definitions.


The guarantees are often something like low latency, low latency jitter (e.g. for VoIP, videoconferencing), guaranteed bandwidth (e.g. for TV over IP), and such.


Approaches


TOS

A common trick when using IPv4 is to reuse the TOS (Type of Service) byte in its header, which was never used much for its original purpose. It is now used to encode data to support:

  • Differentiated services (also 'DiffServ'), in the upper six bits; the QoS needs map onto this fairly decently
  • Explicit Congestion Notification (ECN), in the lower two bits

Fancy(-ish) routers support this, as do modern unices and some recent Windows implementations.
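As a sketch of how that byte is laid out: the DiffServ code point (DSCP) occupies the upper six bits and ECN the lower two. The helper names below are made up for this example. mark_socket uses the IP_TOS socket option, which unices offer, but whether the marking is honored (or even kept) along the path depends on the OS and on the network, so treat it as an assumption rather than a guarantee.

```python
import socket

DSCP_EF = 46        # "expedited forwarding", commonly used for voice traffic
ECN_NOT_ECT = 0     # sender does not claim ECN support


def tos_byte(dscp, ecn=ECN_NOT_ECT):
    """Combine a 6-bit DSCP and a 2-bit ECN value into the old TOS byte."""
    if not 0 <= dscp < 64 or not 0 <= ecn < 4:
        raise ValueError("dscp must fit in 6 bits and ecn in 2 bits")
    return (dscp << 2) | ecn


def split_tos(tos):
    """Return (dscp, ecn) for a TOS byte."""
    return tos >> 2, tos & 0b11


def mark_socket(sock, dscp):
    """Ask the OS to send this socket's IPv4 packets with the given DSCP."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos_byte(dscp))


if __name__ == "__main__":
    print(tos_byte(DSCP_EF), split_tos(tos_byte(DSCP_EF)))
```

A program that wanted its UDP traffic treated as voice could call mark_socket(sock, DSCP_EF) right after creating the socket.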



VPN

See VPN notes


Spanning trees

The Spanning Tree Protocol (STP) is a layer 2 protocol that ensures a loop-free topology.

In general, if a frame passes the same place twice, you have a problem. One possible cause of this is redundant cabling (for failover) that is bridged together, since that creates a switching loop: data coming in on one link will be sent out on the other, which duplicates it. Since Ethernet frames carry no hop limit, broadcasts in particular keep circulating and effectively flood the network.

Some switches may disable ports that seem to send back everything they get, to avoid this problem, but this is not a generic solution.


STP and its variants collect and update information about all the links in the (mesh) network, and calculate a spanning tree (which connects everything without loops), so that they can disable any links that are not part of that spanning tree.
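The core idea can be sketched with a breadth-first search over the switch topology. This is not how STP itself is implemented: real STP is distributed, elects the root bridge by bridge ID, and weighs links by path cost using messages exchanged between switches. The sketch only shows what "a tree that connects everything, with the remaining links disabled" means; the switch names are made up.

```python
from collections import deque


def spanning_tree(links, root):
    """Return (active, blocked) sets of links, found by BFS from `root`.

    `links` is a list of (switch_a, switch_b) pairs; each link is returned
    as a frozenset so that direction does not matter.
    """
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    visited = {root}
    active = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for other in sorted(neighbours.get(node, [])):
            if other not in visited:            # first path found wins
                visited.add(other)
                active.add(frozenset((node, other)))
                queue.append(other)

    all_links = {frozenset(link) for link in links}
    return active, all_links - active


if __name__ == "__main__":
    # Three switches cabled in a triangle: one link is redundant.
    triangle = [("sw1", "sw2"), ("sw2", "sw3"), ("sw1", "sw3")]
    active, blocked = spanning_tree(triangle, "sw1")
    print("active: ", sorted(sorted(link) for link in active))
    print("blocked:", sorted(sorted(link) for link in blocked))
```

If an active link fails, rerunning the calculation without it brings the blocked link back into use, which is the automatic fallback behaviour.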

This lets you connect things whatever way you want, including bridge-style redundant links with automatic fallbacks onto the spare/redundant link, without causing the flooding problem.


The fact it works transparently means data centers can (re)organize their cabling without causing downtime when you yank a cable.

Variants include the Rapid Spanning Tree Protocol (RSTP), Per-VLAN Spanning Tree (PVST), Multiple Spanning Tree Protocol (MSTP), Rapid Per-VLAN Spanning Tree (R-PVST), and others.


Note there are other ways of having two cables to a host. For example, you could try to double the bandwidth to a server by treating its two NICs as separate (and unbridged) interfaces that just happen to be on the same network.



Link aggregation

Multicast, anycast, broadcast

Unicast means data should go to one specific target.

Multicast means data should go to multiple endpoints (those that subscribed to a group address).

Broadcast means addressing all (local) endpoints.

Anycast means routing to the (topologically) closest node that has the target address, which can be useful for load balancing, decentralized servicing, and redundancy.
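For IP addresses, part of this distinction is visible in the address itself, which the sketch below checks with Python's ipaddress module. The function name and return labels are made up for this example. Anycast cannot be recognized this way: an anycast address looks like any unicast address, and the difference is only in how it is routed.

```python
import ipaddress


def describe(address, network):
    """Classify `address` as seen from a host on `network` (both strings)."""
    addr = ipaddress.ip_address(address)
    net = ipaddress.ip_network(network)
    if addr.is_multicast:
        return "multicast"
    if addr.version == 4:
        if addr == ipaddress.ip_address("255.255.255.255"):
            return "broadcast"
        # The highest address of a subnet is its broadcast address
        # (except on /31 and /32, which have no room for one).
        if net.version == 4 and net.prefixlen < 31 and addr == net.broadcast_address:
            return "broadcast"
    return "unicast or anycast"


if __name__ == "__main__":
    lan = "192.168.1.0/24"
    for candidate in ("192.168.1.20", "192.168.1.255", "224.0.0.251"):
        print(candidate, "->", describe(candidate, lan))
```

IPv6 has no broadcast at all and uses multicast groups (such as ff02::1, all nodes on the link) for the same purpose.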


On encryption

Things like encryption can be done at every level above physical. In the IP-stack view, TLS (formerly SSL) is snuck in between layer 4 and the application protocol: the real protocol runs inside a TLS session. This is why there is often a non-secure and a secure version of the same protocol (e.g. HTTP and HTTPS). In the 7-layer view, TLS can be said to be spread over multiple cooperating layers, mostly layer 6.



IPX, SPX, NWLink

IPX (Internetwork Packet Exchange) was a layer 3 protocol fairly commonly used up to the mid-nineties, which is approximately when the IP stack started replacing it, partly just because of convenience. (Note that while IP has a similar role to IPX, the two names and technologies are not particularly related.)

SPX is a layer 4 protocol used on top of IPX.

Windows later added NWLink, which was an implementation of SPX/IPX, and NetBIOS on top of that.


IPX was commonly used for Netware, and by various DOS-era games.

NetBIOS was often carried over IPX/SPX, and later also over TCP/IP. More recently it was used primarily for SMB file sharing, which is currently done mostly over IP instead.


These days, IPX mostly matters to people wishing to play old games. IPX in DOS was a somewhat complex matter to set up. NWLink was supported from Win95 up to WinXP, but was dropped in Vista.

People report that in 32-bit Vista, you can copy in XP's (32-bit) drivers (see things like [4]).

Unsorted


Where to find nice images like: http://www.javvin.com/links.html ?


An IP-stack socket is effectively a 5-tuple: (protocol, local_address, local_port, remote_address, remote_port)
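A sketch of why all five fields are needed, using a plain tuple type as a lookup key. The class name and the addresses (taken from documentation ranges) are made up for this example; this illustrates the idea and is not how any particular OS stores its sockets.

```python
from typing import NamedTuple


class SocketKey(NamedTuple):
    protocol: str
    local_address: str
    local_port: int
    remote_address: str
    remote_port: int


# Two connections from the same client to the same server port:
# only the client's (remote) port differs, and that is enough.
first = SocketKey("tcp", "192.0.2.1", 443, "198.51.100.7", 50000)
second = SocketKey("tcp", "192.0.2.1", 443, "198.51.100.7", 50001)

connections = {first: "connection A", second: "connection B"}


def demultiplex(table, key):
    """Find which connection an incoming packet belongs to, if any."""
    return table.get(key)


if __name__ == "__main__":
    print(demultiplex(connections, second))
```

This is what lets one server port (443 here) serve many clients at once, and one client hold several connections to the same server.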




To physically identify a network card you know the interface name of:

ethtool -p eth1

This steadily blinks the LEDs on that port.


(In my case it also used a different color; ports tend to have two two-color LEDs.)


Channel bonding