Caching on Named Data Network: a Survey and Future Research

ABSTRACT


INTRODUCTION
The era of telecommunications began in 1876, when a network was built that enabled two parties to transmit their voices and communicate. The Internet began in 1969, funded by the Advanced Research Projects Agency [1]. Using the Internet Protocol (IP) address, a request from the user is forwarded to the server through other nodes within the network. The reply to the request is sent to the user along a particular path formed by the routing process in the network. If another user requests the same data, the packet is sent again from the server to that user. This makes packet delivery inefficient because the packet is always sent from a server that is far from the user. To solve this problem, the concept of the Content Distribution Network [2] was proposed. A replica server is created that contains all the data held on the main server and is placed at a fixed location closer to the user, so that requests for certain content are redirected to the replica server and no longer need to be served by an origin server that is farther away.
The replica server is updated periodically or whenever content changes on its origin server. However, such a system has difficulty supporting mobility and dynamically changing content requests from consumers. When a consumer moves away from the replica server, the consumer may no longer be served efficiently by it. Because the Content Distribution Network is still based on the Internet Protocol (IP), a user's request is always addressed to a particular server. Consequently, an additional process is still needed to map the intended IP address to the position of the server that is [...]

[Figure: flowchart of Interest processing in an NDN node. Start; check the Content Storage (if the requested content is there, send the content to the consumer); check the PIT (if there is information about the requested content, update the PIT); check the FIB (if there is information about the requested content).]

Content Storage
Content Storage (CS) is one of the important components of an NDN router node. The CS allows data to be stored in NDN router nodes, so that a consumer's request for content does not have to be served by a particular server but can be served by any router node that holds the content in its CS. The CS is a limited resource on NDN routers and should therefore be used as efficiently as possible to improve NDN performance. The size of the content store affects the delay and the number of hops that packets must take to reach the consumer [26], which in turn affects the overall network load caused by data circulating in the network [4], [8]. The effect of the CS also varies with the cache policy implemented in the node [27]. In this paper, caching strategies are classified into cache placement, cache content selection, cache policy design, and caching for the mobile node. Each group is described, including its advantages and drawbacks, in sections 3.3 to 3.6.
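The role of the CS in serving a request can be illustrated with a minimal sketch of the lookup order CS, PIT, FIB. The class name NdnNode, the dictionary-based tables, the face labels, and the decision to cache every returning data packet are assumptions made for illustration; they are not taken from any specific NDN implementation.

```python
from dataclasses import dataclass, field


@dataclass
class NdnNode:
    """Toy NDN node (hypothetical) with three tables checked in order: CS, PIT, FIB."""
    cs: dict = field(default_factory=dict)    # content name -> data
    pit: dict = field(default_factory=dict)   # content name -> set of requesting faces
    fib: dict = field(default_factory=dict)   # name prefix -> outgoing face

    def on_interest(self, name, in_face):
        # 1. Content Storage: a hit is answered locally by this router.
        if name in self.cs:
            return ("data", self.cs[name])
        # 2. PIT: the same content is already pending, so only record the face.
        if name in self.pit:
            self.pit[name].add(in_face)
            return ("aggregated", None)
        # 3. FIB: longest-prefix match on the hierarchical name.
        parts = name.strip("/").split("/")
        for end in range(len(parts), 0, -1):
            prefix = "/" + "/".join(parts[:end])
            if prefix in self.fib:
                self.pit[name] = {in_face}
                return ("forwarded", self.fib[prefix])
        return ("dropped", None)

    def on_data(self, name, data):
        # Assumption: every returning data packet is cached (no selection).
        self.cs[name] = data
        return sorted(self.pit.pop(name, set()))
```

In this sketch a second request for the same name is answered from the CS of the router, which is the behavior the paragraph above describes; which data is admitted to the CS and which is evicted is left to the strategies surveyed in the following sections.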

Cache Placement
Cache placement focuses on determining which nodes will store a data packet. For publisher/subscriber networks, a method has been proposed that chooses the nodes to store packets based on local content popularity and the content storage capacity of each node [28]. In NDN networks, packets are initially placed on every node in the network so that the consumer can access the content directly from the closest node.
Paper [11] proposes a data packet flooding mechanism in which data packets are stored in all nodes on the best path, limited by a maximum number of hops over which the packet may spread. In paper [12], the packet is distributed for storage across network nodes while a Bloom filter is used to ensure that no redundant copies are kept, which saves resources. The weakness of the Bloom filter technique, namely false positives, was corrected by A. Hidayat et al. [13], who combine the Bloom filter with a sequential search algorithm.
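The idea of pairing a Bloom filter with a sequential search can be sketched as follows. This is only an illustration of the general principle, not the algorithm of [12] or [13]: the class names, the SHA-256 based hashing, and the filter sizes are assumptions. The Bloom filter gives a fast "definitely new" answer, and a positive answer is confirmed by a linear scan so that a false positive does not cause a new packet to be rejected.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter; sizes and hash construction are illustrative assumptions."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, name):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{name}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, name):
        for pos in self._positions(name):
            self.bits[pos] = True

    def might_contain(self, name):
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[pos] for pos in self._positions(name))


class DedupStore:
    """Stores content names once; a Bloom hit is confirmed by sequential search."""

    def __init__(self, bloom):
        self.bloom = bloom
        self.names = []

    def store(self, name):
        """Return True if the name was newly stored, False if it was a duplicate."""
        # `name in self.names` is a linear (sequential) scan of the list.
        if self.bloom.might_contain(name) and name in self.names:
            return False
        self.bloom.add(name)
        self.names.append(name)
        return True
```

With a deliberately tiny filter, for example `BloomFilter(num_bits=1, num_hashes=1)`, every lookup after the first insertion is a false positive, yet new names are still stored because the sequential search has the final word.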
Paper [29] proposes a mechanism that combines packet insertion and packet deletion by adding a Caching Contribution parameter to the interest packet. Each node decides whether or not to cache the data packet. If the data packet cannot be cached on a given node, it is forwarded to another node. A trail mechanism is built to store information about the path to the next node that can store the content. In paper [10], nodes that often receive content requests from consumers have a high contribution value. A node stores content that has a high contribution value if storage capacity is available. Paper [9] proposes moving data toward the edge router closest to the consumer for every specific content request. Cache placement techniques can be summarized into three groups, i.e. function-based, diversity, and flooding, as shown in Figure 2. A comparison of the three techniques, including their technical focus, advantages, and drawbacks, is given in Table 1.

Cache Content Selection
The cache content selection techniques focus on determining which content will be cached and which content should be removed from the cache. Content selection techniques include Caching Everything Everywhere (CEE) [3], [23], in which each node stores all of the data from the producer, meaning that no content selection takes place, and Prob(p) [3], [5], [23], in which data is cached with probability p and not cached with probability 1-p. As a result, the data packets cached by one router may differ from those cached by other routers. Paper [30] proposes that every router caches data with a probability determined by the number of hops between the producer and the router. Paper [31] proposes selecting the content to be cached based on a prediction of whether the content will be requested by local consumers. Related to cache content selection, content-centric network performance is also affected by CS replacement rules and user localization [32]. Cache content selection is summarized in Table 2.

Table 2. Comparison of cache content selection techniques

Technique: Probability-based
Focus: Emphasizes selecting packets/content with a certain probability; a packet is cached with a certain probability.
Advantages: Fairer in determining which packet is cached or deleted.
Drawbacks: Some nodes may store the same content. A specific strategy is needed to determine the probability.

Technique: Prediction-based [34], [31]
Focus: Tightens the selection of packets based on a prediction of whether selecting the content will meet the target value that has been set.
Advantages: Avoids storing unnecessary content. Accommodates the future needs of the user.
Drawbacks: The prediction may be incorrect if the condition of the network or the user changes. The internal calculation in the router is more complex.
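The probabilistic admission rule can be written in a few lines. The sketch below shows Prob(p) and one possible hop-dependent rule; the linear form of hop_based_probability and its parameter names are assumptions, since the text only states that the probability is determined by the number of hops between the producer and the router.

```python
import random


def should_cache(p, rng=random):
    """Prob(p): cache with probability p, skip with probability 1 - p."""
    return rng.random() < p


def hop_based_probability(hops_from_producer, path_length):
    """Assumed linear rule: the probability grows as data moves away from the producer.

    hops_from_producer: hops already travelled by the data packet.
    path_length: total hops between producer and consumer.
    """
    if path_length <= 0:
        raise ValueError("path_length must be positive")
    return min(1.0, max(0.0, hops_from_producer / path_length))
```

Setting p = 1 reproduces CEE, because every router on the path then caches every data packet, while smaller values of p make the sets of cached content on neighboring routers diverge.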
Another technique related to cache content selection is prediction-based caching [33]. The router decides whether to cache content based on the number of requests. This scheme adds a new table to the router, named the Pending Species Interest Table (PSIT), which stores the list of the most requested content based on data in the PIT. For example, some content is requested regularly by the consumer every Monday, while other content is non-regular, such as content about the World Cup event. A Dynamic Cache Adjustment algorithm is then used to decide whether a packet will be cached based on its wastage value. The size of the content is examined first. If the CS still has sufficient space, the packet is stored. If the CS is full, a packet in the CS is selected at random and compared with the new data packet. If the two are the same, the value of the hit parameter increases. A further test compares the hit parameter with the amount of data that has been sorted. If the hit value is higher, the packet is allocated space in the buffer; otherwise, the content is not allocated space in the CS. The selection of content can also be calculated from the local popularity and the hop count reduction gain that the packet can provide [29].
Another content selection technique is Max-Gain In-network Caching (MAGIC) [34]. The method aims to reduce bandwidth consumption and considers content popularity as well as hop reduction. When it receives the interest packet, each router calculates its Local Gain and compares it with the value stored in the MaxGain field. If the router's Local Gain is greater than the MaxGain value, the router updates the MaxGain value in the interest packet. This MaxGain value is copied into an additional field in the data packet. Along the data delivery path, a router whose Local Gain equals the MaxGain value in the data packet caches the data packet.
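The two phases of the MaxGain comparison can be sketched as below. The function name, the representation of the path as a list of precomputed Local Gain values, and the tie-breaking rule (the first matching router met on the return path caches the data) are assumptions for illustration; how the Local Gain itself is computed in [34] is not reproduced here.

```python
def magic_cache_node(local_gains):
    """Return the index of the router on the path that caches the data, or None.

    local_gains[i] is the Local Gain of the i-th router on the path from the
    consumer (index 0) toward the producer (last index).
    """
    # Interest phase: each router raises MaxGain if its Local Gain is larger.
    max_gain = 0.0
    for gain in local_gains:
        if gain > max_gain:
            max_gain = gain
    if max_gain <= 0.0:
        return None  # no router on the path expects any gain from caching
    # Data phase: the data packet carries MaxGain back toward the consumer and
    # is cached at the first router whose Local Gain equals that value.
    for index in range(len(local_gains) - 1, -1, -1):
        if local_gains[index] == max_gain:
            return index
    return None
```

For a path with Local Gains `[0.2, 0.9, 0.5]`, the interest leaves the last router carrying MaxGain 0.9, and the data is cached only at the router with index 1.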
If a data packet arrives at a router node that does not have it in its Content Storage, the node checks the state of its Content Storage. If it is full, the node selects a packet to delete from the Content Storage to make room for the new packet. The techniques commonly used in NDN systems to select which packet to delete from the CS are Least Recently Used (LRU) and Least Frequently Used (LFU) [3], [29], [22]. Deghgan et al. in paper [2] proposed another technique that assigns a timer to a packet. The timer determines how long a packet may stay in the content storage before it is finally deleted. Paper [35] proposed the Recent Usage Frequency (RFU) algorithm, which determines the popularity of content within a limited time range. The content with the lowest popularity value is removed from the content store.
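The two common replacement rules can be compared with a short sketch. The class names LruStore and LfuStore are hypothetical, the capacity is counted in packets rather than bytes, and a capacity of at least 1 is assumed.

```python
from collections import Counter, OrderedDict


class LruStore:
    """Content store that evicts the Least Recently Used packet (capacity >= 1)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # oldest entry first

    def get(self, name):
        if name not in self.items:
            return None
        self.items.move_to_end(name)  # mark as most recently used
        return self.items[name]

    def put(self, name, data):
        if name in self.items:
            self.items.move_to_end(name)
        elif len(self.items) >= self.capacity:
            self.items.popitem(last=False)  # evict the least recently used
        self.items[name] = data


class LfuStore:
    """Content store that evicts the Least Frequently Used packet (capacity >= 1)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = {}
        self.hits = Counter()  # name -> number of cache hits

    def get(self, name):
        if name not in self.items:
            return None
        self.hits[name] += 1
        return self.items[name]

    def put(self, name, data):
        if name not in self.items and len(self.items) >= self.capacity:
            victim = min(self.items, key=lambda n: self.hits[n])
            del self.items[victim]
            del self.hits[victim]
        self.items[name] = data
```

LRU reacts quickly to changes in what is being requested, while LFU protects content that has been requested many times in the past; the timer-based and popularity-window techniques mentioned above can be seen as ways of bounding how long that history is allowed to matter.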

According to paper [24], caching performance can be improved by using efficient cache replacement methods. In mobile networks this is a challenge, because the environment differs from fixed network conditions. The parameters used by the replacement rule include recency, popularity, message size, the cost of obtaining objects, and access delay [24]. The cache content selection techniques are summarized in Figure 3, and the comparison of cache content selection techniques is given in Table 2.

Cache Policy Design
Cache policy design focuses on how content is stored in nodes. One cache policy technique is utility-driven caching [8], in which a utility value is associated with each content. The utility is a function of the hit probability of the content. The goal is to maximize the total utility of the content in the content storage.
Paper [38] models the cache in its system as two layers. The first layer is the individual cache in each node, and the second layer is the accumulation of all caches in the network. The study analyzes how much content storage should be provided in the system to meet the performance of four applications, i.e. web traffic, file sharing, and video traffic, the last of which is divided into user-generated content (UGC) and video on demand (VoD).
Assantachai et al. [14] proposed a hybrid caching scheme. If new content is requested by the consumer and does not yet exist on the router node, the new content is saved. The content replacement scheme combines a cooperative approach and a distributive approach. Cooperative caching is a scheme in which each node makes a replacement decision based on knowledge received from other nodes residing in the same region. Distributive caching makes decisions independently, using internal knowledge to achieve locally maximal performance. In paper [14] the network is divided into two parts: the normal region (the region on the edge) and the backbone region (the region that connects the normal regions). In the normal region, on a cache hit for an interest the content is moved to the front of the sequence, and on a cache miss the data at the tail of the sequence is removed. The backbone region follows the same pattern as the normal region, except that the backbone nodes work with other nodes in the same region to make caching decisions. Cooperative caching policy design is also used in [39], with areas divided into clusters.
Papers [15], [40] describe that the mechanism used to cache content has a crucial impact on the efficiency of content delivery and the utilization of the CS. Paper [9] proposes a mechanism that divides files into smaller packets called chunks. The number of chunks disseminated depends on the popularity of the content. The number of chunks is determined by the Chunk Marking Window (CMW), which grows exponentially as chunks are successfully delivered.
In [41], a content-centric network is implemented with two types of applications. A separate list is created for each application, and each is identified by a unique ID. The CS is partitioned, and the content of each application can only be stored in its own content store. The content storage partitioning mechanism is tested with two methods: static cache partitioning and dynamic cache partitioning.
In static partitioning, each part of the cache can only be used as specified, whereas in dynamic cache partitioning, cache space left unused by one application can be shared with other applications. A cache splitting technique is also proposed in [42], where the content storage is divided into two parts, one for popular content and the other for less popular content. Paper [43] splits the content storage into three regions, with data categorized as self data, friends' data, and strangers' data. Paper [16] focuses more specifically on cache management in memory, where multiprocessors are used with certain interconnect mechanisms to reduce power usage.
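The difference between static and dynamic partitioning can be made concrete with a small sketch. The class PartitionedStore, the slot-based quotas, and the application IDs are hypothetical; the sketch also assumes that an application does not reclaim slots that another application has borrowed, which a real design would need to handle.

```python
class PartitionedStore:
    """Content store split per application ID, with static or dynamic sharing."""

    def __init__(self, quotas, dynamic=False):
        self.quotas = dict(quotas)  # application ID -> reserved slots
        self.dynamic = dynamic
        self.items = {app: [] for app in self.quotas}

    def _free_total(self):
        used = sum(len(names) for names in self.items.values())
        return sum(self.quotas.values()) - used

    def put(self, app, name):
        """Return True if the content was stored, False if it was rejected."""
        if self._free_total() <= 0:
            return False  # the whole content store is full
        within_quota = len(self.items[app]) < self.quotas[app]
        # Static: only the application's own partition may be used.
        # Dynamic: slots left unused by other applications may be borrowed.
        if within_quota or self.dynamic:
            self.items[app].append(name)
            return True
        return False
```

With quotas `{"web": 1, "video": 1}`, a second web object is rejected under static partitioning but accepted under dynamic partitioning as long as the video partition is idle.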
A caching technique that couples data cache placement, replacement, and location was proposed by Xiaoyan Hu et al. [29]. To determine which packets are cached, a caching value is defined for each item o that can be cached at node v. This caching value is the product of the local popularity and the hop count reduction gain of the item, divided by the cache space contention, which has the same value in all routers. If an interest arrives at node v and the item has not yet been cached on node v, node v calculates the approximate potential caching contribution of the item. The data is cached at node v if the maximal caching contribution is positive. If the content storage is full, the node selects the packet with the lowest caching contribution value for deletion. For determining the caching location, the cache location component maintains a trail that leads to the content. This trail is created only if the content is not cached on the local node. Cache policy design is summarized in Figure 4, and the comparison of cache policy design techniques is given in Table 3.

Table 3. Comparison of cache policy design techniques

Technique: Independent
Focus: Caching-related decisions are made by the node regardless of information from other nodes.
Advantages: No additional mechanism is needed for monitoring and sharing information with other nodes.
Drawbacks: Cannot share resources.

Technique: Hybrid [14]
Focus: Merges cooperative and independent techniques.
Advantages: Can apply certain mechanisms more efficiently to specific conditions.
Drawbacks: The specific conditions for each mechanism need to be defined. Adds computation.
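The caching value described for [29] can be illustrated with a simplified sketch. The function names and the dictionary-based store are hypothetical, and the sketch uses the plain caching value as a stand-in for the caching contribution, whose exact definition in [29] is not reproduced here.

```python
def caching_value(local_popularity, hop_reduction_gain, space_contention):
    """Caching value of an item at a node: popularity x hop gain / contention."""
    return local_popularity * hop_reduction_gain / space_contention


def admit(store, capacity, name, value):
    """Try to cache `name` with the given value; return True if it was cached.

    store: dict mapping content name -> caching value (modified in place).
    Assumption: an item is admitted only if its value is positive, and when the
    store is full the item with the lowest value is deleted to make room.
    """
    if capacity <= 0 or value <= 0:
        return False
    if name not in store and len(store) >= capacity:
        victim = min(store, key=store.get)
        del store[victim]
    store[name] = value
    return True
```

For example, an item with local popularity 0.5, a hop count reduction gain of 4, and a space contention of 2 has a caching value of 1.0 in this sketch.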

Caching for the Mobile Node
Generally, caching techniques for mobile nodes are built on a few basic ideas: subscribing a user to a content producer [28], [47], prefetching content to another router that will serve the consumer [19], [20], coordinating the data transmission modes for VANET [44], and supporting mobile nodes while considering energy [48]. In the mobile environment, the problem is that NDN nodes, including routers, producers, and consumers, are always moving [21]. Producer movement causes a greater problem than the movement of a consumer node or router node. A solution related to producer movement is presented in paper [28].
The publish/subscribe system is the mechanism by which a subscriber receives messages from a publisher. This relationship is governed by a manager, so a user who subscribes to certain content always gets that content when the publisher generates it [47]. In the pre-existing pub/sub system, the producer does not store messages that have been published before. In this case, new subscribers who join the system cannot get content that was published before they entered. To solve this problem, [28] proposed a storage mechanism and a replication algorithm with differentiated content classes. In this new system, storage points can convert the content classes they store. The proposed replication algorithm selects M storage points from the N points available in the network based on locality and popularity, the target replication degree of each topic, and storage capacity.
A technique for accommodating consumer mobility in wireless networks is Proactive Multilevel Cache Selection (PMCS), proposed in paper [18]. In this scheme, a consumer that is about to switch coverage (hand off) sends a notification about which router it will move to. The router currently in use selects a subset of neighboring routers to receive content that has been requested by the consumer but has not yet been sent to it. When the handoff occurs, the consumer stops sending requests for data. During the handoff, the destination router caches, up to a certain limit, the data packets from the old router that have not yet been received by the consumer. Once the connection to the new router has been established, data transmission is served by the new router. Another technique is proposed in [19] to predict node mobility and provide the best prefetching node.
Paper [20] explains mechanisms to support producer mobility, such as pushing data, making several copies of the data, determining the content placement, and re-announcing content when the producer moves to another area. Paper [44] proposes switching the VANET communication mode between vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I), depending on the popularity of the downloaded content. A mobile node has limited power, so the caching process has to consider the energy consumption of the node, in line with green NDN as explained in [49]. Paper [48] proposed an energy-efficient technique for MANET in which the network is divided into groups, each managed by a Master Node. Paper [50] proposed a technique with optimal selection of the cluster head in a Wireless Sensor Network to improve efficiency.

CHALLENGE AND OPEN ISSUES

QoS-based Caching
Most of the caching techniques developed so far, whether for cache placement, cache content selection, or cache policy design, have not considered different treatment for different services. In existing studies, data is usually differentiated only by content popularity, content recency, the estimated benefit of storing the content, and so on. Only a few studies take into account differentiated treatment based on service requirements or user requirements, even though different users may subscribe to different privileged services. So far, not much research has been done on QoS-based caching on NDN. Paper [45] is one of the papers that discusses this distinction, using classes.
The concept follows the Differentiated Services (DiffServ) concept previously used in IP networks. Further development is needed for caching mechanisms that can meet different requirements for services and users. These techniques include how to choose content and where to cache it in the network. The decision can be made independently or cooperatively with other nodes in the network.

Caching for Mobile Node
Node mobility must be supported to make the system flexible. Generally, mobility is divided into producer mobility and consumer mobility; router mobility is similar to consumer mobility. Consumer mobility is naturally supported by NDN, but producer mobility is not, so producer-mobility support techniques are one of the research opportunities. Several techniques have been presented that relate to caching in the mobile node to support producer mobility. For example, paper [18] proposes pre-fetching content, which is performed when the mobile node moves to the coverage of a new router. Another proposed method is to prefetch a group of content items that are usually requested by the consumer, rather than a single item [31]. Pre-fetching adds the time needed to move content to a new router. Further investigation of other techniques for node mobility support in NDN is required to ensure uninterrupted data communication even when the user switches coverage, taking into account the expected delay, the cache load, and the complexity of the algorithm that must be executed.

Energy-aware Caching
NDN routers in mobile wireless networks have power restrictions. Caching techniques that consider the power available at the node therefore need to be explored further. This may include selecting nodes for content placement based on position, distance, energy availability at the node, resource availability, and other factors that bear on process efficiency. It also covers techniques that reduce the number of replacements, because caching is inefficient if content is removed from the cache too frequently.

Type of Data on Content Store
Currently, the content cached on an NDN router can be either a file or a smaller unit called a chunk [9], [44]. Chunk-based systems make transmission more efficient because if a chunk is lost during transmission or deleted from the CS, only that chunk needs to be replaced, not the whole file. However, dividing the file into chunks causes the user's requests to be generated per chunk. This means that a chunk-based system needs more interest packets to retrieve a complete file than a file-based system. Caching procedures and mechanisms for these forms of data should be explored further.
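The trade-off between the two forms of data can be shown with a small calculation. The function names and the example sizes are illustrative assumptions; the sketch assumes one interest per chunk in a chunk-based system and one interest per file in a file-based system.

```python
def interests_needed(file_size, chunk_size=None):
    """Number of interest packets needed to fetch a complete file.

    chunk_size=None models a file-based system (one interest per file).
    """
    if chunk_size is None:
        return 1
    return -(-file_size // chunk_size)  # ceiling division


def max_bytes_resent_after_one_loss(file_size, chunk_size=None):
    """Upper bound on the data that must be sent again when one unit is lost."""
    if chunk_size is None:
        return file_size  # the whole file is the unit of loss
    return min(chunk_size, file_size)
```

Under these assumptions, a 10,000-byte file with 4,096-byte chunks costs three interests instead of one, but a single loss requires resending at most 4,096 bytes instead of all 10,000.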

CONCLUSION
In this paper, we have explained the advantages of the NDN architecture compared to the traditional IP network and the Content Distribution Network, and the advantages of caching on NDN compared to its predecessor systems. The development of various caching techniques has been mapped out, and the advantages and drawbacks of each group have been explained. Finally, research opportunities related to caching on NDN that can be investigated in the future have been suggested, i.e. caching mechanisms that account for differences in QoS requirements for data and users, caching that supports node mobility, and caching that considers energy.

ACKNOWLEDGMENT
This work was supported by Telkom University and Ministry of Research Technology and Higher Education Republic of Indonesia.