How to Troubleshoot Common Redis Issues?

What is Redis?

Redis is a feature-rich, in-memory data store that can be used as a cache, document database, message broker, and even vector database. You can store all sorts of structured data in Redis using built-in data structures, such as hashes, lists, sorted sets, streams, bitmaps, and geospatial indexes.

Another standout Redis feature is its easy extensibility. In addition to being open-source, Redis provides a modules API to fast-track the development of custom extensions in Rust, C, and C++. Moreover, you can do server-side scripting with Lua and create server-side stored procedures using Redis functions.

Despite predominantly being an in-memory data store, Redis also allows you to persist data across restarts and crashes. Additionally, its built-in support for clustering and high availability renders Redis an attractive option for building distributed, low-latency, high-performance data platforms.

Startup and connection problems

First, let’s explore some common issues related to startup and connectivity.

Issue # 1 – Redis server won’t start or crashes after starting

Problem: You run the command to start Redis, but it fails.

Detection: On your terminal, you see errors indicating that the service didn’t launch or crashed immediately after starting.

Troubleshooting:

Use the redis-server --test-memory command to rule out any “broken RAM” issues that may be causing Redis to fail. Note that the execution of this command can take several minutes. Upon completion, if you receive the message "Your memory passed this test", it indicates that your memory is functioning correctly.
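Review system logs and the Redis log file for additional error details that may help identify the root cause. The log location is set by the logfile directive in redis.conf; packaged installations commonly use /var/log/redis/, while Redis Enterprise Software writes to /var/opt/redislabs/log/.
Ensure that the user account running the Redis process has ownership of, and sufficient permissions for, the data directory and configuration file.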
Double-check your redis.conf file for syntax errors or invalid settings. Consult the official Redis docs for configurations to rule out inconsistencies or deprecated parameters.
Ensure that the Redis port (default: 6379) is not already in use. You can use the netstat -nlp | grep 6379 command for this purpose.
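
The port check in the last step can also be scripted. The following is a minimal sketch that uses only the Python standard library; the function name port_in_use is made up for this example. It only reports whether something accepts TCP connections on the port, not whether that process is Redis.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if a process accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise,
        # so a refused connection does not raise an exception.
        return sock.connect_ex((host, port)) == 0


# Example call (6379 is the default Redis port):
#   port_in_use(6379)
```

If the function returns True before you start Redis, another process already holds the port, and Redis will fail to bind to it.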

Issue # 2 – Connection refused or timed out errors

Problem: You are unable to connect to the Redis server using redis-cli or a client application.

Detection: You see error messages saying that the connection was refused or timed out.

Troubleshooting:

Run the netstat -nlp | grep redis command to verify that Redis is actually listening on the port you are using for the connection.
Check whether any firewall rules or network configurations are blocking incoming connections to the Redis port. For testing purposes, you may temporarily disable the firewall and retry the connection.
If you are connecting from a remote host, verify that the Redis server is configured to accept external connections. To do this, follow these steps:
- Open the Redis configuration file (redis.conf).
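- Check the protected-mode setting. When it is set to yes and no password is configured, Redis only accepts connections from the loopback interface. The safer fix is to configure authentication (a password or ACLs); set protected-mode to no only on a trusted, isolated network.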
- Review the bind directive within the Redis configuration file. By default, Redis may only bind to the localhost address (127.0.0.1). Modify this directive to bind to the server's external IP address.
If the connection is timing out, check server load and resource usage. High CPU or memory usage can slow down connections.
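
The bind and protected-mode checks above can be automated with a small configuration scan. This is a sketch under stated assumptions: the function names are hypothetical, the parser only understands simple "name value" lines, and it ignores passwords defined through ACLs, so treat its findings as hints rather than a verdict.

```python
def read_directives(conf_text: str) -> dict[str, list[str]]:
    """Collect active (uncommented) directives from redis.conf-style text."""
    directives: dict[str, list[str]] = {}
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        value = parts[1].strip() if len(parts) > 1 else ""
        directives.setdefault(parts[0].lower(), []).append(value)
    return directives


def remote_access_findings(conf_text: str) -> list[str]:
    """Return reasons why remote clients might be rejected."""
    directives = read_directives(conf_text)
    findings = []

    # A bind line may list several addresses; a leading "-" marks an
    # address Redis may skip if it is unavailable.
    addresses = [
        addr.lstrip("-")
        for value in directives.get("bind", [])
        for addr in value.split()
    ]
    loopback = {"127.0.0.1", "::1", "localhost"}
    if addresses and all(addr in loopback for addr in addresses):
        findings.append("bind lists only loopback addresses")

    # protected-mode defaults to yes when the directive is absent.
    protected = directives.get("protected-mode", ["yes"])[-1].lower()
    if protected == "yes" and "requirepass" not in directives:
        findings.append("protected-mode is on and no requirepass is set")

    return findings
```

Calling remote_access_findings on the text of your redis.conf returns an empty list when neither condition applies.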

Issue # 3 – Authentication failures

Problem: Redis rejects your connection requests due to authentication failures.

Detection: Your client logs show errors related to incorrect password, authentication failed, or insufficient permissions.

Troubleshooting:

Start with the Redis server logs. Look for additional details/clues related to the authentication failure.
If you are using Redis with access control lists (ACLs) enabled, double check that your Redis user has the required permissions to access the data store. To view all active ACL rules, you can issue the ACL LIST command.
Alternatively, if you are using password-based authentication, make sure that the client is sending the AUTH command with the correct password. Note that when the requirepass directive is set in the configuration file (its value is the password itself, not true or false), Redis rejects all commands from unauthenticated clients.
If TLS support is enabled, rule out certificate problems such as an expired certificate, a hostname mismatch, or a client that does not trust the server's CA.
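
When reading client logs, the prefix of the error reply usually tells you which of the cases above applies. The sketch below maps the error prefixes used by Redis 6 and later (WRONGPASS, NOAUTH, NOPERM) to a next step. The helper name and hint texts are made up for this example, and older servers report failures with a generic ERR prefix that this helper does not cover.

```python
AUTH_ERROR_HINTS = {
    "WRONGPASS": "Credentials rejected: check the username/password pair "
                 "and that the user is enabled.",
    "NOAUTH": "The client sent a command before authenticating: "
              "send AUTH first.",
    "NOPERM": "Authenticated, but the ACL rules deny this command or key: "
              "review the user's rules with ACL LIST.",
}


def explain_auth_error(message: str) -> str:
    """Map a Redis error reply to a troubleshooting hint."""
    # Error replies look like "-NOAUTH Authentication required."; the
    # leading "-" is part of the wire format and may already be stripped.
    prefix = message.lstrip("-").split(" ", 1)[0].upper()
    return AUTH_ERROR_HINTS.get(
        prefix, "Not a recognized authentication error; check the server log."
    )
```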

Redis configuration issues

Redis, like any highly configurable system, is prone to misconfigurations. Let’s explore some of the common ones.

Misconfiguration # 1 – Weak or no authentication

Problem: You are either using no authentication or the legacy, password-based authentication. Since the password-based method stores the password as plaintext inside the configuration file and uses the same password across all clients, it increases your attack surface.

Detection:

You don’t have ACLs configured and the requirepass line in the configuration file is commented out.
The requirepass line in the configuration file is uncommented and shows the password as plaintext.

Troubleshooting:

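Use the ACLs feature to configure fine-grained permissions for all users. ACL rules combine key patterns (such as ~cache:*) with commands or command categories (such as +@all, +@write, or +@read), which lets you model full-access, read-write, and read-only users.
Once ACL users are in place, retire the shared password by commenting out the requirepass line. Do this only after the ACLs work, so the server is never left without authentication.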
After configuring ACLs, verify connections from existing clients to ensure functionality remains unaffected.
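
As an illustration of defining a restricted user without writing a cleartext password, the sketch below builds an ACL SETUSER command that stores a SHA-256 hash of the password (the #<hash> rule form). The function name, user name, key pattern, and password are all hypothetical values chosen for this example.

```python
import hashlib


def acl_setuser_command(username: str, password: str,
                        key_patterns: list[str], rules: list[str]) -> str:
    """Build an ACL SETUSER command that uses a hashed password."""
    # "#<hex>" tells Redis the value is already a SHA-256 hash, so the
    # cleartext password never appears in the command or the ACL file.
    digest = hashlib.sha256(password.encode("utf-8")).hexdigest()
    parts = ["ACL", "SETUSER", username, "on", f"#{digest}"]
    parts += [f"~{pattern}" for pattern in key_patterns]
    parts += rules
    return " ".join(parts)


# A read-only user limited to keys that start with "report:".
command = acl_setuser_command(
    "report-reader", "example-password", ["report:*"], ["+@read"]
)
```

The resulting string can be pasted into redis-cli. The client still authenticates with the cleartext password, which Redis hashes and compares against the stored value.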

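To further strengthen security, consider enabling TLS. Assuming your Redis binary was built with TLS support, you can start the server with a certificate, a private key, and a CA file. The paths below point to the test certificates in the Redis source tree; substitute your own: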

./src/redis-server --tls-port 6379 --port 0 \
	--tls-cert-file ./tests/tls/redis.crt \
	--tls-key-file ./tests/tls/redis.key \
	--tls-ca-cert-file ./tests/tls/ca.crt

Misconfiguration # 2 – No inactivity timeout configured

Problem: When an inactivity timeout isn’t configured, idle client connections are never closed. They keep consuming file descriptors and memory, which exposes Redis to resource exhaustion and potential security risks.

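Detection: Run this command to check the configured value of the timeout parameter (where a value of 0 indicates no timeout). Anchoring the pattern to the start of the line skips unrelated settings such as repl-timeout and cluster-node-timeout. No output means the line is absent and the default of 0 is in effect:

grep '^timeout' /path/to/redis/config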

Troubleshooting:

Configure an appropriate inactivity timeout in the Redis configuration file. The definition of appropriate varies based on application requirements and expected usage patterns.
Configure TCP keepalive between clients and the Redis server by setting the tcp-keepalive parameter (a duration in seconds) in the Redis configuration file. This is crucial to detect and remove dead peers.
Implement monitoring mechanisms to detect and address abnormal connection behavior, such as prolonged idle connections or sudden spikes in connection count.
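
For the monitoring step, one simple signal is the idle field in the output of the CLIENT LIST command. The sketch below parses that output (one client per line, space-separated key=value pairs) and returns connections idle for longer than a threshold. The function names and the one-hour threshold in the usage note are assumptions for this example.

```python
def parse_client_list(text: str) -> list[dict[str, str]]:
    """Parse CLIENT LIST output into one dict per connected client."""
    clients = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Values may be empty (for example "name="), so split only once.
        fields = dict(
            pair.split("=", 1) for pair in line.split() if "=" in pair
        )
        clients.append(fields)
    return clients


def idle_clients(text: str, max_idle_seconds: int) -> list[dict[str, str]]:
    """Return clients whose idle time exceeds max_idle_seconds."""
    return [
        client
        for client in parse_client_list(text)
        if int(client.get("idle", "0")) > max_idle_seconds
    ]
```

For example, idle_clients(output, 3600) lists the connections that have been silent for more than an hour, which are candidates for a server-side timeout.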

Misconfiguration # 3 – maxmemory value too low

Problem: A maxmemory value that is too low for the working set forces Redis to evict keys constantly (or to reject writes under the noeviction policy), which leads to cache misses and performance issues.

Detection:

Monitoring tools or Redis logs indicate frequent memory evictions or warnings related to memory limits.
Performance degradation or instability occurs during periods of high memory usage.

Troubleshooting:

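Use the MONITOR command during high-memory situations to stream all received commands in real time, and identify any that may be responsible for high memory usage. Because MONITOR adds noticeable overhead of its own, run it only briefly on production servers.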
Analyze the historical memory usage patterns and potential scalability needs for your Redis deployment, and adjust the maxmemory setting to a value that accommodates your workload without risking memory exhaustion.
Configure an eviction policy (via the maxmemory-policy parameter) to dictate what Redis does when maxmemory is reached. Available policies are: volatile-lru, allkeys-lru, volatile-lfu, allkeys-lfu, volatile-random, allkeys-random, volatile-ttl, and noeviction.
Treat this as an ongoing task rather than a one-time effort. Monitor memory usage regularly and readjust the maxmemory setting as needed to maintain optimal performance.
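
To make the regular monitoring concrete, the sketch below reads three fields from the text returned by the INFO command (used_memory, maxmemory, and evicted_keys) and reports how close the instance is to its limit. The function names and the 90% warning threshold are assumptions for this example.

```python
def parse_info(info_text: str) -> dict[str, str]:
    """Parse INFO output ("key:value" lines, "# Section" headers)."""
    fields = {}
    for line in info_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def memory_report(info_text: str, warn_ratio: float = 0.9) -> dict:
    """Summarize memory pressure from INFO output."""
    info = parse_info(info_text)
    used = int(info.get("used_memory", 0))
    limit = int(info.get("maxmemory", 0))
    # maxmemory 0 means no limit is configured, so no ratio exists.
    ratio = used / limit if limit else None
    return {
        "used_ratio": ratio,
        "evicted_keys": int(info.get("evicted_keys", 0)),
        "near_limit": ratio is not None and ratio >= warn_ratio,
    }
```

A steadily growing evicted_keys value together with near_limit set to True suggests that maxmemory is too small for the workload.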

Redis performance bottlenecks

Here are a few performance bottlenecks that you may encounter when running Redis with a large data set:

Bottleneck # 1 – Inefficient data structures

Problem: Suboptimal choice or usage of Redis data structures can lead to inefficient memory utilization and slower performance.

Detection: You notice excessive memory consumption or slower than expected response times.

Troubleshooting:

Use the most suitable data structure for your access patterns. For example, switch from lists to sets for membership checks, use hashes to group related field-value pairs under one key, and incorporate sorted sets for range queries.
Utilize Redis commands like MEMORY USAGE to review memory consumption by specific keys or data structures and identify opportunities to optimize. For example, MEMORY USAGE foo will reveal the number of bytes taken by the key foo and its value.

Bottleneck # 2 – Inefficient commands

Problem: Using complex or inefficient commands (like SUNIONSTORE) with large data sets can strain server resources.

Detection: Some commands exhibit disproportionately high resource usage or completion times.

Troubleshooting:

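Review application code and Redis command usage to identify inefficient command sequences or redundant operations. For example, rather than sending a separate HSET command for each field of a hash, pass all field-value pairs to a single HSET call. HSET has accepted multiple pairs since Redis 4.0, which is why the older HMSET command is deprecated.
Use the SLOWLOG command (for example, SLOWLOG GET 10) to identify slow-running commands and consider optimizing them.
Break down complex commands into smaller, more efficient ones. For example, instead of one SUNIONSTORE over very large sets, iterate the source sets incrementally with SSCAN and add the members to the destination with SADD in batches, so that no single command blocks the server for long.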
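
Batching several field writes into one command can be sketched as follows. The example builds a single multi-field HSET (supported in Redis 4.0 and later) and encodes it in the Redis wire protocol (RESP) to show that all pairs travel in one request. The function names are hypothetical, and a real application would normally rely on a client library for the encoding.

```python
def hset_command(key: str, fields: dict[str, str]) -> list[str]:
    """Build one HSET command carrying every field-value pair."""
    if not fields:
        raise ValueError("HSET needs at least one field-value pair")
    args = ["HSET", key]
    for field, value in fields.items():
        args += [field, value]
    return args


def encode_resp(args: list[str]) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        # Each bulk string is prefixed with its length in bytes.
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


request = encode_resp(hset_command("user:1", {"name": "Ada", "role": "admin"}))
```

Here request is a single payload, so the server handles both fields in one round trip instead of two.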

Bottleneck # 3 – Network latency issues

Problem: Network latency between clients and the Redis server impacts application performance, especially in distributed environments.

Detection: Network latency metrics between clients and the Redis server are showing higher than expected values.

Troubleshooting:

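If you are using a virtual machine (VM), run this command on the Redis server itself to measure the intrinsic latency of the environment, which is the baseline that Redis latency cannot go below. The argument is the test duration in seconds: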
```
./redis-cli --intrinsic-latency 100
```
If the estimated minimum is higher than or equal to what you have been experiencing, then the bottleneck is the machine, not Redis. In this case, consider optimizing the VM’s resources. Conversely, if the estimated minimum is lower, then try the tips below.
Optimize network infrastructure for low latency, such as using high-speed network cards and minimizing hops.
Use Redis clustering or replication to distribute workload and reduce the impact of network latency on overall system performance.
Use the SLOWLOG command to identify any long-running commands that may be hogging server resources and optimize them.
Consider enabling the latency monitor using the CONFIG SET latency-monitor-threshold 100 command, where the threshold is expressed in milliseconds. This will allow you to track latency in real time using commands like LATENCY LATEST, LATENCY HISTORY, LATENCY DOCTOR, and more.
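
To get client-side numbers to compare against the intrinsic latency, you can time PING round trips from the application host. The following is a minimal sketch that uses a raw socket and the inline form of PING. The function name is made up for this example, and it assumes a server that accepts unauthenticated, non-TLS connections; otherwise the reply is an error and the function raises.

```python
import socket
import time


def ping_round_trips_ms(host: str = "127.0.0.1", port: int = 6379,
                        samples: int = 20, timeout: float = 2.0) -> list[float]:
    """Measure PING round-trip times in milliseconds over one connection."""
    results = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        for _ in range(samples):
            start = time.perf_counter()
            conn.sendall(b"PING\r\n")
            reply = conn.recv(64)
            elapsed_ms = (time.perf_counter() - start) * 1000
            if not reply.startswith(b"+PONG"):
                raise RuntimeError(f"unexpected reply: {reply!r}")
            results.append(elapsed_ms)
    return results
```

The smallest value in the returned list approximates the network round trip plus server overhead. If it is far above the intrinsic latency measured on the server, the network path is the likely bottleneck.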

Bottleneck # 4 – Data persistence issues

Problem: Frequent persistence operations (such as the blocking SAVE command or closely spaced RDB snapshots) or slow AOF writes can impact performance and responsiveness.

Detection: Data persistence operations are taking too long to complete.

Troubleshooting:

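Tune the save and AOF configuration parameters (for example, the snapshot intervals defined by save and the appendfsync policy, which can be always, everysec, or no) to balance data durability with performance needs.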
Optimize disk I/O performance by using SSDs or RAID configurations.

Bottleneck # 5 – High CPU usage

Problem: Excessive CPU usage on the server can lead to degraded performance.

Detection: Monitoring tools are reporting increases in both CPU utilization and average Redis response time.

Troubleshooting:

Use the MONITOR command to stream all received commands in real time. Identify any that may be contributing to the high CPU usage.
Utilize Redis modules for specific tasks that might be more CPU-efficient than native commands. For example, you can use RediSearch to achieve optimized full-text search capabilities with Redis.
Review and optimize Redis configuration parameters related to threading, expire efforts, rehashing, and defragmentation, as they can all contribute to high CPU usage.
If the above tips don’t work, consider vertical scaling by upgrading the server hardware or horizontal scaling by sharding Redis instances.

Bottleneck # 6 – Concurrency issues

Problem: High concurrency can lead to resource contention and performance degradation.

Detection: Monitoring tools are showing a large number of connected clients and commands per second, along with an increase in average response time.

Troubleshooting:

Use the MONITOR command to see executed commands in real time and the CLIENT LIST command to see all connected clients.
If multiple clients are expected to access shared resources, implement locking patterns using purpose-built libraries like redsync.
Use transactions (MULTI and EXEC) to execute a group of commands atomically, as a single step.
Leverage clustering or partitioning strategies to distribute workload. This will mitigate the potential impact of contention on overall system performance.
Optimize application code to minimize unnecessary key lookups and commands.

Redis cluster-related issues

Next, we will dissect issues specific to Redis clusters.

Issue # 1 – Cluster instability

Problem: The cluster is unable to deliver high levels of performance, fault tolerance, and data integrity.

Detection:

Nodes are frequently joining or leaving the cluster.
Cluster logs are showing errors related to MEET and FAIL messages.
High CPU and/or memory usage is observed across multiple nodes.
CLUSTER commands are taking too long or exhibiting abnormal behavior.

Troubleshooting:

Ensure proper network connectivity between all nodes. Check firewalls and network configurations.
Verify that there are sufficient system resources (CPU, memory) on each node to handle cluster operations.
Investigate cluster-related configuration issues like incorrect cluster-announce-ip or cluster-announce-port settings.
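Use the redis-cli cluster tooling (redis-cli --cluster rebalance or redis-cli --cluster reshard) to spread hash slots, and the keys in them, evenly across nodes. The low-level CLUSTER ADDSLOTS and CLUSTER DELSLOTS commands only change slot assignments and do not migrate existing keys.
Regularly monitor the output of the CLUSTER INFO and INFO replication commands to detect any cluster or replication-related issues quickly.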
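
Slot balance and failing nodes can also be read from the output of the CLUSTER NODES command. The sketch below assumes the documented line layout (node ID, address, flags, master ID, ping sent, pong received, config epoch, link state, then slots) and counts the slots owned by each master. The function name is hypothetical, and slots that are being migrated or imported (shown in square brackets) are skipped.

```python
def summarize_cluster_nodes(text: str) -> dict:
    """Count slots per master and list nodes flagged as failing."""
    slots_per_master = {}
    failing = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 8:
            continue  # not a complete node line
        node_id, address, flags = parts[0], parts[1], parts[2].split(",")
        # "fail?" means this node suspects a failure; "fail" means the
        # failure was confirmed by a majority of masters.
        if "fail" in flags or "fail?" in flags:
            failing.append(address)
        if "master" in flags:
            count = 0
            for slot in parts[8:]:
                if slot.startswith("["):
                    continue  # migrating or importing slot
                first, _, last = slot.partition("-")
                count += int(last or first) - int(first) + 1
            slots_per_master[node_id] = count
    return {"slots_per_master": slots_per_master, "failing": failing}
```

Large differences between the per-master counts indicate that a rebalance may be worthwhile, and any address in the failing list deserves a connectivity check first.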

Issue # 2 – Network partitioning

Problem: Communication between Redis cluster nodes is disrupted, resulting in split-brain scenarios where nodes become isolated from each other.

Detection:

You are noticing anomalies in cluster communication patterns.
CLUSTER INFO and CLUSTER NODES commands are frequently showing connectivity problems.

Troubleshooting:

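Tune cluster-node-timeout, the number of milliseconds a node can be unreachable before it is considered failing. A master that cannot reach the majority of the other masters for this long stops accepting writes, which limits the damage of a split-brain scenario.
For deployments that do not use Redis Cluster, consider using Redis Sentinel, a distributed system that supports automatic failover and leader election during network partitions.
Size the deployment so that a majority can always be formed: use an odd number of masters (at least three) in Redis Cluster, or at least three Sentinel instances with an appropriate quorum, to preserve data consistency and availability in the event of network partitions.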
Regularly test failover procedures to validate their effectiveness and identify potential weaknesses or bottlenecks.
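
The majority rule behind these recommendations is simple arithmetic. In Redis Cluster, a replica is promoted only with votes from a majority of masters, so only the side of a partition that still sees a majority can fail over. The sketch below uses hypothetical helper names to show why an even split leaves both sides stuck.

```python
def majority(total_masters: int) -> int:
    """Smallest number of masters that forms a majority."""
    return total_masters // 2 + 1


def side_has_majority(total_masters: int, reachable_masters: int) -> bool:
    """True if one side of a partition can still authorize a failover."""
    return reachable_masters >= majority(total_masters)


# Three masters split 2/1: the larger side keeps working.
three_way = side_has_majority(3, 2)
# Four masters split 2/2: neither side reaches the majority of 3.
even_split = side_has_majority(4, 2)
```

This is one reason to prefer an odd number of masters spread across failure domains.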

Redis best practices

Finally, let’s explore some Redis best practices that can help keep an instance running smoothly for extended periods:

Regularly update Redis to receive the latest performance improvements, bug fixes, and security patches.
Choose the right data structure for your access patterns (sets for membership, sorted sets for rankings, hashes for field-level access to objects, bitmaps for bit operations on strings, and so on) to optimize performance.
Group multiple commands into transactions or pipelines for increased efficiency and reduced network round trips.
Leverage Redis's INFO, MONITOR, SLOWLOG, and other similar commands to identify performance bottlenecks and slow queries.
Secure your Redis instance with ACLs and network access restrictions. Consider TLS encryption for sensitive data.
Regularly monitor Redis using purpose-built monitoring tools like Site24x7's Redis monitoring tool, which lets you track several key metrics, such as CPU and memory utilization, keyspace hits and misses, total connections, connected clients, connected replicas, rejected connections, and more.
Leverage key expiration times for temporary data or cached values to optimize memory usage and data freshness.
Implement connection pooling in client applications to manage connections efficiently and reduce overhead.
Document Redis configuration settings, deployment architecture, and operation guidelines to maintain consistency and facilitate knowledge sharing.

Conclusion

Redis is a scalable in-memory data store that can function as a cache, streaming server, and vector database for AI-powered applications. To maintain peak performance and prevent bottlenecks, it is crucial to promptly detect and resolve any issues. Site24x7's Redis monitoring can help ensure continuous performance optimization, as discussed in this guide.
