📜  实现自己的尾巴(读取大文件的最后n行)

📅  最后修改于: 2021-05-30 06:54:02             🧑  作者: Mango

给定一个具有动态数据的巨大文件,编写一个程序以随时读取文件的最后n行而不读取整个文件。问题类似于linux中的tail命令,该命令显示文件的最后几行。它主要用于查看日志文件更新,因为这些更新已附加到日志文件中。

资料来源:微软访谈

强烈建议您最小化浏览器,然后自己尝试。

问题主要集中在以下方面–

1.程序不应读取整个文件。
2.程序应处理传入的动态数据,并在任何时候返回最后n行。
3.在读取最后n行之前,程序不应关闭输入流。

以下是其C++实现

// C++ program to implement your own tail
#include 
using namespace std;
  
#define SIZE 100
  
// Utility function to sleep for n seconds
void sleep(unsigned int n)
{
    clock_t goal = n * 1000 + clock();
    while (goal > clock());
}
  
// function to read last n lines from the file
// at any point without reading the entire file
void tail(FILE* in, int n)
{
    int count = 0;  // To count '\n' characters
  
    // unsigned long long pos (stores upto 2^64 – 1
    // chars) assuming that long long int takes 8 
    // bytes
    unsigned long long pos;
    char str[2*SIZE];
  
    // Go to End of file
    if (fseek(in, 0, SEEK_END))
        perror("fseek() failed");
    else
    {
        // pos will contain no. of chars in
        // input file.
        pos = ftell(in);
  
        // search for '\n' characters
        while (pos)
        {
            // Move 'pos' away from end of file.
            if (!fseek(in, --pos, SEEK_SET))
            {
                if (fgetc(in) == '\n')
  
                    // stop reading when n newlines
                    // is found
                    if (count++ == n)
                        break;
            }
            else
                perror("fseek() failed");
        }
  
        // print last n lines
        printf("Printing last %d lines -\n", n);
        while (fgets(str, sizeof(str), in))
            printf("%s", str);
    }
    printf("\n\n");
}
  
// Creates a file and prints and calls tail() for 
// 10 different values of n (from 1 to 10)
int main()
{
    FILE* fp;
    char buffer[SIZE];
    // Open file in binary mode
    // wb+ mode for reading and writing simultaneously
    fp = fopen("input.txt", "wb+");
    if (fp == NULL)
    {
        printf("Error while opening file");
        exit(EXIT_FAILURE);
    }
    srand(time(NULL));
  
    // Dynamically add lines to input file
    // and call tail() each time
    for (int index = 1; index <= 10; index++)
    {
        /* generate random logs to print in input file*/
        for (int i = 0; i < SIZE - 1; i++)
            buffer[i] = rand() % 26 + 65; // A-Z
        buffer[SIZE] = '\0';
  
        /* code to print timestamp in logs */
        // get current calendar time
        time_t ltime = time(NULL);
  
        // asctime() returns a pointer to a string
        // which represents the day and time
        char* date = asctime(localtime(<ime));
  
        // replace the '\n' character in the date string
        // with '\0' to print on the same line.
        date[strlen(date)-1] = '\0';
  
        /* Note in text mode '\n' appends two characters,
        so we have opened file in binary mode */
        fprintf(fp, "\nLine #%d [%s] - %s", index,
                                    date, buffer);
  
        // flush the input stream before calling tail
        fflush(fp);
  
        // read last index lines from the file
        tail(fp, index);
  
        // sleep for 3 seconds
        // note difference in timestamps in logs
        sleep(3);
    }
  
    /* close the file before ending program */
    fclose(fp);
  
    return 0;
}

注意事项–
1.此代码在在线编译器上不起作用,因为它需要文件创建权限。在本地计算机上运行时,它将生成示例输入文件“ input.txt”,并向其动态写入数据10次,并每次都调用tail()函数。

2.我们应该避免对大型文件(以GB为单位)使用fseek()和ftell(),因为它们在long int类型的文件位置上运行。请改用_fseeki64()和_ftelli64()。

3. unsigned long的最大允许值为2 32 – 1(假设unsigned long占用4个字节)。它可用于文件大小小于4 GB的文件。

4. unsigned long long的最大允许值为2 64 – 1(假设unsigned long long占用8个字节)。它可用于文件大小超过4 GB的文件。

要从最佳影片策划和实践问题去学习,检查了C++基础课程为基础,以先进的C++和C++ STL课程基础加上STL。要完成从学习语言到DS Algo等的更多准备工作,请参阅“完整面试准备课程”