Recognizing SQL files on GitHub
Taking a closer look at how GitHub classifies different languages
In the data world, it is not uncommon to commit database files and SQL queries. This could be for collaborating within teams, maintaining a query library, or simply tracking changes for CI/CD purposes.
When committing your SQL files, note that GitHub doesn’t count .sql files in a repository’s language statistics by default. The library responsible for this is Linguist, which GitHub uses to detect languages.
Linguist is an extremely powerful tool that classifies the files in your repository to show the languages used and provides syntax highlighting. This is great when it comes to traditional programming languages.
So how does Linguist work?
Linguist takes the list of languages it knows from languages.yml and uses a number of methods to try and determine the language used by each file, and the overall repository breakdown.
Linguist runs as a low-priority background job that analyzes the repository. The results are cached for the lifetime of the repository and are only updated when the repository is updated.
Structured Query Language or SQL is a powerful language used worldwide to access and manipulate databases.
According to Wikipedia, SQL (“sequel”; Structured Query Language) is a domain-specific language used in programming and designed for managing data held in a relational database management system (RDBMS), or for stream processing in a relational data stream management system (RDSMS). It is particularly useful in handling structured data, i.e. data incorporating relations among entities and variables.
Unfortunately, SQL is one of the languages that Linguist classifies as a data language. This means that Linguist excludes .sql files when determining the languages used in the repository.
So how do you overcome this hurdle?
We override Linguist and make .sql files detectable in the repository. We accomplish this by adding a .gitattributes file to the root of the repository.
A .gitattributes file is a simple text file that gives attributes to pathnames.
The code in the .gitattributes file looks like this:
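A minimal sketch of such a file, assuming Linguist's documented linguist-detectable and linguist-language overrides. The exact lines are an assumption, so adapt the *.sql pattern to your own repository layout:

```gitattributes
# Linguist overrides (sketch; the *.sql pattern is an assumption about your layout)
# Include .sql files in the repository's language statistics.
*.sql linguist-detectable
# Classify the matched files as SQL, using the language name from languages.yml.
*.sql linguist-language=SQL
```

Once the file is committed and pushed, the language bar should change the next time GitHub re-analyzes the repository.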
As the code says, we make the .sql files detectable and tell Linguist to treat the SQL language as text. This in turn lets the language stats bar count the .sql files correctly and gives them syntax highlighting.
I hope that this explains why .sql files aren’t recognized right away and I sure hope that you don’t have to spend countless hours researching in the future like I did to make GitHub recognize SQL files.